CNN model-based image classification for canine brain MRI abnormalities

Research Square

Samira Abani, Merlin Laue, Peter J Dickinson, Steven De Decker, and 11 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7537077/v1

This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted 10

Abstract

Diagnostic imaging represents one of the most promising clinical applications of artificial intelligence (AI) algorithms. The purpose of this study was to evaluate the customised convolutional neural network (CNN) SepNetDense for distinguishing between normal and abnormal canine brain magnetic resonance images (MRIs), with the aim of enhancing diagnostic efficiency and assisting radiologists in identifying abnormal images.
The dataset comprised T1-weighted (T1w) pre- and post-contrast sequences in transverse, sagittal, and dorsal planes from 550 dogs, collected from four universities. Dogs were included if they had a complete clinical diagnosis confirmed either through histopathology or in accordance with current clinical consensus. Patients were randomly divided into a training set (n = 444), a validation set (n = 53), and a test set (n = 53). Each MRI study was labelled slice by slice as normal or abnormal. The model was trained on 205 normal imaging datasets (e.g., extracranial aetiologies, idiopathic epilepsy, paroxysmal dyskinesia) and 239 abnormal ones (e.g., neoplasms, inflammatory lesions, other pathologies). The model correctly predicted 74% of the true normal slices in the test set as normal and 73% of the true abnormal slices as abnormal. A receiver operating characteristic (ROC) analysis of the model’s patient-level predictions revealed that, at a threshold of 51% abnormal slices per patient, the model reached an optimal balance of 83% sensitivity and 78% specificity, with a maximal accuracy of 80%. ANCOVA revealed that the CNN’s classification performance was influenced by multiple biological, technical, and institutional factors. Lesion diagnosis, institutional setting, and breed size had significant effects with large effect sizes, whereas body weight had a significant effect of medium size. Significant interactions with large effect sizes were also observed between diagnosis and institute, as well as between breed size and weight. Additionally, a distinct pattern was observed in the distribution of prediction categories along the anatomical axes, with a trend towards better CNN performance in the central quartiles across all MRI sequences. The T1w pre-contrast sagittal sequence achieved the highest slice-wise classification accuracy (81.8%) of all T1w sequences.
This study evaluates a CNN-based model designed to support a triage system for classifying canine brain MRI studies, with the aim of identifying abnormalities and improving reporting efficiency.

Subject areas: Biological sciences/Computational biology and bioinformatics; Health sciences/Diseases; Health sciences/Medical research; Health sciences/Neurology

Keywords: Artificial intelligence (AI); Deep learning; Magnetic resonance imaging (MRI); Brain MRI; Abnormality detection; Diagnostic imaging

Introduction

In recent decades, technical innovations in veterinary diagnostic imaging, especially magnetic resonance imaging (MRI), have profoundly advanced clinical practice in veterinary neurology and neurosurgery [1]. MRI serves as a non-invasive, highly sensitive tool for detecting abnormalities, tracking disease progression, and evaluating treatment and post-surgical outcomes [2,3]. Serial MRI studies further enable in vivo monitoring of lesion progression over time, making MRI an essential and versatile tool for the antemortem assessment of canine brain pathologies [3]. The demand for imaging services, not only in human but also in veterinary medicine, has outpaced the number of trained diagnostic imaging specialists, creating a widening gap between the availability of imaging technology and diagnostic expertise [1,4]. While the number of MRI units in universities and private practices continues to rise, the workforce of veterinary radiologists and neurologists has not increased at the same rate, posing challenges for maintaining high-quality image interpretation [1,5]. In recent years, convolutional neural networks (CNNs) have become a powerful tool in medical imaging for tasks such as classification, detection, localisation, segmentation, augmentation, and automated diagnosis [6-9].
CNNs are particularly effective because they can automatically learn complex patterns and hierarchical features from image data by adjusting millions of parameters through optimisation [10,11]. The architecture of a CNN typically comprises three main components: the input layer, the hidden layers, and the output layer. The hidden layers include convolutional layers, which use filters to extract key features from images; pooling layers, which reduce dimensionality; and fully connected layers, which make the final predictions. This ability to process and learn from visual data makes CNNs particularly valuable for medical imaging applications [11]. Numerous studies have demonstrated that CNNs can assist clinicians in a range of image analysis applications across different anatomical regions, achieving performance levels comparable to those of human experts [12-19]. While CNNs continue to advance in human medicine, with improving accuracy, robustness, and generalisability, their adoption in veterinary diagnostic practice remains at an early stage [20,21]. In 2018, several studies began to explore the potential of CNNs for analysing veterinary diagnostic images [22-28]. Among veterinary diagnostic imaging modalities, radiography is the most widely studied because of its common use in routine veterinary practice [20,29]. MRI, particularly in the context of neurological conditions, is an emerging area of increasing research interest but remains underexplored relative to radiography [20,29]. Niemeyer et al. (2024) fine-tuned a deep learning tool (a VGG-16 network), originally developed for the automatic grading of human lumbar intervertebral disc (IVD) degeneration using the Pfirrmann scheme, to apply the same grading system to T2-weighted (T2w) midsagittal images of canine lumbar spines.
The model achieved an average accuracy of 94.1%, a sensitivity of 85.2%, and a specificity of 96.3% for grades 1 to 5, demonstrating its potential to advance both veterinary care and human biomedical research [30]. In another study, Biercher et al. (2021) demonstrated the potential of CNNs for differentiating thoracolumbar spinal cord diseases in dogs. The CNN model achieved a sensitivity of 90.80% and a specificity of 98.98% for intervertebral disc extrusion (IVDE) using T2w sagittal images; a sensitivity of 100% and a specificity of 95.10% for intervertebral disc prolapse (IVDP) using sagittal T1-weighted (T1w) images; and a sensitivity of 90.98% and a specificity of 90.12% for fibrocartilaginous embolism/acute non-compressive nucleus pulposus extrusion (FCE/ANNPE) using T2w transverse images. This study emphasises the potential of CNNs to provide second opinions in the assessment of spinal cord lesions on MRI, while also highlighting the challenges posed by limited training data for certain diagnoses [31]. Since 2018, the use of CNNs for analysing canine brain tumours on MRI has gained increasing attention in veterinary neuroimaging [25]. Banzato et al. (2018) used two neural networks (NNs), one based on AlexNet and the other a customised model, to predict the histopathological grade of canine meningiomas. In this preliminary study, the models were tested on ten randomly selected images and correctly classified the meningioma grade in eight of ten cases. This result underscores the potential of MRI-based tumour grading to guide treatment decisions [25]. Later that year, Banzato et al. trained a CNN based on GoogLeNet to distinguish between meningiomas and gliomas on canine MR images [23]. The model performed best on post-contrast T1w images, achieving an accuracy of 94%. Performance was slightly lower on pre-contrast T1w and T2w images, with accuracies of 91% and 90%, respectively.
These findings further demonstrate the potential of deep learning models to assist in the diagnosis and classification of various tumour types [23]. While recent studies have highlighted the growing use of CNNs in veterinary imaging diagnostics, their application in triaging or enhancing diagnostic workflows for complex imaging modalities, such as brain MRI, remains limited. This is partly due to the lack of large, heterogeneous datasets that adequately reflect real-world variability. The hypothesis of this study is that a CNN model trained on a heterogeneous brain MRI dataset can accurately predict whether a dog’s brain is normal or abnormal. To test this hypothesis, we: (1) annotated 79,310 MRI slices from 550 dogs across four institutions, comprising both unremarkable brains and brains affected by a range of diseases; (2) trained the customised CNN model, SepNetDense, using the training and validation sets; and (3) evaluated the model’s performance on an independent test set.

Material and Methods

Dataset and Features. The dataset comprised T1w sequences acquired pre- and post-contrast in the transverse, sagittal, and dorsal planes. It included 241 patients with unremarkable brain imaging findings (e.g., extracranial aetiologies, idiopathic epilepsy, and paroxysmal dyskinesia) and 309 patients with remarkable imaging findings (e.g., neoplasms, inflammatory lesions, and other pathological abnormalities). Dogs were included if they had a complete clinical diagnosis, confirmed either by histopathology or in accordance with current clinical scientific consensus, which served as the ground truth. Breed, age, weight, sex, presenting complaint, imaging findings, and final diagnoses were extracted from the electronic medical records of each patient. All MRI studies were reviewed by at least one veterinary neuroradiology expert.
No animals were directly involved in this study, as all data were retrospectively collected from clinical cases that had previously undergone brain MRI for diagnostic purposes.

Patient Selection. Electronic databases from two university referral centres, the Department of Small Animal Medicine and Surgery at the University of Veterinary Medicine Hannover, Germany (TiHo), and the William R. Pritchard Veterinary Medical Teaching Hospital at the University of California, Davis (UC Davis), were searched for dogs presenting with clinical signs of encephalopathy between April 2003 and June 2022. Patients with MRI studies showing remarkable brain findings were included if they had a comprehensive clinical history and a histopathological diagnosis from a board-certified pathologist, based on either post-mortem examination or stereotactic brain biopsy; 178 patients met these criteria. In addition, 84 cases with histopathologically confirmed primary or secondary intracranial neoplasia from the dataset of a previous study conducted at the Royal Veterinary College, Small Animal Referral Hospital, London (RVC) were included [32]. Furthermore, 45 cases of meningoencephalitis of unknown origin with a final or presumptive diagnosis, as reported in a previous study conducted at the Small Animal Hospital of the University of Glasgow (Glasgow), were incorporated [33].

MRI studies were acquired at the four university referral centres using clinical MRI scanners from Philips, GE Medical Systems, and Siemens. At TiHo, imaging was performed using 3.0 T scanners, including the Philips Achieva and Achieva dStream models. UC Davis used 1.5 T scanners from GE Medical Systems, including the Genesis Signa and Signa HDxt. Glasgow acquired scans using 1.5 T systems, including the Philips NT Intera and Siemens Magnetom Essenza. At RVC, studies were obtained using various 1.5 T scanners, including the Philips Achieva, Gyroscan, Intera, NT Intera, and the GE Genesis Signa.
Different imaging protocols were applied at each institution. Differences in scanner models, magnetic field strength, and acquisition protocols across institutions were considered part of the dataset’s heterogeneity.

Control Group. The control group consisted of MR images from 241 dogs with unremarkable brain findings, obtained from the four university referral centres. This group included dogs with conditions such as extracranial aetiologies; paroxysmal dyskinesia, diagnosed in accordance with the consensus statement of the European College of Veterinary Neurology (ECVN) [34]; and idiopathic epilepsy, classified as tier II according to the International Veterinary Epilepsy Task Force consensus statement [35]. As individual dogs may have undergone multiple MRI examinations on different dates at the referral centres, only one MRI study per dog was included in the dataset. Follow-up MRI studies performed post-surgery or post-treatment were excluded. Approval for the use of medical data was obtained through the owner’s informed consent prior to hospital admission, in accordance with each university’s institutional guidelines. The acquisition of MR images was supervised by veterinary technicians from the diagnostic imaging units of the referral centres, in compliance with their respective animal welfare policies. All patients were imaged under general anaesthesia. MRI studies affected by severe motion artefacts or incomplete sequences that prevented diagnostic interpretation were excluded from the study dataset.

Dataset Preparation. All MR images were anonymised using Python™ (Python™, version 3.11. Python Software Foundation, DE, USA) by removing metadata tags containing personal information [36].

Labelling. All images were in Digital Imaging and Communications in Medicine (DICOM) format and analysed using available DICOM viewers: RadiAnt™ (RadiAnt™ DICOM Viewer for Windows, version 4.0.1.
Poznań, Poland) or OsiriX MD (OsiriX MD, DICOM viewer for macOS, Pixmeo SARL, Geneva, Switzerland) [37, 38]. The DICOM files were extracted and subsequently converted into PNG format. MRI studies were labelled slice by slice using the online annotation platform V7 Darwin (V7 Labs. V7 Darwin, London, UK) [39]. Brain MR images were annotated using the following tags, which served as the ground truth: (a) normal, indicating the absence of detectable abnormalities and serving as a baseline for comparison; and (b) pathological structural changes, including lesions, brain oedema, mass effect, ventricular abnormalities, midline shift, parenchymal loss, pathological contrast enhancement, and herniation. For binary classification, images labelled as normal were categorised as normal, while those tagged with any pathological condition were classified as abnormal.

Experimental Design and Data Split. To train the CNN and to validate and test its performance, the dataset comprising 550 dogs was randomly divided into a training set, a validation set, and a test set. Randomisation was performed using MATLAB® (MATLAB® software, MathWorks Inc. version R2022b. Natick, MA, USA), without accounting for covariates such as age, weight, breed type, institution, number of sequences, or disease category [40]. For data splitting, all slices and sequences from each patient were treated as a single unit, so that the dataset was split at the patient level. The model was trained at the slice level: MRI studies of normal brains consisted exclusively of normal slices, whereas studies with remarkable findings contained both abnormal and normal slices, because lesions were not present in every slice. Training set: This set consisted of 444 dogs and included 39,208 normal and 11,960 abnormal images, enabling the model to learn and adjust its parameters based on these data. Validation set: The validation set comprised 53 dogs, with 5,491 normal and 1,860 abnormal slices.
It was used to fine-tune the model’s hyperparameters and to monitor performance during training. Test set: The test set also consisted of 53 dogs and included 4,612 normal and 1,989 abnormal slices. This set was reserved for final model evaluation, providing an unbiased estimate of the model’s performance and of its ability to generalise to unseen data.

Pre-processing. The initial T1w image file sizes ranged from 179 bytes to 399.91 KB, with matrix dimensions varying between 144 × 144 and 1024 × 1024 pixels. Pixel dimensions depended on the scanner’s resolution and field of view at the time of acquisition. All T1w images (greyscale) were resized to 128 × 128 pixels and normalised. To increase data variability, augmentation techniques were applied, including random 90° rotations and horizontal or vertical flips.

CNN model. Although well-established networks such as ResNets and DenseNets have demonstrated strong performance across a range of vision tasks, they performed suboptimally in the present application [41, 42]. Owing to their high parameter counts, these models tended to either overfit or underfit the data, failing to capture domain-specific features essential for robust classification [41, 42]. To address this limitation, a customised, optimised architecture, SepNetDense, was developed in the present study for binary classification of brain MR images. SepNetDense has a small parameter footprint, with a total of 76,626 parameters, of which 65,746 are trainable [43–45]. The architecture of SepNetDense is illustrated in Fig. 1.
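The patient-level split and the augmentation steps described above can be sketched in plain Python. This is not the authors' implementation: randomisation in the study was done in MATLAB, and the normalisation scheme is not specified. Min–max scaling, clockwise quarter turns, a 50% flip probability and all function names are assumptions made for illustration. Images are represented as lists of pixel rows to keep the example dependency-free, and resizing to 128 × 128 is omitted.

```python
import random


# Hypothetical sketch: records are (patient_id, slice_data) pairs; names are illustrative.
def split_by_patient(records, n_val, n_test, seed=0):
    """Randomly assign whole patients to train/val/test so no patient spans two subsets."""
    patients = sorted({pid for pid, _ in records})
    random.Random(seed).shuffle(patients)
    test_ids = set(patients[:n_test])
    val_ids = set(patients[n_test:n_test + n_val])
    subsets = {"train": [], "val": [], "test": []}
    for pid, data in records:
        if pid in test_ids:
            subsets["test"].append((pid, data))
        elif pid in val_ids:
            subsets["val"].append((pid, data))
        else:
            subsets["train"].append((pid, data))
    return subsets


def min_max_normalise(img):
    """Scale a 2-D greyscale image (list of rows) to [0, 1]; assumed scheme."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1  # avoid division by zero on constant images
    return [[(v - lo) / span for v in row] for row in img]


def rot90(img):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img, rng):
    """Apply 0-3 random quarter turns plus optional horizontal and vertical flips."""
    for _ in range(rng.randrange(4)):
        img = rot90(img)
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1]  # vertical flip
    return img


if __name__ == "__main__":
    # 550 hypothetical patients with 3 slices each, split 444 / 53 / 53 as in the study.
    records = [(f"dog{p:03d}", s) for p in range(550) for s in range(3)]
    parts = split_by_patient(records, n_val=53, n_test=53)
    for name, rows in parts.items():
        print(name, len({pid for pid, _ in rows}), "patients")
    print(augment(min_max_normalise([[0, 50], [100, 100]]), random.Random(1)))
```

Splitting on patient identifiers rather than on individual slices keeps correlated slices from the same dog out of more than one subset, which is what allows the test set to act as unseen data.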
The following classification metrics were calculated:

$$\text{Classification accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.

Frequency distributions of the predictions were compared using chi-square tests or Fisher’s exact tests and visualised as heatmaps using GraphPad Prism (Prism 8 for Windows, version 8.4.3, GraphPad Software, San Diego, CA, USA) [46]. At the patient level, the proportion of slices predicted to be normal or abnormal out of all slices was assessed for normality using Q-Q plots and Shapiro–Wilk tests and compared between groups using the Mann–Whitney U test. Receiver operating characteristic (ROC) analysis was performed to assess classification performance and determine optimal thresholds, also using GraphPad Prism (Prism 8 for Windows, version 8.4.3, GraphPad Software, San Diego, CA, USA). An analysis of covariance (ANCOVA) was performed in SPSS (IBM SPSS Statistics for Windows, version 29.0.1.1, IBM Corp., Armonk, NY, USA) to examine and compare the effects of materials- and methods-related factors, covariates, and selected interactions on classification accuracy [47]. Levene’s test was applied to assess the assumption of homogeneity of variances. Effect sizes were reported using partial eta squared (ηₚ²), interpreted as small (0.01–0.059), medium (0.06–0.13), and large (≥ 0.14), as described by Gravetter et al. [48].
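As a worked check of Equations 1–4, the sketch below computes the slice-level metrics from the test-set confusion counts reported in the Results (1,452 of 1,989 abnormal and 3,430 of 4,612 normal slices classified correctly). Each class is treated in turn as the positive class, which is an assumption about how the per-class rows of Table 2 were derived. The function and variable names are illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations 1-4 computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Test-set slice counts taken from the Results section.
correct_abn, missed_abn = 1452, 537   # abnormal slices predicted abnormal / predicted normal
correct_norm, false_abn = 3430, 1182  # normal slices predicted normal / predicted abnormal

# "Abnormal" as the positive class.
abnormal = classification_metrics(tp=correct_abn, tn=correct_norm, fp=false_abn, fn=missed_abn)
# "Normal" as the positive class: the roles of the counts swap.
normal = classification_metrics(tp=correct_norm, tn=correct_abn, fp=missed_abn, fn=false_abn)

for label, m in (("Normal", normal), ("Abnormal", abnormal)):
    print(label, {k: round(v, 2) for k, v in m.items()})
```

Rounded to two decimals, these values match Table 2 (normal: precision 0.86, recall 0.74, F1-score 0.80; abnormal: precision 0.55, recall 0.73, F1-score 0.63; overall accuracy 0.74). The four counts sum to 6,601 test slices.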
A two-way analysis of variance (ANOVA) with post hoc one-way ANOVAs and subsequent pairwise post hoc tests with Bonferroni correction was applied to assess the effects of the categorical factors diagnosis and institute on the CNN model’s accuracy using SPSS (IBM SPSS Statistics for Windows, version 29.0.1.1, IBM Corp., Armonk, NY, USA). Levene’s test was applied to assess the assumption of homogeneity of variances, and normality was assessed using a Q-Q plot of standardised residuals [49]. The influence of the categorical factors institute, manufacturer, scanner model, field strength, diagnosis, breed size, skull conformation, and sex on accuracy was analysed individually. Homogeneity of variances was first assessed with Levene’s test; groups were then compared using one-way ANOVA or Kruskal–Wallis tests with Bonferroni post hoc corrections, or unpaired t-tests or Mann–Whitney U tests, as applicable. Linear regression analyses were used to assess associations between the continuous factors and classification accuracy using GraphPad Prism (Prism 8 for Windows, version 8.4.3, GraphPad Software, San Diego, CA, USA). To evaluate the adequacy of the linear regression models, residual plots were examined. Statistical significance was defined as p ≤ 0.05.

Results

Dataset. The study included pre- and post-contrast T1w MRI sequences of 550 canine brains, collected across four referral centres. From these patients, a total of 79,310 MRI slices were manually annotated as either normal or abnormal based on brain MRI findings. Slices affected by artefacts or considered to be of insufficient quality were excluded, resulting in a final dataset of 65,120 slices. The training set consisted of 444 patients, of whom 205 had normal and 239 had abnormal brain findings. The validation and test sets each comprised 53 patients (18 normal and 35 abnormal).
In the study dataset (Table 1), the most represented diagnostic category was the MRI-normal group (n = 241), comprising patients with unremarkable brain MRI findings, followed by the neoplastic group (n = 186) and the inflammatory group (n = 113). The least represented was the other causes group (n = 10), which included vascular, hereditary, and degenerative brain abnormalities.

Table 1 Distribution of diagnostic categories across the training, validation, and test sets (n = 550 patients).
Diagnostic category | Training set | Validation set | Test set
MRI-normal | 205 (46.17%) | 18 (33.96%) | 18 (33.96%)
Neoplastic | 143 (32.21%) | 21 (39.62%) | 22 (41.51%)
Inflammatory | 89 (20.05%) | 12 (22.64%) | 12 (22.64%)
Other causes | 7 (1.58%) | 2 (3.77%) | 1 (1.89%)
Total | 444 | 53 | 53

The study dataset comprised a heterogeneous mix of breeds (Suppl. Table 1). The most represented breeds were crossbreeds (16%), followed by Labrador Retrievers (11%) and Boxers (5%). The mean age of the study group was 5.81 ± 3.60 years, and the mean body weight was 22.55 ± 14.06 kg. MRI studies included in the dataset were acquired using 1.5 T (n = 309) and 3.0 T (n = 241) scanners. The scanner manufacturers were Philips (n = 363), GE Medical Systems (n = 123), and Siemens (n = 64).

Performance. The CNN’s performance in classifying individual MRI slices as normal or abnormal was evaluated on the test set (n = 53; Table 1), which comprised a total of 6,601 MRI slices. With manual annotations as the reference standard, 73% (1,452/1,989) of abnormal slices and 74% (3,430/4,612) of normal slices were correctly classified, whereas 27% (537/1,989) of abnormal slices and 26% (1,182/4,612) of normal slices were misclassified (Fig. 2).
Based on individual MRI slice classification, the model achieved a precision of 0.86, recall of 0.74, and F1-score of 0.80 for normal slices, and a precision of 0.55, recall of 0.73, and F1-score of 0.63 for abnormal slices, with an overall accuracy of 0.74 (Table 2).

Table 2 Performance metrics
Class | Precision | Recall | F1-score
Normal | 0.86 | 0.74 | 0.80
Abnormal | 0.55 | 0.73 | 0.63
Overall accuracy: 0.74

Findings. The spatial distribution of classification errors (false positives and false negatives) was analysed slice-wise using heatmaps to visualise prediction patterns across all orientations in pre- and post-contrast T1w sequences after normalisation (Fig. 3). In all three orientations (transverse, Fig. 3a–b; sagittal, Fig. 3c–d; dorsal, Fig. 3e–f), T1w images of the normal patient subset are characterised by a high number of true negatives across slices. False positives appear as scattered patches along the slice axes, predominantly in the central portions, and vary between individual patients. In the abnormal subset, true positives dominate the central portions of the slice axes across all orientations, forming dense horizontal bands, while false negatives occur more sporadically in the same regions. By contrast, false positives and true negatives tend to cluster near the periphery of the anatomical axes, particularly at the sequence extremes. In dorsal pre- and post-contrast T1w images (Fig. 3e–f), true positive clusters in abnormal patients are generally shorter and less continuous than in the other orientations. Error patterns in this view appear more fragmented, which might suggest localised deviations and greater uncertainty in the dorsal plane. To investigate whether the prediction outcomes (true positive, true negative, false positive, false negative) varied systematically along the anatomical axes, the slice positions within each MRI sequence were normalised and divided into four equal quartiles (Q1–Q4).
The results showed that the distribution of the prediction categories differed significantly across the quartiles in all sequences (Suppl. Table 2).

Slice-wise accuracy across T1w sequences. The relationship between CNN slice-wise accuracy and MRI sequence type was examined. As illustrated in Fig. 4, accuracy differed significantly between sequences (chi-square test, p ≤ 0.001). Pre-contrast T1w sagittal images yielded the highest slice-wise accuracy (81.8%), significantly outperforming the other sequences: pre-contrast T1w transverse (71.2%), post-contrast T1w transverse (73.3%), post-contrast T1w sagittal (75.1%), post-contrast T1w dorsal (76.1%), and pre-contrast T1w dorsal (69.7%).

Assessment of the sensitivity and specificity of the CNN model at the patient level using receiver operating characteristic (ROC) analysis. Because the CNN’s predictions were made on individual MRI slices, ROC analysis was used to evaluate the model’s performance at the patient level and to determine the optimal threshold for classifying a patient as abnormal or normal, based on the proportion of that patient’s slices predicted to be abnormal. The proportion of slices predicted to be abnormal in the normal (n = 18) and abnormal (n = 35) patient groups of the test set was assessed using a Q-Q plot (Suppl. Figure 1) and Shapiro–Wilk tests, which indicated significant deviations from normality in both groups (normal p = 0.027; abnormal p = 0.003). Figure 5a shows that the proportion of slices predicted to be abnormal within each patient differs between the groups. The median proportion of slices predicted to be abnormal was significantly higher in the abnormal group (0.70) than in the normal group (0.26) (Mann–Whitney U test, p < 0.0001). The ROC curve is presented in Fig. 5b.
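Because the CNN outputs one label per slice, the patient-level analysis reduces each dog to the fraction of its slices predicted abnormal and then sweeps a threshold over that fraction. The sketch below illustrates this aggregation and the resulting sensitivity, specificity and positive likelihood ratio (LR+ = sensitivity / (1 − specificity)). It is not the authors' GraphPad Prism workflow. The "≥ threshold" rule, the use of Youden's J as the "optimal balance" criterion, the function names and the example fractions are all assumptions. The final lines check that 29 of 35 abnormal and 14 of 18 normal patients, counts consistent with the 83% sensitivity and 78% specificity reported for the 51% threshold, reproduce the reported LR+ of 3.7.

```python
def abnormal_fraction(slice_preds):
    """Proportion of a patient's slices predicted abnormal (1 = abnormal, 0 = normal)."""
    preds = list(slice_preds)
    return sum(preds) / len(preds)


def operating_point(normal_props, abnormal_props, threshold):
    """Sensitivity, specificity and LR+ when a patient is called abnormal if its
    abnormal-slice fraction is >= threshold (the >= rule is an assumption)."""
    tp = sum(p >= threshold for p in abnormal_props)
    tn = sum(p < threshold for p in normal_props)
    sens = tp / len(abnormal_props)
    spec = tn / len(normal_props)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    return sens, spec, lr_pos


def youden_threshold(normal_props, abnormal_props):
    """Observed fraction that maximises sensitivity + specificity (Youden's J)."""
    candidates = sorted(set(normal_props) | set(abnormal_props))
    return max(candidates,
               key=lambda t: sum(operating_point(normal_props, abnormal_props, t)[:2]))


if __name__ == "__main__":
    # Hypothetical slice predictions for three normal and three abnormal patients.
    normal_props = [abnormal_fraction(p) for p in ([0, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1])]
    abnormal_props = [abnormal_fraction(p) for p in ([1, 1, 1, 0], [0, 1, 1, 1], [0, 1, 0, 0])]
    t = youden_threshold(normal_props, abnormal_props)
    print(t, operating_point(normal_props, abnormal_props, t))

    # Counts consistent with the reported 83% / 78% at the 51% threshold.
    sens, spec = 29 / 35, 14 / 18
    print(round(sens / (1 - spec), 1))  # positive likelihood ratio, reported as 3.7
```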
The area under the curve (AUC) was 0.85 (p < 0.0001), indicating good discrimination between normal and abnormal patients. The optimal balance between sensitivity and specificity was achieved when a threshold of 51% of slices predicted to be abnormal was used to classify a patient as abnormal. This threshold resulted in a sensitivity of 83%, a specificity of 78%, and a positive likelihood ratio of 3.7, corresponding to the highest overall accuracy (80%). A more conservative threshold of 64% increased specificity to 94% while maintaining a reasonable sensitivity of 66%. This threshold yielded the highest positive likelihood ratio (11.8), with an accuracy of 80%. Detailed performance metrics for each threshold are presented in Suppl. Table 3.

Identification of Parameters Influencing CNN Model Performance Using ANCOVA. In the next step, a univariate ANCOVA was conducted on the test set to evaluate the influence of materials- and methods-related parameters on the accuracy of the CNN model. The categorical factors were institute (TiHo, UC Davis, RVC, Glasgow), breed size (small, medium, large), skull conformation (brachycephalic vs non-brachycephalic), sex (male, female), and diagnosis (MRI-normal, neoplastic, inflammatory, other causes); the quantitative covariates were age, weight, number of slices, and number of available MRI sequences per case (Suppl. Table 4). In addition, the interactions between diagnosis and institute, breed size and weight, and sex and weight were examined. As each university referral centre employed distinct imaging protocols, which differed in MRI settings, equipment, and acquisition techniques, the factor ‘institute’ was treated as a composite factor representing the combined influence of these imaging-related variations on model accuracy.
Scanner model and magnetic field strength were not included in the ANCOVA because of multicollinearity with the variable “institute”, as each institute in our dataset used its own unique scanners. Attempts to substitute “manufacturer” with “scanner model” did not resolve this issue, since each model was still exclusive to one institute. Therefore, “institute” was retained in the model to account for all scanner- and site-specific factors, allowing interpretable results without redundancy. The corrected model was statistically significant (p < 0.001, ηₚ² = 0.80). Levene’s test rejected the assumption of homogeneity of variances (p = 0.021), likely because of the large number of factor levels relative to the total sample size (n = 53). Since the Q–Q plot showed only minor deviations from normality, the model was retained and its results were interpreted with caution (Suppl. Figure 2). Effect sizes were calculated using partial eta squared (ηₚ²) and interpreted as small, medium, or large, as described in the Material and Methods section (Fig. 6). As shown in Fig. 6a, the ANCOVA indicated that several parameters significantly influenced the accuracy of the CNN model. Main effects with large effect sizes were observed for breed size (p < 0.001, ηₚ² = 0.44), institute (p = 0.001, ηₚ² = 0.39), and diagnosis (p = 0.003, ηₚ² = 0.35). A medium effect size was identified for body weight (p = 0.050, ηₚ² = 0.12). Significant interaction effects with large effect sizes were found between diagnosis and institute (p < 0.001, ηₚ² = 0.45) and between breed size and weight (p < 0.001, ηₚ² = 0.43). In contrast, sex, the sex × weight interaction, skull conformation, age, number of sequences per case, and number of slices per case had no significant effect on the CNN model’s accuracy (p > 0.05; ηₚ² ranging from 0.004 to 0.065).
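Partial eta squared relates the variance attributable to one model term to that term plus the error variance: ηₚ² = SS_effect / (SS_effect + SS_error). The small sketch below assumes that the sums of squares are already available from the ANCOVA output. It applies the benchmarks quoted in the Material and Methods to the ηₚ² values reported above. The function names are illustrative, and the handling of values outside the quoted bands (between 0.13 and 0.14, or below 0.01) is an assumption.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def gravetter_label(eta_p2):
    """Benchmarks from the Methods: small 0.01-0.059, medium 0.06-0.13, large >= 0.14.
    Assumption: values in [0.13, 0.14) count as medium and values below 0.01 as
    'negligible', since the quoted bands leave these cases open."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


# Partial eta squared values reported in the ANCOVA results.
reported = {
    "breed size": 0.44,
    "institute": 0.39,
    "diagnosis": 0.35,
    "body weight": 0.12,
    "diagnosis x institute": 0.45,
    "breed size x weight": 0.43,
}
for term, value in reported.items():
    print(f"{term:>22}: {value:.2f} -> {gravetter_label(value)}")
```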
Following the identification of the main factors influencing model performance through ANCOVA, a two-way ANOVA was conducted to further investigate the effects of institute and diagnosis on model accuracy (Fig. 6 b). The analysis revealed significant main effects of institute (p < 0.001) and diagnosis (p = 0.007), as well as a significant interaction (p < 0.001), indicating that the effect of diagnosis on model performance was dependent on the institute. A pair wise post hoc comparison of the institutes reveals higher accuracy with TiHo data as compared to RVC and Glasgow (p < 0.001 and p = 0.001), as well as higher accuracy with UC Davis data as compared to Glasgow (p = 0.029). The two-way ANOVA has to be interpreted carefully, since Levene’s test rejected the assumption of variance homogeneity ( p = 0.014). However, visual inspection of the Q-Q plot of the standardized residuals shows approximate normality ( Suppl. Figure 3 ), and Levene’s test for all post hoc one-way ANOVAs did not reject the assumption of variance homogeneity (TiHo p = 0.199; UC Davis p = 0.148; RVC p = 0.813; Glasgow p = 0.075). Accordingly, one-way ANOVA revealed a significant effect of diagnosis in the TiHo subset (ANOVA, p = 0.028), with MRI-normal cases achieving significantly higher accuracy than inflammatory cases (Bonferroni correction, p = 0.031). Furthermore, a significant effect of diagnosis was found in the RVC cohort (ANOVA, p = 0.040); however, pairwise post hoc tests could not be calculated due to the small number of cases in the ‘other causes’ group. In contrast, no significant effects of diagnosis on accuracy were detected in the UC Davis (ANOVA, p = 0.055) and Glasgow subsets (ANOVA, p = 0.143). Due to the significant interaction between weight and breed size identified in the ANCOVA, the effect of body weight on CNN classification accuracy was examined separately within each breed size category (small, medium, large) using linear regressions (Fig. 
6c; residual plot in Suppl. Figure 4). In small breeds, no significant association was found between weight and accuracy (p = 0.6006, R² = 0.0165). Similarly, the association was not significant in medium breeds (p = 0.3390, R² = 0.0916). However, in large breeds a significant negative association was observed (p = 0.0186, R² = 0.2471), indicating that higher body weight within large breeds was associated with reduced CNN accuracy. In addition to the ANCOVA, the individual effects of categorical factors on CNN accuracy were examined using single-factor tests (Suppl. Figure 5). Institute significantly affected accuracy, with TiHo cases showing higher accuracy than RVC and Glasgow cases (Levene’s test, p = 0.016; Kruskal-Wallis test, p ≤ 0.001; Suppl. Figure 5a). Accuracy was not significantly affected by manufacturer (Levene’s test, p = 0.022; Kruskal-Wallis test, p = 0.119; Suppl. Figure 5c). Scanner model had a significant effect, with Achieva showing higher accuracy than Intera and Magnetom Essenza (Levene’s test, p = 0.050; Kruskal-Wallis test, p = 0.003; Suppl. Figure 5e). Field strength significantly influenced accuracy, with 3 T datasets showing higher accuracy than 1.5 T datasets (Levene’s test, p = 0.052; independent t-test, p < 0.001; Suppl. Figure 5g). Accuracy was not significantly affected by breed size (Levene’s test, p = 0.131; ANOVA, p = 0.400; Suppl. Figure 5i), skull conformation (Levene’s test, p = 0.288; independent t-test, p = 0.374; Suppl. Figure 5k), sex (Levene’s test, p = 0.877; independent t-test, p = 0.815; Suppl. Figure 5m), or diagnosis (Levene’s test, p = 0.001; Kruskal-Wallis test, p = 0.848; Suppl. Figure 5o). Simple linear regression analyses were performed to assess whether CNN classification accuracy was individually influenced by age at MRI, weight at MRI, number of available sequences, and number of slices per case (Suppl. Figure 6).
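The regressions described here fit an ordinary least squares line of accuracy on a single predictor. In the stratified analysis, the fit is done separately within each breed-size group.

The following minimal pure-Python sketch assumes a hypothetical record layout of (breed_size, weight_kg, accuracy) tuples. It returns the slope, intercept and R² but no p-value, because that requires the t distribution (scipy.stats.linregress reports one, for example). All names are illustrative.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.
    Returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    # R² is undefined when y is constant; report NaN in that case
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


def regress_by_stratum(records):
    """Fit accuracy ~ weight separately within each breed-size stratum.
    records: iterable of (breed_size, weight_kg, accuracy) tuples (hypothetical layout)."""
    strata = {}
    for size, weight, accuracy in records:
        xs, ys = strata.setdefault(size, ([], []))
        xs.append(weight)
        ys.append(accuracy)
    return {size: simple_linear_regression(xs, ys) for size, (xs, ys) in strata.items()}


# Made-up example data, not study data
records = [("large", 30.0, 0.90), ("large", 40.0, 0.80), ("large", 50.0, 0.70),
           ("small", 5.0, 0.75), ("small", 8.0, 0.70), ("small", 10.0, 0.78)]
for size, (slope, intercept, r2) in regress_by_stratum(records).items():
    print(f"{size}: slope = {slope:.4f} per kg, R² = {r2:.3f}")
```

Fitting each stratum separately mirrors the rationale given in the text. Because the ANCOVA showed a breed size × weight interaction, a single pooled slope could hide opposite trends in different breed-size groups.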
The number of sequences showed a positive linear relationship with accuracy (p = 0.003, R² = 0.162; Suppl. Figure 6a). Similarly, the number of slices was positively correlated with accuracy (p < 0.001, R² = 0.222; Suppl. Figure 6b). In contrast, no significant linear relationship was found for weight at MRI (p = 0.839, R² = 0.001; Suppl. Figure 6e) or age at MRI (p = 0.853, R² = 0.001; Suppl. Figure 6g).
Discussion
CNN models are emerging as promising diagnostic tools in medical imaging, although their application in veterinary medicine remains at an early stage. In this study, we evaluated a CNN model that classifies canine brain pre- and post-contrast T1w MRI studies as normal or abnormal. The aim was to enable computer-assisted detection of brain lesions and to improve reporting efficiency in clinical practice. Beyond evaluating overall performance, we examined the influence of biological, technical, and institutional factors on model accuracy. The ANCOVA revealed that diagnosis, institutional setting, and breed size each had significant effects with large effect sizes, while body weight showed a medium effect. Significant interactions with large effect sizes were also observed between diagnosis and institute and between breed size and weight. The model achieved its highest accuracy on the T1w pre-contrast sagittal sequence. Moreover, a spatial pattern was observed along the three anatomical axes, with higher performance in central brain regions and more classification errors in peripheral slices. These findings underscore the multifactorial nature of CNN performance. They also highlight the importance of dataset diversity, anatomical context, and sequence selection in developing robust and generalisable AI models for veterinary neurology. The model was trained on 444 MRI datasets collected from four veterinary referral centres, including a variety of breeds, ages, body weights, diagnoses, and MRI scanner types.
This multi-institutional dataset was designed to reflect real-world variability and to support the development of a model capable of generalising across diverse clinical settings. Previous studies exploring CNNs in veterinary imaging have provided valuable insights but were limited by small cohorts or single-centre datasets [22–28]. In contrast, our study used a broader and more heterogeneous dataset that better represents the complexity of clinical practice. The customised CNN model presented in this study demonstrated a good ability to classify canine brain MRI slices as normal or abnormal. At the slice level, it achieved an overall accuracy of 74%, with comparable classification rates for normal (74%) and abnormal (73%) slices (Fig. 2). As shown in Table 2, the model achieved a precision of 0.86 for normal slices, indicating that predictions of normality were generally reliable. In contrast, the lower precision of 0.55 for abnormal slices reflects a higher rate of false positives in this class, i.e. normal slices predicted as abnormal. Nonetheless, recall values were similar for both classes (0.74 for normal and 0.73 for abnormal), indicating that the model identified the majority of true instances in each class. The corresponding F1-scores were 0.80 for normal and 0.63 for abnormal slices. This discrepancy may be attributed to the relatively limited dataset size compared with the complexity of the classification task, which may have constrained the model’s exposure to the full range of abnormal presentations [50]. Furthermore, the heterogeneous and often subtle nature of abnormalities likely contributed to the increased false positive rate [6, 50]. To date, few studies have used CNN models to evaluate canine brain MRI. Banzato et al. reported high classification accuracies in CNN-based canine brain MRI studies. They achieved up to 94% when distinguishing between meningiomas and gliomas [23] and correctly predicted meningioma grade in 80% of cases [25].
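The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The per-class F1 values quoted above can therefore be reproduced directly from the precision and recall in Table 2.

The following sketch does this. It also derives per-class precision and recall from raw confusion-matrix counts. The helper names are hypothetical, and the counts in the test are made up.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(tp: int, fp: int, fn: int):
    """Precision, recall and F1 for one class treated as 'positive'.
    For the abnormal class, fp counts normal slices predicted abnormal."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Reproduce the reported F1-scores from the reported precision and recall
print(round(f1_score(0.86, 0.74), 2))  # normal slices   -> 0.8  (reported 0.80)
print(round(f1_score(0.55, 0.73), 2))  # abnormal slices -> 0.63 (reported 0.63)
```

Because the harmonic mean is pulled towards the smaller of its inputs, the abnormal-class F1 is held down mainly by its lower precision rather than by recall.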
The discrepancy in accuracy between our study and previous applications of CNNs in veterinary neurology is likely attributable to differences in sample size and methodological design. The previous studies focused on well-defined tumour types and used smaller, simpler datasets, which may have simplified the classification task. Importantly, small sample sizes in neuroimaging studies employing machine learning have been shown to introduce substantial variability in performance estimates, thereby limiting the reliability of conclusions and the generalisability of findings [51]. In contrast, our model was trained on a larger, multi-institutional dataset that included a broader spectrum of diagnoses and greater variability in breed, MRI scanner type, and acquisition protocols. Hence, based on the ANCOVA results, the intentional inclusion of real-world variability in our experimental design is likely the primary reason for the lower accuracy compared with previous studies. Our analysis shows that the wide variety of possible diagnoses and the variability of institutional conditions are the largest factors influencing the performance of the CNN model. Training on low-variability data from a single institution carries a risk of poor reproducibility, as models may capture dataset-specific artefacts rather than learn generalisable features [52]. Zhang et al. (2016) showed that deep neural networks are capable of fitting random labels or noise, emphasising their potential to memorise data without extracting meaningful patterns [52].
Consequently, models that perform well in narrowly controlled research settings often fail to translate to real-world scenarios. This highlights the need for large, heterogeneous datasets and thorough external validation to ensure robust generalisation. Ultimately, while this heterogeneity may result in numerically lower performance than in previous studies, incorporating real-life variability is essential for the development of practical applications, as it more accurately reflects clinical diversity [23, 25]. The ANCOVA indicated that CNN classification performance was influenced by several biological, technical, and institutional factors, which may need to be considered as confounders in future studies (Fig. 6). Two groups of factors were the major influences on CNN model performance in our results. The first was imaging-related differences, including MRI protocols, manufacturer, scanner model, magnetic field strength, and acquisition techniques (summarised as ‘institute’). The second was the range of morphological changes represented by ‘diagnosis’. The CNN model performed best on cases from TiHo and poorest on cases from Glasgow, which may reflect the different numbers of cases contributed by each institute to the training dataset. Lower accuracy for a particular institute or diagnostic category indicates suboptimal performance on that subset but does not imply that these cases are inherently more challenging for the CNN model. This finding aligns with previous studies that underscore the susceptibility of CNN-based models to domain shifts in medical imaging [53, 54]. Variations in patient demographics, imaging protocols, and institutional practices present significant challenges for cross-site generalisation, ultimately affecting the model’s ability to perform consistently across different clinical environments [53, 54].
A critical comparison of the ANCOVA results with the single-factor analyses shows that several factors influence accuracy individually. However, once the full experimental design is considered, their effects are negligible compared with those of the main factors: institutional setting, diagnostic category, breed size, and weight. Since CNN-based models learn patterns from the data they are exposed to, it is crucial to train them on diverse datasets from multiple institutes and a variety of diseases, so that the model can distinguish between the classes in the test set [55, 56]. Training on such diverse, multi-centre datasets covering a wide range of pathologies is essential to achieve reliable model performance in independent, real-world environments [52, 55]. In this study, a slice-by-slice approach was used for CNN training, so predictions were made on individual MRI slices. To interpret these predictions at the patient level, we explored the optimal threshold for classifying a patient as abnormal or normal. The basis was the proportion of a patient's slices that were predicted to be abnormal. The results indicated that there is no single optimal threshold. Instead, two thresholds may be applicable depending on the clinical context: a balanced threshold of 51% and a more conservative threshold of 64%. For triage applications, where the model serves as a secondary reader or “second pair of eyes,” the more sensitive 51% threshold may be preferable to minimise the risk of missing potential abnormalities.
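The patient-level rule described here aggregates slice predictions into a single call. A patient is labelled abnormal when the proportion of slices predicted abnormal reaches a threshold such as 51% or 64%.

The following minimal sketch assumes binary slice predictions (1 = abnormal) and treats the threshold as inclusive (≥). The study does not state whether its rule is inclusive, so this and all names are assumptions. A helper then computes patient-level sensitivity and specificity so that candidate thresholds can be compared.

```python
from typing import Iterable, List, Tuple


def abnormal_fraction(slice_predictions: Iterable[int]) -> float:
    """Fraction of slices predicted abnormal (1 = abnormal, 0 = normal)."""
    preds = list(slice_predictions)
    if not preds:
        raise ValueError("patient has no slice predictions")
    return sum(preds) / len(preds)


def patient_is_abnormal(slice_predictions: Iterable[int], threshold: float = 0.51) -> bool:
    """Patient-level call; the inclusive comparison (>=) is an assumption."""
    return abnormal_fraction(slice_predictions) >= threshold


def sensitivity_specificity(patients: List[Tuple[List[int], bool]], threshold: float):
    """patients: (slice_predictions, truly_abnormal) pairs.
    Returns (sensitivity, specificity) at the given threshold."""
    tp = fn = tn = fp = 0
    for preds, truly_abnormal in patients:
        called = patient_is_abnormal(preds, threshold)
        if truly_abnormal:
            tp += int(called)
            fn += int(not called)
        else:
            fp += int(called)
            tn += int(not called)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy cohort (not study data): one abnormal dog and two normal dogs, 10 slices each
cohort = [([1] * 6 + [0] * 4, True),
          ([1] * 3 + [0] * 7, False),
          ([1] * 6 + [0] * 4, False)]
for t in (0.51, 0.64):
    print(t, sensitivity_specificity(cohort, t))
# 0.51 -> (1.0, 0.5); 0.64 -> (0.0, 1.0)
```

The toy cohort illustrates the trade-off discussed in the text. Raising the threshold converts false positives into true negatives at the cost of missing borderline abnormal patients.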
For situations where high specificity is critical, such as avoiding unnecessary referrals or follow-ups, the more conservative 64% threshold may be more appropriate. Thus, the classification threshold should be adjusted to the specific clinical objective. As shown in Fig. 3, a pattern is observed in the distribution of prediction categories (true positive, true negative, false positive, false negative) along the anatomical axes of the slices in all sequences. Statistical analysis confirmed this pattern, revealing that the CNN model's performance varied significantly across quartiles (Q1–Q4) within each MRI sequence. True positives are most prominent in the central region of the slice axes in abnormal patients, suggesting that the model’s predictions are more stable in these mid-slice regions. One possible explanation is the greater anatomical homogeneity of central brain structures across patients, which allows the model to learn more generalisable and discriminative features. Furthermore, central slices contain a larger volume of brain tissue and thus richer spatial context, which may enhance model confidence and prediction accuracy. In contrast, classification errors are more frequent near the extremes of the slice axes. This may be due to reduced structural information, increased anatomical variability, inconsistent signal characteristics, or a combination of biological and imaging-related factors. Peripheral slices typically contain less brain tissue and a greater proportion of non-neural structures, such as bone, cerebrospinal fluid, or air-filled spaces. These regions are more prone to partial volume effects and imaging artefacts, both of which can degrade feature quality and reduce the model’s predictive reliability. Based on these findings, we conclude that the relative position of each slice along the anatomical axes influences prediction consistency, with this effect varying by sequence and orientation.
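The quartile analysis relates each slice's relative position along its acquisition axis to prediction correctness. The following minimal sketch assumes 0-based slice indices and four equal-width position bins. Q1 and Q4 are the peripheral quarters, and Q2 and Q3 are the central ones. The exact binning used in the study may differ, and the record layout is hypothetical.

```python
from collections import defaultdict


def position_quartile(slice_index: int, n_slices: int) -> int:
    """Return 1-4 for the quarter of the slice axis that contains slice_index.
    Integer arithmetic avoids floating-point edge cases at bin boundaries."""
    if n_slices <= 0 or not 0 <= slice_index < n_slices:
        raise ValueError("slice_index must lie in [0, n_slices)")
    return (4 * slice_index) // n_slices + 1


def accuracy_by_quartile(records):
    """records: iterable of (slice_index, n_slices, correct) tuples.
    Returns {quartile: fraction of correct predictions} for quartiles with data."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for idx, n, ok in records:
        q = position_quartile(idx, n)
        total[q] += 1
        correct[q] += int(ok)
    return {q: correct[q] / total[q] for q in sorted(total)}


# Toy series of 8 slices where only the first and last slices are misclassified
toy = [(i, 8, 1 <= i <= 6) for i in range(8)]
print(accuracy_by_quartile(toy))  # {1: 0.5, 2: 1.0, 3: 1.0, 4: 0.5}
```

Normalising by the number of slices per series lets series of different lengths, and different orientations, be pooled on a common relative-position scale.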
To address these limitations, Kamnitsas et al. (2017) suggested excluding peripheral slices from training because of their limited diagnostic value. In their multi-scale 3D CNN for brain lesion segmentation, they excluded outer slices, citing poor information quality at the volume boundaries [57]. Furthermore, a robust skull-stripping step prior to training, as described by Nour Eddin et al. (2023), could reduce the influence of non-brain structures [58]. However, excluding peripheral slices should be approached with caution, as these slices are routinely reviewed in clinical practice. Retaining them in model training might enhance robustness and better reflect the diagnostic process. Notably, in our study, body weight was negatively associated with classification accuracy in large breeds, a pattern not observed in small or medium breeds. This contrasts with findings in human neonatal MRI studies, where smaller brain volume has been associated with reduced image quality and diagnostic performance. This association is largely attributed to lower tissue contrast, increased partial volume effects, and technical limitations [59]. CNNs do not interpret images in the same way as human experts. They identify and weight statistical patterns in the input data that are associated with specific outputs (e.g., normal vs abnormal), without any semantic understanding of anatomical structures such as a “lesion” or “brain” [60]. As a result, misclassifications in peripheral slices or in large-breed dogs may arise from non-neural structures, signal artefacts, or anatomical variability that resemble features the model has learned to associate with abnormal cases. Similar findings have been reported in human imaging studies, where CNNs were shown to rely on spurious correlations rather than pathology-specific features [61].
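One way to check for such spurious cues is Gradient-weighted Class Activation Mapping (Grad-CAM). Each feature map A^k of the last convolutional layer is weighted by the spatially averaged gradient of the class score, α_k = (1/Z) Σ_ij ∂y^c/∂A^k_ij. Only the positive part of Σ_k α_k A^k is kept.

The following sketch shows only this weighting step on toy nested lists. Obtaining real activations and gradients requires hooks in a deep learning framework, and upsampling the map to slice resolution is omitted. It is an illustration under these assumptions, not the authors' implementation.

```python
def grad_cam(activations, gradients):
    """Grad-CAM weighting step.
    activations, gradients: K x H x W nested lists holding the last conv
    layer's feature maps and the gradients of the target-class score with
    respect to them. Returns an H x W map (before upsampling)."""
    if not activations or len(activations) != len(gradients):
        raise ValueError("need matching, non-empty activation and gradient stacks")
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient for feature map k
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only locations with a positive influence on the class score
    return [[max(0.0, v) for v in row] for row in cam]


def normalise(cam):
    """Scale a map to [0, 1] for display; an all-zero map is returned unchanged."""
    peak = max(max(row) for row in cam)
    return cam if peak == 0 else [[v / peak for v in row] for row in cam]


# Toy example: map 0 supports the class (positive gradients), map 1 opposes it
acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 1.0]]
```

Overlaying such a map on a peripheral slice would show whether high-weight regions fall on brain parenchyma or on bone, cerebrospinal fluid or artefacts.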
Explainability tools such as Gradient-weighted Class Activation Mapping (Grad-CAM) may help determine whether the CNN attends to biologically relevant brain regions or is misled by non-neural features such as bone, cerebrospinal fluid, or imaging artefacts in peripheral slices [62]. Grad-CAM produces a coarse localisation map from the gradients of a target class flowing into the final convolutional layer. Simple regression analyses showed that both the number of slices and the number of available sequences per case were positively associated with accuracy. This suggests that more imaging data per case leads to better classification performance. The model may benefit from access to more data per patient, capturing additional spatial context and subtle abnormalities [63]. Another finding of this study was that the T1w pre-contrast sagittal sequence achieved significantly higher classification accuracy (81.8%) than the other T1w sequences. This may be because the sagittal plane provides a clearer view of midline structures, particularly the ventricular system, which can exhibit subtle morphological changes associated with underlying pathology. In human medicine, similar observations were reported by Grigas et al. (2025) in a study distinguishing individuals with mild cognitive impairment from cognitively normal controls. Sagittal T1w slices provided the highest diagnostic performance, followed by axial and coronal planes [59]. This was likely due to informative features in midline structures such as the corpus callosum, thalamus, and lateral ventricles, which are known to undergo structural and metabolic alterations in the early stages of cognitive decline [64]. Taken together, these factors should be regarded as potential confounders and acknowledged when designing, interpreting, or applying CNN models in a clinical context.
Limitations
This study has several limitations, and key challenges remain to be addressed in future work. A primary limitation is the relatively small number of cases included. This resulted from our aim to include only cases with a highly reliable diagnosis; for example, all brain neoplasms included were histopathologically confirmed. This requirement limited the dataset considerably, even though the data were drawn from large academic institutions. Although the training set was heterogeneous and sourced from multiple institutions, a total of 444 cases may be insufficient to train a complex deep learning model. This may limit the model’s ability to generalise across the full spectrum of pathological presentations observed in clinical practice [50]. Nonetheless, the performance achieved can be considered acceptable given the size and diversity of the dataset, providing a valuable proof of concept for future research. In veterinary clinical practice, it is uncommon for completely healthy dogs without clinical signs to undergo brain MRI, making the acquisition of true normal controls challenging. As a result, the control group in this study included patients with Tier II idiopathic epilepsy, paroxysmal dyskinesia, and extracranial causes, rather than neurologically normal dogs. Although these dogs showed no structural brain abnormalities on MRI, underlying neurological disease could cause subtle changes not detectable by expert visual assessment, potentially influencing model training and performance. Future studies would benefit from a broader and more representative control population to further enhance model reliability and generalisability. Another key limitation of this study is the exclusive use of T1w pre- and post-contrast images for model development.
While T1w imaging is effective for detecting mass lesions and abnormalities associated with blood-brain barrier disruption, it is less sensitive to other important pathological changes. Conditions such as oedema, demyelination, mild inflammatory processes, early ischaemic changes, and subtle structural abnormalities are often better visualised on T2w, FLAIR, or diffusion-weighted sequences [65]. Consequently, some abnormalities, particularly those without significant contrast enhancement, may have been underrepresented or missed during model training. The next phase of this study will focus on incorporating additional MRI sequences to enable detection of a wider range of brain pathologies. Although a direct comparison with veterinary radiologists was beyond the scope of this study, it will be essential for future validation and clinical integration of such models. Evaluating the model’s performance relative to expert clinicians is a critical next step to determine whether it can match or complement human expertise. These comparisons are vital for defining the model’s potential role in clinical practice, whether as a decision support tool, a triage system, or an aid for less experienced clinicians. In this study, slices with noise, motion artefacts, or incomplete imaging were removed during dataset preparation to ensure clean inputs for model training. However, MRI studies submitted for automated analysis in clinical practice often contain motion artefacts, suboptimal image quality, or missing sequences. As a result, models developed under controlled research conditions may perform less reliably on real-world clinical data. Future work should focus on improving model robustness to such variability. This could be achieved by incorporating artefact detection modules, implementing quality control procedures, and training on more heterogeneous, imperfect datasets that better reflect clinical conditions.
Conclusion
Despite the limitations mentioned, the present CNN model provides a foundation for further exploration of CNN-based methods in canine brain MRI classification. This study also highlights the value of training CNN-based models for clinical settings on real-world, multi-institutional data with a broad range of diagnoses. Moreover, the distinct patterns in classification performance along the anatomical axes underline the need for explainability tools to verify whether the CNN is focusing on biologically relevant brain regions. In the medical domain, such tools are essential to build trust, ensure clinical relevance, and identify potential biases in model predictions. The findings underscore the potential of deep learning to support diagnostic workflows in veterinary neuroimaging. Future work should prioritise collaborative data sharing, larger and more balanced datasets, and rigorous external validation to enhance model generalisability and facilitate clinical integration.
Declarations
Competing interests
J. Janisch, A. Hansch, and M. Laue are employed by DOS Software-Systeme GmbH, a commercial developer of AI solutions. DOS Software-Systeme GmbH had no role in the interpretation of the data or in the conclusions drawn from this study. S. Abani, J. Nessler, A. Hansch, and M. Laue were funded by the “Central Innovation Program for Small- and Medium-Sized Enterprises” of the German Federal Ministry for Economic Affairs and Climate Action (grant no. KK5066602LB). Additionally, we acknowledge financial support from the Open Access Publication Fund of the University of Veterinary Medicine Hannover, Foundation. Neither funder had any influence on the study design, data collection and analysis, or the conclusions drawn in this paper. The remaining authors declare no competing interests.
Funding
This work was funded by the Central Innovation Program for Small- and Medium-Sized Enterprises of the German Federal Ministry for Economic Affairs and Climate Action (grant number KK5066602LB). Open access publishing was supported by the Open Access Publication Fund of the University of Veterinary Medicine Hannover, Foundation.
Author Contribution
H.V. supervised this study. H.V., S.A. and J.N. designed the experiments. P.J.D., S.D., R.G.Q., and E.M. contributed to data curation by providing raw MRI data and patient information. S.A. and A.S. annotated the cases. S.A. and F.S. extracted the patient data. S.A., S.T., S.GH., and R.U. analysed the data. S.A. wrote the first draft of the manuscript. M.L., A.H. and J.J. developed the CNN model. All authors reviewed the manuscript and approved the final version.
Acknowledgement
The authors acknowledge the radiologists and pathologists at the following institutions for providing access to and permission to use the radiology and necropsy reports for this study:
- the University of Veterinary Medicine Hannover, Hannover, Germany;
- the Royal Veterinary College, London, United Kingdom;
- the School of Veterinary Medicine, University of California, Davis, California, USA;
- the Small Animal Hospital, School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, United Kingdom.
We also gratefully acknowledge L. Böhringer and L. Lemke for assistance with patient selection, and G. Lester, P. Hallur, D. Sanchez-Masian, and A. Wang Leonardo for their valuable consultation during the course of this study.
Data Availability
The datasets generated during the current study are available from the corresponding author upon reasonable request. No animals were directly involved in this study, as all data were retrospectively collected from clinical cases that had previously undergone brain MRI for diagnostic purposes.
All procedures were performed as part of routine clinical care at four university veterinary referral centres:
- the Department of Small Animal Medicine and Surgery at the University of Veterinary Medicine Hannover (Germany);
- the William R. Pritchard Veterinary Medical Teaching Hospital at the University of California, Davis (USA);
- the Small Animal Referral Hospital at the Royal Veterinary College in London (UK);
- the Small Animal Hospital of the University of Glasgow (UK).

Approval for the use of medical data was obtained through the owners' informed consent prior to hospital admission, in accordance with each university’s institutional guidelines. The acquisition of MRI data was supervised by veterinary technicians from the diagnostic imaging units of the referral centres, in compliance with their respective animal welfare policies. All patients were imaged under general anaesthesia.
References
Gavin, P. R. Growth of clinical veterinary magnetic resonance imaging. Vet. Radiol. Ultrasound 52, S2–S4. https://doi.org/10.1111/j.1740-8261.2010.01779.x (2011).
Vickram, A. S., Infant, S. S., Priyanka & Chopra, H. AI-powered techniques in anatomical imaging: impacts on veterinary diagnostics and surgery. Ann. Anat. 258, 152355. https://doi.org/10.1016/j.aanat.2024.152355 (2025).
Werring, D. J. et al. The pathogenesis of lesions and normal-appearing white matter changes in multiple sclerosis: a serial diffusion MRI study. Brain 123, 1667–1676. https://doi.org/10.1093/brain/123.8.1667 (2000).
The Royal College of Radiologists. Clinical radiology workforce census report 2023. (2024). Available at https://www.rcr.ac.uk/media/4imb5jge/_rcr-2024-clinical-radiology-workforce-census-report.pdf
Dick White Referrals. Addressing the shortage of veterinary imaging specialists. (2020). Available at https://www.dickwhitereferrals.com/news/addressing-the-shortage-of-veterinary-imaging-specialists/
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118. https://doi.org/10.1038/nature21056 (2017).
Avendi, M., Kheradvar, A. & Jafarkhani, H. A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal. 30, 108–119. https://doi.org/10.1016/j.media.2016.01.005 (2016).
Garcea, F., Serra, A., Lamberti, F. & Morra, L. Data augmentation for medical imaging: a systematic literature review. Comput. Biol. Med. 152, 106391. https://doi.org/10.1016/j.compbiomed.2022.106391 (2023).
Bohmrah, M. K. & Kaur, H. Advanced hybridization and optimization of DNNs for medical imaging: a survey on disease detection techniques. Artif. Intell. Rev. 58, 122. https://doi.org/10.1007/s10462-024-11049-x (2025).
Mienye, I. D., Swart, T. G., Obaido, G., Jordan, M. & Ilono, P. Deep convolutional neural networks in medical image analysis: a review. Information 16, 195. https://doi.org/10.3390/info16030195 (2025).
Yu, H., Yang, L. T., Zhang, Q., Armstrong, D. & Deen, M. J. Convolutional neural networks for medical image analysis: state-of-the-art, comparisons, improvement and perspectives. Neurocomputing 444, 92–110. https://doi.org/10.1016/j.neucom.2020.04.157 (2021).
Liang, G. & Zheng, L. A transfer learning method with deep residual network for pediatric pneumonia diagnosis. Comput. Methods Programs Biomed. 187, 104964. https://doi.org/10.1016/j.cmpb.2019.06.023 (2020).
Nahiduzzaman, M., Islam, M. R. & Hassan, R. ChestX-ray6: prediction of multiple diseases including COVID-19 from chest X-ray images using convolutional neural network. Expert Syst. Appl. 211, 118576. https://doi.org/10.1016/j.eswa.2022.118576 (2023).
Li, L., Xu, M., Wang, X., Jiang, L. & Liu, H. Attention based glaucoma detection: a large-scale database and CNN model. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 10563–10572. https://doi.org/10.1109/CVPR.2019.01082 (2019).
Eltoukhy, M. M., Hosny, K. M. & Kassem, M. A. Classification of multiclass histopathological breast images using residual deep learning. Comput. Intell. Neurosci. 9086060. https://doi.org/10.1155/2022/9086060 (2022).
Khened, M., Kollerathu, V. A. & Krishnamurthi, G. Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. Med. Image Anal. 51, 21–45. https://doi.org/10.1016/j.media.2018.10.004 (2019).
Al-masni, M. A., Kim, D. H. & Kim, T. S. Multiple skin lesions diagnostics via integrated deep convolutional networks for segmentation and classification. Comput. Methods Programs Biomed. 190, 105351. https://doi.org/10.1016/j.cmpb.2020.105351 (2020).
Lee, S. M. & Kim, N. Deep learning model ensemble for the accuracy of classification degenerative arthritis. Comput. Mater. Continua 75, 1981–1994. https://doi.org/10.32604/cmc.2023.035245 (2023).
Sarwinda, D., Paradisa, R. H., Bustamam, A. & Anggia, P. Deep learning in image classification using residual network (ResNet) variants for detection of colorectal cancer. Procedia Comput. Sci. 179, 423–431. https://doi.org/10.1016/j.procs.2021.01.025 (2021).
Xiao, S. et al. Review of applications of deep learning in veterinary diagnostics and animal health. Front. Vet. Sci. 12, 1511522. https://doi.org/10.3389/fvets.2025.1511522 (2025).
Ezanno, P. et al. Research perspectives on animal health in the era of artificial intelligence. Vet. Res. 52, 40. https://doi.org/10.1186/s13567-021-00902-4 (2021).
Banzato, T. et al. Use of transfer learning to detect diffuse degenerative hepatic diseases from ultrasound images in dogs: a methodological study. Vet. J. 233, 35–40. https://doi.org/10.1016/j.tvjl.2017.12.026 (2018).
Banzato, T., Bernardini, M., Cherubini, G. & Zotti, A. A methodological approach for deep learning to distinguish between meningiomas and gliomas on canine MR images. BMC Vet. Res. 14, 317. https://doi.org/10.1186/s12917-018-1638-2 (2018).
Yoon, Y., Hwang, T. & Lee, H. Prediction of radiographic abnormalities by the use of bag-of-features and convolutional neural networks. Vet. J. 237, 43–48. https://doi.org/10.1016/j.tvjl.2018.05.009 (2018).
Banzato, T., Cherubini, G. B., Atzori, M. & Zotti, A. Development of a deep convolutional neural network to predict grading of canine meningiomas from magnetic resonance images. Vet. J. 235, 90–92. https://doi.org/10.1016/j.tvjl.2018.04.001 (2018).
Dumortier, L., Guépin, F., Delignette-Muller, M. L., Boulocher, C. & Grenier, T. Deep learning in veterinary medicine, an approach based on CNN to detect pulmonary abnormalities from lateral thoracic radiographs in cats. Sci. Rep. 12, 11418. https://doi.org/10.1038/s41598-022-14993-2 (2022).
Vinicki, K., Ferrari, P., Belic, M. & Turk, R. Using convolutional neural networks for determining reticulocyte percentage in cats. arXiv preprint arXiv:1803.04873. https://doi.org/10.48550/arXiv.1803.04873 (2018).
Kim, H. J., Lee, S. H., Kim, H. J. & Lee, S. H. CNN-based diagnosis models for canine ulcerative keratitis. Sci. Rep. 9, 1–10. https://doi.org/10.1038/s41598-019-50437-0 (2019).
Pereira, A. et al. Artificial intelligence in veterinary imaging: an overview. Vet. Sci. 10, 320. https://doi.org/10.3390/vetsci10050320 (2023).
Niemeyer, F. et al. Automatic grading of intervertebral disc degeneration in lumbar dog spines. JOR Spine 7, e1326. https://doi.org/10.1002/jsp2.1326 (2024).
Biercher, A. et al. Using deep learning to detect spinal cord diseases on thoracolumbar magnetic resonance images of dogs. Front. Vet. Sci. 8, 8. https://doi.org/10.3389/fvets.2021.721167 (2021).
Kaczmarska, A. et al. Postencephalitic epilepsy in dogs with meningoencephalitis of unknown origin: clinical features, risk factors, and long-term outcome. J. Vet. Intern. Med. 34, 808–820. https://doi.org/10.1111/jvim.15687 (2020).
Schwartz, M., Lamb, C. R., Brodbelt, D. C. & Volk, H. A. Canine intracranial neoplasia: clinical risk factors for development of epileptic seizures. J. Small Anim. Pract. 52, 632–637. https://doi.org/10.1111/j.1748-5827.2011.01131.x (2011).
Cerda-Gonzalez, S. et al. International veterinary canine dyskinesia task force ECVN consensus statement: terminology and classification. J. Vet. Intern. Med. 35, 1218–1230. https://doi.org/10.1111/jvim.16108 (2021).
De Risio, L. et al. International veterinary epilepsy task force consensus proposal: diagnostic approach to epilepsy in dogs. BMC Vet. Res. 11, 148. https://doi.org/10.1186/s12917-015-0462-1 (2015).
Python Software Foundation. Python, version 3.11. Wilmington, DE, USA. Available at https://www.python.org/downloads/release/python-3110/
Medixant. RadiAnt DICOM Viewer, version 4.0.1. Poznań, Poland. Available at https://www.radiantviewer.com
Pixmeo SARL. OsiriX MD. FDA-cleared and CE-labeled DICOM viewer for macOS. Geneva, Switzerland. Available at https://www.osirix-viewer.com/osirix/osirix-md/
V7 Labs. V7 Darwin: AI Data Labeling Platform. London, UK (2025). Available at https://www.v7labs.com
The MathWorks Inc. Statistics and Machine Learning Toolbox, version R2022b. Natick, MA, USA (2022). Available at https://www.mathworks.com/products/statistics.html
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 770–778. https://doi.org/10.1109/CVPR.2016.90 (2016).
Iandola, F. N. et al. DenseNet: implementing efficient ConvNet descriptor pyramids. arXiv preprint arXiv:1404.1869 (2014). https://arxiv.org/abs/1404.1869
Chollet, F. Xception: deep learning with depthwise separable convolutions. In Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 1800–1807 (IEEE Computer Society, 2017). https://doi.org/10.1109/CVPR.2017.195
Howard, A. G. et al. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2818–2826 (Las Vegas, NV, USA, 27–30 June 2016). https://doi.org/10.1109/CVPR.2016.308 (2016).
GraphPad Software. GraphPad Prism 8 for Windows, version 8.4.3. San Diego, CA, USA. Available at https://www.graphpad.com
IBM Corp. IBM SPSS Statistics for Windows, version 29.0.1.1. Armonk, NY, USA. Available at https://www.ibm.com/products/spss-statistics
Gravetter, F. J. & Wallnau, L. B. Statistics for the Behavioral Sciences 10th edn (Cengage Learning, 2016).
Hoffmann, G., Lichtinghagen, R. & Wosniok, W. Ein einfaches Verfahren zur Schätzung von Referenzintervallen aus routinemäßig erhobenen Labordaten [A simple method for estimating reference intervals from routinely collected laboratory data]. Lab. Med. 39. https://doi.org/10.1515/labmed-2015-0082 (2015).
Buda, M., Maki, A. & Mazurowski, M. A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 106, 249–259. https://doi.org/10.1016/j.neunet.2018.07.011 (2018).
Varoquaux, G. Cross-validation failure: small sample sizes lead to large error bars. NeuroImage 180, 68–77. https://doi.org/10.1016/j.neuroimage.2017.06.061 (2018).
Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2017).
Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLOS Med. 15, e1002683. https://doi.org/10.1371/journal.pmed.1002683 (2018).
Mahmutoglu, M. et al. Optimizing MRI sequence classification performance: insights from domain shift analysis. Eur. Radiol. https://doi.org/10.1007/s00330-025-11671-5 (2025).
Garrucho, L. et al.
Domain generalization in deep learning-based mass detection in mammography: a large-scale multi-center study. Artif. Intell. Med. 132 , 102386. https://doi.org/10.1016/j.artmed.2022.102386 (2022). Holzschuh, J. et al. The impact of multicentric datasets for the automated tumor delineation in primary prostate cancer using convolutional neural networks on 18F-PSMA-1007 PET. Radiat. Oncol. 19 , 106. https://doi.org/10.1186/s13014-024-02491-w (2024). Kamnitsas, K. et al. DeepMedic for brain tumor segmentation. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries , 138–149 (Springer International Publishing, 2016). https://doi.org/10.1007/978-3-319-55524-9_14 Nour Eddin, J., Dorez, H. & Curcio, V. Automatic brain extraction and brain tissues segmentation on multi-contrast animal MRI. Sci. Rep. 13 , 6416. https://doi.org/10.1038/s41598-023-33289-7 (2023). Dubois, J. et al. MRI of the neonatal brain: a review of methodological challenges and neuroscientific advances. J. Magn. Reson. Imaging . 53 , 1318–1343. https://doi.org/10.1002/jmri.27192 (2021). Huff, D., Weisman, A. & Jeraj, R. Interpretation and visualization techniques for deep learning models in medical imaging. Phys. Med. Biol. 66 https://doi.org/10.1088/1361-6560/abcd17 (2021). DeGrave, A. J., Janizek, J. D. & Lee, S. I. AI for radiographic COVID-19 detection selects shortcuts over signal. medRxiv (2020). https://doi.org/10.1101/2020.09.13.20193565 Preprint. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128 , 336–359. https://doi.org/10.1007/s11263-019-01228-7 (2019). Coupet, M. et al. A multi-sequences MRI deep framework study applied to glioma classification. Multimed Tools Appl. 81 , 13563–13591. https://doi.org/10.1007/s11042-022-12316-1 (2022). Grigas, O., Damaševičius, R. & Maskeliūnas, R. Multimodal convolutional mixer for mild cognitive impairment detection. Comput. Mater. Contin . 84 , 1805–1838. 
https://doi.org/10.32604/cmc.2025.064354 (2025). Mai, W. Diagnostic MRI in Dogs and Cats 1st edn (CRC, 2018). Additional Declarations Competing interest reported. Authors J. Janisch, A. Hansch, and M. Laue are employed by DOS Software-Systeme GmbH, a commercial developer of AI solutions. The employer, DOS Software-Systeme GmbH, had no role in the interpretation of the data or in the conclusions drawn from this study. S. Abani, J. Nessler, A. Hansch, and M. Laue were funded by the “Central Innovation Program for Small- and Medium-Sized Enterprises” of the German Federal Ministry for Economic Affairs and Climate Action (grant no. KK5066602LB). Additionally, we acknowledge financial support from the Open Access Publication Fund of the University of Veterinary Medicine Hannover, Foundation. Both funders had no influence on the study design, data collection and analysis, or the conclusions drawn in this paper. The remaining authors declare no competing interests. Supplementary Files 20250904SamiraAbaniSupplementaryMaterial.docx Sup.Figure1.tif Sup.Figure2.tif Sup.Figure3.tif Sup.Figure4.tif Sup.Figure5.tif Sup.Figure6.tif Cite Share Download PDF Status: Under Review Version 1 posted Editorial decision: Revision requested 23 Dec, 2025 Reviews received at journal 22 Dec, 2025 Reviewers agreed at journal 02 Nov, 2025 Reviews received at journal 26 Sep, 2025 Reviewers agreed at journal 17 Sep, 2025 Reviewers invited by journal 16 Sep, 2025 Editor invited by journal 08 Sep, 2025 Editor assigned by journal 06 Sep, 2025 Submission checks completed at journal 05 Sep, 2025 First submitted to journal 04 Sep, 2025 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. 
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-7537077","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":511949594,"identity":"7d56dca4-457a-401c-a660-ca79c8aa4e83","order_by":0,"name":"Samira Abani","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA5klEQVRIiWNgGAWjYJCCAyDCAMT4YMAgA2UTqeXgDAMGHqK0MMC0MAPVE9Yi33468XBFDYO8OfvZg4dtCux4+CWSNzD8qNiG2/AzuRsOnjnGYLizJy/hcI5BMo/kjLQCxp4zt/G4B6ilgY2BccOBHAOglgM8BjdyDJgZ23Brke9/C9Tyj8F+w/k3BoctiNHCcANoS2MbQ+IGoMrDDMRoMbgBtKWxTyJ554w3Bgd7QH7peVZwEJ9f5PtzN39s+GZju50/x/jDjz92cvzsyRsf/KjA4zAIkEDlHiCkfhSMglEwCkYBfgAAnZ1bDv9Lr+oAAAAASUVORK5CYII=","orcid":"","institution":"University of Veterinary Medicine Hannover","correspondingAuthor":true,"prefix":"","firstName":"Samira","middleName":"","lastName":"Abani","suffix":""},{"id":511949595,"identity":"e2bdea84-6442-4fb9-ad3f-151ded5e9a72","order_by":1,"name":"Merlin Laue","email":"","orcid":"","institution":"DOS Software-Systems","correspondingAuthor":false,"prefix":"","firstName":"Merlin","middleName":"","lastName":"Laue","suffix":""},{"id":511949596,"identity":"5792969f-d7d3-4853-aec0-d1e8f7c14539","order_by":2,"name":"Peter J Dickinson","email":"","orcid":"","institution":"University of California, 
Davis","correspondingAuthor":false,"prefix":"","firstName":"Peter","middleName":"J","lastName":"Dickinson","suffix":""},{"id":511949597,"identity":"1da8ca78-8952-48a5-b486-fc2f6bcf3f76","order_by":3,"name":"Steven De Decker","email":"","orcid":"","institution":"Royal Veterinary College","correspondingAuthor":false,"prefix":"","firstName":"Steven","middleName":"","lastName":"De Decker","suffix":""},{"id":511949598,"identity":"fed1adaf-c4da-4966-8245-055edb721b31","order_by":4,"name":"Rodrigo Gutierrez-Quintana","email":"","orcid":"","institution":"School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow","correspondingAuthor":false,"prefix":"","firstName":"Rodrigo","middleName":"","lastName":"Gutierrez-Quintana","suffix":""},{"id":511949599,"identity":"5609446c-4804-4893-b787-5dc03cb49b63","order_by":5,"name":"Siavash Ghiasvand","email":"","orcid":"","institution":"TU Dresden","correspondingAuthor":false,"prefix":"","firstName":"Siavash","middleName":"","lastName":"Ghiasvand","suffix":""},{"id":511949600,"identity":"b1e19d0f-5b15-4401-ab2b-1f0b926092a6","order_by":6,"name":"Sahar Tahmasebi","email":"","orcid":"","institution":"Leibniz University Hannover","correspondingAuthor":false,"prefix":"","firstName":"Sahar","middleName":"","lastName":"Tahmasebi","suffix":""},{"id":511949601,"identity":"d82d4111-9401-4599-828e-c13557c6666a","order_by":7,"name":"Franziska Spohn","email":"","orcid":"","institution":"University of Veterinary Medicine Hannover","correspondingAuthor":false,"prefix":"","firstName":"Franziska","middleName":"","lastName":"Spohn","suffix":""},{"id":511949602,"identity":"932d33b5-ef37-49f7-8364-09de18cb2d59","order_by":8,"name":"Alexander Sabbotin","email":"","orcid":"","institution":"University of Veterinary Medicine 
Hannover","correspondingAuthor":false,"prefix":"","firstName":"Alexander","middleName":"","lastName":"Sabbotin","suffix":""},{"id":511949603,"identity":"3ecb8097-3c5a-4ea9-ab30-2fd51fc84114","order_by":9,"name":"Alexej Hänsch","email":"","orcid":"","institution":"DOS Software-Systems","correspondingAuthor":false,"prefix":"","firstName":"Alexej","middleName":"","lastName":"Hänsch","suffix":""},{"id":511949604,"identity":"8f18a8b7-8659-447c-a064-a2e83a20da6b","order_by":10,"name":"Jörg Janisch","email":"","orcid":"","institution":"DOS Software-Systems","correspondingAuthor":false,"prefix":"","firstName":"Jörg","middleName":"","lastName":"Janisch","suffix":""},{"id":511949606,"identity":"59be8346-fbb8-4bce-8d6e-1a6501f781ba","order_by":11,"name":"Reiner Ulrich","email":"","orcid":"","institution":"Leipzig University","correspondingAuthor":false,"prefix":"","firstName":"Reiner","middleName":"","lastName":"Ulrich","suffix":""},{"id":511949608,"identity":"02b51eec-f1c0-4fe7-a9a2-8967c446cf21","order_by":12,"name":"Ehren McLarty","email":"","orcid":"","institution":"University of California, Davis","correspondingAuthor":false,"prefix":"","firstName":"Ehren","middleName":"","lastName":"McLarty","suffix":""},{"id":511949612,"identity":"fbc4f45e-8f84-4cf4-a3dd-e3d5f849afe7","order_by":13,"name":"Jasmin Nessler","email":"","orcid":"","institution":"University of Veterinary Medicine Hannover","correspondingAuthor":false,"prefix":"","firstName":"Jasmin","middleName":"","lastName":"Nessler","suffix":""},{"id":511949615,"identity":"883482e9-50e9-4398-bb47-615d06866846","order_by":14,"name":"Holger Volk","email":"","orcid":"","institution":"University of Veterinary Medicine Hannover","correspondingAuthor":false,"prefix":"","firstName":"Holger","middleName":"","lastName":"Volk","suffix":""}],"badges":[],"createdAt":"2025-09-04 
14:23:23","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7537077/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7537077/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":91077493,"identity":"c12252f8-c0cb-43c2-91b9-41435af953d0","added_by":"auto","created_at":"2025-09-11 11:16:19","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1824273,"visible":true,"origin":"","legend":"\u003cp\u003eSchematic illustration of the \u003cem\u003eSepNetDense\u003c/em\u003e model architecture. The input is a grayscale image (128 × 128 × 1) passed through a 3 × 3 convolutional layer (32 filters). The main building blocks of the proposed architecture are the Weight Blocks (WBs), each consisting of a series of depthwise separable convolution (DSC) layers with intermediate batch normalization (BN) layers and a residual connection. The internal structure of a WB is shown on the right side of the figure. Each group of three WBs is followed by a MaxPooling layer. Finally, a Dropout layer, included to reduce overfitting, is followed by Global Average Pooling (GAP) and a Fully Connected (FC) layer with two output units that produces the final binary classification. Note that the intermediate BN layer of the first DSC block and the MaxPooling layer after the last WB are omitted.\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/4ee74af690c13cdc094fea69.png"},{"id":91077491,"identity":"2d964ced-29ab-4dd9-a512-72aa652a54b4","added_by":"auto","created_at":"2025-09-11 11:16:19","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":334857,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrix illustrating the classification performance of the CNN model. 
The matrix shows the absolute number of slices labelled as abnormal or normal in the ground truth (y-axis) and predicted by the model (x-axis). Correct predictions appear along the main diagonal (abnormal/abnormal and normal/normal), while off-diagonal entries indicate misclassifications.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/39bda9444f4e8c9a498b5a6b.png"},{"id":91081925,"identity":"956043a3-33f9-4ef6-8a31-48675d51b105","added_by":"auto","created_at":"2025-09-11 11:48:19","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":25881641,"visible":true,"origin":"","legend":"\u003cp\u003eSpatial distribution of the slice-wise prediction outcomes after comparison with the ground truth across all six T1w MRI sequences in the test set, grouped by patient status (normal, abnormal). Each heatmap displays the classification outcome (true positive, violet; true negative, light blue; false positive, yellow; false negative, red) for individual slices along the normalized anatomical axis (x-axis). Panels represent: \u003cstrong\u003e(a)\u003c/strong\u003e transverse pre-contrast, \u003cstrong\u003e(b)\u003c/strong\u003e transverse post-contrast, \u003cstrong\u003e(c)\u003c/strong\u003e sagittal pre-contrast, \u003cstrong\u003e(d)\u003c/strong\u003e sagittal post-contrast, \u003cstrong\u003e(e)\u003c/strong\u003e dorsal pre-contrast, and \u003cstrong\u003e(f)\u003c/strong\u003e dorsal post-contrast sequences. Normal patients are predominantly characterized by true negatives, whereas abnormal patients show dense clusters of true positives, especially in the central portion of the brain. 
The number of patients (y-axis) varies across the sequences due to differences in image acquisition.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/bedaee786aaca852d62adeaf.png"},{"id":91079272,"identity":"ed5a5b74-a5d5-4b70-ac4b-f1addd1f88a0","added_by":"auto","created_at":"2025-09-11 11:24:19","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":704054,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of slice-wise accuracy across T1-weighted MRI sequences. Overall accuracy differed significantly between the six sequences (chi-square test, p \u0026lt; 0.001), with the highest performance observed for sagittal pre-contrast images. Asterisks indicate statistically significant pairwise differences based on Fisher’s exact tests with Bonferroni correction (p ≤ 0.05).\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/76c3da14b72659311187f9de.png"},{"id":91077495,"identity":"34a5e3bd-8dda-4aa9-8a3f-333a5f6cfbdf","added_by":"auto","created_at":"2025-09-11 11:16:19","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1030169,"visible":true,"origin":"","legend":"\u003cp\u003ePerformance of the CNN model for classifying patients as normal or abnormal based on the proportion of slices predicted as abnormal. \u003cstrong\u003e(a)\u003c/strong\u003e Box-and-whisker plot showing that the proportion of slices predicted as abnormal is significantly higher in clinically abnormal patients than in normal patients (Mann–Whitney U test, p \u0026lt; 0.0001). \u003cstrong\u003e(b)\u003c/strong\u003e Receiver operating characteristic (ROC) curve illustrating how varying the threshold on the proportion of slices predicted as abnormal affects patient-level sensitivity and specificity. 
The area under the curve (AUC) is 0.8508 (p \u0026lt; 0.0001), indicating robust model performance.\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/03f321470f528d323d15b119.png"},{"id":91079275,"identity":"6f72a55e-a1b6-4312-85d1-ea80f9e1bc4c","added_by":"auto","created_at":"2025-09-11 11:24:19","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":7382889,"visible":true,"origin":"","legend":"\u003cp\u003eAnalysis of factors influencing the classification accuracy of the CNN model. \u003cstrong\u003e(a)\u003c/strong\u003e The effect sizes from ANCOVA reveal that diagnosis and institute are the strongest predictors of model performance. \u003cstrong\u003e(b)\u003c/strong\u003e Dot plot of the accuracy values stratified by diagnosis and institute, assessed by two-way ANOVA. Post hoc comparisons of diagnoses within each institute were performed using one-way ANOVA with pairwise post hoc tests with Bonferroni correction. The asterisk indicates a statistically significant difference (p \u0026lt; 0.05). \u003cstrong\u003e(c)\u003c/strong\u003e Scatter plot of the accuracy values stratified by weight and breed size. Due to the significant interaction in the ANCOVA, the regression analysis was split for small (yellow), medium (red), and large (green) dog breeds. 
A significant negative effect of weight on the accuracy of the CNN model was observed only in large breed dogs.\u003c/p\u003e","description":"","filename":"Figure6.png","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/9b35fbda0ecbc1a52236439d.png"},{"id":91149203,"identity":"15f79650-f294-4b04-a9ff-58cd8c958607","added_by":"auto","created_at":"2025-09-12 06:47:22","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":27312422,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/17d26216-ec01-4521-a7e3-e007766f194e.pdf"},{"id":91077492,"identity":"0ca3519d-9c5c-4544-b0c5-8ca7f5c53ab4","added_by":"auto","created_at":"2025-09-11 11:16:19","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":31895,"visible":true,"origin":"","legend":"","description":"","filename":"20250904SamiraAbaniSupplementaryMaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/014a38841a071fefaa18e90e.docx"},{"id":91077497,"identity":"ffa0d243-e189-46b8-9b83-e8aaf6296ece","added_by":"auto","created_at":"2025-09-11 11:16:19","extension":"tif","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":239603,"visible":true,"origin":"","legend":"","description":"","filename":"Sup.Figure1.tif","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/b889c26f01b28244051ab59e.tif"},{"id":91077535,"identity":"03e912d2-e428-4e2f-925f-0c7b7db17992","added_by":"auto","created_at":"2025-09-11 
11:16:21","extension":"tif","order_by":10,"title":"","display":"","copyAsset":false,"role":"supplement","size":256299,"visible":true,"origin":"","legend":"","description":"","filename":"Sup.Figure2.tif","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/4e2b9b4b76737220fabb8a2a.tif"},{"id":91077531,"identity":"2db15355-b6c7-4c74-9d96-a9275ce46f79","added_by":"auto","created_at":"2025-09-11 11:16:20","extension":"tif","order_by":11,"title":"","display":"","copyAsset":false,"role":"supplement","size":262090,"visible":true,"origin":"","legend":"","description":"","filename":"Sup.Figure3.tif","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/2f65f594b2b2de9ebadebc99.tif"},{"id":91077500,"identity":"e49ee0bd-ef5e-4bde-9e32-1a36dafd445e","added_by":"auto","created_at":"2025-09-11 11:16:19","extension":"tif","order_by":12,"title":"","display":"","copyAsset":false,"role":"supplement","size":176308,"visible":true,"origin":"","legend":"","description":"","filename":"Sup.Figure4.tif","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/2ebc378eafbfce757806c479.tif"},{"id":91079280,"identity":"f840bf94-944a-4107-83da-9d534399bed1","added_by":"auto","created_at":"2025-09-11 11:24:19","extension":"tif","order_by":13,"title":"","display":"","copyAsset":false,"role":"supplement","size":1916809,"visible":true,"origin":"","legend":"","description":"","filename":"Sup.Figure5.tif","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/24052fc39ea693ff6ab553fb.tif"},{"id":91077511,"identity":"361a7d4c-f893-4759-b5ab-bfe2971dced4","added_by":"auto","created_at":"2025-09-11 11:16:19","extension":"tif","order_by":16,"title":"","display":"","copyAsset":false,"role":"supplement","size":1123299,"visible":true,"origin":"","legend":"","description":"","filename":"Sup.Figure6.tif","url":"https://assets-eu.researchsquare.com/files/rs-7537077/v1/9702e3ed4c2b39a9bd6fdbec.tif"}],"financialInterests":"Competing interest reported. Authors J. Janisch, A. 
Hansch, and M. Laue are employed by DOS Software-Systeme GmbH, a commercial developer of AI solutions. The employer, DOS Software-Systeme GmbH, had no role in the interpretation of the data or in the conclusions drawn from this study. S. Abani, J. Nessler, A. Hansch, and M. Laue were funded by the “Central Innovation Program for Small- and Medium-Sized Enterprises” of the German Federal Ministry for Economic Affairs and Climate Action (grant no. KK5066602LB). Additionally, we acknowledge financial support from the Open Access Publication Fund of the University of Veterinary Medicine Hannover, Foundation. Neither funder had any influence on the study design, data collection and analysis, or the conclusions drawn in this paper. The remaining authors declare no competing interests.","formattedTitle":"CNN model-based image classification for canine brain MRI abnormalities","fulltext":[{"header":"Introduction","content":"\u003cp\u003eIn recent decades, technical innovations in veterinary diagnostic imaging, especially magnetic resonance imaging (MRI), have profoundly advanced clinical practice in veterinary neurology and neurosurgery [1]. MRI serves as a non-invasive, highly sensitive tool for detecting abnormalities, tracking disease progression, and evaluating treatment and post-surgical outcomes [2,3]. Serial MRI studies further enable in vivo monitoring of lesion progression over time, making MRI an essential and versatile tool for the antemortem assessment of canine brain pathologies [3]. 
The demand for imaging services, not only in human but also in veterinary medicine, has outpaced the number of trained diagnostic imaging specialists, creating a widening gap between the availability of imaging technology and diagnostic expertise [1,4].\u003c/p\u003e\n\u003cp\u003eWhile the number of MRI units in universities and private practices continues to rise, the workforce of veterinary radiologists and neurologists has not increased at the same rate, posing challenges for maintaining high-quality image interpretation [1,5]. In recent years, convolutional neural networks (CNNs) have become a powerful tool in medical imaging for tasks such as classification, detection, localisation, segmentation, augmentation, and automated diagnosis [6-9].\u003c/p\u003e\n\u003cp\u003eCNNs are particularly effective because they can automatically learn complex patterns and hierarchical features from image data by adjusting millions of parameters through optimisation [10,11]. The architecture of a CNN typically comprises three main components: the input layer, the hidden layers, and the output layer. The hidden layers include convolutional layers, which apply learned filters to extract key features from images; pooling layers, which reduce the spatial dimensions of the feature maps; and fully connected layers, which combine these features to make the final predictions. This ability to process and learn from visual data makes CNNs particularly valuable for medical imaging applications [11]. Numerous studies have demonstrated that CNNs can assist clinicians in a range of image analysis applications across different anatomical regions, achieving performance levels comparable to those of human experts [12-19]. While CNNs continue to advance in human medicine, improving accuracy, robustness, and generalisability, their adoption in veterinary diagnostic practice remains at an early stage [20,21].\u003c/p\u003e\n\u003cp\u003eIn 2018, several studies began to explore the potential of CNNs for analysing veterinary diagnostic images [22-28]. 
Among veterinary diagnostic imaging modalities, radiography is the most widely studied because of its common use in routine veterinary practice [20,29]. MRI, particularly in the context of neurological conditions, is an emerging area of increasing research interest but remains underexplored relative to radiography [20,29].\u003c/p\u003e\n\u003cp\u003eNiemeyer et al. (2024) fine-tuned a deep learning tool (VGG-16 network), originally developed for the automatic grading of human lumbar intervertebral disc (IVD) degeneration using the Pfirrmann scheme, to apply the same grading system to T2-weighted (T2w) midsagittal images of canine lumbar spines. The model achieved an average accuracy of 94.1%, a sensitivity of 85.2%, and a specificity of 96.3% for grades 1 to 5, demonstrating its potential to advance both veterinary care and human biomedical research [30].\u003c/p\u003e\n\u003cp\u003eIn another study, Biercher et al. (2021) demonstrated the potential of CNNs in differentiating thoracolumbar spinal cord diseases in dogs. The CNN model achieved a sensitivity of 90.80% and a specificity of 98.98% for intervertebral disc extrusion (IVDE) using T2w sagittal images; a sensitivity of 100% and a specificity of 95.10% for intervertebral disc prolapse (IVDP) using sagittal T1-weighted (T1w) images; and a sensitivity of 90.98% and a specificity of 90.12% for fibrocartilaginous embolism/acute non-compressive nucleus pulposus extrusion (FCE/ANNPE) using T2w transverse images. This study emphasises the potential of CNNs to provide second opinions in the assessment of spinal cord lesions on MRI, while also highlighting the challenges posed by limited training data for certain diagnoses [31].\u003c/p\u003e\n\u003cp\u003eSince 2018, the use of CNNs for analysing canine brain tumours on MRI has gained increasing attention in veterinary neuroimaging [25]. Banzato et al. 
(2018) used two neural networks (NNs), one based on AlexNet and the other a customised model, to predict the histopathological grade of canine meningiomas. Their preliminary study tested the models on ten randomly selected images, correctly classifying the meningioma grade in eight of the ten cases. This result suggests that MRI-based tumour grading could help guide treatment decisions [25]. Later that year, Banzato et al. trained a CNN based on GoogLeNet to distinguish between meningiomas and gliomas on canine MR images [23]. The model performed best on post-contrast T1w images, achieving an accuracy of 94%. Performance was slightly lower on pre-contrast T1w and T2w images, with accuracies of 91% and 90%, respectively. These findings further demonstrate the potential of deep learning models to assist in the diagnosis and classification of different tumour types [23].\u003c/p\u003e\n\u003cp\u003eWhile recent studies have highlighted the growing use of CNNs in veterinary imaging diagnostics, their application to triaging or otherwise enhancing diagnostic workflows for complex imaging modalities, such as brain MRI, remains limited. This is partly due to the lack of large, heterogeneous datasets that adequately reflect real-world variability. The hypothesis of this study is that a CNN model trained on a heterogeneous brain MRI dataset can accurately predict whether a dog’s brain is normal or abnormal. 
To test this hypothesis, we: (1) annotated 79,310 MRI slices from 550 dogs across four institutions, comprising both unremarkable brains and brains with a range of diseases; (2) trained the customised CNN model, \u003cem\u003eSepNetDense\u003c/em\u003e, using the training and validation sets; and (3) evaluated the model’s performance on an independent test set.\u003c/p\u003e"},{"header":"Material and Methods","content":"\u003cp\u003e\u003cb\u003eDataset and Features.\u003c/b\u003e The dataset comprised T1w sequences acquired pre- and post-contrast in the transverse, sagittal, and dorsal planes. It included 241 patients with unremarkable brain imaging findings (e.g., extracranial aetiologies, idiopathic epilepsy, and paroxysmal dyskinesia) and 309 patients with remarkable imaging findings (e.g., neoplasms, inflammatory lesions, and other pathological abnormalities). Dogs were included if they had a complete clinical diagnosis, confirmed either by histopathology or in accordance with current clinical scientific consensus; this diagnosis served as the ground truth.\u003c/p\u003e\u003cp\u003eBreed, age, weight, sex, presenting complaint, imaging findings, and final diagnoses were extracted from the electronic medical records of each patient. All MRI studies were reviewed by at least one veterinary neuroradiology expert. No animals were directly involved in this study, as all data were retrospectively collected from clinical cases that had previously undergone brain MRI for diagnostic purposes.\u003c/p\u003e\u003cp\u003e\u003cb\u003ePatient Selection.\u003c/b\u003e Electronic databases from two university referral centres, the Department of Small Animal Medicine and Surgery at the University of Veterinary Medicine Hannover, Germany (TiHo), and the William R. Pritchard Veterinary Medical Teaching Hospital at the University of California, Davis (UC Davis), were searched for dogs presenting with clinical signs of encephalopathy between April 2003 and June 2022. 
A total of 178 patients with MRI studies showing remarkable brain findings were included if they had a comprehensive clinical history and a histopathological diagnosis from a board-certified pathologist, based on either post-mortem examination or stereotactic brain biopsy.\u003c/p\u003e\u003cp\u003eIn addition, 84 cases with histopathologically confirmed primary or secondary intracranial neoplasia, from the dataset of a previous study conducted at the Royal Veterinary College, Small Animal Referral Hospital, London (RVC), were included [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. Furthermore, 45 cases of meningoencephalitis of unknown origin, with a final or presumptive diagnosis as reported in a previous study conducted at the Small Animal Hospital of the University of Glasgow (Glasgow), were incorporated [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eMRI studies were acquired at four university referral centres using clinical MRI scanners from Philips, GE Medical Systems, and Siemens. At TiHo, imaging was performed using 3.0 T scanners, including the Philips Achieva and Achieva dStream models. UC Davis used 1.5 T scanners from GE Medical Systems, including the Genesis Signa and Signa HDxt. Glasgow acquired scans using 1.5 T systems, including the Philips NT Intera and Siemens Magnetom Essenza. At RVC, studies were obtained using various 1.5 T scanners, including the Philips Achieva, Gyroscan, Intera, NT Intera, and the GE Genesis Signa. Different imaging protocols were applied at each institution. Differences in scanner model, magnetic field strength, and acquisition protocol across institutions were regarded as part of the dataset\u0026rsquo;s heterogeneity.\u003c/p\u003e\u003cp\u003e\u003cb\u003eControl Group.\u003c/b\u003e The control group consisted of MR images from 241 dogs with unremarkable brain findings, obtained from four university referral centres. 
This group included dogs with conditions such as extracranial aetiologies; paroxysmal dyskinesia, diagnosed in accordance with the consensus statement of the European College of Veterinary Neurology (ECVN) [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]; and idiopathic epilepsy, classified as tier II according to the International Veterinary Epilepsy Task Force consensus statement [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eAs individual dogs may have undergone multiple MRI examinations on different dates at the referral centres, only one MRI study per case was included in the dataset. Follow-up MRI studies performed post-surgery or post-treatment were excluded. Approval for the use of medical data was obtained through the owner's informed consent prior to hospital admission, in accordance with each university\u0026rsquo;s institutional guidelines. The acquisition of MRI images was supervised by veterinary technicians from the diagnostic imaging units of the referral centres, in compliance with their respective animal welfare policies. All patients were imaged under general anaesthesia. MRI studies affected by severe motion artefacts or incomplete sequences that prevented diagnostic interpretation were excluded from the study dataset.\u003c/p\u003e\u003cp\u003e\u003cb\u003eDataset Preparation.\u003c/b\u003e All MR images were anonymised using Python\u0026trade; (Python\u0026trade;, version 3.11. Python Software Foundation, DE, USA) by removing metadata tags containing personal information [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e].\u003c/p\u003e\u003cp\u003e\u003cb\u003eLabelling.\u003c/b\u003e All images were in Digital Imaging and Communications in Medicine (DICOM) format and analysed using available DICOM viewers: RadiAnt\u0026trade; (RadiAnt\u0026trade; DICOM Viewer for Windows, version 4.0.1. 
Poznań, Poland) or OsiriX MD (OsiriX MD, DICOM viewer for macOS, Pixmeo SARL, Geneva, Switzerland) [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. The DICOM files were extracted and subsequently converted into PNG format. MRI slices were labelled individually using the online annotation platform V7 Darwin (V7 Labs. V7 Darwin, London, UK) [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eBrain MR images were annotated using the following tags, which served as the ground truth: \u003cb\u003e(a) normal\u003c/b\u003e, indicating the absence of detectable abnormalities and serving as a baseline for comparison; and \u003cb\u003e(b) pathological structural changes\u003c/b\u003e, including lesions, brain oedema, mass effect, ventricular abnormalities, midline shift, parenchymal loss, pathological contrast enhancement, and herniation. For binary classification, images labelled as \u003cem\u003enormal\u003c/em\u003e were categorised as normal, while those tagged with any pathological condition were classified as abnormal.\u003c/p\u003e\u003cp\u003e\u003cb\u003eExperimental Design and Data Split.\u003c/b\u003e To train, validate, and test the CNN, the dataset of 550 dogs was randomly divided into a training set, a validation set, and a test set. Randomisation was performed using MATLAB\u0026reg; (MATLAB\u0026reg; software, MathWorks Inc. version R2022b. Natick, MA, USA), without accounting for covariates such as age, weight, breed type, institution, number of sequences, or disease category [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. For data splitting, all slices and sequences from each patient were treated as a single unit, and the dataset was split at the patient level. 
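The patient-level split described above can be illustrated with a minimal Python sketch. It assumes that each slice is represented as a hypothetical `(patient_id, slice_id)` pair and that the split is an unstratified random shuffle of patient identifiers. The authors performed the randomisation in MATLAB, so this is not their implementation. Because each patient is assigned to a single subset as a whole, no slice from a test-set dog can appear in training.

```python
import random

def split_by_patient(slice_records, n_val, n_test, seed=0):
    """Assign every slice of a patient to exactly one of train/val/test.

    slice_records: list of (patient_id, slice_id) pairs (hypothetical format).
    The split is random and unstratified at the patient level.
    """
    records = list(slice_records)
    patients = sorted({pid for pid, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    test_ids = set(patients[:n_test])
    val_ids = set(patients[n_test:n_test + n_val])

    splits = {"train": [], "val": [], "test": []}
    for pid, sid in records:
        if pid in test_ids:
            splits["test"].append((pid, sid))
        elif pid in val_ids:
            splits["val"].append((pid, sid))
        else:
            splits["train"].append((pid, sid))
    return splits
```

With 550 patients, `n_val=53` and `n_test=53` leave 444 patients for training, which matches the split sizes reported in this section. The seed value is arbitrary.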
The model was trained at the slice level: studies with normal brain MRI findings consisted exclusively of normal slices, whereas studies with remarkable findings contained both abnormal and normal slices, because lesions were not present in every slice.\u003c/p\u003e\u003cp\u003e\u003cb\u003eTraining set\u003c/b\u003e: This set consisted of 444 dogs and included 39,208 normal and 11,960 abnormal slices, from which the model learned and adjusted its parameters. \u003cb\u003eValidation set\u003c/b\u003e: The validation set comprised 53 dogs, with 5,491 normal and 1,860 abnormal slices. It was used to fine-tune the model\u0026rsquo;s hyperparameters and to monitor performance during training. \u003cb\u003eTest set\u003c/b\u003e: The test set also consisted of 53 dogs and included 4,612 normal and 1,989 abnormal slices. This set was reserved for final model evaluation, providing an unbiased estimate of the model\u0026rsquo;s performance on unseen data.\u003c/p\u003e\u003cp\u003e\u003cb\u003ePre-processing.\u003c/b\u003e The initial T1w image file sizes ranged from 179 bytes to 399.91 KB, with matrix dimensions varying between 144 \u0026times; 144 and 1024 \u0026times; 1024 pixels. Pixel dimensions depended on the scanner\u0026rsquo;s resolution and field of view at the time of acquisition. All T1w images (greyscale) were resized to 128 \u0026times; 128 pixels and normalised during pre-processing. 
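The sketch below shows one way to implement the resizing and normalisation step, together with the random 90° rotations and flips used for augmentation. The text does not specify the interpolation or normalisation method, so nearest-neighbour resizing and min–max scaling to [0, 1] are assumptions, as are all function names. Images are plain nested lists so that the example has no dependencies.

```python
import random

def resize_nearest(img, out_h=128, out_w=128):
    """Nearest-neighbour resize of a 2D greyscale image given as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def min_max_normalise(img):
    """Scale intensities to [0, 1]; a constant image maps to all zeros."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[0.0 for _ in row] for row in img]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]

def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img, rng):
    """Random multiple of 90-degree rotation plus optional horizontal/vertical flip."""
    for _ in range(rng.randrange(4)):
        img = rot90(img)
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1]                   # vertical flip
    return img
```

Rotations and flips only rearrange pixel values. They preserve image content while varying orientation, which is presumably why they suit slices from different planes and scanners.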
To increase data variability, augmentation techniques were applied, including random 90\u0026deg; rotations and horizontal or vertical flips.\u003c/p\u003e\u003cp\u003e\u003cb\u003eCNN model.\u003c/b\u003e Although well-established networks such as ResNets and DenseNets have demonstrated strong performance across a range of vision tasks, they performed suboptimally in the present application [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. Because of their high parameter count, these models tended to either overfit or underfit the data, failing to capture domain-specific features essential for robust classification [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. To address this limitation, a customised, optimised architecture, \u003cem\u003eSepNetDense\u003c/em\u003e, was developed in the present study for the binary classification of brain MR images. \u003cem\u003eSepNetDense\u003c/em\u003e has a small parameter footprint, comprising a total of 76,626 parameters, of which 65,746 are trainable [\u003cspan additionalcitationids=\"CR44\" citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]. 
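To put the parameter budget above in context, it helps to compare convolution types. The text does not list the layers of SepNetDense. Its name suggests depthwise separable convolutions, which are one common way to keep parameter counts low, but that is an assumption here. The sketch below compares the parameter count of a standard 3 × 3 convolution with that of a depthwise separable one. It assumes a depth multiplier of 1 and biases on the output channels only. The function names and layer sizes are hypothetical.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Parameters of a standard k x k convolution: k*k*c_in*c_out weights (+ c_out biases)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def separable_conv2d_params(k, c_in, c_out, depth_multiplier=1, bias=True):
    """Parameters of a depthwise separable convolution:
    one k x k depthwise filter per input channel (times depth_multiplier),
    followed by a 1 x 1 pointwise convolution to c_out channels (+ c_out biases)."""
    depthwise = k * k * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise + (c_out if bias else 0)


# Example layer with 64 input and 64 output channels (hypothetical sizes).
standard = conv2d_params(3, 64, 64)             # 36,928
separable = separable_conv2d_params(3, 64, 64)  # 4,736
print(f"{standard / separable:.1f}x fewer parameters")  # prints 7.8x fewer parameters
```

A single standard 64-channel 3 × 3 layer already needs about half of the 65,746 trainable parameters reported for the whole network. This suggests why a compact design based on separable or otherwise factorised layers would be needed, without confirming which one SepNetDense uses.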
The architecture of \u003cem\u003eSepNetDense\u003c/em\u003e is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\u003cp\u003eThe following classification metrics were calculated:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\text{Classification accuracy} = \\frac{TP + TN}{TP + TN + FP + FN}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\text{Precision} = \\frac{TP}{TP + FP}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\text{Recall} = \\frac{TP}{TP + FN}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\text{F1-score} = \\frac{2 \\times \\text{Precision} \\times \\text{Recall}}{\\text{Precision} + \\text{Recall}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eTP\u0026thinsp;=\u0026thinsp;true positives, TN\u0026thinsp;=\u0026thinsp;true negatives, FP\u0026thinsp;=\u0026thinsp;false positives, and FN\u0026thinsp;=\u0026thinsp;false negatives. Precision, recall, and F1-score were computed separately for each class (normal and abnormal), with the class under evaluation treated as positive.\u003c/p\u003e\u003cp\u003eFrequency distributions of the predictions were compared using chi-square tests or Fisher\u0026rsquo;s exact tests and visualized as heatmaps using GraphPad Prism (Prism 8 for Windows, version 8.4.3, GraphPad Software, San Diego, CA, USA) [\u003cspan citationid=\"CR46\" 
class=\"CitationRef\"\u003e46\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eFor each patient, the proportion of slices predicted to be abnormal relative to all of that patient\u0026rsquo;s slices was assessed for normality using a Q\u0026ndash;Q plot and Shapiro\u0026ndash;Wilk tests, and compared between groups using the Mann\u0026ndash;Whitney U test. Receiver operating characteristic (ROC) analysis was performed to assess classification performance and determine optimal thresholds, also using GraphPad Prism (Prism 8 for Windows, version 8.4.3, GraphPad Software, San Diego, CA, USA).\u003c/p\u003e\u003cp\u003eAn analysis of covariance (ANCOVA) was performed in SPSS (IBM SPSS Statistics for Windows, version 29.0.1.1, IBM Corp., Armonk, NY, USA) to examine and compare the effects of materials- and methods-related factors, covariates, and selected interactions on classification accuracy [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]. Levene\u0026rsquo;s test was applied to assess the assumption of homogeneity of variances. Effect sizes were reported using partial eta squared (ηₚ\u0026sup2;), interpreted as small (0.01\u0026ndash;0.059), medium (0.06\u0026ndash;0.13), and large (\u0026ge;\u0026thinsp;0.14), as described by Gravetter et al. [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eA two-way analysis of variance (ANOVA) with post hoc one-way ANOVAs and subsequent pairwise post hoc tests with Bonferroni corrections was applied to assess the effects of the categorical factors diagnosis and institute on the CNN model\u0026rsquo;s accuracy using SPSS (IBM SPSS Statistics for Windows, version 29.0.1.1, IBM Corp., Armonk, NY, USA). Levene\u0026rsquo;s test was applied to assess the assumption of homogeneity of variances, and normality was assessed using a Q\u0026ndash;Q plot of standardized residuals [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]. 
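Equations 1–4 and the partial eta squared bands can be checked numerically with the sketch below. The per-class function applies Equations 1–4 with one class treated as positive. Partial eta squared is computed from an F statistic using the standard identity ηₚ² = F·df_effect / (F·df_effect + df_error). The text does not state this identity, and it is included only for convenience. The function names are hypothetical, and the reported analyses were run in GraphPad Prism and SPSS, not with this code.

```python
def per_class_metrics(tp, tn, fp, fn):
    """Equations 1-4 for one class treated as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F test: F*df1 / (F*df1 + df2).

    Equivalent to SS_effect / (SS_effect + SS_error).
    """
    return f_value * df_effect / (f_value * df_effect + df_error)


def effect_size_label(eta_p2):
    """Map partial eta squared to the bands given in the text
    (small 0.01-0.059, medium 0.06-0.13, large >= 0.14), using lower bounds only."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "below small"


# Slice-level test-set counts from the Results, with "abnormal" as the positive class.
abnormal_metrics = per_class_metrics(tp=1452, tn=3430, fp=1182, fn=537)
# rounded: accuracy 0.74, precision 0.55, recall 0.73, F1 0.63
```

The Results report slice-level test-set counts of 1,452 true positives, 3,430 true negatives, 1,182 false positives and 537 false negatives, with abnormal as the positive class. Applied to these counts, the function reproduces the abnormal-class row of Table 2. Swapping the roles of the two classes yields the normal-class row.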
The influence of the categorical factors institute, manufacturer, scanner model, field strength, diagnosis, breed size, skull conformation, and sex on accuracy was also analysed individually. Homogeneity of variances was first assessed with Levene\u0026rsquo;s test; groups were then compared using one-way ANOVA or Kruskal-Wallis tests with Bonferroni post hoc corrections, or unpaired t-tests or Mann\u0026ndash;Whitney U tests, as applicable. Linear regression analyses were used to assess associations between the continuous factors and classification accuracy using GraphPad Prism (Prism 8 for Windows, version 8.4.3, GraphPad Software, San Diego, CA, USA). To evaluate the adequacy of the linear regression models, residual plots were examined.\u003c/p\u003e\u003cp\u003eIn general, statistical significance was defined as \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026le;\u0026thinsp;0.05.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cb\u003eDataset.\u003c/b\u003e The study included pre- and post-contrast T1w MRI sequences of the brains of 550 dogs, collected across four referral centres. From these patients, a total of 79,310 MRI slices were manually annotated as either normal or abnormal based on brain MRI findings. Slices affected by artefacts or considered to be of insufficient quality were excluded, resulting in a final dataset of 65,120 slices. The training set consisted of 444 patients, of which 205 had normal and 239 had abnormal brain findings. The validation and test sets each comprised 53 patients, 18 with normal and 35 with abnormal brain findings.\u003c/p\u003e\u003cp\u003eIn the study dataset (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), the most represented diagnostic category was the MRI-normal group (n\u0026thinsp;=\u0026thinsp;241), comprising patients with unremarkable brain MRI findings. 
This was followed by the neoplastic group (n\u0026thinsp;=\u0026thinsp;186) and the inflammatory group (n\u0026thinsp;=\u0026thinsp;113). The least represented category was the \u0026lsquo;other causes\u0026rsquo; group (n\u0026thinsp;=\u0026thinsp;10), which included vascular, hereditary, and degenerative brain abnormalities.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eDistribution of diagnostic categories across the training, validation, and test sets (n\u0026thinsp;=\u0026thinsp;550 patients).\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDiagnostic Category\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTraining set, n (%)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eValidation set, n (%)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eTest set, n (%)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMRI-Normal\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e205 (46.17%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e18 (33.96%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e18 
(33.96%)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNeoplastic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e143 (32.21%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e21 (39.62%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e22 (41.51%)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eInflammatory\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e89 (20.05%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e12 (22.64%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e12 (22.64%)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOther causes\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e7 (1.58%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2 (3.77%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e1 (1.89%)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTotal\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e444\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e53\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e53\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThe study dataset represented a heterogeneous mix of breeds (\u003cb\u003eSuppl. Table\u0026nbsp;1\u003c/b\u003e). 
The most represented breeds were crossbreeds (16%), followed by Labrador Retrievers (11%) and Boxers (5%). The mean age of the study group was 5.81\u0026thinsp;\u0026plusmn;\u0026thinsp;3.60 years, and the mean body weight was 22.55\u0026thinsp;\u0026plusmn;\u0026thinsp;14.06 kg. MRI studies included in the dataset were acquired using 1.5 T (n\u0026thinsp;=\u0026thinsp;309) and 3.0 T (n\u0026thinsp;=\u0026thinsp;241) scanners from Philips (n\u0026thinsp;=\u0026thinsp;363), GE Medical Systems (n\u0026thinsp;=\u0026thinsp;123), and Siemens (n\u0026thinsp;=\u0026thinsp;64).\u003c/p\u003e\u003cp\u003e\u003cb\u003ePerformance.\u003c/b\u003e The performance of the CNN in classifying individual MRI slices as normal or abnormal was evaluated on the test set (n\u0026thinsp;=\u0026thinsp;53; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), comprising a total of 6,601 MRI slices. With the manual annotations as the reference standard, 73% (1,452/1,989) of abnormal slices and 74% (3,430/4,612) of normal slices were correctly classified, whereas 27% (537/1,989) of abnormal slices and 26% (1,182/4,612) of normal slices were misclassified (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eAt the slice level, the model achieved a precision of 0.86, recall of 0.74, and F1-score of 0.80 for normal slices, and a precision of 0.55, recall of 0.73, and F1-score of 0.63 for abnormal slices, with an overall accuracy of 0.74 (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 
2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePerformance metrics\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eClass\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003ePrecision\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eRecall\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eF1-score\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNormal\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.86\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.74\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.80\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAbnormal\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.55\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.73\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.63\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOverall accuracy\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.74\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003eFindings.\u003c/b\u003e The spatial distribution of classification errors (false positives and false negatives) was analysed slice-wise using heatmaps to visualize prediction patterns across all orientations in pre- and post-contrast T1w sequences after normalisation (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). In all three orientations (transverse, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea\u0026ndash;b; sagittal, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec\u0026ndash;d; dorsal, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee\u0026ndash;f), T1w images of the normal patient subset are characterized by a high number of true negatives across slices. False positives appear as scattered patches along the slice axes, predominantly in the central portions, and vary between individual patients. In the abnormal subset, true positives dominate the central portions of the slice axes across all orientations, forming dense horizontal bands, while false negatives occur more sporadically in the same regions. By contrast, false positives and true negatives tend to cluster near the periphery of the anatomical axes, particularly at the sequence extremes. In dorsal pre- and post-contrast T1w images (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee\u0026ndash;f), true positive clusters in abnormal patients are generally shorter and less continuous than in other orientations. 
Error patterns in this view appear more fragmented, which might suggest localized deviations and greater uncertainty in the dorsal plane. To investigate whether the prediction outcomes (true positive, true negative, false positive, false negative) varied systematically along the anatomical axes, each MRI sequence was normalised along the slice axis and divided into four equal segments (quartiles Q1\u0026ndash;Q4). The distribution of the prediction categories differed significantly across the quartiles in all sequences (\u003cb\u003eSuppl. Table\u0026nbsp;2\u003c/b\u003e).\u003c/p\u003e\u003cp\u003e\u003cb\u003eSlice-wise accuracy across T1w sequences.\u003c/b\u003e The relationship between CNN slice-wise accuracy and MRI sequence type was examined. As illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, slice-wise accuracy differed significantly between sequences (chi-square test, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026le;\u0026thinsp;0.001). 
Pre-contrast T1w sagittal images showed the highest slice-wise accuracy (81.8%), significantly outperforming all other sequences: pre-contrast T1w transverse (71.2%), post-contrast T1w transverse (73.3%), post-contrast T1w sagittal (75.1%), post-contrast T1w dorsal (76.1%), and pre-contrast T1w dorsal (69.7%).\u003c/p\u003e\u003cp\u003e\u003cb\u003eAssessment of the sensitivity and specificity of the CNN model at the patient level using receiver operating characteristic (ROC) analysis.\u003c/b\u003e Because the CNN's predictions were made on individual MRI slices, ROC analysis was used to evaluate the model's performance at the patient level and to determine the optimal threshold for classifying a patient as abnormal or normal, based on the proportion of that patient's slices that were predicted to be abnormal.\u003c/p\u003e\u003cp\u003eThe proportion of slices predicted to be abnormal in the normal (n\u0026thinsp;=\u0026thinsp;18) and abnormal (n\u0026thinsp;=\u0026thinsp;35) patient groups in the test set was assessed for normality using a Q\u0026ndash;Q plot (\u003cb\u003eSuppl. Figure\u0026nbsp;1\u003c/b\u003e) and Shapiro\u0026ndash;Wilk tests, which indicated significant deviations from normality in both groups (normal \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.027; abnormal \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.003).\u003c/p\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea shows that the proportion of slices predicted to be abnormal within each patient differs between the groups. 
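The patient-level decision rule described in this subsection can be sketched as follows. The sketch also computes the sensitivity, specificity and positive likelihood ratio used to compare thresholds. It illustrates the rule under stated assumptions and is not the authors' GraphPad Prism workflow. The data layout, the function names and the inclusive `>=` comparison at the threshold are all hypothetical. The sketch returns both plain accuracy and balanced accuracy (the mean of sensitivity and specificity), because with 35 abnormal and 18 normal test patients the two can differ noticeably. For example, 66% sensitivity and 94% specificity correspond to a plain accuracy of about 75% but a balanced accuracy of 80%.

```python
def abnormal_fraction(slice_predictions):
    """Fraction of a patient's slices predicted abnormal (1 = abnormal, 0 = normal)."""
    return sum(slice_predictions) / len(slice_predictions)


def patient_level_metrics(patients, threshold):
    """Classify patients from slice predictions and summarise the result.

    patients: list of (true_label, slice_predictions), true_label 1 = abnormal.
    A patient is called abnormal when its abnormal-slice fraction is >= threshold
    (the inclusive comparison is an assumption).
    """
    tp = tn = fp = fn = 0
    for true_label, preds in patients:
        called_abnormal = abnormal_fraction(preds) >= threshold
        if true_label == 1:
            if called_abnormal:
                tp += 1
            else:
                fn += 1
        elif called_abnormal:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # LR+ = sensitivity / (1 - specificity); infinite at 100% specificity
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "accuracy": (tp + tn) / len(patients),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

Evaluating the function with each observed abnormal-slice fraction as `threshold` yields the (1 − specificity, sensitivity) pairs that make up the ROC curve.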
The median proportion of slices predicted to be abnormal was significantly higher in the abnormal group (0.70) than in the normal group (0.26; Mann\u0026ndash;Whitney U test, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.0001).\u003c/p\u003e\u003cp\u003eThe ROC curve is presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb. The area under the curve (AUC) was 0.85 (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.0001), indicating good discrimination between normal and abnormal patients.\u003c/p\u003e\u003cp\u003eThe best balance between sensitivity and specificity was achieved when a patient was classified as abnormal at a threshold of 51% of slices predicted to be abnormal. This threshold resulted in a sensitivity of 83%, a specificity of 78%, and a positive likelihood ratio of 3.7, corresponding to the highest overall accuracy of 80%.\u003c/p\u003e\u003cp\u003eA more conservative threshold of 64% maximised specificity (94%), while sensitivity decreased to 66%. This threshold yielded the highest positive likelihood ratio (11.8), with an accuracy of 80%. Detailed performance metrics at each threshold are presented in \u003cb\u003eSuppl. Table\u0026nbsp;3\u003c/b\u003e.\u003c/p\u003e\u003cp\u003e\u003cb\u003eIdentification of Parameters Influencing CNN Model Performance Using ANCOVA.\u003c/b\u003e In the next step, a univariate ANCOVA was conducted to evaluate the influence of materials- and methods-related parameters on the accuracy of the CNN model. 
The categorical factors included were institute (TiHo, UC Davis, RVC, Glasgow), breed size (small, medium, large), skull conformation (brachycephalic vs non-brachycephalic), sex (male, female), and diagnosis (MRI-normal, neoplastic, inflammatory, other causes); the quantitative covariates were age, weight, number of slices, and number of available MRI sequences per case in the test set (\u003cb\u003eSuppl. Table\u0026nbsp;4\u003c/b\u003e). In addition, the interactions diagnosis \u0026times; institute, breed size \u0026times; weight, and sex \u0026times; weight were examined. As each university referral centre employed distinct imaging protocols, which differed in MRI settings, equipment, and acquisition techniques, the factor \u0026lsquo;institute\u0026rsquo; was treated as a composite factor representing the combined influence of these imaging-related variations on model accuracy. Scanner model and magnetic field strength were not included in the ANCOVA because of multicollinearity with the variable \u0026ldquo;institute\u0026rdquo;, as each institute in our dataset used its own scanners. Attempts to substitute \u0026ldquo;manufacturer\u0026rdquo; with \u0026ldquo;scanner model\u0026rdquo; did not resolve this issue, since each model was still exclusive to one institute. Therefore, \u0026ldquo;institute\u0026rdquo; was retained in the model to account for all scanner- and site-specific factors, allowing for interpretable results without redundancy.\u003c/p\u003e\u003cp\u003eThe corrected model was statistically significant (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, ηₚ\u0026sup2; = 0.80). Levene\u0026rsquo;s test rejected the assumption of variance homogeneity (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.021), likely due to the large number of factor levels relative to the total sample size (n\u0026thinsp;=\u0026thinsp;53). 
Since the Q\u0026ndash;Q plot (\u003cb\u003eSuppl. Figure\u0026nbsp;2\u003c/b\u003e) showed only minor deviations from normality, the model was retained and its results were interpreted with caution. Effect sizes were calculated using partial eta squared (ηₚ\u0026sup2;) and interpreted as small, medium, or large, as described in the Materials and Methods section (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea, the ANCOVA indicated that several parameters significantly influenced the accuracy of the CNN model. Main effects with large effect sizes were observed for breed size (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, ηₚ\u0026sup2; = 0.44), institute (p\u0026thinsp;=\u0026thinsp;0.001, ηₚ\u0026sup2; = 0.39), and diagnosis (p\u0026thinsp;=\u0026thinsp;0.003, ηₚ\u0026sup2; = 0.35). A medium effect size was identified for body weight (p\u0026thinsp;=\u0026thinsp;0.050, ηₚ\u0026sup2; = 0.12). Significant interaction effects with large effect sizes were found between diagnosis and institute (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, ηₚ\u0026sup2; = 0.45), as well as between breed size and weight (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, ηₚ\u0026sup2; = 0.43). In contrast, sex, the sex \u0026times; weight interaction, skull conformation, age, number of sequences per case, and number of slices per case had no significant effect on the CNN model\u0026rsquo;s accuracy (p\u0026thinsp;\u0026gt;\u0026thinsp;0.05, ηₚ\u0026sup2; ranging from 0.004 to 0.065).\u003c/p\u003e\u003cp\u003eFollowing the identification of the main factors influencing model performance through ANCOVA, a two-way ANOVA was conducted to further investigate the effects of institute and diagnosis on model accuracy (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eb). 
The analysis revealed significant main effects of institute (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001) and diagnosis (p\u0026thinsp;=\u0026thinsp;0.007), as well as a significant interaction (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001), indicating that the effect of diagnosis on model performance depended on the institute. Pairwise post hoc comparisons of the institutes revealed higher accuracy for TiHo data than for RVC and Glasgow data (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001 and p\u0026thinsp;=\u0026thinsp;0.001, respectively), as well as higher accuracy for UC Davis data than for Glasgow data (p\u0026thinsp;=\u0026thinsp;0.029). The two-way ANOVA should be interpreted with caution, since Levene\u0026rsquo;s test rejected the assumption of variance homogeneity (\u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.014). However, visual inspection of the Q\u0026ndash;Q plot of the standardised residuals indicated approximate normality (\u003cb\u003eSuppl. Figure\u0026nbsp;3\u003c/b\u003e), and Levene\u0026rsquo;s test did not reject the assumption of variance homogeneity for any of the post hoc one-way ANOVAs (TiHo p\u0026thinsp;=\u0026thinsp;0.199; UC Davis p\u0026thinsp;=\u0026thinsp;0.148; RVC p\u0026thinsp;=\u0026thinsp;0.813; Glasgow p\u0026thinsp;=\u0026thinsp;0.075). Within the TiHo subset, one-way ANOVA revealed a significant effect of diagnosis (p\u0026thinsp;=\u0026thinsp;0.028), with MRI-normal cases achieving significantly higher accuracy than inflammatory cases (Bonferroni-corrected p\u0026thinsp;=\u0026thinsp;0.031). A significant effect of diagnosis was also found in the RVC cohort (ANOVA, p\u0026thinsp;=\u0026thinsp;0.040); however, pairwise post hoc tests could not be calculated because of the small number of cases in the \u0026lsquo;other causes\u0026rsquo; group. 
In contrast, no significant effect of diagnosis on accuracy was detected in the UC Davis (ANOVA, p\u0026thinsp;=\u0026thinsp;0.055) or Glasgow subsets (ANOVA, p\u0026thinsp;=\u0026thinsp;0.143).\u003c/p\u003e\u003cp\u003eBecause of the significant interaction between weight and breed size identified in the ANCOVA, the effect of body weight on CNN classification accuracy was examined separately within each breed size category (small, medium, large) using linear regression (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ec; residual plot in \u003cb\u003eSuppl. Figure\u0026nbsp;4\u003c/b\u003e). In small breeds, no significant association was found between weight and accuracy (p\u0026thinsp;=\u0026thinsp;0.6006, R\u0026sup2; = 0.0165). Similarly, the association was not significant in medium breeds (p\u0026thinsp;=\u0026thinsp;0.3390, R\u0026sup2; = 0.0916). However, in large breeds, a significant negative association was observed (p\u0026thinsp;=\u0026thinsp;0.0186, R\u0026sup2; = 0.2471), indicating that higher body weight within large breeds was associated with reduced CNN accuracy.\u003c/p\u003e\u003cp\u003eIn addition to the ANCOVA, the individual effects of categorical factors on CNN accuracy were examined using one-factor tests (\u003cb\u003eSuppl. Figure\u0026nbsp;5\u003c/b\u003e). Institute significantly affected accuracy, with TiHo cases showing higher accuracy than RVC and Glasgow cases (Levene\u0026rsquo;s test, p\u0026thinsp;=\u0026thinsp;0.016; Kruskal-Wallis test, p\u0026thinsp;\u0026le;\u0026thinsp;0.001; \u003cb\u003eSuppl. Figure\u0026nbsp;5a\u003c/b\u003e). Accuracy was not significantly affected by manufacturer (Levene\u0026rsquo;s test, p\u0026thinsp;=\u0026thinsp;0.022; Kruskal-Wallis test, p\u0026thinsp;=\u0026thinsp;0.119; \u003cb\u003eSuppl. Figure\u0026nbsp;5c\u003c/b\u003e). 
Scanner model had a significant effect, with Achieva showing higher accuracy than Intera and Magnetom Essenza (Levene\u0026rsquo;s test, p\u0026thinsp;=\u0026thinsp;0.050; Kruskal-Wallis test, p\u0026thinsp;=\u0026thinsp;0.003; \u003cb\u003eSuppl. Figure\u0026nbsp;5e\u003c/b\u003e). Scanner field strength significantly influenced accuracy, with 3 T datasets showing higher accuracy than 1.5 T datasets (Levene\u0026rsquo;s test, p\u0026thinsp;=\u0026thinsp;0.052; independent t-test, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001; \u003cb\u003eSuppl. Figure\u0026nbsp;5g\u003c/b\u003e). Accuracy was not significantly affected by breed size (Levene\u0026rsquo;s test, p\u0026thinsp;=\u0026thinsp;0.131; ANOVA, p\u0026thinsp;=\u0026thinsp;0.400; \u003cb\u003eSuppl. Figure\u0026nbsp;5i\u003c/b\u003e), skull conformation (Levene\u0026rsquo;s test, p\u0026thinsp;=\u0026thinsp;0.288; independent t-test, p\u0026thinsp;=\u0026thinsp;0.374; \u003cb\u003eSuppl. Figure\u0026nbsp;5k\u003c/b\u003e), sex (Levene\u0026rsquo;s test, p\u0026thinsp;=\u0026thinsp;0.877; independent t-test, p\u0026thinsp;=\u0026thinsp;0.815; \u003cb\u003eSuppl. Figure\u0026nbsp;5m\u003c/b\u003e), or diagnosis (Levene\u0026rsquo;s test, p\u0026thinsp;=\u0026thinsp;0.001; Kruskal-Wallis test, p\u0026thinsp;=\u0026thinsp;0.848; \u003cb\u003eSuppl. Figure\u0026nbsp;5o\u003c/b\u003e).\u003c/p\u003e\u003cp\u003eSimple linear regression analyses were performed to assess whether CNN classification accuracy was individually influenced by age at MRI, weight at MRI, number of available sequences, and number of slices per case (\u003cb\u003eSuppl. Figure\u0026nbsp;6\u003c/b\u003e). The number of sequences demonstrated a positive linear relationship with accuracy (p\u0026thinsp;=\u0026thinsp;0.003, R\u0026sup2; = 0.162; \u003cb\u003eSuppl. Figure\u0026nbsp;6a\u003c/b\u003e). 
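The simple and stratified linear regressions in these results fit accuracy against a single covariate, either across all cases or separately within each breed-size group. As a minimal sketch (hypothetical function names, invented toy data), ordinary least squares yields the slope, intercept and R² as shown below. P-values require the t distribution and are omitted here; in practice `scipy.stats.linregress` returns the slope, intercept, r (whose square is R²) and a two-sided p-value for the same fit.

```python
from statistics import fmean


def simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared); R^2 is NaN when y is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


def regress_by_group(groups, x, y):
    """Fit a separate regression within each group label (e.g. breed size)."""
    results = {}
    for label in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == label]
        results[label] = simple_linear_regression([x[i] for i in idx],
                                                  [y[i] for i in idx])
    return results


if __name__ == "__main__":
    # Invented toy data: body weight (kg) vs. per-case accuracy in two groups.
    breed = ["small", "small", "small", "large", "large", "large"]
    weight = [4.0, 6.0, 8.0, 30.0, 40.0, 50.0]
    acc = [0.80, 0.78, 0.81, 0.85, 0.75, 0.65]
    for label, (slope, intercept, r2) in regress_by_group(breed, weight, acc).items():
        print(f"{label}: slope={slope:.4f}, R^2={r2:.3f}")
```

Fitting within each stratum, rather than adding a single weight term, is what allows a negative slope in one breed-size group to coexist with flat slopes in the others.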
Similarly, the number of slices was positively linearly correlated with accuracy (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, R\u0026sup2; = 0.222; \u003cb\u003eSuppl. Figure\u0026nbsp;6b\u003c/b\u003e). In contrast, no significant linear correlation was found for weight at MRI (p\u0026thinsp;=\u0026thinsp;0.839, R\u0026sup2; = 0.001; \u003cb\u003eSuppl. Figure\u0026nbsp;6e\u003c/b\u003e) or age at MRI (p\u0026thinsp;=\u0026thinsp;0.853, R\u0026sup2; = 0.001; \u003cb\u003eSuppl. Figure\u0026nbsp;6g\u003c/b\u003e).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eCNN models are emerging as promising diagnostic tools in medical imaging, though their application in veterinary medicine remains in its early stages. In this study, we evaluated a CNN model to classify canine brain pre- and post-contrast T1w MRI studies as normal or abnormal, aiming to enable computer-assisted detection of brain lesions and improve reporting efficiency in clinical practice. Beyond evaluating overall performance, we examined the influence of biological, technical, and institutional factors on model accuracy. The ANCOVA revealed that diagnosis, institutional setting, and breed size each had significant effects with large effect sizes, while body weight showed a medium effect. Significant interactions with large effect sizes were also observed between diagnosis and institute and between breed size and weight. The model achieved its highest accuracy on the T1w pre-contrast sagittal sequence. Moreover, a spatial pattern was observed along the three anatomical axes, with higher performance in central brain regions and more classification errors in peripheral slices. 
These findings underscore the multifactorial nature of CNN performance and highlight the importance of dataset diversity, anatomical context, and sequence selection in developing robust and generalisable AI models for veterinary neurology.\u003c/p\u003e\u003cp\u003eThe model was trained on 444 MRI datasets collected from four veterinary referral centres, including a variety of breeds, ages, body weights, diagnoses, and MRI scanner types. This multi-institutional dataset was designed to reflect real-world variability and to support the development of a model capable of generalising across diverse clinical settings. Previous studies exploring CNNs in veterinary imaging have provided valuable insights; however, they were limited by small cohorts or single-centre datasets [\u003cspan additionalcitationids=\"CR23 CR24 CR25 CR26 CR27\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. In contrast, our study uses a broader and more heterogeneous dataset that better represents the complexity of clinical practice.\u003c/p\u003e\u003cp\u003eThe customised CNN model presented in this study demonstrated a good ability to classify canine brain MRI slices as normal or abnormal. At the slice level, it achieved an overall accuracy of 74%, with comparable classification rates for normal (74%) and abnormal (73%) slices (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). As shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the model achieved a precision of 0.86 for normal slices, indicating that predictions of normality were generally reliable. In contrast, the lower precision of 0.55 for abnormal slices reflects a higher rate of false positives, that is, normal slices incorrectly classified as abnormal. 
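The per-class precision, recall and F1-score follow directly from confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The short sketch below (function names are illustrative, not taken from the study's code) shows these definitions and recomputes F1 from the precision and recall values reported for this model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(tp: int, fp: int, fn: int):
    """Precision, recall and F1 for one class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Reported slice-level precision and recall per class.
    for name, p, r in [("normal", 0.86, 0.74), ("abnormal", 0.55, 0.73)]:
        print(f"{name}: F1 = {f1_score(p, r):.2f}")
    # normal: F1 = 0.80
    # abnormal: F1 = 0.63
```

This reproduces the reported F1-scores to two decimals and shows why the abnormal-class F1 is lower despite similar recall: the harmonic mean is pulled towards the weaker input, here the abnormal-class precision of 0.55.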
Nonetheless, recall values were similar for both classes (0.74 for normal and 0.73 for abnormal), indicating that the model correctly identified roughly three-quarters of the true instances in each class. The corresponding F1-scores were 0.80 for normal and 0.63 for abnormal slices. The lower F1-score for abnormal slices, driven by their lower precision, may be attributed to the relatively limited dataset size compared with the complexity of the classification task, which may have constrained the model\u0026rsquo;s exposure to the full range of abnormal presentations [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]. Furthermore, the heterogeneous and often subtle nature of abnormalities may have contributed to the increased false positive rate [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eTo date, few studies have used CNN models to evaluate canine brain MRI. Banzato et al. reported high classification accuracies in CNN-based canine brain MRI studies, achieving up to 94% when distinguishing between meningiomas and gliomas [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] and correctly predicting meningioma grade in 80% of cases [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. The discrepancy in accuracy between our study and these previous applications of CNNs in veterinary neurology is likely attributable to differences in sample size and methodological design. Importantly, small sample sizes in neuroimaging studies employing machine learning have been shown to introduce substantial variability in performance estimates, thereby limiting the reliability of conclusions and the generalisability of findings [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]. 
These studies focused on well-defined tumour types and employed smaller and simpler datasets, which may have simplified the classification task. In contrast, our model was trained on a larger, multi-institutional dataset that included a broader spectrum of diagnoses and greater variability in breed, MRI scanner type, and acquisition protocols. Hence, based on the ANCOVA results, the intentional inclusion of real-world variability in our experimental design is likely the primary factor contributing to the lower accuracy observed compared with the previous studies. Our analysis indicates that the wide variety of diagnoses and the differences in institutional conditions were among the factors with the largest influence on the performance of the CNN model. Training on low-variability data sourced from a single institution carries the risk of poor reproducibility, as models may end up capturing dataset-specific artefacts rather than learning generalisable features [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. Zhang et al. (2016) showed that deep neural networks are capable of fitting random labels or noise, emphasising their potential to memorise data without extracting meaningful patterns [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. Consequently, models that demonstrate strong performance under narrowly controlled research settings often fail to translate to real-world scenarios, highlighting the need for large, heterogeneous datasets and thorough external validation to ensure robust generalisation. 
Ultimately, while this heterogeneity may result in numerically lower performance than in the previous studies, incorporating real-life variability is essential for the development of practical applications, as it provides a more accurate reflection of clinical diversity [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eThe ANCOVA indicated that CNN classification performance was influenced by several biological, technical, and institutional factors, which may need to be considered as confounding factors in future studies (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). Our results show that imaging-related differences (MRI protocols, manufacturer, scanner model, magnetic field strength, and acquisition techniques, summarised as \u0026lsquo;institute\u0026rsquo;), along with the range of morphological changes represented by \u0026lsquo;diagnosis\u0026rsquo;, are the major factors affecting CNN model performance. The CNN model performed best on cases from TiHo and worst on cases from Glasgow, which may reflect the different numbers of cases contributed by each institute to the training dataset. Lower accuracy for a particular institute or diagnostic category indicates suboptimal performance on that subset, but does not imply that these cases are inherently more challenging for the CNN model. This finding aligns with previous studies that underscore the susceptibility of CNN-based models to domain shifts in medical imaging [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. 
Variations in patient demographics, imaging protocols, and institutional practices present significant challenges for cross-site generalisation, ultimately affecting the model\u0026rsquo;s ability to perform consistently across different clinical environments [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eA critical comparison of the ANCOVA results with the one-factor analyses shows that, although several factors individually influence accuracy, their effects appear small relative to those of the main factors (institutional setting, diagnostic category, breed size, and weight) once the full experimental design is considered. Since CNN-based models learn patterns from the data they are exposed to, it is crucial to train them on diverse datasets from multiple institutes and a variety of diseases, so that the model can distinguish between classes across the range of variation encountered in the test set [\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e, \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]. Differences in patient populations, imaging methods, and institutional protocols can lead to substantial domain shifts, which hinder the generalisability of CNN-based models across diverse clinical settings [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. 
To address this, training on diverse, multi-centre datasets that incorporate a wide range of pathologies is essential to achieve reliable model performance in independent, real-world environments [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e, \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn this study, a slice-by-slice approach was employed for CNN training, so predictions were made on individual MRI slices. To interpret these predictions at the patient level, we explored the optimal threshold for classifying a patient as abnormal or normal based on the proportion of the patient\u0026rsquo;s slices that were predicted to be abnormal. The results indicated that there is no single optimal threshold; rather, two thresholds (a balanced threshold of 51% and a more conservative threshold of 64%) may be applicable depending on the clinical context. For triage applications, where the model serves as a secondary reader or \u0026ldquo;second pair of eyes,\u0026rdquo; the more sensitive threshold of 51% may be preferable to minimise the risk of missing potential abnormalities. In contrast, where high specificity is critical, for example to avoid unnecessary referrals or follow-ups, the 64% threshold may be more appropriate. Thus, the classification threshold should be adjusted according to the specific clinical objective.\u003c/p\u003e\u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, a pattern is observed in the distribution of prediction categories (true positive, true negative, false positive, false negative) along the anatomical axes of the slices within all sequences. This pattern is supported by statistical analysis, which revealed that the CNN model\u0026rsquo;s performance varied significantly across quartiles (Q1\u0026ndash;Q4) of slice position within each MRI sequence. 
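The patient-level rule and the slice-position analysis described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: a patient is labelled abnormal when the fraction of its slices predicted abnormal is at least the chosen threshold (0.51 or 0.64, corresponding to the 51% and 64% thresholds; whether ties count as abnormal is an assumption), and each slice is assigned to a quartile Q1 to Q4 from its 0-based index within its sequence. All function names are hypothetical.

```python
from collections import defaultdict


def classify_patient(slice_predictions, threshold=0.51):
    """Label a patient 'abnormal' if the share of abnormal slices >= threshold.

    Assumption: a fraction exactly equal to the threshold counts as abnormal.
    """
    if not slice_predictions:
        raise ValueError("no slice predictions for this patient")
    n_abnormal = sum(1 for p in slice_predictions if p == "abnormal")
    fraction = n_abnormal / len(slice_predictions)
    return "abnormal" if fraction >= threshold else "normal"


def slice_quartile(index, n_slices):
    """Map a 0-based slice index to quartile 1..4 along the acquisition axis."""
    if not 0 <= index < n_slices:
        raise ValueError("index out of range")
    return 1 + (4 * index) // n_slices


def accuracy_by_quartile(records):
    """records: iterable of (index, n_slices, correct) tuples -> {quartile: accuracy}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for index, n_slices, correct in records:
        q = slice_quartile(index, n_slices)
        totals[q] += 1
        hits[q] += int(correct)
    return {q: hits[q] / totals[q] for q in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical patient: 6 of 10 slices predicted abnormal (60 %).
    preds = ["abnormal"] * 6 + ["normal"] * 4
    print(classify_patient(preds, threshold=0.51))  # -> abnormal (balanced)
    print(classify_patient(preds, threshold=0.64))  # -> normal (conservative)
```

The example patient illustrates the trade-off discussed above: the same slice-level output is called abnormal under the sensitive 51% threshold but normal under the more specific 64% threshold.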
The true positive category is most prominent in the central region of the slice axes for abnormal patients, suggesting that the model\u0026rsquo;s predictions are more stable in these mid-slice regions. One possible explanation is the greater anatomical homogeneity of central brain structures across patients, which may allow the model to learn more generalisable and discriminative features. Furthermore, central slices contain a larger volume of brain tissue, providing richer spatial context, which may enhance model confidence and prediction accuracy.\u003c/p\u003e\u003cp\u003eIn contrast, classification errors are more frequent near the extremes of the slice axes. This may be due to reduced structural information, increased anatomical variability, inconsistent signal characteristics, or a combination of biological and imaging-related factors. Peripheral slices typically contain less brain tissue and a greater proportion of non-neural structures, such as bone, cerebrospinal fluid, or air-filled spaces. These regions are more prone to partial volume effects and imaging artefacts, both of which can degrade feature quality and reduce the model\u0026rsquo;s predictive reliability.\u003c/p\u003e\u003cp\u003eBased on these findings, we conclude that the relative position of each slice along the anatomical axes influences prediction consistency, with this effect varying by sequence and orientation. One way of handling such boundary effects was described by Kamnitsas et al. (2017), who suggested excluding peripheral slices from training because of their limited diagnostic value; in their multi-scale 3D CNN for brain lesion segmentation, they excluded outer slices, citing poor information quality at the volume boundaries [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]. Furthermore, a robust skull-stripping step prior to training, as described by Nour Eddin et al. (2023), could reduce the influence of non-brain structures [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]. However, excluding peripheral slices should be approached with caution, as these slices are routinely reviewed in clinical practice; retaining them in model training might enhance robustness and better reflect the diagnostic process.\u003c/p\u003e\u003cp\u003eIt is worth noting that, in our study, body weight was negatively associated with classification accuracy in large breeds, a pattern not observed in small or medium breeds. This contrasts with findings in human neonatal MRI studies, where smaller brain volume in infants has been associated with reduced image quality and diagnostic performance, largely due to lower tissue contrast, increased partial volume effects, and technical limitations [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eWhile CNNs do not interpret images in the same way as human experts, they identify and weight statistical patterns in the input data that are associated with specific outputs (e.g., normal vs abnormal), without possessing any semantic understanding of anatomical structures such as a \u0026ldquo;lesion\u0026rdquo; or \u0026ldquo;brain\u0026rdquo; [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]. As a result, misclassifications in peripheral slices or in large-breed dogs may arise from non-neural structures, signal artefacts, or anatomical variability that resembles features the model has learned to associate with abnormal cases. Similar findings have been reported in human imaging studies, where CNNs were shown to rely on spurious correlations rather than pathology-specific features [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]. 
Explainability tools such as Gradient-weighted Class Activation Mapping (Grad-CAM), which uses the gradients of a target class flowing into the final convolutional layer to produce a coarse localisation map, may help determine whether the CNN is attending to biologically relevant brain regions or is being misled by non-neural features such as bone, cerebrospinal fluid, or imaging artefacts in peripheral slices [\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eSimple regression analyses showed that both the number of slices and the number of available sequences per case were positively associated with accuracy, suggesting that more imaging data per case may support better classification performance. The model may benefit from access to more data per patient, which captures additional spatial context and subtle abnormalities [\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eAnother finding of this study was that the T1w pre-contrast sagittal sequence demonstrated significantly higher classification accuracy (81.8%) than the other T1w sequences. This may be because the sagittal plane provides a clearer view of midline structures, particularly the ventricular system, which can exhibit subtle morphological changes associated with underlying pathology. In human medicine, similar observations were reported by Grigas et al. (2025) in a study distinguishing individuals with mild cognitive impairment from cognitively normal controls: sagittal T1w slices provided the highest diagnostic performance, followed by axial and coronal planes [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]. 
This was likely due to the presence of informative features in midline structures such as the corpus callosum, thalamus, and lateral ventricles, which are known to undergo structural and metabolic alterations in the early stages of cognitive decline [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eTaken together, the factors discussed above should be regarded as potential confounders and acknowledged when designing, interpreting, or applying CNN models in a clinical context.\u003c/p\u003e\u003cp\u003e\u003cb\u003eLimitations.\u003c/b\u003e This study has several limitations, and key challenges remain to be addressed in future work. A primary limitation is the relatively small number of cases included. This resulted from our aim to include only cases with a very high level of diagnostic certainty; for example, all brain neoplasms included were histopathologically confirmed. This criterion substantially limited the size of the dataset, even though the data were drawn from large academic institutions. Although the training set was heterogeneous and sourced from multiple institutions, the total of 444 cases may be insufficient for training a complex deep learning model, potentially limiting the model\u0026rsquo;s ability to generalise across the full spectrum of pathological presentations observed in clinical practice [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]. Nonetheless, the performance achieved can be considered acceptable given the size and diversity of the dataset, providing a valuable proof of concept for future research.\u003c/p\u003e\u003cp\u003eIn veterinary clinical practice, it is uncommon for completely healthy dogs without clinical signs to undergo brain MRI, making the acquisition of true normal controls challenging. 
As a result, the control group in this study included patients with Tier II idiopathic epilepsy, paroxysmal dyskinesia, and extracranial causes, rather than dogs with entirely normal neurological status. Although these dogs showed no structural brain abnormalities on MRI, the presence of underlying neurological disease could introduce subtle changes not detectable by expert visual assessment, potentially influencing model training and performance. Future studies would benefit from a broader and more representative control population to further enhance model reliability and generalisability.\u003c/p\u003e\u003cp\u003eAnother key limitation of this study is the exclusive use of T1w pre- and post-contrast images for model development. While T1w imaging is effective for detecting mass lesions and abnormalities associated with blood-brain barrier disruption, it is less sensitive to other important pathological changes. Conditions such as oedema, demyelination, mild inflammatory processes, early ischaemic changes, and subtle structural abnormalities are often better visualised on T2w, FLAIR, or diffusion-weighted sequences [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e]. Consequently, some abnormalities, particularly those without marked contrast enhancement, may have been underrepresented or missed during model training. The next phase of this study will focus on incorporating additional MRI sequences to enable detection of a wider range of brain pathologies.\u003c/p\u003e\u003cp\u003eWhile a direct comparison with veterinary radiologists was beyond the scope of this study, such a comparison will be essential for the future validation and clinical integration of these models in real-world scenarios. Evaluating the model\u0026rsquo;s performance relative to expert clinicians is a critical next step to determine whether it can match or complement human expertise. 
These comparisons are vital for defining the model\u0026rsquo;s potential role in clinical practice, whether as a decision support tool, a triage system, or an aid for less experienced clinicians.\u003c/p\u003e\u003cp\u003eIn this study, slices with noise, motion artefacts, and incomplete imaging were removed during dataset preparation to ensure clean inputs for model training. However, in clinical practice, MRI studies submitted for automated analysis often contain motion artefacts, suboptimal image quality, or missing sequences. As a result, models developed under controlled research conditions may perform less reliably when applied to real-world clinical data. Future work should focus on enhancing model robustness to such variability by incorporating artefact detection modules, implementing quality control procedures, and training on more heterogeneous, imperfect datasets that better reflect clinical conditions.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eDespite the limitations mentioned, the present CNN model provides a foundation for further exploration of CNN-based methods in canine brain MRI classification. This study also highlights the value of using real-world, multi-institutional data with a broad range of diagnoses to train CNN-based models for clinical settings. Moreover, distinct patterns in classification performance along the anatomical axes highlight the need to use explainability tools to verify whether the CNN is focusing on biologically relevant brain regions. In the medical domain, such tools are essential to build trust, ensure clinical relevance, and identify potential biases in model predictions. The findings underscore the potential of deep learning to support diagnostic workflows in veterinary neuroimaging. 
Future work should prioritise collaborative data sharing, the use of larger and more balanced datasets, and rigorous external validation to enhance model generalisability and facilitate clinical integration.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eCompeting interests\u003c/h2\u003e\u003cp\u003eAuthors J. Janisch, A. Hansch, and M. Laue are employed by DOS Software-Systeme GmbH, a commercial developer of AI solutions. The employer, DOS Software-Systeme GmbH, had no role in the interpretation of the data or in the conclusions drawn from this study. S. Abani, J. Nessler, A. Hansch, and M. Laue were funded by the \u0026ldquo;Central Innovation Program for Small- and Medium-Sized Enterprises\u0026rdquo; of the German Federal Ministry for Economic Affairs and Climate Action (grant no. KK5066602LB). Additionally, we acknowledge financial support from the Open Access Publication Fund of the University of Veterinary Medicine Hannover, Foundation. Neither funder had any influence on the study design, data collection and analysis, or the conclusions drawn in this paper. The remaining authors declare no competing interests.\u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e\u003cp\u003eThis work was funded by the Central Innovation Program for small- and medium-sized enterprises of the German Federal Ministry for Economic Affairs and Climate Action - grant number KK5066602LB. Open access publishing was supported by the Open Access Publication Fund of the University of Veterinary Medicine Hannover, Foundation.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eH.V. supervised this study. H.V., S.A. and J.N. designed the experiments. P.J.D., S.D., R.G.Q., and E.M. contributed to data curation by providing raw MRI data and patient information. S.A. and A.S. annotated the cases. S.A. and F.S. extracted the patient data. S.A., S.T., S.GH., and R.U. analysed the data. S.A. 
wrote the first draft of the manuscript. M.L., A.H. and J.J. developed the CNN model. All authors reviewed the manuscript and approved the final version.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors acknowledge the radiologists and pathologists at the University of Veterinary Medicine Hannover, Hannover, Germany; the Royal Veterinary College, London, United Kingdom; the School of Veterinary Medicine, University of California, Davis, California, USA; and the Small Animal Hospital, School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, United Kingdom, for providing access to and permission to use the radiology and necropsy reports for this study. We also gratefully acknowledge L. B\u0026ouml;hringer and L. Lemke for assistance with patient selection, and G. Lester, P. Hallur, D. Sanchez-Masian, and A. Wang Leonardo for their valuable consultation during the course of this study.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated during the current study are available from the corresponding author upon reasonable request.\u003c/p\u003e\u003cp\u003eNo animals were directly involved in this study, as all data were retrospectively collected from clinical cases that had previously undergone brain MRI for diagnostic purposes. All procedures were performed as part of routine clinical care at four university veterinary referral centres: the Department of Small Animal Medicine and Surgery at the University of Veterinary Medicine Hannover (Germany), the William R. Pritchard Veterinary Medical Teaching Hospital at the University of California, Davis (USA), the Small Animal Referral Hospital at the Royal Veterinary College in London (UK), and the Small Animal Hospital of the University of Glasgow (UK). 
Approval for the use of medical data was obtained through informed consent from each owner prior to hospital admission, in accordance with each university\u0026rsquo;s institutional guidelines. The acquisition of MRI data was supervised by veterinary technicians from the diagnostic imaging units of the referral centres, in compliance with their respective animal welfare policies. All patients were imaged under general anaesthesia.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eGavin, P. R. Growth of clinical veterinary magnetic resonance imaging. \u003cem\u003eVet. Radiol. Ultrasound\u003c/em\u003e \u003cb\u003e52\u003c/b\u003e, S2\u0026ndash;S4. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1740-8261.2010.01779.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1740-8261.2010.01779.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2011).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVickram, A. S., Infant, S. S., Priyanka \u0026amp; Chopra, H. AI-powered techniques in anatomical imaging: impacts on veterinary diagnostics and surgery. \u003cem\u003eAnn. Anat.\u003c/em\u003e \u003cb\u003e258\u003c/b\u003e, 152355. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.aanat.2024.152355\u003c/span\u003e\u003cspan address=\"10.1016/j.aanat.2024.152355\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWerring, D. J. et al. The pathogenesis of lesions and normal-appearing white matter changes in multiple sclerosis: a serial diffusion MRI study. \u003cem\u003eBrain\u003c/em\u003e \u003cb\u003e123\u003c/b\u003e, 1667\u0026ndash;1676.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/brain/123.8.1667\u003c/span\u003e\u003cspan address=\"10.1093/brain/123.8.1667\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2000).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThe Royal College of Radiologists. Clinical radiology workforce census report 2023. (2024). Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.rcr.ac.uk/media/4imb5jge/_rcr-2024-clinical-radiology-workforce-census-report.pdf\u003c/span\u003e\u003cspan address=\"https://www.rcr.ac.uk/media/4imb5jge/_rcr-2024-clinical-radiology-workforce-census-report.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDick White Referrals. Addressing the shortage of veterinary imaging specialists. (2020). Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.dickwhitereferrals.com/news/addressing-the-shortage-of-veterinary-imaging-specialists/\u003c/span\u003e\u003cspan address=\"https://www.dickwhitereferrals.com/news/addressing-the-shortage-of-veterinary-imaging-specialists/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEsteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. \u003cem\u003eNature\u003c/em\u003e \u003cb\u003e542\u003c/b\u003e, 115\u0026ndash;118. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/nature21056\u003c/span\u003e\u003cspan address=\"10.1038/nature21056\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2017).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAvendi, M., Kheradvar, A. \u0026amp; Jafarkhani, H. 
A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. \u003cem\u003eMed. Image Anal.\u003c/em\u003e \u003cb\u003e30\u003c/b\u003e, 108\u0026ndash;119. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.media.2016.01.005\u003c/span\u003e\u003cspan address=\"10.1016/j.media.2016.01.005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2016).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGarcea, F., Serra, A., Lamberti, F. \u0026amp; Morra, L. Data augmentation for medical imaging: a systematic literature review. \u003cem\u003eComput. Biol. Med.\u003c/em\u003e \u003cb\u003e152\u003c/b\u003e, 106391. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.compbiomed.2022.106391\u003c/span\u003e\u003cspan address=\"10.1016/j.compbiomed.2022.106391\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBohmrah, M. K. \u0026amp; Kaur, H. Advanced hybridization and optimization of DNNs for medical imaging: a survey on disease detection techniques. \u003cem\u003eArtif. Intell. Rev.\u003c/em\u003e \u003cb\u003e58\u003c/b\u003e, 122. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10462-024-11049-x\u003c/span\u003e\u003cspan address=\"10.1007/s10462-024-11049-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMienye, I. D., Swart, T. G., Obaido, G., Jordan, M. \u0026amp; Ilono, P. Deep convolutional neural networks in medical image analysis: a review. \u003cem\u003eInformation\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 195. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/info16030195\u003c/span\u003e\u003cspan address=\"10.3390/info16030195\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYu, H., Yang, L. T., Zhang, Q., Armstrong, D. \u0026amp; Deen, M. J. Convolutional neural networks for medical image analysis: state-of-the-art, comparisons, improvement and perspectives. \u003cem\u003eNeurocomputing\u003c/em\u003e \u003cb\u003e444\u003c/b\u003e, 92\u0026ndash;110. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.neucom.2020.04.157\u003c/span\u003e\u003cspan address=\"10.1016/j.neucom.2020.04.157\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiang, G. \u0026amp; Zheng, L. A transfer learning method with deep residual network for pediatric pneumonia diagnosis. \u003cem\u003eComput. Methods Programs Biomed.\u003c/em\u003e \u003cb\u003e187\u003c/b\u003e, 104964. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cmpb.2019.06.023\u003c/span\u003e\u003cspan address=\"10.1016/j.cmpb.2019.06.023\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNahiduzzaman, M., Islam, M. R. \u0026amp; Hassan, R. ChestX-ray6: prediction of multiple diseases including COVID-19 from chest X-ray images using convolutional neural network. \u003cem\u003eExpert Syst. Appl.\u003c/em\u003e \u003cb\u003e211\u003c/b\u003e, 118576. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.eswa.2022.118576\u003c/span\u003e\u003cspan address=\"10.1016/j.eswa.2022.118576\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi, L., Xu, M., Wang, X., Jiang, L. \u0026amp; Liu, H. Attention based glaucoma detection: a large-scale database and CNN model. \u003cem\u003eProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)\u003c/em\u003e, 10563\u0026ndash;10572. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/CVPR.2019.01082\u003c/span\u003e\u003cspan address=\"10.1109/CVPR.2019.01082\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEltoukhy, M. M., Hosny, K. M. \u0026amp; Kassem, M. A. Classification of multiclass histopathological breast images using residual deep learning. \u003cem\u003eComput. Intell. Neurosci.\u003c/em\u003e 9086060. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1155/2022/9086060\u003c/span\u003e\u003cspan address=\"10.1155/2022/9086060\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKhened, M., Kollerathu, V. A. \u0026amp; Krishnamurthi, G. Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. \u003cem\u003eMed. Image Anal.\u003c/em\u003e \u003cb\u003e51\u003c/b\u003e, 21\u0026ndash;45.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.media.2018.10.004\u003c/span\u003e\u003cspan address=\"10.1016/j.media.2018.10.004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAl-masni, M. A., Kim, D. H. \u0026amp; Kim, T. S. Multiple skin lesions diagnostics via integrated deep convolutional networks for segmentation and classification. \u003cem\u003eComput. Methods Programs Biomed.\u003c/em\u003e \u003cb\u003e190\u003c/b\u003e, 105351. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cmpb.2020.105351\u003c/span\u003e\u003cspan address=\"10.1016/j.cmpb.2020.105351\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLee, S. M. \u0026amp; Kim, N. Deep learning model ensemble for the accuracy of classification degenerative arthritis. \u003cem\u003eComput. Mater. Contin.\u003c/em\u003e \u003cb\u003e75\u003c/b\u003e, 1981\u0026ndash;1994. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.32604/cmc.2023.035245\u003c/span\u003e\u003cspan address=\"10.32604/cmc.2023.035245\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSarwinda, D., Paradisa, R. H., Bustamam, A. \u0026amp; Anggia, P. Deep learning in image classification using residual network (ResNet) variants for detection of colorectal cancer. \u003cem\u003eProcedia Comput. Sci.\u003c/em\u003e \u003cb\u003e179\u003c/b\u003e, 423\u0026ndash;431.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.procs.2021.01.025\u003c/span\u003e\u003cspan address=\"10.1016/j.procs.2021.01.025\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXiao, S. et al. Review of applications of deep learning in veterinary diagnostics and animal health. \u003cem\u003eFront. Vet. Sci.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 1511522. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fvets.2025.1511522\u003c/span\u003e\u003cspan address=\"10.3389/fvets.2025.1511522\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEzanno, P. et al. Research perspectives on animal health in the era of artificial intelligence. \u003cem\u003eVet. Res.\u003c/em\u003e \u003cb\u003e52\u003c/b\u003e, 40. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s13567-021-00902-4\u003c/span\u003e\u003cspan address=\"10.1186/s13567-021-00902-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBanzato, T. et al. Use of transfer learning to detect diffuse degenerative hepatic diseases from ultrasound images in dogs: a methodological study. \u003cem\u003eVet. J.\u003c/em\u003e \u003cb\u003e233\u003c/b\u003e, 35\u0026ndash;40. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.tvjl.2017.12.026\u003c/span\u003e\u003cspan address=\"10.1016/j.tvjl.2017.12.026\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBanzato, T., Bernardini, M., Cherubini, G. \u0026amp; Zotti, A. 
A methodological approach for deep learning to distinguish between meningiomas and gliomas on canine MR images. \u003cem\u003eBMC Vet. Res.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 317. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12917-018-1638-2\u003c/span\u003e\u003cspan address=\"10.1186/s12917-018-1638-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYoon, Y., Hwang, T. \u0026amp; Lee, H. Prediction of radiographic abnormalities by the use of bag-of-features and convolutional neural networks. \u003cem\u003eVet. J.\u003c/em\u003e \u003cb\u003e237\u003c/b\u003e, 43\u0026ndash;48. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.tvjl.2018.05.009\u003c/span\u003e\u003cspan address=\"10.1016/j.tvjl.2018.05.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBanzato, T., Cherubini, G. B., Atzori, M. \u0026amp; Zotti, A. Development of a deep convolutional neural network to predict grading of canine meningiomas from magnetic resonance images. \u003cem\u003eVet. J.\u003c/em\u003e \u003cb\u003e235\u003c/b\u003e, 90\u0026ndash;92. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.tvjl.2018.04.001\u003c/span\u003e\u003cspan address=\"10.1016/j.tvjl.2018.04.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDumortier, L., Gu\u0026eacute;pin, F., Delignette-Muller, M. L., Boulocher, C. \u0026amp; Grenier, T. Deep learning in veterinary medicine, an approach based on CNN to detect pulmonary abnormalities from lateral thoracic radiographs in cats. \u003cem\u003eSci. 
Rep.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 11418. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-022-14993-2\u003c/span\u003e\u003cspan address=\"10.1038/s41598-022-14993-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVinicki, K., Ferrari, P., Belic, M. \u0026amp; Turk, R. Using convolutional neural networks for determining reticulocyte percentage in cats. \u003cem\u003earXiv preprint\u003c/em\u003e arXiv:1803.04873. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.1803.04873\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.1803.04873\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim, H. J., Lee, S. H., Kim, H. J. \u0026amp; Lee, S. H. CNN-based diagnosis models for canine ulcerative keratitis. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e, 1\u0026ndash;10. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-019-50437-0\u003c/span\u003e\u003cspan address=\"10.1038/s41598-019-50437-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePereira, A. et al. Artificial intelligence in veterinary imaging: an overview. \u003cem\u003eVet. Sci.\u003c/em\u003e \u003cb\u003e10\u003c/b\u003e, 320. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/vetsci10050320\u003c/span\u003e\u003cspan address=\"10.3390/vetsci10050320\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNiemeyer, F. et al.
Automatic grading of intervertebral disc degeneration in lumbar dog spines. \u003cem\u003eJOR Spine\u003c/em\u003e \u003cb\u003e7\u003c/b\u003e, e1326. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/jsp2.1326\u003c/span\u003e\u003cspan address=\"10.1002/jsp2.1326\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBiercher, A. et al. Using deep learning to detect spinal cord diseases on thoracolumbar magnetic resonance images of dogs. \u003cem\u003eFront. Vet. Sci.\u003c/em\u003e \u003cb\u003e8\u003c/b\u003e, 721167. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fvets.2021.721167\u003c/span\u003e\u003cspan address=\"10.3389/fvets.2021.721167\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKaczmarska, A. et al. Postencephalitic epilepsy in dogs with meningoencephalitis of unknown origin: clinical features, risk factors, and long-term outcome. \u003cem\u003eJ. Vet. Intern. Med.\u003c/em\u003e \u003cb\u003e34\u003c/b\u003e, 808\u0026ndash;820. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/jvim.15687\u003c/span\u003e\u003cspan address=\"10.1111/jvim.15687\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchwartz, M., Lamb, C. R., Brodbelt, D. C. \u0026amp; Volk, H. A. Canine intracranial neoplasia: clinical risk factors for development of epileptic seizures. \u003cem\u003eJ. Small Anim. Pract.\u003c/em\u003e \u003cb\u003e52\u003c/b\u003e, 632\u0026ndash;637.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1748-5827.2011.01131.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1748-5827.2011.01131.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2011).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCerda-Gonzalez, S. et al. International veterinary canine dyskinesia task force ECVIM consensus statement: terminology and classification. \u003cem\u003eJ. Vet. Intern. Med.\u003c/em\u003e \u003cb\u003e35\u003c/b\u003e, 1218\u0026ndash;1230. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/jvim.16108\u003c/span\u003e\u003cspan address=\"10.1111/jvim.16108\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDe Risio, L. et al. International veterinary epilepsy task force consensus proposal: diagnostic approach to epilepsy in dogs. \u003cem\u003eBMC Vet. Res.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e, 148. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12917-015-0462-1\u003c/span\u003e\u003cspan address=\"10.1186/s12917-015-0462-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2015).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePython Software Foundation. Python, version 3.11. \u003cem\u003eWilmington, DE, USA\u003c/em\u003e; Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.python.org/downloads/release/python-3110/\u003c/span\u003e\u003cspan address=\"https://www.python.org/downloads/release/python-3110/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMedixant. \u003cem\u003eRadiAnt DICOM Viewer\u003c/em\u003e, version 4.0.1.
\u003cem\u003ePoznań, Poland\u003c/em\u003e; Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.radiantviewer.com\u003c/span\u003e\u003cspan address=\"https://www.radiantviewer.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePixmeo SARL. \u003cem\u003eOsiriX MD\u003c/em\u003e. FDA-cleared and CE-labeled DICOM viewer for macOS. \u003cem\u003eGeneva, Switzerland\u003c/em\u003e; Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.osirix-viewer.com/osirix/osirix-md/\u003c/span\u003e\u003cspan address=\"https://www.osirix-viewer.com/osirix/osirix-md/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eV7 Labs. \u003cem\u003eV7 Darwin: AI Data Labeling Platform\u003c/em\u003e. \u003cem\u003eLondon, UK\u003c/em\u003e; (2025). Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.v7labs.com\u003c/span\u003e\u003cspan address=\"https://www.v7labs.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThe MathWorks Inc. \u003cem\u003eStatistics and Machine Learning Toolbox\u003c/em\u003e, version R2022b. \u003cem\u003eNatick, MA, USA\u003c/em\u003e; Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.mathworks.com/products/statistics.html\u003c/span\u003e\u003cspan address=\"https://www.mathworks.com/products/statistics.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHe, K., Zhang, X., Ren, S. \u0026amp; Sun, J. Deep residual learning for image recognition. \u003cem\u003eProc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR)\u003c/em\u003e, 770\u0026ndash;778. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/CVPR.2016.90\u003c/span\u003e\u003cspan address=\"10.1109/CVPR.2016.90\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2016).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eIandola, F. N. et al. DenseNet: implementing efficient ConvNet descriptor pyramids. \u003cem\u003earXiv preprint\u003c/em\u003e arXiv:1404.1869. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1404.1869\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1404.1869\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2014).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChollet, F. Xception: deep learning with depthwise separable convolutions. In \u003cem\u003eProc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)\u003c/em\u003e, 1800\u0026ndash;1807 (Honolulu, HI, USA, 21\u0026ndash;26 July 2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/CVPR.2017.195\u003c/span\u003e\u003cspan address=\"10.1109/CVPR.2017.195\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHoward, A. G. et al. MobileNets: efficient convolutional neural networks for mobile vision applications. \u003cem\u003earXiv preprint\u003c/em\u003e arXiv:1704.04861 (2017).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSzegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. \u0026amp; Wojna, Z. Rethinking the inception architecture for computer vision. In \u003cem\u003eProc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)\u003c/em\u003e, 2818\u0026ndash;2826 (Las Vegas, NV, USA, 27\u0026ndash;30 June 2016).
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/CVPR.2016.308\u003c/span\u003e\u003cspan address=\"10.1109/CVPR.2016.308\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGraphPad Software. \u003cem\u003eGraphPad Prism\u003c/em\u003e, Prism 8 for Windows, version 8.4.3. \u003cem\u003eSan Diego, CA, USA\u003c/em\u003e; Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.graphpad.com\u003c/span\u003e\u003cspan address=\"https://www.graphpad.com\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eIBM Corp. \u003cem\u003eIBM SPSS Statistics for Windows\u003c/em\u003e, version 29.0.1.1. \u003cem\u003eArmonk, NY, USA\u003c/em\u003e; Available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ibm.com/products/spss-statistics\u003c/span\u003e\u003cspan address=\"https://www.ibm.com/products/spss-statistics\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGravetter, F. J. \u0026amp; Wallnau, L. B. \u003cem\u003eStatistics for the Behavioral Sciences\u003c/em\u003e 10th edn (Cengage Learning, 2016).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHoffmann, G., Lichtinghagen, R. \u0026amp; Wosniok, W. Ein einfaches Verfahren zur Sch\u0026auml;tzung von Referenzintervallen aus routinem\u0026auml;\u0026szlig;ig erhobenen Labordaten [A simple procedure for estimating reference intervals from routinely collected laboratory data]. \u003cem\u003eLab.
Med.\u003c/em\u003e \u003cb\u003e39\u003c/b\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1515/labmed-2015-0082\u003c/span\u003e\u003cspan address=\"10.1515/labmed-2015-0082\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2015).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBuda, M., Maki, A. \u0026amp; Mazurowski, M. A. A systematic study of the class imbalance problem in convolutional neural networks. \u003cem\u003eNeural Netw.\u003c/em\u003e \u003cb\u003e106\u003c/b\u003e, 249\u0026ndash;259. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.neunet.2018.07.011\u003c/span\u003e\u003cspan address=\"10.1016/j.neunet.2018.07.011\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVaroquaux, G. Cross-validation failure: small sample sizes lead to large error bars. \u003cem\u003eNeuroImage\u003c/em\u003e \u003cb\u003e180\u003c/b\u003e, 68\u0026ndash;77. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.neuroimage.2017.06.061\u003c/span\u003e\u003cspan address=\"10.1016/j.neuroimage.2017.06.061\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang, C., Bengio, S., Hardt, M., Recht, B. \u0026amp; Vinyals, O. Understanding deep learning requires rethinking generalization. \u003cem\u003earXiv preprint\u003c/em\u003e arXiv:1611.03530 (2017).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. \u003cem\u003ePLOS Med.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e, e1002683.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pmed.1002683\u003c/span\u003e\u003cspan address=\"10.1371/journal.pmed.1002683\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMahmutoglu, M. et al. Optimizing MRI sequence classification performance: insights from domain shift analysis. \u003cem\u003eEur. Radiol.\u003c/em\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00330-025-11671-5\u003c/span\u003e\u003cspan address=\"10.1007/s00330-025-11671-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGarrucho, L. et al. Domain generalization in deep learning-based mass detection in mammography: a large-scale multi-center study. \u003cem\u003eArtif. Intell. Med.\u003c/em\u003e \u003cb\u003e132\u003c/b\u003e, 102386. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.artmed.2022.102386\u003c/span\u003e\u003cspan address=\"10.1016/j.artmed.2022.102386\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHolzschuh, J. et al. The impact of multicentric datasets for the automated tumor delineation in primary prostate cancer using convolutional neural networks on 18F-PSMA-1007 PET. \u003cem\u003eRadiat. Oncol.\u003c/em\u003e \u003cb\u003e19\u003c/b\u003e, 106. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s13014-024-02491-w\u003c/span\u003e\u003cspan address=\"10.1186/s13014-024-02491-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKamnitsas, K. et al. 
DeepMedic for brain tumor segmentation. In \u003cem\u003eBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries\u003c/em\u003e, 138\u0026ndash;149 (Springer International Publishing, 2016). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-319-55524-9_14\u003c/span\u003e\u003cspan address=\"10.1007/978-3-319-55524-9_14\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNour Eddin, J., Dorez, H. \u0026amp; Curcio, V. Automatic brain extraction and brain tissues segmentation on multi-contrast animal MRI. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 6416. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-023-33289-7\u003c/span\u003e\u003cspan address=\"10.1038/s41598-023-33289-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDubois, J. et al. MRI of the neonatal brain: a review of methodological challenges and neuroscientific advances. \u003cem\u003eJ. Magn. Reson. Imaging\u003c/em\u003e \u003cb\u003e53\u003c/b\u003e, 1318\u0026ndash;1343. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/jmri.27192\u003c/span\u003e\u003cspan address=\"10.1002/jmri.27192\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuff, D., Weisman, A. \u0026amp; Jeraj, R. Interpretation and visualization techniques for deep learning models in medical imaging. \u003cem\u003ePhys. Med.
Biol.\u003c/em\u003e \u003cb\u003e66\u003c/b\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1088/1361-6560/abcd17\u003c/span\u003e\u003cspan address=\"10.1088/1361-6560/abcd17\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDeGrave, A. J., Janizek, J. D. \u0026amp; Lee, S. I. AI for radiographic COVID-19 detection selects shortcuts over signal. \u003cem\u003emedRxiv\u003c/em\u003e (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/2020.09.13.20193565\u003c/span\u003e\u003cspan address=\"10.1101/2020.09.13.20193565\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e Preprint.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSelvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. \u003cem\u003eInt. J. Comput. Vis.\u003c/em\u003e \u003cb\u003e128\u003c/b\u003e, 336\u0026ndash;359. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11263-019-01228-7\u003c/span\u003e\u003cspan address=\"10.1007/s11263-019-01228-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCoupet, M. et al. A multi-sequences MRI deep framework study applied to glioma classification. \u003cem\u003eMultimed Tools Appl.\u003c/em\u003e \u003cb\u003e81\u003c/b\u003e, 13563\u0026ndash;13591. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11042-022-12316-1\u003c/span\u003e\u003cspan address=\"10.1007/s11042-022-12316-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGrigas, O., Damaševičius, R. 
\u0026amp; Maskeliūnas, R. Multimodal convolutional mixer for mild cognitive impairment detection. \u003cem\u003eComput. Mater. Contin\u003c/em\u003e. \u003cb\u003e84\u003c/b\u003e, 1805\u0026ndash;1838. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.32604/cmc.2025.064354\u003c/span\u003e\u003cspan address=\"10.32604/cmc.2025.064354\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMai, W. \u003cem\u003eDiagnostic MRI in Dogs and Cats\u003c/em\u003e 1st edn (CRC, 2018).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Artificial intelligence (AI), Deep learning, Magnetic resonance imaging (MRI), Brain MRI, Abnormality detection, Diagnostic imaging","lastPublishedDoi":"10.21203/rs.3.rs-7537077/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7537077/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eDiagnostic 
imaging represents one of the most promising clinical applications of artificial intelligence (AI) algorithms. The purpose of this study was to evaluate the customised convolutional neural network (CNN) \u003cem\u003eSepNetDense\u003c/em\u003e for distinguishing between normal and abnormal canine brain magnetic resonance images (MRIs), with the aim of enhancing diagnostic efficiency and assisting radiologists in identifying abnormal images.\u003c/p\u003e\u003cp\u003eThe dataset comprised T1-weighted (T1w) pre- and post-contrast sequences in transverse, sagittal, and dorsal planes from 550 dogs, collected from four universities. Dogs were included if they had a complete clinical diagnosis confirmed either through histopathology or in accordance with current clinical consensus. Patients were randomly divided into a training set (n\u0026thinsp;=\u0026thinsp;444), a validation set (n\u0026thinsp;=\u0026thinsp;53), and a test set (n\u0026thinsp;=\u0026thinsp;53). Each MRI was labelled on a slice-by-slice basis as normal or abnormal. The model was trained on 205 normal imaging datasets (e.g., extracranial aetiologies, idiopathic epilepsy, paroxysmal dyskinesia) and 239 abnormal ones (e.g., neoplasms, inflammatory lesions, other pathologies).\u003c/p\u003e\u003cp\u003eThe model correctly predicted 74% of the true normal slices in the test set as normal and 73% of the true abnormal as abnormal. A ROC analysis of the model\u0026rsquo;s prediction at the patient level revealed that, at a threshold of 51% abnormal slices per patient, the model reached an optimal balance of 83% sensitivity and 78% specificity, with a maximal accuracy of 80%. ANCOVA revealed that the CNN\u0026rsquo;s classification performance was influenced by multiple biological, technical, and institutional factors. Lesion diagnosis, institutional setting, and breed size had significant large effects, whereas body weight showed a significant medium effect. 
Significant interactions with large effect sizes were also observed between diagnosis and institute, as well as between breed size and weight. Additionally, a distinct pattern was observed in the distribution of prediction categories along the anatomical axes, with a trend towards better CNN performance in the central quartiles across all MRI sequences. The T1w pre-contrast sagittal sequence demonstrated the highest classification accuracy (81.8%) compared to the other T1w sequences. This study evaluates a CNN-based model designed to support a triage system for classifying canine brain MRI studies, with the aim of identifying abnormalities and improving reporting efficiency.\u003c/p\u003e","manuscriptTitle":"CNN model-based image classification for canine brain MRI abnormalities","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-11 11:16:14","doi":"10.21203/rs.3.rs-7537077/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-12-23T19:02:51+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-12-22T21:55:29+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"260920897027235497842001178938631144800","date":"2025-11-02T06:35:09+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-09-26T12:40:05+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"136651735023864230161167649988614752318","date":"2025-09-17T12:53:27+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-09-16T13:40:25+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-09-08T13:26:50+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-09-06T07:14:23+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-09-05T07:50:37+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-09-04T14:18:05+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a1d8a5e3-740a-41cb-8b0e-5328a4f58df0","owner":[],"postedDate":"September 11th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":54373535,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":54373536,"name":"Health 
sciences/Diseases"},{"id":54373537,"name":"Health sciences/Medical research"},{"id":54373538,"name":"Health sciences/Neurology"}],"tags":[],"updatedAt":"2026-05-20T06:38:20+00:00","versionOfRecord":[],"versionCreatedAt":"2025-09-11 11:16:14","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7537077","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7537077","identity":"rs-7537077","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
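The abstract states that patients, not individual slices, were randomly divided into training (n = 444), validation (n = 53) and test (n = 53) sets. A minimal sketch of such a patient-wise split follows, assuming each slice record carries a patient identifier. The helper names `split_patients` and `assign_slices`, the ID format and the fixed seed are hypothetical illustrations, not the authors' code. Splitting by patient keeps every slice of one dog in a single subset, so neighbouring slices from the same scan cannot appear in both training and test data.

```python
import random


def split_patients(patient_ids, n_val=53, n_test=53, seed=0):
    """Randomly assign whole patients to training, validation and test subsets.

    Hypothetical helper: the default sizes mirror the 53/53 validation/test
    counts reported in the abstract; all remaining patients go to training.
    """
    ids = sorted(set(patient_ids))      # unique patients in a deterministic order
    rng = random.Random(seed)           # local RNG so the split is reproducible
    rng.shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


def assign_slices(slices, train, val, test):
    """Tag each (patient_id, slice_index) record with its patient's subset,
    so every slice of one dog ends up in the same subset."""
    subset = {p: "train" for p in train}
    subset.update({p: "val" for p in val})
    subset.update({p: "test" for p in test})
    return [(pid, idx, subset[pid]) for pid, idx in slices]


# Usage with 550 hypothetical patient IDs, matching the cohort size in the abstract.
patients = [f"dog_{i:03d}" for i in range(550)]
train, val, test = split_patients(patients)
print(len(train), len(val), len(test))  # 444 53 53
```

Library users could get the same grouping guarantee from scikit-learn's `GroupShuffleSplit` by passing patient IDs as `groups`; the hand-written version simply makes the fixed subset sizes explicit.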

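At the patient level, the abstract describes thresholding the fraction of slices the CNN labels abnormal (51% abnormal slices per patient) and using ROC analysis to find the best balance of sensitivity and specificity. The sketch below illustrates that aggregation and threshold sweep on toy data. It assumes per-slice binary predictions grouped by patient, a ">= threshold" decision rule, and Youden's J (sensitivity + specificity - 1) as the "optimal balance" criterion; the abstract names none of these, so they, the helper names and the toy values are assumptions.

```python
from typing import Dict, List, Tuple


def abnormal_fraction(slice_preds: Dict[str, List[int]]) -> Dict[str, float]:
    """Fraction of slices predicted abnormal (1) vs normal (0) for each patient."""
    return {pid: sum(preds) / len(preds) for pid, preds in slice_preds.items()}


def sens_spec(fractions: Dict[str, float], labels: Dict[str, int],
              threshold: float) -> Tuple[float, float]:
    """Patient-level sensitivity and specificity, calling a patient abnormal
    when its abnormal-slice fraction is >= threshold (assumed decision rule).
    Assumes at least one abnormal and one normal patient in `labels`."""
    tp = fn = tn = fp = 0
    for pid, frac in fractions.items():
        called_abnormal = frac >= threshold
        if labels[pid] == 1:
            tp += int(called_abnormal)
            fn += int(not called_abnormal)
        else:
            fp += int(called_abnormal)
            tn += int(not called_abnormal)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(fractions: Dict[str, float],
                   labels: Dict[str, int]) -> Tuple[float, float, float]:
    """Sweep the observed fractions as candidate thresholds and return
    (threshold, sensitivity, specificity) with the largest Youden's J."""
    best = None
    for t in sorted(set(fractions.values())):
        sens, spec = sens_spec(fractions, labels, t)
        j = sens + spec - 1
        if best is None or j > best[0]:   # ties keep the lower threshold
            best = (j, t, sens, spec)
    _, t, sens, spec = best
    return t, sens, spec


# Toy data: 1 = slice predicted abnormal; labels are patient-level ground truth.
preds = {"dog_a": [1, 1, 1, 0], "dog_b": [1, 1, 0, 0],
         "dog_c": [0, 0, 0, 1], "dog_d": [0, 0, 0, 0]}
truth = {"dog_a": 1, "dog_b": 1, "dog_c": 0, "dog_d": 0}
print(best_threshold(abnormal_fraction(preds), truth))  # (0.5, 1.0, 1.0)
```

The toy threshold of 0.5 is an artefact of the invented data, not a reproduction of the reported 51%. In practice, scikit-learn's `roc_curve` performs an equivalent sweep over the per-patient fractions.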
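The abstract also reports that performance varied along the anatomical axes, trending better in the central quartiles of each sequence. One way such a breakdown could be computed is sketched below: each slice's 0-based index is mapped to a quartile Q1 to Q4 of its stack, and accuracy is reported per quartile. The record layout `(slice_index, n_slices, correct)` and the equal-width binning rule are assumptions for illustration, since the abstract does not define how the quartiles were formed.

```python
from collections import defaultdict


def position_quartile(slice_index: int, n_slices: int) -> int:
    """Quartile 1-4 of a 0-based slice index within a stack of n_slices.
    Integer arithmetic avoids floating-point edge cases at bin boundaries."""
    return (4 * slice_index) // n_slices + 1


def accuracy_by_quartile(records):
    """records: iterable of (slice_index, n_slices, correct) tuples, where
    correct is True if the CNN's slice label matched the ground truth."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for idx, n, correct in records:
        q = position_quartile(idx, n)
        totals[q] += 1
        hits[q] += int(correct)
    return {q: hits[q] / totals[q] for q in sorted(totals)}


# Toy stack of 8 slices: edge slices misclassified, central slices correct.
toy = [(i, 8, 2 <= i <= 5) for i in range(8)]
print(accuracy_by_quartile(toy))  # {1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0}
```

Binning by relative position rather than absolute slice number lets stacks with different slice counts (for example, sagittal versus transverse acquisitions) be pooled on the same axis.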