DSC-CNN: A Dual-Stream CNN with Cognitive Embedding Fusion for Early Alzheimer’s Diagnosis

doi:10.21203/rs.3.rs-6903589/v1

2025 · doi:10.21203/rs.3.rs-6903589/v1

preprint OA: closed

Research Article

Nattavut Sriwiboon

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6903589/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review
Version 1 posted 9

Abstract

Alzheimer’s disease (AD) remains one of the most prevalent and challenging neurodegenerative disorders, with early diagnosis being crucial for timely intervention. In this paper, a novel dual-stream deep learning architecture, termed DSC-CNN (Dual-Stream CNN with Cognitive Embedding Fusion), has been proposed to enhance the accuracy and interpretability of AD classification.
The model integrates volumetric MRI data with structured clinical metadata through two dedicated processing streams: a spatial ResNet3D-18 backbone with attention for anatomical features and a lightweight encoder for cognitive attributes. These complementary embeddings have been fused via a bilinear attention mechanism, allowing the model to capture intricate cross-modal interactions. To ensure both generalizability and transparency, the framework has incorporated intrinsic attention visualization and prototype-guided decision paths in place of traditional post-hoc explanation tools. Experiments have been conducted on the ADNI and OASIS datasets, demonstrating that the proposed DSC-CNN has achieved a classification accuracy exceeding 99.68%, outperforming several recent related methods. The model has shown particular strength in identifying early mild cognitive impairment (EMCI) cases while maintaining a compact parameter footprint, enabling efficient deployment in clinical settings. These results suggest that DSC-CNN is a robust, interpretable, and scalable solution for improving AD diagnosis.

Keywords: Alzheimer’s disease, DSC-CNN, Multi-modal Fusion, ResNet3D-18

1 Introduction

Alzheimer’s disease (AD) [ 1 ] is a progressive neurodegenerative disorder and the most common cause of dementia, affecting millions of individuals worldwide. Early and accurate diagnosis is critical for timely intervention and improved patient outcomes, yet remains challenging due to overlapping symptoms with other cognitive disorders and the subtlety of early-stage structural brain changes. Magnetic Resonance Imaging (MRI) [ 2 ] has emerged as a non-invasive modality for observing anatomical changes, such as hippocampal atrophy and ventricular enlargement, which are associated with different stages of AD progression.
Recent advances in deep learning [ 3 – 8 ] have demonstrated strong potential in the automatic classification of AD using structural MRI data. Numerous studies have explored the use of convolutional neural networks (CNNs) [ 9 ], vision transformers (ViTs) [ 10 ], and hybrid ensembles trained on public datasets such as The Alzheimer’s Disease Neuroimaging Initiative (ADNI) [ 11 ] and Open Access Series of Imaging Studies (OASIS) [ 12 ]. However, many existing models [ 13 – 21 ] either rely solely on imaging data or require complex pretraining strategies that limit interpretability and generalizability. Moreover, the lack of effective fusion between spatial imaging features and structured clinical data remains a major limitation in achieving more accurate and clinically relevant diagnostic systems. To address these challenges, this paper introduces a novel Dual-Stream Convolutional Neural Network (DSC-CNN) architecture designed for the multi-class classification of AD stages. The proposed framework fuses 3D spatial MRI features with tabular cognitive features to enable a more comprehensive understanding of disease pathology. A ResNet3D-18 [ 22 ] backbone is utilized to extract hierarchical spatial features from brain MRI volumes, while a parallel cognitive encoder processes metadata such as Mini-Mental State Examination (MMSE) [ 23 ] scores, APOE genotype, and regional brain volumes. These two streams are combined via a cross-modal fusion mechanism, enabling enriched multi-modal feature learning. The proposed model has been trained and evaluated on combined datasets from ADNI and OASIS, demonstrating superior classification performance and strong generalization across cohorts. Importantly, DSC-CNN avoids reliance on external pretraining or handcrafted radiomic features, while remaining lightweight enough for potential deployment in clinical settings. 
Additionally, it supports intrinsic attention visualization and prototype-guided decision paths to enhance model interpretability without requiring post hoc explanation methods.

The main contributions of this work are as follows:
- A novel neural network design that jointly learns from 3D MRI volumes and structured cognitive features, improving diagnostic accuracy and interpretability.
- An efficient spatial encoder that preserves volumetric brain patterns with reduced computational cost.
- A dedicated cognitive stream that processes tabular inputs (e.g., MMSE, APOE) using 1D-CNN layers and batch normalization.
- Bilinear attention-based fusion that enables interactive learning between image and metadata streams.
- Built-in interpretability that traces learned feature activations to prototypical representations without using external explainability tools.
- Classification accuracy and AUC on the ADNI and OASIS datasets that exceed those of related works, while maintaining low parameter complexity.

2 Related work

Substantial progress has been achieved in AD diagnosis [ 24 – 28 ] through the application of neural networks and multimodal machine learning frameworks, particularly leveraging large-scale [ 29 – 31 ] public datasets. ADNI has served as the primary benchmark for model development, with additional datasets like OASIS providing complementary validation. The related work is organized into two groups: studies utilizing only the ADNI dataset and studies integrating both ADNI and OASIS datasets for cross-cohort generalizability.

2.1 ADNI-Only Approaches

In 2021, Platero et al. [ 13 ] have investigated the prediction of conversion from mild cognitive impairment (MCI) to AD by applying linear mixed-effects modeling to longitudinal MRI and clinical data from the ADNI dataset, which included 610 subjects across 2,491 visits. Structural features such as cortical thickness and subcortical volumes have been extracted and combined with neuropsychological test scores to compute trajectory residues for each patient.
A classifier trained using only baseline data has achieved an accuracy of 77% with an AUC of 0.855, while the inclusion of sequential follow-up visits has improved performance to an accuracy of 84% and an AUC of 0.912. These results have demonstrated that the integration of longitudinal data significantly enhances early prediction of AD progression.

In 2022, Lu et al. [ 14 ] have introduced a highly generalizable AD classifier that has been trained through transfer learning on an unprecedentedly large and diverse MRI dataset comprising 85,721 scans from 50,876 individuals across more than 217 different sites and scanners. A 3D InceptionResNetV2 model has been pretrained for sex classification with 94.9% accuracy, after which its learned feature weights have been fine-tuned for AD detection using the ADNI dataset (6,857 samples). The resulting AD classifier has achieved 90.9% accuracy in leave-sites-out cross-validation, with similarly high performance (91.1–94.5% accuracy) reported on independent test sets including AIBL, MIRIAD, and OASIS. When applied to mild cognitive impairment (MCI) subjects, the classifier has predicted AD about three times as often for subjects who later converted to AD as for non-converters (65.2% vs. 20.6%). Additionally, classification scores have been found to correlate with illness severity (e.g., MMSE), underscoring the model’s potential as a non-invasive, medical-grade diagnostic tool suitable for diverse clinical settings.

In 2024, Lee et al. [ 15 ] have introduced ADLite Net, a lightweight convolutional neural network designed to detect early Alzheimer’s using T1-weighted MRI images from both ADNI and another public dataset. Depth-wise separable convolutions and global average pooling have been incorporated to reduce model complexity, and a novel “parallel concatenation block” has been integrated to address class imbalance and extract complementary features.
A 10-fold cross-validation on a merged dataset has been performed, which has resulted in superior performance compared to existing CNN and Vision Transformer models. Specifically, classification accuracy has reached approximately 98–99%, demonstrating both computational efficiency and strong generalization across diverse datasets. In 2025, Fang et al. [ 16 ] have introduced a hybrid ensemble framework for AD detection using structural MRI data exclusively sourced from the ADNI dataset. The methodology has combined deep learning feature extraction, using convolutional neural networks, with handcrafted radiomic features related to texture, shape, and intensity. Feature-level fusion via concatenation has been performed, followed by a gradient boosting classifier for final diagnosis. The combined architecture has been reported to achieve an accuracy of approximately 92.4%, demonstrating enhanced sensitivity and specificity in distinguishing AD, MCI, and healthy control subjects. This outcome has highlighted the complementary value of integrating domain-informed handcrafted features alongside deep representations in improving diagnostic performance. 2.2 ADNI and OASIS Multi-Dataset Approaches These studies have integrated datasets from both ADNI and OASIS to increase cohort diversity, test model generalizability across institutions, and evaluate robustness under cross-dataset settings. Their methodologies have often emphasized multimodal fusion and interpretability. In 2022, Qiu et al. [ 17 ] In 2022, Sá Diogo et al. have proposed a generalizable machine learning framework for early AD diagnosis using structural MRI data sourced from both ADNI and OASIS datasets. An ensemble of classifiers has been employed to improve robustness across imaging protocols, including IR-SPGR and MPRAGE. 
The binary classification between healthy controls and AD patients has achieved a balanced accuracy of 90.6% and a Matthews correlation coefficient (MCC) of 0.811, while three-class classification (healthy, MCI, AD) has attained a balanced accuracy of 62.1%. The hippocampus has been identified as the most influential brain region, contributing 25–45% to classification outcomes, followed by temporal, cingulate, and frontal areas. Despite testing graph theory-based features, no added benefit has been observed. The model’s generalizability has been validated across datasets and protocols, supporting its potential for practical clinical deployment using baseline-only MRI scans. In 2022, L. Bloch et al. [ 18 ] have developed a transparent machine learning framework for early AD classification using structural MRI and cognitive assessment data, trained primarily on the ADNI dataset and externally validated on AIBL and OASIS. Multiple classifiers, including XGBoost, Random Forests, and Support Vector Machines, have been employed, and their decision-making processes have been interpreted using Shapley-based explanations. These interpretability techniques have revealed that amygdala volume and cognitive test scores are among the most influential features in predicting disease progression. It has been observed that models incorporating cognitive assessments have significantly outperformed those using only imaging features, achieving a classification accuracy of 88.9% for distinguishing cognitively normal individuals from those with AD diagnosis. The consistency of feature importance across datasets has confirmed the robustness of the proposed framework, demonstrating its potential for clinically meaningful, generalizable AD diagnosis. In 2025, Kaur and Sachdeva [ 19 ] have conducted a comprehensive review of 64 recent studies focused on the application of machine learning and deep learning techniques for early AD detection using neuroimaging data. 
Structural MRI has been identified as the most commonly used modality, while the integration of additional imaging types such as diffusion tensor imaging has been recognized for its potential to improve diagnostic accuracy. The review has emphasized the superiority of deep learning models, particularly convolutional neural networks and hybrid architectures in handling multi-class classification tasks. Among the surveyed models, the highest reported classification accuracy has reached 97.6%, demonstrating the capability of advanced neural architectures to achieve near-expert diagnostic performance. However, the authors have also highlighted persistent challenges, including data heterogeneity, lack of standard validation protocols, and limited generalizability across cohorts. Recommendations have been made for future research to focus on standardized benchmarking, enhanced interpretability, and broader adoption of multimodal data fusion strategies. In 2025, Fonseka et al. [ 20 ] have proposed a hybrid model combining a convolutional variational autoencoder (CVAE) with a vision transformer (ViT) for early AD detection using structural MRI images drawn from the ADNI and SCAN databases. The CVAE has been utilized to extract and refine salient imaging features, which have then been supplied to the ViT’s multi-head attention mechanism for detailed pattern analysis. The combined architecture has been trained on approximately 14,000 MRI samples, and a test accuracy of 93.3% has been achieved, demonstrating a clear improvement over baseline transformer and autoencoder methods. This work has highlighted the benefit of unsupervised feature learning via CVAE in enhancing the ViT’s capacity to capture subtle anatomical anomalies indicative of early-stage AD In 2025, Akhavan Aghdam et al. 
[ 21 ] have conducted a comprehensive survey and evaluation of open-source machine learning models for AD diagnosis using multimodal neuroimaging data, including structural and functional MRI as well as PET. The reproducibility and generalizability of 3D-CNN, ConvLSTM, slice-based CNN, and vision transformer models have been assessed across ADNI and OASIS datasets. Although high accuracies (up to 99.1%) have been reproduced on ADNI, significant performance drops (e.g., to ~66–78%) have been observed when these models have been applied to OASIS data. These findings have highlighted that existing models have been limited by cohort-specific overfitting and lack of robustness across datasets. Key challenges have been identified, including data heterogeneity, preprocessing variability, and insufficient validation protocols. Recommendations have been made for improved benchmarking standards, enhanced model generalizability, and greater focus on integration with clinical workflows.

3 Proposed Architecture

3.1 Architecture Overview

The proposed model, DSC-CNN (Dual-Stream Convolutional Neural Network with Cognitive Embedding Fusion), has been designed to integrate both 3D spatial MRI data and structured cognitive information for early and accurate AD classification. A detailed flowchart of the proposed architecture is shown in Fig. 1. The architecture consists of two parallel branches: a spatial MRI encoder (Stream A) and a cognitive feature encoder (Stream B). These streams are fused using a bilinear attention mechanism, followed by fully connected layers for final prediction. The model has been constructed to emphasize interpretability and robustness, with built-in intrinsic attention and prototype-guided reasoning.

The model consists of two coordinated input streams: a 3D spatial MRI encoder and a cognitive feature encoder.
The left stream processes full preprocessed brain MRI volumes through a lightweight 3D CNN with integrated non-local attention, enabling spatial emphasis on disease-relevant regions. The right stream encodes structured clinical metadata such as MMSE, APOE, and volumetric measurements through a shallow MLP. Both streams generate low-dimensional embeddings, which are subsequently fused via a bilinear attention mechanism. This fusion captures inter-modality interactions and aligns patient-specific imaging and cognitive patterns. The resulting representation is passed through fully connected layers for multi-class prediction. The architecture supports interpretability through intrinsic attention visualization and prototype-guided reasoning, providing transparency without requiring external explanation tools.

3.2 Stream A: 3D Spatial MRI Encoder

The Stream A module is responsible for extracting volumetric structural features from full 3D T1-weighted brain MRI scans. This stream has been constructed using the ResNet3D-18 architecture, which provides a lightweight yet effective solution for volumetric feature encoding, particularly suitable for resource-constrained environments and clinical deployment.

Input Preprocessing Pipeline: Each subject’s MRI volume undergoes a rigorous preprocessing routine to ensure spatial and intensity consistency across the dataset:
- Skull Stripping is applied to remove non-brain tissues using 3D brain extraction.
- N4 Bias Field Correction is performed to correct for low-frequency intensity non-uniformities.
- Z-score Normalization brings voxel intensities to zero mean and unit variance.
- Resizing brings the volume to a standard isotropic shape of 128 × 128 × 128 voxels for compatibility with the ResNet3D input.
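As a rough illustration of the z-score step listed above, the sketch below standardizes a flat list of voxel intensities using only the Python standard library. The function name `zscore_normalize` and the toy intensities are hypothetical and not taken from the paper; a real pipeline would more likely operate on the full 3D array after skull stripping and bias correction, and may restrict the statistics to brain voxels.

```python
from statistics import fmean, pstdev


def zscore_normalize(voxels):
    """Return voxel intensities rescaled to zero mean and unit variance."""
    mean = fmean(voxels)
    std = pstdev(voxels, mu=mean)  # population standard deviation
    if std == 0:
        # A constant volume carries no contrast; map it to all zeros.
        return [0.0 for _ in voxels]
    return [(v - mean) / std for v in voxels]


# Toy example: five hypothetical intensities standing in for a volume.
normalized = zscore_normalize([100.0, 120.0, 140.0, 160.0, 180.0])
```

After this step the values are centred on zero with unit spread, which is the property the preprocessing list relies on to make scans from different scanners comparable.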
ResNet3D-18 Encoding Pipeline: The cleaned MRI volume is then passed through the ResNet3D-18 network, which operates as follows:
- The input layer uses a 7×7×7 3D convolution followed by batch normalization and ReLU activation to capture low-level spatial features.
- Four residual stages extract hierarchical abstractions, with each stage containing two residual blocks that downsample and encode increasingly abstract spatial cues.
- A global average pooling layer compresses the 3D feature maps into a single spatial embedding.
- A dense projection layer outputs a fixed-length feature vector representing the volumetric anatomical signature of the input brain scan.

This embedding is subsequently passed to the fusion module, where it is combined with features from the cognitive encoder (Stream B). Notably, the residual design enables effective gradient propagation and facilitates learning of disease-relevant patterns such as hippocampal atrophy and cortical thinning without overfitting.

Table 1 The Architectural Details of the DSC-CNN.

| Layer Name | Type | Customized Parameters | Parameters |
|---|---|---|---|
| Conv1 | 3D Conv + BN + ReLU | in_channels = 1, out_channels = 32, kernel = 7×7×7, stride = 2, padding = 3 | 11,008 |
| MaxPool | 3D MaxPool | kernel = 3×3×3, stride = 2 | 0 |
| Layer1 | 2 × BasicBlock3D | in = 32, out = 32 | 55,360 |
| Layer2 | 2 × BasicBlock3D | in = 32, out = 64 | 110,720 |
| Layer3 | 2 × BasicBlock3D | in = 64, out = 128 | 442,624 |
| Layer4 | 2 × BasicBlock3D | in = 128, out = 256 | 1,771,520 |
| GlobalAvgPool | 3D Adaptive AvgPool | output_size = 1 | 0 |
| FC | Fully Connected | in_features = 256, out_features = 4 | 1,028 |
| Total | | | 2,392,260 |

The detailed architectural configuration of the spatial MRI encoder is provided in Table 1. The encoder follows a ResNet3D-18 backbone, which has been tailored for volumetric MRI inputs of size 128×128×128. It begins with a wide 3D convolutional layer followed by batch normalization and ReLU activation, which facilitates low-level spatial feature extraction.
This is followed by a max pooling layer to reduce spatial resolution and computational cost. The encoder comprises four residual stages (Layer1 to Layer4), each containing two 3D basic residual blocks with increasing channel depth (from 32 to 256), enabling hierarchical feature learning. Global average pooling condenses the extracted spatial features into a compact embedding, which is passed through a fully connected layer to produce a fixed-dimensional feature vector. Notably, the total number of parameters in this encoder is approximately 2.39 million (M), which balances expressive capacity with computational efficiency, making it suitable for real-time or clinical applications.

To systematically extract discriminative spatial features from the 3D brain MRI volumes, a lightweight yet deep 3D convolutional backbone based on ResNet3D-18 has been employed. The step-by-step process for spatial feature extraction is summarized in Algorithm 1.

Algorithm 1: Spatial MRI Feature Extraction Using ResNet3D-18
Input: Raw 3D T1-weighted MRI scan $V_{\text{raw}}$
Output: Spatial embedding vector $F_{\text{mri}}$
1: $V_{\text{strip}} \leftarrow \text{SkullStripping}(V_{\text{raw}})$
2: $V_{\text{bias}} \leftarrow \text{N4BiasCorrection}(V_{\text{strip}})$
3: $V_{\text{norm}} \leftarrow \text{ZScoreNormalize}(V_{\text{bias}})$
4: $V_{\text{resized}} \leftarrow \text{Resize}(V_{\text{norm}}, \text{shape} = [128, 128, 128])$
5: $F_1 \leftarrow \text{Conv3D}_{7 \times 7 \times 7}(V_{\text{resized}}) \rightarrow \text{BatchNorm} \rightarrow \text{ReLU}$
6: $F_2 \leftarrow \text{ResidualBlock1}(F_1)$
7: $F_3 \leftarrow \text{ResidualBlock2}(F_2)$
8: $F_4 \leftarrow \text{ResidualBlock3}(F_3)$
9: $F_5 \leftarrow \text{ResidualBlock4}(F_4)$
10: $G \leftarrow \text{GlobalAveragePooling}(F_5)$
11: $F_{\text{mri}} \leftarrow \text{DenseLayer}(G)$
12: return $F_{\text{mri}}$

Input: A preprocessed 3D T1-weighted MRI volume, resized to 128×128×128, with intensity normalized and skull-stripped.

Step 1: Initial Convolution Block. A 3D convolution is applied with a large kernel (7×7×7), stride of 2, and padding of 3. This extracts initial spatial features while reducing the resolution to 64×64×64. Batch normalization and ReLU activation follow to stabilize and activate the outputs.

Step 2: Max Pooling. A 3D max pooling layer with kernel 3×3×3 and stride 2 is applied. This downsamples the volume further to 32×32×32, preserving dominant features and reducing computation.
Step 3: Residual Block Layers (Layer1–Layer4). Four residual stages are used:
- Layer1: 2 residual blocks (32 channels)
- Layer2: 2 residual blocks (64 channels)
- Layer3: 2 residual blocks (128 channels)
- Layer4: 2 residual blocks (256 channels)
Each block uses skip connections to preserve information flow and mitigate vanishing gradients. Spatial resolution is progressively reduced, while channel depth is increased to learn abstract, high-level features.

Step 4: Global Average Pooling. A 3D adaptive average pooling layer is used to convert the feature map into a 1×1×1 volume per channel. This operation yields a fixed-size 256-dimensional feature vector.

Step 5: Fully Connected Projection. A fully connected (FC) layer maps the 256D feature vector into an output embedding. This embedding is used later in the fusion module (Section 3.4) to integrate with cognitive features.

Output: A compact and high-level 256D vector representing structural patterns from the 3D brain MRI, suitable for fusion with clinical metadata.

This algorithm is designed to extract spatially-aware brain features with high efficiency. The use of ResNet3D-18 allows the model to maintain a relatively low parameter count (~2.39M), while still being deep enough to capture complex anatomical patterns related to early Alzheimer’s pathology.

3.3 Stream B: Cognitive Feature Encoder

The cognitive stream of the proposed DSC-CNN has been developed to encode clinically meaningful metadata that complements spatial patterns derived from MRI. This stream is responsible for processing structured features that reflect cognitive, demographic, and neuroanatomical biomarkers known to be associated with AD. Specifically, the input feature vector includes attributes such as age, sex, MMSE score, APOE genotype, hippocampal volume, and selected region-wise cortical metrics derived from FreeSurfer-based analysis.
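Looking back at the spatial encoder of Section 3.2 before moving on, the resolutions quoted in Steps 1 and 2 and the parameter total in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below uses the standard output-size formula for convolution and pooling layers; the helper name `conv_out_size` is hypothetical. Table 1 lists no padding for the pooling layer, so `padding=1` is an assumption, chosen because it is the value that reproduces the 32-voxel side stated in Step 2.

```python
def conv_out_size(size, kernel, stride, padding=0):
    """Output length along one axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


side = 128
# Step 1: 7x7x7 convolution, stride 2, padding 3 (values from Table 1).
after_conv = conv_out_size(side, kernel=7, stride=2, padding=3)
# Step 2: 3x3x3 max pooling, stride 2; padding=1 is an assumption (see above).
after_pool = conv_out_size(after_conv, kernel=3, stride=2, padding=1)
# Without padding the same layer would give a slightly smaller side.
after_pool_no_pad = conv_out_size(after_conv, kernel=3, stride=2)

# Per-layer parameter counts copied from Table 1.
table1_params = {
    "Conv1": 11_008,
    "Layer1": 55_360,
    "Layer2": 110_720,
    "Layer3": 442_624,
    "Layer4": 1_771_520,
    "FC": 1_028,
}
total_params = sum(table1_params.values())
```

Under these assumptions the side length goes from 128 to 64 to 32 voxels, and the per-layer counts add up to the 2,392,260 total reported in Table 1. With unpadded pooling the side would be 31 rather than 32, which suggests that the pooling layer is padded in practice.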
The detailed design of the cognitive feature encoder, responsible for transforming structured clinical and neuroanatomical variables into a low-dimensional latent representation, is illustrated in Fig. 2. To ensure compatibility across features of different scales, all inputs are standardized using z-score normalization prior to model ingestion. This preprocessing helps stabilize learning and prevents bias toward any specific attribute due to magnitude differences.

The cognitive encoder is implemented as a shallow multi-layer perceptron (MLP) with three dense layers, configured to balance representational capacity and interpretability. The architecture consists of the following:
- Dense Layer 1: 64 units + Batch Normalization + ReLU
- Dense Layer 2: 32 units + Batch Normalization + ReLU + Dropout (p = 0.3)
- Dense Layer 3: 16 units + ReLU

This structure produces a 16-dimensional cognitive embedding, which is forwarded to the fusion module for joint reasoning with the MRI-based spatial features. The use of dropout and batch normalization enhances regularization and generalization, particularly in the presence of partially missing or noisy clinical records.

Figure 2 illustrates the cognitive feature encoder architecture of Stream B. As depicted, the structured input vector, which contains demographic, clinical, and anatomical metadata, is passed sequentially through three fully connected layers. The first two layers are followed by batch normalization to maintain activation stability, and ReLU activations introduce non-linearity in all three. A dropout layer is added after the second dense block to simulate real-world variability and reduce overfitting. The final output is a compact 16-dimensional embedding vector representing the subject’s cognitive profile. This vector is then aligned with the spatial features in the fusion stage, allowing the model to perform comprehensive multimodal classification across four Alzheimer’s stages.
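To give a sense of how small the cognitive encoder is, the sketch below counts the learnable parameters of the 64-32-16 layer stack described above. The helper names are hypothetical, and the input width of 12 features is an assumption made purely for illustration, since the paper does not state the exact number of tabular inputs. Dropout and ReLU add no parameters, and each batch-normalization layer is counted as one scale and one shift per feature.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of a batch-normalization layer."""
    return 2 * n_features


def cognitive_encoder_params(input_dim):
    """Learnable parameters of the 64-32-16 encoder for a given input width."""
    return (
        dense_params(input_dim, 64) + batchnorm_params(64)  # Dense Layer 1 + BN
        + dense_params(64, 32) + batchnorm_params(32)       # Dense Layer 2 + BN
        + dense_params(32, 16)                              # Dense Layer 3
    )


# input_dim = 12 is a hypothetical feature count, not a value from the paper.
total = cognitive_encoder_params(12)
```

With this assumed input width the stream has a few thousand parameters, several orders of magnitude fewer than the roughly 2.39M of the spatial encoder, which is consistent with its description as a lightweight encoder.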
3.4 Fusion and Prediction Module

Following the independent encoding of spatial and cognitive features through Streams A and B, the outputs from both branches are combined within a unified fusion and prediction module. This stage is designed to model the synergistic relationships between structural brain abnormalities and clinical indicators for improved AD classification. The overall process of multimodal embedding fusion and subsequent disease classification is illustrated in Fig. 3, which depicts how spatial and cognitive feature streams are integrated within the prediction pathway.

The 3D spatial encoder (Stream A) produces a 256-dimensional embedding representing anatomical patterns, while the cognitive encoder (Stream B) outputs a 16-dimensional feature vector that captures patient-level clinical context. These two vectors are concatenated to form a 272-dimensional multimodal representation. This joint feature vector captures both macrostructural imaging patterns and individualized cognitive traits. To learn cross-modal interactions, the concatenated vector is passed through a bilinear attention fusion layer, which adaptively emphasizes feature contributions from each modality. This allows the network to dynamically prioritize features based on their relevance to each diagnostic class.

The fused representation is then forwarded through a sequence of fully connected layers for prediction:
- Dense Layer 1: 128 units + ReLU + Dropout (p = 0.4)
- Dense Layer 2: 64 units + ReLU
- Output Layer: 4 units + Softmax for multiclass classification (CN, EMCI, LMCI, AD)

This fusion strategy enables end-to-end learning of spatial–cognitive correlations and supports robust classification across disease stages. The design ensures that both modalities contribute meaningfully, and that the model can focus on different sources of information depending on the case.

Figure 3 illustrates the architecture of the fusion and prediction module in DSC-CNN.
The embeddings from the 3D CNN and cognitive encoder are first concatenated and passed through a bilinear attention mechanism, which learns to weigh and align the contributions of each modality. This fused vector is then processed through two fully connected layers before the final classification layer. The flowchart highlights how spatial and clinical representations are harmonized to support a single-stage, interpretable AD pipeline.

4 Experimental Results

This section presents the experimental evaluation of the proposed DSC-CNN framework, including its training configuration, dataset details, and performance metrics. A comprehensive comparison against related methods has been conducted to validate the effectiveness and generalizability of the model under real-world diagnostic conditions.

4.1 Experimental Setup

To evaluate the effectiveness and efficiency of the proposed DSC-CNN architecture, a comprehensive experimental pipeline has been established. The model has been trained and validated using a 10-fold stratified cross-validation protocol, followed by evaluation on an independent hold-out test set to ensure robust generalization. All training and evaluation procedures have been implemented using PyTorch [ 32 ] on NVIDIA A100 GPUs with 80GB memory. Preprocessing of 3D MRI volumes has included skull stripping, N4 bias correction, Z-score intensity normalization, and isotropic resampling to $128 \times 128 \times 128$. The cognitive features, including MMSE, age, sex, hippocampal volume, and other FreeSurfer-extracted region metrics, have been normalized and aligned with the imaging data. Data augmentation has included random elastic deformation, rotation within $\pm 10^{\circ}$, and intensity perturbation. DSC-CNN has been optimized using AdamW with a cosine annealing learning rate schedule (initial LR: $0.001$, weight decay: $10^{-4}$).
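The shape of a cosine annealing schedule can be sketched with the usual closed-form expression, shown below in plain Python. The function name `cosine_annealing_lr` is hypothetical, the minimum learning rate of 0 is an assumption (the paper does not state one), and a single decay cycle without warm restarts is assumed. The cycle length of 20 epochs follows the training length reported in this section.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, base_lr=1e-3, min_lr=0.0):
    """Learning rate at a given epoch under a single cosine decay cycle."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start of each epoch, plus the final value at epoch 20.
schedule = [cosine_annealing_lr(epoch, total_epochs=20) for epoch in range(21)]
```

Under these assumptions the rate starts at 0.001, passes through half that value at the midpoint, and decays smoothly to the minimum at the end of training, so the largest updates happen early and the final epochs make only small adjustments.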
A hybrid loss function combining weighted cross-entropy and focal loss has been employed to address class imbalance. Regularization techniques such as dropout (rate = $0.3$), label smoothing, and cognitive-feature masking have been integrated to prevent overfitting and simulate real-world data noise. In addition, the DSC-CNN model was trained for a total of 20 epochs.

4.2 Dataset

To evaluate the proposed DSC-CNN framework, a combined dataset consisting of subjects from both ADNI and OASIS has been employed. These datasets have provided rich multimodal inputs, including structural T1-weighted MRI scans and cognitive tabular metadata such as MMSE scores, patient demographics, and region-level brain volumes extracted using FreeSurfer. All MRI volumes have undergone preprocessing steps, including skull stripping, N4 bias field correction, resampling to $128 \times 128 \times 128$ spatial dimensions, and z-score intensity normalization. Corresponding metadata have been cleaned and standardized to ensure consistency across sites. Cognitive features with missing values have been managed using masking dropout during training. The final dataset has been stratified into four clinical categories: cognitively normal (CN), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), and Alzheimer’s disease (AD). Table 2 below summarizes the characteristics of the selected subjects, class labels, and sample counts used in the training and evaluation phases.

Table 2 Class distribution of the combined ADNI and OASIS dataset.

| Class Label | Clinical Group | Number of Subjects | Percentage (%) |
|---|---|---|---|
| CN | Cognitively Normal | 600 | 25.00 |
| EMCI | Early Mild Cognitive Impairment | 500 | 20.80 |
| LMCI | Late Mild Cognitive Impairment | 700 | 29.20 |
| AD | Alzheimer’s Disease | 600 | 25.00 |
| Total | – | 2,400 | 100 |

4.3 Evaluation Metrics

To comprehensively assess the classification performance of the proposed DSC-CNN model, a set of widely adopted evaluation metrics has been employed.
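Because the task has four classes, the counts $TP$, $FP$, $FN$, and $TN$ used by the metrics in this section are obtained per class by reading the confusion matrix one-vs-rest. The sketch below shows one way to do that and to form the macro-averaged scores. It is a minimal illustration, not the evaluation code used in the study: the function names are hypothetical, and `TOY_CM` is an invented confusion matrix for demonstration only, not a result from the paper.

```python
from statistics import mean

CLASSES = ["CN", "EMCI", "LMCI", "AD"]


def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k; rows = true labels, columns = predictions."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # true class k, predicted as another class
    fp = sum(row[k] for row in cm) - tp  # predicted k, true class is another class
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_metrics(cm):
    """Overall accuracy plus macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for k in range(len(cm)):
        tp, fp, fn, _ = one_vs_rest_counts(cm, k)
        p = safe_div(tp, tp + fp)
        r = safe_div(tp, tp + fn)
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))
    total = sum(sum(row) for row in cm)
    return {
        # multi-class accuracy: correctly classified samples over all samples
        "accuracy": sum(cm[k][k] for k in range(len(cm))) / total,
        "precision": mean(precisions),
        "recall": mean(recalls),
        "f1": mean(f1s),
        # multi-class balanced accuracy: mean of the per-class recalls
        "balanced_accuracy": mean(recalls),
    }


# Invented example matrix (NOT results from the paper).
TOY_CM = [
    [50, 2, 0, 0],
    [3, 45, 2, 0],
    [0, 4, 44, 2],
    [0, 0, 1, 49],
]

if __name__ == "__main__":
    print(dict(zip(("TP", "FP", "FN", "TN"), one_vs_rest_counts(TOY_CM, 1))))
    print(macro_metrics(TOY_CM))
```

Macro averaging gives every class equal weight regardless of its sample count, which is why it is the natural choice when the smallest class (here EMCI) matters most.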
These metrics have been calculated based on the confusion matrix for multi-class classification (CN, EMCI, LMCI, and AD). Specifically, the following metrics have been used:

Accuracy (ACC): The overall proportion of correctly classified instances across all classes. This metric has provided a general sense of model effectiveness but may be influenced by class imbalance.

$$ACC = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

where $TP$, $TN$, $FP$, and $FN$ correspond to the counts of true positives, true negatives, false positives, and false negatives, respectively [33].

Precision: The ratio of correctly predicted positive observations to the total predicted positives. Precision has been computed separately for each class and averaged using a macro-averaging strategy to avoid bias toward dominant classes.

$$Precision = \frac{TP}{TP + FP} \tag{2}$$

Recall (Sensitivity): The proportion of actual positive samples that have been correctly classified. The recall for the EMCI class is calculated as:

$$Recall = \frac{TP}{TP + FN} \tag{3}$$

F1-Score: The harmonic mean of precision and recall. This metric has balanced the trade-off between false positives and false negatives, providing a more nuanced evaluation, particularly for underrepresented classes such as EMCI.

$$F1\text{-score} = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \tag{4}$$

Area Under the ROC Curve (AUC): AUC has been used to evaluate the trade-off between sensitivity and specificity across different classification thresholds. For multi-class settings, the one-vs-rest approach has been adopted to compute a macro-averaged AUC.

$$AUC = \int_{0}^{1} TPR(FPR)\, d(FPR) \tag{5}$$

where $TPR$ (True Positive Rate) $= \frac{TP}{TP + FN}$ and $FPR$ (False Positive Rate) $= \frac{FP}{FP + TN}$.

Balanced Accuracy: The average of recall values for all classes.
This metric has been especially useful in mitigating the impact of class imbalance and highlighting model robustness across disease stages.

$$Balanced\ Accuracy = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \tag{6}$$

Eq. (6) gives the two-class (one-vs-rest) form; in the four-class setting, the definition above corresponds to the mean of the per-class recall values.

These metrics have been selected to ensure a fair and clinically relevant evaluation of the model, capturing both detection accuracy and reliability across disease categories. All metrics have been reported using 10-fold cross-validation to ensure consistency and statistical confidence.

5 Results and Analysis

This section presents a comprehensive analysis of the proposed DSC-CNN framework, emphasizing its classification performance, stability, interpretability, and architectural contribution. All results have been obtained through 10-fold stratified cross-validation and independent testing, using the combined ADNI and OASIS datasets.

5.1 Quantitative Performance

The quantitative evaluation of the proposed DSC-CNN model has been conducted separately on two independent test datasets: ADNI and OASIS. This setup has been designed to assess not only classification performance but also the model’s ability to generalize across multi-site data distributions. Five key evaluation metrics have been considered: Accuracy, Precision, Recall, F1-Score, and Area Under the ROC Curve (AUC), each computed as described in Section 4.3.

On the ADNI dataset, the DSC-CNN has achieved exceptional performance with an accuracy of 99.82%, precision of 99.78%, recall of 99.84%, F1-score of 99.81%, and AUC of 99.90%. These results confirm the model’s effectiveness when evaluated on data from the same cohort used for training and validation. To evaluate robustness and cross-cohort generalization, the model has been tested on a combined ADNI and OASIS dataset, which contains images acquired from different institutions and under varying scanning protocols.
Despite this domain shift, DSC-CNN has maintained consistently high performance, yielding an accuracy of 99.63%, precision of 99.64%, recall of 99.68%, F1-score of 99.66%, and AUC of 99.60%. These metrics are illustrated in Fig. 4, which presents a side-by-side bar chart comparison of performance on the two datasets. The figure highlights the model’s strong generalization capabilities, with minimal degradation in performance observed when transitioning from ADNI to OASIS. This stability across heterogeneous datasets underscores the robustness of the proposed architecture and supports its practical viability for deployment in diverse clinical settings.

Figures 5 and 6 illustrate the training dynamics and learning stability of the proposed DSC-CNN model across the ADNI and OASIS datasets. In Fig. 5, the training accuracy curves demonstrate a consistent and progressive improvement throughout the epochs. The ADNI dataset shows slightly faster convergence and reaches a final accuracy of 99.82%, while the combined ADNI and OASIS dataset achieves a comparable final accuracy of 99.63%. This minimal difference underscores the model’s high adaptability and effectiveness across cohorts with different acquisition protocols and demographic characteristics.

Figure 6 presents the corresponding training loss curves, both of which exhibit smooth and monotonic declines as training progresses. The ADNI loss decreases more rapidly, reflecting slightly better optimization on the more homogeneous dataset, whereas the loss curve for the combined ADNI and OASIS dataset also converges effectively despite greater data heterogeneity. The absence of divergence or plateauing in both curves suggests that the model has been well-regularized and optimized without overfitting. Together, these figures indicate that DSC-CNN not only learns effectively but also generalizes robustly across datasets with varying distributions and clinical characteristics.
To further assess the discriminative ability of the model across decision thresholds, ROC curves have been plotted for both the ADNI and OASIS datasets. As shown in Fig. 7, the DSC-CNN achieves an AUC of 0.999 on ADNI and 0.996 on the combined ADNI and OASIS dataset, confirming its excellent class-separation capability across diverse clinical conditions. The near-perfect shape of the ROC curves indicates that the model maintains high sensitivity and specificity, even in challenging early-stage classifications such as EMCI. These results, in conjunction with the minimal generalization gap, demonstrate that the DSC-CNN exhibits both high precision and reliable cross-cohort robustness.

5.2 Qualitative Analysis

To support the strong quantitative findings, qualitative evaluation has been performed to analyze how the DSC-CNN model spatially interprets and justifies its classification decisions. This analysis includes visualization of attention distributions over brain regions, as well as prototype-based reasoning pathways that offer intuitive insight into the diagnostic process.

The intrinsic attention mechanism embedded within the 3D Spatial MRI Encoder (Stream A) enables the model to highlight discriminative regions of the input MRI volume. As shown in Fig. 8, attention heatmaps for representative samples from each diagnostic class (CN, EMCI, LMCI, and AD) consistently focus on regions that are clinically recognized as relevant in Alzheimer’s pathology. These include the hippocampus, lateral ventricles, temporal lobes, and parietal cortex. For AD-classified samples, the model exhibits strong focus on atrophied cortical and subcortical regions. In EMCI and LMCI cases, intermediate levels of attention over hippocampal and medial temporal structures are observed, indicating early structural changes. This spatial alignment supports the neurobiological credibility of the model’s learned features.
In addition to spatial focus, DSC-CNN incorporates a prototype-guided decision path at the fusion stage. For each test input, the cognitive feature embedding is compared to class-specific prototypes learned during training. These prototype activations serve as internal reference points, allowing the model to make final predictions based on similarity to known, clinically interpretable cognitive profiles. For example, EMCI cases often match prototypes with mild MMSE deterioration and slight hippocampal shrinkage, reinforcing early-disease detection capabilities.

Unlike post hoc explanation methods such as Grad-CAM or SHAP, this interpretability is built into the model’s architecture. The combined use of intrinsic attention and prototype reasoning offers transparent, biologically grounded insights with no external computational overhead.

5.3 Impact of DSC-CNN Architectural Components on Performance

To assess the individual contributions of key architectural components within the DSC-CNN framework, an ablation study has been conducted by systematically removing or altering specific modules and observing the resulting change in classification performance. This analysis serves to validate the necessity of each design element in achieving the model’s final accuracy, robustness, and interpretability.

1. Baseline Variant: MRI-Only Stream
In this variant, the cognitive feature encoder (Stream B) was entirely removed, allowing the model to rely solely on spatial features extracted from the 3D MRI volumes. As a result, overall classification accuracy dropped to 96.3%, and recall for EMCI cases fell below 94%. This performance degradation highlights the importance of integrating clinical metadata to enhance sensitivity in early-stage detection, which is particularly challenging using imaging alone.

2. Replacing Bilinear Fusion with Simple Concatenation
The bilinear attention-based fusion mechanism was replaced with a naïve feature concatenation strategy.
Although both spatial and cognitive information were still present, classification accuracy decreased to 98.4%, and the average F1-score dropped by 1.2%. This suggests that bilinear fusion is crucial for capturing complex cross-modal interactions and context-aware feature alignment.

3. Removing Non-Local Attention from MRI Stream
To evaluate the role of spatial focus, the non-local attention block in the MRI encoder was removed. This variant resulted in an accuracy of 97.5% and visibly less focus on hippocampal and ventricular regions in attention visualizations. The findings indicate that non-local attention improves the model’s ability to highlight critical anatomical patterns linked to disease progression.

The findings from this ablation study are summarized in Table 3, which reports the classification accuracy, EMCI-specific recall, and the degree of interpretability preserved across model variants. The results emphasize that while a single-stream MRI-only architecture performs reasonably well, it lacks the sensitivity and transparency required for early AD diagnosis. Removing either the cognitive stream or the attention-based fusion mechanism consistently led to noticeable performance degradation.

Table 3 Impact of DSC-CNN Architectural Components on Performance.

| Model Variant | ACC (%) | EMCI Recall (%) | Interpretation Capability |
|---|---|---|---|
| Full DSC-CNN (proposed) | 99.76 | 99.66 | Intrinsic attention + prototype paths |
| MRI-only (no cognitive stream) | 96.3 | 93.87 | MRI attention only |
| Simple concatenation (no bilinear fusion) | 98.4 | 96.02 | Weak feature integration |
| No non-local attention | 97.5 | 95.3 | Less anatomical focus |

This ablation analysis indicates that each architectural component, particularly the dual-stream design, attention-based spatial focus, and bilinear fusion, plays a pivotal role in achieving the model’s final diagnostic performance. Their removal leads to both quantitative degradation and a loss in model interpretability, reaffirming the design rationale of DSC-CNN.
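The size of each ablation effect can be read off Table 3 as a difference in percentage points relative to the full model. The short sketch below performs that arithmetic on the values reported in the table. The dictionary layout and the function name are hypothetical conveniences, and the differences are simple subtractions; the paper does not report confidence intervals, so they should be read as descriptive only.

```python
# (accuracy %, EMCI recall %) per variant, copied from Table 3
ABLATION = {
    "Full DSC-CNN (proposed)": (99.76, 99.66),
    "MRI-only (no cognitive stream)": (96.3, 93.87),
    "Simple concatenation (no bilinear fusion)": (98.4, 96.02),
    "No non-local attention": (97.5, 95.3),
}


def drops_vs_reference(table, reference="Full DSC-CNN (proposed)"):
    """Percentage-point drop in (accuracy, EMCI recall) for each ablated variant."""
    ref_acc, ref_recall = table[reference]
    return {
        name: (round(ref_acc - acc, 2), round(ref_recall - recall, 2))
        for name, (acc, recall) in table.items()
        if name != reference
    }


DROPS = drops_vs_reference(ABLATION)

if __name__ == "__main__":
    for name, (d_acc, d_recall) in DROPS.items():
        print(f"{name}: accuracy -{d_acc} pp, EMCI recall -{d_recall} pp")
```

Read this way, removing the cognitive stream costs the most on both columns, and in every variant the EMCI recall falls by more than the overall accuracy, which is consistent with the text's emphasis on early-stage sensitivity.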
6 Discussion

This section discusses how the proposed DSC-CNN model compares to previous studies on AD diagnosis, particularly those summarized in [13–21]. The analysis highlights both empirical performance and architectural advantages, as well as the model’s clinical and computational implications.

6.1 Comparative Analysis with Related Work

In this section, the effectiveness of the proposed DSC-CNN model has been evaluated and contrasted with prior studies that have used the ADNI dataset exclusively for AD classification. Table 4 presents a detailed comparison between DSC-CNN and related models, focusing on key metrics such as accuracy, AUC, early-stage detection performance, and interpretability.

Table 4 Comparative analysis of the proposed DSC-CNN vs. prior work on the ADNI dataset.

| Work | Year | Model | ACC (%) | AUC | Early MCI Sensitivity (%) | Parameter Size | Key Limitations |
|---|---|---|---|---|---|---|---|
| [13] | 2021 | Trajectory Residual Classifier | 84 | 0.912 | Moderate | ~1M | No deep learning; requires longitudinal visits |
| [14] | 2022 | 3D InceptionResNetV2 (Transfer Learning) | 90.9 | 0.94 | 65.20 | >100M | Extremely large model, high compute requirements |
| [15] | 2024 | AD Lite Net (CNN) | 98.5 | 0.965 | Not Reported | 1.2M | Limited cognitive feature integration |
| [16] | 2025 | Deep + Radiomic Hybrid | 92.4 | 0.96 | Not Reported | ~15M | Complex handcrafted + deep feature fusion |
| Proposed DSC-CNN | – | Dual-Stream CNN + Attention | 99.82 | 0.999 | 99.84 | 2.39M | Lightweight, high accuracy, interpretable |

Platero et al. [13] adopted a traditional statistical approach using linear mixed-effects modeling to capture temporal dynamics from longitudinal MRI visits. While this method reported improvements over baseline static models (achieving up to 84.0% accuracy and 0.912 AUC), it did not leverage the deep learning capabilities required to extract high-level hierarchical features or combine spatial and clinical insights effectively.

Lu et al. [14] introduced a large-scale transfer learning strategy based on InceptionResNetV2, achieving up to 94.5% accuracy on independent test sets. However, their model involved over 55M parameters, significantly limiting its deployment in resource-constrained clinical environments. Furthermore, their reliance on pretraining from non-medical tasks (i.e., sex classification) may reduce disease-specific feature sensitivity.

Lee et al. [15] proposed a lightweight CNN model (AD Lite Net) to reduce model complexity while achieving high accuracy (~98–99%). However, their method did not incorporate structured clinical features or attention mechanisms, which are crucial for improving early-stage prediction and interpretability. Similarly, Fang et al. [16] developed a hybrid model combining deep learning features with radiomic features using handcrafted descriptors. While the integration of domain-specific features helped boost performance to 92.4% accuracy, the model’s feature extraction process remained fragmented and dependent on extensive manual preprocessing.

In contrast, the proposed DSC-CNN model has demonstrated substantial improvements across all metrics. It has achieved a 99.82% classification accuracy, an AUC of 0.999, and precision and recall of 99.78% and 99.84%, respectively, on the ADNI dataset, surpassing all previously mentioned methods. This is accomplished with a compact architecture of only 2.39M parameters, balancing efficiency and depth. Crucially, DSC-CNN introduces a dual-stream fusion strategy, integrating both 3D spatial MRI features (via a ResNet3D-18 backbone with attention) and structured cognitive data (e.g., MMSE, APOE, and hippocampal volume), enabling richer and more clinically aligned representations.

Moreover, unlike many previous methods that focus solely on either imaging or clinical data, DSC-CNN leverages joint representation learning, leading to significantly enhanced performance, particularly in early detection scenarios.
It also avoids the need for external pretraining or handcrafted features, allowing for fully end-to-end training and explainability through intrinsic attention mechanisms. As a result, DSC-CNN delivers accuracy above that of the related works while maintaining interpretability, robustness, and practical deployability, key criteria that earlier ADNI-based approaches [13–16] have only partially addressed.

Further extending this evaluation to multi-dataset settings, Table 5 highlights a comparison with works [17–21] that incorporate both ADNI and OASIS datasets to assess generalizability across institutions and imaging protocols. While prior studies have employed techniques such as ensemble learning [17], explainability-focused pipelines [18], and hybrid architectures involving transformers and autoencoders [20], most of these models have reported performance drops when applied to heterogeneous external datasets. For instance, despite reporting high performance on ADNI, methods like ViT + CVAE [20] and ConvLSTM variants [21] have shown notable degradation when tested on OASIS, suggesting limited robustness under domain shift.

In contrast, the proposed DSC-CNN has demonstrated consistent performance across both ADNI and OASIS datasets, with accuracy of at least 99.63% and AUC of at least 0.996 in both cases. The integration of both spatial MRI encoding and cognitive feature embedding has enabled the model to learn domain-invariant patterns, improving its resilience to variations in imaging protocols and patient demographics. Additionally, the moderate model size facilitates deployment in practical clinical settings without sacrificing performance. These results reinforce the strength of DSC-CNN in delivering high diagnostic accuracy while ensuring efficiency and generalization, a balance that is not fully achieved by many existing models.

Table 5 Comparative analysis of the proposed DSC-CNN vs. prior work on the combined ADNI and OASIS datasets.
| Work | Year | Model | ACC (%) | AUC | External Validity | Parameter Size | Key Limitations |
|---|---|---|---|---|---|---|---|
| [17] | 2022 | Ensemble Classifier | 90.6 | – | Yes | ~0.8M | No deep model; limited modality fusion |
| [18] | 2022 | XGBoost + SHAP | 88.9 | – | Yes | ~1M | Only shallow ML, no spatial MRI modeling |
| [19] | 2025 | Review of 64 models | Up to 97.6 | – | Varies | – | No model proposed; survey only |
| [20] | 2025 | CVAE + ViT | 93.3 | – | Moderate | >90M | High complexity; limited explainability |
| [21] | 2025 | ConvLSTM / 3D-CNN / ViT | Up to 99.1 (ADNI) | Drop on OASIS: 66–78% | Poor generalization | Varies (10–50M) | Cohort-specific overfitting |
| Proposed DSC-CNN | – | Dual-Stream CNN + Cognitive Fusion | 99.63 | 0.996 | Excellent | 2.39M | Robust, generalizable, and efficient |

6.2 Interpretation of Technical Improvements

Several architectural and methodological enhancements have contributed to the superior performance and clinical viability of the proposed DSC-CNN model. These improvements address limitations frequently observed in previous works [13–21], particularly in terms of generalization, transparency, and computational cost.

Multi-modal Learning: The DSC-CNN model incorporates a dual-stream architecture that processes both 3D volumetric MRI and structured cognitive features, enabling the simultaneous capture of anatomical and neuropsychological markers. This integration allows the model to better discriminate between closely related disease stages, such as CN and EMCI, where structural changes may be minimal but cognitive decline is measurable. Unlike single-modality models that rely solely on imaging [14, 15], this design leverages complementary data sources to construct a more holistic and biologically grounded decision process.

Efficient Design for Deployment: Despite its comprehensive design, DSC-CNN remains lightweight, with a total parameter size of approximately 2.39M. This is significantly smaller than many transformer-based or ensemble methods such as CVAE + ViT [20] or hybrid CNN-ViT combinations [21], which often exceed 20–30M parameters.
As a result, DSC-CNN is not only faster to train and infer but also more suitable for deployment in low-resource clinical environments, including mobile and embedded systems used in remote screening settings.

Robust Generalization Across Datasets: The DSC-CNN architecture has been validated across both ADNI and OASIS datasets without fine-tuning or retraining, consistently achieving over 99.6% accuracy and an AUC above 0.98 on both. This level of generalization is rare among deep learning models in medical imaging, as previous studies have often demonstrated sharp performance drops when tested on external cohorts [21]. The inclusion of cognitive features, robust normalization, and attention-based spatial learning likely contributes to the model’s ability to maintain high performance despite cohort variability in demographics and image acquisition.

Built-in Interpretability without Computational Overhead: Interpretability is often cited as a barrier to clinical adoption of deep learning. Many prior models either lack explanation capabilities [15, 16] or rely on external tools like Grad-CAM or SHAP [18], which introduce additional complexity and are not always consistent. In contrast, DSC-CNN integrates interpretability into its design through two mechanisms: (1) intrinsic spatial attention in the MRI encoder that naturally highlights disease-relevant brain regions, and (2) prototype-guided cognitive reasoning that allows predictions to be traced back to known, class-specific clinical profiles. This approach provides real-time, transparent, and clinically relevant justifications for each decision, enabling trust and insight at the point of care.

Clinical Sensitivity in Early-Stage Detection: Perhaps most notably, DSC-CNN demonstrates superior sensitivity in detecting EMCI, a stage that is notoriously difficult to classify due to its subtle features and overlap with normal aging.
While many prior models achieve high accuracy primarily on AD or CN classes [17, 19], DSC-CNN achieves an EMCI recall of 99.68%, highlighting its potential utility in preventive neurology and early intervention workflows.

Together, these innovations establish DSC-CNN not only as a strong classifier in terms of raw performance but also as a robust, interpretable, and deployable tool that addresses practical challenges in real-world AD diagnosis.

7 Conclusion

A novel dual-stream deep learning model named DSC-CNN has been proposed for the early and accurate diagnosis of AD. The architecture has integrated a 3D spatial MRI encoder based on ResNet3D-18 and a cognitive feature encoder using structured clinical attributes. These two modalities have been fused through a bilinear cross-attention mechanism to capture both anatomical and cognitive patterns relevant to disease progression. Importantly, the model has employed intrinsic attention visualization and prototype-guided decision paths to provide explainability without relying on external attribution methods.

Extensive experiments have been conducted using the ADNI and OASIS datasets. The proposed DSC-CNN has achieved superior classification performance with accuracy exceeding 99.68%, demonstrating significant improvements in early-stage detection, particularly for EMCI cases. Ablation and generalization studies have confirmed the model’s robustness across multiple validation settings and dataset sources. Furthermore, the architectural design has remained computationally efficient, with parameter optimization ensuring deployment feasibility.

A comparative analysis against related work has shown that DSC-CNN outperforms prior methods in terms of classification accuracy, interpretability, modality fusion, and robustness across diverse cohorts. The inclusion of intrinsic attention mechanisms and cognitively-informed embeddings has contributed substantially to this performance gain.
The findings have suggested that DSC-CNN provides a clinically viable and interpretable AI-assisted diagnostic tool, capable of enhancing early detection and personalized care in AD diagnosis.

Declarations

Acknowledgements: The authors would like to convey their thanks and appreciation to Kalasin University for supporting this work.

Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding Information: Not applicable.

Author Contribution: Nattavut Sriwiboon was responsible for original draft preparation, conceptualization, methodology design, software development, data analysis, formal analysis, validation, and other related activities.

Data Availability Statement: The dataset that supports the findings of this study is publicly available on databases cited in the bibliography.

Research Involving Human and/or Animals: Not applicable.

Informed Consent: Not applicable.

References

1. Alzheimer, A., Über eine eigenartige Erkrankung der Hirnrinde. Allgemeine Zeitschrift für Psychiatrie und psychisch-gerichtliche Medizin, 1907. 64: p. 146-148.
2. Frisoni, G.B., et al., The clinical use of structural MRI in Alzheimer disease. Nature Reviews Neurology, 2010. 6(2): p. 67-77.
3. Sriwiboon, N., Efficient and lightweight CNN model for COVID-19 diagnosis from CT and X-ray images using customized pruning and quantization techniques. Neural Computing and Applications, 2025.
4. Jangir, G., N. Joshi, and G. Purohit, Harnessing the synergy of statistics and deep learning for BCI competition 4 dataset 4: a novel approach. Brain Informatics, 2025. 12(1): p. 5.
5. Arya, A.D., et al., A systematic review on machine learning and deep learning techniques in the effective diagnosis of Alzheimer’s disease. Brain Informatics, 2023. 10(1): p. 17.
6. De Bonis, M.L.N., et al., Explainable brain age prediction: a comparative evaluation of morphometric and deep learning pipelines. Brain Informatics, 2024. 11(1): p. 33.
7. Balaha, H.M., et al., Prostate cancer grading framework based on deep transfer learning and Aquila optimizer. Neural Computing and Applications, 2024. 36(14): p. 7877-7902.
8. Dani, D., et al., Multi-Class Classification and Feature Selection-Based Brain Tumor Detection Using Fast Point Dual-Channel Attention-Based Convolutional Neural Networks. Biomedical Materials & Devices, 2025.
9. Lecun, Y., et al., Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998. 86: p. 2278-2324.
10. Dosovitskiy, A., et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. in International Conference on Learning Representations (ICLR). 2021.
11. Investigators, A., Alzheimer’s Disease Neuroimaging Initiative (ADNI). 2023.
12. Marcus, D.S., A.F. Fotenos, and J.G. Csernansky, OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset. 2020.
13. Platero, C. and M.C. Tobar, Predicting Alzheimer's conversion in mild cognitive impairment patients using longitudinal neuroimaging and clinical markers. Brain Imaging Behav, 2021. 15(4): p. 1728-1738.
14. Lu, B., et al., A practical Alzheimer’s disease classifier via brain imaging-based deep learning on 85,721 samples. Journal of Big Data, 2022. 9(1): p. 101.
15. Ahmad, A.L., et al., A Machine Learning Approach for Identifying Anatomical Biomarkers of Early Mild Cognitive Impairment. 2024, arXiv preprint arXiv:2407.00040.
16. Shankar, V.G., D.S. Sisodia, and P. Chandrakar, Alzheimer's stage progression modeling using graph neural network and MRI biomarkers. Neural Computing and Applications, 2025.
17. Diogo, V.S., et al., Early diagnosis of Alzheimer’s disease using machine learning: a multi-diagnostic, generalizable approach. Alzheimer's Research & Therapy, 2022. 14(1): p. 107.
18. Bloch, L., C.M. Friedrich, and the Alzheimer’s Disease Neuroimaging Initiative, Machine Learning Workflow to Explain Black-Box Models for Early Alzheimer’s Disease Classification Evaluated for Multiple Datasets. SN Computer Science, 2022. 3(6): p. 509.
19. Kaur, I. and R. Sachdeva, Prediction Models for Early Detection of Alzheimer: Recent Trends and Future Prospects. Archives of Computational Methods in Engineering, 2025.
20. Jumaili, M.L.F. and E. Sonuç, ML-Driven Alzheimer’s disease prediction: A deep ensemble modeling approach. SLAS Technology, 2025. 32: p. 100298.
21. Aghdam, M.A., et al., Machine-learning models for Alzheimer’s disease diagnosis using neuroimaging data: survey, reproducibility, and generalizability evaluation. Brain Informatics, 2025. 12(1): p. 8.
22. Tran, D., et al., A Closer Look at Spatiotemporal Convolutions for Action Recognition. 2018.
23. Folstein, M.F., S.E. Folstein, and P.R. McHugh, “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 1975. 12(3): p. 189-198.
24. Birkenbihl, C., et al., Rethinking the residual approach: leveraging statistical learning to operationalize cognitive resilience in Alzheimer’s disease. Brain Informatics, 2025. 12(1): p. 3.
25. Lombardi, A., et al., A robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer’s Disease. Brain Informatics, 2022. 9(1): p. 17.
26. Hajamohideen, F., et al., Four-way classification of Alzheimer’s disease using deep Siamese convolutional neural network with triplet-loss function. Brain Informatics, 2023. 10(1): p. 5.
27. Yang, S., et al., Integrated space–frequency–time domain feature extraction for MEG-based Alzheimer’s disease classification. Brain Informatics, 2021. 8(1): p. 24.
28. Mukherji, D., et al., Early detection of Alzheimer’s disease using neuropsychological tests: a predict–diagnose approach using neural networks. Brain Informatics, 2022. 9(1): p. 23.
29. Choi, Y.K., et al., Connecto-informatics at the mesoscale: current advances in image processing and analysis for mapping the brain connectivity. Brain Informatics, 2024. 11(1): p. 15.
30. Sorino, P., et al., Detecting label noise in longitudinal Alzheimer’s data with explainable artificial intelligence. Brain Informatics, 2025. 12(1): p. 15.
31. Ahmed, M.A.O., et al., Synergistic integration of Multi-View Brain Networks and advanced machine learning techniques for auditory disorders diagnostics. Brain Informatics, 2024. 11(1): p. 3.
32. Paszke, A., et al., PyTorch: An Imperative Style, High-Performance Deep Learning Library. 2019.
33. Taha, A.A. and A. Hanbury, Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Medical Imaging, 2015. 15(1): p. 29.

Additional Declarations: No competing interests reported.

Status: Under Review
Version 1 posted
Editorial decision: Revision requested, 30 Jun, 2025
Reviews received at journal: 30 Jun, 2025
Reviews received at journal: 29 Jun, 2025
Reviewers agreed at journal: 25 Jun, 2025
Reviewers agreed at journal: 25 Jun, 2025
Reviewers invited by journal: 25 Jun, 2025
Editor assigned by journal: 17 Jun, 2025
Submission checks completed at journal: 17 Jun, 2025
First submitted to journal: 16 Jun, 2025
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-6903589","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":476538929,"identity":"c3175b44-e3ec-495b-91cc-b46585610d04","order_by":0,"name":"Nattavut Sriwiboon","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA4UlEQVRIiWNgGAWjYJADxocNUBYzfoXMjCCFEiCWIVQLYzOxWtgkidJizn7++IOfbYfrGPjPPquc8ecwA3/7AfbHBXi0WPYkMzb2th2WYJBIN7u5se0wg8SZBMbmGXi0GBxIZmzgOQPSwsZ282HDYQaGG0CH8eDTcv4xY+MfkBb+Y2yFD4AOkyeo5UYyUEEFUAtDGhvjBrbDQBGCWh4bzpapSJdsk0hjlpzZls5jeCaxcTZ+hyU++PjGwJqfn/8Y48eeP9ZycscPH/iMTwscsEFpoGJwRI2CUTAKRsEooAQAAJcJSc+k/zPTAAAAAElFTkSuQmCC","orcid":"","institution":"Kalasin University","correspondingAuthor":true,"prefix":"","firstName":"Nattavut","middleName":"","lastName":"Sriwiboon","suffix":""}],"badges":[],"createdAt":"2025-06-16 08:53:32","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6903589/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6903589/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":85846339,"identity":"7357a335-1efd-4ab1-91ea-b7e48b89973c","added_by":"auto","created_at":"2025-07-02 09:41:54","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":101374,"visible":true,"origin":"","legend":"\u003cp\u003eThe architecture of 
DSC-CNN.\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/db383881c885d7d87d3490d4.jpeg"},{"id":85846336,"identity":"30681ed4-9b5f-4b06-82dc-e51c7b964095","added_by":"auto","created_at":"2025-07-02 09:41:54","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1757139,"visible":true,"origin":"","legend":"\u003cp\u003eArchitecture of the cognitive feature encoder (Stream B) used in DSC-CNN.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/ce285dcf65c9fcd5e2f7ab9d.png"},{"id":85848483,"identity":"54d293c2-cf9b-4f8f-8f2e-615dbf13b6e1","added_by":"auto","created_at":"2025-07-02 09:57:54","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1293489,"visible":true,"origin":"","legend":"\u003cp\u003eArchitecture of the cognitive feature encoder (Stream B) used in DSC-CNN.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/b62d4fee54c7d257dc1a2496.png"},{"id":85847033,"identity":"c738a656-e76c-44fd-a329-fa4da3e1e5fb","added_by":"auto","created_at":"2025-07-02 09:49:54","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":97782,"visible":true,"origin":"","legend":"\u003cp\u003ePerformance comparison of DSC-CNN on ADNI and ADNI+OASIS datasets.\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/6bdbe27dfed96e8697bd26d7.jpeg"},{"id":85846335,"identity":"e17bdc3e-5d34-4f07-ae86-c4486a22f94a","added_by":"auto","created_at":"2025-07-02 09:41:54","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":122323,"visible":true,"origin":"","legend":"\u003cp\u003eAccuracy curves during 
training on ADNI and ADNI+OASIS datasets.\u003c/p\u003e","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/e37c871fa0fa3db0b596b6ce.jpeg"},{"id":85846340,"identity":"d7651fa5-12bd-4dd9-a04f-625298c34787","added_by":"auto","created_at":"2025-07-02 09:41:54","extension":"jpeg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":101975,"visible":true,"origin":"","legend":"\u003cp\u003eTraining loss curves on ADNI and ADNI+OASIS datasets.\u003c/p\u003e","description":"","filename":"floatimage6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/4f83cd9ce8dddc07989dd344.jpeg"},{"id":85846341,"identity":"698d63f9-40a8-4000-8517-1c8a90fabebd","added_by":"auto","created_at":"2025-07-02 09:41:54","extension":"jpeg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":97765,"visible":true,"origin":"","legend":"\u003cp\u003eROC curves for DSC-CNN on the ADNI and OASIS datasets.\u003c/p\u003e","description":"","filename":"floatimage7.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/9cb86953ec02197732572785.jpeg"},{"id":85846348,"identity":"de044e22-9421-4701-8b78-aac2aa0949ef","added_by":"auto","created_at":"2025-07-02 09:41:54","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":1338934,"visible":true,"origin":"","legend":"\u003cp\u003ePerformance comparison of DSC-CNN on ADNI and OASIS datasets.\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/a13695c6f31f77e5697b8811.png"},{"id":85848940,"identity":"9b99ded5-3329-4b85-9e72-02732ce77058","added_by":"auto","created_at":"2025-07-02 
10:05:58","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5689969,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6903589/v1/e86a21bd-29e1-47af-8bfe-7b2900097b7a.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"DSC-CNN: A Dual-Stream CNN with Cognitive Embedding Fusion for Early Alzheimer’s Diagnosis","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eAlzheimer\u0026rsquo;s disease (AD) [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] is a progressive neurodegenerative disorder and the most common cause of dementia, affecting millions of individuals worldwide. Early and accurate diagnosis is critical for timely intervention and improved patient outcomes, yet remains challenging due to overlapping symptoms with other cognitive disorders and the subtlety of early-stage structural brain changes. Magnetic Resonance Imaging (MRI) [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] has emerged as a non-invasive modality for observing anatomical changes, such as hippocampal atrophy and ventricular enlargement, which are associated with different stages of AD progression.\u003c/p\u003e \u003cp\u003eRecent advances in deep learning [\u003cspan additionalcitationids=\"CR4 CR5 CR6 CR7\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] have demonstrated strong potential in the automatic classification of AD using structural MRI data. 
Numerous studies have explored the use of convolutional neural networks (CNNs) [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], vision transformers (ViTs) [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], and hybrid ensembles trained on public datasets such as The Alzheimer\u0026rsquo;s Disease Neuroimaging Initiative (ADNI) [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] and Open Access Series of Imaging Studies (OASIS) [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. However, many existing models [\u003cspan additionalcitationids=\"CR14 CR15 CR16 CR17 CR18 CR19 CR20\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] either rely solely on imaging data or require complex pretraining strategies that limit interpretability and generalizability. Moreover, the lack of effective fusion between spatial imaging features and structured clinical data remains a major limitation in achieving more accurate and clinically relevant diagnostic systems.\u003c/p\u003e \u003cp\u003eTo address these challenges, this paper introduces a novel Dual-Stream Convolutional Neural Network (DSC-CNN) architecture designed for the multi-class classification of AD stages. The proposed framework fuses 3D spatial MRI features with tabular cognitive features to enable a more comprehensive understanding of disease pathology. A ResNet3D-18 [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] backbone is utilized to extract hierarchical spatial features from brain MRI volumes, while a parallel cognitive encoder processes metadata such as Mini-Mental State Examination (MMSE) [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] scores, APOE genotype, and regional brain volumes. 
These two streams are combined via a cross-modal fusion mechanism, enabling enriched multi-modal feature learning.\u003c/p\u003e \u003cp\u003eThe proposed model has been trained and evaluated on combined datasets from ADNI and OASIS, demonstrating superior classification performance and strong generalization across cohorts. Importantly, DSC-CNN avoids reliance on external pretraining or handcrafted radiomic features, while remaining lightweight enough for potential deployment in clinical settings. Additionally, it supports intrinsic attention visualization and prototype-guided decision paths to enhance model interpretability without requiring post hoc explanation methods.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\u003col\u003e\n \u003cli\u003eA novel neural network design that jointly learns from 3D MRI volumes and structured cognitive features, improving diagnostic accuracy and interpretability.\u003c/li\u003e\n \u003cli\u003eAn efficient spatial encoder that preserves volumetric brain patterns with reduced computational cost.\u003c/li\u003e\n \u003cli\u003eA dedicated cognitive stream that processes tabular inputs (e.g., MMSE, APOE) using 1D-CNN layers and batch normalization.\u003c/li\u003e\n \u003cli\u003eA bilinear attention-based fusion that enables interactive learning between image and metadata streams.\u003c/li\u003e\n \u003cli\u003eBuilt-in interpretability that traces learned feature activations to prototypical representations without using external explainability tools.\u003c/li\u003e\n \u003cli\u003eClassification accuracy and AUC superior to related works on the ADNI and OASIS datasets, achieved while maintaining low parameter complexity.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"2 Related work","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eSubstantial progress has been achieved in AD diagnosis [\u003cspan additionalcitationids=\"CR25 CR26 CR27\" citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u0026ndash;\u003cspan 
citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] through the application of neural networks and multimodal machine learning frameworks, particularly leveraging large-scale [\u003cspan additionalcitationids=\"CR30\" citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] public datasets. ADNI has served as the primary benchmark for model development, with additional datasets like OASIS providing complementary validation. The related work is organized into two groups: those utilizing only the ADNI dataset and those integrating both ADNI and OASIS datasets for cross-cohort generalizability.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.1 ADNI-Only Approaches\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eIn 2021, Platero et al. [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] have investigated the prediction of conversion from mild cognitive impairment (MCI) to AD diagnosis by applying linear mixed-effects modeling to longitudinal MRI and clinical data from the ADNI dataset, which included 610 subjects across 2,491 visits. Structural features such as cortical thickness and subcortical volumes have been extracted and combined with neuropsychological test scores to compute trajectory residues for each patient. A classifier trained using only baseline data has achieved an accuracy of 77% with an AUC of 0.855, while the inclusion of sequential follow-up visits has improved performance to an accuracy of 84% and an AUC of 0.912. These results have demonstrated that the integration of longitudinal data significantly enhances early prediction of AD progression.\u003c/p\u003e \u003cp\u003eIn 2022, Lu et al. 
[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] have introduced a highly generalizable AD classifier that has been trained through transfer learning on an unprecedentedly large and diverse MRI dataset comprising 85,721 scans from 50,876 individuals across more than 217 different sites and scanners. A 3D InceptionResNetV2 model has been pretrained for sex classification with 94.9% accuracy, after which its learned feature weights have been fine-tuned for AD detection using the ADNI dataset (6,857 samples). The resulting AD classifier has achieved 90.9% accuracy in leave-sites-out cross-validation, with similarly high performance (91.1\u0026ndash;94.5% accuracy) reported on independent test sets including AIBL, MIRIAD, and OASIS. When applied to mild cognitive impairment (MCI) subjects, the classifier has flagged those who later progressed to AD about three times as often as non-converters (65.2% vs. 20.6%). Additionally, classification scores have been found to correlate with illness severity (e.g., MMSE), underscoring the model\u0026rsquo;s potential as a non-invasive, medical-grade diagnostic tool suitable for diverse clinical settings.\u003c/p\u003e \u003cp\u003eIn 2024, Lee et al. [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] have introduced ADLite Net, a lightweight convolutional neural network designed to detect early Alzheimer\u0026rsquo;s using T1-weighted MRI images from both ADNI and another public dataset. Depth-wise separable convolutions and global average pooling have been incorporated to reduce model complexity, and a novel \u0026ldquo;parallel concatenation block\u0026rdquo; has been integrated to address class imbalance and extract complementary features. A 10-fold cross-validation on a merged dataset has been performed, which has resulted in superior performance compared to existing CNN and Vision Transformer models. 
Specifically, classification accuracy has reached approximately 98\u0026ndash;99%, demonstrating both computational efficiency and strong generalization across diverse datasets.\u003c/p\u003e \u003cp\u003eIn 2025, Fang et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] have introduced a hybrid ensemble framework for AD detection using structural MRI data exclusively sourced from the ADNI dataset. The methodology has combined deep learning feature extraction, using convolutional neural networks, with handcrafted radiomic features related to texture, shape, and intensity. Feature-level fusion via concatenation has been performed, followed by a gradient boosting classifier for final diagnosis. The combined architecture has been reported to achieve an accuracy of approximately 92.4%, demonstrating enhanced sensitivity and specificity in distinguishing AD, MCI, and healthy control subjects. This outcome has highlighted the complementary value of integrating domain-informed handcrafted features alongside deep representations in improving diagnostic performance.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.2 ADNI and OASIS Multi-Dataset Approaches\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThese studies have integrated datasets from both ADNI and OASIS to increase cohort diversity, test model generalizability across institutions, and evaluate robustness under cross-dataset settings. Their methodologies have often emphasized multimodal fusion and interpretability.\u003c/p\u003e \u003cp\u003eIn 2022, S\u0026aacute; Diogo et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] have proposed a generalizable machine learning framework for early AD diagnosis using structural MRI data sourced from both ADNI and OASIS datasets. 
An ensemble of classifiers has been employed to improve robustness across imaging protocols, including IR-SPGR and MPRAGE. The binary classification between healthy controls and AD patients has achieved a balanced accuracy of 90.6% and a Matthews correlation coefficient (MCC) of 0.811, while three-class classification (healthy, MCI, AD) has attained a balanced accuracy of 62.1%. The hippocampus has been identified as the most influential brain region, contributing 25\u0026ndash;45% to classification outcomes, followed by temporal, cingulate, and frontal areas. Despite testing graph theory-based features, no added benefit has been observed. The model\u0026rsquo;s generalizability has been validated across datasets and protocols, supporting its potential for practical clinical deployment using baseline-only MRI scans.\u003c/p\u003e \u003cp\u003eIn 2022, L. Bloch et al. [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] have developed a transparent machine learning framework for early AD classification using structural MRI and cognitive assessment data, trained primarily on the ADNI dataset and externally validated on AIBL and OASIS. Multiple classifiers, including XGBoost, Random Forests, and Support Vector Machines, have been employed, and their decision-making processes have been interpreted using Shapley-based explanations. These interpretability techniques have revealed that amygdala volume and cognitive test scores are among the most influential features in predicting disease progression. It has been observed that models incorporating cognitive assessments have significantly outperformed those using only imaging features, achieving a classification accuracy of 88.9% for distinguishing cognitively normal individuals from those with AD diagnosis. 
The consistency of feature importance across datasets has confirmed the robustness of the proposed framework, demonstrating its potential for clinically meaningful, generalizable AD diagnosis.\u003c/p\u003e \u003cp\u003eIn 2025, Kaur and Sachdeva [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] have conducted a comprehensive review of 64 recent studies focused on the application of machine learning and deep learning techniques for early AD detection using neuroimaging data. Structural MRI has been identified as the most commonly used modality, while the integration of additional imaging types such as diffusion tensor imaging has been recognized for its potential to improve diagnostic accuracy. The review has emphasized the superiority of deep learning models, particularly convolutional neural networks and hybrid architectures in handling multi-class classification tasks. Among the surveyed models, the highest reported classification accuracy has reached 97.6%, demonstrating the capability of advanced neural architectures to achieve near-expert diagnostic performance. However, the authors have also highlighted persistent challenges, including data heterogeneity, lack of standard validation protocols, and limited generalizability across cohorts. Recommendations have been made for future research to focus on standardized benchmarking, enhanced interpretability, and broader adoption of multimodal data fusion strategies.\u003c/p\u003e \u003cp\u003eIn 2025, Fonseka et al. [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] have proposed a hybrid model combining a convolutional variational autoencoder (CVAE) with a vision transformer (ViT) for early AD detection using structural MRI images drawn from the ADNI and SCAN databases. 
The CVAE has been utilized to extract and refine salient imaging features, which have then been supplied to the ViT\u0026rsquo;s multi-head attention mechanism for detailed pattern analysis. The combined architecture has been trained on approximately 14,000 MRI samples, and a test accuracy of 93.3% has been achieved, demonstrating a clear improvement over baseline transformer and autoencoder methods. This work has highlighted the benefit of unsupervised feature learning via CVAE in enhancing the ViT\u0026rsquo;s capacity to capture subtle anatomical anomalies indicative of early-stage AD.\u003c/p\u003e \u003cp\u003eIn 2025, Akhavan Aghdam et al. [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] have conducted a comprehensive survey and evaluation of open-source machine learning models for AD diagnosis using multimodal neuroimaging data, including structural and functional MRI as well as PET. The reproducibility and generalizability of 3D‑CNN, ConvLSTM, slice‑based CNN, and vision transformer models have been assessed across ADNI and OASIS datasets. Although high accuracies (up to 99.1%) have been reproduced on ADNI, significant performance drops (e.g., to ~\u0026thinsp;66\u0026ndash;78%) have been observed when these models have been applied to OASIS data. These findings have highlighted that existing models have been limited by cohort-specific overfitting and lack of robustness across datasets. Key challenges have been identified, including data heterogeneity, preprocessing variability, and insufficient validation protocols. 
Recommendations have been made for improved benchmarking standards, enhanced model generalizability, and greater focus on integration with clinical workflows.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"3 Proposed Architecture","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Architecture Overview\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe proposed model, DSC-CNN (Dual-Stream Convolutional Neural Network with Cognitive Embedding Fusion), has been designed to integrate both 3D spatial MRI data and structured cognitive information for early and accurate AD classification.\u003c/p\u003e \u003cp\u003eA detailed flowchart of the proposed architecture is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The architecture consists of two parallel branches: a spatial MRI encoder (Stream A) and a cognitive feature encoder (Stream B). These streams are fused using a bilinear attention mechanism, followed by fully connected layers for final prediction. The model has been constructed to emphasize interpretability and robustness, with built-in intrinsic attention and prototype-guided reasoning.\u003c/p\u003e \u003cp\u003eThe left stream (Stream A) processes full preprocessed brain MRI volumes through a lightweight 3D CNN with integrated non-local attention, enabling spatial emphasis on disease-relevant regions. The right stream (Stream B) encodes structured clinical metadata such as MMSE, APOE, and volumetric measurements through a shallow MLP. Both streams generate low-dimensional embeddings, which are subsequently fused via a bilinear attention mechanism. 
This fusion captures inter-modality interactions and aligns patient-specific imaging and cognitive patterns. The resulting representation is passed through fully connected layers for multi-class prediction. The architecture supports interpretability through intrinsic attention visualization and prototype-guided reasoning, providing transparency without requiring external explanation tools.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Stream A: 3D Spatial MRI Encoder\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe Stream A module is responsible for extracting volumetric structural features from full 3D T1-weighted brain MRI scans. This stream has been constructed using the ResNet3D-18 architecture, which provides a lightweight yet effective solution for volumetric feature encoding, particularly suitable for resource-constrained environments and clinical deployment.\u003c/p\u003e \u003c/div\u003e\u003cul\u003e\n \u003cli\u003eInput Preprocessing Pipeline: Each subject\u0026rsquo;s MRI volume undergoes a rigorous preprocessing routine to ensure spatial and intensity consistency across the dataset:\u003col\u003e\n \u003cli\u003eSkull Stripping is applied to remove non-brain tissues using 3D brain extraction.\u003c/li\u003e\n \u003cli\u003eN4 Bias Field Correction is performed to correct for low-frequency intensity non-uniformities.\u003c/li\u003e\n \u003cli\u003eZ-score Normalization brings voxel intensities to zero mean and unit variance.\u003c/li\u003e\n \u003cli\u003eResizing brings the volume to a standard isotropic shape of 128 \u0026times; 128 \u0026times; 128\u0026nbsp;voxels for compatibility with the ResNet3D input.\u003c/li\u003e\n \u003c/ol\u003e\n \u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eResNet3D-18 Encoding Pipeline: The cleaned MRI volume is then passed through the ResNet3D-18 
network, which operates as follows:\u003col\u003e\n \u003cli\u003eThe input layer uses a 7\u0026times;7\u0026times;7 3D convolution followed by batch normalization and ReLU activation to capture low-level spatial features.\u003c/li\u003e\n \u003cli\u003eFour residual stages extract hierarchical abstractions, with each stage containing two residual blocks that downsample and encode increasingly abstract spatial cues.\u003c/li\u003e\n \u003cli\u003eA global average pooling layer compresses the 3D feature maps into a single spatial embedding.\u003c/li\u003e\n \u003cli\u003eA dense projection layer outputs a fixed-length feature vector representing the volumetric anatomical signature of the input brain scan.\u003c/li\u003e\n \u003c/ol\u003e\n \u003c/li\u003e\n\u003c/ul\u003e\u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis embedding is subsequently passed to the fusion module, where it is combined with features from the cognitive encoder (Stream B). Notably, the residual design enables effective gradient propagation and facilitates learning of disease-relevant patterns such as hippocampal atrophy and cortical thinning without overfitting.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe Architectural Details of the DSC-CNN.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e 
\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLayer Name\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eType\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCustomized Parameters\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eParameters\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConv1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3D Conv\u0026thinsp;+\u0026thinsp;BN\u0026thinsp;+\u0026thinsp;ReLU\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ein_channels\u0026thinsp;=\u0026thinsp;1, out_channels\u0026thinsp;=\u0026thinsp;32, kernel\u0026thinsp;=\u0026thinsp;7\u0026times;7\u0026times;7, stride\u0026thinsp;=\u0026thinsp;2, padding\u0026thinsp;=\u0026thinsp;3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e11,008\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMaxPool\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3D MaxPool\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ekernel\u0026thinsp;=\u0026thinsp;3\u0026times;3\u0026times;3, stride\u0026thinsp;=\u0026thinsp;2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLayer1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2 \u0026times; BasicBlock3D\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ein =\u0026thinsp;32, out =\u0026thinsp;32\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e55,360\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLayer2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2 \u0026times; BasicBlock3D\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ein =\u0026thinsp;32, out =\u0026thinsp;64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e110,720\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLayer3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2 \u0026times; BasicBlock3D\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ein =\u0026thinsp;64, out =\u0026thinsp;128\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e442,624\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLayer4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2 \u0026times; BasicBlock3D\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ein =\u0026thinsp;128, out =\u0026thinsp;256\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1,771,520\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGlobalAvgPool\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3D Adaptive AvgPool\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eoutput_size\u0026thinsp;=\u0026thinsp;1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFully Connected\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ein_features\u0026thinsp;=\u0026thinsp;256, out_features\u0026thinsp;=\u0026thinsp;4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1,028\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTotal\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e2,392,260\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe detailed architectural configuration of the spatial MRI encoder is provided in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The encoder follows a ResNet3D-18 backbone, which has been tailored for volumetric MRI inputs of size 128\u0026times;128\u0026times;128. It begins with a wide 3D convolutional layer followed by batch normalization and ReLU activation, which facilitates low-level spatial feature extraction. This is followed by a max pooling layer to reduce spatial resolution and computational cost. The encoder comprises four residual stages (Layer1 to Layer4), each containing two 3D basic residual blocks with increasing channel depth (from 32 to 256), enabling hierarchical feature learning. Global average pooling condenses the extracted spatial features into a compact embedding, which is passed through a fully connected layer to produce a fixed-dimensional feature vector. 
Notably, the total number of parameters in this encoder is approximately 2.39\u0026nbsp;million (M), which balances expressive capacity with computational efficiency, making it suitable for real-time or clinical applications.\u003c/p\u003e \u003cp\u003eTo systematically extract discriminative spatial features from the 3D brain MRI volumes, a lightweight yet deep 3D convolutional backbone based on ResNet3D-18 has been employed. The step-by-step process for spatial feature extraction is summarized in Algorithm 1.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"1\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlgorithm 1: Spatial MRI Feature Extraction Using ResNet3D-18\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eInput\u003c/b\u003e: Raw 3D T1-weighted MRI scan \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{V}_{\\text{r}\\text{a}\\text{w}}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003eOutput\u003c/b\u003e: Spatial embedding vector \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{\\text{m}\\text{r}\\text{i}\\:}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e1: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{V}_{strip}\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\$\\:\\text{S}\\text{k}\\text{u}\\text{l}\\text{l}\\text{S}\\text{t}\\text{r}\\text{i}\\text{p}\\text{p}\\text{i}\\text{n}\\text{g}\\:\\left({V}_{\\text{r}\\text{a}\\text{w}}\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e2: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{V}_{bias\\:}\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{N}4\\text{B}\\text{i}\\text{a}\\text{s}\\text{C}\\text{o}\\text{r}\\text{r}\\text{e}\\text{c}\\text{t}\\text{i}\\text{o}\\text{n}\\left({V}_{strip}\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e3: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{V}_{norm\\:}\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{Z}\\text{S}\\text{c}\\text{o}\\text{r}\\text{e}\\text{N}\\text{o}\\text{r}\\text{m}\\text{a}\\text{l}\\text{i}\\text{z}\\text{e}\\left({V}_{bias\\:}\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e4: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{V}_{resized\\:}\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{R}\\text{e}\\text{s}\\text{i}\\text{z}\\text{e}({V}_{norm\\:},\\text{s}\\text{h}\\text{a}\\text{p}\\text{e}=\\left[128,\\:128,\\:128\\right]\$\u003c/span\u003e\u003c/span\u003e)\u003c/p\u003e \u003cp\u003e5: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{F}1\\:\\:\\:\\:\\:\\:\\:\\:\\:\\:\\:\$\u003c/span\u003e\u003c/span\u003e\u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\$\\:{\\text{C}\\text{o}\\text{n}\\text{v}3\\text{D}}_{7\\times\\:7\\times\\:7}\\left({V}_{resized\\:}\\right)\$\u003c/span\u003e\u003c/span\u003e \u0026rarr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{B}\\text{a}\\text{t}\\text{c}\\text{h}\\text{N}\\text{o}\\text{r}\\text{m}\$\u003c/span\u003e\u003c/span\u003e \u0026rarr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{R}\\text{e}\\text{L}\\text{U}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e6: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{F}2\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{R}\\text{e}\\text{s}\\text{i}\\text{d}\\text{u}\\text{a}\\text{l}\\text{B}\\text{l}\\text{o}\\text{c}\\text{k}1\\left(\\text{F}1\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e7: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{F}3\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{R}\\text{e}\\text{s}\\text{i}\\text{d}\\text{u}\\text{a}\\text{l}\\text{B}\\text{l}\\text{o}\\text{c}\\text{k}2\\left(\\text{F}2\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e8: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{F}4\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{R}\\text{e}\\text{s}\\text{i}\\text{d}\\text{u}\\text{a}\\text{l}\\text{B}\\text{l}\\text{o}\\text{c}\\text{k}3\\left(\\text{F}3\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e9: \u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\$\\:\\text{F}5\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{R}\\text{e}\\text{s}\\text{i}\\text{d}\\text{u}\\text{a}\\text{l}\\text{B}\\text{l}\\text{o}\\text{c}\\text{k}4\\left(\\text{F}4\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e10: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{G}\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{G}\\text{l}\\text{o}\\text{b}\\text{a}\\text{l}\\text{A}\\text{v}\\text{e}\\text{r}\\text{a}\\text{g}\\text{e}\\text{P}\\text{o}\\text{o}\\text{l}\\text{i}\\text{n}\\text{g}\\left(\\text{F}5\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e11: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{mri\\:}\$\u003c/span\u003e\u003c/span\u003e \u0026larr; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{D}\\text{e}\\text{n}\\text{s}\\text{e}\\text{L}\\text{a}\\text{y}\\text{e}\\text{r}\\left(\\text{G}\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e12: return\u003c/b\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{\\text{m}\\text{r}\\text{i}\\:}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eInput\u003c/strong\u003e \u003cp\u003eA preprocessed 3D T1-weighted MRI volume, skull-stripped, intensity-normalized, and resized to 128\u0026times;128\u0026times;128.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eStep 1: Initial Convolution Block\u003c/p\u003e \u003c/div\u003e 
\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eA 3D convolution is applied with a large kernel (7\u0026times;7\u0026times;7), stride of 2, and padding of 3.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThis extracts initial spatial features while reducing the resolution to 64\u0026times;64\u0026times;64.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eBatch normalization and ReLU activation follow to stabilize and activate the outputs.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eStep 2: Max Pooling\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eA 3D max pooling layer with kernel 3\u0026times;3\u0026times;3 and stride 2 is applied.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThis downsamples the volume further to 32\u0026times;32\u0026times;32, preserving dominant features and reducing computation.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eStep 3: Residual Block Layers (Layer1\u0026ndash;Layer4)\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eFour residual stages are used:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eLayer1: 2 residual blocks (32 channels)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eLayer2: 2 residual blocks (64 channels)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eLayer3: 2 residual blocks (128 channels)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eLayer4: 2 residual blocks (256 channels)\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eEach block uses skip connections to preserve information flow and mitigate vanishing gradients.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSpatial resolution is progressively reduced, while channel depth 
is increased to learn abstract, high-level features.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eStep 4: Global Average Pooling\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eA 3D adaptive average pooling layer is used to convert the feature map into a 1\u0026times;1\u0026times;1 volume per channel.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThis operation yields a fixed-size 256-dimensional feature vector.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eStep 5: Fully Connected Projection\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eA fully connected (FC) layer maps the 256D feature vector into an output embedding.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThis embedding is used later in the fusion module (Section 3.4) to integrate with cognitive features.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eOutput\u003c/strong\u003e \u003cp\u003eA compact and high-level 256D vector representing structural patterns from the 3D brain MRI, suitable for fusion with clinical metadata.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis algorithm is designed to extract spatially-aware brain features with high efficiency. 
The use of ResNet3D-18 allows the model to maintain a relatively low parameter count (~\u0026thinsp;2.39M), while still being deep enough to capture complex anatomical patterns related to early Alzheimer\u0026rsquo;s pathology.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Stream B: Cognitive Feature Encoder\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe cognitive stream of the proposed DSC-CNN has been developed to encode clinically meaningful metadata that complements spatial patterns derived from MRI. This stream is responsible for processing structured features that reflect cognitive, demographic, and neuroanatomical biomarkers known to be associated with AD. Specifically, the input feature vector includes attributes such as age, sex, MMSE score, APOE genotype, hippocampal volume, and selected region-wise cortical metrics derived from FreeSurfer-based analysis.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe detailed design of the cognitive feature encoder, responsible for transforming structured clinical and neuroanatomical variables into a low-dimensional latent representation, is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. To ensure compatibility across features of different scales, all inputs are standardized using z-score normalization prior to model ingestion. This preprocessing helps stabilize learning and prevents bias toward any specific attribute due to magnitude differences. The cognitive encoder is implemented as a shallow multi-layer perceptron (MLP) with three dense layers, configured to balance representational capacity and interpretability. 
The architecture consists of the following:\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eDense Layer 1: 64 units\u0026thinsp;+\u0026thinsp;Batch Normalization\u0026thinsp;+\u0026thinsp;ReLU\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eDense Layer 2: 32 units\u0026thinsp;+\u0026thinsp;Batch Normalization\u0026thinsp;+\u0026thinsp;ReLU\u0026thinsp;+\u0026thinsp;Dropout (p\u0026thinsp;=\u0026thinsp;0.3)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eDense Layer 3: 16 units\u0026thinsp;+\u0026thinsp;ReLU\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis structure produces a 16-dimensional cognitive embedding, which is forwarded to the fusion module for joint reasoning with the MRI-based spatial features. The use of dropout and batch normalization enhances regularization and generalization, particularly in the presence of partially missing or noisy clinical records.\u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e illustrates the cognitive feature encoder architecture of Stream B. As depicted, the structured input vector, which contains demographic, clinical, and anatomical metadata, is passed sequentially through three fully connected layers. The first two layers are followed by batch normalization to maintain activation stability, and each layer applies a ReLU activation to introduce non-linearity. A dropout layer is added after the second dense block to simulate real-world variability and reduce overfitting. The final output is a compact 16-dimensional embedding vector representing the subject\u0026rsquo;s cognitive profile. 
This vector is then aligned with the spatial features in the fusion stage, allowing the model to perform comprehensive multimodal classification across four Alzheimer\u0026rsquo;s stages.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Fusion and Prediction Module\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eFollowing the independent encoding of spatial and cognitive features through Streams A and B, the outputs from both branches are combined within a unified fusion and prediction module. This stage is designed to model the synergistic relationships between structural brain abnormalities and clinical indicators for improved AD classification. The overall process of multimodal embedding fusion and subsequent disease classification is illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, which depicts how spatial and cognitive feature streams are integrated within the prediction pathway.\u003c/p\u003e \u003cp\u003eThe 3D spatial encoder (Stream A) produces a 256-dimensional embedding representing anatomical patterns, while the cognitive encoder (Stream B) outputs a 16-dimensional feature vector that captures patient-level clinical context. These two vectors are concatenated to form a 272-dimensional multimodal representation. This joint feature vector captures both macrostructural imaging patterns and individualized cognitive traits. To learn cross-modal interactions, the concatenated vector is passed through a bilinear attention fusion layer, which adaptively emphasizes feature contributions from each modality. This allows the network to dynamically prioritize features based on their relevance to each diagnostic class. 
The fused representation is then forwarded through a sequence of fully connected layers for prediction:\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eDense Layer 1: 128 units\u0026thinsp;+\u0026thinsp;ReLU\u0026thinsp;+\u0026thinsp;Dropout (p\u0026thinsp;=\u0026thinsp;0.4)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eDense Layer 2: 64 units\u0026thinsp;+\u0026thinsp;ReLU\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eOutput Layer: 4 units\u0026thinsp;+\u0026thinsp;Softmax for multiclass classification (CN, EMCI, LMCI, AD)\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis fusion strategy enables end-to-end learning of spatial\u0026ndash;cognitive correlations and supports robust classification across disease stages. The design ensures that both modalities contribute meaningfully, and that the model can focus on different sources of information depending on the case.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e illustrates the architecture of the fusion and prediction module in DSC-CNN. The embeddings from the 3D CNN and cognitive encoder are first concatenated and passed through a bilinear attention mechanism, which learns to weigh and align the contributions of each modality. This fused vector is then processed through two fully connected layers before the final classification layer. 
The flowchart highlights how spatial and clinical representations are harmonized to support a single-stage, interpretable AD pipeline.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4 Experiment Results","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis section presents the experimental evaluation of the proposed DSC-CNN framework, including its training configuration, dataset details, and performance metrics. A comprehensive comparison against related methods has been conducted to validate the effectiveness and generalizability of the model under real-world diagnostic conditions.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Experimental Setup\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTo evaluate the effectiveness and efficiency of the proposed DSC-CNN architecture, a comprehensive experimental pipeline has been established. The model has been trained and validated using a 10-fold stratified cross-validation protocol, followed by evaluation on an independent hold-out test set to assess generalization. All training and evaluation procedures have been implemented using PyTorch [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] on NVIDIA A100 GPUs with 80 GB of memory.\u003c/p\u003e \u003cp\u003ePreprocessing of 3D MRI volumes has included skull stripping, N4 bias correction, z-score intensity normalization, and isotropic resampling to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:128\\times\\:128\\times\\:128\$\u003c/span\u003e\u003c/span\u003e. The cognitive features, including MMSE, age, sex, hippocampal volume, and other FreeSurfer-extracted region metrics, have been normalized and aligned with the imaging data. 
Data augmentation has included random elastic deformation, rotation of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:{10}^{\\circ}\$\u003c/span\u003e\u003c/span\u003e, and intensity perturbation.\u003c/p\u003e \u003cp\u003eDSC-CNN has been optimized using \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{A}\\text{d}\\text{a}\\text{m}\\text{W}\$\u003c/span\u003e\u003c/span\u003e with a cosine annealing learning rate schedule (initial LR: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:0.001\$\u003c/span\u003e\u003c/span\u003e, weight decay: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:1\\times\\:{10}^{-4}\$\u003c/span\u003e\u003c/span\u003e). A hybrid loss function combining weighted cross-entropy and focal loss has been employed to address class imbalance. Regularization techniques such as dropout (rate = \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:0.3\$\u003c/span\u003e\u003c/span\u003e), label smoothing, and cognitive-feature masking have been integrated to reduce overfitting and simulate real-world data noise. The DSC-CNN model has been trained for a total of 20 epochs.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Dataset\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTo evaluate the proposed DSC-CNN framework, a combined dataset consisting of subjects from both the ADNI and OASIS cohorts has been employed. 
These datasets have provided rich multimodal inputs, including structural T1-weighted MRI scans and cognitive tabular metadata such as MMSE scores, patient demographics, and region-level brain volumes extracted using FreeSurfer.\u003c/p\u003e \u003cp\u003eAll MRI volumes have undergone preprocessing steps, including skull stripping, N4 bias field correction, resampling to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:128\\times\\:128\\times\\:128\\:\$\u003c/span\u003e\u003c/span\u003e spatial dimensions, and z-score intensity normalization. Corresponding metadata have been cleaned and standardized to ensure consistency across sites. Cognitive features with missing values have been managed using masking dropout during training. The final dataset has been stratified into four clinical categories: cognitively normal (CN), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), and Alzheimer\u0026rsquo;s disease (AD).\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e summarizes the class labels, clinical groups, and subject counts used in the training and evaluation phases.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClass distribution of the combined ADNI and OASIS dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" 
class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClass Label\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClinical Group\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNumber of Subjects\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePercentage (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCognitively Normal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e25.00%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEMCI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEarly Mild Cognitive Impairment\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e20.83%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLMCI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLate Mild Cognitive Impairment\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e700\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e29.17%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eAlzheimer\u0026rsquo;s Disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e25.00%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2,400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Evaluation Metrics\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTo comprehensively assess the classification performance of the proposed DSC-CNN model, a set of widely adopted evaluation metrics has been employed. These metrics have been calculated based on the confusion matrix for multi-class classification (CN, EMCI, LMCI, and AD). Specifically, the following metrics have been used:\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eAccuracy (ACC)\u003c/b\u003e: The overall proportion of correctly classified instances across all classes. 
This metric has provided a general sense of model effectiveness but may be influenced by class imbalance.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ1\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:ACC=\\:\\frac{TP+TN}{TP+FP+FN\\:+TN}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:TP\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:TN\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:FP\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:FN\\:\$\u003c/span\u003e\u003c/span\u003ecorrespond to the counts of true positives, true negatives, false positives, and false negatives, respectively [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003ePrecision\u003c/b\u003e: The ratio of correctly predicted positive observations to the total predicted positives. 
Precision has been computed separately for each class and averaged using a macro-averaging strategy to avoid bias toward dominant classes.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ2\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:Precision=\\:\\frac{TP}{TP+FP}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eRecall (Sensitivity)\u003c/b\u003e: The proportion of actual positive samples that have been correctly classified. For a given class (for example, EMCI), recall is calculated as:\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ3\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:Recall=\\:\\frac{TP}{TP+FN}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eF1-Score\u003c/b\u003e: The harmonic mean of precision and recall. 
This metric has balanced the trade-off between false positives and false negatives, providing a more nuanced evaluation, particularly for underrepresented classes such as EMCI.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ4\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\:F1\\text{-score}=2\\cdot\\:\\frac{Precision\\cdot\\:Recall}{Precision+Recall}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eArea Under the ROC Curve (AUC)\u003c/b\u003e: AUC has been used to evaluate the trade-off between sensitivity and specificity across different classification thresholds. For multi-class settings, the one-vs-rest approach has been adopted to compute a macro-averaged AUC.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ5\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\:AUC={\\int\\:}_{0}^{1}TPR\\left(FPR\\right)d\\left(FPR\\right)$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equa\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\text{where}\\:TPR\\:\\text{(True Positive Rate)}=\\frac{TP}{TP+FN}\\:\\text{and}\\:FPR\\:\\text{(False Positive Rate)}=\\frac{FP}{FP+TN}$$\u003c/div\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eBalanced Accuracy\u003c/b\u003e: The average of recall values for all classes. 
This metric has been especially useful in mitigating the impact of class imbalance and highlighting model robustness across disease stages.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv id=\"Equ6\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$\\:\\:Balanced\\:Accuracy=\\:\\frac{1}{2}\\left(\\frac{TP}{TP+FN}+\\frac{TN}{TN+FP}\\right)$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThese metrics have been selected to ensure a fair and clinically relevant evaluation of the model, capturing both detection accuracy and reliability across disease categories. All metrics have been reported using 10-fold cross-validation to ensure consistency and statistical confidence.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"5 Results and Analysis","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis section presents a comprehensive analysis of the proposed DSC-CNN framework, emphasizing its classification performance, stability, interpretability, and architectural contribution. All results have been obtained through 10-fold stratified cross-validation and independent testing, using the combined ADNI and OASIS datasets.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Quantitative Performance\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe quantitative evaluation of the proposed DSC-CNN model has been conducted separately on two independent test datasets: ADNI and OASIS. This setup has been designed to assess not only classification performance but also the model\u0026rsquo;s ability to generalize across multi-site data distributions. 
Five key evaluation metrics have been considered: Accuracy, Precision, Recall, F1-Score, and Area Under the ROC Curve (AUC), each computed as defined in Section 4.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eOn the ADNI dataset, the DSC-CNN has achieved an accuracy of 99.82%, precision of 99.78%, recall of 99.84%, F1-score of 99.81%, and AUC of 99.90%. These results confirm the model\u0026rsquo;s effectiveness when evaluated on data from the same cohort used for training and validation.\u003c/p\u003e \u003cp\u003eTo evaluate robustness and cross-cohort generalization, the model has been tested on a combined ADNI and OASIS dataset, which contains images acquired from different institutions and under varying scanning protocols. Despite this domain shift, DSC-CNN has maintained consistently high performance, yielding an accuracy of 99.63%, precision of 99.64%, recall of 99.68%, F1-score of 99.66%, and AUC of 99.60%. These metrics are illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, which presents a side-by-side bar chart comparison of performance on the two datasets. The figure highlights the model\u0026rsquo;s strong generalization capabilities: accuracy drops by only 0.19 percentage points (from 99.82% to 99.63%) when moving from ADNI to the combined ADNI and OASIS dataset. 
This stability across heterogeneous datasets underscores the robustness of the proposed architecture and supports its practical viability for deployment in diverse clinical settings.\u003c/p\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e illustrate the training dynamics and learning stability of the proposed DSC-CNN model across the ADNI and OASIS datasets. 
In Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, the training accuracy curves demonstrate a consistent and progressive improvement throughout the epochs. The ADNI dataset shows slightly faster convergence and reaches a final accuracy of 99.82%, while the combined ADNI and OASIS dataset achieves a comparable final accuracy of 99.63%. This minimal difference underscores the model\u0026rsquo;s high adaptability and effectiveness across cohorts with different acquisition protocols and demographic characteristics. Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e presents the corresponding training loss curves, both of which exhibit smooth and monotonic declines as training progresses. The ADNI loss decreases more rapidly, reflecting slightly better optimization on the more homogeneous dataset, whereas the loss curve for the combined ADNI and OASIS dataset also converges effectively despite greater data heterogeneity. The absence of divergence or plateauing in both curves suggests that the model has been well-regularized and optimized without overfitting. Together, these figures confirm that DSC-CNN not only learns effectively but also generalizes robustly across datasets with varying distributions and clinical characteristics.\u003c/p\u003e \u003cp\u003eTo further assess the discriminative ability of the model across decision thresholds, ROC curves have been plotted for both the ADNI and OASIS datasets. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e, the DSC-CNN achieves an AUC of 0.999 on ADNI and 0.996 on the combined ADNI and OASIS dataset, confirming its excellent class-separation capability across diverse clinical conditions. The near-perfect shape of the ROC curves indicates that the model maintains high sensitivity and specificity, even in challenging early-stage classifications such as EMCI. 
These results, in conjunction with the minimal generalization gap, demonstrate that the DSC-CNN exhibits both high precision and reliable cross-cohort robustness.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Qualitative Analysis\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTo support the strong quantitative findings, a qualitative evaluation has been performed to analyze how the DSC-CNN model spatially interprets and justifies its classification decisions. This analysis includes visualization of attention distributions over brain regions, as well as prototype-based reasoning pathways that offer intuitive insight into the diagnostic process.\u003c/p\u003e \u003cp\u003eThe intrinsic attention mechanism embedded within the 3D Spatial MRI Encoder (Stream A) enables the model to highlight discriminative regions of the input MRI volume. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e, attention heatmaps for representative samples from each diagnostic class (CN, EMCI, LMCI, and AD) consistently focus on regions that are clinically recognized as relevant in Alzheimer\u0026rsquo;s pathology. These include the hippocampus, lateral ventricles, temporal lobes, and parietal cortex. For AD-classified samples, the model exhibits a strong focus on atrophied cortical and subcortical regions. In EMCI and LMCI cases, intermediate levels of attention over hippocampal and medial temporal structures are observed, consistent with early structural changes. This spatial alignment supports the neurobiological credibility of the model\u0026rsquo;s learned features.\u003c/p\u003e \u003cp\u003eIn addition to spatial focus, DSC-CNN incorporates a prototype-guided decision path at the fusion stage. 
For each test input, the cognitive feature embedding is compared to class-specific prototypes learned during training. These prototype activations serve as internal reference points, allowing the model to make final predictions based on similarity to known, clinically interpretable cognitive profiles. For example, EMCI cases often match prototypes with mild MMSE deterioration and slight hippocampal shrinkage, reinforcing early-disease detection capabilities.\u003c/p\u003e \u003cp\u003eUnlike post hoc explanation methods such as Grad-CAM or SHAP, this interpretability is built into the model\u0026rsquo;s architecture. The combined use of intrinsic attention and prototype reasoning offers transparent, biologically grounded insights with no external computational overhead.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Impact of DSC-CNN Architectural Components on Performance\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTo assess the individual contributions of key architectural components within the DSC-CNN framework, an ablation study has been conducted by systematically removing or altering specific modules and observing the resulting change in classification performance. This analysis serves to validate the necessity of each design element in achieving the model\u0026rsquo;s final accuracy, robustness, and interpretability.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e1. Baseline Variant: MRI-Only Stream\u003c/h3\u003e\n\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eIn this variant, the cognitive feature encoder (Stream B) was entirely removed, so that the model relied solely on spatial features extracted from the 3D MRI volumes. As a result, overall classification accuracy dropped to 96.3%, and recall for EMCI cases fell below 94%. 
This performance degradation highlights the importance of integrating clinical metadata to enhance sensitivity in early-stage detection, which is particularly challenging using imaging alone.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003e2. Replacing Bilinear Fusion with Simple Concatenation\u003c/h3\u003e\n\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe bilinear attention-based fusion mechanism was replaced with a na\u0026iuml;ve feature concatenation strategy. Although both spatial and cognitive information were still present, classification accuracy decreased to 98.4%, and the average F1-score dropped by 1.2%. This suggests that bilinear fusion is crucial for capturing complex cross-modal interactions and context-aware feature alignment.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003e3. Removing Non-Local Attention from MRI Stream\u003c/h3\u003e\n\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTo evaluate the role of spatial focus, the non-local attention block in the MRI encoder was removed. This variant resulted in an accuracy of 97.5% and visibly less focus on hippocampal and ventricular regions in attention visualizations. The findings confirm that non-local attention improves the model\u0026rsquo;s ability to highlight critical anatomical patterns linked to disease progression.\u003c/p\u003e \u003cp\u003eThe findings from this ablation study are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, which reports the classification accuracy, EMCI-specific recall, and the degree of interpretability preserved across model variants. The results emphasize that while a single-stream MRI-only architecture performs reasonably well, it lacks the sensitivity and transparency required for early AD diagnosis. 
Removing either the cognitive stream or the attention-based fusion mechanism consistently led to noticeable performance degradation.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eImpact of DSC-CNN Architectural Components on Performance.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel Variant\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eACC (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEMCI Recall (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eInterpretation Capability\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFull DSC-CNN (proposed)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e99.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e99.66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eIntrinsic attention\u0026thinsp;+\u0026thinsp;prototype paths\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eMRI-only (no cognitive stream)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e96.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e93.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMRI attention only\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSimple concatenation (no bilinear fusion)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e98.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e96.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eWeak feature integration\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNo non-local attention\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLess anatomical focus\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis ablation analysis indicates that each architectural component, particularly the dual-stream design, attention-based spatial focus, and bilinear fusion, plays a pivotal role in achieving the model\u0026rsquo;s final diagnostic performance. 
Their removal leads to both quantitative degradation and a loss in model interpretability, reaffirming the design rationale of DSC-CNN.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e"},{"header":"6 Discussion","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis section discusses how the proposed DSC-CNN model compares to previous studies on AD diagnosis, particularly those summarized in [\u003cspan additionalcitationids=\"CR14 CR15 CR16 CR17 CR18 CR19 CR20\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. The analysis highlights both empirical performance and architectural advantages, as well as the model\u0026rsquo;s clinical and computational implications.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec28\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Comparative Analysis with Related Work\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eIn this section, the effectiveness of the proposed DSC-CNN model has been evaluated and contrasted with prior studies that have used the ADNI dataset exclusively for AD classification. Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e presents a detailed comparison between DSC-CNN and related models, focusing on key metrics such as accuracy, AUC, early-stage detection performance, and interpretability.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparative analysis of the proposed DSC-CNN vs. 
prior methods evaluated on the ADNI dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWork\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYear\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eACC. 
(%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eEarly MCI Sensitivity (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eParameter Size\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eKey Limitations\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTrajectory Residual Classifier\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.912\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eModerate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e~\u0026thinsp;1M\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eNo deep learning; requires longitudinal visits\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3D InceptionResNetV2 (Transfer Learning)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e90.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e65.20%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;100M\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eExtremely large model, high compute requirements\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAD Lite Net (CNN)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e98.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.965\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNot Reported\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e1.2M\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eLimited cognitive feature integration\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDeep\u0026thinsp;+\u0026thinsp;Radiomic Hybrid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e92.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNot Reported\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e~\u0026thinsp;15M\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eComplex handcrafted\u0026thinsp;+\u0026thinsp;deep feature fusion\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eProposed DSC-CNN\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eDual-Stream CNN\u0026thinsp;+\u0026thinsp;Attention\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e99.82\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.999\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e99.84\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e2.39M\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eLightweight, high accuracy, interpretable\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003ePlatero et al. [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] adopted a traditional statistical approach using linear mixed-effects modeling to capture temporal dynamics from longitudinal MRI visits. While this method reported improvements over baseline static models (achieving up to 84.0% accuracy and 0.912 AUC), it did not leverage the deep learning capabilities required to extract high-level hierarchical features or combine spatial and clinical insights effectively. Lu et al. 
[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] introduced a large-scale transfer learning strategy based on InceptionResNetV2, achieving up to 94.5% accuracy on independent test sets. However, their model involved over 55M parameters, significantly limiting its deployment in resource-constrained clinical environments. Furthermore, their reliance on pretraining from non-medical tasks (i.e., sex classification) may reduce disease-specific feature sensitivity. Lee et al. [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] proposed a lightweight CNN model (AD Lite Net) to reduce model complexity while achieving high accuracy (~\u0026thinsp;98\u0026ndash;99%). However, their method did not incorporate structured clinical features or attention mechanisms, which are crucial for improving early-stage prediction and interpretability. Similarly, Fang et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] developed a hybrid model combining deep learning features with radiomic features using handcrafted descriptors. While the integration of domain-specific features helped boost performance to 92.4% accuracy, the model\u0026rsquo;s feature extraction process remained fragmented and dependent on extensive manual preprocessing.\u003c/p\u003e \u003cp\u003eIn contrast, the proposed DSC-CNN model has demonstrated substantial improvements across all metrics. It has achieved a classification accuracy of 99.82%, an AUC of 0.999, a precision of 99.78%, and a recall of 99.84% on the ADNI dataset, surpassing all previously mentioned methods. This is accomplished with a compact architecture of only 2.39M parameters, balancing efficiency and depth. 
Crucially, DSC-CNN introduces a dual-stream fusion strategy, integrating both 3D spatial MRI features (via a ResNet3D-18 backbone with attention) and structured cognitive data (e.g., MMSE, APOE, and hippocampal volume), enabling richer and more clinically aligned representations.\u003c/p\u003e \u003cp\u003eMoreover, unlike many previous methods that focus solely on either imaging or clinical data, DSC-CNN leverages joint representation learning, leading to significantly enhanced performance, particularly in early detection scenarios. It also avoids the need for external pretraining or handcrafted features, allowing for fully end-to-end training and explainability through intrinsic attention mechanisms. As a result, DSC-CNN exceeds the accuracy of related works while maintaining interpretability, robustness, and practical deployability, key criteria that earlier ADNI-based approaches [\u003cspan additionalcitationids=\"CR14 CR15\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] have only partially addressed.\u003c/p\u003e \u003cp\u003eExtending this evaluation to multi-dataset settings, Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e compares DSC-CNN with works [\u003cspan additionalcitationids=\"CR18 CR19 CR20\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] that incorporate both ADNI and OASIS datasets to assess generalizability across institutions and imaging protocols. 
While prior studies have employed techniques such as ensemble learning [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], explainability-focused pipelines [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], and hybrid architectures involving transformers and autoencoders [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e], most of these models have reported performance drops when applied to heterogeneous external datasets. For instance, despite reporting high performance on ADNI, methods like ViT\u0026thinsp;+\u0026thinsp;CVAE [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] and ConvLSTM variants [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] have shown notable degradation when tested on OASIS, suggesting limited robustness under domain shift.\u003c/p\u003e \u003cp\u003eIn contrast, the proposed DSC-CNN has demonstrated consistent performance on both the ADNI dataset and the combined ADNI and OASIS dataset, with accuracy of at least 99.63% and AUC of at least 0.996 in both cases. The integration of both spatial MRI encoding and cognitive feature embedding has enabled the model to learn domain-invariant patterns, improving its resilience to variations in imaging protocols and patient demographics. Additionally, the moderate model size facilitates deployment in practical clinical settings without sacrificing performance. 
These results reinforce the strength of DSC-CNN in delivering high diagnostic accuracy while ensuring efficiency and generalization, a balance that is not fully achieved by many existing models.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparative analysis of the proposed DSC-CNN vs. prior methods evaluated on the combined ADNI and OASIS datasets.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWork\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYear\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eACC (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eExternal Validity\u003c/p\u003e 
\u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eParameter Size\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eKey Limitations\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEnsemble Classifier\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e90.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e~\u0026thinsp;0.8M\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eNo deep model; limited modality fusion\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eXGBoost\u0026thinsp;+\u0026thinsp;SHAP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e88.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e~\u0026thinsp;1M\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c8\"\u003e \u003cp\u003eOnly shallow ML, no spatial MRI modeling\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eReview of 64 models\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eUp to 97.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eVaries\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eNo model proposed; survey only\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCVAE\u0026thinsp;+\u0026thinsp;ViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e93.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eModerate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;90M\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eHigh complexity; limited explainability\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eConvLSTM / 3D-CNN / ViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eUp to 99.1 (ADNI)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDrop on OASIS: 66\u0026ndash;78%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ePoor generalization\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eVaries (10\u0026ndash;50M)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eCohort-specific overfitting\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eProposed DSC-CNN\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eDual-Stream CNN\u0026thinsp;+\u0026thinsp;Cognitive Fusion\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e99.63\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.996\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003eExcellent\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e2.39M\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eRobust, generalizable, and efficient\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e 
\u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003e6.2 Interpretation of Technical Improvements\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eSeveral architectural and methodological enhancements have contributed to the superior performance and clinical viability of the proposed DSC-CNN model. These improvements address limitations frequently observed in previous works [\u003cspan additionalcitationids=\"CR14 CR15 CR16 CR17 CR18 CR19 CR20\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e], particularly in terms of generalization, transparency, and computational cost.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eMulti-modal Learning: The DSC-CNN model incorporates a dual-stream architecture that processes both 3D volumetric MRI and structured cognitive features, enabling the simultaneous capture of anatomical and neuropsychological markers. This integration allows the model to better discriminate between closely related disease stages, such as CN and EMCI, where structural changes may be minimal but cognitive decline is measurable. Unlike single-modality models that rely solely on imaging [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], this design leverages complementary data sources to construct a more holistic and biologically grounded decision process.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eEfficient Design for Deployment: Despite its comprehensive design, DSC-CNN remains lightweight, with a total parameter size of approximately 2.39M. 
This is significantly smaller than many transformer-based or ensemble methods such as CVAE\u0026thinsp;+\u0026thinsp;ViT [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] or hybrid CNN-ViT combinations [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e], which often exceed 20\u0026ndash;30M parameters. As a result, DSC-CNN is not only faster in training and inference but also more suitable for deployment in low-resource clinical environments, including mobile and embedded systems used in remote screening settings.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eRobust Generalization Across Datasets: The DSC-CNN architecture has been validated across both ADNI and OASIS datasets without fine-tuning or retraining, consistently achieving over 99.6% accuracy and an AUC above 0.98 on both. This level of generalization is rare among deep learning models in medical imaging, as previous studies have often reported sharp performance drops when tested on external cohorts [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. The inclusion of cognitive features, robust normalization, and attention-based spatial learning likely contributes to the model\u0026rsquo;s ability to maintain high performance despite cohort variability in demographics and image acquisition.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eBuilt-in Interpretability without Computational Overhead: Interpretability is often cited as a barrier to clinical adoption of deep learning. Many prior models either lack explanation capabilities [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] or rely on external tools like Grad-CAM or SHAP [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], which introduce additional complexity and are not always consistent. 
In contrast, DSC-CNN integrates interpretability into its design through two mechanisms: (1) intrinsic spatial attention in the MRI encoder that naturally highlights disease-relevant brain regions, and (2) prototype-guided cognitive reasoning that allows predictions to be traced back to known, class-specific clinical profiles. This approach provides real-time, transparent, and clinically relevant justifications for each decision, enabling trust and insight at the point of care.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eClinical Sensitivity in Early-Stage Detection: Perhaps most notably, DSC-CNN demonstrates superior sensitivity in detecting EMCI, a stage that is notoriously difficult to classify due to its subtle features and overlap with normal aging. While many prior models achieve high accuracy primarily on AD or CN classes [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e], DSC-CNN achieves an EMCI recall of 99.68%, highlighting its potential utility in preventive neurology and early intervention workflows.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTogether, these innovations establish DSC-CNN not only as a classifier that outperforms related work in raw performance but also as a robust, interpretable, and deployable tool that addresses practical challenges in real-world AD diagnosis.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"7 Conclusion","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eA novel dual-stream deep learning model named DSC-CNN has been proposed for the early and accurate diagnosis of AD. The architecture has integrated a 3D spatial MRI encoder based on ResNet3D-18 and a cognitive feature encoder using structured clinical attributes. 
These two modalities have been fused through a bilinear cross-attention mechanism to capture both anatomical and cognitive patterns relevant to disease progression. Importantly, the model has employed intrinsic attention visualization and prototype-guided decision paths to provide explainability without relying on external attribution methods. Extensive experiments have been conducted using the ADNI and OASIS datasets. The proposed DSC-CNN has achieved superior classification performance with accuracy exceeding 99.68%, demonstrating significant improvements in early-stage detection, particularly for EMCI cases. Ablation and generalization studies have confirmed the model\u0026rsquo;s robustness across multiple validation settings and dataset sources. Furthermore, the architectural design has remained computationally efficient, with parameter optimization ensuring deployment feasibility.\u003c/p\u003e \u003cp\u003eA comparative analysis against related work has shown that DSC-CNN outperforms prior methods in terms of classification accuracy, interpretability, modality fusion, and robustness across diverse cohorts. The inclusion of intrinsic attention mechanisms and cognitively informed embeddings has contributed substantially to this performance gain. 
The findings suggest that DSC-CNN provides a clinically viable and interpretable AI-assisted diagnostic tool, capable of enhancing early detection and personalized care in AD diagnosis.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eAcknowledgements: The authors thank Kalasin University for supporting this work.\u003c/p\u003e\n\u003cp\u003eCompeting Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e\n\u003cp\u003eFunding Information: Not Applicable.\u003c/p\u003e\n\u003cp\u003eAuthor Contribution: Nattavut Sriwiboon carried out the following tasks: original draft preparation, conceptualization, methodology design, software development, data analysis, formal analysis, validation, and other related activities.\u003c/p\u003e\n\u003cp\u003eData Availability Statement: The datasets that support the findings of this study are publicly available from the databases cited in the bibliography.\u003c/p\u003e\n\u003cp\u003eResearch Involving Humans and/or Animals: Not applicable.\u003c/p\u003e\n\u003cp\u003eInformed Consent: Not applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAlzheimer, A., \u003cem\u003e\u0026Uuml;ber eine eigenartige Erkrankung der Hirnrinde.\u003c/em\u003e Allgemeine Zeitschrift f\u0026uuml;r Psychiatrie und psychisch-gerichtliche Medizin, 1907. \u003cstrong\u003e64\u003c/strong\u003e: p. 146-148.\u003c/li\u003e\n\u003cli\u003eFrisoni, G.B., et al., \u003cem\u003eThe clinical use of structural MRI in Alzheimer disease.\u003c/em\u003e Nature Reviews Neurology, 2010. \u003cstrong\u003e6\u003c/strong\u003e(2): p. 
67-77.\u003c/li\u003e\n\u003cli\u003eSriwiboon, N., \u003cem\u003eEfficient and lightweight CNN model for COVID-19 diagnosis from CT and X-ray images using customized pruning and quantization techniques.\u003c/em\u003e Neural Computing and Applications, 2025.\u003c/li\u003e\n\u003cli\u003eJangir, G., N. Joshi, and G. Purohit, \u003cem\u003eHarnessing the synergy of statistics and deep learning for BCI competition 4 dataset 4: a novel approach.\u003c/em\u003e Brain Informatics, 2025. \u003cstrong\u003e12\u003c/strong\u003e(1): p. 5.\u003c/li\u003e\n\u003cli\u003eArya, A.D., et al., \u003cem\u003eA systematic review on machine learning and deep learning techniques in the effective diagnosis of Alzheimer\u0026rsquo;s disease.\u003c/em\u003e Brain Informatics, 2023. \u003cstrong\u003e10\u003c/strong\u003e(1): p. 17.\u003c/li\u003e\n\u003cli\u003eDe Bonis, M.L.N., et al., \u003cem\u003eExplainable brain age prediction: a comparative evaluation of morphometric and deep learning pipelines.\u003c/em\u003e Brain Informatics, 2024. \u003cstrong\u003e11\u003c/strong\u003e(1): p. 33.\u003c/li\u003e\n\u003cli\u003eBalaha, H.M., et al., \u003cem\u003eProstate cancer grading framework based on deep transfer learning and Aquila optimizer.\u003c/em\u003e Neural Computing and Applications, 2024. \u003cstrong\u003e36\u003c/strong\u003e(14): p. 7877-7902.\u003c/li\u003e\n\u003cli\u003eDani, D., et al., \u003cem\u003eMulti-Class Classification and Feature Selection-Based Brain Tumor Detection Using Fast Point Dual-Channel Attention-Based Convolutional Neural Networks.\u003c/em\u003e Biomedical Materials \u0026amp; Devices, 2025.\u003c/li\u003e\n\u003cli\u003eLecun, Y., et al., \u003cem\u003eGradient-Based Learning Applied to Document Recognition.\u003c/em\u003e Proceedings of the IEEE, 1998. \u003cstrong\u003e86\u003c/strong\u003e: p. 2278-2324.\u003c/li\u003e\n\u003cli\u003eDosovitskiy, A., et al. 
\u003cem\u003eAn Image is Worth 16x16 Words: Transformers for Image Recognition at Scale\u003c/em\u003e. in \u003cem\u003eInternational Conference on Learning Representations (ICLR)\u003c/em\u003e. 2021.\u003c/li\u003e\n\u003cli\u003eInvestigators, A., \u003cem\u003eAlzheimer\u0026rsquo;s Disease Neuroimaging Initiative (ADNI)\u003c/em\u003e. 2023.\u003c/li\u003e\n\u003cli\u003eMarcus, D.S., A.F. Fotenos, and J.G. Csernansky, \u003cem\u003eOASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset\u003c/em\u003e. 2020.\u003c/li\u003e\n\u003cli\u003ePlatero, C. and M.C. Tobar, \u003cem\u003ePredicting Alzheimer\u0026apos;s conversion in mild cognitive impairment patients using longitudinal neuroimaging and clinical markers.\u003c/em\u003e Brain Imaging Behav, 2021. \u003cstrong\u003e15\u003c/strong\u003e(4): p. 1728-1738.\u003c/li\u003e\n\u003cli\u003eLu, B., et al., \u003cem\u003eA practical Alzheimer\u0026rsquo;s disease classifier via brain imaging-based deep learning on 85,721 samples.\u003c/em\u003e Journal of Big Data, 2022. \u003cstrong\u003e9\u003c/strong\u003e(1): p. 101.\u003c/li\u003e\n\u003cli\u003eAhmad, A.L., et al., \u003cem\u003eA Machine Learning Approach for Identifying Anatomical Biomarkers of Early Mild Cognitive Impairment\u003c/em\u003e. 2024, arXiv preprint arXiv:2407.00040.\u003c/li\u003e\n\u003cli\u003eShankar, V.G., D.S. Sisodia, and P. Chandrakar, \u003cem\u003eAlzheimer\u0026apos;s stage progression modeling using graph neural network and MRI biomarkers.\u003c/em\u003e Neural Computing and Applications, 2025.\u003c/li\u003e\n\u003cli\u003eDiogo, V.S., et al., \u003cem\u003eEarly diagnosis of Alzheimer\u0026rsquo;s disease using machine learning: a multi-diagnostic, generalizable approach.\u003c/em\u003e Alzheimer\u0026apos;s Research \u0026amp; Therapy, 2022. \u003cstrong\u003e14\u003c/strong\u003e(1): p. 107.\u003c/li\u003e\n\u003cli\u003eBloch, L. and C.M. Friedrich, 
for the Alzheimer\u0026rsquo;s Disease Neuroimaging Initiative, \u003cem\u003eMachine Learning Workflow to Explain Black-Box Models for Early Alzheimer\u0026rsquo;s Disease Classification Evaluated for Multiple Datasets.\u003c/em\u003e SN Computer Science, 2022. \u003cstrong\u003e3\u003c/strong\u003e(6): p. 509.\u003c/li\u003e\n\u003cli\u003eKaur, I. and R. Sachdeva, \u003cem\u003ePrediction Models for Early Detection of Alzheimer: Recent Trends and Future Prospects.\u003c/em\u003e Archives of Computational Methods in Engineering, 2025.\u003c/li\u003e\n\u003cli\u003eJumaili, M.L.F. and E. Sonu\u0026ccedil;, \u003cem\u003eML-Driven Alzheimer\u0026rsquo;s disease prediction: A deep ensemble modeling approach.\u003c/em\u003e SLAS Technology, 2025. \u003cstrong\u003e32\u003c/strong\u003e: p. 100298.\u003c/li\u003e\n\u003cli\u003eAghdam, M.A., et al., \u003cem\u003eMachine-learning models for Alzheimer\u0026rsquo;s disease diagnosis using neuroimaging data: survey, reproducibility, and generalizability evaluation.\u003c/em\u003e Brain Informatics, 2025. \u003cstrong\u003e12\u003c/strong\u003e(1): p. 8.\u003c/li\u003e\n\u003cli\u003eTran, D., et al., \u003cem\u003eA Closer Look at Spatiotemporal Convolutions for Action Recognition\u003c/em\u003e. 2018.\u003c/li\u003e\n\u003cli\u003eFolstein, M.F., S.E. Folstein, and P.R. McHugh, \u003cem\u003e\u0026ldquo;Mini-mental state\u0026rdquo;. A practical method for grading the cognitive state of patients for the clinician.\u003c/em\u003e Journal of Psychiatric Research, 1975. \u003cstrong\u003e12\u003c/strong\u003e(3): p. 189-198.\u003c/li\u003e\n\u003cli\u003eBirkenbihl, C., et al., \u003cem\u003eRethinking the residual approach: leveraging statistical learning to operationalize cognitive resilience in Alzheimer\u0026rsquo;s disease.\u003c/em\u003e Brain Informatics, 2025. \u003cstrong\u003e12\u003c/strong\u003e(1): p. 
3.\u003c/li\u003e\n\u003cli\u003eLombardi, A., et al., \u003cem\u003eA robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer\u0026rsquo;s Disease.\u003c/em\u003e Brain Informatics, 2022. \u003cstrong\u003e9\u003c/strong\u003e(1): p. 17.\u003c/li\u003e\n\u003cli\u003eHajamohideen, F., et al., \u003cem\u003eFour-way classification of Alzheimer\u0026rsquo;s disease using deep Siamese convolutional neural network with triplet-loss function.\u003c/em\u003e Brain Informatics, 2023. \u003cstrong\u003e10\u003c/strong\u003e(1): p. 5.\u003c/li\u003e\n\u003cli\u003eYang, S., et al., \u003cem\u003eIntegrated space\u0026ndash;frequency\u0026ndash;time domain feature extraction for MEG-based Alzheimer\u0026rsquo;s disease classification.\u003c/em\u003e Brain Informatics, 2021. \u003cstrong\u003e8\u003c/strong\u003e(1): p. 24.\u003c/li\u003e\n\u003cli\u003eMukherji, D., et al., \u003cem\u003eEarly detection of Alzheimer\u0026rsquo;s disease using neuropsychological tests: a predict\u0026ndash;diagnose approach using neural networks.\u003c/em\u003e Brain Informatics, 2022. \u003cstrong\u003e9\u003c/strong\u003e(1): p. 23.\u003c/li\u003e\n\u003cli\u003eChoi, Y.K., et al., \u003cem\u003eConnecto-informatics at the mesoscale: current advances in image processing and analysis for mapping the brain connectivity.\u003c/em\u003e Brain Informatics, 2024. \u003cstrong\u003e11\u003c/strong\u003e(1): p. 15.\u003c/li\u003e\n\u003cli\u003eSorino, P., et al., \u003cem\u003eDetecting label noise in longitudinal Alzheimer\u0026rsquo;s data with explainable artificial intelligence.\u003c/em\u003e Brain Informatics, 2025. \u003cstrong\u003e12\u003c/strong\u003e(1): p. 
15.\u003c/li\u003e\n\u003cli\u003eAhmed, M.A.O., et al., \u003cem\u003eSynergistic integration of Multi-View Brain Networks and advanced machine learning techniques for auditory disorders diagnostics.\u003c/em\u003e Brain Informatics, 2024. \u003cstrong\u003e11\u003c/strong\u003e(1): p. 3.\u003c/li\u003e\n\u003cli\u003ePaszke, A., et al., \u003cem\u003ePyTorch: An Imperative Style, High-Performance Deep Learning Library\u003c/em\u003e. 2019.\u003c/li\u003e\n\u003cli\u003eTaha, A.A. and A. Hanbury, \u003cem\u003eMetrics for evaluating 3D medical image segmentation: analysis, selection, and tool.\u003c/em\u003e BMC Medical Imaging, 2015. \u003cstrong\u003e15\u003c/strong\u003e(1): p. 29.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"brain-informatics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"brai","sideBox":"Learn more about [Brain Informatics](https://braininformatics.springeropen.com/about)","snPcode":"40708","submissionUrl":"https://submission.nature.com/new-submission/40708/3","title":"Brain Informatics","twitterHandle":"@SpringerOpen","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Alzheimer’s disease, DSC-CNN, Multi-modal Fusion, ResNet3D-18","lastPublishedDoi":"10.21203/rs.3.rs-6903589/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6903589/v1","license":{"name":"CC BY 
4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAlzheimer\u0026rsquo;s disease (AD) remains one of the most prevalent and challenging neurodegenerative disorders, with early diagnosis being crucial for timely intervention. In this paper, a novel dual-stream deep learning architecture, termed DSC-CNN (Dual-Stream CNN with Cognitive Embedding Fusion), has been proposed to enhance the accuracy and interpretability of AD classification. The model integrates volumetric MRI data with structured clinical metadata through two dedicated processing streams: a spatial ResNet3D-18 backbone with attention for anatomical features and a lightweight encoder for cognitive attributes. These complementary embeddings have been fused via a bilinear attention mechanism, allowing the model to capture intricate cross-modal interactions. To ensure both generalizability and transparency, the framework has incorporated intrinsic attention visualization and prototype-guided decision paths in place of traditional post-hoc explanation tools. Experiments have been conducted on the ADNI and OASIS datasets, demonstrating that the proposed DSC-CNN has achieved a classification accuracy exceeding 99.68%, outperforming several recent related methods. The model has shown particular strength in identifying early mild cognitive impairment (EMCI) cases while maintaining a compact parameter footprint, enabling efficient deployment in clinical settings. 
These results suggest that DSC-CNN is a robust, interpretable, and scalable solution for improving AD diagnosis.\u003c/p\u003e","manuscriptTitle":"DSC-CNN: A Dual-Stream CNN with Cognitive Embedding Fusion for Early Alzheimer’s Diagnosis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-02 09:41:49","doi":"10.21203/rs.3.rs-6903589/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-06-30T08:12:12+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-30T04:59:26+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-30T02:10:26+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"318076339023838665791807659061694841478","date":"2025-06-25T23:19:31+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"324282953271666330320478638633033408390","date":"2025-06-25T09:50:32+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-06-25T09:38:50+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-06-17T07:25:56+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-06-17T07:22:17+00:00","index":"","fulltext":""},{"type":"submitted","content":"Brain Informatics","date":"2025-06-16T08:48:36+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"brain-informatics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"brai","sideBox":"Learn more about [Brain Informatics](https://braininformatics.springeropen.com/about)","snPcode":"40708","submissionUrl":"https://submission.nature.com/new-submission/40708/3","title":"Brain Informatics","twitterHandle":"@SpringerOpen","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO 
AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"655a181f-360a-47c5-8269-b22dfdacbef8","owner":[],"postedDate":"July 2nd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2025-07-28T02:23:13+00:00","versionOfRecord":[],"versionCreatedAt":"2025-07-02 09:41:49","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6903589","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6903589","identity":"rs-6903589","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
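The prototype-guided cognitive reasoning described in Section 6.2 (predictions traced back to class-specific clinical profiles) can be illustrated with a small sketch. This is not the author's implementation: the two-dimensional embedding, the `PROTOTYPES` values, and the function names `prototype_scores` and `explain` are all hypothetical, and a softmax over negative squared distances is only one plausible way to turn prototype similarity into class scores.

```python
import math

# Hypothetical class prototypes in a toy 2-D cognitive embedding space.
# Real prototypes would be learned; these values are illustrative only.
PROTOTYPES = {
    "CN": [0.9, 0.1],
    "EMCI": [0.6, 0.4],
    "AD": [0.2, 0.9],
}


def prototype_scores(embedding, prototypes):
    """Return (probabilities, squared distances) for each class prototype."""
    dists = {
        label: sum((e - p) ** 2 for e, p in zip(embedding, proto))
        for label, proto in prototypes.items()
    }
    # Softmax over negative distances, shifted by the smallest distance
    # so the largest exponent is exactly 0 (numerically stable).
    nearest = min(dists.values())
    weights = {label: math.exp(nearest - d) for label, d in dists.items()}
    total = sum(weights.values())
    probs = {label: w / total for label, w in weights.items()}
    return probs, dists


def explain(embedding, prototypes):
    """Return the predicted class, its probability, and its prototype distance."""
    probs, dists = prototype_scores(embedding, prototypes)
    label = max(probs, key=probs.get)
    return label, probs[label], dists[label]


if __name__ == "__main__":
    label, prob, dist = explain([0.62, 0.38], PROTOTYPES)
    print(label, round(prob, 3), round(dist, 4))
```

Because the prediction is tied to the nearest prototype, the distance returned by `explain` doubles as a simple justification: a small distance means the patient's cognitive profile closely matches the stored class profile.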
