Multi-stream deep learning framework to predict mild cognitive impairment with Rey Complex Figure Test

doi:10.21203/rs.3.rs-6894673/v1

2025 · doi:10.21203/rs.3.rs-6894673/v1

preprint OA: closed

Junyoung Park, Eun Hyun Seo, Sunjun Kim, SangHak Yi, Kun Ho Lee, and Sungho Won

This is a preprint posted on Research Square; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6894673/v1. This work is licensed under a CC BY 4.0 License.

Status: Published. The journal version was published in Scientific Reports on 01 Mar, 2026. This is Version 1, the latest preprint version.

Abstract
Drawing tests such as the Rey Complex Figure Test (RCFT) are widely used to assess cognitive functions such as visuospatial skills and memory, making them valuable tools for detecting mild cognitive impairment (MCI). Despite their utility, existing predictive models based on these tests often suffer from limitations such as small sample sizes and a lack of external validation, which undermine their reliability.
We developed a multi-stream deep learning framework that integrates two distinct processing streams: a spatial stream that applies multi-head self-attention to features of raw RCFT images, and a scoring stream that uses a previously developed automated scoring system. The model was trained on data from 1,740 subjects in a Korean cohort and validated on an external hospital dataset of 222 subjects from Korea. In external validation, the proposed multi-stream model outperformed the baseline models (AUC = 0.872, accuracy = 0.781). Integrating the spatial and scoring streams enables the model to capture fine visual details from the raw images while also incorporating structured scoring data, which together enhance its ability to detect subtle cognitive impairment. This dual approach improves both predictive accuracy and robustness, making the model more reliable in diverse clinical settings. Our model therefore has practical implications for clinical practice, where it could serve as a cost-effective tool for early MCI screening.

Subject areas: Health sciences/Health care; Health sciences/Health care/Public health/Population screening; Physical sciences/Mathematics and computing/Statistics

Keywords: Rey Complex Figure Test; Mild cognitive impairment prediction; Multi-stream deep learning; Convolutional Neural Network; Multi-head self-attention

INTRODUCTION
Drawing tests are well documented as comprehensive assessments of visuospatial skills, visual memory, and executive function, and they are commonly used as cognitive screening tools for dementia in the elderly population, in both clinical and research settings [1]. Among the most prominent drawing tests are the Pentagon Drawing Test (PDT), the Clock Drawing Test (CDT), and the Rey Complex Figure Test (RCFT).
The PDT, for example, requires participants to draw two intersecting pentagons and is typically scored as binary (fail or success) [2]. The CDT assesses executive function and visuospatial skills by having subjects draw a clock face set to a specific time; scoring methods vary widely, from binary systems to detailed point assignments based on the accuracy of the contour, number sequence, and hand placement [3–5]. The RCFT, designed by Rey [6], asks participants to copy and then recall a complex figure and is commonly scored with the widely used 36-point system developed by Osterrieth [7]. Recently, machine learning approaches have been applied to improve the prediction of cognitive status from these tests. This is particularly valuable because drawing tests are simple to administer, which makes them useful for screening the early stages of dementia in clinical practice. For example, deep learning approaches have been applied to the digitized PDT [8], CDT [9], and RCFT [10] to distinguish MCI patients from cognitively normal (CN) individuals. Additionally, multi-dimensional kinematic parameters extracted from a digital pen and tablet during the RCFT were analyzed using logistic regression [11]. However, previous studies have several limitations. Primarily, most had small sample sizes and lacked an external test set, which undermined the reliability of the reported model performance. Even when sample sizes were not small, model performance was not sufficiently robust for screening the early stages of dementia. This may be attributed to the challenges inherent in using image data in deep learning models. For instance, image data contain a vast amount of information but are also prone to noise because of their high dimensionality [12, 13]. Moreover, image data encompass diverse patterns and features, which makes it difficult for models to learn effectively, especially when sample sizes are small [14].
In this paper, we propose a novel multi-stream deep learning network that combines a spatial stream operating on raw image data with a scoring stream that uses an automated scoring system developed in a previous study [15]. The model was trained on a total of 1,740 subjects (947 CN, 793 MCI) to distinguish MCI patients from CN subjects. An additional 222 subjects (106 CN, 116 MCI) served as an external test set to assess the reliability of the model's performance.

MATERIALS AND METHODS
Datasets
The study was approved by the Institutional Review Boards of Chonnam National University Hospital (CNUH-2019-279) and Wonkwang University Hospital (2022-01-024-004). All research was performed in accordance with relevant guidelines and regulations, including the Declaration of Helsinki. Informed consent was obtained from all participants and/or their legal guardians.

GARD cohort
We enrolled 1,740 subjects from the Gwangju Alzheimer's and Related Dementia (GARD) cohort registry at Chosun University in Gwangju, Korea, during 2015–2019. The diagnostic criteria for CN and MCI have been described by Seo et al. [16]. Briefly, CN subjects were included if they were aged 60 or older, had a Clinical Dementia Rating (CDR) score of 0, and exhibited normal cognitive function, with all neuropsychological test z-scores (based on age-, education-, and gender-specific norms) above −1.5 standard deviations (SD). MCI patients were aged 60 or older, had a CDR score of 0.5, and met the MCI criteria established by Winblad et al. [17].

WUH cohort
The Wonkwang University Hospital (WUH) cohort includes 106 CN subjects and 116 MCI patients enrolled between 2017 and 2022. In alignment with the training set criteria, subjects were classified based on their CDR scores: a CDR score of 0 indicated CN, whereas a score of 0.5 indicated MCI.

Deep learning architecture
Figure 1A provides an overview of the proposed method.
Our model predicts the probability that an individual is an MCI patient from three pre-processed RCFT images (copy, immediate recall, and delayed recall), together with age, sex, and years of education. The RCFT images are pre-processed following the protocol outlined by Park et al. [15]. The prediction model uses a dual-stream architecture consisting of a spatial stream and a scoring stream. Each stream ends in a softmax function, and the two output probability vectors are merged by average fusion to yield the final classification probability. In the spatial stream, each 512 × 512 image is input into a CNN with an EfficientNet [18] backbone. We selected EfficientNet-B2 because its relatively small number of parameters and adequate performance on limited datasets make it well suited to medical applications. EfficientNet-B2 consists of a 3 × 3 convolution layer followed by multiple 3 × 3 and 5 × 5 mobile inverted bottleneck convolution (MBConv) blocks, a design borrowed from MobileNetV2 [19] (Fig. 1B). After the CNN, the feature maps are flattened and a multi-head self-attention layer is applied, which helps the model focus on the most informative spatial regions. Multi-head self-attention, as defined by Vaswani et al. [20], combines multiple attention heads to capture diverse features:

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)\,W^{O},\qquad \mathrm{head}_i=\mathrm{Attention}\left(QW_i^{Q},\,KW_i^{K},\,VW_i^{V}\right),$$

where $Q$, $K$, and $V$ are the query, key, and value matrices, respectively; $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$ are learned projection matrices; $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(QK^{\top}/\sqrt{d_k}\right)V$ with $d_k$ the key dimension of each head; and we use four attention heads ($h = 4$). The outputs of the multi-head self-attention layers are integrated and passed through two fully connected (FC) layers followed by a softmax function. The scoring stream, in contrast, uses a previously developed deep learning model [15] to predict RCFT scores. The predicted scores for the three images, along with demographic data, are concatenated and passed through an FC layer with a softmax function.
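The forward pass of the two streams and the average-fusion step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the EfficientNet-B2 backbone and the frozen scoring model are replaced by random stand-in inputs, and the choices of mean-pooling the attended tokens, concatenating the three images' vectors, using ReLU between the two FC layers, the class order, the sex encoding, and all parameter names (`Wq`, `W1`, `Ws`, and so on) are hypothetical. Each attention head uses one column block of the full projection matrices, which is equivalent to separate $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ per head.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, Wq, Wk, Wv, Wo, h=4):
    """MultiHead = Concat(head_1..head_h) W^O, with queries, keys and values all derived from X.

    X has shape (n_tokens, d_model); head i uses the i-th column block of Wq/Wk/Wv.
    """
    n_tokens, d_model = X.shape
    assert d_model % h == 0, "d_model must be divisible by the number of heads"
    d_k = d_model // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):
        cols = slice(i * d_k, (i + 1) * d_k)
        scores = Q[:, cols] @ K[:, cols].T / np.sqrt(d_k)    # scaled dot-product
        heads.append(softmax(scores, axis=-1) @ V[:, cols])   # softmax(QK^T / sqrt(d_k)) V
    return np.concatenate(heads, axis=-1) @ Wo


def spatial_stream(feature_maps, p):
    """feature_maps: one (C, H, W) CNN output per RCFT image."""
    pooled = []
    for fmap in feature_maps:
        tokens = fmap.reshape(fmap.shape[0], -1).T      # flatten to (H*W, C) spatial tokens
        attended = multi_head_self_attention(tokens, p["Wq"], p["Wk"], p["Wv"], p["Wo"], h=4)
        pooled.append(attended.mean(axis=0))            # assumption: average over tokens
    z = np.concatenate(pooled)                          # assumption: concatenate the three images
    hidden = np.maximum(0.0, z @ p["W1"] + p["b1"])     # FC layer 1 (+ ReLU, assumed)
    return softmax(hidden @ p["W2"] + p["b2"])          # FC layer 2 + softmax -> [P(CN), P(MCI)] (order assumed)


def scoring_stream(rcft_scores, demographics, p):
    """rcft_scores: 3 scores from the frozen scoring model; demographics: age, sex, education."""
    x = np.concatenate([rcft_scores, demographics])
    return softmax(x @ p["Ws"] + p["bs"])               # single FC layer + softmax


def average_fusion(p_spatial, p_scoring):
    return (p_spatial + p_scoring) / 2.0                # average of two distributions still sums to 1


# Toy dimensions, random weights and made-up inputs (all hypothetical), purely for illustration.
C, H, W, HIDDEN = 8, 4, 4, 16
params = {name: rng.normal(scale=0.1, size=(C, C)) for name in ("Wq", "Wk", "Wv", "Wo")}
params.update(
    W1=rng.normal(scale=0.1, size=(3 * C, HIDDEN)), b1=np.zeros(HIDDEN),
    W2=rng.normal(scale=0.1, size=(HIDDEN, 2)), b2=np.zeros(2),
    Ws=rng.normal(scale=0.01, size=(6, 2)), bs=np.zeros(2),
)
feature_maps = [rng.normal(size=(C, H, W)) for _ in range(3)]
p_spatial = spatial_stream(feature_maps, params)
p_scoring = scoring_stream(np.array([30.0, 12.0, 11.0]), np.array([72.0, 1.0, 10.0]), params)
p_final = average_fusion(p_spatial, p_scoring)
print("P(MCI) =", p_final[1])
```

Because both streams output valid probability distributions, their average is also one, so the fused P(MCI) can be thresholded directly. In a real training setup, the pre-trained scoring model itself would contribute no trainable parameters; the inputs would also typically be standardized rather than passed in raw units as in this toy example.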
In the scoring stream, the weights of the pre-trained scoring model remain fixed (frozen) during training.

[Figure 1]

Baseline models
The proposed model was evaluated against four baseline models: three logistic regression models and one deep learning model. The first baseline model used Mini-Mental State Examination (MMSE) scores. The second and third models used the three RCFT scores, rated by trained experts and by a previously developed AI scoring system, respectively. The final baseline was a deep learning model that used only the spatial stream network. All baseline models included age, sex, and years of education as covariates.

Scoring validation
To mitigate human errors in scoring, scanning, and digitization, we applied our AI scoring system to the external test set as a quality-control (QC) step to enhance data quality. Images for which the human expert score and the AI-generated score differed by more than ten points were re-examined by trained human experts. We then compared the AI-generated scores with these corrected scores to assess their accuracy and reliability.

Experiments
Prediction models were built and evaluated using data from the GARD and WUH cohorts. The GARD cohort was used to construct the prediction model. During training, binary cross-entropy was used as the loss function and was minimized with the Adam optimizer. The learning rate was reduced to 10% of its previous value every five epochs, and, to prevent overfitting, training was stopped early if the validation loss did not improve for 30 epochs; the final model weights were those with the lowest validation loss. To evaluate model performance, the GARD cohort was randomly divided into training, validation, and test sets at a 6:2:2 ratio, and this division was repeated fifty times. External validation was performed using the WUH cohort.
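The QC rule and the training and evaluation protocol described above can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the helper names are hypothetical; "reduced to 10% every five epochs" is interpreted here as step decay (multiply by 0.1 every five epochs, which is what PyTorch's `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)` does); and the split is a plain seeded shuffle without stratification, because the text does not mention stratification.

```python
import random


def needs_rereview(expert_score, ai_score, max_gap=10.0):
    """QC rule: flag an image when expert and AI scores differ by more than ten points."""
    return abs(expert_score - ai_score) > max_gap


def split_6_2_2(subject_ids, seed):
    """Randomly split subjects into training/validation/test sets at a 6:2:2 ratio."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)              # seeded so each repetition is reproducible
    n_train = round(0.6 * len(ids))
    n_val = round(0.2 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def step_lr(initial_lr, epoch, step_size=5, gamma=0.1):
    """Learning rate after `epoch` epochs when it is multiplied by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


class EarlyStopping:
    """Stop once validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None                    # epoch whose weights would be kept (lowest val loss)
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch = val_loss, epoch
            self.epochs_without_improvement = 0   # a real loop would checkpoint weights here
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Fifty random 6:2:2 splits of the 1,740 GARD subjects (integer IDs are placeholders).
splits = [split_6_2_2(range(1740), seed) for seed in range(50)]
train_ids, val_ids, test_ids = splits[0]
print(len(train_ids), len(val_ids), len(test_ids))  # 1044 348 348
```

Each of the 50 splits would yield one trained model and one set of internal test metrics; `best_epoch` marks the checkpoint that is kept, matching the rule of selecting the weights with the lowest validation loss.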
Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE). All experiments were conducted using PyTorch (v2.0.0) in Python (v3.8.8) on NVIDIA GTX 1080 Ti GPUs with 48 GB of memory per GPU.

RESULTS
Characteristics
Table 1 summarizes the clinical characteristics of subjects in the GARD and WUH cohort datasets. In the GARD dataset, the mean age was 71.8 (± 6.1) years for CN subjects and 73.5 (± 6.4) years for MCI patients (P < 0.01). Education levels and MMSE scores also differed significantly between CN subjects (education: 10.4 ± 4.6 years; MMSE: 27.5 ± 2.1) and MCI patients (education: 9.8 ± 4.7 years; MMSE: 25.5 ± 3.1) (P < 0.01). Sex ratios showed a similar trend. By contrast, in the WUH dataset there was no significant difference in mean age between CN subjects (69.9 ± 7.7) and MCI patients (71.4 ± 8.3) (P > 0.05), nor in education level between the CN (8.7 ± 4.2) and MCI (9.2 ± 4.5) groups (P > 0.05). Compared with the GARD dataset, the external test set showed lower age, education level, and RCFT scores in both groups, the only exception being the RCFT copy score in the CN group (33.7 in WUH vs. 33.6 in GARD).

Table 1. Descriptive statistics. A dataset of 1,740 subjects from the Gwangju Alzheimer's and Related Dementia (GARD) cohort was used for training, and an external test set of 222 subjects from Wonkwang University Hospital (WUH) was used for validation.
| | GARD (training): Total (N = 1,740) | GARD: CN (N = 947) | GARD: MCI (N = 793) | WUH (test): Total (N = 222) | WUH: CN (N = 106) | WUH: MCI (N = 116) |
|---|---|---|---|---|---|---|
| Age (years) | 72.6 (6.3) | 71.8 (6.1) | 73.5 (6.4) | 70.7 (8.0) | 69.9 (7.7) | 71.4 (8.3) |
| Sex, N (female, %) | 743 (57.3%) | 378 (60.1%) | 365 (54.0%) | 140 (63.0%) | 76 (71.7%) | 64 (55.2%) |
| Education (years) | 10.1 (4.6) | 10.4 (4.5) | 9.8 (4.7) | 9.0 (4.4) | 8.7 (4.2) | 9.2 (4.5) |
| MMSE score | 26.5 (2.8) | 27.5 (2.1) | 25.5 (3.1) | - | - | - |
| RCFT copy score | 32.0 (5.1) | 33.6 (3.0) | 30.2 (6.4) | 30.6 (9.1) | 33.7 (4.2) | 27.8 (11.2) |
| RCFT immediate recall score | 12.8 (7.4) | 15.7 (6.7) | 9.3 (6.6) | 9.2 (8.5) | 12.7 (8.2) | 6.1 (7.4) |
| RCFT delayed recall score | 12.7 (7.2) | 15.7 (6.3) | 9.1 (6.7) | 8.5 (8.5) | 11.9 (8.1) | 5.3 (7.5) |

Values are mean (SD) unless otherwise indicated; "-" indicates not reported.

Scoring validation
The initial coefficient of determination (R²) between the AI scores and the expert scores was 0.81, with a mean absolute error (MAE) of 3.0 points (Fig. 2A). Discrepancies of more than 10 points between the expert (ground-truth) scores and the predicted scores were found in 30 images. Upon re-examination, the scores of 26 of these images were corrected. After these corrections, R² improved to 0.95 and the MAE decreased to 2.0 points (Fig. 2B).

[Figure 2]

Comparison of model performance via internal test using the GARD cohort
We evaluated the classification performance of five models, three of which incorporate components of the proposed method (AI scoring, the spatial stream, or both):
1) logistic regression using MMSE scores;
2) logistic regression using RCFT scores assessed by experts;
3) logistic regression using RCFT scores predicted by the AI model;
4) a deep learning model using only the spatial stream network;
5) a deep learning model using the multi-stream network (proposed).
The mean performance of these models is shown in Table 2 (A).

Table 2. Results of model prediction performance. The baseline models consisted of three logistic regression models using MMSE scores, RCFT scores by experts, and RCFT scores by a previous AI model, respectively, and one deep learning model that used only the spatial stream network.
All baseline models included chronological age, sex, and education as covariates. The data were split into training, validation, and test sets at a 6:2:2 ratio, and this process was repeated 50 times. Values in brackets are 95% confidence intervals.

(A) Internal test using the GARD cohort dataset.

| Input modality | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| MMSE scores | 0.714 [0.706–0.721] | 0.660 [0.652–0.667] | 0.625 [0.613–0.636] | 0.694 [0.685–0.704] |
| RCFT scores by experts | 0.776 [0.768–0.782] | 0.705 [0.699–0.712] | 0.700 [0.689–0.711] | 0.711 [0.700–0.722] |
| RCFT scores by AI | 0.777 [0.770–0.783] | 0.710 [0.703–0.717] | 0.699 [0.689–0.709] | 0.721 [0.710–0.731] |
| Only RCFT images | 0.803 [0.768–0.837] | 0.731 [0.702–0.761] | 0.701 [0.661–0.741] | 0.762 [0.720–0.804] |
| Image + score by AI (our method) | 0.852 [0.837–0.869] | 0.771 [0.755–0.787] | 0.742 [0.718–0.767] | 0.800 [0.774–0.823] |

(B) External test using the WUH cohort dataset. "RCFT scores by experts (a)" refers to the model using scores from the initial dataset before QC, whereas "RCFT scores by experts (b)" refers to the model using the validated dataset after QC, in which RCFT scores were re-rated with the help of a previous AI model.

| Input modality | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| RCFT scores by experts (a) | 0.750 [0.750–0.751] | 0.709 [0.707–0.712] | 0.832 [0.829–0.835] | 0.575 [0.571–0.579] |
| RCFT scores by experts (b) | 0.813 [0.812–0.814] | 0.750 [0.748–0.753] | 0.849 [0.845–0.852] | 0.643 [0.639–0.648] |
| RCFT scores by AI | 0.804 [0.803–0.805] | 0.722 [0.721–0.725] | 0.799 [0.797–0.802] | 0.639 [0.634–0.644] |
| Only RCFT images | 0.837 [0.814–0.860] | 0.744 [0.719–0.768] | 0.743 [0.690–0.800] | 0.745 [0.697–0.792] |
| Image + score by AI (our method) | 0.872 [0.862–0.882] | 0.781 [0.768–0.795] | 0.836 [0.807–0.864] | 0.722 [0.687–0.757] |

The logistic regression model with MMSE scores demonstrated the lowest performance, with an AUC of 0.714 [95% confidence interval: 0.706–0.721], an ACC of 0.660 [0.652–0.667], SEN of 0.625 [0.613–0.636], and SPE of 0.694 [0.685–0.704].
The logistic regression model using expert-assessed RCFT scores achieved an AUC of 0.776 [0.768–0.782], an ACC of 0.705 [0.699–0.712], SEN of 0.700 [0.689–0.711], and SPE of 0.711 [0.700–0.722]; the model using AI-predicted RCFT scores performed similarly, with an AUC of 0.777 [0.770–0.783], ACC of 0.710 [0.703–0.717], SEN of 0.699 [0.689–0.709], and SPE of 0.721 [0.710–0.731]. Performance improved with the spatial stream network model, which achieved an AUC of 0.803 [0.768–0.837], ACC of 0.731 [0.702–0.761], SEN of 0.701 [0.661–0.741], and SPE of 0.762 [0.720–0.804]. Finally, our proposed two-stream deep learning model outperformed all baseline models on every metric, with an AUC of 0.852 [0.837–0.869], ACC of 0.771 [0.755–0.787], SEN of 0.742 [0.718–0.767], and SPE of 0.800 [0.774–0.823].

External validation using the WUH cohort
Performance metrics of the trained models on the external test set are detailed in Table 2 (B). The logistic regression model using expert-rated RCFT scores from the initial dataset achieved an AUC of 0.750 [0.750–0.751], ACC of 0.709 [0.707–0.712], SEN of 0.832 [0.829–0.835], and SPE of 0.575 [0.571–0.579]. With the validated dataset based on the re-rated RCFT scores, the model's performance improved to an AUC of 0.813 [0.812–0.814], ACC of 0.750 [0.748–0.753], SEN of 0.849 [0.845–0.852], and SPE of 0.643 [0.639–0.648]. The logistic model with AI-predicted RCFT scores performed comparably to the model using human experts' scores (AUC = 0.804 [0.803–0.805], ACC = 0.722 [0.721–0.725], SEN = 0.799 [0.797–0.802], and SPE = 0.639 [0.634–0.644]). The deep learning model using the spatial stream network achieved a higher AUC (0.837 [0.814–0.860]), ACC (0.744 [0.719–0.768]), and SPE (0.745 [0.697–0.792]) than the score-based logistic models, but a lower SEN (0.743 [0.690–0.800]).
Our proposed two-stream deep learning method achieved the highest AUC and accuracy of all models: AUC = 0.872 [0.862–0.882], ACC = 0.781 [0.768–0.795], SEN = 0.836 [0.807–0.864], and SPE = 0.722 [0.687–0.757].

[Figure 3]

DISCUSSION
In this article, we developed a multi-stream deep learning network to differentiate MCI patients from CN subjects. Compared with previous methods based on drawing tests (PDT, CDT, and RCFT), our approach uses a larger sample size and an external test set, which enhances the robustness of the model and the reliability of its performance estimates. Notably, our model achieved higher performance metrics than those reported in existing studies.

Our multi-stream network combines the scoring stream and the spatial stream. The scoring stream incorporates an AI scoring system for the RCFT, which saves time and human resources while proactively preventing human errors, thereby improving accuracy. This is supported by the external validation results: the logistic model using expert scores after AI-assisted QC performed much better (AUC = 0.813) than the model using the initial expert-assessed RCFT scores without QC (AUC = 0.750). Furthermore, whereas an expert needs approximately 5 minutes to score one subject, our AI scoring system takes only 10 seconds. The spatial stream uses raw RCFT images as input and extracts subtle details, such as pen thickness and shape, that are not captured by the 36-point human scoring system. This leads to a substantial improvement in performance compared with models that rely solely on scores. However, although raw image data are rich in information, they also contain considerable noise; the multi-head self-attention layers therefore help the model prioritize crucial spatial regions within the feature map, boosting performance.
However, models that rely solely on raw images showed greater variability in performance (wider confidence intervals in Table 2) than logistic models using scores, and the performance of the spatial stream network may be compromised by differences in resolution between the training images and new test images. By combining the advantages of the scoring stream, which builds on the human scoring system, and the spatial stream, which processes raw images, our proposed method achieves high and robust performance.

The proposed method offers a cost-effective and efficient MCI screening tool for medical check-up centers. Currently, the MMSE is the most widely used screening tool, known for its simplicity and quick administration time of approximately 5–10 minutes [2]. However, our results indicate that the MMSE is less informative for predicting MCI and lacks accuracy in distinguishing CN subjects from MCI patients (AUC = 0.714). Another study reported an MMSE AUC of 0.733 (N = 2,577) [8]. In contrast, comprehensive neuropsychological test batteries are more time-consuming, taking up to 2 hours to administer [21], and make it difficult to examine many subjects because of the additional time required for scoring and interpretation. Although the RCFT takes longer than the MMSE, approximately 30 minutes including a 20-minute delay interval [22], our RCFT-based model substantially outperformed the MMSE-based model (AUC > 0.85 vs. 0.714). Furthermore, because our model requires no additional time for expert scoring, it is highly efficient compared with other cognitive function tests that rely on expert scoring.

Despite these advantages, our study has some limitations and areas for future development. First, we did not incorporate any ancillary information beyond the raw images.
Recent studies have shown that kinematic data, such as pressure, velocity, and time, which cannot be captured by traditional paper-and-pencil drawing tests but can be recorded by tablet-based tests, differ significantly between case and control groups. These parameters are potentially useful covariates for improving prediction models [11, 23]. We have developed a tablet-based application that administers the RCFT, records the drawing process, and extracts kinematic parameters; incorporating this information may further improve performance. Second, verbal tests have played a crucial role in neuropsychological evaluation [24]. Recent advances in speech and language technologies, such as automatic speech recognition and pretrained language models like BERT [25], have enabled the exploration of speech-based methods for Alzheimer's disease (AD) detection [26, 27]. In future work, we plan to develop tablet-based, fully automated memory tests that integrate both visual and verbal assessments.

In conclusion, our multi-stream deep learning network outperformed previous studies in distinguishing MCI patients from CN subjects. By integrating the human scoring system and image-based information, our model demonstrated robust performance on both internal and external datasets. Our findings suggest potential clinical utility as a time-efficient screening tool for cognitive impairment.

Declarations
ACKNOWLEDGEMENT
Not applicable.
FUNDING
This work was supported by the Technology Innovation Program (20022810, Development and Demonstration of a Digital System for the evaluation of geriatric Cognitive impairment) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), and by the Korea National Institute of Health (KNIH) research project No. 2024ER210800.
AVAILABILITY OF DATA AND MATERIALS
The dataset for the current study is not publicly available but is available from the corresponding author upon reasonable request.
AUTHOR CONTRIBUTIONS
J.P. performed all data preprocessing, experiments, and manuscript writing as the first author. E.H.S., S.Y., and K.H.L. provided the data. S.K. conducted model evaluation. S.W. organized and supervised the study. All authors reviewed and approved the submitted version of the manuscript.
ETHICS APPROVAL
The study was approved by the Institutional Review Boards of Chonnam National University Hospital (CNUH-2019-279) and Wonkwang University Hospital (2022-01-024-004). Written informed consent was obtained from each participant or their legal guardian.
CONSENT FOR PUBLICATION
Not applicable.
COMPETING INTERESTS
The authors declare that they have no competing interests.

References
1. Agrell, B. and O. Dehlin, The clock-drawing test. Age and Ageing, 1998. 27(3): p. 399-403.
2. Folstein, M.F., S.E. Folstein, and P.R. McHugh, "Mini-mental state": A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 1975. 12(3): p. 189-198.
3. Nasreddine, Z.S., et al., The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. J Am Geriatr Soc, 2005. 53(4): p. 695-9.
4. Darvesh, S., et al., The behavioural neurology assessment. Can J Neurol Sci, 2005. 32(2): p. 167-77.
5. Mendez, M.F., T. Ala, and K.L. Underwood, Development of scoring criteria for the clock drawing task in Alzheimer's disease. J Am Geriatr Soc, 1992. 40(11): p. 1095-9.
6. Rey, A., L'examen psychologique dans les cas d'encephalopathie traumatique. Archives de Psychologie, 1941. 28: p. 286-340.
7. Osterrieth, P.A., Le test de copie d'une figure complexe; contribution a l'etude de la perception et de la memoire. Archives de Psychologie, 1944.
8. Tasaki, S., et al., Explainable deep learning approach for extracting cognitive features from hand-drawn images of intersecting pentagons. NPJ Digital Medicine, 2023. 6(1): p. 157.
9. Ruengchaijatuporn, N., et al., An explainable self-attention deep neural network for detecting mild cognitive impairment using multi-input digital drawing tasks. Alzheimers Res Ther, 2022. 14(1): p. 111.
10. Cheah, W.-T., et al., A screening system for mild cognitive impairment based on neuropsychological drawing test and neural network. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). 2019. IEEE.
11. Zhang, X., et al., A tablet-based multi-dimensional drawing system can effectively distinguish patients with amnestic MCI from healthy individuals. Scientific Reports, 2024. 14(1): p. 982.
12. Pintelas, E., I.E. Livieris, and P.E. Pintelas, A Convolutional Autoencoder Topology for Classification in High-Dimensional Noisy Image Datasets. Sensors, 2021. 21(22): p. 7731.
13. Jia, W., et al., Feature dimensionality reduction: a review. Complex & Intelligent Systems, 2022. 8(3): p. 2663-2693.
14. Balki, I., et al., Sample-Size Determination Methodologies for Machine Learning in Medical Imaging Research: A Systematic Review. Canadian Association of Radiologists Journal, 2019. 70(4): p. 344-353.
15. Park, J.Y., et al., Automating Rey Complex Figure Test scoring using a deep learning-based approach: a potential large-scale screening tool for cognitive decline. Alzheimer's Research & Therapy, 2023. 15(1): p. 145.
16. Seo, E.H., et al., Visuospatial memory impairment as a potential neurocognitive marker to predict tau pathology in Alzheimer's continuum. Alzheimer's Research & Therapy, 2021. 13: p. 1-14.
17. Winblad, B., et al., Mild cognitive impairment–beyond controversies, towards a consensus: report of the International Working Group on Mild Cognitive Impairment. Journal of Internal Medicine, 2004. 256(3): p. 240-246.
18. Tan, M. and Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning. 2019. PMLR.
19. Sandler, M., et al., MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
20. Vaswani, A., et al., Attention is all you need. Advances in Neural Information Processing Systems, 2017. 30.
21. Ryu, H.J. and D.W. Yang, The Seoul Neuropsychological Screening Battery (SNSB) for Comprehensive Neuropsychological Assessment. Dement Neurocogn Disord, 2023. 22(1): p. 1-15.
22. Shin, M.S., et al., Clinical and empirical applications of the Rey-Osterrieth Complex Figure Test. Nat Protoc, 2006. 1(2): p. 892-9.
23. Kim, K.W., et al., A comprehensive evaluation of the process of copying a complex figure in early- and late-onset Alzheimer disease: a quantitative analysis of digital pen data. Journal of Medical Internet Research, 2020. 22(8): p. e18136.
24. Knopman, D.S. and S. Ryberg, A verbal memory test with high predictive accuracy for dementia of the Alzheimer type. Archives of Neurology, 1989. 46(2): p. 141-145.
25. Devlin, J., et al., BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
26. Balagopalan, A., et al., Comparing pre-trained and feature-based models for prediction of Alzheimer's disease based on speech. Frontiers in Aging Neuroscience, 2021. 13: p. 635945.
27. Pappagari, R., et al., Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios. In: Interspeech. 2021.
Editorial history: first submitted to journal 20 Jun, 2025; submission checks completed 20 Jun, 2025; editor invited 23 Jun, 2025; editor assigned 17 Jul, 2025; reviewers invited 17 Jul, 2025; reviewers agreed 23 Jul, 2025 (two), 10 Aug, 2025, 15 Sep, 2025, 16 Sep, 2025, and 15 Oct, 2025; reviews received 18 Aug, 2025 and 06 Nov, 2025; editorial decision (revision requested) 07 Nov, 2025; journal publication in Scientific Reports 01 Mar, 2026.
{"props":{"pageProps":{"initialData":{"identity":"rs-6894673","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":488672854,"identity":"a61a92f5-8020-4866-8f84-8ef672a0ee6a","order_by":0,"name":"Junyoung Park","email":"","orcid":"","institution":"Stanford University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Junyoung","middleName":"","lastName":"Park","suffix":""},{"id":488672855,"identity":"29c21183-d689-4cfb-b319-bea14544d05f","order_by":1,"name":"Eun Hyun Seo","email":"","orcid":"","institution":"Chosun University","correspondingAuthor":false,"prefix":"","firstName":"Eun","middleName":"Hyun","lastName":"Seo","suffix":""},{"id":488672856,"identity":"5088b0d9-e373-4631-830e-c6557d8b25ef","order_by":2,"name":"Sunjun Kim","email":"","orcid":"","institution":"Neurozen Inc","correspondingAuthor":false,"prefix":"","firstName":"Sunjun","middleName":"","lastName":"Kim","suffix":""},{"id":488672857,"identity":"3dddf8f4-2612-433e-beb7-9be608782baf","order_by":3,"name":"SangHak Yi","email":"","orcid":"","institution":"Wonkwang University Hospital, Wonkwang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"SangHak","middleName":"","lastName":"Yi","suffix":""},{"id":488672858,"identity":"a9515a60-6d28-4ed5-9529-ebf096c1e390","order_by":4,"name":"Kun Ho Lee","email":"","orcid":"","institution":"Chosun University","correspondingAuthor":false,"prefix":"","firstName":"Kun","middleName":"Ho","lastName":"Lee","suffix":""},{"id":488672859,"identity":"1c7e3b66-1244-4828-9153-48a0e5cf32c4","order_by":5,"name":"Sungho Won","email":"","orcid":"","institution":"RexSoft Inc","correspondingAuthor":true,"prefix":"","firstName":"Sungho","middleName":"","lastName":"Won","suffix":""}],"badges":[],"createdAt":"2025-06-14 15:08:20","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6894673/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6894673/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-025-34491-5","type":"published","date":"2026-03-01T15:57:25+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":87344780,"identity":"4b70ae26-3ee1-4c03-b631-4a492daa4139","added_by":"auto","created_at":"2025-07-23 02:07:03","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":115317,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eModel Architecture. \u003c/strong\u003e(A) Overall model architecture featuring a dual-stream design: a spatial stream with EfficientNet-B2 and a scoring stream using a pre-trained model for RCFT scoring. The combined outputs undergo average fusion to produce the final CN or MCI classification. 
(B) Detailed architecture of the EfficientNet-B2 model used in the spatial stream, including convolutional layers and Mobile Inverted Bottleneck Convolution (MBConv) layers.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6894673/v1/1a06777ca89bbc6e6825034b.png"},{"id":87344783,"identity":"339539c1-7301-4d7a-bdb1-ec4e58b03764","added_by":"auto","created_at":"2025-07-23 02:07:03","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":170178,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eComparative validation of AI-assessed and expert-assessed scores.\u003c/strong\u003e (A) Results before quality control (QC), in which AI-generated scores from a pre-trained model for RCFT scoring were compared with human expert scores. Large discrepancies (more than ten points) between the AI-generated and human expert scores (highlighted in red) led to re-examination by trained experts. (B) Results after QC, showing improved agreement between AI-generated and expert-corrected scores following the re-examination process.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6894673/v1/5d4b1d851c6ef4e2254ef59a.png"},{"id":87344782,"identity":"e6f8e52f-a0eb-401f-9ac3-6322b26ad396","added_by":"auto","created_at":"2025-07-23 02:07:03","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":22746,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eROC curve for the external test set (WUH cohort dataset). 
\u003c/strong\u003eThe ROC curve is plotted from the median-AUC result of 50 bootstrap samples and illustrates the performance of the different models.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6894673/v1/d84a006324a9bb79310f4267.png"},{"id":103765436,"identity":"20678051-3ebb-483e-8433-82fb5edde6c7","added_by":"auto","created_at":"2026-03-02 16:01:28","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1175730,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6894673/v1/304ab90a-e081-46c8-bd35-770360480406.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Multi-stream deep learning framework to predict mild cognitive impairment with Rey Complex Figure Test","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003eDrawing tests are well documented as comprehensive assessment tools that evaluate visuospatial skills, visual memory, and executive function, and they are commonly used as cognitive screening tools for dementia in elderly populations, in both clinical and research settings [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Among the most prominent drawing tests are the Pentagon Drawing Test (PDT), the Clock Drawing Test (CDT), and the Rey Complex Figure Test (RCFT). The PDT, for example, requires participants to draw two intersecting pentagons and is typically scored as binary (fail or success) [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. 
The CDT assesses executive function and visuospatial skills by having subjects draw a clock face set to a specific time; scoring methods vary widely, from a binary system to detailed point assignments based on the accuracy of the contour, number sequence, and hand placement [\u003cspan additionalcitationids=\"CR4\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. The RCFT, designed by Rey [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], requires participants to copy and then recall a complex figure and is commonly scored with the 36-point system developed by Osterrieth [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eMachine learning approaches have recently been applied to improve the prediction of cognitive status from these tests. This is particularly valuable because drawing tests are simple to administer, which makes them attractive for screening early stages of dementia in clinical settings. For example, deep learning approaches have been applied to the digitized PDT [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], CDT [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], and RCFT [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] to distinguish MCI patients from cognitively normal (CN) individuals. In addition, multi-dimensional kinematic parameters extracted from a digital pen and tablet during the RCFT were analyzed with logistic regression [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eHowever, previous studies have several limitations. First, most had small sample sizes and lacked an external test set, which undermines the reliability of the reported model performance. 
Second, even when sample sizes were larger, model performance was not robust enough for screening early stages of dementia. This may be attributable to challenges inherent in using image data in deep learning models. For instance, image data contain a large amount of information but are also prone to noise because of their high dimensionality [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Moreover, image data encompass diverse patterns and features, which makes effective learning difficult, especially when sample sizes are not sufficiently large [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn this paper, we propose a novel multi-stream deep learning network that combines a spatial stream operating on raw image data with a scoring stream that uses an automated scoring system developed in a previous study [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. The model was trained on a total of 1,740 subjects (CN 947, MCI 793) to distinguish MCI patients from CN subjects. An additional 222 subjects (CN 106, MCI 116) served as an external test set to assess the reliability of the model performance.\u003c/p\u003e"},{"header":"MATERIALS AND METHODS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eDatasets\u003c/h2\u003e\u003cp\u003eThe study was approved by the Institutional Review Boards of Chonnam National University Hospital (CNUH-2019-279) and Wonkwang University Hospital (2022-01-024-004). All research was performed in accordance with relevant guidelines and regulations, including the Declaration of Helsinki. 
Informed consent was obtained from all participants and/or their legal guardians.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eGARD cohort\u003c/h3\u003e\n\u003cp\u003eWe enrolled 1,740 subjects from the Gwangju Alzheimer\u0026rsquo;s and Related Dementia (GARD) cohort registry at Chosun University in Gwangju, Korea, during 2015\u0026ndash;2019. The diagnostic criteria for CN and MCI have been described by Seo et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Briefly, CN subjects were included if they were aged 60 or older, had a Clinical Dementia Rating (CDR) score of 0, and exhibited normal cognitive function, with all neuropsychological test z-scores above \u0026minus;\u0026thinsp;1.5 standard deviations (SD) relative to age-, education-, and gender-adjusted norms. MCI patients were aged 60 or older, had a CDR score of 0.5, and met established MCI criteria [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e\n\u003ch3\u003eWUH cohort\u003c/h3\u003e\n\u003cp\u003eThe Wonkwang University Hospital (WUH) cohort includes 106 CN subjects and 116 MCI patients enrolled between 2017 and 2022. Consistent with the training set criteria, subjects were classified by CDR score: a CDR score of 0 indicated CN and a score of 0.5 indicated MCI.\u003c/p\u003e\n\u003ch3\u003eDeep learning architecture\u003c/h3\u003e\n\u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA provides an overview of the proposed method. Our model predicts the probability that an individual has MCI from three pre-processed RCFT images together with age, sex, and years of education. The RCFT images are pre-processed following the protocol of Park et al. [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
Our prediction model uses a dual-stream architecture consisting of a spatial stream and a scoring stream. Each stream ends in a softmax function, and the two outputs are combined by average fusion to yield the final classification probability. In the spatial stream, each 512\u0026times;512 image is fed into a CNN with an EfficientNet [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] backbone. We selected EfficientNet-B2 because its relatively small parameter count and adequate performance on limited datasets make it efficient and well suited to medical applications. EfficientNet-B2 consists of a 3\u0026times;3 convolution layer followed by multiple 3\u0026times;3 and 5\u0026times;5 mobile inverted bottleneck convolution (MBConv) blocks, a design borrowed from MobileNet [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). After the CNN, the feature maps are flattened and passed to a multi-head self-attention layer, which helps the model focus on informative spatial regions. 
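A minimal, framework-free sketch of the multi-head self-attention step mentioned above. It assumes that the flattened feature map is treated as a sequence of spatial positions and uses the scaled dot-product attention of [20], softmax(Q K^T / sqrt(d_k)) V. The helper names, the toy sizes (6 positions, 16 channels) and the random projection weights are illustrative assumptions, not the authors' implementation. Only the head count h = 4 comes from the text.

```python
import math
import random

# Framework-free sketch of multi-head self-attention (scaled dot-product form).
# Assumptions, not from the paper: random projection weights and toy sizes.


def matmul(a, b):
    # (n x k) times (k x m) gives (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, num_heads, rng):
    # x: n flattened spatial positions, each a feature vector of width d_model
    d_model = len(x[0])
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        w_q = rand_matrix(d_model, d_head, rng)  # hypothetical W_i^Q
        w_k = rand_matrix(d_model, d_head, rng)  # hypothetical W_i^K
        w_v = rand_matrix(d_model, d_head, rng)  # hypothetical W_i^V
        # Self-attention: Q, K and V are all projections of the same input x
        heads.append(attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)))
    # Concat(head_1, ..., head_h) along the feature axis, then project with W^O
    concat = [sum((h[i] for h in heads), []) for i in range(len(x))]
    w_o = rand_matrix(num_heads * d_head, d_model, rng)  # hypothetical W^O
    return matmul(concat, w_o)


rng = random.Random(0)
tokens = rand_matrix(6, 16, rng)  # 6 hypothetical positions, 16 channels
out = multi_head_self_attention(tokens, num_heads=4, rng=rng)
print(len(out), len(out[0]))  # 6 16
```

In a trained model, the projections W_i^Q, W_i^K, W_i^V and W^O are learned parameters. A library layer such as PyTorch's nn.MultiheadAttention would normally replace this hand-written version.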
The multi-head self-attention mechanism, as defined in [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e], runs several attention heads in parallel to capture diverse features and is expressed as:\u003c/p\u003e\u003cp\u003e\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:MultiHead\\left(Q,K,V\\right)=Concat\\left({head}_{1},\\:\\dots\\:,\\:{head}_{h}\\right){W}^{O},\\:$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{head}_{i}=Attention\\left(Q{W}_{i}^{Q},\\:K{W}_{i}^{K},\\:V{W}_{i}^{V}\\right)\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Q\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:K\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:V\$\u003c/span\u003e\u003c/span\u003e are the query, key, and value matrices, respectively; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{W}_{i}^{Q},\\:{W}_{i}^{K},\\:{W}_{i}^{V}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{W}^{O}\$\u003c/span\u003e\u003c/span\u003e are learned projection matrices; and we use four attention heads (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:h\$\u003c/span\u003e\u003c/span\u003e=4). The output of the multi-head self-attention layer is then passed through two fully connected (FC) layers followed by a softmax function.\u003c/p\u003e\u003cp\u003eIn contrast, the scoring stream uses a previously developed deep learning model [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] to predict RCFT scores. The predicted scores for the three images are concatenated with the demographic data and passed through an FC layer with a softmax function. 
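The scoring stream and the average-fusion step can be sketched as follows. This is a hypothetical illustration: the FC weights, the spatial-stream logits, the numeric sex coding and the input scores are made-up placeholders. The frozen scoring model appears only through its three output scores.

```python
import math

# Hypothetical sketch of the scoring stream and average fusion.
# Weights, bias and inputs are made-up placeholders, not values from the paper.


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    # One fully connected layer: one output per row of weights
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def scoring_stream(rcft_scores, age, sex, education, weights, bias):
    # Concatenate the three RCFT scores (from the frozen scoring model) with
    # demographics, then apply one FC layer and a softmax over (CN, MCI)
    features = list(rcft_scores) + [age, sex, education]
    return softmax(linear(features, weights, bias))


def average_fusion(p_spatial, p_scoring):
    # Element-wise mean of the two class-probability vectors
    return [(a + b) / 2 for a, b in zip(p_spatial, p_scoring)]


# Placeholder FC parameters: 6 inputs mapped to 2 classes (CN, MCI)
W = [[0.10, 0.05, 0.05, -0.02, 0.0, 0.03],
     [-0.10, -0.05, -0.05, 0.02, 0.0, -0.03]]
B = [0.0, 0.0]

p_score = scoring_stream([30.0, 12.0, 11.0], age=72, sex=1, education=9, weights=W, bias=B)
p_spatial = softmax([0.2, 0.9])  # stand-in for the spatial-stream logits
p_final = average_fusion(p_spatial, p_score)
label = ['CN', 'MCI'][p_final.index(max(p_final))]
print([round(p, 3) for p in p_final], label)
```

Averaging probabilities rather than logits keeps the fused output a valid probability distribution without any further normalization.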
The weights of the scoring model are frozen during training and are not updated.\u003c/p\u003e\n\u003ch3\u003e[Figure 1]\u003c/h3\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eBaseline models\u003c/h2\u003e\u003cp\u003eThe proposed model was compared with four baseline models: three logistic regression models and one deep learning model. The first baseline used MMSE scores. The second and third used the three RCFT scores as rated by trained experts and by the previously developed AI scoring system, respectively. The fourth was a deep learning model that used only the spatial stream network. All baseline models included age, sex, and years of education as covariates.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eScoring validation\u003c/h3\u003e\n\u003cp\u003eTo mitigate human errors in scoring, scanning, and digitizing, we used our AI scoring system as a quality-control (QC) check on the external test set. Images for which the human expert score and the AI-generated score differed by more than ten points were re-examined by trained human experts. We then compared the AI-generated scores with the corrected scores to assess their agreement.\u003c/p\u003e\n\u003ch3\u003eExperiments\u003c/h3\u003e\n\u003cp\u003ePrediction models were built and evaluated using data from the GARD and WUH cohorts, with the GARD cohort used to construct the prediction model. During training, binary cross-entropy was used as the loss function and minimized with the Adam optimizer. 
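The re-examination rule from the scoring-validation step above and the binary cross-entropy objective named in this paragraph can be sketched as below. The scores are invented for illustration. The ten-point threshold is the only value taken from the text, and the clipping constant eps is an added assumption for numerical safety.

```python
import math

# Hypothetical sketch: (1) the QC rule that sends an image back for expert
# re-examination when AI and expert scores differ by more than ten points,
# and (2) the binary cross-entropy loss used for training.

QC_THRESHOLD = 10.0  # points, from the scoring-validation step


def flag_for_review(ai_scores, expert_scores, threshold=QC_THRESHOLD):
    # Indices of images whose absolute AI-expert score gap exceeds the threshold
    return [i for i, (a, e) in enumerate(zip(ai_scores, expert_scores))
            if abs(a - e) > threshold]


def binary_cross_entropy(p_mci, labels, eps=1e-7):
    # Mean of -[y log p + (1 - y) log(1 - p)], with p clipped away from 0 and 1
    total = 0.0
    for p, y in zip(p_mci, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


ai = [31.0, 12.5, 28.0, 4.0]
expert = [32.0, 25.0, 27.5, 15.5]  # made-up example scores
print(flag_for_review(ai, expert))
print(round(binary_cross_entropy([0.9, 0.2], [1, 0]), 4))
```

A gap of exactly ten points is not flagged, matching the wording more than ten points.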
The learning rate was reduced to 10% of its current value every five epochs. To prevent overfitting, training was stopped early if the validation loss did not improve for 30 epochs, and the model weights with the lowest validation loss were retained as the final model.\u003c/p\u003e\u003cp\u003eTo evaluate model performance, the GARD cohort was randomly divided into training, validation, and test sets in a 6:2:2 ratio, and this split was repeated 50 times. External validation was performed on the WUH cohort. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE).\u003c/p\u003e\u003cp\u003eAll experiments were conducted with PyTorch (v2.0.0) in Python (v3.8.8) on NVIDIA GTX 1080 Ti GPUs with 48 GB of memory per GPU.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003eCharacteristics\u003c/h2\u003e\n \u003cp\u003eTable \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e summarizes the clinical characteristics of subjects in the GARD and WUH cohorts. In the GARD dataset, the mean age was 71.8 (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e6.1) years for CN subjects and 73.5 (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e6.4) years for MCI patients (P\u0026thinsp;\u0026lt;\u0026thinsp;0.01). 
Education levels and MMSE scores also differed significantly between CN subjects (education: 10.4\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e4.6 years; MMSE: 27.5\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e2.1) and MCI patients (9.8\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e4.7 years; 25.5\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e3.1) (P\u0026thinsp;\u0026lt;\u0026thinsp;0.01). The proportion of women was also higher among CN subjects (60.1%) than among MCI patients (54.0%). In contrast, in the WUH dataset, mean age did not differ significantly between CN subjects (69.9\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e7.7) and MCI patients (71.4\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e8.3) (P\u0026thinsp;\u0026gt;\u0026thinsp;0.05), nor did education level (CN 8.7\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e4.2; MCI 9.2\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\pm\\:\$\u003c/span\u003e\u003c/span\u003e4.5; P\u0026thinsp;\u0026gt;\u0026thinsp;0.05). 
Compared with the GARD dataset, the external test set showed lower age, education level, and RCFT scores in both groups, with the exception of the RCFT copy score in the CN group.\u003c/p\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003e\u003cstrong\u003eDescriptive statistics.\u003c/strong\u003e A dataset of 1,740 subjects from the Gwangju Alzheimer\u0026rsquo;s and Related Dementia (GARD) cohort was used for training, and an external test set of 222 subjects from Wonkwang University Hospital (WUH) was used for external validation.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eDataset\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eGARD (training)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eWUH (test)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal\u003c/p\u003e\n \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;1,740)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCN\u003c/p\u003e\n \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;947)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMCI\u003c/p\u003e\n \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;793)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal\u003c/p\u003e\n \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;222)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCN\u003c/p\u003e\n 
\u003cp\u003e(N\u0026thinsp;=\u0026thinsp;106)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMCI\u003c/p\u003e\n \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;116)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eAge (years)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e72.6\u003c/p\u003e\n \u003cp\u003e(6.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e71.8\u003c/p\u003e\n \u003cp\u003e(6.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e73.5\u003c/p\u003e\n \u003cp\u003e(6.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e70.7\u003c/p\u003e\n \u003cp\u003e(8.0)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e69.9\u003c/p\u003e\n \u003cp\u003e(7.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e71.4\u003c/p\u003e\n \u003cp\u003e(8.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eSex\u003c/p\u003e\n \u003cp\u003eN (female, %)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e743\u003c/p\u003e\n \u003cp\u003e(57.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e378\u003c/p\u003e\n \u003cp\u003e(60.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e365\u003c/p\u003e\n \u003cp\u003e(54.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e140\u003c/p\u003e\n \u003cp\u003e(63.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e76\u003c/p\u003e\n \u003cp\u003e(71.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e64\u003c/p\u003e\n \u003cp\u003e(55.2%)\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eEducation (years)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10.1\u003c/p\u003e\n \u003cp\u003e(4.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10.4\u003c/p\u003e\n \u003cp\u003e(4.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.8\u003c/p\u003e\n \u003cp\u003e(4.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.0\u003c/p\u003e\n \u003cp\u003e(4.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8.7\u003c/p\u003e\n \u003cp\u003e(4.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.2\u003c/p\u003e\n \u003cp\u003e(4.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eMMSE scores\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e26.5\u003c/p\u003e\n \u003cp\u003e(2.8)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e27.5\u003c/p\u003e\n \u003cp\u003e(2.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e25.5\u003c/p\u003e\n \u003cp\u003e(3.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"3\"\u003e\n \u003cp\u003eRCFT Score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ecopy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e32.0\u003c/p\u003e\n \u003cp\u003e(5.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e33.6\u003c/p\u003e\n \u003cp\u003e(3.0)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e30.2\u003c/p\u003e\n \u003cp\u003e(6.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e30.6\u003c/p\u003e\n \u003cp\u003e(9.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e33.7\u003c/p\u003e\n \u003cp\u003e(4.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e27.8\u003c/p\u003e\n \u003cp\u003e(11.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eimmediate\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12.8\u003c/p\u003e\n \u003cp\u003e(7.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15.7\u003c/p\u003e\n \u003cp\u003e(6.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.3\u003c/p\u003e\n \u003cp\u003e(6.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.2\u003c/p\u003e\n \u003cp\u003e(8.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12.7\u003c/p\u003e\n \u003cp\u003e(8.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e6.1\u003c/p\u003e\n \u003cp\u003e(7.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003edelayed\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12.7\u003c/p\u003e\n \u003cp\u003e(7.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15.7\u003c/p\u003e\n \u003cp\u003e(6.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.1\u003c/p\u003e\n \u003cp\u003e(6.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8.5\u003c/p\u003e\n \u003cp\u003e(8.5)\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e11.9\u003c/p\u003e\n \u003cp\u003e(8.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5.3\u003c/p\u003e\n \u003cp\u003e(7.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003eScoring validation\u003c/h2\u003e\n \u003cp\u003eThe initial coefficient of determination (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{R}^{2}\$\u003c/span\u003e\u003c/span\u003e) between AI and expert scores was 0.81, with a mean absolute error (MAE) of 3.0 points (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eA). Discrepancies of more than 10 points between the expert (ground-truth) scores and the AI-predicted scores were identified in 30 images; on re-examination, the scores for 26 of these images were corrected. After these corrections, agreement improved to an \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{R}^{2}\$\u003c/span\u003e\u003c/span\u003e of 0.95 with an MAE\u0026thinsp;=\u0026thinsp;2.0 (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eB).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n \u003ch2\u003e[Figure \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e]\u003c/h2\u003e\n \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e\n \u003ch2\u003eComparison of model performance via internal test using GARD cohort\u003c/h2\u003e\n \u003cp\u003eWe evaluated the classification performance of five models, three of which incorporate components of the proposed method. 
The models are: 1) logistic regression using MMSE scores; 2) logistic regression using RCFT scores rated by experts; 3) logistic regression using RCFT scores predicted by the AI model; 4) a deep learning model using only the spatial stream network; and 5) the proposed deep learning model using the multi-stream network. The mean performance of each model is shown in Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e (A).\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eTable 2. Results of model prediction performance\u003c/strong\u003e. The baseline models consisted of three logistic regression models using MMSE scores, RCFT scores by experts, and RCFT scores by a previous AI model, respectively, and one deep learning model that used only the spatial stream network. All baseline models included chronological age, sex, and education as covariates. The data were split 6:2:2 into training, validation, and test sets, and this process was repeated 50 times.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e(A) Internal test using the GARD cohort dataset.\u003c/strong\u003e\u003c/p\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"589\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\" style=\"width: 152px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eInput modality\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"5\" style=\"width: 438px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eGARD\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAccuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSensitivity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n 
\u003cp\u003e\u003cstrong\u003eSpecificity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 152px;\"\u003e\n \u003cp\u003eMMSE scores\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.714\u003c/p\u003e\n \u003cp\u003e[0.706-0.721]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.660\u003c/p\u003e\n \u003cp\u003e[0.652-0.667]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.625\u003c/p\u003e\n \u003cp\u003e[0.613-0.636]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.694\u003c/p\u003e\n \u003cp\u003e[0.685-0.704]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 152px;\"\u003e\n \u003cp\u003eRCFT scores by experts\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.776\u003c/p\u003e\n \u003cp\u003e[0.768-0.782]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.705\u003c/p\u003e\n \u003cp\u003e[0.699-0.712]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.700\u003c/p\u003e\n \u003cp\u003e[0.689-0.711]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.711\u003c/p\u003e\n \u003cp\u003e[0.700-0.722]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 152px;\"\u003e\n \u003cp\u003eRCFT scores by AI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.777\u003c/p\u003e\n 
\u003cp\u003e[0.770-0.783]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.710\u003c/p\u003e\n \u003cp\u003e[0.703-0.717]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.699\u003c/p\u003e\n \u003cp\u003e[0.689-0.709]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.721\u003c/p\u003e\n \u003cp\u003e[0.710-0.731]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 152px;\"\u003e\n \u003cp\u003eOnly RCFT images\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.803\u003c/p\u003e\n \u003cp\u003e[0.768-0.837]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.731\u003c/p\u003e\n \u003cp\u003e[0.702-0.761]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.701\u003c/p\u003e\n \u003cp\u003e[0.661-0.741]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e0.762\u003c/p\u003e\n \u003cp\u003e[0.720-0.804]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 152px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eImage + score by AI\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e(Our method)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.852\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e[0.837-0.869]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.771\u003c/strong\u003e\u003c/p\u003e\n 
\u003cp\u003e\u003cstrong\u003e[0.755-0.787]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.742\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e[0.718-0.767]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 109px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.800\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e[0.774-0.823]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003e\u003cstrong\u003e(B) External test using the WUH cohort dataset.\u0026nbsp;\u003c/strong\u003eRCFT scores by experts\u003csup\u003ea\u003c/sup\u003e refers to the model using scores from the initial dataset before QC, while RCFT scores by experts\u003csup\u003eb\u003c/sup\u003e indicates the model with the validated dataset after QC based on the re-rated RCFT scores using a previous AI model.\u003c/p\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"585\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\" style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eInput modality\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"5\" style=\"width: 435px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eWUH\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAccuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSensitivity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n 
\u003cp\u003e\u003cstrong\u003eSpecificity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 151px;\"\u003e\n \u003cp\u003eRCFT scores by experts\u003csup\u003ea\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.750\u003c/p\u003e\n \u003cp\u003e[0.750-0.751]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.709\u003c/p\u003e\n \u003cp\u003e[0.707-0.712]\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.832\u003c/p\u003e\n \u003cp\u003e[0.829-0.835]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.575\u003c/p\u003e\n \u003cp\u003e[0.571-0.579]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 151px;\"\u003e\n \u003cp\u003eRCFT scores by experts\u003csup\u003eb\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.813\u003c/p\u003e\n \u003cp\u003e[0.812-0.814]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.750\u003c/p\u003e\n \u003cp\u003e[0.748-0.753]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.849\u003c/p\u003e\n \u003cp\u003e[0.845-0.852]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.643\u003c/p\u003e\n \u003cp\u003e[0.639-0.648]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 151px;\"\u003e\n \u003cp\u003eRCFT scores by AI\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.804\u003c/p\u003e\n \u003cp\u003e[0.803-0.805]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.722\u003c/p\u003e\n \u003cp\u003e[0.721-0.725]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.799\u003c/p\u003e\n \u003cp\u003e[0.797-0.802]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.639\u003c/p\u003e\n \u003cp\u003e[0.634-0.644]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 151px;\"\u003e\n \u003cp\u003eOnly RCFT images\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.837\u003c/p\u003e\n \u003cp\u003e[0.814-0.860]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.744\u003c/p\u003e\n \u003cp\u003e[0.719-0.768]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.743\u003c/p\u003e\n \u003cp\u003e[0.690-0.800]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e0.745\u003c/p\u003e\n \u003cp\u003e[0.697-0.792]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 151px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eImage + score by AI\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e(Our method)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.872\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e[0.862-0.882]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e0.781\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e[0.768-0.795]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.836\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e[0.807-0.864]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 108px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.722\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e[0.687-0.757]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 3px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003eThe logistic regression model with MMSE scores showed the lowest performance, with an AUC of 0.714 [95% confidence interval: 0.706\u0026ndash;0.721], an ACC of 0.660 [0.652\u0026ndash;0.667], an SEN of 0.625 [0.613\u0026ndash;0.636] and an SPE of 0.694 [0.685\u0026ndash;0.704]. The logistic regression model using expert-assessed RCFT scores achieved an AUC of 0.776 [0.768\u0026ndash;0.782], an ACC of 0.705 [0.699\u0026ndash;0.712], an SEN of 0.700 [0.689\u0026ndash;0.711] and an SPE of 0.711 [0.700\u0026ndash;0.722]; the model using AI-predicted RCFT scores performed similarly, with an AUC of 0.777 [0.770\u0026ndash;0.783], an ACC of 0.710 [0.703\u0026ndash;0.717], an SEN of 0.699 [0.689\u0026ndash;0.709] and an SPE of 0.721 [0.710\u0026ndash;0.731].\u003c/p\u003e\n \u003cp\u003ePerformance improved further with the spatial stream network model, which achieved an AUC of 0.803 [0.768\u0026ndash;0.837], ACC of 0.731 [0.702\u0026ndash;0.761], SEN of 0.701 [0.661\u0026ndash;0.741] and SPE of 0.762 [0.720\u0026ndash;0.804]. 
Finally, our proposed deep learning model using the two-stream network outperformed all baseline models across all metrics, with an AUC of 0.852 [0.837\u0026ndash;0.869], ACC of 0.771 [0.755\u0026ndash;0.787], SEN of 0.742 [0.718\u0026ndash;0.767] and SPE of 0.800 [0.774\u0026ndash;0.823].\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\n \u003ch2\u003eExternal validation using WUH cohort\u003c/h2\u003e\n \u003cp\u003ePerformance metrics of the trained models on the external WUH dataset are detailed in Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e (B). The logistic regression model using expert-rated RCFT scores from the initial dataset achieved an AUC of 0.750 [0.750\u0026ndash;0.751], ACC of 0.709 [0.707\u0026ndash;0.712], SEN of 0.832 [0.829\u0026ndash;0.835] and SPE of 0.575 [0.571\u0026ndash;0.579]. With the validated dataset based on the re-rated RCFT scores, the model\u0026rsquo;s performance improved to an AUC of 0.813 [0.812\u0026ndash;0.814], ACC of 0.750 [0.748\u0026ndash;0.753], SEN of 0.849 [0.845\u0026ndash;0.852] and SPE of 0.643 [0.639\u0026ndash;0.648]. The logistic model with AI-predicted RCFT scores performed comparably to the models using expert scores (AUC\u0026thinsp;=\u0026thinsp;0.804 [0.803\u0026ndash;0.805], ACC\u0026thinsp;=\u0026thinsp;0.722 [0.721\u0026ndash;0.725], SEN\u0026thinsp;=\u0026thinsp;0.799 [0.797\u0026ndash;0.802] and SPE\u0026thinsp;=\u0026thinsp;0.639 [0.634\u0026ndash;0.644]). Compared with this AI-score model, the deep learning model employing the spatial stream network achieved a higher AUC (0.837 [0.814\u0026ndash;0.860]), ACC (0.744 [0.719\u0026ndash;0.768]) and SPE (0.745 [0.697\u0026ndash;0.792]) but a lower SEN (0.743 [0.690\u0026ndash;0.800]). 
Our proposed deep learning method using the two-stream network achieved the highest AUC and accuracy of all models (AUC\u0026thinsp;=\u0026thinsp;0.872 [0.862\u0026ndash;0.882], ACC\u0026thinsp;=\u0026thinsp;0.781 [0.768\u0026ndash;0.795]), together with SEN\u0026thinsp;=\u0026thinsp;0.836 [0.807\u0026ndash;0.864] and SPE\u0026thinsp;=\u0026thinsp;0.722 [0.687\u0026ndash;0.757]; its sensitivity was second only to that of the expert-score model after QC (0.849), and its specificity second only to that of the spatial stream model (0.745).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\n \u003ch2\u003e[Figure \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e]\u003c/h2\u003e\n\u003c/div\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eIn this article, we developed a multi-stream deep learning network to differentiate MCI patients from CN subjects. Our approach surpasses previous methods using drawing tests (PDT, CDT and RCFT) by leveraging a larger sample size and an external test set, thereby enhancing the robustness and performance of the model. Notably, our model achieved higher performance metrics than those reported in existing studies.\u003c/p\u003e\u003cp\u003eOur multi-stream network combines a scoring stream and a spatial stream. The scoring stream incorporates an AI scoring system for the RCFT, which saves time and human resources while helping to prevent human scoring errors, thus improving accuracy. This benefit is evident in the external validation: after AI scoring was used for QC, the expert-score model performed substantially better (AUC\u0026thinsp;=\u0026thinsp;0.813) than the model using the initial expert-assessed RCFT scores without QC (AUC\u0026thinsp;=\u0026thinsp;0.750). Furthermore, whereas an expert needs approximately 5 minutes to score one subject, our AI scoring system takes only 10 seconds. The spatial stream of our model takes raw RCFT images as input and extracts subtle details, such as pen thickness and stroke shape, that are not captured by the human scoring system (which ranges from 0 to 36 points). 
This yields a substantial performance gain over models that rely solely on scores. Although raw image data are rich in information, they also contain considerable noise; the multi-head self-attention layers therefore help the model prioritize the most informative spatial regions within the feature map, boosting performance. However, models that rely solely on raw images showed greater variability in performance (wider confidence intervals in Table 2) than the score-based logistic models, and the performance of the spatial stream network may degrade when the resolution of new test images differs from that of the training images. By combining the scoring stream network, which builds on the human scoring system, with the spatial stream network, which processes the images directly, our proposed method achieves high and robust performance.\u003c/p\u003e\u003cp\u003eThe proposed method offers a cost-effective and efficient tool for screening for MCI at medical check-up centers. The MMSE is currently the most widely used screening tool, valued for its simplicity and short administration time of approximately 5\u0026ndash;10 minutes [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. However, our results indicate that the MMSE is less informative for predicting MCI and lacks accuracy in distinguishing CN subjects from MCI patients (AUC\u0026thinsp;=\u0026thinsp;0.714). Another study reported a similar MMSE performance, with an AUC of 0.733 (N\u0026thinsp;=\u0026thinsp;2,577) [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. In contrast, comprehensive neuropsychological test batteries are more time-consuming, taking up to 2 hours to administer [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e], and make it difficult to examine many subjects because of the additional time required for scoring and interpretation. 
Although the RCFT takes longer than the MMSE, approximately 30 minutes including a 20-minute delay interval [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e], our RCFT-based model substantially outperformed the MMSE-based model (AUC\u0026thinsp;\u0026gt;\u0026thinsp;0.85 versus 0.714). Furthermore, because our model requires no additional time for expert scoring, it is more efficient than other cognitive function tests that rely on expert scoring.\u003c/p\u003e\u003cp\u003eDespite its flexibility, the proposed method has some limitations, which point to areas for future development. First, we did not incorporate ancillary information beyond the raw images. Recent studies have shown that kinematic data such as pen pressure, velocity and drawing time, which cannot be captured by traditional paper-and-pencil drawing tests but can be recorded by tablet-based tests, differ significantly between case and control groups. These parameters are potentially useful covariates for improving the performance of prediction models [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. We have developed a tablet-based application that administers the RCFT, records the drawing process and extracts kinematic parameters; incorporating this information may further improve performance. Second, verbal tests also play a crucial role in neuropsychological evaluation [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. Recent advances in speech and language technologies, including automatic speech recognition and pretrained language models such as BERT [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], have enabled the exploration of speech-based methods for AD detection [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. 
For future work, we plan to develop tablet-based, fully automated memory tests that integrate both visual and verbal assessments.\u003c/p\u003e\u003cp\u003eIn conclusion, our multi-stream deep learning network outperformed previous studies in distinguishing MCI patients from CN subjects. By integrating the human scoring system with image-based information, our model demonstrated robust performance across internal and external datasets. Our findings suggest potential clinical utility as a time-efficient screening tool for cognitive impairment.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eACKNOWLEDGEMENT\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFUNDING\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the Technology Innovation Program (20022810, Development and Demonstration of a Digital System for the evaluation of geriatric Cognitive impairment) funded by the Ministry of Trade, Industry \u0026amp; Energy (MOTIE, Korea), and by the Korea National Institute of Health (KNIH) research project No. 2024ER210800.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAVAILABILITY OF DATA AND MATERIALS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe dataset for the current study is not publicly available but is available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAUTHOR CONTRIBUTIONS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eJ.P. performed all data preprocessing, experiments, and manuscript writing as the first author. E.H.S., S.Y., and K.H.L. provided the data. S.K. conducted model evaluation. S.W. organized and supervised the study. 
All authors reviewed and approved the submitted version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eETHICS APPROVAL\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe study was approved by the Institutional Review Boards of Chonnam National University Hospital (CNUH‐2019‐279) and Wonkwang University Hospital (2022-01-024-004). Written informed consent was obtained from each participant or their legal guardian.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCONSENT FOR PUBLICATION\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCOMPETING INTERESTS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAgrell, B. and O. Dehlin, \u003cem\u003eThe clock-drawing test.\u003c/em\u003e Age and Ageing, 1998. \u003cstrong\u003e27\u003c/strong\u003e(3): p. 399-403.\u003c/li\u003e\n\u003cli\u003eFolstein, M.F., S.E. Folstein, and P.R. McHugh, \u003cem\u003e\u0026ldquo;Mini-mental state\u0026rdquo;: A practical method for grading the cognitive state of patients for the clinician.\u003c/em\u003e Journal of Psychiatric Research, 1975. \u003cstrong\u003e12\u003c/strong\u003e(3): p. 189-198.\u003c/li\u003e\n\u003cli\u003eNasreddine, Z.S., et al., \u003cem\u003eThe Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment.\u003c/em\u003e J Am Geriatr Soc, 2005. \u003cstrong\u003e53\u003c/strong\u003e(4): p. 695-9.\u003c/li\u003e\n\u003cli\u003eDarvesh, S., et al., \u003cem\u003eThe behavioural neurology assessment.\u003c/em\u003e Can J Neurol Sci, 2005. \u003cstrong\u003e32\u003c/strong\u003e(2): p. 167-77.\u003c/li\u003e\n\u003cli\u003eMendez, M.F., T. Ala, and K.L. 
Underwood, \u003cem\u003eDevelopment of scoring criteria for the clock drawing task in Alzheimer\u0026apos;s disease.\u003c/em\u003e J Am Geriatr Soc, 1992. \u003cstrong\u003e40\u003c/strong\u003e(11): p. 1095-9.\u003c/li\u003e\n\u003cli\u003eRey, A., \u003cem\u003eL\u0026apos;examen psychologique dans les cas d\u0026apos;encephalopathie traumatique.\u003c/em\u003e Archives de psychologie, 1941. \u003cstrong\u003e28\u003c/strong\u003e: p. 286-340.\u003c/li\u003e\n\u003cli\u003eOsterrieth, P.A., \u003cem\u003eLe test de copie d\u0026apos;une figure complexe; contribution a l\u0026apos;etude de la perception et de la memoire.\u003c/em\u003e Archives de psychologie, 1944.\u003c/li\u003e\n\u003cli\u003eTasaki, S., et al., \u003cem\u003eExplainable deep learning approach for extracting cognitive features from hand-drawn images of intersecting pentagons.\u003c/em\u003e NPJ Digital Medicine, 2023. \u003cstrong\u003e6\u003c/strong\u003e(1): p. 157.\u003c/li\u003e\n\u003cli\u003eRuengchaijatuporn, N., et al., \u003cem\u003eAn explainable self-attention deep neural network for detecting mild cognitive impairment using multi-input digital drawing tasks.\u003c/em\u003e Alzheimers Res Ther, 2022. \u003cstrong\u003e14\u003c/strong\u003e(1): p. 111.\u003c/li\u003e\n\u003cli\u003eCheah, W.-T., et al. \u003cem\u003eA screening system for mild cognitive impairment based on neuropsychological drawing test and neural network\u003c/em\u003e. in \u003cem\u003e2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)\u003c/em\u003e. 2019. IEEE.\u003c/li\u003e\n\u003cli\u003eZhang, X., et al., \u003cem\u003eA tablet-based multi-dimensional drawing system can effectively distinguish patients with amnestic MCI from healthy individuals.\u003c/em\u003e Scientific Reports, 2024. \u003cstrong\u003e14\u003c/strong\u003e(1): p. 982.\u003c/li\u003e\n\u003cli\u003ePintelas, E., I.E. Livieris, and P.E. 
Pintelas, \u003cem\u003eA Convolutional Autoencoder Topology for Classification in High-Dimensional Noisy Image Datasets.\u003c/em\u003e Sensors, 2021. \u003cstrong\u003e21\u003c/strong\u003e(22): p. 7731.\u003c/li\u003e\n\u003cli\u003eJia, W., et al., \u003cem\u003eFeature dimensionality reduction: a review.\u003c/em\u003e Complex \u0026amp; Intelligent Systems, 2022. \u003cstrong\u003e8\u003c/strong\u003e(3): p. 2663-2693.\u003c/li\u003e\n\u003cli\u003eBalki, I., et al., \u003cem\u003eSample-Size Determination Methodologies for Machine Learning in Medical Imaging Research: A Systematic Review.\u003c/em\u003e Canadian Association of Radiologists Journal, 2019. \u003cstrong\u003e70\u003c/strong\u003e(4): p. 344-353.\u003c/li\u003e\n\u003cli\u003ePark, J.Y., et al., \u003cem\u003eAutomating Rey Complex Figure Test scoring using a deep learning-based approach: a potential large-scale screening tool for cognitive decline.\u003c/em\u003e Alzheimer\u0026apos;s Research \u0026amp; Therapy, 2023. \u003cstrong\u003e15\u003c/strong\u003e(1): p. 145.\u003c/li\u003e\n\u003cli\u003eSeo, E.H., et al., \u003cem\u003eVisuospatial memory impairment as a potential neurocognitive marker to predict tau pathology in Alzheimer\u0026rsquo;s continuum.\u003c/em\u003e Alzheimer\u0026apos;s Research \u0026amp; Therapy, 2021. \u003cstrong\u003e13\u003c/strong\u003e: p. 1-14.\u003c/li\u003e\n\u003cli\u003eWinblad, B., et al., \u003cem\u003eMild cognitive impairment\u0026ndash;beyond controversies, towards a consensus: report of the International Working Group on Mild Cognitive Impairment.\u003c/em\u003e Journal of Internal Medicine, 2004. \u003cstrong\u003e256\u003c/strong\u003e(3): p. 240-246.\u003c/li\u003e\n\u003cli\u003eTan, M. and Q. Le. \u003cem\u003eEfficientNet: Rethinking model scaling for convolutional neural networks\u003c/em\u003e. in \u003cem\u003eInternational Conference on Machine Learning\u003c/em\u003e. 2019. PMLR.\u003c/li\u003e\n\u003cli\u003eSandler, M., et al. 
\u003cem\u003eMobileNetV2: Inverted residuals and linear bottlenecks\u003c/em\u003e. in \u003cem\u003eProceedings of the IEEE Conference on Computer Vision and Pattern Recognition\u003c/em\u003e. 2018.\u003c/li\u003e\n\u003cli\u003eVaswani, A., et al., \u003cem\u003eAttention is all you need.\u003c/em\u003e Advances in Neural Information Processing Systems, 2017. \u003cstrong\u003e30\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eRyu, H.J. and D.W. Yang, \u003cem\u003eThe Seoul Neuropsychological Screening Battery (SNSB) for Comprehensive Neuropsychological Assessment.\u003c/em\u003e Dement Neurocogn Disord, 2023. \u003cstrong\u003e22\u003c/strong\u003e(1): p. 1-15.\u003c/li\u003e\n\u003cli\u003eShin, M.S., et al., \u003cem\u003eClinical and empirical applications of the Rey-Osterrieth Complex Figure Test.\u003c/em\u003e Nat Protoc, 2006. \u003cstrong\u003e1\u003c/strong\u003e(2): p. 892-9.\u003c/li\u003e\n\u003cli\u003eKim, K.W., et al., \u003cem\u003eA comprehensive evaluation of the process of copying a complex figure in early- and late-onset Alzheimer disease: a quantitative analysis of digital pen data.\u003c/em\u003e Journal of Medical Internet Research, 2020. \u003cstrong\u003e22\u003c/strong\u003e(8): p. e18136.\u003c/li\u003e\n\u003cli\u003eKnopman, D.S. and S. Ryberg, \u003cem\u003eA verbal memory test with high predictive accuracy for dementia of the Alzheimer type.\u003c/em\u003e Archives of Neurology, 1989. \u003cstrong\u003e46\u003c/strong\u003e(2): p. 141-145.\u003c/li\u003e\n\u003cli\u003eDevlin, J., et al., \u003cem\u003eBERT: Pre-training of deep bidirectional transformers for language understanding.\u003c/em\u003e arXiv preprint arXiv:1810.04805, 2018.\u003c/li\u003e\n\u003cli\u003eBalagopalan, A., et al., \u003cem\u003eComparing pre-trained and feature-based models for prediction of Alzheimer\u0026apos;s disease based on speech.\u003c/em\u003e Frontiers in Aging Neuroscience, 2021. \u003cstrong\u003e13\u003c/strong\u003e: p. 
635945.\u003c/li\u003e\n\u003cli\u003ePappagari, R., et al. \u003cem\u003eAutomatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios\u003c/em\u003e. in \u003cem\u003eInterspeech\u003c/em\u003e. 2021.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Rey Complex Figure Test, Mild cognitive impairment prediction, Multi-stream deep learning, Convolutional Neural Network, Multi-head self-attention","lastPublishedDoi":"10.21203/rs.3.rs-6894673/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6894673/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eDrawing tests like the Rey Complex Figure Test (RCFT) are widely used to assess cognitive functions such as visuospatial skills and memory, making them valuable tools for detecting mild cognitive impairment (MCI). 
Despite their utility, existing predictive models based on these tests often suffer from limitations like small sample sizes and lack of external validation, which undermine their reliability. We developed a multi-stream deep learning framework that integrates two distinct processing streams: a multi-head self-attention based spatial stream using raw RCFT images and a scoring stream employing a previously developed automated scoring system. Our model was trained on data from 1,740 subjects in the Korean cohort and validated on an external hospital dataset of 222 subjects from Korea. The proposed multi-stream model demonstrated superior performance over baseline models (AUC\u0026thinsp;=\u0026thinsp;0.872, Accuracy\u0026thinsp;=\u0026thinsp;0.781) in external validation. The integration of both spatial and scoring streams enables the model to capture intricate visual details from the raw images while also incorporating structured scoring data, which together enhance its ability to detect subtle cognitive impairments. This dual approach not only improves predictive accuracy but also increases the robustness of the model, making it more reliable in diverse clinical settings. 
Our model has practical implications for clinical settings, where it could serve as a cost-effective tool for early MCI screening.\u003c/p\u003e","manuscriptTitle":"Multi-stream deep learning framework to predict mild cognitive impairment with Rey Complex Figure Test","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-23 02:06:59","doi":"10.21203/rs.3.rs-6894673/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-11-07T16:10:46+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-07T03:43:46+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"233379845795810144413270014306053656372","date":"2025-10-16T03:36:46+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"191583486916393667796873381712089973109","date":"2025-09-16T07:41:03+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"28917652280733692644338011700956488636","date":"2025-09-15T17:03:44+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-08-18T10:46:49+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"153559464270827413413565377706917663527","date":"2025-08-11T02:06:10+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"155640164091110066852567582673267281175","date":"2025-07-23T21:15:30+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"90956511036524770164031873824864513471","date":"2025-07-23T13:33:58+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-07-17T12:38:40+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-07-17T12:34:20+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-06-23T17:05:19+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-06-20T17:36:30+00:00","ind
ex":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-06-20T17:33:30+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"25644793-cca4-4640-9bc0-dc28a4e04328","owner":[],"postedDate":"July 23rd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":51874446,"name":"Health sciences/Health care"},{"id":51874447,"name":"Health sciences/Health care/Public health/Population screening"},{"id":51874448,"name":"Physical sciences/Mathematics and computing/Statistics"}],"tags":[],"updatedAt":"2026-03-02T16:00:25+00:00","versionOfRecord":{"articleIdentity":"rs-6894673","link":"https://doi.org/10.1038/s41598-025-34491-5","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2026-03-01 15:57:25","publishedOnDateReadable":"March 1st, 2026"},"versionCreatedAt":"2025-07-23 02:06:59","video":"","vorDoi":"10.1038/s41598-025-34491-5","vorDoiUrl":"https://doi.org/10.1038/s41598-025-34491-5","workflowStages":[]},"version":"v1","identity":"rs-6894673","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6894673","identity":"rs-6894673","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
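The evaluation protocol behind Table 2 (a 6:2:2 train/validation/test split repeated 50 times, with AUC, accuracy, sensitivity and specificity reported as means with 95% confidence intervals) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code. The helper names `auc_rank`, `threshold_metrics`, `split_622` and `mean_ci` are hypothetical. The 0.5 decision threshold and the normal-approximation interval (mean ± 1.96·SD/√n across repeats) are assumptions, because the excerpt does not state how the cut-off or the intervals were derived. The synthetic scores are a stand-in for a trained model's predicted MCI probabilities.

```python
import math
import random
import statistics


def auc_rank(labels, scores):
    """Probability that a random MCI case (label 1) scores higher than a
    random CN case (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one MCI and one CN subject")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at a fixed cut-off
    (the 0.5 default is an assumption, not taken from the paper)."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_mci = s >= threshold
        if y == 1 and predicted_mci:
            tp += 1
        elif y == 1:
            fn += 1
        elif not predicted_mci:
            tn += 1
        else:
            fp += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def split_622(n, rng):
    """Shuffle subject indices and cut them 60/20/20 into train/val/test."""
    idx = list(range(n))
    rng.shuffle(idx)
    a, b = (6 * n) // 10, (8 * n) // 10
    return idx[:a], idx[a:b], idx[b:]


def mean_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI of the mean across repeats
    (an assumed interval method)."""
    m = statistics.mean(values)
    if len(values) < 2:
        return m, m, m
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic cohort (label 1 = MCI, 0 = CN); the noisy score is a
    # hypothetical stand-in for a trained model's predicted probability.
    labels = [rng.randint(0, 1) for _ in range(500)]
    scores = [min(1.0, max(0.0, 0.5 + (0.2 if y else -0.2) + rng.gauss(0, 0.25)))
              for y in labels]
    runs = {"AUC": [], "ACC": [], "SEN": [], "SPE": []}
    for _ in range(50):  # 50 repeated 6:2:2 splits; only the test part is scored here
        _train, _val, test = split_622(len(labels), rng)
        y_t = [labels[i] for i in test]
        s_t = [scores[i] for i in test]
        acc, sen, spe = threshold_metrics(y_t, s_t)
        runs["AUC"].append(auc_rank(y_t, s_t))
        runs["ACC"].append(acc)
        runs["SEN"].append(sen)
        runs["SPE"].append(spe)
    for name, values in runs.items():
        m, lo, hi = mean_ci(values)
        print(f"{name} {m:.3f} [{lo:.3f}-{hi:.3f}]")
```

The pairwise AUC used here equals the Mann–Whitney U statistic divided by the number of MCI–CN pairs; it is quadratic in test-set size, which is acceptable for a few hundred subjects. In a real pipeline, the training and validation indices returned by `split_622` would be used to fit and tune each model before its test-set scores are evaluated.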