Explainable Machine Learning with Bayesian Hyper-Optimization for Predicting Cognitive Impairment from Longitudinal Nursing Home Data

doi:10.21203/rs.3.rs-7402937/v1

2025 · doi:10.21203/rs.3.rs-7402937/v1

preprint OA: closed

Silvia Campanioni, Laura Busto, José A. González-Novoa, Carlos Martínez, and 10 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7402937/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 05 Feb, 2026; the published version appears in Scientific Reports. Version 1 posted 10

Abstract

The monitoring of daily life in nursing home residents generates diverse and heterogeneous sources of information. Artificial Intelligence (AI) is increasingly used to predict a wide range of outcomes in both research and clinical practice, including mortality and cognitive impairment (CI).
A key challenge is determining which information sources (IS) provide the most accurate predictions. In this work, we introduce a novel AI-based methodology that integrates Bayesian optimization, XGBoost, and explainable AI (SHAP) to predict CI in nursing home residents using 13 years of heterogeneous longitudinal data from 2,608 individuals. Our approach enables interpretable predictions of CI-related clinical scales such as the Mini-Mental State Examination (MMSE), the Global Deterioration Scale (GDS), and the Barthel Scale, while revealing the relative contributions of various information sources, including clinical metrics and activity records. To our knowledge, this is the first framework to combine harmonized temporal modeling, Bayesian-optimized ensemble learning, and SHAP-based interpretability to evaluate the predictive relevance of heterogeneous clinical and behavioral data sources in a real-world long-term care setting. This integrated approach not only improves predictive performance for CI-related scores but also offers interpretable insights that can inform personalized care strategies.

Subject areas: Biological sciences/Computational biology and bioinformatics; Health sciences/Health care; Physical sciences/Mathematics and computing; Health sciences/Medical research

Keywords: Information Source (IS); Artificial Intelligence (AI); Cognitive Impairment (CI); Homogenization of data; Explainable Artificial Intelligence (XAI)

Introduction

The accelerated aging of the global population has led to a growing prevalence of neurodegenerative disorders, underscoring the importance of investigating physiological and cognitive processes over time [1]. Many of these conditions feature prolonged prodromal phases, offering a critical window for the early identification of cognitive impairment (CI) before it significantly disrupts daily functioning [2].
While current clinical methods are effective in detecting CI once symptoms are evident, they fall short in monitoring asymptomatic individuals over extended periods. Although biomarkers such as hyperphosphorylated tau protein (p-tau) and amyloid-beta (Aβ) have shown promise in detecting neuropathology prior to the onset of CI, their invasive extraction via cerebrospinal fluid (CSF) poses practical limitations [3]. Similarly, neuroimaging techniques such as magnetic resonance imaging (MRI) and positron emission tomography (PET) have demonstrated potential to anticipate CI well in advance of clinical symptoms [4, 5]. However, these modalities involve high costs and strict standardization protocols, limiting their scalability for large-scale early screening [6, 7].

In contrast, Electronic Health Records (EHRs) and data collected in long-term care environments offer rich, albeit heterogeneous, sources of personal health information [8]. These include data related to physical activity, bowel movements, nutritional plans, mobility scores, pharmaceutical history, and metrics derived from wearable sensors [9, 10]. Although diverse in format and granularity, these datasets hold significant potential for generating clinical insights, provided they can be harmonized into a consistent format suitable for AI applications.

Several standardized scales are widely used in geriatric environments to assess CI and functional ability. The Mini-Mental State Examination (MMSE) evaluates cognitive function based on memory, attention, and language [11]. In this study, MMSE scores were obtained using the Spanish adaptation known as the Mini-Examen Cognoscitivo (MEC) [12], which is widely used in Spanish clinical settings. The Global Deterioration Scale (GDS) categorizes cognitive decline into seven stages [13], and the Barthel Index quantifies a patient’s independence in daily activities [14].
Together, these scales provide complementary perspectives on cognitive and functional status in elderly individuals [15].

Artificial Intelligence (AI), particularly Machine Learning (ML), offers powerful tools to analyze complex clinical data [16]. However, most ML models struggle with the temporal heterogeneity, sparse sampling, and inconsistent follow-up intervals inherent in real-world longitudinal datasets. While ML excels at modeling cross-sectional data, longitudinal modeling remains a challenge due to issues such as multidimensionality and information sparsity. Addressing these challenges is essential for extracting insights from diverse biomedical data types, including time series and imaging modalities [17].

The incorporation of Explainable AI (XAI) is increasingly essential for establishing and maintaining trust in these models when they are applied to clinical decision-making. XAI enables practitioners to understand and validate the rationale behind model predictions, fostering confidence and promoting responsible adoption [18, 19].

In this work, we propose a novel framework for predicting CI in nursing home residents using longitudinal, heterogeneous data. The methodology harmonizes diverse information sources (IS), including clinical metrics, medication, nutrition, bowel movements, and fall events, into a homogeneous temporal matrix for CI prediction. Our method combines Bayesian hyperparameter optimization of gradient boosting (XGBoost) models with SHAP-based explainability, both to predict CI-related scales and to quantify the relative predictive power of each type of data. This dual focus on accuracy and interpretability aims to bridge the gap between ML models and actionable clinical insight in nursing home populations.

Materials and Methods

Dataset

The dataset originates from a retrospective cohort study conducted at four DomusVi nursing homes in Spain. Ethical approval was obtained from the relevant committee (Code: 2023/576).
The dataset includes records from 2,608 residents and comprises 4,718,828 activity logs collected over a 13-year period (2011–2024). All methods were carried out in accordance with relevant guidelines and regulations, including the Declaration of Helsinki and institutional policies on research involving human participants. All participants or their legal guardians provided written informed consent prior to data collection, in accordance with institutional and national ethical guidelines. Data are grouped into nine Information Sources (IS), each capturing different aspects of residents' health and behavior. Table 1 summarizes the number of records per IS. Every record is timestamped, which allows longitudinal tracking and alignment along a common temporal axis.

Demographics (Subjects IS)

The Subjects IS contains a unique identifier for each resident, along with gender, age at entry, and follow-up duration. Table 2 presents summary statistics (mean, standard deviation, minimum, and maximum) for the overall cohort and stratified by gender (998 males and 1,610 females).

Clinical Variables IS

This IS includes 40 clinical variables recorded during follow-up, totaling 414,219 entries. Some variables were measured frequently, while others had sparse entries. This IS also includes the Tinetti and Mini Nutritional Assessment (MNA) scores, which measure the functional mobility and nutritional status of individuals, respectively. The MNA is recorded as three variables: MNA_Global_score, MNA_Assessment_score, and MNA_Screening_score.

Bowel Movement Records IS

This source contains 3,948,306 daily records. Each entry includes the time of day (morning, afternoon, evening) and one of seven stool classifications: X (no movement), N (normal), D (diarrhea), B (soft), F (impaction), L (liquid), and E (constipation).

Fall Records IS

With 8,228 entries, this IS captures categorical data about fall events.
See the table in Supplementary Material S2 for the full description of the categorical variables. These variables describe each episode in terms of accompaniment during the fall (Id_AccompanimentFall), conduct to follow (Id_ConductToFollow), associated symptom (Id_AssociatedSymptom), and type of injury caused by the fall (Id_TypeOfInjuryFall). Additionally, this IS includes variables for the location of the fall (Id_FallLocation), cause of the fall (Id_CauseFall), neurological state (Id_NeurologicalState), and activity (Id_ActivityFall), along with the timestamp of the event. Residents with no falls have no entries in this IS.

Drug Prescriptions IS

This IS comprises 273,679 records of medication administration, specifying drug name, dosage, frequency, and administration start and end dates.

Nutritional Plan IS

The Nutritional Plan IS contains 48,875 records detailing the start and end dates of each resident's dietary plan. Each record includes the plan identifier (Id_Diet), calorie information (Id_Calories), route of administration (Id_Route), and consistency (Id_Consistency). For subsequent analysis, only meals categorized as breakfast, lunch, snack, and dinner are included. Under this criterion, 46,565 records from the table are analyzed.

Cognitive Impairment Scales IS

The Cognitive Impairment Scales IS contains the determinations of CI for each resident on three different scales (MMSE, Barthel, and GDS), describing their condition throughout the follow-up. The number of determinations varies by scale, as shown in Table 1, with an average of 2.08 determinations per patient for MMSE, 1.37 for GDS, and 2.57 for Barthel. The frequency distribution of each scale's values is provided in Fig. 1, illustrating the prevalence and severity of CI among the study participants over time.
Note that determinations with MMSE = 0 correspond to residents who were not able to undergo the test.

AI tools (predictors and explainability)

ML predictors (Bayesian hyper-optimized XGBoost predictors)

XGBoost [20] is a boosting technique within ensemble learning, designed to enhance prediction accuracy by combining multiple models. Unlike other boosting methods that increase the weights of misclassified examples, XGBoost optimizes a loss function through gradient boosting. Its objective function, optimized at each iteration, is given by Eq. 1.

$$L^{(t)}=\sum_{i=1}^{n} l\left(y_{i},\ \tilde{y}_{i}^{(t-1)}+f_{t}\left(x_{i}\right)\right)+\Omega\left(f_{t}\right) \qquad (1)$$

where $l$ is a differentiable convex loss function, $(x_{i}, y_{i})$ are the examples of the training set, $\tilde{y}_{i}^{(t-1)}$ is the prediction for example $i$ after iteration $t-1$, $f_{t}$ is the learner added at iteration $t$, and $\Omega(f_{t})$ is the regularization term that prevents overfitting by penalizing complex models using Lasso (L1) and Ridge (L2) regularization.

XGBoost constructs the next learner by maximizing loss reduction, using the Exact Greedy Algorithm. This process starts with a root node containing all training examples, evaluates potential splits for each feature, and stops growing a branch if the gain for the best split is not positive.

Setting up XGBoost involves three types of parameters: general, booster, and learning task parameters. General parameters define the booster type (tree or linear model), booster parameters configure the internal aspects of the booster (such as the learning rate and the number of estimators), and learning task parameters specify the learning objective.

Bayesian hyper-optimization enhances ML predictors, including XGBoost, by systematically optimizing hyperparameters for maximum model accuracy [21].
It uses Bayesian inference to balance exploration and exploitation in the hyperparameter space, making the search more efficient than traditional methods such as grid or random search. By defining a probabilistic model of the objective function and updating it with each new evaluation, Bayesian optimization quickly identifies the most promising hyperparameter regions, which speeds up the search and can improve generalization. For XGBoost, this involves tuning parameters such as the learning rate, maximum depth, and subsample ratio, which are iteratively adjusted based on validation set performance. The resulting models tend to be more accurate and reliable than those trained with default settings, which makes Bayesian hyper-optimized XGBoost predictors a powerful tool in ML [22].

Shapley Additive Explanations (SHAP)

SHAP is a tool used in explainable AI to reveal the importance of input features in model predictions [18]. It applies principles from game theory to quantify the contribution of each feature to the model's outputs. By considering all possible combinations of features, SHAP provides insights into how these features interact and collectively influence predictions. This comprehensive approach enhances transparency in complex ML models, enabling informed decision-making regarding feature importance and model behavior. Eq. 2 gives the mathematical expression of the SHAP values.

$$\phi_{i}=\sum_{S\subseteq N\setminus\{i\}}\frac{|S|!\,\left(|N|-|S|-1\right)!}{|N|!}\left[f\left(S\cup\{i\}\right)-f\left(S\right)\right] \qquad (2)$$

where $f(S)$ represents the output of the XGBoost model based on a particular subset of features $S$ from the complete set $N$ of all features. The contributions ($\phi_{i}$) are determined by averaging the impact of feature $i$ across all possible orderings of the features: as each feature is sequentially added to the set, its influence on the change in the model's output becomes apparent.
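The Shapley attribution described above can be computed exactly when the number of features is small. The following is a minimal sketch, not the SHAP library's implementation: `shapley_values` and `toy_model` are hypothetical names, and the toy effect sizes are invented for illustration only. The model output for a feature subset is supplied directly as a set function, which sidesteps the question of how a real model is evaluated with features missing.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for subsets S that exclude i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to S
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi


def toy_model(subset):
    """Hypothetical output: two additive effects plus one interaction."""
    out = 0.0
    if "tinetti" in subset:
        out += 4.0
    if "age" in subset:
        out -= 1.0
    if "tinetti" in subset and "mna" in subset:
        out += 2.0  # interaction, shared equally between the two features
    return out


features = ["tinetti", "mna", "age"]
phi = shapley_values(toy_model, features)
```

The attributions sum to the difference between the output with all features and the output with none, and the interaction term is split evenly between the two features involved. The cost grows exponentially with the number of features, which is why tree-specific algorithms are used in practice for models with thousands of columns.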
Methodological Framework

In this work, we present a comprehensive framework that employs AI predictors (regressors and classifiers) based on XGBoost, using both default parameters and Bayesian hyper-optimization, to construct predictors of CI. The framework makes it possible to evaluate the informative contribution of each IS within a heterogeneous dataset on the living conditions of elderly individuals in nursing homes, addressing the problem outlined above. Finally, by employing SHAP to understand the reasons behind the predictions, we aim to gain a better understanding of the CI process. The general schema of the methodology developed in this work is depicted in Fig. 2.

The proposed method requires recoding the heterogeneous information collected from the residences into a homogeneous format suitable for longitudinal models. The framework then produces a family of predictors $P_{ijk}$ from datasets $X_i$, where $i$ indexes the IS, $j$ indexes the outcome variables $y_j$ that define the clinical questions, and $k$ indexes the sets of hyperparameters of the ML algorithms that generate candidate models. The parameter $k$ can iterate over different ML algorithms and over various sets of hyperparameters for each algorithm. In this work, for simplicity, we restrict the ML algorithm to XGBoost [19], used as a regressor and as a classifier. We combine the information from the Subjects (Residents) IS with each of the other tables (Fall Records, Clinical Variables, Bowel Movement Records, Drug Prescriptions, and Nutritional Plans) to form the $X_i$ datasets and to predict the $y_j$ (MMSE, GDS, and Barthel scales). After identifying the best predictor (according to the AUC metric), the explainability stage helps identify the reasons behind its decisions.
Thus, the methodology consists of three main steps. The first recodes the data, transforming the heterogeneous information of each individual into a homogeneously structured matrix that can be used by the family of $P_{ijk}$ predictors. The second identifies the best predictors for specific clinical outcomes. The final step employs XAI to understand the rationale behind the decisions of $P_{ijk}$. In this way, the methodology allows us to run experiments that elucidate the importance of each data source for predicting CI.

Data recodification

The first step is to merge all the IS along a single time axis that contains all the collected information. To produce a homogeneous dataset from the longitudinal multidisciplinary IS, the proposed schema requires reshaping the data into a new format, as illustrated in Fig. 3. In this paper, we propose a monthly recodification in which the data for each resident are organized by month for both the $X_i$ and $y_j$ variables. The number of determinations of clinical events over time varies greatly and follows no temporal pattern, except for Bowel Movement Records and Nutritional Plans, which are recorded three times daily (morning, afternoon, and evening) and four times daily (breakfast, lunch, afternoon snack, and dinner), respectively. The monthly aggregation is handled differently depending on whether the variable describing the event is categorical or numerical.

For the categorical determinations of daily bowel movements, the number of occurrences of each reference value is counted over the entire month for each period of the day. This reduces the data to 21 variables, representing three time periods (morning, afternoon, evening) across seven categories, with each variable holding the number of monthly occurrences.
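The monthly counting of bowel movement records described above can be sketched as follows. This is an illustrative sketch under assumptions: the record layout `(resident_id, date, period, category)`, the column naming `period_category`, and the function name `monthly_bowel_counts` are hypothetical and not taken from the study's code. Only the three periods and seven category codes come from the text.

```python
from datetime import date

PERIODS = ("morning", "afternoon", "evening")
CATEGORIES = ("X", "N", "D", "B", "F", "L", "E")


def monthly_bowel_counts(records):
    """Aggregate daily records into 21 monthly counts per resident.

    records: iterable of (resident_id, date, period, category) tuples.
    Returns {(resident_id, 'YYYY-MM'): {'<period>_<category>': count}}.
    """
    table = {}
    for resident_id, day, period, category in records:
        key = (resident_id, f"{day.year:04d}-{day.month:02d}")
        # Each resident-month row starts with all 3 x 7 = 21 counters at zero
        row = table.setdefault(
            key, {f"{p}_{c}": 0 for p in PERIODS for c in CATEGORIES}
        )
        row[f"{period}_{category}"] += 1
    return table


# Hypothetical sample records for one resident
records = [
    ("r1", date(2020, 1, 3), "morning", "N"),
    ("r1", date(2020, 1, 3), "evening", "X"),
    ("r1", date(2020, 1, 9), "morning", "N"),
    ("r1", date(2020, 2, 1), "morning", "D"),
]
table = monthly_bowel_counts(records)
```

Initializing every counter to zero keeps the row length fixed at 21 regardless of which categories actually occur in a given month, which is what makes the result usable as a homogeneous matrix.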
For the Fall Records IS, the monthly number of falls is logged along with the associated information for the eight variables described in the Dataset section.

In the case of the Drug Prescriptions IS, the days of the month on which each medication is taken are recorded, along with the active ingredient and corresponding dose. If multiple medications are taken, the same procedure is followed, associating each medication with the patient's monthly record. This produces a monthly vector of variable length for each resident, depending on the number of medications administered. To obtain a fixed-length vector, the principle of one-hot encoding can be used [23]. This encoding produces one group of columns for every medication that can be administered, describing each with a triple (active ingredient, dose, and frequency) and using zero for drugs not taken during that month.

In the case of the Clinical Variables IS, if there are multiple measurements within a month, the average, standard deviation, minimum, and maximum values are extracted. For the Tinetti and MNA scores, the measured value is retained for a 3-month period.

Since several variables have empty values in certain months, it is necessary to use a filling parameter (FP). The FP represents the proportion of non-empty values for each variable, providing insight into how thoroughly each variable is documented within the dataset. Figure 4 illustrates the FP for the variables of the Clinical Variables IS. The variables are listed on the x-axis, while the FP percentage is on the y-axis. The figure shows that variables such as the MNA score, Tinetti score, and blood pressure (systolic and diastolic) have a high FP, nearing 100%, which indicates that these variables are almost always recorded.
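The filling parameter defined above is a per-variable proportion of non-empty values, so it can be computed in a few lines. This sketch assumes the monthly matrix is held as a list of dictionaries with `None` (or an absent key) marking an empty value. The function names and sample values are hypothetical. The default threshold of 0.5 mirrors the 50% cut-off that the text applies to the Clinical Variables IS.

```python
def filling_parameter(rows, variables):
    """Proportion of non-empty values per variable.

    rows: list of dicts, one per resident-month; a value of None or a
    missing key counts as empty.
    """
    n = len(rows)
    if n == 0:
        return {v: 0.0 for v in variables}
    return {
        v: sum(1 for r in rows if r.get(v) is not None) / n
        for v in variables
    }


def select_variables(fp, threshold=0.5):
    """Keep the variables whose FP is strictly greater than the threshold."""
    return [v for v, p in fp.items() if p > threshold]


# Hypothetical resident-month rows
rows = [
    {"weight": 70.0, "bmi": None},
    {"weight": 71.0},
    {"weight": None, "bmi": 24.0},
    {"weight": 69.5, "bmi": 25.0},
]
fp = filling_parameter(rows, ["weight", "bmi"])
kept = select_variables(fp)
```

In this toy example `weight` is filled in three of four months and is kept, while `bmi` sits exactly at the threshold and is dropped because the comparison is strict.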
Other variables, such as Weight, BMI, and Heart rate, also show relatively high FPs, above 60%; a dashed green line in the figure marks the 50% threshold. In contrast, many variables have significantly lower FPs, indicating that they are documented less frequently. These variables may not be consistently available for analysis, which could limit the comprehensiveness of any study relying on them.

Before the monthly aggregation, it is essential to ensure continuity for the discontinuous variables to be predicted, namely MMSE, GDS, and Barthel. In this study, we propose using the most recent recorded value until a new determination is documented, within a 3-month period [24, 25].

Building predictors for CI Scales ($P_{ijk}$)

Given a homogeneous dataset, such as the monthly harmonized data produced in the Data recodification section, the problem of quantifying the information in each IS can be posed in general terms as identifying the parameters of a family of predictors $P_{ijk}$, which can be regressors or classifiers depending on the clinical problem associated with the output variable $y_j$. In general, a homogeneous dataset $X = X_i$ can be understood as a join of variables from $n$ different IS, used to predict (by regression or classification) $j$ outcome variables $y = y_j$. Examples of such outcomes, proposed in this work, are the CI scales presented in the Dataset section. We use XGBoost [19] as the predictor, both with default parameters and with Hyperopt, which uses Bayesian methods [20] to optimize the search over the $k$ sets of parameters. The method allows us to identify the set of parameters that provides the best performance in the prediction of the $j$ clinical questions. Each predictor has an associated performance metric. Using the monthly IS of the residents, it is possible to predict the CI scores described above (MMSE, GDS, and Barthel).
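The carry-forward rule for the CI scales described in the Data recodification section can be sketched as below. The text leaves the exact window semantics open, so this sketch assumes one reading: a recorded score is propagated to at most the three following months, after which the value is left empty until a new determination appears. The function name `carry_forward` and the sample scores are hypothetical.

```python
def carry_forward(monthly_values, max_gap=3):
    """Fill empty months with the most recent score, up to max_gap months.

    monthly_values: list with one entry per consecutive month, holding a
    score or None when no determination was recorded that month.
    """
    filled = []
    last_value = None
    months_since = 0
    for value in monthly_values:
        if value is not None:
            # A new determination resets the carried value and its age
            last_value = value
            months_since = 0
            filled.append(value)
        elif last_value is not None:
            months_since += 1
            # Assumption: carry only while the value is at most max_gap months old
            filled.append(last_value if months_since <= max_gap else None)
        else:
            filled.append(None)
    return filled


# Hypothetical MMSE series for one resident, one entry per month
series = [None, 25, None, None, None, None, 22, None]
filled_series = carry_forward(series)
```

Months before the first determination stay empty, since there is nothing to carry forward, and the fifth month after the score of 25 falls outside the assumed window.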
These predictors can be used as a measure of the predictive value of the dataset. The predictors are hyper-optimized to ensure that the best achievable performance in each case is identified and used for the comparisons.

Artificial Intelligence Explainability (XAI)

Once the predictor models are built, XAI tools can be employed to analyze the black box. The final predictor $P_{ijk}$ can be explained using the SHAP approach [18]. This methodological stage yields the importance of the variables as well as explanations of the role that each variable plays in the predictions.

Results

Several experiments were conducted with the methodology described above to evaluate the questions posed in this work. To ensure comparability of results, the data were split into training and testing sets with an 80/20 ratio. This split was done with the same patients from the Subjects (Residents) IS, ensuring that the information of each patient remained in the same group across all predictors.

Measuring the recodification process

The heterogeneous dataset described in the Dataset section is transformed into a new homogeneous monthly format using the recodification step proposed in the Methodological Framework section. This transformation results in a new matrix comprising 65,440 months of follow-up and 7,831 columns, which represent the variables obtained after recoding the IS described in Table 1. Specifically, there are 3 columns from the Subjects (Residents) IS, 21 columns from the Bowel Movement Records IS, 40 columns from the Clinical Variables IS, 8 columns from the Fall Records IS, 7,743 columns from the Drug Prescriptions IS (using one-hot encoding to separately associate monthly information with each medication), and 16 columns from the Nutritional Plans IS.
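One way to keep each resident's months in a single partition, as the split described at the start of the Results requires, is to shuffle resident identifiers rather than rows. This is a sketch of that reading, not the authors' procedure: the function name, the `resident_id` key, and the fixed seed are all assumptions. Fixing the seed is what lets every predictor reuse the same partition.

```python
import random


def split_by_resident(rows, test_fraction=0.2, seed=0):
    """Split resident-month rows so that no resident spans both sets.

    rows: list of dicts, each with a 'resident_id' key.
    Returns (train_rows, test_rows).
    """
    resident_ids = sorted({r["resident_id"] for r in rows})
    rng = random.Random(seed)  # fixed seed: same partition on every call
    rng.shuffle(resident_ids)
    n_test = max(1, round(len(resident_ids) * test_fraction))
    test_ids = set(resident_ids[:n_test])
    train_rows = [r for r in rows if r["resident_id"] not in test_ids]
    test_rows = [r for r in rows if r["resident_id"] in test_ids]
    return train_rows, test_rows


# Hypothetical data: 10 residents with 3 monthly rows each
rows = [{"resident_id": f"r{i}", "month": m} for i in range(10) for m in range(3)]
train_rows, test_rows = split_by_resident(rows)
```

Because the split is made over residents, the row-level ratio only approximates 80/20 when residents have different follow-up lengths.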
Comparing the different IS ($X_i$ tables)

In this experiment, the dataset is split into five different $X_i$ datasets to produce a predictor with the information from each IS. Each $X_i$ contains the information from the Subjects (Residents) IS along with one of the five remaining IS (as shown in Fig. 1). Because the Clinical Variables IS has many empty entries, only the variables with an FP greater than 50% were considered (ten variables exceeded this threshold, as shown in Fig. 4). The three clinical scores (MMSE, GDS, and Barthel) at each month are the predicted variables $y_j$. The results of these comparisons are provided in Table 3.

Each of the five datasets was split into training and testing sets (80/20) and used to fit three predictors. The $k$ parameters of the predictors are hyper-optimized using the methodology described above, and the testing dataset is used to make predictions from the different IS for this set of residents. As the score values of the testing set are also known, this allows us to measure the differences between the predictions and the gold standard. Results are provided in terms of Mean Squared Error (MSE). The size of the prediction error depends on the score being predicted, because the ranges of the scores differ considerably. MMSE, which ranges from 0 to 30, produces prediction errors from 1.9 to 8.3; GDS, which ranges from 1 to 7, produces prediction errors from 0.4 to 1.6; and Barthel, which ranges from 1 to 100, produces prediction errors from 4.2 to 27.5. The purpose of this experiment is to compare the predictive power of the different IS: the best predictions are obtained when the Clinical Variables IS is employed, while the largest errors appear when the Bowel Movement Records IS is employed.

As the method allows us to identify the most informative variables of each IS, Table 4 presents the three top-ranked variables (features) used to build the 15 predictors.
Table 4 highlights the distinct sets of features associated with each predictor, demonstrating how different variables contribute to the model's performance across the various IS categories.

Comparison of the prediction models using all IS to predict several CI scores

In this experiment, the entire monthly dataset $X$ (merging the six $X_i$ IS tables into one) was used to train and test the prediction of three different CI scores: MMSE, GDS, and Barthel. The dataset was split into training and testing sets using an 80/20 ratio, resulting in two subsets spanning 52,352 and 13,088 months, respectively. This split was done randomly while ensuring an equal representation of all categories. The same subsets were used for comparisons between the different outcomes. The results of the predictors are presented in Table 5, measured in terms of MSE. For all three CI scores, the XGBoost model with Hyperopt parameters outperformed the model with default parameters, reducing the prediction error in every case. This indicates that hyperparameter optimization significantly enhances the predictive performance of the model. Merging all information sources into a single dataset produces the best predictors, as it yields lower MSE values for all three scores.

Exploring Classifiers

In this experiment, the whole harmonized dataset (the same as in the previous section) was employed, but the $P_{ijk}$ were used as classifiers rather than regressors. To achieve this, the MMSE outcome variable was converted into a categorical CI score using the following commonly accepted thresholds: a Normal score is 24 points or more; Mild Impairment is between 19 and 23 points; Moderate Impairment is between 14 and 18 points; and Severe Impairment is less than 14 points [16, 24]. The monthly harmonized data were used to train and test the classifier. The classifier's parameters were hyper-optimized.
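The conversion of MMSE scores into the four classes listed above amounts to a threshold lookup. A minimal sketch, using the cut-offs stated in the text, is shown here. The function name `mmse_category` and the exact label strings are hypothetical.

```python
def mmse_category(score):
    """Map an MMSE score (0 to 30) to a CI severity class."""
    if score >= 24:
        return "Normal"
    if score >= 19:
        return "Mild Impairment"
    if score >= 14:
        return "Moderate Impairment"
    return "Severe Impairment"


# Hypothetical monthly MMSE scores
scores = [30, 24, 23, 19, 18, 14, 13, 0]
labels = [mmse_category(s) for s in scores]
```

Under this mapping, determinations with MMSE = 0, which the Dataset section attributes to residents unable to take the test, fall into the Severe Impairment class.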
After obtaining the best-performing predictors, several metrics were evaluated for each category, including AUC-ROC, Precision, Recall, and F1-Score. Table 6 provides the results of these metrics.

Using SHAP to explain the decline trajectory in the model construction. Model explainability clarifies the impairment process:

In this experiment, an XAI tool was employed to understand the process of CI. Specifically, the SHAP library [18] was used to interpret the importance of the model's features. To analyze the influence of the variables on the prediction of the various CI scores, we selected the predictor with the best MMSE performance (see Table 5). This predictor was obtained in the section Comparison of the prediction models using all IS to predict several CI scores, by applying XGBoost to the entire dataset with Hyperopt parameters. Figure 5 illustrates the SHAP analysis of the 15 most important features for the XGBoost regressor models that predict MMSE (Fig. 5A), GDS (Fig. 5B), and Barthel (Fig. 5C). The SHAP analysis in Fig. 5 reveals key insights into the factors influencing CI. For the three CI scales being predicted, the most influential variables are the Tinetti score, MNA score, and age.

For the classifier, we followed a methodology similar to that used for the regressor. Specifically, we used SHAP to evaluate and explain the influence of the features on each class of the classifier obtained in the Exploring Classifiers section. This experiment generated predictors of the MMSE CI classes using all available information. Figure 6 presents the ranking of the 15 features that exert the greatest influence on the model's classification decisions, contributing to a deeper understanding of variable importance and overall model performance.
Discussion

This study introduces a novel and integrative methodology to harmonize and analyze heterogeneous, longitudinal data collected from nursing home residents in order to predict CI. The proposed approach recodes diverse IS (clinical variables, medication records, nutrition plans, bowel movements, and fall events) into a standardized monthly format suitable for ML analysis. This recodification facilitates consistent temporal modeling while minimizing the loss of clinical detail.

Our methodology was tested on a large, real-world dataset comprising 13 years of data from 2,608 residents across four nursing homes in Galicia, Spain. As detailed in the Measuring the recodification process section, the transformation yielded 65,440 monthly follow-ups and 7,831 features across all IS. Notably, the Drug Prescriptions IS contributed the highest dimensionality, due to the one-hot encoding of active ingredient, dose, and frequency.

We evaluated the predictive capacity of each IS individually (Comparing the different IS ($X_i$ tables) section), using XGBoost regressors optimized with Bayesian hyperparameter tuning. As shown in Table 3, the Clinical Variables IS provided the most accurate predictions for the MMSE, GDS, and Barthel scores; across all IS, MMSE errors ranged from 1.9 to 8.3, GDS errors from 0.4 to 1.6, and Barthel errors from 4.2 to 27.5. The Bowel Movement Records IS produced the largest errors. Key predictors included the Tinetti score, BMI, and MNA_Global_score (Table 4), indicating a strong association between functional mobility, nutritional status, and cognitive function.

In the next experiment (Comparison of the prediction models using all IS to predict several CI scores section), we trained regressors using all IS merged into one dataset. The results in Table 5 demonstrate that combining all data sources further reduces prediction error. Hyperparameter optimization with Bayesian methods consistently outperformed default model configurations.
These results validate the advantage of using the full, harmonized dataset and emphasize the importance of parameter tuning. Beyond regression, we explored the classification of CI severity levels using the MMSE scale (Exploring Classifiers section). The classifier achieved high AUC-ROC, precision, recall, and F1-score values for the Normal and Severe impairment categories, but struggled more with the intermediate classes, Mild and Moderate impairment (Table 6). These results suggest the need for refined feature engineering or additional data types to better differentiate mid-range cognitive states. Explainability was a central aspect of this work. Using SHAP, we analyzed the contribution of individual features in each model (Using SHAP to explain the CI trajectory in the model construction section). The SHAP analysis in Fig. 5 shows the top 15 features for the MMSE, GDS, and Barthel regressors, consistently highlighting the relevance of the Clinical Variables IS. Similarly, Fig. 6 presents SHAP values for the classifier, confirming that the same key variables influence class-level predictions. Although the methodology shows strong performance and interpretability, it has limitations. The dataset was derived from a single geographical region, which may limit generalizability. Additionally, Fig. 4 reveals variable documentation completeness, suggesting that sparsity remains a challenge in some IS. Our monthly aggregation strategy mitigates this but may overlook short-term dynamics relevant to early CI detection. In conclusion, this work presents a comprehensive and explainable framework for modeling CI in long-term care settings. By integrating harmonized temporal data, Bayesian-optimized ML models, and SHAP-based interpretability, it enables accurate prediction and understanding of CI trajectories. The modularity of the framework allows adaptation to other clinical outcomes and contexts, supporting its broader applicability in personalized geriatric care.
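To put the Table 5 comparison in concrete terms, the following minimal sketch computes the relative MSE reduction of the hyperopt-tuned XGBoost over the default configuration for each scale. The MSE values are copied from Table 5; the dictionary and function names are illustrative and are not taken from the study's code.

```python
# MSE values from Table 5: (hyperopt-tuned, default parameters).
table5_mse = {
    "MMSE": (1.7179, 2.0456),
    "GDS": (0.3776, 0.4123),
    "Barthel": (4.0433, 4.2210),
}


def relative_reduction(tuned_mse, default_mse):
    """Fraction by which tuning lowers the MSE relative to the default."""
    return (default_mse - tuned_mse) / default_mse


reductions = {
    scale: relative_reduction(tuned, default)
    for scale, (tuned, default) in table5_mse.items()
}

for scale, reduction in reductions.items():
    print(f"{scale}: {100 * reduction:.1f}% lower MSE with tuned parameters")
```

With these values the reduction is about 16.0% for MMSE, 8.4% for GDS, and 4.2% for Barthel, so the benefit of tuning is positive for all three scales but largest for MMSE.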
Declarations
Conflict of Interest: The authors have declared that no competing interests exist.
Data Availability: All relevant data necessary to replicate the procedures described in this study are within the manuscript and its Supporting Information files.
Funding: This research was funded by the Ministry of Science and Innovation through the project PID2022-138936OB-C32 (co-funded by the European Regional Development Fund (FEDER), "A way to make Europe", EU) awarded to C. Veiga.
Competing interests: The authors declare that there are no financial interests, awarded or filed patents, or any other conflicts of interest related to the results presented in this paper.

References
[1] L. Volpi et al., “Detecting cognitive impairment at the early stages: The challenge of first line assessment,” Journal of the Neurological Sciences, vol. 377, pp. 12–18, Jun. 2017, doi: 10.1016/j.jns.2017.03.034.
[2] E. McDade et al., “The pathway to secondary prevention of Alzheimer’s disease,” A&D Transl Res & Clin Interv, vol. 6, no. 1, p. e12069, Jan. 2020, doi: 10.1002/trc2.12069.
[3] K. M. Langa and D. A. Levine, “The Diagnosis and Management of Mild Cognitive Impairment: A Clinical Review,” JAMA, vol. 312, no. 23, p. 2551, Dec. 2014, doi: 10.1001/jama.2014.13806.
[4] J. McConathy and Y. I. Sheline, “Imaging Biomarkers Associated With Cognitive Decline: A Review,” Biological Psychiatry, vol. 77, no. 8, pp. 685–692, Apr. 2015, doi: 10.1016/j.biopsych.2014.08.024.
[5] R. H. Kirkpatrick, D. P. Munoz, S. Khalid-Khan, and L. Booij, “Methodological and clinical challenges associated with biomarkers for psychiatric disease: A scoping review,” Journal of Psychiatric Research, vol. 143, pp. 572–579, Nov. 2021, doi: 10.1016/j.jpsychires.2020.11.023.
[6] R. Whelan, F. M. Barbey, M. R. Cominetti, C. M. Gillan, and A. M. Rosická, “Developments in scalable strategies for detecting early markers of cognitive decline,” Transl Psychiatry, vol. 12, no. 1, p. 473, Nov. 2022, doi: 10.1038/s41398-022-02237-w.
[7] A. Morozova et al., “Neurobiological Highlights of Cognitive Impairment in Psychiatric Disorders,” IJMS, vol. 23, no. 3, p. 1217, Jan. 2022, doi: 10.3390/ijms23031217.
[8] H. Atasoy, B. N. Greenwood, and J. S. McCullough, “The Digitization of Patient Care: A Review of the Effects of Electronic Health Records on Health Care Quality and Utilization,” Annu. Rev. Public Health, vol. 40, no. 1, pp. 487–500, Apr. 2019, doi: 10.1146/annurev-publhealth-040218-044206.
[9] R. Chen et al., “Developing Measures of Cognitive Impairment in the Real World from Consumer-Grade Multimodal Sensor Streams,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage AK USA: ACM, Jul. 2019, pp. 2145–2155. doi: 10.1145/3292500.3330690.
[10] J. Razjouyan et al., “Toward Using Wearables to Remotely Monitor Cognitive Frailty in Community-Living Older Adults: An Observational Study,” Sensors, vol. 20, no. 8, p. 2218, Apr. 2020, doi: 10.3390/s20082218.
[11] M. F. Folstein, S. E. Folstein, and P. R. McHugh, “‘Mini-mental state,’” Journal of Psychiatric Research, vol. 12, no. 3, pp. 189–198, Nov. 1975, doi: 10.1016/0022-3956(75)90026-6.
[12] A. Lobo, J. Ezquerra, F. Gómez Burgada, J. M. Sala, and A. Seva Díaz, “[Cognocitive mini-test (a simple practical test to detect intellectual changes in medical patients)],” Actas Luso Esp Neurol Psiquiatr Cienc Afines, vol. 7, no. 3, pp. 189–202, 1979.
[13] B. Reisberg, S. H. Ferris, M. J. de Leon, and T. Crook, “The Global Deterioration Scale for assessment of primary degenerative dementia,” AJP, vol. 139, no. 9, pp. 1136–1139, Sep. 1982, doi: 10.1176/ajp.139.9.1136.
[14] F. I. Mahoney and D. W. Barthel, “Functional evaluation: the Barthel index,” Md State Med J, vol. 14, pp. 61–65, 1965.
[15] W. Lu, L. Ma, H. Chen, X. Jiang, and M. Gong, “A Clinical Prediction Model in Health Time Series Data Based on Long Short-Term Memory Network Optimized by Fruit Fly Optimization Algorithm,” IEEE Access, vol. 8, pp. 136014–136023, 2020, doi: 10.1109/ACCESS.2020.3011721.
[16] K. R. Jadhav and N. N. Patil, “Clinical Document Architecture (CDA) Generation and Integration for Health Data Exchange based on Cloud Computing A Survey,” ijcse, vol. 7, no. 1, pp. 801–805, Jan. 2019, doi: 10.26438/ijcse/v7i1.801805.
[17] J. Zhao, P. Papapetrou, L. Asker, and H. Boström, “Learning from heterogeneous temporal data in electronic health records,” Journal of Biomedical Informatics, vol. 65, pp. 105–119, Jan. 2017, doi: 10.1016/j.jbi.2016.11.006.
[18] S. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions.” arXiv, 2017. doi: 10.48550/ARXIV.1705.07874.
[19] S. Campanioni et al., “Explainable machine learning on baseline MRI predicts multiple sclerosis trajectory descriptors,” PLoS ONE, vol. 19, no. 7, p. e0306999, Jul. 2024, doi: 10.1371/journal.pone.0306999.
[20] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco California USA: ACM, Aug. 2016, pp. 785–794. doi: 10.1145/2939672.2939785.
[21] S. Putatunda and K. Rama, “A Comparative Analysis of Hyperopt as Against Other Approaches for Hyper-Parameter Optimization of XGBoost,” in Proceedings of the 2018 International Conference on Signal Processing and Machine Learning, Shanghai China: ACM, Nov. 2018, pp. 6–10. doi: 10.1145/3297067.3297080.
[22] A. J. Mitchell, “A meta-analysis of the accuracy of the mini-mental state examination in the detection of dementia and mild cognitive impairment,” Journal of Psychiatric Research, vol. 43, no. 4, pp. 411–431, Jan. 2009, doi: 10.1016/j.jpsychires.2008.04.014.
[23] P. Rodríguez, M. A. Bautista, J. Gonzàlez, and S. Escalera, “Beyond one-hot encoding: Lower dimensional target embedding,” Image and Vision Computing, vol. 75, pp. 21–31, Jul. 2018, doi: 10.1016/j.imavis.2018.04.004.
[24] K. J. Roedl, L. S. Wilson, and J. Fine, “A systematic review and comparison of functional assessments of community-dwelling elderly patients,” Journal of the American Association of Nurse Practitioners, vol. 28, no. 3, pp. 160–169, Mar. 2016, doi: 10.1002/2327-6924.12273.
[25] The “Progetto Alzheimer” Working Group, P. Pezzotti, S. Scalmana, A. Mastromattei, and D. Di Lallo, “The accuracy of the MMSE in detecting cognitive impairment when administered by general practitioners: A prospective observational study,” BMC Fam Pract, vol. 9, no. 1, p. 29, Dec. 2008, doi: 10.1186/1471-2296-9-29.

Tables

Table 1
Information Source (IS)        Number of determinations
Subjects (Residents)           2,608
Clinical Variables             414,219
Bowel Movement Records         3,948,306
Drug Prescriptions             273,679
Fall Records                   8,228
Nutritional Plans              48,875
Cognitive Impairment Scales
  MMSE                         11,747
  GDS                          6,350
  Barthel                      14,816

Table 2
                      All residents (2608)               Male (998)                         Female (1610)
Metrics               Age at entering   Follow-up time   Age at entering   Follow-up time   Age at entering   Follow-up time
Maximum               105.08            18.07            105.08            18.07            102.58            15.70
Minimum               65.52             0.30             69.11             0.30             65.52             0.41
Standard deviation    10.06             3.08             9.15              3.18             10.81             2.89
Mean                  81.07             2.48             82.81             2.65             78.27             2.19

Table 3
Predictor (Pijk), MSE (yj)
Xi                        MSE(MMSE)   MSE(GDS)   MSE(Barthel)
Clinical Variables        1.9583      0.4218     4.2018
Bowel Movement Records    8.3154      1.7000     28.1881
Drug Prescriptions        4.3584      1.0886     15.4059
Fall Records              7.2529      1.4627     24.2114
Nutritional Plans         3.6059      0.7466     11.4030

Table 4
Predictor, by Xi (rows) and Yj (MMSE, GDS, Barthel)
Clinical Variables
  MMSE: Tinneti_score, BMI (Body Mass Index), MNA_Global_score
  GDS: Tinneti_score, MNA_Global_score, BMI (Body Mass Index)
  Barthel: Tinneti_score, MNA_Global_score, Gender
Bowel Movement Records
  MMSE: Gender, Age, Id_AssociatedSymptom
  GDS: Gender, Age, Id_AssociatedSymptom
  Barthel: Gender, Age, Id_AccompanimentFall
Drug Prescriptions
  MMSE: Voltaren Emulgel, Calcium Carbonate, Omeprazole
  GDS: Omeprazole, Voltaren Emulgel, Gender
  Barthel: Lormetazepam, Nutritional Thickening Module, Diclofenac
Fall Records
  MMSE: Gender, L_morning, E_evening
  GDS: Gender, Age, L_morning
  Barthel: Gender, N_morning, B_evening
Nutritional Plans
  MMSE: Id_Route_dinner, Id_Route_snack, Id_Consistency_breakfast
  GDS: Id_Route_dinner, Id_Consistency_breakfast, Id_Route_snack
  Barthel: Id_Consistency_breakfast, Id_Route_dinner, Id_Consistency_lunch

Table 5
Predictor (Pjk), MSE(k)
Yj         XGBoost using hyperopt parameters   XGBoost using default parameters
MMSE       1.7179                              2.0456
GDS        0.3776                              0.4123
Barthel    4.0433                              4.2210

Table 6
Classifier Metrics
Category               AUC-ROC   Precision   Recall   F1-Score
Mild impairment        0.86      0.65        0.57     0.61
Moderate impairment    0.90      0.76        0.69     0.67
Normal                 0.96      0.96        0.72     0.83
Severe impairment      0.91      0.82        0.94     0.87
Macro Average          0.91      0.80        0.73     0.74

Supplementary Files
SupplementaryMaterialS1.docx
SupplementaryMaterialS2.docx

Editorial history at journal: Editorial decision: Revision requested 18 Nov, 2025; Reviews received 25 Oct, 2025; Reviewers agreed 03 Oct, 2025; Reviews received 03 Oct, 2025; Reviewers agreed 01 Oct, 2025; Reviewers invited 09 Sep, 2025; Editor invited 21 Aug, 2025; Editor assigned 20 Aug, 2025; Submission checks completed 19 Aug, 2025; First submitted to journal 18 Aug, 2025.
Authors and affiliations
Silvia Campanioni, Galicia Sur Health Research Institute (IIS Galicia Sur)
Laura Busto, Galicia Sur Health Research Institute (IIS Galicia Sur)
José A. González-Novoa, Galicia Sur Health Research Institute
Carlos Martínez, Galicia Sur Health Research Institute (IIS Galicia Sur)
Pablo Juan-Salvadores, Galicia Sur Health Research Institute (IIS Galicia Sur)
Irene Vieitez, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
David N. Olivieri, University of Vigo
José María Prieto, Health Research Institute of Santiago de Compostela (IDIS), Santiago University Hospital Complex, SERGAS-USC
Isabel Vilariño, University of A Coruña, A Coruña
Roberto González Novas, DomusVi Spain
Alberto Rodríguez Taboada, DomusVi Spain
María Ángeles Fernández, Health Research Institute of Santiago de Compostela (IDIS), Santiago University Hospital Complex, SERGAS-USC
Roberto Carlos Agis-Balboa, Health Research Institute of Santiago de Compostela (IDIS), Santiago University Hospital Complex, SERGAS-USC
César Veiga (corresponding author), Galicia Sur Health Research Institute (IIS Galicia Sur)

Preprint DOI: https://doi.org/10.21203/rs.3.rs-7402937/v1
Published version: https://doi.org/10.1038/s41598-025-34060-w (published 2026-02-05)

Figures
Figure 1. Scores Distribution in the dataset. (A) Frequency of MMSE Scale Determinations, (B) Frequency of GDS Scale Determinations, (C) Frequency of Barthel Scale Determinations. https://assets-eu.researchsquare.com/files/rs-7402937/v1/4445d5c5118d9d7ce69a4c8b.jpg
Figure 2. General schema of methodology proposed in this work. https://assets-eu.researchsquare.com/files/rs-7402937/v1/eb713b3d710a1e1d4b3f24af.jpg
Figure 3. Reshaping of heterogeneous dataset into a homogeneous format for monthly-based analysis. https://assets-eu.researchsquare.com/files/rs-7402937/v1/d2290252fb63bdcdeb925fd7.jpg
Figure 4. Table after making the dataset homogeneous. FP of each feature in Clinical Variables IS on the monthly-based homogenized dataset. https://assets-eu.researchsquare.com/files/rs-7402937/v1/903724bc0c4293ce11578a85.jpg
Figure 5. Relevance and SHAP analysis of the 15 most important features of the model XGBoost regressor result of the experiment that generate predictors of the different CI scores, using all available information. (A) Results of the MMSE regressor, (B) results of the GDS regressor and (C) results of the Barthel regressor. Explicability is provided for the 15 most important features of the predictor. The most important is on top and the importance is decreasing moving down. https://assets-eu.researchsquare.com/files/rs-7402937/v1/667c70e4e878f2df84d983c9.jpg
Figure 6. The relevance and SHAP analysis of the 15 most important features for each CI category in the XGBoost classifier. https://assets-eu.researchsquare.com/files/rs-7402937/v1/354129143309c0238133cb4b.jpg
14:11:37","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":17904,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryMaterialS2.docx","url":"https://assets-eu.researchsquare.com/files/rs-7402937/v1/a2b4a6de03b6527157976688.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Explainable Machine Learning with Bayesian Hyper-Optimization for Predicting Cognitive Impairment from Longitudinal Nursing Home Data","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe accelerated aging of the global population has led to a growing prevalence of neurodegenerative disorders, underscoring the importance of investigating physiological and cognitive processes over time [1]. Many of these conditions feature prolonged prodromal phases, offering a critical window for the early identification of cognitive impairment (CI) before it significantly disrupts daily functioning [2]. While current clinical methods are effective in detecting CI once symptoms are evident, they fall short in monitoring asymptomatic individuals over extended periods.\u003c/p\u003e\n\u003cp\u003eAlthough biomarkers such as hyperphosphorylated tau protein (p-tau) and amyloid-beta (A\u0026beta;) have shown promise in detecting neuropathology prior to the onset of CI, their invasive extraction via cerebrospinal fluid (CSF) poses practical limitations [3]. Similarly, neuroimaging techniques such as magnetic resonance imaging (MRI) and positron emission tomography (PET) have demonstrated potential to anticipate CI well in advance of clinical symptoms [4, 5]. 
However, these modalities involve high costs and strict standardization protocols, limiting their scalability for large-scale early screening [6, 7].\u003c/p\u003e\n\u003cp\u003eIn contrast, Electronic Health Records (EHRs) and data collected in long-term care environments offer rich, albeit heterogeneous, sources of personal health information [8]. These include data related to physical activity, bowel movements, nutritional plans, mobility scores, pharmaceutical history, and metrics derived from wearable sensors [9, 10]. Although diverse in format and granularity, these datasets hold significant potential for generating clinical insights, provided they can be harmonized into a consistent format suitable for AI applications.\u003c/p\u003e\n\u003cp\u003eSeveral standardized scales are widely used in geriatric environments to assess CI and functional ability. The Mini-Mental State Examination (MMSE) evaluates cognitive function based on memory, attention, and language [11]. The MMSE scores were obtained using the Spanish adaptation known as the Mini-Examen Cognoscitive (MEC) [12], widely used in Spanish clinical settings. The Global Deterioration Scale (GDS) categorizes cognitive decline into seven stages [13], and the Barthel Index quantifies a patient\u0026rsquo;s independence in daily activities [14]. Together, these scales provide complementary perspectives on cognitive and functional status in elderly individuals [15].\u003c/p\u003e\n\u003cp\u003eArtificial Intelligence (AI), particularly Machine Learning (ML), offers powerful tools to analyze complex clinical data [16]. However, most ML models struggle with temporal heterogeneity, sparse sampling, and inconsistent follow-up intervals inherent in real-world longitudinal datasets. While ML excels in modeling cross-sectional data, longitudinal modeling remains a challenge due to issues such as multidimensionality and information sparsity. 
Addressing these challenges is essential for extracting insights from diverse biomedical data types, including time series and imaging modalities [17].\u003c/p\u003e\n\u003cp\u003eThe incorporation of Explainable AI (XAI) is increasingly essential for establishing and maintaining trust in these models when applied to clinical decision-making. XAI enables practitioners to understand and validate the rationale behind model predictions, fostering confidence and promoting responsible adoption [18, 19].\u003c/p\u003e\n\u003cp\u003eIn this work, we propose a novel framework for predicting CI in nursing home residents using longitudinal, heterogeneous data. The methodology involves the harmonization of diverse information sources (IS) -including clinical metrics, medication, nutrition, bowel movements, and fall events -into a homogeneous temporal matrix for CI prediction. Our method combines Bayesian hyperparameter optimization of gradient boosting (XGBoost) models with SHAP-based explainability to not only predict CI-related scales but also quantify the relative predictive power of each type of data. This dual focus on accuracy and interpretability aims to bridge the gap between ML models and actionable clinical insight in nursing home populations.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec2\"\u003e\n \u003ch2\u003eDataset:\u003c/h2\u003e\n \u003cp\u003eThe dataset originates from a retrospective cohort study conducted at four DomusVi nursing homes in Spain. Ethical approval was obtained from the relevant committee (Code: 2023/576). It includes records from 2,608 residents and comprises 4,718,828 activity logs collected over a 13-year period (2011–2024). All methods were carried out in accordance with relevant guidelines and regulations, including the Declaration of Helsinki and institutional policies on research involving human participants. 
All participants or their legal guardians provided written informed consent prior to data collection, in accordance with institutional and national ethical guidelines. Each record is timestamped, allowing longitudinal tracking. Data are grouped into nine Information Sources (IS), each capturing different aspects of residents' health and behavior. Table 1 summarizes the number of records per IS. All data entries include timestamps, facilitating alignment along a common temporal axis.\u003c/p\u003e\n \u003cp\u003eDemographics (Subjects IS)\u003c/p\u003e\n \u003cp\u003eThe Subjects IS contains a unique identifier for each resident, along with gender, age at entry, and follow-up duration. Table\u0026nbsp;2 presents summary statistics (mean, standard deviation, minimum, and maximum) for the overall cohort and stratified by gender (998 males and 1,610 females).\u003c/p\u003e\n \u003cp\u003eClinical Variables IS\u003c/p\u003e\n \u003cp\u003eThis IS includes 40 clinical variables recorded during follow-up, totaling 414,219 entries. Some variables were measured frequently, while others had sparse entries. This IS also includes the Tinetti and Mini Nutritional Assessment (MNA) (determinate for three variables: \u003cem\u003eMNA_Global_score\u003c/em\u003e, \u003cem\u003eMNA_Assessment_score\u003c/em\u003e and \u003cem\u003eMNA_Screening_score)\u003c/em\u003e scores that measure the functional mobility and nutritional status of individuals, respectively.\u003c/p\u003e\n \u003cp\u003eBowel Movement Records IS\u003c/p\u003e\n \u003cp\u003eThis source contains 3,948,306 daily records. Each entry includes time of day (morning, afternoon, evening) and one of seven stool classifications: X (no movement), N (normal), D (diarrhea), B (soft), F (impaction), L (liquid), and E (constipation).\u003c/p\u003e\n \u003cp\u003eFall Records IS\u003c/p\u003e\n \u003cp\u003eWith 8,228 entries, this IS captures categorical data about fall events. 
See Supplementary Material S2 table for the whole description of categorical variables. Such categorical variables include the following content, describing the episode as: accompaniment during the fall (\u003cem\u003eId_AccompanimentFall\u003c/em\u003e), conduct to follow (\u003cem\u003eId_ConductToFollow\u003c/em\u003e), associated symptom (\u003cem\u003eId_AssociatedSymptom\u003c/em\u003e), and type of injury caused by the fall (\u003cem\u003eId_TypeOfInjuryFall\u003c/em\u003e). Additionally, this IS includes variables for the location of the fall (Id_FallLocation), cause of the fall (\u003cem\u003eId_CauseFall\u003c/em\u003e), neurological state (\u003cem\u003eId_NeurologicalState\u003c/em\u003e), and activity (\u003cem\u003eId_ActivityFall\u003c/em\u003e) along with the timestamp of the event. If the resident has no falls, there are no inputs in this IS.\u003c/p\u003e\n \u003cp\u003eDrug Prescriptions IS\u003c/p\u003e\n \u003cp\u003eThis IS comprises 273,679 records of medication administration, specifying drug name, dosage, frequency, and administration start and end dates.\u003c/p\u003e\n \u003cp\u003eNutritional Plan S\u003c/p\u003e\n \u003cp\u003eThe Nutritional Plan contains 48,875 records detailing the start and end dates of each resident's dietary plan. Each record includes the plan identifier (\u003cem\u003eId_Diet\u003c/em\u003e), calories information (Id_Calories), route of administration (\u003cem\u003eId_Route\u003c/em\u003e), and consistency (\u003cem\u003eId_Consistency\u003c/em\u003e). For subsequent analysis, only meals categorized as breakfast, lunch, snack, and dinner are included. 
Using this criterion, 46,565 determinations from the table will be analyzed.\u003c/p\u003e\n \u003cp\u003eCognitive Impairment Scales IS\u003c/p\u003e\n \u003cp\u003eThe Cognitive Impairment Scales contains the determinations of CI using three different scales, namely MMSE scale, Barthel scale and GDS scale for each resident, describing their condition throughout the follow-up study. The number of determinations for each scale varies, as evidenced by the values in Table 1, with an average of 2.08 determinations per patient for MMSE, 1.37 determinations per patient for GDS, and 2.57 determinations per patient for Barthel. The distribution of the frequency for each scale's values is provided in Fig. 1, illustrating the prevalence and severity of CI among the study participants over time. It is remarkable that the number of determinations with MMSE = 0 corresponds to residents that are not able to undergo the test.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec3\"\u003e\n \u003ch2\u003eAI tools (predictors and explainability):\u003c/h2\u003e\n \u003cp\u003eML predictors (Bayesian Hyper-optimized XGBoost predictors).\u003c/p\u003e\n \u003cp\u003eXGBoost [20] is a Boosting technique within Ensemble Learning, designed to enhance prediction accuracy by combining multiple models. Unlike other boosting methods that increase the weights of misclassified examples, XGBoost optimizes a loss function through gradient boosting. Its objective function, optimized at each iteration, is given by Eq. 
1.\u003c/p\u003e\n \u003cdiv id=\"Equ1\"\u003e\n \u003cdiv id=\"FileID_Equ1\" name=\"EquationSource\"\u003e$$L^{\\left(t\\right)}=\\sum_{i=1}^{n}l\\left({y}_{i},{\\hat{y}}_{i}^{\\left(t-1\\right)}+{f}_{t}\\left({x}_{i}\\right)\\right)+\\Omega\\left({f}_{t}\\right)$$\u003c/div\u003e\n \u003cdiv\u003e1\u003c/div\u003e\n \u003c/div\u003e\n \u003cp\u003ewhere \$l\$ is a differentiable convex loss function, \$({x}_{i},{y}_{i})\$ are the training examples, \${\\hat{y}}_{i}^{\\left(t-1\\right)}\$ is the prediction for the \$i\$-th example after iteration \$t-1\$, \${f}_{t}\$ is the learner added at iteration \$t\$, and \$\\Omega\\left({f}_{t}\\right)\$ is the regularization term that prevents overfitting by penalizing complex models using Lasso (L1) and Ridge (L2) regularization.\u003c/p\u003e\n \u003cp\u003eXGBoost constructs the next learner by maximizing loss reduction, using the Exact Greedy Algorithm. This process starts with a root node containing all training examples, evaluates potential splits for each feature, and stops growing a branch if the gain for the best split is not positive. Setting up XGBoost involves three types of parameters: general, booster, and learning task parameters. General parameters define the booster type (tree or linear model); booster parameters configure the internal aspects of the booster (such as the learning rate and the number of estimators); and learning task parameters specify the learning objective.\u003c/p\u003e\n \u003cp\u003eBayesian hyper-optimization enhances ML predictors, including XGBoost, by systematically optimizing hyperparameters for maximum model accuracy [21]. It uses Bayesian inference to balance exploration and exploitation in the hyperparameter space, making the search more efficient than traditional methods like grid or random search. By defining a probabilistic model of the objective function and updating it with new data points, Bayesian optimization quickly identifies the most promising hyperparameter regions, thus speeding up training and improving generalization. 
For XGBoost, this involves tuning parameters such as the learning rate, maximum depth, and subsample ratio, which are iteratively adjusted based on validation set performance. This can yield a more robust model with improved predictive accuracy and reliability, making Bayesian hyper-optimized XGBoost predictors a powerful tool in ML [22].\u003c/p\u003e\n \u003cp\u003eShapley Additive Explanations (SHAP)\u003c/p\u003e\n \u003cp\u003eSHAP is a tool used in explainable AI to reveal the importance of input features in model predictions [18]. It applies principles from game theory to quantify the contribution of each feature to the model's outputs. By considering all possible combinations of features, SHAP provides insights into how these features interact and collectively influence predictions. This comprehensive approach enhances transparency in complex ML models, enabling informed decision-making regarding feature importance and model behavior. Eq. 
2 gives the mathematical expression for SHAP values.\u003c/p\u003e\n \u003cdiv id=\"Equ2\"\u003e\n \u003cdiv id=\"FileID_Equ2\" name=\"EquationSource\"\u003e$$\\phi_{i}=\\sum_{S\\subseteq N\\setminus\\{i\\}}\\frac{|S|!\\left(|N|-|S|-1\\right)!}{|N|!}\\left[f\\left(S\\cup\\{i\\}\\right)-f\\left(S\\right)\\right]$$\u003c/div\u003e\n \u003c/div\u003e\n \u003cp\u003ewhere \$f(S)\$ represents the output of the XGBoost model based on a particular subset of features, denoted as \$S\$, from the complete set \$N\$ of all features. The contribution \$\\phi_{i}\$ of feature \$i\$ is determined by averaging its impact across all possible feature subsets: as the feature is added to each subset, the resulting change in the model's output reveals its influence.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003eMethodological Framework:\u003c/h3\u003e\n\u003cp\u003eIn this work, we present a comprehensive framework employing AI predictors (regressors and classifiers) based on XGBoost algorithms, using both default parameters and Bayesian hyper-optimization to construct AI predictors of CI. This framework allows us to evaluate the informative contribution of various IS within a heterogeneous dataset on the living conditions of elderly individuals in nursing homes, addressing the outlined problem. Finally, by employing SHAP to comprehend the reasons behind such predictions, we aim to gain a better understanding of the CI process. The general schema of the methodology developed in this work is depicted in Fig. 2.\u003c/p\u003e\n\u003cp\u003eThe proposed method requires the recodification of the heterogeneous information collected from the nursing homes to create longitudinal models in a suitable homogeneous format. 
After this, the framework enables the production of a family of predictors \u003cem\u003eP\u003c/em\u003e\u003csub\u003e\u003cem\u003eijk\u003c/em\u003e\u003c/sub\u003e, indexed by the dataset \u003cem\u003eX\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e built from the \u003cem\u003ei\u003c/em\u003e-th IS, the \u003cem\u003ej\u003c/em\u003e-th outcome variable that defines a clinical question (\u003cem\u003ey\u003c/em\u003e\u003csub\u003e\u003cem\u003ej\u003c/em\u003e\u003c/sub\u003e), and the \u003cem\u003ek\u003c/em\u003e-th set of hyperparameters of the ML algorithm that generates a candidate model. The parameter \u003cem\u003ek\u003c/em\u003e can iterate over different ML algorithms and various sets of hyperparameters for each algorithm. In this work, for simplicity, we restrict the ML algorithm to XGBoost [19], used as a regressor and classifier. We propose using the information from the Subjects (Residents) IS in combination with each of the other tables (Fall Records, Clinical Variables, Bowel Movement Records, Drug Prescriptions, and Nutritional Plans) to form the \u003cem\u003eX\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e datasets and create combinations to predict the \u003cem\u003ey\u003c/em\u003e\u003csub\u003e\u003cem\u003ej\u003c/em\u003e\u003c/sub\u003e (MMSE, GDS, and Barthel scales). After identifying the best predictor (according to the AUC metric), the explainability stage helps identify the reasons behind the decisions made.\u003c/p\u003e\n\u003cp\u003eThus, the methodology consists of three main steps. The first recodes the data, transforming the heterogeneous information of each individual into a homogeneously structured matrix so that the dataset can be used with the family of \u003cem\u003eP\u003c/em\u003e\u003csub\u003e\u003cem\u003eijk\u003c/em\u003e\u003c/sub\u003e predictors. The second step identifies the best predictors for specific clinical outcomes. The final step employs XAI to understand the rationale behind the decisions of the \u003cem\u003eP\u003c/em\u003e\u003csub\u003e\u003cem\u003eijk\u003c/em\u003e\u003c/sub\u003e models. 
In this way, the methodology allows us to run experiments to elucidate the importance of each data source for predicting CI, as presented previously.\u003c/p\u003e\n\u003cp\u003eData recodification\u003c/p\u003e\n\u003cp\u003eThe first step is to merge all the IS to create a single time axis that contains all the collected information. To produce a homogeneous dataset from the longitudinal multidisciplinary IS, the proposed schema for analysis requires reshaping the data into a new format, as illustrated in Fig. 3. In this paper, we propose a monthly-based recodification where data for each resident are organized monthly for both \u003cem\u003eXi\u003c/em\u003e and \u003cem\u003eyj\u003c/em\u003e variables.\u003c/p\u003e\n\u003cp\u003eThe number of determinations of clinical events over time varies greatly, and the events occur without a temporal pattern, except for Bowel Movement Records and Nutritional Plans, which are recorded three times daily (morning, afternoon, and evening) and four times daily (breakfast, lunch, afternoon snack, and dinner), respectively. To conduct the monthly analysis, different approaches are taken depending on whether the variable describing the event is categorical or numerical. For the categorical determinations of daily bowel movements, the occurrences of each reference value are counted over the entire month for each period of the day. This reduces the data to 21 variables, representing three time periods (morning, afternoon, evening) across seven categories, with each variable representing the number of monthly occurrences. For the Fall Records IS, the monthly number of falls is logged along with the associated information for eight variables, as described in the Dataset section.\u003c/p\u003e\n\u003cp\u003eIn the case of the Drug Prescriptions IS, the days of the month on which each medication is taken are recorded, along with the active ingredient and corresponding dose. 
If multiple medications are taken, the same procedure is followed, associating each medication with the patient's monthly record. This produces a monthly vector of variable length for each resident, depending on the number of medications administered. To obtain a fixed-length vector, the principle of one-hot encoding can be used [23]. This encoding creates a fixed set of columns covering every medication that can be administered, with three descriptors per medication (active ingredient, dose, and frequency); the columns of drugs not consumed in a given month are set to zero.\u003c/p\u003e\n\u003cp\u003eIn the case of the Clinical Variables IS, if there are multiple measurements within a month, the average, standard deviation, minimum, and maximum values are extracted. For the Tinetti and MNA scores, the measured value within a 3-month period is retained. Since several variables have empty values, indicating the absence of data for those variables in certain months, it is useful to define a filling parameter (FP). The FP represents the proportion of non-empty values for each variable, providing insight into how thoroughly each variable is documented within the dataset. Figure 4 illustrates the FP for the variables of the Clinical Variables IS. The variables are listed on the x-axis, while the FP percentage is on the y-axis. From the figure, we observe that variables such as the MNA Score, Tinetti Score, and blood pressure (systolic and diastolic) have a high FP, nearing 100%. This indicates that these variables are almost always recorded. Other variables, such as Weight, BMI, and Heart rate, also show relatively high FPs, above 60%; a dashed green line in the figure marks the 50% threshold. In contrast, many variables have significantly lower FPs, indicating that they are less frequently documented. 
The low FP of these variables suggests that they might not be consistently available for analysis, which could impact the comprehensiveness of any study relying on this data.\u003c/p\u003e\n\u003cp\u003eBefore aggregating the data on a monthly basis, it is essential to ensure continuity for the discontinuous variables that require prediction, namely MMSE, GDS, and Barthel. In this study, we propose carrying forward the most recent recorded value until a new determination is documented, within a 3-month period [24, 25].\u003c/p\u003e\n\u003cp\u003eBuilding predictors for CI Scales (\u003cem\u003eP\u003c/em\u003e\u003csub\u003eijk\u003c/sub\u003e)\u003c/p\u003e\n\u003cp\u003eGiven a homogeneous dataset, such as the monthly harmonized data produced in the Data recodification section, the problem of quantifying the amount of information in each IS of the dataset can be posed in general terms as identifying the parameters of a family of predictors \u003cem\u003ePijk\u003c/em\u003e, which can be regressors or classifiers depending on the underlying clinical problem associated with the output variable \u003cem\u003eyj\u003c/em\u003e. In general, a homogeneous dataset X = \u003cem\u003eXi\u003c/em\u003e can be understood as the union of variables from n different IS, used to predict (by regression or classification) the \u003cem\u003ej\u003c/em\u003e outcome variables y = \u003cem\u003eyj\u003c/em\u003e. Examples of such outcomes in this work are the CI scales presented in the Dataset section. In this work we use XGBoost [19] as the predictor, both with default parameters and with Hyperopt, which uses Bayesian methods [20] to optimize the search over the \u003cem\u003ek\u003c/em\u003e sets of parameters. The method allows us to identify the set of parameters that provides the best performance for each of the \u003cem\u003ej\u003c/em\u003e clinical questions. 
Each predictor has an associated performance metric.\u003c/p\u003e\n\u003cp\u003eUsing the monthly-based IS of residents, it is possible to predict the CI scores described above (MMSE, GDS and Barthel). These predictors can be used as a measure of the predictive value of the dataset. The predictors are hyper-optimized to ensure that the best achievable performance in each case is identified and used for the comparisons.\u003c/p\u003e\n\u003cp\u003eArtificial Intelligence Explainability (XAI)\u003c/p\u003e\n\u003cp\u003eOnce the predictor models are built, it is possible to employ XAI tools to analyze the black box. The final predictor \u003cem\u003ePijk\u003c/em\u003e can be explained using the SHAP approach [18]. In this methodological stage, information on the importance of the variables is obtained, together with explanations of the role that the variables play in the predictions.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eSeveral experiments have been conducted employing the above-described methodology to evaluate the questions posed in this work. To ensure comparability of results, the data was split into training and testing sets with an 80 to 20 ratio, respectively. This split was done with the same patients from the Subjects (Residents) IS, ensuring that the information of each patient remained in the same group across all predictors.\u003c/p\u003e\n\u003ch3\u003eMeasuring the recodification process:\u003c/h3\u003e\n\u003cp\u003eThe heterogeneous dataset described in the Dataset section is transformed into a new homogeneous monthly-based format using the methodology proposed in the Methodological Framework section (recodification step). This transformation results in a new matrix comprising 65,440 months of follow-up and 7,831 columns representing the variables that result from recodifying the IS described in Table\u0026nbsp;1. 
Specifically, there are 3 columns from the Subjects (Residents) IS, 21 columns from the Bowel Movement Records IS, 40 columns from the Clinical Variables IS, 8 columns from the Falls Records IS, 7,743 columns from the Drugs Prescriptions IS (using the one-hot encoding method to separately associate monthly information for each medication), and 16 columns from the Nutritional Plans IS.\u003c/p\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003eComparing the different IS (Xi tables):\u003c/h2\u003e\n \u003cp\u003eIn this experiment, the dataset is split into five different \u003cem\u003eXi\u003c/em\u003e IS to produce a predictor with the information from each IS. Each \u003cem\u003eXi\u003c/em\u003e contains the information from the Subject (Resident) IS along with one of the five remaining IS (as shown in Fig. 1). In this case, as the Clinical Variables IS has many empty entries, only the variables whose FP was greater than 50% (ten variables exceeded this threshold, as shown in Fig. 4) have been considered. The three clinical scores (MMSE, GDS and Barthel) at each month are the predicted variables \u003cem\u003eyj\u003c/em\u003e.\u003c/p\u003e\n \u003cp\u003eResults of these comparisons are provided in Table\u0026nbsp;3. Each of the five separate datasets, after an 80/20 training/testing split, was used to fit three predictors, one per score. The k parameters of the predictors are hyper-optimized using the methodology described above, and the testing dataset is used to make predictions from the different IS for this set of residents. As the score values of the testing set are also known, this allows us to measure the differences between the predictions and the gold standard. Results are provided in terms of Mean Squared Error (MSE). Prediction errors depend on the score being predicted, as the ranges of the scores differ considerably. 
MMSE, which ranges from 0 to 30, produces prediction errors from 1.9 to 8.3; GDS, which ranges from 1 to 7, produces prediction errors from 0.4 to 1.6; and Barthel, which ranges from 1 to 100, produces prediction errors from 4.2 to 27.5.\u003c/p\u003e\n \u003cp\u003eThe purpose of this experiment is to compare the predictive power of the different IS: the best predictions are produced when the Clinical Variables IS is employed, while the largest errors appear when the Bowel Movement Records IS is employed.\u003c/p\u003e\n \u003cp\u003eAs the method allows us to identify the most informative variables of each IS, Table\u0026nbsp;4 presents the three top-ranked variables (features) used to build the 15 predictors. The table highlights the distinct sets of features associated with each predictor, demonstrating how different variables contribute to the model\u0026apos;s performance across the various IS categories.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003eComparison of the prediction models using all IS to predict several CI scores:\u003c/h3\u003e\n\u003cp\u003eIn this experiment, the entire monthly dataset X (merging the six \u003cem\u003eXi\u003c/em\u003e IS tables into one) was used to train and test the prediction of three different CI scores: MMSE, GDS, and Barthel. The dataset was split into training and testing sets using an 80/20 ratio, resulting in two subsets of data spanning 52,352 and 13,088 months, respectively. This split was done randomly while ensuring an equal representation of all categories. The same dataset subsets were used for comparisons between the different outcomes. The results of the predictors are presented in Table 5, measured in terms of MSE.\u003c/p\u003e\n\u003cp\u003eFor all three CI scores, the XGBoost model with hyperopt parameters outperformed the model with default parameters, reducing the prediction error in every case. 
This indicates that hyperparameter optimization improves the predictive performance of the model. Merging all information sources into a single dataset produces the best predictors, as it yields lower MSE values for all predictors.\u003c/p\u003e\n\u003ch3\u003eExploring Classifiers:\u003c/h3\u003e\n\u003cp\u003eIn this experiment, the whole harmonized dataset (the same as in the previous section) was employed, but the \u003cem\u003ePijk\u003c/em\u003e were used as classifiers rather than regressors. To achieve this, the outcome variable MMSE was converted into a categorical CI score, using the following commonly accepted threshold levels: a Normal score is 24 points or more; Mild Impairment is between 19 and 23 points; Moderate Impairment is between 14 and 18 points; and Severe Impairment is less than 14 points [\u003cspan class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e\n\u003cp\u003eThe monthly harmonized data was used to train and test the classifier. The classifier\u0026apos;s parameters were hyper-optimized. After obtaining the best-performing predictors, various metrics for each category were evaluated, including AUC-ROC, Precision, Recall and F1-Score. Table\u0026nbsp;6 provides the results of these metrics.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eUsing SHAP to explain the decline trajectory in the model construction. Model explainability clarifies the impairment process\u003c/em\u003e:\u003c/p\u003e\n\u003cp\u003eIn this experiment, an XAI tool was employed to understand the process of CI. Specifically, the SHAP library [\u003cspan class=\"CitationRef\"\u003e18\u003c/span\u003e] was employed to interpret the feature importance of the model. To analyze the influence of variables on the prediction of various CI scores, we selected the predictor with the best MMSE value (see Table\u0026nbsp;5). 
This predictor was obtained in the Comparison of the prediction models using all IS to predict several CI scores section by applying XGBoost to the entire dataset and using hyperopt parameters. Figure\u0026nbsp;5 illustrates the SHAP analysis of the 15 most important features for the XGBoost regressor model to predict MMSE in Fig.\u0026nbsp;5 (A), GDS in Fig.\u0026nbsp;5 (B), and Barthel in Fig.\u0026nbsp;5 (C).\u003c/p\u003e\n\u003cp\u003eThe SHAP analysis in Fig. 5 reveals key insights into the factors influencing CI. For the three CI scales being predicted, the most influential variables are the Tinetti score, MNA score, and age.\u003c/p\u003e\n\u003cp\u003eAdditionally, for the classifier, we employed a methodology similar to that used for the regressor. Specifically, we used the SHAP technique to evaluate and elucidate the influence of the features for each class of the classifier resulting from the Exploring Classifiers section. This experiment generated predictors of the MMSE CI scores using all available information. Figure\u0026nbsp;6 presents the ranking of the top 15 features that exert the greatest influence on the model\u0026apos;s classification decisions, thereby contributing to a deeper understanding of variable importance and overall model performance.\u003c/p\u003e\n"},{"header":"Discussion","content":"\u003cp\u003eThis study introduces a novel and integrative methodology to harmonize and analyze heterogeneous, longitudinal data collected from nursing home residents to predict CI. The proposed approach recodes diverse IS (including clinical variables, medication records, nutrition plans, bowel movements, and fall events) into a standardized, monthly-based format, making it suitable for ML analysis. 
This recodification facilitates consistent temporal modeling while minimizing loss of clinical detail.\u003c/p\u003e\u003cp\u003eOur methodology was tested on a large, real-world dataset comprising 13 years of data from 2,608 residents across four nursing homes in Galicia, Spain. As detailed in the Measuring the recodification process section, this transformation yielded 65,440 monthly follow-ups and 7,831 features across all IS. Notably, the Drug Prescriptions IS contributed the highest dimensionality due to one-hot encoding of active ingredients, dose, and frequency.\u003c/p\u003e\u003cp\u003eWe evaluated the predictive capacity of each IS individually (Comparing the different IS (Xi tables) section), using XGBoost regressors optimized with Bayesian hyperparameter tuning. As shown in Table\u0026nbsp;3, Clinical Variables IS provided the most accurate predictions for MMSE, GDS, and Barthel scores, with MMSE errors ranging from 1.9 to 8.3, GDS from 0.4 to 1.6, and Barthel from 4.2 to 27.5. Bowel Movement Records IS produced the largest errors. Key predictors included the Tinetti score, BMI, and MNA_Global_score (Table\u0026nbsp;4), indicating a strong association between functional mobility, nutritional status, and cognitive function.\u003c/p\u003e\u003cp\u003eIn the next experiment (Comparison of the prediction models using all IS to predict several CI scores section), we trained regressors using all IS merged into one dataset. Results in Table\u0026nbsp;5 demonstrate that combining all data sources further reduces prediction error. Hyperparameter optimization with Bayesian methods consistently outperformed default model configurations. These results validate the advantage of using the full, harmonized dataset and emphasize the importance of parameter tuning.\u003c/p\u003e\u003cp\u003eBeyond regression, we explored classification of CI severity levels using MMSE scale (Exploring Classifiers section). 
The classifier achieved high AUC-ROC, precision, recall, and F1-scores for Normal and Severe impairment categories, but struggled more with intermediate classes like Mild and Moderate impairment (Table\u0026nbsp;6). These results suggest the need for refined feature engineering or additional data types to better differentiate mid-range cognitive states.\u003c/p\u003e\u003cp\u003eExplainability was a central aspect of this work. Using SHAP, we analyzed the contribution of individual features in each model (Using SHAP to explain the CI trajectory in the model construction section). The SHAP analysis in Fig.\u0026nbsp;5 illustrates the top 15 features for MMSE, GDS, and Barthel regressors, consistently highlighting the relevance of Clinical Variables IS. Similarly, Fig.\u0026nbsp;6 presents SHAP values for the classifier, confirming that the same key variables influence class-level predictions.\u003c/p\u003e\u003cp\u003eAlthough the methodology shows strong performance and interpretability, it is not without limitations. The dataset was derived from a single geographical region, which may limit generalizability. Additionally, Fig.\u0026nbsp;4 reveals variable documentation completeness, suggesting that sparsity remains a challenge in some IS. Our monthly aggregation strategy mitigates this but may overlook short-term dynamics relevant to early CI detection.\u003c/p\u003e\u003cp\u003eIn conclusion, this work presents a comprehensive and explainable framework for modeling CI in long-term care settings. By integrating harmonized temporal data, Bayesian-optimized ML models, and SHAP-based interpretability, it enables accurate prediction and understanding of CI trajectories. 
The modularity of the framework allows adaptation to other clinical outcomes and contexts, supporting its broader applicability in personalized geriatric care.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eConflict of Interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors have declared that no competing interests exist.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll relevant data necessary to replicate the procedures described in this study are within the manuscript and its Supporting Information files.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e This research was funded by the Ministry of Science and Innovation through the project PID2022-138936OB-C32 (co-funded by the European Regional Development Fund (FEDER), \u0026quot;A way to make Europe\u0026quot;, EU) awarded to C. Veiga.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e: The authors declare that there are no financial interests, awarded or filed patents, or any other conflicts of interest related to the results presented in this paper.\u003c/p\u003e\n"}
,{"header":"Tables","content":"\u003cp\u003e\u0026nbsp;Table 1\u0026nbsp;\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"397\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eInformation Source (IS)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eNumber of determinations\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eSubjects (Residents)\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e2,608\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eClinical Variables\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e414,219\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eBowel Movement Records\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e3,948,306\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eDrug Prescriptions\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e273,679\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eFall Records\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e8,228\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eNutritional Plans\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e48,875\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" valign=\"top\" style=\"width: 94px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003cp\u003eCognitive Impairment Scales\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eMMSE\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e11,747\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eGDS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e6,350\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003e\u0026nbsp;Barthel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e14,816\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u0026nbsp;Table 2\u0026nbsp;\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"611\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 142px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 157px;\"\u003e\n \u003cp\u003eAll residents (2608)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 151px;\"\u003e\n 
\u003cp\u003eMale (998)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 161px;\"\u003e\n \u003cp\u003eFemale (1610)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 142px;\"\u003e\n \u003cp\u003eMetrics\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74px;\"\u003e\n \u003cp\u003eAge at entering\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003eFollow-up time\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003eAge at entering\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 79px;\"\u003e\n \u003cp\u003eFollow-up time\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003eAge at entering\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003eFollow-up time\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 142px;\"\u003e\n \u003cp\u003eMaximum\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74px;\"\u003e\n \u003cp\u003e105.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e18.07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e105.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 79px;\"\u003e\n \u003cp\u003e18.07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e102.58\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e15.70\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 142px;\"\u003e\n \u003cp\u003eMinimum\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74px;\"\u003e\n \u003cp\u003e65.52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e0.30\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e69.11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 79px;\"\u003e\n \u003cp\u003e0.30\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e65.52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e0.41\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 142px;\"\u003e\n \u003cp\u003eStandard deviation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74px;\"\u003e\n \u003cp\u003e10.06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e3.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e9.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 79px;\"\u003e\n \u003cp\u003e3.18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e10.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e2.89\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 142px;\"\u003e\n \u003cp\u003eMean\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 74px;\"\u003e\n \u003cp\u003e81.07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e2.48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e82.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 79px;\"\u003e\n 
\u003cp\u003e2.65\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 77px;\"\u003e\n \u003cp\u003e78.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e2.19\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u0026nbsp;Table 3\u0026nbsp;\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"494\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003ePredictor (\u003cem\u003ePijk\u003c/em\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"3\" valign=\"top\" style=\"width: 296px;\"\u003e\n \u003cp\u003eMSE (\u003cem\u003eyj\u003c/em\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e\u003cem\u003eXi\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003eMSE(MMSE)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 94px;\"\u003e\n \u003cp\u003eMSE(GDS)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 111px;\"\u003e\n \u003cp\u003eMSE(Barthel)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eClinical Variables \u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003e1.9583\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 94px;\"\u003e\n \u003cp\u003e0.4218\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 111px;\"\u003e\n \u003cp\u003e4.2018\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eBowel Movement Records\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003e8.3154\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 94px;\"\u003e\n \u003cp\u003e1.7000\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 111px;\"\u003e\n \u003cp\u003e28.1881\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eDrug Prescriptions\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003e4.3584\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 94px;\"\u003e\n \u003cp\u003e1.0886\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 111px;\"\u003e\n \u003cp\u003e15.4059\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eFall Records\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003e7.2529\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 94px;\"\u003e\n \u003cp\u003e1.4627\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 111px;\"\u003e\n \u003cp\u003e24.2114\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eNutritional Plans\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 91px;\"\u003e\n \u003cp\u003e3.6059\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 94px;\"\u003e\n \u003cp\u003e0.7466\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 111px;\"\u003e\n \u003cp\u003e11.4030\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u0026nbsp;Table 4\u0026nbsp;\u003c/p\u003e\n\u003ctable 
border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"671\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003ePredictor\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"3\" valign=\"top\" style=\"width: 567px;\"\u003e\n \u003cp\u003e\u003cem\u003eYj\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003e\u003cem\u003eXi\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003eMMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 170px;\"\u003e\n \u003cp\u003eGDS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 217px;\"\u003e\n \u003cp\u003eBarthel\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eClinical Variables\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003e-Tinneti_score\u003c/p\u003e\n \u003cp\u003e-BMI (Body Mass Index)\u003c/p\u003e\n \u003cp\u003e-MNA_Global_score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 170px;\"\u003e\n \u003cp\u003e-Tinneti_score\u0026nbsp;\u003c/p\u003e\n \u003cp\u003e-MNA_Global_score\u003c/p\u003e\n \u003cp\u003e-BMI (Body Mass Index)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 217px;\"\u003e\n \u003cp\u003e-Tinneti_score\u003c/p\u003e\n \u003cp\u003e-MNA_Global_score\u003c/p\u003e\n \u003cp\u003e-Gender\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eBowel Movement Records\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003e-Gender\u003c/p\u003e\n \u003cp\u003e-Age\u003c/p\u003e\n 
\u003cp\u003e-Id_AssociatedSymptom\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 170px;\"\u003e\n \u003cp\u003e-Gender\u003c/p\u003e\n \u003cp\u003e-Age\u003c/p\u003e\n \u003cp\u003e-Id_AssociatedSymptom\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 217px;\"\u003e\n \u003cp\u003e-Gender\u003c/p\u003e\n \u003cp\u003e-Age\u003c/p\u003e\n \u003cp\u003e-Id_AccompanimentFall\u0026nbsp;\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003eDrug Prescriptions\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003e-\u0026nbsp;Voltaren Emulgel\u003c/p\u003e\n \u003cp\u003e- Calcium Carbonate\u003c/p\u003e\n \u003cp\u003e- Omeprazole\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 170px;\"\u003e\n \u003cp\u003e- Omeprazole\u003c/p\u003e\n \u003cp\u003e- Voltaren Emulgel\u003c/p\u003e\n \u003cp\u003e-Gender\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 217px;\"\u003e\n \u003cp\u003e- Lormetazepam\u003c/p\u003e\n \u003cp\u003e- Nutritional Thickening Module\u003c/p\u003e\n \u003cp\u003e-\u0026nbsp;Diclofenac\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003cp\u003eFall Records\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003e-Gender\u003c/p\u003e\n \u003cp\u003e-L_morning\u003c/p\u003e\n \u003cp\u003e-E_evening\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 170px;\"\u003e\n \u003cp\u003e-Gender\u003c/p\u003e\n \u003cp\u003e-Age\u003c/p\u003e\n \u003cp\u003e-L_morning\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 217px;\"\u003e\n 
\u003cp\u003e-Gender\u003c/p\u003e\n \u003cp\u003e-N_morning\u003c/p\u003e\n \u003cp\u003e-B_evening\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 104px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003cp\u003eNutritional Plans\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003e-Id_Route_dinner\u003c/p\u003e\n \u003cp\u003e-Id_Route_snack\u003c/p\u003e\n \u003cp\u003e-Id_Consistency_ breakfast\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 170px;\"\u003e\n \u003cp\u003e-Id_Route_dinner\u003c/p\u003e\n \u003cp\u003e-Id_Consistency_ breakfast\u003c/p\u003e\n \u003cp\u003e-Id_Route_snack\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 217px;\"\u003e\n \u003cp\u003e-Id_Consistency_ breakfast\u003c/p\u003e\n \u003cp\u003e-Id_Route_dinner\u003c/p\u003e\n \u003cp\u003e-Id_Consistency_ lunch\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u0026nbsp;Table 5\u0026nbsp;\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"501\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003ePredictor (\u003cem\u003ePjk\u003c/em\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 378px;\"\u003e\n \u003cp\u003eMSE(\u003cem\u003ek\u003c/em\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003cem\u003eYj\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003eXGBoost using hyperopt parameters\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eXGBoost using default 
parameters\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003eMMSE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003e1.7179\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e2.0456\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003eGDS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003e0.3776\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e0.4123\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003eBarthel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 180px;\"\u003e\n \u003cp\u003e4.0433\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e4.2210\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eTable 6\u0026nbsp;\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"574\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 160px;\"\u003e\n \u003cp\u003eClassifier\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"4\" valign=\"top\" style=\"width: 414px;\"\u003e\n \u003cp\u003eMetrics\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 160px;\"\u003e\n \u003cp\u003eCategory\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 85px;\"\u003e\n \u003cp\u003eAUC-ROC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 96px;\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 117px;\"\u003e\n \u003cp\u003eRecall\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 116px;\"\u003e\n \u003cp\u003eF1-Score\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 160px;\"\u003e\n \u003cp\u003eMild impairment\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e0.86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 96px;\"\u003e\n \u003cp\u003e0.65\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 117px;\"\u003e\n \u003cp\u003e0.57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 116px;\"\u003e\n \u003cp\u003e0.61\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 160px;\"\u003e\n \u003cp\u003eModerate impairment\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 96px;\"\u003e\n \u003cp\u003e0.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 117px;\"\u003e\n \u003cp\u003e0.69\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 116px;\"\u003e\n \u003cp\u003e0.67\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 160px;\"\u003e\n \u003cp\u003eNormal\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e0.96\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 96px;\"\u003e\n \u003cp\u003e0.96\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 117px;\"\u003e\n \u003cp\u003e0.72\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 116px;\"\u003e\n \u003cp\u003e0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
valign=\"top\" style=\"width: 160px;\"\u003e\n \u003cp\u003eSevere impairment\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e0.91\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 96px;\"\u003e\n \u003cp\u003e0.82\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 117px;\"\u003e\n \u003cp\u003e0.94\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 116px;\"\u003e\n \u003cp\u003e0.87\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 160px;\"\u003e\n \u003cp\u003eMacro Average\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e0.91\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 96px;\"\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 117px;\"\u003e\n \u003cp\u003e0.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 116px;\"\u003e\n \u003cp\u003e0.74\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e"}],"keywords":"Information Source (IS), Artificial Intelligence (AI), Cognitive Impairment (CI), Homogenization of data, Explainable Artificial Intelligence (XAI)","lastPublishedDoi":"10.21203/rs.3.rs-7402937/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"postedDate":"September 16th, 2025","status":"published-in-journal","versionOfRecord":{"articleIdentity":"rs-7402937","link":"https://doi.org/10.1038/s41598-025-34060-w","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2026-02-05 15:58:16","publishedOnDateReadable":"February 5th, 2026"},"vorDoi":"10.1038/s41598-025-34060-w"},"version":"v1","identity":"rs-7402937"}}}

