Identifying Key Gait Features in Stroke Patients: A Machine Learning Approach with Supervised and Unsupervised Validation

preprint OA: closed CC-BY-4.0
Research Square preprint

Brasiliano Paolo, Orejel-Bustos Amaranta, Belluscio Valeria, Cereatti Andrea, and 7 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7478886/v1
This work is licensed under a CC BY 4.0 License
Status: Published. Journal publication: published 09 Mar, 2026 in Scientific Reports.
Version 1 posted 10

Abstract

Stroke is a major cause of motor disability, degrading walking and quality of life. Wearable gait analysis with magneto-inertial measurement units (MIMUs) can quantify post-stroke impairments.
We used machine learning to identify discriminative gait features in stroke, coupling supervised feature selection with unsupervised clustering to improve interpretability and generalizability. Eighty-five stroke patients and 97 healthy controls completed 10-Meter Walk Tests while wearing five MIMUs. Feature selection spanned spatiotemporal, symmetry, stability, and smoothness metrics. K-nearest neighbors (KNN), support vector machines (SVM), and decision trees (TREE) were trained, validated, and tested iteratively across data splits; clustering then verified discriminative ability. Sequential backward feature selection retained nine features, yielding accuracies (healthy vs patient) of 94.1% (KNN), 96.7% (SVM), and 89.1% (TREE). SVM generalized best. Unsupervised k-medoids with cosine distance confirmed discrimination, reaching 90% accuracy with only three features: stride speed, stance-phase coefficient of variation, and medio-lateral harmonic ratio. Results indicate that gait variability, trunk smoothness, and upper-body stability robustly characterize post-stroke dysfunctions. Notably, head-movement smoothness emerged as a novel, discriminative feature. This integrated framework shows how wearable sensors plus machine learning can support clinical gait analysis and rehabilitation planning. Future work should enable real-time deployment and broaden datasets to cover more clinical scenarios.

Biological sciences/Computational biology and bioinformatics; Physical sciences/Engineering; Health sciences/Health care; Physical sciences/Mathematics and computing; Health sciences/Neurology; Biological sciences/Neuroscience

1. INTRODUCTION

The term stroke is used to describe brain damage due to several different vascular causes [1]. Stroke is a global challenge that poses significant health and socioeconomic challenges for both the individual and society as a whole [2, 3].
Indeed, after a stroke event, brain cells die, resulting in functional and cognitive impairments [2]. Among these, motor disabilities primarily limit patients' ability to accomplish activities of daily living [4], with walking impairments influencing participation, autonomy, and quality of life [5, 6]. For these reasons, improving gait is one of the crucial aspects of post-stroke rehabilitation [5]. In this context, standard clinical assessment can be integrated with instrumented gait analysis, which provides objective indicators of walking performance to evaluate the progression of the pathology, define tailored treatments, and monitor their efficacy. In recent years, instrumented gait analysis (especially in clinical practice) has shifted towards more ecological settings, thanks to advances in wearable technologies and computational methods [7–10], bringing several advantages over traditional laboratory-based assessments. In particular, magneto-inertial measurement units (MIMUs) are widely used to obtain features that describe the quality of gait across different domains, such as symmetry, smoothness, and spatiotemporal features, which are often considered during post-stroke clinical evaluations [8]. Alongside these new measurement tools, new data analysis techniques, such as machine learning algorithms, have been fruitfully integrated to extract more information from patients' data. These tools have found many applications in the stroke population, ranging from the discrimination between groups or patient categories (such as people with stroke and healthy controls, patients with different pathologies, or patients with different levels of stroke severity) [11] to the recognition of different types of activities, the classification of well or poorly executed tasks, and other applications [8, 12].
Among them, one of particular interest for clinicians is the classification of pathological and healthy people based on gait features. Although this may seem an obvious categorization once a medical diagnosis is available, the understanding of a pathology is primarily based on characterizing how it differs from a non-pathological condition. To do so, the features that best capture such differences must be identified. Indeed, when analyzing gait, a large number of features may be measured, some of which may not carry useful information for the pathological population of interest. To overcome this issue, feature selection techniques and machine learning algorithms may be used in combination to reduce the number of features, retaining those that are most informative and best able to distinguish healthy from pathological people. In this framework, classification performance per se is often employed as an indicator of how well the selected features distinguish between healthy and pathological conditions. This distinction in itself holds limited relevance after a stroke diagnosis. However, evaluating how classification performance changes as the selected features vary allows the identification of the feature combinations that best discriminate healthy from pathological gait. As a result, the gait domains that differ most between healthy people and patients with stroke can be identified. Although some authors have explored this (or a similar) approach [13–18], most of them applied feature selection as an intermediate step to improve the performance of a given machine learning model, without focusing on the generalizability of the feature selection approach. Indeed, the selected features depend on several factors, including the dataset and sample size, the feature selection technique, and the machine learning algorithm employed.
Testing the generalizability of the feature selection approach across different machine learning algorithms is thus of the utmost importance. To the authors' knowledge, among recent studies leveraging MIMUs and machine learning in patients with stroke [16–24], only one [21] sought to reduce the feature space by employing a feature selection technique across multiple machine learning algorithms and analyzing the frequency of the selected features. This approach identifies the features that most consistently enhance classification performance, thus highlighting characteristics of pathological gait patterns. Nevertheless, the sample size in that study was insufficient to provide a reliable representation of the investigated population. Furthermore, no previous study has evaluated whether features selected through supervised machine learning can reliably differentiate healthy and pathological individuals when applied to unsupervised clustering methods. In other words, the discriminative value of the selected features has not been tested independently of the supervised algorithm's learning capabilities. Therefore, this study aims to identify an optimal subset of gait features extracted using a set of MIMUs through a feature selection approach in combination with multiple machine learning algorithms. This goal was pursued by distinguishing individuals with stroke from healthy controls, thereby characterizing pathological gait patterns. To ensure generalizability, the analysis was iterated across various combinations of feature subsets and participant groups.

2. METHODS

2.1. PARTICIPANTS

Eighty-five patients with stroke (PwS; age: 57 ± 16 yrs; mass: 71 ± 12 kg; stature: 1.69 ± 0.09 m) and 97 healthy participants (HP; age: 48 ± 12 yrs; mass: 70 ± 18 kg; stature: 1.67 ± 0.08 m) were enrolled in this study.
The study was conducted in accordance with the World Medical Association Declaration of Helsinki and was approved by the Ethics Committee of the Institute for Research and Healthcare Santa Lucia (with protocol number CE/AG4/PROG.383 − 11 and subsequent integrations). Healthy participants between 18 and 80 years of age were considered eligible for the study if they did not report any condition or use of medication that could have affected their motor performance. Stroke patients (in either the sub-acute or the chronic phase of the pathology) who were able to walk without any device or need for physical assistance were included in the study (Functional Ambulation Classification [25] scale score ≥ 3). Exclusion criteria for this group were cognitive deficits affecting the capacity of patients to understand the task instructions (Mini Mental State Examination [26] >4), severe unilateral spatial neglect, severe aphasia, and presence of neurological, orthopedic, or cardiac comorbidities. All participants were included in the study after providing their informed consent.

2.2. EXPERIMENTAL SET-UP

Data collection was performed in the gym of the Institute for Research and Healthcare Santa Lucia, in Rome. Participants were asked to perform a 10-Meter Walk Test (10-MWT) at their self-selected speed along a straight walkway while wearing comfortable shoes. At the beginning of each trial, participants were instructed to maintain an orthostatic posture for five seconds. Each participant performed a minimum of three trials. During the trials, participants were equipped with five synchronized MIMUs (OPAL, APDM wearable technologies, Portland, USA). Each MIMU included a triaxial accelerometer, gyroscope, and magnetometer with full-scale ranges of ± 6 g, ± 1500 deg/s, and ± 6 Gauss, respectively, and a sampling rate of 128 Hz.
Three MIMUs were fixed to the upper body of the participants, on the forehead (FH) on the occipital cranium bone close to the lambdoid suture of the head, at the center of the sternum (ST), and at the pelvis (PV) level, in correspondence of the L4-L5 vertebrae. The last two MIMUs were placed laterally on the distal part of the tibiae, slightly above the lateral malleoli, and securely fixed with Velcro straps. These two sensors were used for segmentation of the gait cycle. Attention was paid to the fixation of each MIMU to minimize the relative movement between the MIMU and the underlying bones.

2.3. SIGNAL PREPROCESSING

Data preprocessing was performed with custom algorithms implemented in the Matlab® Software R2021b (The MathWorks Inc., MA, US). First, a consistent reference frame was defined for all participants. During the static phase of the 10-MWT, a time-invariant transformation aligned each MIMU's local reference system to a frame based on the gravity vector. Afterwards, the same time-invariant transformation was applied to the accelerometer and gyroscope data recorded during the dynamic phase of the test. Finally, gravity was removed from the component of the acceleration signal aligned with the vertical axis of the reference frame. As a result, all data were expressed relative to a reference frame that approximated the anterior-posterior (AP), medio-lateral (ML), and cranio-caudal (CC) anatomical axes [27]. Accelerometer data were filtered using a second-order Butterworth low-pass filter with a cut-off frequency of 10 Hz, while gyroscope data were filtered using a second-order Butterworth low-pass filter with a cut-off frequency of 6 Hz [28].
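As an illustration of the filtering step just described, the following is a minimal Python sketch of a second-order Butterworth low-pass filter with the stated cut-offs (10 Hz for accelerometer data, 6 Hz for gyroscope data) at the 128 Hz sampling rate. It is not the authors' Matlab code. The function names, the synthetic test signal, and the choice of a single forward (causal) pass are assumptions; the text does not say whether zero-phase filtering was applied.

```python
import math

def butter2_lowpass_coeffs(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass coefficients via the bilinear transform."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)          # pre-warped cut-off
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (1.0,
         2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a

def lfilter(b, a, x):
    """Apply the biquad as a difference equation in one forward pass (assumption)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

FS = 128.0                                              # sampling rate from the text
acc_b, acc_a = butter2_lowpass_coeffs(10.0, FS)         # accelerometer: 10 Hz
gyr_b, gyr_a = butter2_lowpass_coeffs(6.0, FS)          # gyroscope: 6 Hz

# Synthetic signal (hypothetical): 1 Hz "gait" component plus 40 Hz noise.
t = [n / FS for n in range(512)]
raw = [math.sin(2 * math.pi * 1.0 * ti) + 0.5 * math.sin(2 * math.pi * 40.0 * ti)
       for ti in t]
smooth = lfilter(acc_b, acc_a, raw)
```

A single forward pass introduces a phase lag; if event timing matters, a forward-backward pass is a common alternative, at the cost of a steeper effective roll-off.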
Gait events were identified from the ML angular velocity recorded by the MIMUs placed on the participants' shanks, while walking speed and gait spatial features were calculated through forward and backward integration of shank data in combination with a complementary filter [29] and a zero-velocity update procedure [30]. For each identified stride, features of upper body movement stability [31, 32], symmetry [33], and smoothness [34] were calculated from FH, ST, and PV MIMU data. In addition, symmetry [35] and variability of gait spatiotemporal features were also calculated. For the sake of readability, spatiotemporal gait features are not described in detail in the following sections, as they are standard measurements in instrumented gait analysis. Nevertheless, a complete list of spatiotemporal features is provided. For the equations used to calculate each feature, refer to the work by Bertoli et al. [30].

2.4. SPATIOTEMPORAL FEATURES

Stride frequency
Stride speed
Stride length
Stride duration
Stance speed
Stance length
Stance duration
Swing speed
Swing length
Swing duration
Double support duration
Single support duration

2.5. STABILITY FEATURES

Root Mean Square (RMS) was calculated from PV, ST, and FH MIMU acceleration signals over each stride as follows:

$$RMS = \sqrt{\frac{\sum_{i=1}^{n} x_i^{2}}{n}}$$

where \(x_i\) are the acceleration values and n is the number of samples of the considered stride.

Coefficient of attenuation [32] (COA) was calculated from PV, ST, and FH MIMU acceleration signals over each stride as follows:

$$COA = \left(1 - \frac{RMS_{\text{upper segment}}}{RMS_{\text{lower segment}}}\right) \times 100$$

Precisely, COAs were calculated from PV to ST, from PV to FH, and from ST to FH.

2.6. SYMMETRY FEATURES

Improved Harmonic Ratio [33] (iHR) was calculated from PV, ST, and FH MIMU acceleration signals over each stride as follows:

$$iHR_n = \frac{\sum_{i=1}^{n} P_I^i}{\sum_{i=1}^{n} \left(P_I^i + P_E^i\right)}$$

where \(P_I^i\) and \(P_E^i\) are the powers of the intrinsic and extrinsic harmonics, respectively, over the n considered harmonics.

The Symmetry Angle (SA) was calculated from the side-paired values of the spatiotemporal features of each stride, as follows:

$$SA = \frac{45^\circ - \arctan\left(\frac{X_{left}}{X_{right}}\right)}{90^\circ} \times 100$$

where \(X_{left}\) and \(X_{right}\) are the feature values for left and right strides, respectively.

2.7. SMOOTHNESS FEATURES

Log dimensionless jerk [34] (LDLJ) was calculated from PV, ST, and FH MIMU acceleration and angular velocity signals (LDLJA and LDLJW, respectively) over each stride as follows:

$$LDLJ = -\ln\left(\frac{t_2 - t_1}{\max_{t \in [t_1, t_2]} \left\| x(t) \right\|} \cdot I_j\right)$$

with:

$$I_j = \int_{t_1}^{t_2} \left\| x'(t) \right\|^{2} \, dt$$

where x are the linear or angular accelerations data and \(t_1\) and \(t_2\) are the starting and ending instants of each stride.

Finally, variability of spatiotemporal features was estimated by calculating the Coefficient of Variation (CoV) of each feature. After z-score normalization of the data, the median value was calculated over the gait cycles for each trial and each feature. Afterwards, the dataset was visually inspected to detect outliers, and trials that presented obvious instrumental errors were discarded. Finally, each participant's median value was calculated over the trials. Missing values were replaced with the group median.

2.8. FEATURE SELECTION AND VALIDATION

From the entire sample, two subgroups of 17 PwS and 20 HP (Unsupervised Test Groups) were selected and kept aside for further analysis. The remaining 68 PwS and 77 HP (Feature Selection Groups) were used for the first steps of the feature selection procedure. Precisely, the within-group distribution of each feature was tested using the Shapiro-Wilk test. Afterwards, according to the data distribution, an independent-samples t-test or a Mann-Whitney U test was used to identify the features that differed significantly between HP and PwS [17, 36]. Only these features were considered for further analysis. To limit multicollinearity in the dataset, Pearson's correlation coefficients between all pairs of retained features were calculated [36] and analyzed according to the following procedure:

1.1 The total number of correlations with r > 0.5 was calculated for each feature.
1.2 The feature (\(F_{maxr}\)) that showed the highest number of over-threshold correlations was kept, while those that showed over-threshold correlations with \(F_{maxr}\) were discarded.

The procedure was iterated until no correlation with r > 0.5 remained. If two features had the same number of over-threshold correlations (i.e., if two \(F_{maxr}\) were found), one was chosen according to the suggestion of physical therapists of the neurorehabilitation hospital.

Finally, from the Feature Selection Groups, ten random splits were defined, each assigning 70% of the participants (48 PwS and 54 HP) to the training and validation sets and the remaining 30% (20 PwS and 23 HP) to the test set. Afterward, on these datasets, a sequential backward feature selection (SBS) was implemented. This procedure allows the reduction of the number of features while preserving the performance of the classifier and the interpretability of the results.
Indeed, no new combination of features is created (as in Principal Component Analysis, Linear Discriminant Analysis, or other feature extraction techniques); rather, only relevant features from the original dataset are kept, making the procedure usable and the results interpretable in clinical settings [36]. This approach is implemented by evaluating the performance of a classifier while changing the set of features as follows:

2.1 The complete dataset with k features is defined as the starting point and tested.
2.2 All the possible combinations of k-1 features are tested.
2.3 The subset of k-1 features with the best classification performance is identified.
2.4 The subset of features identified in step 2.3 is used as the new starting point and the procedure is repeated from step 2.2.
2.5 The feature selection continues until a stop criterion is met. In this instance, the SBS process was carried out until a single feature remained.

The evaluated classifiers were the K-nearest neighbors (KNN), the Support Vector Machine (SVM), and the decision tree (TREE) algorithms. These algorithms were selected for their nonparametric approach, which does not require a-priori assumptions on the dataset. During the SBS procedure, hyperparameter tuning for each algorithm was implemented using a Bayesian optimization approach. Detailed information on the hyperparameter tuning for each algorithm is provided in the supplementary material. The classifiers' performance during SBS was assessed using a 5-fold cross-validation approach and measured by classification accuracy, i.e., the ability to correctly classify participants irrespective of their group. As a result, during SBS, the three algorithms were trained and cross-validated on all ten randomly identified splits. Each time, the combination of best feature subset and best algorithm hyperparameters was identified according to the highest classification accuracy obtained.
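Steps 2.1 to 2.5 above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the study's pipeline: the scoring function is a leave-one-out 1-nearest-neighbour accuracy standing in for the tuned KNN/SVM/TREE models with 5-fold cross-validation, ties between equally good subsets are broken by position, and the six-sample dataset is entirely synthetic (column 0 separates the classes, columns 1 and 2 are misleading).

```python
def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `cols`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[c] - xj[c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def sequential_backward_selection(X, y, score):
    """Return the (subset, score) history from all features down to one."""
    current = list(range(len(X[0])))
    history = [(tuple(current), score(X, y, current))]            # step 2.1
    while len(current) > 1:                                       # step 2.5
        candidates = [[c for c in current if c != drop]           # step 2.2
                      for drop in current]
        current = max(candidates, key=lambda cols: score(X, y, cols))  # step 2.3
        history.append((tuple(current), score(X, y, current)))    # step 2.4
    return history

# Synthetic data (hypothetical): rows are participants, columns are features.
X = [
    [0.0, 0.00, 0.60],
    [0.1, 0.30, 0.00],
    [0.2, 0.60, 0.30],
    [1.0, 0.01, 0.61],
    [1.1, 0.31, 0.01],
    [1.2, 0.61, 0.31],
]
y = [0, 0, 0, 1, 1, 1]

history = sequential_backward_selection(X, y, loo_1nn_accuracy)
for subset, acc in history:
    print(subset, acc)
```

Because the full history is returned, the best subset at any size can be read off afterwards, which mirrors running SBS down to a single feature and then choosing the subset with the highest accuracy.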
The best identified model (i.e., the best subset of features with the tuned hyperparameters) was then tested on the corresponding test set. From the thirty SBS procedures carried out (three algorithms over ten runs), only those features that were selected in at least two of the ten runs were retained. Afterwards, only the features shared across datasets and algorithms were kept. These features were arranged in descending order according to their number of occurrences at the end of the SBS procedures. Subsequently, an unsupervised algorithm was tested by adding the ordered selected features one by one, using the Unsupervised Test Groups that had been kept aside before data analysis and not considered during the feature selection procedure. In this instance, a k-medoids algorithm was implemented, with the initial medoids identified using the Single Pass Seed Selection algorithm [37] so that the clustering procedure yields a single solution. K-medoids was implemented to produce two clusters. Distances between the cluster medoids and the data points were measured using the four distance metrics described in the following equations.

Cosine distance (CO)

$$CO = 1 - \frac{A \cdot B}{\|A\| \, \|B\|}$$

where A and B are the vectors defined by the feature median values of the medoid and of each participant, respectively.

Squared Euclidean distance (SqEU)

$$SqEU = \sum_{i=1}^{n} \left(A_i - B_i\right)^{2}$$

City Block distance (CB)

$$CB = \sum_{i=1}^{n} \left|A_i - B_i\right|$$

Euclidean distance (EU)

$$EU = \sqrt{\sum_{i=1}^{n} \left(A_i - B_i\right)^{2}}$$

where \(A_i\) and \(B_i\) are the median values of the i-th feature for the medoid and for each participant, respectively.

Being unsupervised, the algorithm produces two unlabeled clusters. To assign the group labels (i.e., PwS and HP) to the two clusters, the following procedure was implemented:

3.1 The medians of each feature for the stroke and healthy groups and the two identified clusters were calculated and arranged to form n-dimensional vectors.
3.2 The Euclidean distance between each of the two cluster vectors and each of the two group vectors was calculated.
3.3 The smallest distance was used to label the cluster according to the corresponding known group.
3.4 The other cluster was labeled by exclusion.

Except for the SBS procedures, in which only classification accuracy was measured, all the other classification performances also included the recall, precision, and F1-score. The described procedure is graphically shown in Fig. 1.

3. RESULTS

Of the initial 79 features, 60 were retained after the t-test and 20 after the correlation analysis. The mean (± SD) accuracies during the SBS procedures over the ten runs for KNN, SVM, and TREE were 94.1% ± 1.6%, 96.7% ± 2.1%, and 89.1% ± 2.2%, respectively. The mean results of the classification on the test sets are reported in Table 1. Figure 2 shows the same performance indexes across each performed run.

Table 1 Mean and standard deviation values of the performance indexes of the supervised algorithms on the test sets over the ten runs.

        Accuracy       Recall         Precision       F1 Score
KNN     88.1 ± 5.7%    85 ± 4.7%      89.4 ± 8.3%     87.1 ± 6%
SVM     89.8 ± 5.1%    91 ± 5.7%      87.8 ± 6.6%     89.2 ± 5.3%
TREE    81.2 ± 5.7%    78.5 ± 9.1%    82.1 ± 10.3%    79.6 ± 5.5%

The features used in the SBS procedure, together with their relative number of occurrences in total and for each algorithm, are shown in Fig. 3. Nine common features, reported in bold, with at least two occurrences were found across all SBS methods. The results of the unsupervised clustering on the ten test sets and on the Unsupervised Test Groups are presented in Fig. 4, Table 2, and Fig. 5. In more detail, Fig. 4 shows the mean of the performance indexes across the ten runs as each common feature is added incrementally, ordered by number of occurrences, whereas Table 2 reports the mean (± SD) of the performance indexes across the ten runs and on the Unsupervised Test Groups using the whole subset of common features. Similarly, Fig.
5 presents the performance indexes for the Unsupervised Test Groups, showing the incremental addition of each common feature.

Table 2 Unsupervised cluster analysis results on the unseen data.

DISTANCE   ACCURACY                  RECALL                    PRECISION                 F1 SCORE
METRICS    ten runs      Final test  ten runs      Final test  ten runs      Final test  ten runs      Final test
CO         84 ± 4%       81%         83 ± 5.9%     82.3%       82.6 ± 4%     77.7%       82.7 ± 4.4%   80%
SqEU       75.8 ± 8%     83.8%       64 ± 19.2%    82.3%       82.5 ± 12.6%  82.3%       70 ± 12.3%    82.3%
CB         75.8 ± 8.1%   83.8%       61 ± 19.7%    82.3%       85.8 ± 14%    82.3%       68.8 ± 13.2%  82.3%
EU         77.7 ± 9.2%   83.8%       69.5 ± 20.5%  88.2%       81.4 ± 11.3%  79%         73.1 ± 13.5%  83.3%

4. DISCUSSION

The purpose of this study was to identify an optimal set of MIMU-based features able to distinguish between healthy participants and patients with stroke. This approach aims to characterize the gait patterns of people after a stroke event by identifying the features that most effectively capture deviations from physiological conditions. The methodological approach includes several key strategies to overcome some of the limitations highlighted in a recent review on the topic and to enhance the validity and generalizability of the results [11]. First, the sample size was increased to limit the overestimation of classification performance and improve the generalizability of the results [38–40]. Wearable sensors, specifically MIMUs, were used to propose a model suitable for clinical practice. An instrumental setup was selected that balances the minimum number of required devices with the number of measurable useful features. Only clinically meaningful features were extracted, and a feature selection technique was chosen to reduce the feature space without modifying it, thereby maintaining the clinical interpretability of the starting dataset. The dataset was split into training, validation, and test sets to perform algorithm tuning and feature selection and to test their discriminant ability.
The entire procedure was repeated while randomly changing the composition of the datasets to obtain more generalizable results. Different supervised classification algorithms were used to address differences and commonalities in the selected features according to the classifier used. Finally, an unsupervised clustering technique was employed to verify whether the selected features contain discriminating information. Results showed that the method proposed here can be used to select features that highlight differences between healthy and pathological locomotion (when using both supervised classification and unsupervised clustering), thereby identifying the gait domains that most predominantly characterize the pathology. The set of 80 features analyzed in the present study was selected based on the scientific literature on the topic [8, 41, 42]. Specifically, features that describe the spatiotemporal, symmetry, variability, stability, and smoothness domains of gait were considered. Features were chosen to characterize both general aspects of gait (i.e., those derived from shanks, and the lower back, LB, MIMUs) and the quality of movement of the upper body (i.e., those derived from the sternum, ST, and the forehead, FH, MIMUs). Feature selection involved both statistical and machine learning approaches. The statistical approach reduced the initial number of features from 80 to 20, which were then used for the machine learning feature selection approach, based on three different classification algorithms: KNN, SVM, and TREE. The three algorithms yielded different results in terms of both classification performance and features retained after feature selection. Specifically, in terms of classification accuracy, SVM performed better in both the training and the test sets.
The same result was found by Trabassi and colleagues [36] when classifying patients with Parkinson's disease and healthy people using an approach similar to that of this study. These results suggest the potential effectiveness of SVM in detecting gait deviations in neurological patients. Concerning the other performance indexes obtained from the test sets, SVM showed the highest values except for classification precision, which was higher when using KNN. The classification performance achieved in this study is slightly lower than that obtained in previous studies on patients with stroke [16–19, 23, 24], but there are several methodological differences to consider. First, previous studies often had significantly smaller sample sizes (from a minimum of 15 to a maximum of 58 participants), which can lead to overestimation of classification performance due to overfitting and random effects [43]. Moreover, the considered studies used multiple data points from the same subject rather than a single representative data point for each participant (like the median value over several gait cycles and trials in this study). When using multiple data points from the same participant, some may appear in both the training and the test sets, thus increasing the risk of overestimating classification performance [44]. In some cases, this issue was considered and avoided [18, 19]. Differences in classification performance can also be attributed to the use of different features related to various gait domains, such as joint kinematics [19], as well as demographic differences among study groups [19]. Additionally, variations in data processing, such as focusing solely on the affected side of pathological participants [18], may have led to different results. When looking at the features selected over the ten runs, the three analyzed algorithms performed differently. Generally, KNN tended to exclude more features, followed by SVM and TREE (Fig. 3).
Considering all the feature sets used as input for the feature selection and the ten runs, the maximum possible number of feature occurrences was 200 (i.e., 20 features over ten runs). The overall feature occurrences were 61, 79, and 112 when using KNN, SVM, and TREE, respectively. Other authors have tried a similar approach: Altilio and colleagues [21] tested nine different algorithms with all possible combinations of the selected features, counting the occurrences of each feature to estimate its relevance. Similar to the findings of this study, the set of retained features changed depending on the algorithm used. This study highlights key methodological challenges when using machine learning for gait classification. First, classification performance varies when the same algorithm is applied to different datasets (Table 1 and Fig. 2). This variation is particularly significant given that the datasets across the ten runs were not entirely distinct, demonstrating that even minor changes in the data can lead to different results. Therefore, the issue of limited representativeness due to small sample sizes in previous studies becomes pertinent. In contrast, this study enrolled more participants than earlier research, in line with the recommendations by Jiao and colleagues [11]. Second, the optimal set of features to discriminate between two different populations also depends on the algorithm used. Consequently, when using a single algorithm, the selected features are those that maximize classification performance for that specific algorithm in a defined dataset rather than a set of features capable of discriminating between two populations, which is often the actual objective. The iterative feature selection technique used in this study, combined with the application of different machine learning algorithms tuned with 5-fold cross-validation, enhances the generalizability of the results.
This comprehensive approach, which pairs iterative feature selection with several algorithms, helps ensure that the classification performance is not only optimized but also reliable and applicable across different datasets. Features that were more frequently chosen after the SBS were tested using an unsupervised clustering technique. Unsupervised clustering was chosen to evaluate the discriminant information within the selected features, independent of any machine learning classification algorithm. Such clustering was performed on both the test sets used for the ten runs of the supervised classification and a separate test set comprising data not used for any analysis. A k-medoids clustering algorithm was chosen using four different distance metrics, with a medoid initialization algorithm to ensure repeatability of the results. The selected features were ranked by their number of occurrences and incrementally added to obtain information on the relevance of each feature to the clustering output, as well as on the performance of all the selected features together. When looking at the clustering output using the whole set of features on the ten test sets, results are promising, lying in the range identified in the review by Jiao and colleagues 11 for different supervised algorithms (i.e., 80–100%). Among the distance metrics used, the cosine distance achieved the best performance for all the classification performance indexes, with the sole exception of precision (see Table 2). However, in the context of identifying gait features that characterize stroke patients, precision (i.e., the ability to correctly classify healthy participants) may be considered less critical. The better performance of the cosine distance compared to the others may be attributed to their differing nature. Cosine distance evaluates the orientation of the n-dimensional vectors formed by the selected features by measuring the angle between them, whereas the other metrics focus, albeit in slightly different ways, on the absolute distance between these vectors.
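The difference between an orientation-based and a magnitude-based metric can be shown with a small numerical sketch. The vectors below are illustrative values, not study data, and the function names are assumptions; the cosine distance is taken as one minus the cosine of the angle between two non-zero vectors.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); unaffected by vector magnitude.

    Assumes both vectors are non-zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance; grows with differences in magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical feature vectors (illustrative values only).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the magnitude
c = [3.0, 2.0, 1.0]   # same magnitude as a, different direction

d_cos_ab = cosine_distance(a, b)
d_cos_ac = cosine_distance(a, c)
d_euc_ab = euclidean_distance(a, b)
d_euc_ac = euclidean_distance(a, c)
```

In this toy case the two metrics disagree on which vector is closer to `a`: the cosine distance treats `b` as essentially identical to `a` because only the direction matters, whereas the Euclidean distance places `c` closer because it penalizes the doubled magnitude of `b`.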
With the cosine distance, the magnitude of the vectors does not influence the results, whereas with the other metrics it gains importance. It appears that, at least in the dataset used, the cosine distance captures the differences between healthy participants and stroke patients more accurately. Moreover, the cosine distance exhibited the smallest standard deviation across all performance indexes over the ten test sets, suggesting it is not only the most appropriate distance metric, but also the most robust and consistent. The best performance (90.2% ± 5.5%, 87% ± 5.8%, 89.2% ± 6% for accuracy, recall, and F1 score, respectively) on the test sets was obtained when only three features were used, namely the improved harmonic ratio in the medio-lateral direction, the coefficient of variation of the stance phase, and the stride speed. Thus, a further reduction of the features used to discriminate between healthy participants and stroke patients enhances the classification results. When examining the results obtained on the test set of unseen data using all features and the cosine distance, the outcomes appear consistent with those from the ten runs. Conversely, results obtained with the other distance metrics show higher performance index values. Nevertheless, this is not surprising given the large standard deviation observed over the ten runs with these metrics, indicating high variability. It is likely that in some of the ten test sets, the clustering performance aligned with that obtained on the unseen data. Notably, the highest accuracy, precision, and F1 score when using the cosine distance were obtained using six features. These included the duration of the stance phase and the movement smoothness of the head in the anterior-posterior and medio-lateral directions. However, the differences in results between using three versus six features were modest (+3%, +6%, and +3% for accuracy, precision, and F1 score, respectively).
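The procedure of adding ranked features one at a time and re-running the clustering can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: the alternating k-medoids update, the deterministic initialization (most central point first, then the farthest point), the toy data, and all names are assumptions, and the Euclidean distance is used only for brevity.

```python
import math


def k_medoids(points, k, dist, max_iter=100):
    """Alternating k-medoids with a deterministic initialization.

    Initialization (an assumption, not the paper's seeding algorithm): start
    from the most central point, then add the point farthest from the
    medoids chosen so far.
    """
    n = len(points)
    D = [[dist(p, q) for q in points] for p in points]
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(D[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


def two_group_accuracy(labels, truth):
    """Accuracy for two clusters; cluster ids are arbitrary, so take the best match."""
    agree = sum(l == t for l, t in zip(labels, truth)) / len(truth)
    return max(agree, 1.0 - agree)


# Toy data: columns are hypothetical features already ordered by rank.
X = [
    [0.0, 0.0, 5.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 3.0],
    [10.0, 10.0, 4.0],
    [10.0, 11.0, 2.0],
    [11.0, 10.0, 0.0],
]
truth = [0, 0, 0, 1, 1, 1]

accuracies = []
for n_features in range(1, len(X[0]) + 1):
    subset = [row[:n_features] for row in X]  # keep the top-ranked features only
    labels, _ = k_medoids(subset, k=2, dist=math.dist)
    accuracies.append(two_group_accuracy(labels, truth))
```

Because the initialization is deterministic, repeated runs on the same data give the same clusters, which mirrors the repeatability requirement mentioned in the text. Any distance function taking two vectors can be passed as `dist`, so a cosine distance could be swapped in without changing the loop.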
Recall values, in contrast, remained unchanged after the third feature was added. These findings demonstrate that even with an unsupervised clustering technique, discrepancies across datasets can be addressed. The results presented here may be summarized as follows: (i) among all 80 features considered in this work, nine features (see Fig. 3) seem to be sufficient to discriminate between healthy participants and participants with stroke with fair-to-good classification results; (ii) when using an unsupervised clustering technique based on the distances between data points, the cosine distance metric seems to be the most appropriate and reliable; (iii) generally, three features are sufficient to maximize the classification results; nevertheless, other features may carry discriminant information and should be considered. The procedure applied in this study has yielded substantial insights into the most discriminative gait-related features that differentiate between healthy individuals and stroke patients. By evaluating a comprehensive array of gait-related features on different datasets, our model identified the domains of spatiotemporal features, symmetry, variability, and upper body movement control as effective in discriminating between the two groups. Altered gait spatiotemporal features, as well as heightened asymmetry and variability in such features, have been frequently reported in the scientific literature 41, 42, 45–49. These features reflect the core motor deficits induced by a stroke event, including hemiparesis and altered neuromuscular control, which directly affect the timing and rhythm of walking. Some authors have reported the persistence of such alterations even after clinical treatments, highlighting their impact after a stroke. Notably, none of the selected features represented asymmetry in spatiotemporal parameters. While gait asymmetry is a widely recognized alteration after stroke, it is often characterized by varying patterns 41, 42, 45.
These diverse trends in spatiotemporal feature asymmetry may have reduced their discriminative value. However, an asymmetry feature based on the frequency content of the acceleration measured at the lower back in the medio-lateral direction was still included in the selected features. Generally, asymmetry at the trunk level has been widely analyzed in stroke patients and has been reported to differ significantly from that of healthy controls 41, 42. This is true not only when a single MIMU at the lumbar level is used, but also when multiple sensors are used and when different aspects of trunk movement are analyzed 42. Accordingly, some of the features considered in the present work capture aspects of trunk movement stability and symmetry. Finally, trunk smoothness (at different levels and in different directions) emerged as a discriminative gait domain. To the authors’ knowledge, only two prior studies have directly measured trunk smoothness in stroke patients, albeit using different equipment 50 or metrics 51. Notably, trunk and, in particular, head movement smoothness were identified as key discriminative features. While traditionally less emphasized in gait analysis, head movement smoothness reflects the integration of postural control and balance during walking. The reduced smoothness observed in post-stroke patients may reflect impaired sensorimotor integration and balance, which are critical for safe and efficient ambulation. This finding opens new avenues for incorporating head movement analysis into routine gait assessments, thereby providing a more comprehensive understanding of post-stroke mobility impairments.

5. CONCLUSION

The results presented must be interpreted in light of the following limitations. First, other machine learning algorithms could have been employed.
Nevertheless, the algorithms employed have demonstrated efficacy even with modest datasets, and they do not rely on any assumptions about the analyzed dataset. Another potential limitation is the inclusion of walking speed as a feature in the analysis. In some cases 36, walking speed has been used to match patients and healthy controls, with the objective of limiting the influence of walking speed on the estimated gait features. However, this procedure excludes patients for whom a suitable control is not identified. This may result in the exclusion of patients with specific gait deviations, as well as a reduction in the sample size. Furthermore, the correlation analysis conducted prior to feature selection should have limited the impact of walking speed by eliminating features that are highly correlated with it. The study population included patients with acute, sub-acute, and chronic stroke. While this approach could have resulted in poorer supervised and unsupervised classification performance, it also enabled the identification of clinically meaningful features irrespective of the time since the stroke event. In conclusion, the present study demonstrates the potential of machine learning in identifying key features of post-stroke gait dysfunctions. The highlighted features (spatiotemporal features, gait variability, and trunk movement symmetry, stability, and smoothness) not only enhance our understanding of post-stroke gait dysfunctions but also provide practical markers for clinical assessment and rehabilitation. Future work should focus on refining machine learning models to support real-time gait analysis and on expanding their application in diverse clinical settings, ensuring their integration into personalized and effective rehabilitation strategies for patients with stroke and other neurological conditions.
Declarations

AUTHORS CONTRIBUTIONS STATEMENT
Brasiliano Paolo: Conceptualization, Methodology, Software, Validation, Formal Analysis, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization. Orejel-Bustos Amaranta: Investigation, Data Curation. Belluscio Valeria: Investigation, Writing - Review & Editing. Cereatti Andrea: Software. Della Croce Ugo: Software. Trabassi Dante: Methodology, Writing - Review & Editing. Salis Francesca: Software. Tramontano Marco: Resources, Writing - Review & Editing, Funding Acquisition. Buzzi Maria Gabriella: Resources. Vannozzi Giuseppe: Writing - Review & Editing. Bergamini Elena: Supervision, Project administration, Writing - Review & Editing, Funding Acquisition.

Funding: This study was supported by the Italian Ministry of Health (GR-2019-12370757).

Data Availability Statement: The data associated with this paper are not publicly available but are available from the corresponding author on reasonable request.

Competing interests: All the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Declaration of generative AI in scientific writing: During the preparation of this work the authors used DeepL in order to check and correct English grammar mistakes.

References
1. Sacco, R. L. et al. An Updated Definition of Stroke for the 21st Century. Stroke 44, 2064–2089 (2013).
2. Martin, S. S. et al. 2024 Heart Disease and Stroke Statistics: A Report of US and Global Data From the American Heart Association. Circulation (2024) doi:10.1161/CIR.0000000000001209.
3. Wang, W. et al. Prevalence, Incidence, and Mortality of Stroke in China. Circulation (2017) doi:10.1161/CIRCULATIONAHA.116.025250.
4. Kim, Y. W. Update on Stroke Rehabilitation in Motor Impairment. Brain Neurorehabil 15, e12 (2022).
5. Selves, C., Stoquart, G. & Lejeune, T. Gait rehabilitation after stroke: review of the evidence of predictors, clinical outcomes and timing for interventions. Acta Neurol Belg 120, 783–790 (2020).
6. Kinoshita, S., Abo, M., Okamoto, T. & Tanaka, N. Utility of the Revised Version of the Ability for Basic Movement Scale in Predicting Ambulation during Rehabilitation in Poststroke Patients. Journal of Stroke and Cerebrovascular Diseases 26, 1663–1669 (2017).
7. Hutabarat, Y., Owaki, D. & Hayashibe, M. Recent Advances in Quantitative Gait Analysis Using Wearable Sensors: A Review. IEEE Sensors Journal 21, 26470–26487 (2021).
8. Mohan, D. M. et al. Assessment Methods of Post-stroke Gait: A Scoping Review of Technology-Driven Approaches to Gait Characterization and Analysis. Frontiers in Neurology 12 (2021).
9. Kim, G. J., Parnandi, A., Eva, S. & Schambra, H. The use of wearable sensors to assess and treat the upper extremity after stroke: a scoping review. Disabil Rehabil 44, 6119–6138 (2022).
10. Picerno, P. et al. Wearable inertial sensors for human movement analysis: a five-year update. Expert Rev Med Devices 18, 79–94 (2021).
11. Jiao, Y., Hart, R., Reading, S. & Zhang, Y. Systematic review of automatic post-stroke gait classification systems. Gait & Posture 109, 259–270 (2024).
12. Boukhennoufa, I., Zhai, X., Utti, V., Jackson, J. & McDonald-Maier, K. D. Wearable sensors and machine learning in post-stroke rehabilitation assessment: A systematic review. Biomedical Signal Processing and Control 71, 103197 (2022).
13. Altilio, R., Paoloni, M. & Panella, M. Selection of clinical features for pattern recognition applied to gait analysis. Med Biol Eng Comput 55, 685–695 (2017).
14. Sung, J. et al. Classification of Stroke Severity Using Clinically Relevant Symmetric Gait Features Based on Recursive Feature Elimination With Cross-Validation. IEEE Access 10, 119437–119447 (2022).
15. Altilio, R., Liparulo, L., Proietti, A., Paoloni, M. & Panella, M. A genetic algorithm for feature selection in gait analysis. in 2016 IEEE Congress on Evolutionary Computation (CEC) 4584–4591 (2016). doi:10.1109/CEC.2016.7744374.
16. Lee, J., Park, S. & Shin, H. Detection of Hemiplegic Walking Using a Wearable Inertia Sensing Device. Sensors (Basel) 18, 1736 (2018).
17. Hsu, W.-C. et al. Can Trunk Acceleration Differentiate Stroke Patient Gait Patterns Using Time- and Frequency-Domain Features? Applied Sciences 11, 1541 (2021).
18. Mannini, A., Trojaniello, D., Cereatti, A. & Sabatini, A. M. A Machine Learning Framework for Gait Classification Using Inertial Sensors: Application to Elderly, Post-Stroke and Huntington’s Disease Patients. Sensors (Basel) 16, 134 (2016).
19. Scheffer, C. & Cloete, T. Inertial motion capture in conjunction with an artificial neural network can differentiate the gait patterns of hemiparetic stroke patients compared with able-bodied counterparts. Comput Methods Biomech Biomed Engin 15, 285–294 (2012).
20. Wang, L., Sun, Y., Li, Q., Liu, T. & Yi, J. Two Shank-Mounted IMUs-Based Gait Analysis and Classification for Neurological Disease Patients. IEEE Robotics and Automation Letters 5, 1970–1976 (2020).
21. Altilio, R., Rossetti, A., Fang, Q., Gu, X. & Panella, M. A comparison of machine learning classifiers for smartphone-based gait analysis. Med Biol Eng Comput 59, 535–546 (2021).
22. Iosa, M. et al. Artificial Neural Network Analyzing Wearable Device Gait Data for Identifying Patients With Stroke Unable to Return to Work. Front Neurol 12, 650542 (2021).
23. Wang, F.-C. et al. Detection and Classification of Stroke Gaits by Deep Neural Networks Employing Inertial Measurement Units. Sensors (Basel) 21, 1864 (2021).
24. Mathur, D. & Bhatia, D. Gait classification of stroke survivors - An analytical study. Journal of Interdisciplinary Mathematics 25, 163–181 (2022).
25. Holden, M. K., Gill, K. M., Magliozzi, M. R., Nathan, J. & Piehl-Baker, L. Clinical gait assessment in the neurologically impaired. Reliability and meaningfulness. Phys Ther 64, 35–40 (1984).
26. Folstein, M. F., Folstein, S. E. & McHugh, P. R. ‘Mini-mental state’. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12, 189–198 (1975).
27. Bergamini, E. et al. Estimating orientation using magnetic and inertial sensors and different sensor fusion approaches: accuracy assessment in manual and locomotion tasks. Sensors (Basel) 14, 18625–18649 (2014).
28. Kavanagh, J. J. & Menz, H. B. Accelerometry: A technique for quantifying movement patterns during walking. Gait & Posture 28, 1–15 (2008).
29. Madgwick, S. O. H., Harrison, A. J. L. & Vaidyanathan, A. Estimation of IMU and MARG orientation using a gradient descent algorithm. IEEE Int Conf Rehabil Robot 2011, 5975346 (2011).
30. Bertoli, M. et al. Estimation of spatio-temporal parameters of gait from magneto-inertial measurement units: multicenter validation among Parkinson, mildly cognitively impaired and healthy older adults. BioMed Eng OnLine 17, 58 (2018).
31. Menz, H. B., Lord, S. R. & Fitzpatrick, R. C. Acceleration patterns of the head and pelvis when walking on level and irregular surfaces. Gait & Posture 18, 35–46 (2003).
32. Buckley, C., Galna, B., Rochester, L. & Mazzà, C. Attenuation of Upper Body Accelerations during Gait: Piloting an Innovative Assessment Tool for Parkinson’s Disease. Biomed Res Int 2015, 865873 (2015).
33. Pasciuto, I., Bergamini, E., Iosa, M., Vannozzi, G. & Cappozzo, A. Overcoming the limitations of the Harmonic Ratio for the reliable assessment of gait symmetry. J Biomech 53, 84–89 (2017).
34. Melendez-Calderon, A., Shirota, C. & Balasubramanian, S. Estimating Movement Smoothness From Inertial Measurement Units. Front Bioeng Biotechnol 8, 558771 (2020).
35. Zifchock, R. A., Davis, I., Higginson, J. & Royer, T. The symmetry angle: a novel, robust method of quantifying asymmetry. Gait Posture 27, 622–627 (2008).
36. Trabassi, D. et al. Machine Learning Approach to Support the Detection of Parkinson’s Disease in IMU-Based Gait Analysis. Sensors 22, 3700 (2022).
37. Pavan, K. K., Rao, A. A., Rao, A. V. D. & Sridhar, G. R. Single Pass Seed Selection Algorithm for k-Means. JCS 6, 60–66 (2010).
38. McCrum, C., van Beek, J., Schumacher, C., Janssen, S. & Van Hooren, B. Sample size justifications in Gait & Posture. Gait & Posture 92, 333–337 (2022).
39. Lakens, D. Sample Size Justification. Collabra: Psychology 8, 33267 (2022).
40. Trabassi, D. et al. Optimizing Rare Disease Gait Classification through Data Balancing and Generative AI: Insights from Hereditary Cerebellar Ataxia. Sensors (Basel) 24, 3613 (2024).
41. Tramontano, M. et al. Dynamic Stability, Symmetry, and Smoothness of Gait in People with Neurological Health Conditions. Sensors (Basel) 24, 2451 (2024).
42. Bergamini, E. et al. Multi-sensor assessment of dynamic balance during gait in patients with subacute stroke. Journal of Biomechanics 61, 208–215 (2017).
43. Vabalas, A., Gowen, E., Poliakoff, E. & Casson, A. J. Machine learning algorithm validation with a limited sample size. PLOS ONE 14, e0224365 (2019).
44. Chaibub Neto, E. et al. Detecting the impact of subject characteristics on machine learning-based diagnostic applications. npj Digit. Med. 2, 1–6 (2019).
45. Little, V. L., Perry, L. A., Mercado, M. W., Kautz, S. A. & Patten, C. Gait asymmetry pattern following stroke determines acute response to locomotor task. Gait Posture 77, 300–307 (2020).
46. Patterson, K. K., Gage, W. H., Brooks, D., Black, S. E. & McIlroy, W. E. Evaluation of gait symmetry after stroke: a comparison of current methods and recommendations for standardization. Gait Posture 31, 241–246 (2010).
47. Balasubramanian, C. K., Neptune, R. R. & Kautz, S. A. Variability in spatiotemporal step characteristics and its relationship to walking performance post-stroke. Gait Posture 29, 408–414 (2009).
48. Kim, C. M. & Eng, J. J. Symmetry in vertical ground reaction force is accompanied by symmetry in temporal but not distance variables of gait in persons with stroke. Gait Posture 18, 23–28 (2003).
49. Bowden, M. G., Balasubramanian, C. K., Behrman, A. L. & Kautz, S. A. Validation of a speed-based classification system using quantitative measures of walking performance poststroke. Neurorehabil Neural Repair 22, 672–675 (2008).
50. Germanotta, M., Iacovelli, C. & Aprile, I. Evaluation of Gait Smoothness in Patients with Stroke Undergoing Rehabilitation: Comparison between Two Metrics. Int J Environ Res Public Health 19, 13440 (2022).
51. Garcia, F. do V. et al. Movement smoothness in chronic post-stroke individuals walking in an outdoor environment—A cross-sectional study using IMU sensors. PLoS One 16, e0250100 (2021).

Additional Declarations: No competing interests reported.

Supplementary Files: TREE.xls, KNN.xls, SVM.xls

Editorial history: Editorial decision: Revision requested, 15 Dec, 2025. Reviews received at journal, 12 Dec, 2025. Reviewers agreed at journal, 18 Nov, 2025. Reviews received at journal, 21 Sep, 2025. Reviewers agreed at journal, 11 Sep, 2025. Reviewers invited by journal, 11 Sep, 2025. Editor invited by journal, 02 Sep, 2025. Editor assigned by journal, 30 Aug, 2025. Submission checks completed at journal, 29 Aug, 2025. First submitted to journal, 28 Aug, 2025.
Author affiliations: Brasiliano Paolo (corresponding author), University of Rome “Foro Italico”; Orejel-Bustos Amaranta, IRCCS Santa Lucia Foundation; Belluscio Valeria, University of Rome “Foro Italico”; Cereatti Andrea, Politecnico di Torino; Della Croce Ugo, University of Sassari; Trabassi Dante, Sapienza University of Rome; Salis Francesca, University of Sassari; Tramontano Marco, Alma Mater University of Bologna; Buzzi Maria Gabriella, IRCCS Santa Lucia Foundation; Vannozzi Giuseppe, University of Rome “Foro Italico”; Bergamini Elena, University of Bergamo.

Published version: https://doi.org/10.1038/s41598-026-43666-7
Figure 1. Scheme of the steps followed during the data splitting and analysis presented in the work.

Figure 2. Accuracy, Recall, Precision, and F1 score are reported for the three algorithms used (KNN, SVM, TREE) over the ten runs performed.

Figure 3. Heatmap of feature occurrences selected over the ten runs during the SBS procedures when using KNN, SVM, and TREE algorithms. Last column of the heatmap shows the overall occurrence of each feature. Bold written features are those considered for the following analysis. LB, ST, and FH: Lower Back, Sternum, and Forehead, respectively. AP, ML, and CC: Anterior-posterior, Medio-lateral, and Cranio-caudal, respectively. IHR: Improved Harmonic Ratio. CoV: Coefficient of Variation. LDLJ: Logarithmic dimensionless Jerk from acceleration (LDLJA) or angular velocity (LDLJW) data. COA: Coefficient of attenuation.

Figure 4. Results of the unsupervised cluster over the ten test sets defined. Accuracy, Recall, Precision, and F1 score are reported for the k-medoids over the ten runs performed using four distance metrics (CO, SqEU, CB, and EU) while adding one-by-one the feature ordered according to the total number of occurrences at the end of the SBSs.

Figure 5. Results of the unsupervised cluster on the unseen data.
Accuracy, Recall, Precision, and F1 score are reported for the k-medoids using four distance metrics (CO, SqEU, CB, and EU) while adding one-by-one the feature ordered according to the total number of occurrences at the end of the SBS.\u003c/p\u003e","description":"","filename":"5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7478886/v1/fd192699b92cf8f55bdde1fe.jpg"},{"id":104739586,"identity":"9aabb0f3-3182-47b8-9c32-2a3f02dd3dad","added_by":"auto","created_at":"2026-03-16 16:09:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1596674,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7478886/v1/2b1d4f02-ef1e-48b9-b801-196114952c16.pdf"},{"id":91845766,"identity":"1682aab1-dde6-4c3f-9199-f84c0f0a15c7","added_by":"auto","created_at":"2025-09-22 10:13:48","extension":"xls","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":70656,"visible":true,"origin":"","legend":"","description":"","filename":"TREE.xls","url":"https://assets-eu.researchsquare.com/files/rs-7478886/v1/fbee53bb1af1e3fb41d9808a.xls"},{"id":91847945,"identity":"65ac2156-b5db-4f84-b2c4-c0261d1b26ce","added_by":"auto","created_at":"2025-09-22 10:29:48","extension":"xls","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":79872,"visible":true,"origin":"","legend":"","description":"","filename":"KNN.xls","url":"https://assets-eu.researchsquare.com/files/rs-7478886/v1/6df6bc309cdb70094f9728dc.xls"},{"id":91848203,"identity":"0d4c2e45-fa20-48d1-a06d-8f63cf08185c","added_by":"auto","created_at":"2025-09-22 
10:37:48","extension":"xls","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":82432,"visible":true,"origin":"","legend":"","description":"","filename":"SVM.xls","url":"https://assets-eu.researchsquare.com/files/rs-7478886/v1/9646c471ff8f004ee25ca3fa.xls"}],"financialInterests":"No competing interests reported.","formattedTitle":"Identifying Key Gait Features in Stroke Patients: A Machine Learning Approach with Supervised and Unsupervised Validation","fulltext":[{"header":"1. INTRODUCTION","content":"\u003cp\u003eThe term stroke is used to describe brain damage due to several different vascular causes \u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. Stroke is a global challenge that poses significant health and socioeconomic challenges for both the individual and society as a whole \u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e,\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Indeed, after a stroke event, brain cells die, resulting in functional and cognitive impairments \u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. Among these, motor disabilities primarily limit patients' ability to accomplish activities of daily living \u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e, with walking impairments influencing participation, autonomy, and quality of life of patients \u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e,\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. With these premises, the improvement of gait impairments is one of the crucial aspects of post-stroke rehabilitation \u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. 
In this context, standard clinical assessment can be integrated with instrumented gait analysis, which aims to obtain objective indicators of walking performance to evaluate the progression of the pathology, define tailored treatments, and monitor their efficacy. In recent years, the setting in which instrumented gait analysis takes place (especially in clinical practice) has shifted towards a more ecological one, thanks to the growth of wearable technologies and computational methods \u003csup\u003e\u003cspan additionalcitationids=\"CR8 CR9\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e, bringing several advantages over traditional laboratory-based assessments. In particular, magneto-inertial measurement units (MIMUs) are widely used to obtain features that describe the quality of gait and pertain to different domains, such as symmetry, smoothness, or spatiotemporal features, which are often considered during post-stroke clinical evaluations \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eAlongside these new measurement tools, new data analysis techniques, such as machine learning algorithms, have been fruitfully integrated to obtain more information from patients\u0026rsquo; data. 
These tools have found many applications in the stroke population, ranging from the discrimination between/among groups or patient categories (such as people with stroke and healthy controls, patients with different pathologies, or patients with different levels of stroke severity) \u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e to the recognition of different types of activities, the classification of well/poorly executed tasks, and other applications \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. Among them, one of particular interest for clinicians is the classification of pathological and healthy people based on gait features. Although it may seem an obvious categorization following a medical diagnosis, the understanding of a pathology is primarily based on the characterization of the differences when compared to a non-pathological condition. To do so, however, the features that can optimally capture such differences must be identified. Indeed, when analyzing gait, a large number of features may be measured, some of which may not carry useful information for the pathological population of interest. To overcome this issue, feature selection techniques and machine learning algorithms may be used in combination to reduce the number of features, with the aim of retaining those that are most informative and best able to distinguish healthy from pathological people.\u003c/p\u003e\u003cp\u003eIn this framework, classification performance per se is often employed as an indicator of the quality of the selected features in distinguishing between healthy and pathological conditions. Admittedly, this distinction holds limited relevance in itself once a stroke diagnosis has been made. 
However, evaluating differences in classification performance when varying the selected features allows the identification of the feature combinations that best discriminate healthy from pathological gait. As a result, the gait domains that differ most between healthy individuals and patients with stroke can be identified.\u003c/p\u003e\u003cp\u003eAlthough some authors have explored this (or a similar) approach \u003csup\u003e\u003cspan additionalcitationids=\"CR14 CR15 CR16 CR17\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e, most of them applied feature selection methodologies as an intermediate step to improve the performance of a given machine learning model, thus not focusing on the generalizability of the feature selection approach. Indeed, the selected features depend on several factors, including the dataset/sample size, the feature selection technique, and the machine learning algorithm employed. Testing the generalizability of the feature selection approach across different machine learning algorithms is thus of the utmost importance. To the authors\u0026rsquo; knowledge, among recent studies leveraging MIMUs and machine learning in patients with stroke \u003csup\u003e\u003cspan additionalcitationids=\"CR17 CR18 CR19 CR20 CR21 CR22 CR23\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e, only one \u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e sought to reduce the feature space by employing a feature selection technique across multiple machine learning algorithms and analyzing the frequency of the selected features. 
This approach identifies features that most consistently enhance classification performance, thus highlighting characteristics of pathological gait patterns. Nevertheless, the sample size in that study was insufficient to provide a reliable representation of the investigated population. Furthermore, no previous study has evaluated whether features selected through supervised machine learning could reliably differentiate healthy and pathological individuals when applied to unsupervised clustering methods. In other words, the discriminative value of selected features has not been tested independently of the supervised algorithm's learning capabilities.\u003c/p\u003e\u003cp\u003eTherefore, this study aims to identify an optimal subset of gait features extracted using a set of MIMUs through a feature selection approach in combination with multiple machine learning algorithms. The goal was achieved by distinguishing individuals with stroke from healthy controls, thereby characterizing pathological gait patterns. To ensure generalizability, the analysis was iterated across various combinations of feature subsets and participant groups.\u003c/p\u003e"},{"header":"2. METHODS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1. PARTICIPANTS\u003c/h2\u003e\u003cp\u003eEighty-five patients with stroke (PwS; age: 57\u0026thinsp;\u0026plusmn;\u0026thinsp;16 yrs; mass: 71\u0026thinsp;\u0026plusmn;\u0026thinsp;12 kg; stature: 1.69\u0026thinsp;\u0026plusmn;\u0026thinsp;0.09 m) and 97 healthy participants (HP; age: 48\u0026thinsp;\u0026plusmn;\u0026thinsp;12 yrs; mass: 70\u0026thinsp;\u0026plusmn;\u0026thinsp;18 kg; stature: 1.67\u0026thinsp;\u0026plusmn;\u0026thinsp;0.08 m) were enrolled in this study. 
The study was conducted in accordance with the World Medical Association Declaration of Helsinki and was approved by the Ethics Committee of the Institute for research and Healthcare Santa Lucia (with protocol number CE/AG4/PROG.383\u0026thinsp;\u0026minus;\u0026thinsp;11 and subsequent integrations). Healthy participants between the ages of 18 and 80 years were considered eligible for the study if they did not report any condition or use of medication that could have affected their motor performance. Stroke patients (both in the sub-acute and in the chronic phase of the pathology) who were able to walk without any device or physical assistance were included in the study (Functional Ambulation Classification \u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e scale score\u0026thinsp;\u0026ge;\u0026thinsp;3). Exclusion criteria for this group were cognitive deficits affecting the capacity of patients to understand the task instructions (Mini Mental State Examination \u003csup\u003e26\u003c/sup\u003e \u0026gt;4), severe unilateral spatial neglect, severe aphasia, and presence of neurological, orthopedic, or cardiac comorbidities. All participants were included in the study after providing their informed consent.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2. EXPERIMENTAL SET-UP\u003c/h2\u003e\u003cp\u003eData collection was performed in the gym of the Institute for research and Healthcare Santa Lucia, in Rome. Participants were asked to perform a 10-Meter Walk Test (10-MWT) at their self-selected speed along a straight walkway while wearing comfortable shoes. At the beginning of each trial, participants were instructed to maintain an orthostatic posture for five seconds. Each participant performed a minimum of three trials. During the trials, participants were equipped with five synchronized MIMUs (OPAL, APDM wearable technologies, Portland, USA). 
The MIMUs included triaxial accelerometer, gyroscope, and magnetometer with full scale ranges of \u0026plusmn;\u0026thinsp;6 g, \u0026plusmn;\u0026thinsp;1500 deg/s, and \u0026plusmn;\u0026thinsp;6 Gauss, respectively, with a sampling rate of 128 Hz. Three MIMUs were fixed to the upper body of the participants, on the forehead (FH) on the occipital cranium bone close to the lambdoid suture of the head, at the center of the sternum (ST), and at the pelvis (PV) level, in correspondence of L4-L5 vertebrae. The last two MIMUs were placed laterally on the distal part of the tibiae, slightly above the lateral malleoli, and securely fixed with Velcro straps. These two sensors were used for segmentation of the gait step cycle. Attention was paid to each MIMU fixation to minimize the relative movement between the MIMU and the underlying bones.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e2.3. SIGNAL PREPROCESSING\u003c/h2\u003e\u003cp\u003eData preprocessing was performed through implementation of customized algorithms in the Matlab\u0026reg; Software R2021b (The MathWorks Inc., MA, US).\u003c/p\u003e\u003cp\u003eFirst, a consistent reference frame was defined for all participants. During the static phase of the 10-MWT, a time-invariant transformation aligned each MIMU\u0026rsquo;s local reference system to a frame based on the gravity vector. Afterwards, the time-invariant transformation was applied to the accelerometer and gyroscope data recorded during the dynamic phase of the test. Finally, gravity was removed from the component of the acceleration signal aligned with the vertical axis of the reference frame system. As a result, all data were expressed relative to a reference frame that approximated the anterior-posterior (AP), medio-lateral (ML), and cranio-caudal (CC) anatomical axes \u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. 
Accelerometer data were filtered using a second order Butterworth low-pass filter with a cut-off frequency of 10 Hz, while gyroscope data were filtered using a second order Butterworth low-pass filter with a cut-off frequency of 6 Hz \u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eGait events were identified from ML angular velocity recorded by the MIMUs placed on the shanks of the participants while walking speed and gait spatial features were calculated through forward and backward integration of shank data in combination with a complementary filter \u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e and zero-velocity update procedure \u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. For each identified stride, features of upper body movement stability \u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e,\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e, symmetry \u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e, and smoothness \u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e were calculated from FH, ST, and PV MIMU data. In addition, symmetry \u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e and variability of gait spatiotemporal features were also calculated. For the sake of readability, in the following sections spatiotemporal gait features are not described in detail as they represent standard measurement in instrumented gait analysis. Nevertheless, a complete list of spatiotemporal features is provided. 
For information on the equations for calculating each feature refer to the work by Bertoli et al \u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003e\u003cb\u003e2.4\u003c/b\u003e. \u003cb\u003eSPATIOTEMPORAL FEATURES\u003c/b\u003e\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eStride frequency\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eStride speed\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eStride length\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eStride duration\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eStance speed\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eStance length\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eStance duration\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eSwing speed\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eSwing length\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eSwing duration\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eDouble support duration\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eSingle Support duration\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e\u003cb\u003e2.5\u003c/b\u003e. 
\u003cb\u003eSTABILITY FEATURES\u003c/b\u003e\u003c/h2\u003e\u003cp\u003eRoot Mean Square (RMS) was calculated from PV, ST, and FH MIMUs acceleration signals over each stride as follows:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:RMS=\\:\\sqrt{\\frac{{\\sum\\:}_{i=1}^{n}{x}_{i}^{2}}{n}}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cem\u003ex\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e is the \u003cem\u003ei\u003c/em\u003e-th acceleration sample and \u003cem\u003en\u003c/em\u003e the number of samples of the considered stride.\u003c/p\u003e\u003cp\u003eCoefficient of attenuation \u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e (COA) was calculated from PV, ST, and FH MIMUs acceleration signals over each stride as follows:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:COA=\\left(1-\\frac{{RMS}_{upper\\:segment}}{{RMS}_{lower\\:segment}}\\right)\\times\\:100$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eSpecifically, COAs were calculated from PV to ST, from PV to FH, and from ST to FH.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.6. 
SYMMETRY FEATURES\u003c/h2\u003e\u003cp\u003eImproved Harmonic Ratio \u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e (IHR) was calculated from PV, ST, and FH MIMUs acceleration signals over each stride as follows:\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:{iHR}_{n}=\\:\\frac{\\sum\\:_{i=1}^{n}{P}_{I}^{i}}{\\sum\\:_{i=1}^{n}\\left({P}_{I}^{i}+\\:{P}_{E}^{i}\\right)}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{P}_{I}^{i}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{P}_{E}^{i}\\)\u003c/span\u003e\u003c/span\u003e are the powers of the \u003cem\u003ei\u003c/em\u003e-th intrinsic and extrinsic harmonics, respectively, and \u003cem\u003en\u003c/em\u003e is the number of harmonics considered.\u003c/p\u003e\u003cp\u003eThe Symmetry Angle (SA) was calculated from the side-paired values of the spatiotemporal features of each stride, as follows:\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" name=\"EquationSource\"\u003e\n$$\\:SA=\\:\\frac{\\left(45^\\circ\\:-\\text{arctan}\\left(\\frac{{X}_{left}}{{X}_{right}}\\right)\\right)}{90^\\circ\\:}\\:\\times\\:100$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cem\u003eX\u003c/em\u003e\u003csub\u003e\u003cem\u003eleft\u003c/em\u003e\u003c/sub\u003e and \u003cem\u003eX\u003c/em\u003e\u003csub\u003e\u003cem\u003eright\u003c/em\u003e\u003c/sub\u003e are the feature values for left and right strides, respectively.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e2.7. 
SMOOTHNESS FEATURES\u003c/h2\u003e\u003cp\u003eLog dimensionless jerk \u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e (LDLJ) was calculated from PV, ST, and FH MIMUs acceleration and angular velocity signals (LDLJA and LDLJW, respectively) over each stride as follows:\u003cdiv id=\"Eque\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Eque\" name=\"EquationSource\"\u003e\n$$\\:LDLJ\\:=\\:-\\text{ln}\\left(\\frac{{t}^{2}\\:-\\:{t}^{1}}{\\underset{t\\in\\:\\left[{t}^{1},{t}^{2}\\right]}{\\text{max}}\\left({\\left|\\left|x\\left(t\\right)\\right|\\right|}\\right)}\\:\\cdot\\:Ij\\right)$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWith:\u003cdiv id=\"Equf\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equf\" name=\"EquationSource\"\u003e\n$$\\:Ij\\:=\\:{\\int\\:}_{{t}^{1}}^{{t}^{2}}{\\left|\\left|{x}^{{\\prime\\:}}\\left(t\\right)\\right|\\right|}^{2}\\:dt$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cem\u003ex\u003c/em\u003e is the linear acceleration or angular velocity signal and \u003cem\u003et\u003c/em\u003e\u003csup\u003e\u003cem\u003e1\u003c/em\u003e\u003c/sup\u003e and \u003cem\u003et\u003c/em\u003e\u003csup\u003e\u003cem\u003e2\u003c/em\u003e\u003c/sup\u003e are the starting and ending instants of each stride.\u003c/p\u003e\u003cp\u003eFinally, variability of spatiotemporal features was estimated by calculating the Coefficient of Variation (CoV) of each feature.\u003c/p\u003e\u003cp\u003eAfter data normalization through z-score, the median value was calculated over the gait cycles for each trial and each feature. 
Afterwards, the dataset was visually inspected to detect outliers, and trials that presented obvious instrumental errors were discarded. Finally, each participant\u0026rsquo;s median value was calculated over the trials. Missing values were replaced with the group median.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e2.8. FEATURE SELECTION AND VALIDATION\u003c/h2\u003e\u003cp\u003eFrom the entire sample considered, two subgroups of 18 PwS and 20 HP (Unsupervised Test Groups) were selected and kept aside for further analysis. Afterwards, two subgroups of 68 PwS and 77 HP (Feature Selection Groups) were randomly identified and used for the first steps of the feature selection procedure. Specifically, the within-group distribution of each feature was tested for normality using the Shapiro-Wilk test. Afterwards, according to data distribution, an independent sample \u003cem\u003et\u003c/em\u003e-test or a Mann-Whitney U test was used to identify those features which differed significantly between HP and PwS \u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e,\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. 
Only these features were considered for further analysis.\u003c/p\u003e\u003cp\u003eTo limit multicollinearity of the dataset, Pearson\u0026rsquo;s correlation coefficients between all the retained feature pairs were calculated \u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e and analyzed according to the following procedure:\u003c/p\u003e\u003cp\u003e1.1\u0026nbsp;\u0026nbsp;The total number of correlations with \u003cem\u003er\u003c/em\u003e \u0026gt; 0.5 was calculated for each feature.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e1.2 The feature (F\u003csub\u003emaxr\u003c/sub\u003e)\u0026nbsp;that showed the highest number of over-threshold correlations was kept while those that showed over-threshold correlations with\u0026nbsp;F\u003csub\u003emaxr\u003c/sub\u003e were discarded.\u003c/p\u003e\u003cp\u003eThe procedure was iterated until no correlation with \u003cem\u003er\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.5 remained. If two features with the same number of over-threshold correlations were found (i.e., if two F\u003csub\u003emaxr\u003c/sub\u003e were found), one was chosen according to the suggestion of physical therapists of the neurorehabilitation hospital.\u003c/p\u003e\u003cp\u003eFinally, from the Feature Selection Groups, ten subgroups were defined by randomly selecting 70% of the participants (48 PwS and 54 HP) for the training and validation sets, and the remaining 30% of the participants (20 PwS and 23 HP) for the test sets. Afterward, on these datasets, a sequential backward feature selection (SBS) was implemented. This procedure allows the reduction of the number of features while preserving the performance of the classifier and the interpretability of the results. 
Indeed, no new combination of features is created (such as in Principal Component Analysis, Linear Discriminant Analysis or other feature extraction techniques); rather, only relevant features from the original dataset are kept, making the procedure usable and the results interpretable in clinical settings \u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. This approach is implemented by evaluating the performance of a classifier while changing the set of features as described:\u003c/p\u003e\u003cp\u003e2.1\u0026nbsp;\u0026nbsp;The complete dataset with k features is defined as the starting point and tested.\u003c/p\u003e\n\u003cp\u003e2.2\u0026nbsp;\u0026nbsp;All the possible combinations of \u003cem\u003ek-1\u003c/em\u003e features are tested.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e2.3\u0026nbsp;\u0026nbsp;The subset of \u003cem\u003ek-1\u003c/em\u003e features with the best classification performance is identified.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e2.4 The subset of features identified in step 2.3 is used as the new starting point and the procedure is repeated from step\u0026nbsp;2.2.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e2.5 \u0026nbsp;The feature selection continues until a stop criterion is met.\u0026nbsp;\u003c/p\u003e\u003cp\u003eIn this instance, the SBS process was carried out until one single feature remained. The evaluated classifiers were the K-nearest neighbors (KNN), the Support Vector Machine (SVM), and the decision tree (TREE) algorithms. The algorithms were selected for their nonparametric approach, which does not require a-priori assumptions on the dataset. During the SBS procedure, hyperparameter tuning for each algorithm was implemented using a Bayesian optimization approach. Detailed information on the hyperparameter tuning for each algorithm is provided in the supplementary material. 
The classifiers\u0026rsquo; performance during SBS was assessed using a 5-fold cross validation approach and measured by classification accuracy, i.e., the ability to correctly classify participants irrespective of their group.\u003c/p\u003e\u003cp\u003eAs a result, during SBS, the three algorithms were trained and cross-validated on all ten randomly identified subgroups. Each time, the combination of best feature subset and best algorithm hyperparameters was identified according to the highest classification accuracy obtained. The best identified model (i.e., the best subset of features with the tuned hyperparameters) was then tested on the corresponding test set.\u003c/p\u003e\u003cp\u003eFrom the thirty SBS procedures carried out, only those features that were selected in at least two of the ten runs were retained. Afterwards, only the shared features between datasets and algorithms were kept. Such features were arranged in descending order according to their number of occurrences at the end of the SBS procedures. Subsequently, an unsupervised algorithm was tested by adding the ordered selected features one by one, using the Unsupervised Test Groups that were kept aside before data analysis and not considered during the feature selection procedure. In this instance, a k-medoids algorithm was implemented with the initial medoids identified using the Single Pass Seed Selection algorithm \u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e with the aim of obtaining only one solution of the clustering procedure. K-medoids was implemented to produce two clusters. 
Distances between the cluster medoids and the data points were measured using the four distance metrics defined in the following equations.\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eCosine distance (CO)\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003cdiv id=\"Equg\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equg\" name=\"EquationSource\"\u003e\n$$CO=1-\\frac{A\\cdot B}{\\|A\\|\\,\\|B\\|}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cb\u003eA\u003c/b\u003e and \u003cb\u003eB\u003c/b\u003e are the vectors of feature median values of the medoid and of each participant, respectively.\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eSquared Euclidean distance (SqEU)\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003cdiv id=\"Equh\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equh\" name=\"EquationSource\"\u003e\n$$SqEU=\\sum_{i=1}^{n}{\\left({A}_{i}-{B}_{i}\\right)}^{2}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eCity Block distance (CB)\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003cdiv id=\"Equi\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equi\" name=\"EquationSource\"\u003e\n$$CB=\\sum_{i=1}^{n}\\left|{A}_{i}-{B}_{i}\\right|$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eEuclidean distance (EU)\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003cdiv id=\"Equj\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equj\" name=\"EquationSource\"\u003e\n$$EU=\\sqrt{\\sum_{i=1}^{n}{\\left({A}_{i}-{B}_{i}\\right)}^{2}}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhere \u003cb\u003eA\u003c/b\u003e\u003csub\u003e\u003cb\u003ei\u003c/b\u003e\u003c/sub\u003e and 
\u003cb\u003eB\u003c/b\u003e\u003csub\u003e\u003cb\u003ei\u003c/b\u003e\u003c/sub\u003e are the \u003cem\u003ei\u003c/em\u003e\u003csup\u003e\u003cem\u003eth\u003c/em\u003e\u003c/sup\u003e feature median values of the medoid and of each participant, respectively.\u003c/p\u003e\u003cp\u003eBeing unsupervised, the algorithm produces two unlabeled clusters. To assign the group labels (i.e., PwS and HP) to the two clusters, the following procedure was implemented:\u003c/p\u003e\u003cp\u003e3.1\u0026nbsp;\u0026nbsp;The medians of each feature for the stroke and healthy groups and the two identified clusters were calculated and arranged to form n-dimensional vectors.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e3.2\u0026nbsp;\u0026nbsp;The\u0026nbsp;Euclidean distance between each of the two cluster vectors and each of the two group vectors was calculated.\u003c/p\u003e\n\u003cp\u003e3.3\u0026nbsp;\u0026nbsp;The smallest distance was used to label the cluster according to the corresponding known group.\u003c/p\u003e\n\u003cp\u003e3.4\u0026nbsp;\u0026nbsp;The other cluster was labeled by exclusion.\u0026nbsp;\u003c/p\u003e\u003cp\u003eExcept for the SBS procedures, in which only classification accuracy was measured, all other classification performance assessments also included recall, precision, and F1 score. The described procedure is graphically shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\u003c/div\u003e"},{"header":"3. RESULTS","content":"\u003cp\u003eOf the initial 79 features, 60 were retained after the \u003cem\u003et-\u003c/em\u003etest and 20 after the correlation analysis. The mean (\u0026plusmn;\u0026thinsp;SD) accuracies during the SBS procedures over the ten runs for KNN, SVM, and TREE were 94.1% \u0026plusmn; 1.6%, 96.7% \u0026plusmn; 2.1%, and 89.1% \u0026plusmn; 2.2%, respectively. The mean results of the classification on the test sets are reported in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. 
Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the same performance indexes across each performed run.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eMean and standard deviation values of the performance indexes of the supervised algorithms on the test sets over the ten runs.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAccuracy\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eRecall\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003ePrecision\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eF1 Score\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eKNN\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e\u003cp\u003e88.1\u0026thinsp;\u0026plusmn;\u0026thinsp;5.7%\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e\u003cp\u003e85\u0026thinsp;\u0026plusmn;\u0026thinsp;4.7%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e\u003cp\u003e89.4\u0026thinsp;\u0026plusmn;\u0026thinsp;8.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c5\"\u003e\u003cp\u003e87.1\u0026thinsp;\u0026plusmn;\u0026thinsp;6%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eSVM\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e\u003cp\u003e89.8\u0026thinsp;\u0026plusmn;\u0026thinsp;5.1%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e\u003cp\u003e91\u0026thinsp;\u0026plusmn;\u0026thinsp;5.7%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e\u003cp\u003e87.8\u0026thinsp;\u0026plusmn;\u0026thinsp;6.6%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c5\"\u003e\u003cp\u003e89.2\u0026thinsp;\u0026plusmn;\u0026thinsp;5.3%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eTREE\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e\u003cp\u003e81.2\u0026thinsp;\u0026plusmn;\u0026thinsp;5.7%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e\u003cp\u003e78.5\u0026thinsp;\u0026plusmn;\u0026thinsp;9.1%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e\u003cp\u003e82.1\u0026thinsp;\u0026plusmn;\u0026thinsp;10.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026plusmn;\" 
colname=\"c5\"\u003e\u003cp\u003e79.6\u0026thinsp;\u0026plusmn;\u0026thinsp;5.5%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThe features used in the SBS procedure, together with their relative number of occurrences in total and for each algorithm, are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. Nine common features, reported in bold, with at least two occurrences were found across all SBS methods.\u003c/p\u003e\u003cp\u003eThe results of the unsupervised clustering on the ten test sets and on the Unsupervised Test Groups are presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, and Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. In more detail, Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the mean of the performance indexes across the ten runs as each common feature is added incrementally, ordered by number of occurrences, whereas Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e reports the mean (\u0026plusmn;\u0026thinsp;SD) of the performance indexes across the ten runs and on the Unsupervised Test Groups using the whole subset of common features. 
Similarly, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e presents the performance indexes for the Unsupervised Test Groups, showing the incremental addition of each common feature.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eUnsupervised cluster analysis results on the unseen data.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"9\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDISTANCE METRICS\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eACCURACY\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003eRECALL\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e\u003cp\u003ePRECISION\u003c/p\u003e\u003c/th\u003e\u003cth 
align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e\u003cp\u003eF1 SCORE\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"BoldItalicUnderline\" class=\"BoldItalicUnderline\" name=\"Emphasis\"\u003eten runs\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"BoldItalicUnderline\" class=\"BoldItalicUnderline\" name=\"Emphasis\"\u003eFinal test\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"BoldItalicUnderline\" class=\"BoldItalicUnderline\" name=\"Emphasis\"\u003eten runs\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e\u003cspan type=\"BoldItalicUnderline\" class=\"BoldItalicUnderline\" name=\"Emphasis\"\u003eFinal test\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e\u003cspan type=\"BoldItalicUnderline\" class=\"BoldItalicUnderline\" name=\"Emphasis\"\u003eten runs\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e\u003cspan type=\"BoldItalicUnderline\" class=\"BoldItalicUnderline\" name=\"Emphasis\"\u003eFinal test\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e\u003cspan type=\"BoldItalicUnderline\" class=\"BoldItalicUnderline\" name=\"Emphasis\"\u003eten runs\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e\u003cspan type=\"BoldItalicUnderline\" class=\"BoldItalicUnderline\" name=\"Emphasis\"\u003eFinal test\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eCO\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e84\u0026thinsp;\u0026plusmn;\u0026thinsp;4%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e81%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e83\u0026thinsp;\u0026plusmn;\u0026thinsp;5.9%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e82.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e82.6\u0026thinsp;\u0026plusmn;\u0026thinsp;4%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e77.7%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e82.7\u0026thinsp;\u0026plusmn;\u0026thinsp;4.4%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e80%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eSqEU\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e75.8\u0026thinsp;\u0026plusmn;\u0026thinsp;8%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e83.8%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e64\u0026thinsp;\u0026plusmn;\u0026thinsp;19.2%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e82.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e82.5\u0026thinsp;\u0026plusmn;\u0026thinsp;12.6%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e82.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e70\u0026thinsp;\u0026plusmn;\u0026thinsp;12.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c9\"\u003e\u003cp\u003e82.3%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eCB\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e75.8\u0026thinsp;\u0026plusmn;\u0026thinsp;8.1%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e83.8%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e61\u0026thinsp;\u0026plusmn;\u0026thinsp;19.7%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e82.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e85.8\u0026thinsp;\u0026plusmn;\u0026thinsp;14%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e82.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e68.8\u0026thinsp;\u0026plusmn;\u0026thinsp;13.2%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e82.3%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eEU\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e77.7\u0026thinsp;\u0026plusmn;\u0026thinsp;9.2%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e83.8%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e69.5\u0026thinsp;\u0026plusmn;\u0026thinsp;20.5%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e88.2%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e81.4\u0026thinsp;\u0026plusmn;\u0026thinsp;11.3%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e79%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c8\"\u003e\u003cp\u003e73.1\u0026thinsp;\u0026plusmn;\u0026thinsp;13.5%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e83.3%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e"},{"header":"4. DISCUSSION","content":"\u003cp\u003eThe purpose of this study was to identify an optimal set of MIMU-based features able to distinguish between healthy participants and patients with stroke. This approach aims to characterize the gait patterns of people after a stroke event by identifying the features that most effectively capture deviations from physiological conditions. The methodological approach includes different key strategies to overcome some of the limitations highlighted in a recent review on the topic and to enhance the validity and generalizability of the results \u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. First, the sample size was increased to limit classification performance overestimation and improve results generalizability \u003csup\u003e\u003cspan additionalcitationids=\"CR39\" citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e. Wearable sensors, specifically MIMUs, were used to propose a model suitable for clinical practice. An instrumental setup was selected that balances the minimum number of required devices with the number of measurable useful features. 
Only clinically meaningful features were extracted, and a feature selection technique was chosen to reduce the feature space without transforming it, thereby maintaining the clinical interpretability of the starting dataset. The dataset was split into training, validation, and test sets to perform algorithm tuning and feature selection and to test the discriminant ability of the selected features. The entire procedure was repeated while randomly changing the composition of the datasets to obtain more generalizable results. Different supervised classification algorithms were used to assess differences and commonalities in the selected features according to the classifier used. Finally, an unsupervised clustering technique was employed to verify whether the selected features contain discriminating information. Results showed that the method proposed here can be used to select features that highlight differences between healthy and pathological locomotion (when using both supervised classification and unsupervised clustering), thereby identifying the gait domains that most predominantly characterize the pathology.\u003c/p\u003e\u003cp\u003eThe set of 80 features analyzed in the present study was selected based on the scientific literature on the topic \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Specifically, features that described the spatiotemporal, symmetry, variability, stability, and smoothness domains of gait were considered. Features were chosen to characterize both general aspects of gait (i.e., those derived from the shank and lower back, LB, MIMUs) and the quality of movement of the upper body (i.e., those derived from the sternum, ST, and forehead, FH, MIMUs). Feature selection procedures involved both statistical and machine learning approaches. 
The statistical approach reduced the initial number of features from 80 to 20; these were then used as input to the machine learning feature selection, based on three different classification algorithms: KNN, SVM, and TREE. The three algorithms produced different results in terms of both classification performance and the features retained after feature selection.\u003c/p\u003e\u003cp\u003eSpecifically, when looking at the classification accuracy, SVM performed best in both the training and the test sets. The same result was found by Trabassi and colleagues \u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e when classifying patients with Parkinson\u0026rsquo;s disease and healthy people using an approach similar to that of this study. These results suggest the potential effectiveness of SVM in detecting gait deviations in neurological patients. Concerning the other performance indexes obtained from the test sets, SVM showed the highest values except for classification precision, which was higher with KNN. The classification performance achieved in this study is slightly lower than that obtained in previous studies on patients with stroke \u003csup\u003e\u003cspan additionalcitationids=\"CR17 CR18\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e,\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e,\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e, but several methodological differences should be considered. 
First, previous studies often had significantly smaller sample sizes (from a minimum of 15 to a maximum of 58 participants), which can lead to overestimation of classification performance due to overfitting and random effects \u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e. Moreover, the considered studies used multiple data from the same subject rather than using a single representative data point for each participant (like the median value over several gait cycles and trials in this study). When using multiple data of the same participant, some may appear in both training and testing sets, thus increasing the risk of overestimating classification performance \u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e. In some cases, this issue was considered and avoided \u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e,\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. Differences in classification performance can also be attributed to the use of different features related to various gait domains, such as joint kinematics \u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e, as well as demographic differences among study groups \u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. Additionally, variations in data processing, such as focusing solely on the affected side of pathological participants \u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e, may have led to different results.\u003c/p\u003e\u003cp\u003eWhen looking at the features selected over the ten runs, the three analyzed algorithms performed differently. 
Generally, KNN tended to exclude more features, followed by SVM and TREE (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Considering all the feature sets used as input for the feature selection and the ten runs, the maximum possible number of feature occurrences was 200 (i.e., 20 features over ten runs). The overall feature occurrences were 61, 79, and 112 when using KNN, SVM, and TREE, respectively. Other authors have tried a similar approach: Altilio and colleagues \u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e tested nine different algorithms with all possible combinations of the selected features, counting the occurrences of each feature to estimate its relevance. Similar to the findings of this study, the set of retained features changed depending on the algorithm used.\u003c/p\u003e\u003cp\u003eThis study highlights key methodological challenges when using machine learning for gait classification. First, classification performance varies when the same algorithm is applied to different datasets (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). This variation is particularly noteworthy, given that the datasets across the ten runs were not entirely distinct, demonstrating that even minor changes in data can lead to different results. Therefore, the issue of limited representativeness due to small sample sizes in previous studies becomes pertinent. In contrast, this study enrolled more participants than earlier studies, in line with the recommendations by Jiao and colleagues \u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. Second, the optimal set of features to be used to discriminate between two different populations also depends on the algorithm used. 
Consequently, when using a single algorithm, the selected features are those that maximize classification performance for that specific algorithm in a defined dataset rather than a set of features capable of discriminating between two populations, which is often the actual objective.\u003c/p\u003e\u003cp\u003eThe iterative feature selection technique used in this study, combined with the application of different machine learning algorithms tuned with 5-fold cross-validation, enhances the generalizability of the results. This comprehensive approach helps ensure that the classification performance is not only optimized, but also reliable and applicable across different datasets.\u003c/p\u003e\u003cp\u003eFeatures that were more frequently chosen after the SBS were tested using an unsupervised clustering technique. Unsupervised clustering was chosen to evaluate the discriminant information within the selected features, independent of any machine learning classification algorithm. Such clustering was performed on both the test sets used for the ten runs of the supervised classification and a separate test set comprising data not used for any analysis.\u003c/p\u003e\u003cp\u003eA k-medoids clustering algorithm was chosen using four different distance metrics, with a medoid initialization algorithm to ensure repeatability of the results. The selected features were ranked by their number of occurrences and incrementally added to obtain information on the relevance of each feature to the clustering output as well as on the performance of all the selected features. When looking at the clustering output using the whole set of features on the ten test sets, results are promising, falling within the range identified in the review by Jiao and colleagues \u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e when using different supervised algorithms (i.e., 80\u0026ndash;100%). 
Among the distance metrics used, the cosine distance achieved the best performance for all the classification performance indexes, with the sole exception of precision (see Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). However, in the context of identifying gait features that characterize stroke patients, precision (i.e., the ability to correctly classify healthy participants) may be considered less critical. The better performance of the cosine distance compared to the others may be attributed to their differing nature. Cosine distance evaluates the orientation of the n-dimensional vectors formed by the selected features by measuring the angle between them, whereas the other metrics focus, albeit in slightly different ways, on the absolute distance between these vectors. In the first case, the magnitude of the vectors does not influence the results, while in the other cases it does. It appears that, at least in the dataset used here, the cosine distance captures differences between healthy participants and stroke patients more accurately. Moreover, the cosine distance exhibited the smallest standard deviation across all performance indexes over the ten test sets, suggesting it is not only the most appropriate distance metric, but also the most robust and consistent.\u003c/p\u003e\u003cp\u003eThe best performance (90.2% \u0026plusmn; 5.5%, 87% \u0026plusmn; 5.8%, 89.2% \u0026plusmn; 6% for accuracy, recall, and F1 score, respectively) on the test sets was obtained when only three features were used, namely the improved harmonic ratio in the medio-lateral direction, the coefficient of variation of the stance phase, and the stride speed. Thus, a further reduction of the features used to discriminate between healthy participants and stroke patients enhances the classification results. 
When examining the results obtained on the test set of unseen data using all features and the cosine distance, the outcomes appear consistent with those from the ten runs. Conversely, with the other distance metrics, the performance indexes on the unseen data are generally higher than the corresponding ten-run means. Nevertheless, this is not surprising given the large standard deviation observed over the ten runs with these metrics, indicating high variability. It is likely that in some of the ten test sets, the clustering performance aligned with that obtained on the unseen data.\u003c/p\u003e\u003cp\u003eNotably, the highest accuracy, precision, and F1 score when using the cosine distance were obtained using six features. Those included the duration of the stance phase and the movement smoothness of the head in the anterior-posterior and medio-lateral directions. However, the differences in results between using three versus six features were modest (+\u0026thinsp;3%, +\u0026thinsp;6%, and +\u0026thinsp;3% for accuracy, precision, and F1 score, respectively). Recall values, in contrast, remained unchanged after the third feature was added. These findings demonstrate that even with an unsupervised clustering technique, discrepancies across datasets can be addressed. 
The results presented here may be summarized as follows:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eAmong all 80 features considered in this work, nine features (see Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e) seem to be sufficient to discriminate between healthy participants and participants with stroke with fair-to-good classification results.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eWhen using an unsupervised clustering technique based on the distances between data points, the cosine distance metric seems to be the most appropriate and reliable.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eGenerally, to maximize the classification results, three features are sufficient; nevertheless, other features may carry discriminant information and should be considered.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThe procedure applied in this study has yielded substantial insights into the most discriminative gait-related features that differentiate between healthy individuals and stroke patients. By evaluating a comprehensive array of gait-related features on different datasets, the analysis identified the spatiotemporal, symmetry, variability, and upper body movement control domains as effective in discriminating between the two groups. Altered gait spatiotemporal features, as well as heightened asymmetry and variability in such features, have been frequently reported in the scientific literature \u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e,\u003cspan additionalcitationids=\"CR46 CR47 CR48\" citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e. 
These features reflect the core motor deficits induced by a stroke event, including hemiparesis and altered neuromuscular control, which directly affect the timing and rhythm of walking. Some authors have reported the persistence of such alterations even after clinical treatments, highlighting their impact after a stroke.\u003c/p\u003e\u003cp\u003eNotably, none of the selected features represented asymmetry in spatiotemporal parameters. While gait asymmetry is a widely recognized alteration after stroke, it is often characterized by varying patterns \u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e,\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e. These diverse trends in spatiotemporal feature asymmetry may have reduced the discriminative value of such features. However, an asymmetry feature based on the frequency content of the acceleration measured at the lower back in the mediolateral direction was still included in the selected features. Generally, asymmetry at the trunk level has been widely analyzed in stroke patients and has been reported to differ significantly from that of healthy controls \u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. This is true not only when a single MIMU at the lumbar level is used, but also when multiple sensors are used and when different aspects of trunk movement are analyzed \u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Accordingly, some of the features considered in the present work capture aspects of trunk movement stability and symmetry.\u003c/p\u003e\u003cp\u003eFinally, trunk smoothness (at different levels and in different directions) emerged as a discriminative gait domain. 
To the authors\u0026rsquo; knowledge, only two prior studies have directly measured trunk smoothness in stroke patients, albeit using different equipment \u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e or metrics \u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e. Notably, trunk and, in particular, head movement smoothness were identified as key discriminative features. While traditionally less emphasized in gait analysis, head movement smoothness reflects the integration of postural control and balance during walking. The reduced smoothness observed in post-stroke patients may reflect impaired sensorimotor integration and balance, which are critical for safe and efficient ambulation. This finding opens new avenues for incorporating head movement analysis into routine gait assessments, thereby providing a more comprehensive understanding of post-stroke mobility impairments.\u003c/p\u003e"},{"header":"5. CONCLUSION","content":"\u003cp\u003eThe results presented must be interpreted in light of the following limitations. It should be noted that different machine learning algorithms could have been employed. Notwithstanding, the algorithms employed have demonstrated efficacy even when utilizing modest datasets, and they are not contingent on any assumptions regarding the analyzed dataset. Another potential limitation is the inclusion of walking speed as a feature in the analysis. In some cases \u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e, walking speed has been employed as a means of matching patients and healthy controls, with the objective of limiting the influence of the latter on the estimation of the former's features. Nevertheless, this procedure results in the exclusion of some patients under analysis if a suitable control group is not identified. 
This may result in the exclusion of patients with specific gait deviations, as well as a reduction in the sample size. Furthermore, the correlation analysis conducted prior to feature selection should have limited the impact of walking speed by eliminating features that are highly correlated with it. The study population included patients with acute, sub-acute, and chronic stroke. While this approach could have resulted in poorer supervised and unsupervised classification performance, it also enabled the identification of clinically meaningful features irrespective of the time since the stroke event.\u003c/p\u003e\u003cp\u003eIn conclusion, the present study demonstrates the potential of machine learning in identifying key features of post-stroke gait dysfunctions. The highlighted features\u0026mdash;spatiotemporal features, gait variability, and trunk movement symmetry, stability, and smoothness\u0026mdash;not only enhance our understanding of post-stroke gait dysfunctions but also provide practical markers for clinical assessment and rehabilitation. 
It would be beneficial for future work to focus on refining machine learning models to support real-time gait analysis and expanding their application in diverse clinical settings, ensuring their integration into personalized and effective rehabilitation strategies for patients with stroke and neurological conditions.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAUTHORS CONTRIBUTIONS STATEMENT\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eBrasiliano Paolo:\u003c/strong\u003e Conceptualization, Methodology, Software, Validation, Formal Analysis, Data Curation, Writing - Original Draft, Writing - Review \u0026amp; Editing,\u0026nbsp;Visualization.\u0026nbsp;\u003cstrong\u003eOrejel-Bustos Amaranta:\u003c/strong\u003e Investigation, Data Curation.\u0026nbsp;\u003cstrong\u003eBelluscio Valeria:\u003c/strong\u003e Investigation, Writing - Review \u0026amp; Editing.\u0026nbsp;\u003cstrong\u003eCereatti Andrea:\u0026nbsp;\u003c/strong\u003eSoftware\u0026nbsp;\u003cstrong\u003eDella Croce Ugo:\u0026nbsp;\u003c/strong\u003eSoftware\u0026nbsp;\u003cstrong\u003eTrabassi Dante:\u0026nbsp;\u003c/strong\u003eMethodology,\u0026nbsp;Writing - Review \u0026amp; Editing.\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eSalis Francesca:\u003c/strong\u003e Software\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eTramontano Marco:\u0026nbsp;\u003c/strong\u003eResources,\u0026nbsp;Writing - Review \u0026amp; Editing, Funding Acquisition.\u0026nbsp;\u003cstrong\u003eBuzzi Maria Gabriella:\u003c/strong\u003e Resources,\u0026nbsp;\u003cstrong\u003eVannozzi Giuseppe:\u003c/strong\u003e Writing - Review \u0026amp; Editing\u003cstrong\u003e\u0026nbsp;Bergamini Elena:\u0026nbsp;\u003c/strong\u003eSupervision, Project administration,\u0026nbsp;Writing - Review \u0026amp; Editing,\u0026nbsp;Funding 
Acquisition\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003cem\u003e:\u0026nbsp;\u003c/em\u003eThis study was supported by the Italian Ministry of Health (GR-2019-12370757).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e: The data associated with this paper are not publicly available but are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests:\u003c/strong\u003e All the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of generative AI in scientific writing:\u0026nbsp;\u003c/strong\u003eDuring the preparation of this work the authors used DeepL in order to check and correct English grammar mistakes.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eSacco, R. L. \u003cem\u003eet al.\u003c/em\u003e An Updated Definition of Stroke for the 21st Century. \u003cem\u003eStroke\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, 2064\u0026ndash;2089 (2013).\u003c/li\u003e\n\u003cli\u003eMartin, S. S. \u003cem\u003eet al.\u003c/em\u003e 2024 Heart Disease and Stroke Statistics: A Report of US and Global Data From the American Heart Association. \u003cem\u003eCirculation\u003c/em\u003e (2024) doi:10.1161/CIR.0000000000001209.\u003c/li\u003e\n\u003cli\u003eWang, W. \u003cem\u003eet al.\u003c/em\u003e Prevalence, Incidence, and Mortality of Stroke in China. \u003cem\u003eCirculation\u003c/em\u003e (2017) doi:10.1161/CIRCULATIONAHA.116.025250.\u003c/li\u003e\n\u003cli\u003eKim, Y. W. Update on Stroke Rehabilitation in Motor Impairment. \u003cem\u003eBrain Neurorehabil\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, e12 (2022).\u003c/li\u003e\n\u003cli\u003eSelves, C., Stoquart, G. \u0026amp; Lejeune, T. 
Gait rehabilitation after stroke: review of the evidence of predictors, clinical outcomes and timing for interventions. \u003cem\u003eActa Neurol Belg\u003c/em\u003e \u003cstrong\u003e120\u003c/strong\u003e, 783\u0026ndash;790 (2020).\u003c/li\u003e\n\u003cli\u003eKinoshita, S., Abo, M., Okamoto, T. \u0026amp; Tanaka, N. Utility of the Revised Version of the Ability for Basic Movement Scale in Predicting Ambulation during Rehabilitation in Poststroke Patients. \u003cem\u003eJournal of Stroke and Cerebrovascular Diseases\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 1663\u0026ndash;1669 (2017).\u003c/li\u003e\n\u003cli\u003eHutabarat, Y., Owaki, D. \u0026amp; Hayashibe, M. Recent Advances in Quantitative Gait Analysis Using Wearable Sensors: A Review. \u003cem\u003eIEEE Sensors Journal\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 26470\u0026ndash;26487 (2021).\u003c/li\u003e\n\u003cli\u003eMohan, D. M. \u003cem\u003eet al.\u003c/em\u003e Assessment Methods of Post-stroke Gait: A Scoping Review of Technology-Driven Approaches to Gait Characterization and Analysis. \u003cem\u003eFrontiers in Neurology\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, (2021).\u003c/li\u003e\n\u003cli\u003eKim, G. J., Parnandi, A., Eva, S. \u0026amp; Schambra, H. The use of wearable sensors to assess and treat the upper extremity after stroke: a scoping review. \u003cem\u003eDisabil Rehabil\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, 6119\u0026ndash;6138 (2022).\u003c/li\u003e\n\u003cli\u003ePicerno, P. \u003cem\u003eet al.\u003c/em\u003e Wearable inertial sensors for human movement analysis: a five-year update. \u003cem\u003eExpert Rev Med Devices\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 79\u0026ndash;94 (2021).\u003c/li\u003e\n\u003cli\u003eJiao, Y., Hart, R., Reading, S. \u0026amp; Zhang, Y. Systematic review of automatic post-stroke gait classification systems. 
\u003cem\u003eGait \u0026amp; Posture\u003c/em\u003e \u003cstrong\u003e109\u003c/strong\u003e, 259\u0026ndash;270 (2024).\u003c/li\u003e\n\u003cli\u003eBoukhennoufa, I., Zhai, X., Utti, V., Jackson, J. \u0026amp; McDonald-Maier, K. D. Wearable sensors and machine learning in post-stroke rehabilitation assessment: A systematic review. \u003cem\u003eBiomedical Signal Processing and Control\u003c/em\u003e \u003cstrong\u003e71\u003c/strong\u003e, 103197 (2022).\u003c/li\u003e\n\u003cli\u003eAltilio, R., Paoloni, M. \u0026amp; Panella, M. Selection of clinical features for pattern recognition applied to gait analysis. \u003cem\u003eMed Biol Eng Comput\u003c/em\u003e \u003cstrong\u003e55\u003c/strong\u003e, 685\u0026ndash;695 (2017).\u003c/li\u003e\n\u003cli\u003eSung, J. \u003cem\u003eet al.\u003c/em\u003e Classification of Stroke Severity Using Clinically Relevant Symmetric Gait Features Based on Recursive Feature Elimination With Cross-Validation. \u003cem\u003eIEEE Access\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 119437\u0026ndash;119447 (2022).\u003c/li\u003e\n\u003cli\u003eAltilio, R., Liparulo, L., Proietti, A., Paoloni, M. \u0026amp; Panella, M. A genetic algorithm for feature selection in gait analysis. in \u003cem\u003e2016 IEEE Congress on Evolutionary Computation (CEC)\u003c/em\u003e 4584\u0026ndash;4591 (2016). doi:10.1109/CEC.2016.7744374.\u003c/li\u003e\n\u003cli\u003eLee, J., Park, S. \u0026amp; Shin, H. Detection of Hemiplegic Walking Using a Wearable Inertia Sensing Device. \u003cem\u003eSensors (Basel)\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 1736 (2018).\u003c/li\u003e\n\u003cli\u003eHsu, W.-C. \u003cem\u003eet al.\u003c/em\u003e Can Trunk Acceleration Differentiate Stroke Patient Gait Patterns Using Time- and Frequency-Domain Features? \u003cem\u003eApplied Sciences\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 1541 (2021).\u003c/li\u003e\n\u003cli\u003eMannini, A., Trojaniello, D., Cereatti, A. 
\u0026amp; Sabatini, A. M. A Machine Learning Framework for Gait Classification Using Inertial Sensors: Application to Elderly, Post-Stroke and Huntington\u0026rsquo;s Disease Patients. \u003cem\u003eSensors (Basel)\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 134 (2016).\u003c/li\u003e\n\u003cli\u003eScheffer, C. \u0026amp; Cloete, T. Inertial motion capture in conjunction with an artificial neural network can differentiate the gait patterns of hemiparetic stroke patients compared with able-bodied counterparts. \u003cem\u003eComput Methods Biomech Biomed Engin\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 285\u0026ndash;294 (2012).\u003c/li\u003e\n\u003cli\u003eWang, L., Sun, Y., Li, Q., Liu, T. \u0026amp; Yi, J. Two Shank-Mounted IMUs-Based Gait Analysis and Classification for Neurological Disease Patients. \u003cem\u003eIEEE Robotics and Automation Letters\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 1970\u0026ndash;1976 (2020).\u003c/li\u003e\n\u003cli\u003eAltilio, R., Rossetti, A., Fang, Q., Gu, X. \u0026amp; Panella, M. A comparison of machine learning classifiers for smartphone-based gait analysis. \u003cem\u003eMed Biol Eng Comput\u003c/em\u003e \u003cstrong\u003e59\u003c/strong\u003e, 535\u0026ndash;546 (2021).\u003c/li\u003e\n\u003cli\u003eIosa, M. \u003cem\u003eet al.\u003c/em\u003e Artificial Neural Network Analyzing Wearable Device Gait Data for Identifying Patients With Stroke Unable to Return to Work. \u003cem\u003eFront Neurol\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 650542 (2021).\u003c/li\u003e\n\u003cli\u003eWang, F.-C. \u003cem\u003eet al.\u003c/em\u003e Detection and Classification of Stroke Gaits by Deep Neural Networks Employing Inertial Measurement Units. \u003cem\u003eSensors (Basel)\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 1864 (2021).\u003c/li\u003e\n\u003cli\u003eMathur, D. \u0026amp; Bhatia, D. Gait classification of stroke survivors - An analytical study. 
\u003cem\u003eJournal of Interdisciplinary Mathematics\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 163\u0026ndash;181 (2022).\u003c/li\u003e\n\u003cli\u003eHolden, M. K., Gill, K. M., Magliozzi, M. R., Nathan, J. \u0026amp; Piehl-Baker, L. Clinical gait assessment in the neurologically impaired. Reliability and meaningfulness. \u003cem\u003ePhys Ther\u003c/em\u003e \u003cstrong\u003e64\u003c/strong\u003e, 35\u0026ndash;40 (1984).\u003c/li\u003e\n\u003cli\u003eFolstein, M. F., Folstein, S. E. \u0026amp; McHugh, P. R. \u0026lsquo;Mini-mental state\u0026rsquo;. A practical method for grading the cognitive state of patients for the clinician. \u003cem\u003eJ Psychiatr Res\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 189\u0026ndash;198 (1975).\u003c/li\u003e\n\u003cli\u003eBergamini, E. \u003cem\u003eet al.\u003c/em\u003e Estimating orientation using magnetic and inertial sensors and different sensor fusion approaches: accuracy assessment in manual and locomotion tasks. \u003cem\u003eSensors (Basel)\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 18625\u0026ndash;18649 (2014).\u003c/li\u003e\n\u003cli\u003eKavanagh, J. J. \u0026amp; Menz, H. B. Accelerometry: A technique for quantifying movement patterns during walking. \u003cem\u003eGait \u0026amp; Posture\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 1\u0026ndash;15 (2008).\u003c/li\u003e\n\u003cli\u003eMadgwick, S. O. H., Harrison, A. J. L. \u0026amp; Vaidyanathan, A. Estimation of IMU and MARG orientation using a gradient descent algorithm. \u003cem\u003eIEEE Int Conf Rehabil Robot\u003c/em\u003e \u003cstrong\u003e2011\u003c/strong\u003e, 5975346 (2011).\u003c/li\u003e\n\u003cli\u003eBertoli, M. \u003cem\u003eet al.\u003c/em\u003e Estimation of spatio-temporal parameters of gait from magneto-inertial measurement units: multicenter validation among Parkinson, mildly cognitively impaired and healthy older adults. 
\u003cem\u003eBioMed Eng OnLine\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 58 (2018).\u003c/li\u003e\n\u003cli\u003eMenz, H. B., Lord, S. R. \u0026amp; Fitzpatrick, R. C. Acceleration patterns of the head and pelvis when walking on level and irregular surfaces. \u003cem\u003eGait \u0026amp; Posture\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 35\u0026ndash;46 (2003).\u003c/li\u003e\n\u003cli\u003eBuckley, C., Galna, B., Rochester, L. \u0026amp; Mazz\u0026agrave;, C. Attenuation of Upper Body Accelerations during Gait: Piloting an Innovative Assessment Tool for Parkinson\u0026rsquo;s Disease. \u003cem\u003eBiomed Res Int\u003c/em\u003e \u003cstrong\u003e2015\u003c/strong\u003e, 865873 (2015).\u003c/li\u003e\n\u003cli\u003ePasciuto, I., Bergamini, E., Iosa, M., Vannozzi, G. \u0026amp; Cappozzo, A. Overcoming the limitations of the Harmonic Ratio for the reliable assessment of gait symmetry. \u003cem\u003eJ Biomech\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 84\u0026ndash;89 (2017).\u003c/li\u003e\n\u003cli\u003eMelendez-Calderon, A., Shirota, C. \u0026amp; Balasubramanian, S. Estimating Movement Smoothness From Inertial Measurement Units. \u003cem\u003eFront Bioeng Biotechnol\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 558771 (2020).\u003c/li\u003e\n\u003cli\u003eZifchock, R. A., Davis, I., Higginson, J. \u0026amp; Royer, T. The symmetry angle: a novel, robust method of quantifying asymmetry. \u003cem\u003eGait Posture\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 622\u0026ndash;627 (2008).\u003c/li\u003e\n\u003cli\u003eTrabassi, D. \u003cem\u003eet al.\u003c/em\u003e Machine Learning Approach to Support the Detection of Parkinson\u0026rsquo;s Disease in IMU-Based Gait Analysis. \u003cem\u003eSensors\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 3700 (2022).\u003c/li\u003e\n\u003cli\u003ePavan, K. K., Rao, A. A., Rao, A. V. D. \u0026amp; Sridhar, G. R. Single Pass Seed Selection Algorithm for k-Means. 
\u003cem\u003eJCS\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 60\u0026ndash;66 (2010).\u003c/li\u003e\n\u003cli\u003eMcCrum, C., van Beek, J., Schumacher, C., Janssen, S. \u0026amp; Van Hooren, B. Sample size justifications in Gait \u0026amp; Posture. \u003cem\u003eGait \u0026amp; Posture\u003c/em\u003e \u003cstrong\u003e92\u003c/strong\u003e, 333\u0026ndash;337 (2022).\u003c/li\u003e\n\u003cli\u003eLakens, D. Sample Size Justification. \u003cem\u003eCollabra: Psychology\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 33267 (2022).\u003c/li\u003e\n\u003cli\u003eTrabassi, D. \u003cem\u003eet al.\u003c/em\u003e Optimizing Rare Disease Gait Classification through Data Balancing and Generative AI: Insights from Hereditary Cerebellar Ataxia. \u003cem\u003eSensors (Basel)\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 3613 (2024).\u003c/li\u003e\n\u003cli\u003eTramontano, M. \u003cem\u003eet al.\u003c/em\u003e Dynamic Stability, Symmetry, and Smoothness of Gait in People with Neurological Health Conditions. \u003cem\u003eSensors (Basel)\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 2451 (2024).\u003c/li\u003e\n\u003cli\u003eBergamini, E. \u003cem\u003eet al.\u003c/em\u003e Multi-sensor assessment of dynamic balance during gait in patients with subacute stroke. \u003cem\u003eJournal of Biomechanics\u003c/em\u003e \u003cstrong\u003e61\u003c/strong\u003e, 208\u0026ndash;215 (2017).\u003c/li\u003e\n\u003cli\u003eVabalas, A., Gowen, E., Poliakoff, E. \u0026amp; Casson, A. J. Machine learning algorithm validation with a limited sample size. \u003cem\u003ePLOS ONE\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, e0224365 (2019).\u003c/li\u003e\n\u003cli\u003eChaibub Neto, E. \u003cem\u003eet al.\u003c/em\u003e Detecting the impact of subject characteristics on machine learning-based diagnostic applications. \u003cem\u003enpj Digit. 
Med.\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 1\u0026ndash;6 (2019).\u003c/li\u003e\n\u003cli\u003eLittle, V. L., Perry, L. A., Mercado, M. W., Kautz, S. A. \u0026amp; Patten, C. Gait asymmetry pattern following stroke determines acute response to locomotor task. \u003cem\u003eGait Posture\u003c/em\u003e \u003cstrong\u003e77\u003c/strong\u003e, 300\u0026ndash;307 (2020).\u003c/li\u003e\n\u003cli\u003ePatterson, K. K., Gage, W. H., Brooks, D., Black, S. E. \u0026amp; McIlroy, W. E. Evaluation of gait symmetry after stroke: a comparison of current methods and recommendations for standardization. \u003cem\u003eGait Posture\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 241\u0026ndash;246 (2010).\u003c/li\u003e\n\u003cli\u003eBalasubramanian, C. K., Neptune, R. R. \u0026amp; Kautz, S. A. Variability in spatiotemporal step characteristics and its relationship to walking performance post-stroke. \u003cem\u003eGait Posture\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 408\u0026ndash;414 (2009).\u003c/li\u003e\n\u003cli\u003eKim, C. M. \u0026amp; Eng, J. J. Symmetry in vertical ground reaction force is accompanied by symmetry in temporal but not distance variables of gait in persons with stroke. \u003cem\u003eGait Posture\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 23\u0026ndash;28 (2003).\u003c/li\u003e\n\u003cli\u003eBowden, M. G., Balasubramanian, C. K., Behrman, A. L. \u0026amp; Kautz, S. A. Validation of a speed-based classification system using quantitative measures of walking performance poststroke. \u003cem\u003eNeurorehabil Neural Repair\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 672\u0026ndash;675 (2008).\u003c/li\u003e\n\u003cli\u003eGermanotta, M., Iacovelli, C. \u0026amp; Aprile, I. Evaluation of Gait Smoothness in Patients with Stroke Undergoing Rehabilitation: Comparison between Two Metrics. 
\u003cem\u003eInt J Environ Res Public Health\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 13440 (2022).\u003c/li\u003e\n\u003cli\u003eGarcia, F. do V. \u003cem\u003eet al.\u003c/em\u003e Movement smoothness in chronic post-stroke individuals walking in an outdoor environment\u0026mdash;A cross-sectional study using IMU sensors. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, e0250100 (2021).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-7478886/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7478886/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eStroke is a major cause of motor disability, degrading walking and quality of life. Wearable gait analysis with magneto-inertial measurement units (MIMUs) can quantify post-stroke impairments. 
We used machine learning to identify discriminative gait features in stroke, coupling supervised feature selection with unsupervised clustering to improve interpretability and generalizability.\u003c/p\u003e\n\u003cp\u003eEighty-five stroke patients and 97 healthy controls completed 10-Meter Walk Tests while wearing five MIMUs. Feature selection spanned spatiotemporal, symmetry, stability, and smoothness metrics. K-nearest neighbors (KNN), support vector machines (SVM), and decision trees (TREE) were trained, validated, and tested iteratively across data splits; clustering then verified discriminative ability.\u003c/p\u003e\n\u003cp\u003eSequential backward feature selection retained nine features, yielding accuracies (healthy vs patient) of 94.1% (KNN), 96.7% (SVM), and 89.1% (TREE). SVM generalized best. Unsupervised k-medoids with cosine distance confirmed discrimination, reaching 90% accuracy with only three features: stride speed, stance-phase coefficient of variation, and medio-lateral harmonic ratio.\u003c/p\u003e\n\u003cp\u003eResults indicate that gait variability, trunk smoothness, and upper-body stability robustly characterize post-stroke dysfunctions. Notably, head-movement smoothness emerged as a novel, discriminative feature.\u003c/p\u003e\n\u003cp\u003eThis integrated framework shows how wearable sensors plus machine learning can support clinical gait analysis and rehabilitation planning. 
Future work should enable real-time deployment and broaden datasets to cover more clinical scenarios.\u003c/p\u003e","manuscriptTitle":"Identifying Key Gait Features in Stroke Patients: A Machine Learning Approach with Supervised and Unsupervised Validation","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-22 10:13:43","doi":"10.21203/rs.3.rs-7478886/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-12-15T12:54:17+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-12-13T01:38:37+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"316426407996893596398283501597675916564","date":"2025-11-18T06:17:24+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-09-21T13:43:23+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"83554706381722098486006554303411515867","date":"2025-09-12T01:04:41+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-09-12T00:00:03+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-09-02T04:59:29+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-08-30T09:18:40+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-08-29T09:58:30+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-08-28T09:46:01+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific 
Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"57042790-4e52-4610-a1f8-8d1a8dc19f36","owner":[],"postedDate":"September 22nd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":54901513,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":54901514,"name":"Physical sciences/Engineering"},{"id":54901515,"name":"Health sciences/Health care"},{"id":54901516,"name":"Physical sciences/Mathematics and computing"},{"id":54901517,"name":"Health sciences/Neurology"},{"id":54901518,"name":"Biological sciences/Neuroscience"}],"tags":[],"updatedAt":"2026-03-16T16:05:02+00:00","versionOfRecord":{"articleIdentity":"rs-7478886","link":"https://doi.org/10.1038/s41598-026-43666-7","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2026-03-09 15:57:53","publishedOnDateReadable":"March 9th, 2026"},"versionCreatedAt":"2025-09-22 10:13:43","video":"","vorDoi":"10.1038/s41598-026-43666-7","vorDoiUrl":"https://doi.org/10.1038/s41598-026-43666-7","workflowStages":[]},"version":"v1","identity":"rs-7478886","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7478886","identity":"rs-7478886","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

License: CC-BY-4.0