Enhancing Parkinson’s Disease Diagnosis Using Machine Learning: A Comparative Study

Sharang D, Aditya Vyavahare, Anuja Bokhare
Dr. Vishwanath Karad MIT World Peace University (corresponding author: Anuja Bokhare)
Research Article. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6366739/v1
This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract
Parkinson's disease is the second most common neurodegenerative disorder in the world, after Alzheimer's disease. It is a long-term degenerative condition of the central nervous system that primarily affects people over the age of sixty and, like many other neurological conditions, it progresses over time. Movement problems are the first symptoms, but vocal dysfunction can also be an early indicator: people diagnosed with Parkinson's disease have vocal abnormalities that reduce the loudness of their voice and make pronunciation difficult.
Vocal measures can therefore be used to diagnose Parkinson's disease. As neurons (nerve cells) in affected areas of the brain weaken, become damaged, or die, people may notice problems with common movements, tremors, stiffness in the limbs or trunk, or impaired balance. As these symptoms become more noticeable, patients may struggle with walking, talking, or completing other simple tasks. However, as with many other diseases and disorders, these symptoms also appear in other conditions, so not everyone with one or more of them has Parkinson's disease. This paper implements four base machine learning classifiers and four proposed ensemble classifiers, compares them, and selects the best model using a dataset of 23 attributes and around 177 records. It concludes that the ensemble classifiers perform better, with XGBoost and Random Forest offering the best trade-off.
Keywords: Neurons; Alzheimer's disease; Parkinson's disease; Parkinson's patient; Machine Learning

1 Introduction
Parkinson's disease is not easy to diagnose [ 2 ]. Accurate diagnosis in the early stages is an exhaustive and tedious task, and even experienced medical professionals and researchers face major barriers to precise and efficient diagnosis. This paper addresses the issue by building a prediction system based on machine learning models that produce accurate results which can be used promptly. The objective is to run the dataset through multiple models and select the best-performing one, increasing diagnostic efficiency and accuracy. Machine learning classifiers were applied to the patient voice attributes in the dataset; 70% of the dataset was used for training and the remaining 30% for testing.
Overall, Support Vector Machines, Logistic Regression, Naive Bayes, and Decision Trees were found to be the most commonly used machine learning techniques in existing Parkinson's disease detection systems. This study proposes four additional techniques: Random Forest, Stochastic Gradient Descent, XGBoost, and AdaBoost. The resulting eight models were compared on performance, with particular attention to the proposed ensemble learning classifiers, using classification accuracy, precision, sensitivity/recall, and F1-score as indicators, to determine which model best predicts the correct outcome.

2 Previous study
The authors stress how critical early diagnosis of Parkinson's disease is and contrast dopamine production in the brain of a Parkinson's patient with that in a healthy brain. They show the importance of data mining techniques for detection and explain the theory behind the Naive Bayes, Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Decision Tree methods. The study analyses the performance of these four classifiers on voice input from a sample of 8 patients, and gives an elaborate account of how clinical decision support systems operate on the basis of regression trees, support vector machines, and sensitivity analysis [ 3 ].
Another work highlights the importance of vocal signal analysis for predicting Parkinson's disease, using acoustic tools to develop automated classifiers based on Naïve Bayes and K-Nearest Neighbours. The article investigates the speech features of people from various sections of society to predict Parkinson's disease accurately; to recognise the disease in the voice dataset, the researchers employed a Multi-Layer Perceptron and a Logistic Regression framework [ 4 ].
Like the study in [ 3 ], the researchers in [ 5 ] also focused on dopamine secretion.
They used multiple acoustic devices to collect speech parameters from 50 Parkinson's patients and 50 healthy people, employed K-fold cross-validation for testing, and reported around 85% accuracy. However, the paper did not provide an experimental explanation for this outcome. Still, because so many patients were treated successfully, its findings can be considered encouraging.
Another article describes a study that used a voice-based dataset, consisting of audio recordings of 31 subjects, to diagnose Parkinson's disease. The data came from the "UCI Machine Learning repository from the Centre for Machine Learning and Intelligent Systems" and was analysed with different ML algorithms, which were compared to find the best results. Logistic Regression, Naive Bayes, and Decision Trees reached a prediction accuracy of around 70%, while the Support Vector Machine reached around 85% [ 6 ].
Another study introduced a hybrid intelligent system that uses noise-removal methods for data pre-processing, clustering methods to help determine class labels, and prediction methods to forecast the development of the illness. Principal Component Analysis (PCA) is used to determine which dimensions or traits are important, after which the study applies support vector regression and neuro-fuzzy inference systems. This hybrid system substantially improved on the previous prediction accuracy [ 7 ].
A brain MRI dataset was used to identify medical image-related biomarkers. To understand the progression of the disease and the role of biomarkers in recognising its progressive behaviour, supervised multi-layer classification, such as a Support Vector Machine, has been used [ 8 ] [ 25 ].
In another study, five classification techniques were applied and their performance was measured.
The techniques employed in this investigation included linear discriminant analysis (LDA), K-nearest neighbour (KNN), support vector machine (SVM), and Naive Bayes (NB). The SVM with a radial basis function kernel was selected as the best classifier and predictor [ 9 ].
Data mining in science and medicine can help researchers understand diseases better and devise more effective strategies to combat them, allowing better use of resources. Applying various data mining approaches with the aim of obtaining solid results, a Decision Tree classifier achieved an accuracy of 88–94% [ 10 ].
In another system, the predictor component forecast each Parkinson's disease symptom separately, covering 15 symptoms based on the initial patient assessment and the medication taken. Prediction accuracy ranged from 57.1% to 77.4% depending on the symptom, with tremor detection being the most accurate [ 11 ].
Another paper discusses the rising occurrence of Parkinson's disease in people over the age of 50. Using a dataset of 30 people and four machine learning classifiers, namely Support Vector Machine (SVM), Random Forest, K-Nearest Neighbors (KNN), and Logistic Regression, it concluded that Random Forest is the best classifier for detecting Parkinson's disease, with an accuracy between 90% and 95% [ 21 ].
One study aimed to account for intra-individual and inter-individual variability by developing a statistical progression model for Parkinson's disease. The dataset comprised the medical records of 423 patients collected over 7 years. The study identified 8 distinct disease states related to a patient's functional impairment, tremors, neuropsychiatric attributes, and general health, which were used to build the statistical progression model [ 22 ].
Using a dataset of 31 subjects, another study applied the chi-squared (χ²) test and an Extra Trees Classifier (ETC) for feature extraction and for understanding data dimensionality, with the aim of improving diagnostic accuracy and interpretability. The combination of ETC's robustness and efficiency with χ² was found to be highly beneficial [ 23 ].
Another publication aimed to develop a detection system using deep learning on MRI scans of the human brain, exploring a CNN-based approach. A custom three-layer CNN achieved 94–98% accuracy, outperforming SVM, RVM, and other methods; the CNN's automatic feature learning surpassed traditional handcrafted features and effectively identified Parkinson's disease-related features in the MRI data [ 24 ].

3 Proposed Methodology
3.1. Overview of Proposed Methodology
In this paper, we use both traditional and newer machine learning classifiers to classify an individual's health status and determine whether they have Parkinson's disease. The four most common classifiers identified in the literature review are used alongside four proposed classifiers, all applied to the same dataset, as shown in Fig. 1. By comparing them, we assess the performance of each classifier, select the best one, and then use it to make further predictions.
3.2. Machine Learning Classifiers in the Proposed Methodology
Logistic Regression: This classifier [ 6 ] is based on a sigmoid function called the logistic function [ 12 ], which maps any real input to a number between 0 and 1.
Decision Tree: A Decision Tree classifier [ 6 ] [ 13 ] splits the input space into sub-spaces based on specified conditions, using conditional control statements to reach a conclusion [ 14 ].
Support Vector Machine: SVM is a standard supervised learning algorithm used mostly for classification tasks [ 8 ] [ 9 ]. SVM uses a hyperplane as a decision boundary.
Data points falling on one side of this hyperplane are assigned to one class, while those on the other side are assigned to the other class [ 3 ] [ 6 ].
Gaussian Naïve Bayes: [ 3 ] [ 6 ] [ 9 ] The Naive Bayes classifier is a simple, practical probabilistic classification method that yields fast machine learning models capable of quick predictions; it predicts an object's class from probabilities. The Gaussian variant assumes that each feature follows a normal distribution within each class.
Random Forest: [ 15 ] A parallel bagging model, this classifier builds many decision trees on random subsets of the data and averages their results, reducing overfitting. At each node, it randomly selects a small subset of the features and uses the best of them to split the node.
Stochastic Gradient Descent: [ 16 ] Strictly speaking, SGD is an optimization strategy rather than a specific class of machine learning model. The SGD classifier is a simple stochastic gradient descent learner that supports a variety of classification loss functions and penalties; trained with the hinge loss, it is equivalent to a linear SVM. In simple terms, it is a method of training a model.
XGBoost: [ 17 ] [ 18 ] Extreme Gradient Boosting is a gradient-boosting framework. It builds trees sequentially, with each new tree trying to reduce the errors of the previous ones, so that the model is refined by correcting past mistakes. The final model combines the predictions of many individual trees.
AdaBoost: [ 17 ] [ 18 ] Adaptive Boosting is a classification technique that fits a sequence of weak classifiers, reweighting the training samples so that each iteration focuses on previously misclassified cases and improves the prediction.

4 Dataset and experiment discussion
4.1. About Dataset
The dataset [ 19 ] [ 20 ] for this research was developed by Max Little et al. at the University of Oxford in partnership with the National Centre for Voice and Speech, and is available from the UCI Machine Learning Repository.
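The downloaded file can be read into a Pandas DataFrame and separated into voice features and the diagnostic label. The sketch below is a minimal illustration, assuming a comma-separated file laid out like the UCI `parkinsons.data` file: a `name` column identifying each recording, a binary `status` column, and numeric voice measures in the remaining columns. The function name `load_parkinsons` is hypothetical.

```python
import pandas as pd


def load_parkinsons(path):
    """Read the voice dataset and return (features, labels).

    Assumes the UCI layout: a 'name' column identifying each recording,
    a binary 'status' column (1 = Parkinson's disease, 0 = healthy),
    and the remaining columns as numeric voice measures.
    """
    df = pd.read_csv(path)
    # The recording identifier carries no diagnostic information.
    features = df.drop(columns=["name", "status"])
    labels = df["status"].astype(int)
    return features, labels
```

For the UCI file, dropping `name` and `status` should leave the 22 voice measures listed in Table 1 as features.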
Their study used speech recordings from 31 people: 23 Parkinson's disease patients (16 male, 7 female) and 8 healthy people (3 male, 5 female). Each column of the dataset table corresponds to a specific voice measure, and each row corresponds to one of the 195 voice recordings from these subjects. The subjects' ages ranged from 46 to 85, with a median age of 65. The primary goal of the dataset is to distinguish patients from healthy individuals through changes in the vocalisation of vowels; the class label is stored in the status column, which is set to 0 for healthy subjects and 1 for Parkinson's disease. In total, 195 samples were recorded, with each subject recording an average of 6 vowel phonations for 36 seconds. The voice signals were recorded in a computerised speech laboratory. The dataset has over 15 citations and 50 thousand views and has been used in multiple case studies involving machine learning classifiers. Its attributes capture the fundamental frequency, amplitude, noise levels, and non-linear dynamics of the speech signal. The attributes are described in Table 1, and the distribution of classes, attributes, and values is shown in Figs. 2, 3 and 4.

Table 1 Dataset Specification
Sr. No  Attribute          Description
1.      MDVP:Fo(Hz)        Mean frequency of vocal fold vibration
2.      MDVP:Fhi(Hz)       Highest pitch of the voice signal
3.      MDVP:Flo(Hz)       Lowest frequency recorded during voice production
4.      MDVP:Jitter(%)     Variation in pitch between vibration cycles of the patient's voice
5.      MDVP:Jitter(Abs)
6.      MDVP:RAP           Short-term frequency disruptions
7.      MDVP:PPQ           Averages RAP variations over five pitch periods
8.      Jitter:DDP         Three-point measure of pitch instability
9.      MDVP:Shimmer       Amplitude disparity across a vocal cycle
10.     MDVP:Shimmer(dB)
11.     Shimmer:APQ3       Amplitude quotient variation over different time intervals
12.     Shimmer:APQ5
13.     MDVP:APQ
14.     Shimmer:DDA        Deviation of loudness over three voice cycles
15.     NHR                Noise-to-harmonics ratio
16.     HNR                Harmonics-to-noise ratio
17.     RPDE               Recurrence of vocal patterns
18.     DFA                How fluctuations in the speech signal evolve over time
19.     spread1            Variation in fundamental frequency
20.     spread2
21.     PPE
22.     D2                 Complexity of vocal fold dynamics

4.2. Experimentation approach and Steps
This section describes the experimental steps.
Parkinson's disease classification implementation steps
Step 1: Import the basic libraries to read and visualize the data, and the dependencies needed to create machine learning models.
Step 2: Read the data, plot the required graphs, and remove unnecessary columns.
Step 3: Perform feature selection and visualize hidden patterns.
Step 4: Split the data into two parts.
Step 5: Create the eight machine learning models using the loaded libraries.
Step 6: Evaluate every model's accuracy, recall, precision, and F1-score, and store them in a dictionary for further analysis.
Step 7: Run the algorithm that selects the best model based on accuracy and precision. If no model meets the set criterion for both, select the model with the highest F1-score.
Step 8: Prediction.

The first step was to set up the environment and import all the Python libraries required for data processing, analysis, and model building, including NumPy, Pandas, Scikit-learn, and Matplotlib. To take advantage of Python's modularity and reusability, we created a single main function that trains all the machine learning models and plots confusion matrices to evaluate classification performance. The Parkinson's disease dataset (from the UCI Machine Learning Repository) was imported and stored in a Pandas DataFrame to preserve its integrity. An initial data analysis was carried out to gain insight into the dataset's structure and distributions. Feature and target sets were extracted, and scaling was applied to normalize the numerical values.
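Steps 4 to 6 can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' code: default hyperparameters, a fixed `random_state`, a stratified 70/30 split, and a scaler fitted on the training portion only (the text applies scaling before the split; fitting on training data alone keeps test-set statistics out of training). XGBoost comes from the separate `xgboost` package, so the sketch skips it when that package is not installed. The names `split_and_scale`, `build_models`, and `evaluate_all` are hypothetical.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

try:  # XGBoost lives in a separate package.
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None


def split_and_scale(X, y, seed=42):
    """Stratified 70/30 split, then standardize with training statistics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test


def build_models(seed=42):
    """The four base and four proposed classifiers, with default settings."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Support Vector Machine": SVC(),
        "Gaussian Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(random_state=seed),
        # Default hinge loss: a linear SVM trained by stochastic gradient descent.
        "Stochastic Gradient Descent": SGDClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }
    if XGBClassifier is not None:
        models["XGBoost"] = XGBClassifier(random_state=seed)
    return models


def evaluate_all(X, y, seed=42):
    """Fit every model and store its test-set metrics in a dictionary."""
    X_train, X_test, y_train, y_test = split_and_scale(X, y, seed)
    results = {}
    for name, model in build_models(seed).items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
        }
    return results
```

With 195 records, `train_test_split` rounds the 30% test share up, so the test set holds 59 records and the training set 136.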
The dataset was then split into training (70%) and testing (30%) sets to evaluate model performance. In the model selection phase, eight classification models were tested: four from existing research (Logistic Regression, Decision Tree, Support Vector Machine, and Naïve Bayes) and four proposed models (Random Forest, Stochastic Gradient Descent, XGBoost, and AdaBoost). Their performance was assessed with confusion matrices and evaluation metrics, allowing their accuracy in diagnosing Parkinson's disease to be compared. The best-performing model is then used for prediction: new patient data is fed into the system to determine whether the patient has Parkinson's disease.

4.3. Evaluation metrics
Next, a confusion matrix is plotted to compare the classifier's predictions with the actual results, as shown in Fig. 5, breaking them down into correct and incorrect predictions. This helps identify where the model makes mistakes and how it can be improved. The matrix displays the number of test-set cases in each outcome category, namely True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN), as given in Eq. (1). From these counts, key measures such as accuracy, precision, and recall are calculated. Accuracy is the overall proportion of correct predictions, which can be misleading on imbalanced datasets. Precision measures the quality of the model's positive predictions and is crucial for minimizing false positives. Recall measures the model's ability to identify all positive cases, which is essential in applications such as medical diagnosis. The F1-score, which combines precision and recall, gives a better picture of a model's performance, especially on imbalanced datasets.

5 Result analysis
This section discusses the results obtained.
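The metric definitions in Section 4.3 translate directly into formulas over the four confusion-matrix counts. As a consistency check, the sketch below recomputes the Logistic Regression row of Table 2 from counts back-calculated from that row (TP = 44, TN = 6, FP = 9, FN = 0 on 59 test records). These counts are inferred from the reported scores, not taken from the paper, and the function names are illustrative.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of actual positives the model detects (sensitivity)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Counts inferred from the Logistic Regression row of Table 2.
tp, tn, fp, fn = 44, 6, 9, 0
print(round(accuracy(tp, tn, fp, fn), 6))  # 0.847458
print(round(precision(tp, fp), 6))         # 0.830189
print(round(recall(tp, fn), 6))            # 1.0
print(round(f1(tp, fp, fn), 6))            # 0.907216
```

Note that TN does not appear in the F1 formula, which is why F1 concentrates on the positive (Parkinson's) class.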
Figure 6 (a) to (h) shows the confusion matrices for Logistic Regression, Decision Tree, Support Vector Machine, Gaussian Naive Bayes, Random Forest, Stochastic Gradient Descent, XGBoost, and AdaBoost, which show the per-class performance of each algorithm. The models were assessed on accuracy, precision, recall, and F1-score, and Table 2 shows the results. Figure 7 plots accuracy against precision for all models, and Fig. 8 compares all metrics across the models.

Table 2 Complete model comparison
Model                         Accuracy  Precision  Recall    F1-Score
Logistic Regression           0.847458  0.830189   1         0.907216
Decision Tree                 0.847458  0.87234    0.931818  0.901099
Support Vector Machine        0.830508  0.814815   1         0.897959
Gaussian Naive Bayes          0.762712  0.894737   0.772727  0.829268
Random Forest                 0.847458  0.830189   1         0.907216
Stochastic Gradient Descent   0.813559  0.811321   0.977273  0.886598
XGBoost                       0.864407  0.86       0.977273  0.914894
AdaBoost                      0.864407  0.875      0.954545  0.913043

6. Interpretation and Discussion
The results indicate that the ensemble classifiers perform better than the base classifiers. Among the four base classifiers taken from the literature, Logistic Regression and Decision Tree share the highest accuracy (0.847), and Logistic Regression also has the highest F1-score (0.907), while Gaussian Naive Bayes performed the lowest of all eight classifiers. The Support Vector Machine, by contrast, achieves perfect recall but lower precision (0.815) and accuracy (0.831). The proposed classifiers show a noticeable improvement: average accuracy rises from 0.822 for the base classifiers to 0.847 for the proposed ones. Random Forest achieved a perfect sensitivity/recall of 1, tied with Logistic Regression and the Support Vector Machine as the highest of all eight classifiers. Both the parallel bagging approach (Random Forest) and the sequential boosting approach (XGBoost and AdaBoost) were applied. XGBoost comes out on top overall, with the highest F1-score (0.915) and the highest accuracy (0.864), which it shares with AdaBoost; its F1 margin over AdaBoost (0.913) is small, but it clearly outperforms the Support Vector Machine and the remaining classifiers.
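The selection rule in Step 7 can be sketched against the Table 2 scores. The thresholds `min_accuracy` and `min_precision` are hypothetical stand-ins for the unspecified "set criterion", and `select_best_model` is an illustrative name. Among qualifying models this sketch prefers higher accuracy, then higher precision, and it falls back to the highest F1-score when no model qualifies. This is one possible reading of the rule, not the authors' implementation.

```python
# Scores from Table 2: (accuracy, precision, recall, F1-score).
TABLE_2 = {
    "Logistic Regression": (0.847458, 0.830189, 1.0, 0.907216),
    "Decision Tree": (0.847458, 0.87234, 0.931818, 0.901099),
    "Support Vector Machine": (0.830508, 0.814815, 1.0, 0.897959),
    "Gaussian Naive Bayes": (0.762712, 0.894737, 0.772727, 0.829268),
    "Random Forest": (0.847458, 0.830189, 1.0, 0.907216),
    "Stochastic Gradient Descent": (0.813559, 0.811321, 0.977273, 0.886598),
    "XGBoost": (0.864407, 0.86, 0.977273, 0.914894),
    "AdaBoost": (0.864407, 0.875, 0.954545, 0.913043),
}


def select_best_model(scores, min_accuracy, min_precision):
    """Pick a model by accuracy and precision, falling back to F1.

    min_accuracy and min_precision are hypothetical thresholds.
    """
    qualifying = {
        name: s
        for name, s in scores.items()
        if s[0] >= min_accuracy and s[1] >= min_precision
    }
    if qualifying:
        # Prefer the highest accuracy; break ties on precision.
        return max(qualifying, key=lambda name: qualifying[name][:2])
    # No model meets both thresholds: use the highest F1-score instead.
    return max(scores, key=lambda name: scores[name][3])


print(select_best_model(TABLE_2, 0.85, 0.85))  # AdaBoost
print(select_best_model(TABLE_2, 0.90, 0.90))  # XGBoost
```

With both thresholds at 0.85, only XGBoost and AdaBoost qualify, and AdaBoost wins on precision; at 0.90 no model qualifies, so the F1 fallback picks XGBoost, matching the F1 ranking in Table 2.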
A major benefit of using these classifiers is that their decisions are free of subjective human bias. Using the F1-score as the standard metric for evaluating classifier performance accounts for both recall and precision. Unlike accuracy, it produces a more balanced result because it penalizes false negatives as well as false positives when calculating the final score.

7. Conclusion
Parkinson's disease is a serious condition for which there is currently no cure. Because it affects the movement of the body, it also affects speech. This study aims to develop a method of diagnosing Parkinson's disease that enables prompt measures to limit, or perhaps prevent, the disease's impact on the entire body before it is too late. It also aims to provide a foundation on which newer models and machine learning classifiers can be built to detect this and related diseases in the healthcare domain. Our study demonstrates the power of various machine learning classification systems and shows that integrating multiple heterogeneous classifiers can yield higher accuracy. Our suggested method therefore highlights the necessity of early Parkinson's disease diagnosis and prediction, so that patients can receive treatment and support as soon as possible. In most cases, our observations show that ensemble classifiers are more effective than base classifiers. Logistic Regression performs well among the base classifiers, while XGBoost produces the best results overall. When the F1-score is used as the standard metric, boosting also produces a superior overall outcome. These ML models can therefore be used to classify the dataset effectively. Our findings also highlight the most significant characteristics to consider when making predictions.

8. Future scope
The scope of this study is limited to machine learning and classification.
It merely provides a template that advanced researchers can use for comparisons and implementations. The research can be improved further by incorporating more advanced methods, such as Artificial Neural Networks (ANN) and deep learning, and by tuning parameters to obtain the best results.

Declarations
Author Contribution
A.V.: Conceptualization, data curation, methodology, plotting of model comparisons. S.D.: Experiments, investigation, model validation, technical assistance, formal analysis. A.B.: Review and final editing.
Data Availability
The dataset for this research is available at: https://www.kaggle.com/datasets/vikasukani/parkinsons-disease-data-set

References
[1] Sveinbjornsdottir S (2016) The clinical symptoms of Parkinson's disease. J Neurochem 139(1):318–324
[2] Sakar BE et al (2013) Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J Biomedical Health Inf 17:828–834
[3] Hadjahamadi AH, Askari TJ (2012) A Detection Support System for Parkinson's Disease Diagnosis Using Classification and Regression Tree. J Math Comput Sci 4:257–263
[4] Alemami Y, Almazaydeh L (2014) Detecting of Parkinson Disease through Voice Signal Features. Journal of American Science
[5] Olanrewaju RF, Sahari NS, Musa AA, Hakiem N (2014) Application of Neural Networks in Early Detection and Diagnosis of Parkinson's Disease. International Conference on Cyber and IT Service Management
[6] Marar S, Swain D, Hiwarkar V, Motwani N, Awari A (2018) Predicting the occurrence of Parkinson's Disease using various Classification Models. 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), pp 1–5. https://doi.org/10.1109/ICACAT.2018.8933579
[7] Nilashi M, Ibrahim O, Ahani A (2016) Accuracy Improvement for Predicting Parkinson's Disease Progression. Scientific Reports
[8] Salvatore C, Cerasa A, Castiglioni I, Gallivanone F, Augimeri A, Lopez M, Quattrone A (2014) Machine learning on brain MRI data for differential diagnosis of Parkinson's disease and Progressive Supranuclear Palsy. J Neurosci Methods 222:230–237
[9] Abedin MM, Maniruzzaman M, NAM, Ahmed F, Ahammed B, Ali M (2019) Classification and Prediction of Parkinson Disease: A Machine Learning Approach. 7th Int. Conf. on Data Science & SDGs, EC-012, December 18–19, pp 75–78
[10] Zaveri RR, Chawan PM. Prediction of Parkinson's Disease using Data Mining: A Survey. International Research Journal of Engineering and Technology (IRJET) 7(10), e-ISSN: 2395-0056, p-ISSN: 2395-0072
[11] Dragana M et al (2016) Machine Learning and Data Mining Methods for Managing Parkinson's Disease. LNAI 9605, pp 209–220
[12] Stylianou N, Akbarov A, Kontopantelis E, Buchan I, Dunn KW (2015) Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches. Burns 41(5):925–934
[13] Mashat AF, Fouad MM, Philip SY, Gharib TF (2012) A decision tree classification model for university admission system. Editorial Preface 3(10)
[14] Alemami Y, Almazaydeh L (2014) Detecting of Parkinson Disease through Voice Signal Features. Journal of American Science
[15] Kumari GPA (2012) Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
[16] Almustafa KM (2020) Classification of epileptic seizure dataset using different machine learning algorithms. Informatics in Medicine Unlocked 21:100444, ISSN 2352-9148
[17] Freund Y, Schapire R, Abe N (1999) A short introduction to boosting. Journal-Japanese Soc Artif Intell 14(771–780):1612
[18] Emon MU et al (2020) Performance Analysis of Machine Learning Approaches in Stroke Prediction. 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE
[19] Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM (2007) Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection. BioMedical Engineering Online, June 2007. https://archive.ics.uci.edu/ml/datasets/parkinsons
[20] Little MA, McSharry PE, Hunter EJ, Ramig LO (2008) Suitability of dysphonia measurements for telemonitoring of Parkinson's Disease. IEEE Trans Biomedical Eng, June. https://archive.ics.uci.edu/ml/datasets/parkinsons
[21] Govindu A, Palwe S (2023) Early detection of Parkinson's disease using machine learning. Procedia Comput Sci 218, ISSN 1877-0509
[22] Severson KA et al (2021) Discovery of Parkinson's disease states and disease progression modelling: a longitudinal data study using machine learning. The Lancet Digit Health 3(9):e555–e564
[23] Yadav S, Singh MK, Pal S (2023) Artificial Intelligence Model for Parkinson Disease Detection Using Machine Learning Algorithms. Biomedical Mater Devices 1:899–911
[24] Sangeetha S, Baskar K, Kalaivaani PCD, Kumaravel T (2023) Deep Learning-based Early Parkinson's Disease Detection from Brain MRI Image. 2023 7th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India
[25] Patra AK (2019) Prediction of Parkinson's disease using Ensemble Machine Learning classification from acoustic analysis. J Phys: Conf Ser 1372:012041

Additional Declarations
No competing interests reported.
We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6366739","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":440209543,"identity":"f57bcfd0-a421-4030-928b-b5e8cce038f1","order_by":0,"name":"Sharang D","email":"","orcid":"","institution":"Dr. Vishwanath Karad MIT World Peace University","correspondingAuthor":false,"prefix":"","firstName":"Sharang","middleName":"","lastName":"D","suffix":""},{"id":440209544,"identity":"f7505895-bc4e-4e6f-8941-103d61ca6e1a","order_by":1,"name":"Aditya Vyavahare","email":"","orcid":"","institution":"Dr. 
Vishwanath Karad MIT World Peace University","correspondingAuthor":false,"prefix":"","firstName":"Aditya","middleName":"","lastName":"Vyavahare","suffix":""},{"id":440209545,"identity":"6c7f87c2-b122-4812-bd15-257da8a4a3e1","order_by":2,"name":"Anuja Bokhare","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/klEQVRIiWNgGAWjYJACAwaGAwwSEiBmBRAzMzeQouUMSAsjYS0McC2MbSAOAS38s3sMCn4w3JGTnN387MPHebXR/O1ALT8qtuHUInHnjIFhD8MzY2mZY8YzZ247njvjMGMDY8+Z27ituZFjYMDDcDhxnkSCMTPvtmO5DUAtzIxtuLXIA7UY/gFrSf/M/HfOsdz5hLQYALUYg2yZLZFjDAyrmtwNhLQY3kgrMJYxeGYsOSOnmLHn2IHcjUAtB/H5Re5G8jbDNxV35CRupG9m+FFTlzvv/OGDD35U4PE+AwObASgyoeAwmDyATz0QMD9A4tQRUDwKRsEoGAUjEQAAQMpcaBo5QDwAAAAASUVORK5CYII=","orcid":"","institution":"Dr. Vishwanath Karad MIT World Peace University","correspondingAuthor":true,"prefix":"","firstName":"Anuja","middleName":"","lastName":"Bokhare","suffix":""}],"badges":[],"createdAt":"2025-04-03 07:08:25","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6366739/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6366739/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":80293205,"identity":"50ea9827-545e-4fd1-ba96-87aa8d84483d","added_by":"auto","created_at":"2025-04-10 08:14:48","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":40889,"visible":true,"origin":"","legend":"\u003cp\u003eProposed Model\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/359b99d54713c067088f23d1.png"},{"id":80293202,"identity":"5348aafa-cf45-494a-9ded-51ae260e8f21","added_by":"auto","created_at":"2025-04-10 08:14:48","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":28094,"visible":true,"origin":"","legend":"\u003cp\u003eClass 
Distribution\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/527ecaae9728767ed429d2d1.png"},{"id":80293207,"identity":"9b791683-7a46-46e4-983f-4a159b540d82","added_by":"auto","created_at":"2025-04-10 08:14:48","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":39754,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of attributes\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/8eac9d67e663f0c3e89ef21e.png"},{"id":80293229,"identity":"0f3329db-230c-4b56-ab65-4989f920505c","added_by":"auto","created_at":"2025-04-10 08:14:48","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":451247,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of values in the dataset\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/7ab407c84b37823180402e8a.png"},{"id":80294914,"identity":"f60b5c26-182a-4216-b285-d53740fb8506","added_by":"auto","created_at":"2025-04-10 08:30:48","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":59264,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrix guideline\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/165dc25dc8e33b8823ea144d.png"},{"id":80293209,"identity":"58bf1420-aef9-40c3-b2ba-f58ed5011975","added_by":"auto","created_at":"2025-04-10 08:14:48","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":84493,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrix for (a) Logistic Regression (b) Decision Tree (c) Support Vector Machine (d) Gaussian Naive Bayes (e) Random Forest (f) Stochastic Gradient Descent (g) XGBoost 
(h) AdaBoost\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/9035909700c09869632a1402.png"},{"id":80293221,"identity":"c25d782d-3402-4dba-bdf4-81d19b995b81","added_by":"auto","created_at":"2025-04-10 08:14:48","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":104168,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of accuracy and precision of every model\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/17e4a8be0d9caf808727861f.png"},{"id":80294915,"identity":"1eab7180-8c28-495d-acf0-0f9fe7b56276","added_by":"auto","created_at":"2025-04-10 08:30:48","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":70508,"visible":true,"origin":"","legend":"\u003cp\u003ePerformance comparison of all metrics of every model\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/e3c666f97acc04b45b596a46.png"},{"id":83275429,"identity":"0624abd4-f4d7-4cf0-ac7b-ee66c8dca180","added_by":"auto","created_at":"2025-05-22 08:54:06","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1536447,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6366739/v1/3ed26c5e-72b9-4245-a7c5-35d438576b5f.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Enhancing Parkinson’s Disease Diagnosis Using Machine Learning: A Comparative Study","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eParkinson's disease is difficult to diagnose [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Accurate diagnosis in the early stages is a demanding and time-consuming task. 
Even experienced medical professionals and researchers face considerable barriers to precise and efficient diagnosis. This paper addresses the issue by building a prediction system based on machine learning models that produce accurate results that can be used promptly. The objective is to run the dataset through multiple models and select the best-performing one, thereby improving both efficiency and diagnostic accuracy.\u003c/p\u003e \u003cp\u003eMachine learning classifiers were applied to the patient voice attributes in the dataset, with 70% of the dataset used for training and the remaining 30% for testing. In existing work, Support Vector Machines, Logistic Regression, Naive Bayes, and Decision Trees were found to be the most commonly used techniques for building Parkinson's detection systems. This study proposes four additional techniques:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eStochastic Gradient Descent\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eAdaBoost\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe eight models, including the proposed ensemble learning classifiers, were compared using classification accuracy, precision, sensitivity (recall), and F1-score to determine which one best predicts the correct outcome.\u003c/p\u003e"},{"header":"2 Previous study","content":"\u003cp\u003eOne study stresses the critical importance of early Parkinson's disease diagnosis. It contrasts dopamine production in the brain of a Parkinson's patient with that in a healthy brain and highlights the value of data mining techniques for detection. 
The study theoretically explains the Naive Bayes, Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Decision Tree methods and analyses the performance of these four classifiers on voice input from a sample of 8 patients. It also examines in detail how clinical decision support systems operate using regression trees, support vector machines, and sensitivity analysis [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Another work approaches Parkinson's disease prediction by emphasising the importance of vocal signal analysis, combining acoustic instruments with automated classifiers based on Na\u0026iuml;ve Bayes and K-Nearest Neighbours. That article investigates the speech characteristics of people from diverse sections of society to predict Parkinson's disease accurately; to recognise the disease in the voice dataset, the researchers employed a Multi-Layer Perceptron and a Logistic Regression framework [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Like the study in [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e], the researchers in [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] also focused on dopamine secretion. They used multiple acoustic devices to collect speech parameters from 50 Parkinson's patients and 50 healthy people, employed k-fold cross-validation for testing, and reported around 85% accuracy. However, the paper did not provide an experimental explanation for this outcome. Nevertheless, because so many patients were treated successfully, its findings can be considered optimistic.\u003c/p\u003e \u003cp\u003eAnother article discusses a study that diagnosed Parkinson's disease from a voice-based dataset consisting of audio recordings of 31 Parkinson's patients. 
The data were obtained from the \u0026ldquo;UCI Machine Learning repository from the Centre for Machine Learning and Intelligent Systems\u0026rdquo; and analysed using different ML algorithms, whose results were compared to identify the best model. Logistic Regression, Naive Bayes, and Decision Trees achieved prediction accuracies of around 70%, while the Support Vector Machine reached around 85% [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Another author introduced a hybrid intelligent system. It uses noise-removal methods for data pre-processing, clustering methods to help determine class labels, and prediction methods to forecast the progression of the illness. Principal component analysis (PCA) is used to determine which dimensions or features are important. The study then applies support vector regression and neuro-fuzzy inference systems. This hybrid system substantially improved on the previous prediction accuracy [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. In other work, a brain MRI dataset was used to identify imaging biomarkers. To understand disease progression and the role of biomarkers in recognising its progressive behaviour, supervised multi-layer classification, such as a Support Vector Machine, was used [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn another study, five classification techniques were applied and their performance measured; these included linear discriminant analysis (LDA), K-nearest neighbour (KNN), support vector machine (SVM), and Naive Bayes (NB). 
The authors selected the SVM with a radial basis function kernel as the best classifier and predictor [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Data mining in science and medicine can help researchers understand diseases better, devise more effective strategies to combat them, and make better use of resources. One study applied various data mining approaches to obtain robust results; building a Decision Tree classifier yielded an accuracy of 88\u0026ndash;94 per cent [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. In another system, the predictor component forecast each Parkinson's disease symptom separately, covering 15 symptoms based on the initial patient assessment and the medication taken. Prediction accuracy ranged from 57.1% to 77.4% depending on the symptom, with tremor detection achieving the highest accuracy [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Another paper discusses the rising occurrence of Parkinson's disease in people over the age of 50. It used a dataset of 30 people and four machine learning classifiers: Support Vector Machine (SVM), Random Forest, K-Nearest Neighbors (KNN), and Logistic Regression. It concluded that Random Forest is the best classifier for detecting Parkinson's disease, with an accuracy of 90\u0026ndash;95 per cent [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. One study aimed to account for intra-individual and inter-individual variability by developing a statistical progression model for Parkinson's disease from the medical records of 423 patients collected over 7 years. 
The analysis identified 8 distinct disease states related to a patient's functional impairment, tremor, neuropsychiatric attributes, and general health, which were used to develop the progression model [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. Another study, based on a dataset of 31 patients, used chi\u003csup\u003e2\u003c/sup\u003e and an Extra Trees Classifier (ETC) for feature selection and for understanding the data's dimensionality, with the aim of improving diagnostic accuracy and interpretability. It concluded that combining the robust and efficient ETC with chi\u003csup\u003e2\u003c/sup\u003e was highly beneficial [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. A further publication developed a detection system that applies a CNN-based deep learning approach to MRI scans of the human brain. A custom 3-layer CNN achieved 94\u0026ndash;98% accuracy, outperforming SVM, RVM, and other methods. The CNN's automatic feature learning surpassed traditional handcrafted features and effectively identified Parkinson's disease-related features in the MRI data [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e"},{"header":"3 Proposed Methodology","content":"\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Overview of Proposed Methodology\u003c/h2\u003e \u003cp\u003eIn this paper, we will use traditional and newer machine learning classifiers to classify an individual's health status more accurately and determine whether they have Parkinson's disease.\u003c/p\u003e \u003cp\u003eThe four common classifiers identified in the literature review will be used alongside four newly proposed classifiers, all applied to the same dataset, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. 
By comparing them, we will assess the performance of each classifier, select the best one, and then use it to make further predictions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Machine Learning Classifiers in the Proposed Methodology\u003c/h2\u003e \u003cp\u003e \u003cb\u003eLogistic Regression\u003c/b\u003e:\u003c/p\u003e \u003cp\u003eThis classifier [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] is based on a sigmoid function called the logistic function [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. The logistic function maps any real-valued input to a number between 0 and 1, interpreted as the probability of the positive class.\u003c/p\u003e \u003cp\u003e \u003cb\u003eDecision Tree\u003c/b\u003e:\u003c/p\u003e \u003cp\u003eA Decision Tree classifier splits the input space into sub-spaces according to rules on the feature values [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIt reaches a conclusion through a sequence of conditional (if-then) tests [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cb\u003eSupport Vector Machine\u003c/b\u003e:\u003c/p\u003e \u003cp\u003eSVM is a standard supervised learning algorithm used mainly for classification tasks [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eSVM uses a hyperplane as its decision boundary: data points on one side of the hyperplane are assigned to one class, and those on the other side to the other class
[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cb\u003eGaussian Na\u0026iuml;ve Bayes\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] The Naive Bayes classifier is a simple and practical classification method for building fast machine learning models that make quick predictions.\u003c/p\u003e \u003cp\u003eIt is a probabilistic classifier that assigns each object to the class with the highest posterior probability given its features. It assumes that the features are conditionally independent given the class, and the Gaussian variant further assumes that each feature is normally distributed within each class.\u003c/p\u003e \u003cp\u003e \u003cb\u003eRandom Forest\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] A parallel bagging model, this classifier builds many decision trees on random samples of the data and aggregates their predictions, reducing overfitting. At each node, it randomly selects a small subset of the features and splits on the best of them.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStochastic Gradient Descent\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] Strictly speaking, SGD is an optimization strategy rather than a specific class of machine learning model. An SGD classifier is a linear model trained by stochastic gradient descent that supports a variety of classification loss functions and penalties. When trained with the hinge loss, it is equivalent to a linear SVM. 
In simple terms, SGD is a method of training a model.\u003c/p\u003e \u003cp\u003e \u003cb\u003eXGBoost\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] XGBoost (Extreme Gradient Boosting) is a gradient-boosting framework.\u003c/p\u003e \u003cp\u003eIt constructs trees sequentially, with each new tree attempting to correct the errors of the trees before it.\u003c/p\u003e \u003cp\u003eThe final model combines the predictions of all the individual trees.\u003c/p\u003e \u003cp\u003e \u003cb\u003eAdaBoost\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] Adaptive Boosting (AdaBoost) is a classification technique that fits a sequence of weak classifiers. At each iteration, it increases the weights of misclassified samples so that later classifiers focus on them. The classifiers are then combined into a weighted vote.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4 Dataset and experiment discussion","content":"\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e4.1. About Dataset\u003c/h2\u003e \u003cp\u003eThe dataset [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] used in this research was developed by Max Little et al. at the University of Oxford in partnership with the National Centre for Voice and Speech and is available from the UCI Machine Learning Repository. Their study used speech recordings from 31 people: 23 Parkinson's disease patients (16 male, 7 female) and 8 healthy individuals (3 male, 5 female). 
Each column in the dataset table is a particular voice measure, and each row corresponds to one of the 195 voice recordings from these subjects. The subjects' ages ranged from 46 to 85 years, with a median age of 65.\u003c/p\u003e \u003cp\u003eThe primary goal of the dataset is to distinguish patients from healthy individuals by identifying changes in vowel vocalisation. This is done according to the \u0026ldquo;status\u0026rdquo; column, which is set to 0 for healthy subjects and 1 for Parkinson's disease. A total of 195 samples were recorded, with each subject recording an average of six vowel phonations for 36 seconds. A computerised speech laboratory was used to record the voice signals.\u003c/p\u003e \u003cp\u003eThe dataset has over 15 citations and 50 thousand views and has been used in multiple case studies involving machine learning classifiers. Its attributes capture the fundamental frequency, amplitude, noise levels, and non-linear dynamics of speech signals.\u003c/p\u003e \u003cp\u003eThe attributes of this dataset are described in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, and the distributions of classes, attributes, and values are shown in Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, 3 and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDataset Specification\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv 
align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSr. No\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAttributes\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDescription\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:Fo(Hz)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003emean frequency of vocal fold vibrations (average fundamental frequency)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:Fhi(Hz)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ethe highest fundamental frequency (pitch) of the voice signal\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:Flo(Hz)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ethe lowest fundamental frequency (pitch) of the voice signal\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:Jitter(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003evariations in pitch between successive vibration cycles of the voice\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003e5.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:Jitter(Abs)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:RAP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eshort-term frequency perturbations\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e7.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:PPQ\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003epitch perturbation averaged over five pitch periods\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e8.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eJitter:DDP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ethree-point measure of pitch instability\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e9.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:Shimmer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eamplitude variations between successive vocal cycles\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e10.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:Shimmer(dB)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e11.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eShimmer:APQ3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eamplitude perturbation quotients computed over different numbers of voice cycles\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e12.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eShimmer:APQ5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e13.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMDVP:APQ\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e14.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eShimmer:DDA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003edeviation of loudness over 3 voice cycles\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e15.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNHR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ethe noise-to-harmonics ratio\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e16.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHNR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eharmonics-to-noise ratio\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e17.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRPDE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003erecurrence of vocal 
patterns\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e18.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDFA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ehow fluctuations in the speech signal evolve over time\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e19.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003espread1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003evariations in fundamental frequency\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e20.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003espread2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e21.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePPE\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e22.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eD2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ethe complexity of vocal fold dynamics in the voice signal\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e4.2. 
Experimental Approach and Steps\u003c/h2\u003e \u003cp\u003eThis section describes the experimental steps.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"1\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eParkinson's disease classification implementation steps\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStep 1: Import the basic libraries for reading and visualizing the data, and the dependencies for creating machine learning models.\u003c/p\u003e \u003cp\u003eStep 2: Read the data, plot the required graphs, and remove unnecessary columns.\u003c/p\u003e \u003cp\u003eStep 3: Perform feature selection and visualize hidden patterns.\u003c/p\u003e \u003cp\u003eStep 4: Split the data into two parts (training and testing sets).\u003c/p\u003e \u003cp\u003eStep 5: Create the eight machine learning models using the loaded libraries.\u003c/p\u003e \u003cp\u003eStep 6: Evaluate each model's accuracy, recall, precision, and F1-score, and store them in a dictionary for further analysis.\u003c/p\u003e \u003cp\u003eStep 7: Run the selection algorithm to choose the best model based on accuracy and precision. If both fail to meet the set criterion, select the model with the highest F1-score.\u003c/p\u003e \u003cp\u003eStep 8: Make predictions with the selected model.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe first step was to set up the environment and import all the Python libraries required for data processing, analysis, and model building. 
These libraries included NumPy, Pandas, Scikit-learn, and Matplotlib. To take advantage of Python's modularity and promote reusability, we created a single main function that trains all the machine learning models and plots their confusion matrices to evaluate classification performance.\u003c/p\u003e \u003cp\u003eThe Parkinson's disease dataset (from the UCI Machine Learning Repository) was imported and stored in a Pandas DataFrame to preserve its integrity. An initial exploratory analysis was carried out to gain insight into the dataset's structure and distributions.\u003c/p\u003e \u003cp\u003eFeature and target sets were extracted, and scaling was applied to normalize the numerical values. The dataset was then split into training (70%) and testing (30%) sets to evaluate the models' performance.\u003c/p\u003e \u003cp\u003eIn the model selection phase, eight classification models were tested: four from existing research (Logistic Regression, Decision Tree, Support Vector Machine, and Na\u0026iuml;ve Bayes) and four proposed models (Random Forest, Stochastic Gradient Descent, XGBoost, and AdaBoost). The performance of these models was assessed using confusion matrices and evaluation metrics, allowing their accuracy in diagnosing Parkinson's disease to be compared.\u003c/p\u003e \u003cp\u003eThe best-performing model is then used for prediction: new patient data is fed into the system to determine whether the patient has Parkinson's disease.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e4.3. Evaluation metrics\u003c/h2\u003e \u003cp\u003eNext, a confusion matrix is plotted to compare classifier predictions with the actual results, as shown in Fig.\u0026nbsp;5, breaking them down into correct and incorrect predictions. This helps identify where a model makes mistakes and how it can be improved. 
The matrix counts the model's predictions on the test data as True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), given in Eq. (1). These counts are used to calculate the key measures: accuracy, precision, recall, and F1-score.

Accuracy, (TP + TN)/(TP + TN + FP + FN), is the overall proportion of correct predictions, which can be misleading on imbalanced datasets. Precision, TP/(TP + FP), measures the quality of the model's positive predictions and is crucial for minimizing false positives. Recall, TP/(TP + FN), measures the model's ability to identify all positive cases, which is essential in settings such as medical diagnosis. The F1-score, the harmonic mean of precision and recall, 2 × Precision × Recall/(Precision + Recall), gives a more balanced view of a model's performance, especially on imbalanced datasets.

5. Result analysis

This section discusses the results. Figure 6 (a) to (h) shows the confusion matrices for Logistic Regression, Decision Tree, Support Vector Machine, Gaussian Naive Bayes, Random Forest, Stochastic Gradient Descent, XGBoost, and AdaBoost, which evaluate the per-label performance of each algorithm.

The models were assessed on accuracy, precision, recall, and F1-score; Table 2 reports the results.
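As a consistency check on these definitions, the Table 2 figures can be recomputed from raw confusion-matrix counts. The counts below are not reported in the text; they were back-calculated from Table 2 under the assumption of a 59-sample test set (every accuracy in the table is a multiple of 1/59, e.g. 0.847458 = 50/59), and the helper name `metrics` is hypothetical.

```python
# Confusion-matrix counts (TP, FP, FN, TN) back-calculated from Table 2,
# assuming a 59-sample test set (30% of 195 records). These counts are an
# inference from the reported metrics, not values published by the authors.
COUNTS = {
    "Logistic Regression":         (44, 9, 0, 6),
    "Decision Tree":               (41, 6, 3, 9),
    "Support Vector Machine":      (44, 10, 0, 5),
    "Gaussian Naive Bayes":        (34, 4, 10, 11),
    "Random Forest":               (44, 9, 0, 6),
    "Stochastic Gradient Descent": (43, 10, 1, 5),
    "XGBoost":                     (43, 7, 1, 8),
    "AdaBoost":                    (42, 6, 2, 9),
}


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from raw confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        acc, prec, rec, f1 = metrics(*counts)
        print(f"{name:<28} {acc:.6f} {prec:.6f} {rec:.6f} {f1:.6f}")
```

Under this reading the test set contains 44 Parkinson's and 15 healthy recordings, and Logistic Regression and Random Forest produce identical confusion matrices, which is why their rows in Table 2 are identical.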
Figure 7 plots accuracy against precision for all models, and Fig. 8 compares the models.

Table 2 Complete model comparison

Model                       | Accuracy | Precision | Recall   | F1-score
Logistic Regression         | 0.847458 | 0.830189  | 1        | 0.907216
Decision Tree               | 0.847458 | 0.87234   | 0.931818 | 0.901099
Support Vector Machine      | 0.830508 | 0.814815  | 1        | 0.897959
Gaussian Naive Bayes        | 0.762712 | 0.894737  | 0.772727 | 0.829268
Random Forest               | 0.847458 | 0.830189  | 1        | 0.907216
Stochastic Gradient Descent | 0.813559 | 0.811321  | 0.977273 | 0.886598
XGBoost                     | 0.864407 | 0.86      | 0.977273 | 0.914894
AdaBoost                    | 0.864407 | 0.875     | 0.954545 | 0.913043

6. Interpretation and Discussion

The results indicate that the ensemble classifiers perform better than the base classifiers. Among the four base classifiers taken from the literature for comparison, Logistic Regression and Decision Tree achieved the highest accuracy (0.847), Support Vector Machine matched Logistic Regression's perfect recall (1.0), and Naive Bayes had the lowest accuracy, recall, and F1-score of all eight classifiers.

The proposed classifiers show a clear improvement: their average accuracy is 0.847, compared with 0.822 for the four base classifiers. Random Forest achieved perfect recall (sensitivity) of 1.0, tied with Logistic Regression and Support Vector Machine for the highest among all eight classifiers.

Bagging, a parallel ensemble method, was applied through Random Forest, and boosting, a sequential ensemble method, through XGBoost and AdaBoost. XGBoost performs best overall: it ties with AdaBoost for the highest accuracy (0.864) and has the highest F1-score (0.915), narrowly ahead of AdaBoost (0.913) and ahead of Support Vector Machine and the remaining six classifiers.

A key benefit of ensemble classifiers is that combining several base learners reduces the errors of any single model: bagging mainly reduces variance, and boosting mainly reduces bias. Using the F1-score as the standard metric for evaluating classifier performance accounts for both recall and precision. Unlike accuracy, it yields a more balanced assessment because it penalizes both false positives and false negatives.

7. Conclusion

Parkinson's disease is a serious condition for which there is currently no cure. Because it affects the control of body movement, it also affects speech.
The study aims to develop a method of diagnosing Parkinson's disease that enables prompt measures to limit, or perhaps prevent, the disease's impact on the entire body before it is too late.

This study also aims to provide a foundation for implementing newer models and machine learning classifiers to detect this or related diseases in the healthcare domain. We demonstrate the power of various machine learning classifiers and show that integrating several heterogeneous classifiers can yield higher accuracy. Our approach therefore highlights the need for early diagnosis and prediction of Parkinson's disease so that patients can receive treatment and support as soon as possible.

In most cases, our observations show that ensemble classifiers outperform base classifiers. Among the base classifiers, Logistic Regression performs well. Across all models, XGBoost produces the best overall result (the highest F1-score), while Support Vector Machine, like Logistic Regression and Random Forest, achieves perfect recall. When the F1-score is used as the standard evaluation metric, boosting produces the best overall outcome. These ML models can therefore classify the dataset effectively. Our findings also highlight the most significant characteristics to consider when making predictions.

8. Future scope

The scope of this study is limited to machine learning classification. It provides a template that other researchers can use for comparisons and implementations.
This research can be further improved by incorporating more advanced methods such as artificial neural networks (ANNs) and deep learning, together with parameter tuning to obtain the best results.

Declarations

Author Contribution
A.V.: Conceptualization, data curation, methodology, plotting of model comparisons. S.D.: Experiments, investigation, model validation, technical assistance, formal analysis. A.B.: Review and final editing.

Data Availability
The dataset for this research is available at https://www.kaggle.com/datasets/vikasukani/parkinsons-disease-data-set

References
1. Sveinbjornsdottir S (2016) The clinical symptoms of Parkinson's disease. J Neurochem 139(1):318–324
2. Sakar BE et al (2013) Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J Biomedical Health Inf 17:828–834
3. Hadjahamadi AH, Askari TJ (2012) A Detection Support System for Parkinson's Disease Diagnosis Using Classification and Regression Tree. J Math Comput Sci 4:257–263
4. Alemami Y, Almazaydeh L (2014) Detecting of Parkinson Disease through Voice Signal Features. Journal of American Science
5. Olanrewaju RF, Sahari NS, Musa AA, Hakiem N (2014) Application of Neural Networks in Early Detection and Diagnosis of Parkinson's Disease. International Conference on Cyber and IT Service Management
6. Marar S, Swain D, Hiwarkar V, Motwani N, Awari A (2018) Predicting the occurrence of Parkinson's Disease using various Classification Models. 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), pp 1–5. 10.1109/ICACAT.2018.8933579
7. Nilashi M, Ibrahim O, Ahani A (2016) Accuracy Improvement for Predicting Parkinson's Disease Progression. Scientific Reports
8. Salvatore C, Cerasa A, Castiglioni I, Gallivanone F, Augimeri A, Lopez M, Quattrone A (2014) Machine learning on brain MRI data for differential diagnosis of Parkinson's disease and Progressive Supranuclear Palsy. J Neurosci Methods 222:230–237
9. Abedin MM, Maniruzzaman M, Faisal Ahmed NAM, Ahammed B, Ali M (2019) Classification and Prediction of Parkinson Disease: A Machine Learning Approach. 7th Int. Conf. on Data Science & SDGs, EC-012, December 18–19, pp 75–78
10. Zaveri RR, Chawan PM. Prediction of Parkinson's Disease using Data Mining: A Survey. International Research Journal of Engineering and Technology (IRJET) 07(10), e-ISSN: 2395-0056, p-ISSN: 2395-0072
11. Dragana M et al (2016) Machine Learning and Data Mining Methods for Managing Parkinson's Disease. LNAI 9605, pp 209–220
12. Stylianou N, Akbarov A, Kontopantelis E, Buchan I, Dunn KW (2015) Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches. Burns 41(5):925–934
13. Mashat AF, Fouad MM, Philip SY, Gharib TF (2012) A decision tree classification model for university admission system. Editorial Preface 3(10)
14. Alemami Y, Almazaydeh L (2014) Detecting of Parkinson Disease through Voice Signal Features. Journal of American Science
15. Kumari GPA (2012) Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
16. Almustafa KM (2020) Classification of epileptic seizure dataset using different machine learning algorithms. Informatics in Medicine Unlocked 21:100444, ISSN 2352-9148
17. Freund Y, Schapire R, Abe N (1999) A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 14:771–780
18. Emon MU et al (2020) Performance Analysis of Machine Learning Approaches in Stroke Prediction. 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE
19. Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM (2007) Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection. BioMedical Engineering OnLine, June 2007. https://archive.ics.uci.edu/ml/datasets/parkinsons
20. Little MA, McSharry PE, Hunter EJ, Ramig LO (2008) Suitability of dysphonia measurements for telemonitoring of Parkinson's Disease. IEEE Trans Biomedical Eng. https://archive.ics.uci.edu/ml/datasets/parkinsons
21. Govindu A, Palwe S (2023) Early detection of Parkinson's disease using machine learning. Procedia Comput Sci 218, ISSN 1877-0509
22. Severson KA et al (2021) Discovery of Parkinson's disease states and disease progression modelling: a longitudinal data study using machine learning. The Lancet Digital Health 3(9):e555–e564
23. Yadav S, Singh MK, Pal S (2023) Artificial Intelligence Model for Parkinson Disease Detection Using Machine Learning Algorithms. Biomedical Mater Devices 1:899–911
24. Sangeetha S, Baskar K, Kalaivaani PCD, Kumaravel T (2023) Deep Learning-based Early Parkinson's Disease Detection from Brain MRI Image. 2023 7th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India
25. Patra AK (2019) Prediction of Parkinson's disease using Ensemble Machine Learning classification from acoustic analysis. J Phys: Conf Ser 1372:012041

Keywords: Neurons, Alzheimer's disease, Parkinson's disease, Parkinson's patient, Machine Learning
Version 1 posted on Research Square on April 10th, 2025.

