Multi-omics machine learning classifier and blood transcriptomic signature of Parkinson’s disease

Xianjun Dong, Ruifeng Hu, Ruoxuan Wang, Jie Yuan, Zechuan Lin, and 5 more

Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6837659/v1
This work is licensed under a CC BY 4.0 License. Version 1 posted.

Abstract

Early diagnosis and biomarker discovery to bolster the therapeutic pipeline for Parkinson’s disease (PD) are urgently needed. In this study, we leverage the large-scale, whole-blood total RNA and DNA sequencing data from the Accelerating Medicines Partnership in Parkinson’s Disease (AMP PD) program to identify PD-associated RNAs, including both known genes and novel circular RNAs (circRNA) and enhancer RNAs (eRNAs).
Initially, 874 known genes, 783 eRNAs, and 35 circRNAs were found differentially expressed in PD blood in the PPMI cohort (FDR < 0.05). Based on these findings, a novel multi-omics machine learning model was built to predict PD diagnosis with high performance (AUC = 0.89), which was superior to previous models. We further replicated this discovery in an independent PDBP/BioFIND cohort and confirmed 1,111 significant marker genes, including 491 known genes, 599 eRNAs, and 21 circRNAs. Functional enrichment analysis showed that the PD-associated genes are involved in neutrophil activation and degranulation, as well as the TNF-α signaling pathway. By comparing the PD-associated genes in blood with those in human brain dopamine neurons in our BRAINcode cohort, we found only 44 genes (9% of the known genes) showing significant changes in the same direction in both PD brain neurons and PD blood, among which are the neuroinflammation-associated genes IKBIP, CXCR2, and NFKBIB. Our findings demonstrated consistently lower SNCA mRNA levels and increased expression of the VDR gene in the blood of early-stage PD patients. In summary, this study provides a generally useful computational framework for further biomarker development and early disease prediction. We also delineate a wide spectrum of the known and novel RNAs linked to PD that are detectable in circulating blood cells in a harmonized, large-scale dataset.

Biological sciences/Computational biology and bioinformatics/Computational neuroscience
Biological sciences/Neuroscience/Computational neuroscience

Main

Parkinson’s disease (PD) is a progressive neurodegenerative disease, with an estimated global number of patients exceeding 13 million by 2040 1,2. PD is thought to be caused by combinatorial effects of environmental, epigenetic, and genetic contributions that exert many of their effects through cis- and trans-acting regulation of transcript abundance 3–6.
Progressive loss of dopamine neurons and an increasing burden of α-synuclein-positive neuronal inclusions (the so-called Lewy bodies) are hallmarks of PD 7,8 . Once PD neuropathology crosses a clinically relevant threshold, movement becomes relentlessly more impaired in PD patients. Biomarkers for early detection and quantitative tracking of disease progression are currently lacking 9,10 . By the time a patient is diagnosed with PD based on today’s clinical criteria (e.g., resting tremor, slow movements, and stiffness), up to 70% of vulnerable dopaminergic neurons have been lost 11 . Therefore, developing a panel of biomarkers for early and accurate diagnosis is urgently needed 12,13 . Additionally, PD is a slowly progressive and complex genetic disease that likely results from multiple genetic risk variants, each conferring small increases in susceptibility. PD GWASs have revealed thousands of genetic variants whose mutations are associated with disease risk 14 . However, the genetic germline is static and thus cannot be used quantitatively to track disease progression over time using serial measurements. While the strategy of constructing an aggregate measure from multiple individual markers has been fruitful in genetic studies of PD risk, the use of markers spanning multiple modalities (e.g., genetic, transcriptomic, clinical, and imaging-based markers) is needed to maximize its utility 15 , as it is unlikely that a single biomarker will adequately capture the genetic and environmental heterogeneity of PD. Individuals with a high risk of developing PD may show developmental potentials in their transcriptomics profiles before having clinical symptoms, even if they may pass the clinical PD tests 16–18 . Limited sample size remains one of the main pitfalls in current biomarker studies. A systematic review of published studies using α-synuclein species as a PD biomarker found that 84% of studies included 100 PD patients or fewer 19 . 
Previous efforts have made several individual cohorts available to study PD biomarkers, including the Michael J. Fox Foundation (MJFF) Parkinson’s Progression Markers Initiative (PPMI) 20,21 , the NINDS Parkinson's Disease Biomarkers Program (PDBP) 22 , and the MJFF BioFIND study 23 . All data from these efforts are now integrated into the Accelerating Medicines Partnership in Parkinson’s Disease (AMP PD) program, which to date has generated the largest PD longitudinal RNA sequencing data for about 8,500 samples covering more than 3,200 participants with deep clinical phenotype data. A study by Craig et al . 24 conducted an overview analysis of the PPMI dataset, finding that the neutrophil cell abundance is higher in patients with PD, while lymphocyte cell abundance is lower in patients with PD. They used all time-point samples in their cross-sectional analysis and treated the multiple visits from one subject as independent data points, which may not be representative of early time-point samples. In another study by Makarious et al ., the authors built multi-modal machine learning models, which have good performances; however, the case and control samples are unbalanced, and the models achieved a modest, balanced accuracy value of 0.68 13 . In our work, we leveraged the large-scale, total RNA sequencing baseline data (visit month=0) from the AMP PD to find the most significant differentially expressed RNAs from both known genes (e.g., those annotated mRNAs or lncRNAs in GENCODE) and novel, non-coding RNAs which include circular RNAs (circRNAs) and enhancer RNAs (eRNAs) between PD patients and healthy controls in a defined discovery dataset. We then built multi-gene classifiers using a selective set of PD-associated marker genes, as well as an integrative multi-omics classifier using gene expression data, genetic variant data, and clinical information to achieve better performance. 
In our study, all models were trained on a discovery dataset and tested on an independent replication dataset to evaluate their respective performance. We also validated PD-associated RNAs in an independent cohort with a secondary method. Enrichment analysis was conducted to find the distinguishable functional pathways or gene ontology terms between PD patients and healthy controls using the replicated differentially expressed protein-coding genes. Finally, we developed and tested innovative multi-omics classifiers which provided a reusable computational framework for PD diagnosis in clinical practices. Results High-quality, large-scale multi-omics cohort for Parkinson’s disease This study used data from the AMP PD program, which includes eight cohorts from the PPMI, PDBP, and BioFIND study, with PAXgene-based total RNA-seq data, whole genome sequences, and clinical data available for 3,274 participants (release v2). The PPMI is a longitudinal observational study of 1,923 participants with PD, or at risk for PD, and healthy volunteers, thereby contributing comprehensive clinical and imaging data and biological samples from 33 clinical sites around the world. In the PPMI study, patients in the PD cohort have been diagnosed within two years of enrollment. Importantly, they are de novo patients and as such, not yet taking PD medications, which may confound biomarker analyses. The PDBP is another PD program that collects standardized longitudinal clinical data and biospecimens across all stages of PD. The program has 1,604 participants and was developed to accelerate the discovery of promising new diagnostic and progression biomarkers for PD. The PDBP supports basic, translational, and clinical research through hypothesis testing, target and pathway discovery, biomarker development, and disease modeling. 
The Fox Investigation for New Discovery of Biomarkers (BioFIND) is an observational clinical study designed to discover and verify biomarkers of PD, which includes blood and cerebrospinal fluid, in 118 well-defined, moderately advanced people with PD and 88 control volunteers. (We obtained permission to access the AMP PD datasets on Google Cloud Storage for the analysis. All information on data collection and data processing procedures from the AMP PD program can be found at .) In our study, we leveraged primarily the baseline data (visit month=0) to systematically delineate genes associated with PD diagnosis at an early stage. The PPMI cohort was used as the discovery population and PDBP/BioFIND cohorts were defined as the replication population ( Fig. 1 ). Most of the PD cases in PPMI were recruited when they were newly diagnosed, while most PD cases in the replication dataset were moderate and advanced ( Extended Data Fig. 1 ). We wished to establish a set of marker genes exhibiting consistent changes from an early stage of PD and so we intentionally began with a study of individuals recently diagnosed with PD, and then subsequently validated our findings in advanced PD patients. We applied stringent, quality check standards to the samples and the genes (see Methods for details). After the stepwise sample filtrations, 551 PD and 437 control samples remained in the discovery dataset ( Extended Data Fig. 2 ), and 760 PD case samples and 452 control samples were used for analysis in the replication dataset ( Extended Data Fig. 3 ). After removing low abundance and low variance genes, 35,900 genes remained in the discovery dataset. The sex check shows high consistency between clinically reported sex and genetically determined sex ( Extended Data Fig. 4 a ). 
The case/control sample distributions on the plates showed that the percentages of healthy samples on some plates (P201, P207, P208, P209, P210, P211, P212, P213, P214, and P215) are higher than on others in the discovery dataset, but there was no extreme plate outlier (Extended Data Fig. 4b). Therefore, we included the plate as a covariate in our analysis design. Based on our data quality assessments, there were no extreme outliers, so we did not remove any additional samples. For the replication dataset, 31,935 genes were kept. The scatter plot demonstrated the consistency between clinically reported sex and genetically determined sex (Extended Data Fig. 4c). The case/control sample distributions on the plates were more even across all plates (Extended Data Fig. 4d). Finally, 988 baseline samples were included in the discovery dataset and 1,212 baseline samples were included in the replication dataset for all the downstream analyses. The basic clinical characteristics of the datasets after filtration are summarized in Table 1.

Over 1,500 known and novel RNAs were found to be differentially expressed in PD blood samples

Through the differential expression (DE) analysis of the GENCODE-annotated human genes, 874 known genes were significantly differentially expressed in PD in the discovery dataset with a Benjamini-Hochberg adjusted p-value of < 0.05 (Fig. 1, Extended Data Table 1). In this study, we also identified a total of 26,035 candidate eRNAs using our in-house scripts 6, all of which surpassed the established low expression filtration threshold (only eRNAs with read count > 5 in > 10% of samples were kept). Out of these candidate eRNAs, 783 exhibited differential expression. Meanwhile, we identified 441,811 circRNAs in the blood samples.
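The two filters named above, the low-expression threshold (read count > 5 in > 10% of samples) and the Benjamini-Hochberg adjustment behind the 0.05 cutoff, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' pipeline; the function names and default arguments are hypothetical, and the actual DE tool and its internal handling are not specified in this section.

```python
def passes_expression_filter(counts, min_count=5, min_fraction=0.10):
    """Keep a feature when more than `min_fraction` of samples have a read
    count above `min_count` (defaults mirror the eRNA rule in the text)."""
    n_pass = sum(1 for c in counts if c > min_count)
    return n_pass > min_fraction * len(counts)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_features(pvalues_by_feature, alpha=0.05):
    """Names of features whose adjusted p-value falls below `alpha`."""
    names = list(pvalues_by_feature)
    adjusted = benjamini_hochberg([pvalues_by_feature[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q < alpha]
```

Calling `significant_features` on a mapping from feature name to raw p-value returns the features that would be reported at FDR < 0.05 under these assumptions.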
After applying a filtration process requiring at least two read counts in 10% of the samples, 3,052 circRNAs remained and were subjected to DE analysis, which has resulted in a discovery of 35 DE circRNAs ( Extended Data Table 2 ). In total, we found 1,692 known and novel RNAs differentially expressed in PD blood. Performance of PD diagnosis classifier models Utilizing the identified differentially expressed genes, eRNAs, and circRNAs, we constructed machine learning models for PD diagnosis classification. The PPMI samples (discovery population) were randomly split into a training set (80%) and a validation set (20%) 100 times, so that we had 100 pairs of training-validation datasets. The training set was used to build the classifiers which were validated on the corresponding validation set. All the trained models were tested on the independent replication cohorts. Different machine learning classifiers (n = 10) were trained and compared (see Methods for details). Since there were too many features, and to avoid overfitting problems during the model training, we used LASSO (least absolute shrinkage and selection operator) for feature selection. The PD diagnosis classifiers were first constructed using the 874 DEGs from PPMI with FDR < 0.05. After feature selection, there are 23 to 36 DEGs that were selected as the final predictors from each random split for model training and validation. Among all the 10 tested algorithms, we observed that the support vector machine with RBF kernel (SVM_rbf) had the best performances among others regarding the average area under the receiver operating characteristic curve (AUROC) values and the area under the precision-recall curve (AUPRC) values on the PDBP/BioFIND testing dataset ( Fig. 2a) . 
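The repeated 80/20 splitting of the discovery cohort described above can be sketched as follows. This is a hedged illustration using only the Python standard library; the function name, the seed, and the use of simple (non-stratified) shuffling are assumptions, since the text does not say whether the splits were stratified by diagnosis.

```python
import random


def repeated_splits(sample_ids, n_repeats=100, train_fraction=0.8, seed=0):
    """Return `n_repeats` (training, validation) pairs of sample IDs.

    Each pair is an independent random partition of `sample_ids`; the seed
    only makes this sketch reproducible and is not taken from the paper.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * len(sample_ids))
    splits = []
    for _ in range(n_repeats):
        shuffled = list(sample_ids)
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits
```

In the workflow described in the text, feature selection and model fitting would be repeated inside each training set, with the matching validation set used only for evaluation, so that the selected features never see the withheld samples.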
The mean and standard deviation of the AUROC and AUPRC values were 0.72 (0.03) and 0.72 (0.04), respectively, on the PPMI 20% withheld validation datasets, and 0.64 (0.01) and 0.74 (0.01) when the model was applied to the PDBP/BioFIND testing dataset ( Table 2, Extended Data Table 3 ). After adding the polygenic risk score (PRS) as the genetics feature to the selected DEGs in each random split, the logistic regression (LR) model demonstrated the best performance as the AUROC and AUPRC were improved to 0.75 (0.03) and 0.78 (0.03), respectively, on the validation datasets, and 0.70 (0.01) and 0.79 (0.01) on the independent testing dataset ( Fig. 2B, Table 2 ). With the clinical data (UPSIT-smell test score, sex, and age) added, the support vector machine (SVM) emerged as the optimal model. The AUROC and AUPRC values were raised to 0.91(0.02) and 0.92(0.02) on the validation dataset, and 0.89 (0.01) and 0.93 (0.01) on the testing dataset ( Fig. 2c, Table 2 ). Fig. 2D and Fig. 2E show the progressive improvement in model performance with stepwise addition of the genetics and clinical features on the validation and testing dataset. The comparisons of PD prediction potentials using DEGs, DE eRNAs, or DE circRNAs revealed that DE eRNA models exhibited comparable performances to the DE circRNA models, but their predictive powers fall below those of DEG models ( Extended Data Fig. 5, Extended Data Table 3 ). Moreover, combining all DE eRNAs, DE circRNAs, and DEGs did not contribute to an enhancement in model performance. Further exploration into the PD prediction abilities of PRS or clinical data demonstrated that the best PRS-based model displayed similar AUROC and AUPRC values to the DEG+PRS model on the testing dataset ( Table 2, Extended Data Table 3) . However, it exhibited lower precision, balanced accuracy values, and notably low specificity values, suggesting a tendency for PRS-based models to yield false positives. 
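To make the relationship between the reported metrics concrete, the sketch below derives accuracy, precision, sensitivity, specificity, and balanced accuracy from confusion-matrix counts. The function name is hypothetical and the counts in the usage note are invented for illustration; they are not results from this study.

```python
def classification_summary(tp, fp, tn, fn):
    """Summarise a binary classifier from its confusion-matrix counts.

    Assumes at least one actual positive, one actual negative, and one
    predicted positive, so that no denominator is zero.
    """
    sensitivity = tp / (tp + fn)   # fraction of PD cases called PD
    specificity = tn / (tn + fp)   # fraction of controls called control
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

For example, `classification_summary(tp=90, fp=30, tn=20, fn=10)` describes a model that calls most samples positive: sensitivity is high while specificity is low, so balanced accuracy falls below plain accuracy. This is the pattern the text attributes to PRS-based models.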
Additionally, PRS-based models had inferior performance on validation datasets compared to DEGs-based models. Therefore, DEGs proved effective in compensating for the shortcomings of PRS in predicting PD samples. Clinical data exhibited superior capabilities in distinguishing PD cases from healthy controls, although the performance values slightly lagged our final multi-omics model ( Table 2, Extended Data Table 3) . Delving into the selected features in each split, a total of 99 genes were chosen, with 12 of them recurrently selected as predictive features more than 80 times out of 100 selections ( Fig. 2f , Extended Data Table 3 ). The remarkable consistency in feature selection, coupled with low standard deviations in performance values, affirmed the high stability of our model. Looking into those selected genes, H19 is a long non-coding RNA, which was selected 100 times. H19 has been reported to be associated with PD progression and correlated with susceptibility to various CNS disorders 25,26 . We also found that 7 neutrophil genes were selected as the predictor during the 100 splits, which include PREX1 , SLCO4C1 , CXCR2 , DNAJC3 , CD93 , LAMP1 , and HEBP2 . The LAMP1 was recurrently selected 98 times. PREX1 and CXCR2 were also the two genes that were replicated in brain data. Above all, our final multi-omics model outperformed the recent publication with similar models with respect to accuracy (0.82 vs 0.75), balanced accuracy (0.83 vs 0.68), and AUROC (0.89 vs 0.85) on the testing dataset ( Table 2 ). We observed better performances than in the previous report, maybe because we included other genes in our DEGs as features in our models, in addition to the protein-coding genes used in the previous study. Additionally, we calculated the PRS using the 7,057 PD-associated significant variants instead of only the 90 SNPs that were used in the previously published paper. 
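For readers unfamiliar with polygenic risk scores, the standard additive form is a weighted sum of effect-allele dosages across variants. The sketch below shows that form only; it is not the authors' PRS software, and the variant identifiers, weights, and the choice to treat missing genotypes as zero dosage are all assumptions made for illustration.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over variants of (effect size x effect-allele dosage).

    dosages:      variant ID -> number of effect alleles carried (0, 1, or 2)
    effect_sizes: variant ID -> per-allele weight (e.g. a log odds ratio)
    Variants without a genotype contribute 0 here (a simplifying assumption).
    """
    return sum(weight * dosages.get(variant, 0.0)
               for variant, weight in effect_sizes.items())
```

With this form, moving from 90 SNPs to 7,057 variants simply means a longer `effect_sizes` table; the score remains a single number per participant that can be added to the expression features as one extra predictor.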
In addition, Makarious et al. did not address the imbalance between positive and negative samples when training their model, which may also explain their low specificity value. Our performance estimates are more robust because we used balanced training and testing datasets.

Replication of discovered DE RNAs

To confirm the discovered DE genes, DE eRNAs, and DE circRNAs, DE analysis was also performed on the replication dataset. Of the 874 DEGs from the discovery dataset, 502 genes were replicated in the replication dataset with a nominal p-value < 0.05 (Fig. 3a), of which 97.8% (491) changed in the same direction in both the discovery and replication datasets (Fig. 3b; details are available in Extended Data Table 1). Of the 783 initially discovered DE eRNAs, 599 were replicated with the same direction of change in the combined PDBP/BioFIND replication dataset. Among the replicated DE eRNAs, 396 were up-regulated and 203 were down-regulated (Extended Data Table 2). Regarding circRNAs, 21 of the initially discovered DE circRNAs were replicated in the PDBP/BioFIND dataset. Among these replicated DE circRNAs, 15 were up-regulated and 6 were down-regulated (Extended Data Table 2). The most significant DE eRNA in the discovery dataset is chr5_10486550_10486710_plus (this location falls in the region of the gene LINC02212), and the most significant DE circRNA in the discovery dataset is chr1_17341942_17342402_plus (this location falls in an intron of the gene PADI4). We then further investigated the host genes of these replicated DE eRNAs and DE circRNAs. The 599 DE eRNAs and 21 DE circRNAs were mapped to 306 host genes (289 eRNA host genes, 18 circRNA host genes, and 1 shared host gene (ENSG00000159339, PADI4)).
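The replication rule used above for known genes (nominal p-value < 0.05 in the replication dataset, then the same direction of change as in discovery) can be written down as a small filter. This is a sketch under stated assumptions: the function name and the input layout (dictionaries keyed by feature name) are hypothetical, and the feature names in the usage note are placeholders.

```python
def replicated_features(discovery_lfc, replication_stats, alpha=0.05):
    """Split discovery hits into nominally replicated and direction-concordant.

    discovery_lfc:     feature -> log2 fold change in the discovery dataset
                       (assumed to contain only significant discovery hits)
    replication_stats: feature -> (log2 fold change, nominal p-value)
    Returns (nominal, concordant); `concordant` is a subset of `nominal`.
    """
    nominal, concordant = [], []
    for feature, disc_lfc in discovery_lfc.items():
        if feature not in replication_stats:
            continue  # feature was filtered out of the replication dataset
        rep_lfc, rep_p = replication_stats[feature]
        if rep_p < alpha:
            nominal.append(feature)
            # Same sign of log2 fold change means the same direction of change.
            if disc_lfc * rep_lfc > 0:
                concordant.append(feature)
    return nominal, concordant
```

For instance, with `discovery_lfc = {"gene_a": 0.4, "gene_b": -0.3}` and `replication_stats = {"gene_a": (0.2, 0.01), "gene_b": (0.1, 0.03)}`, both features pass the nominal threshold but only `gene_a` is direction-concordant. In the text, the corresponding counts are 502 and 491.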
Although their host genes did not share the same enriched GO terms with DEGs, we noticed several PD-associated genes or genes that are involved in neutrophil activation in the host gene list. SPI1 is one of the member genes of GO:0042119 (neutrophil activation). It has been reported that SPI1 plays a crucial role in the regulation of the genes relevant to specialized functions of microglia; therefore, dysregulation of SPI1 might contribute to the establishment or development of PD due to the accumulation of activated microglia 27–29. PADI4 is a gene that can positively regulate TNF-α and CCL2, which can lead to the development of neuroinflammation 30,31. PADI2 coordinates with PADI4 to regulate the assembly of the NLRP3 inflammasome to promote IL-1β release. Research also showed that PADI4 can participate in all aspects of neutrophil extracellular traps (NETs) 32. Moreover, X-linked dystonia Parkinson’s disease is aggravated by increased levels of PADI2, PADI4, and inflammation in the prefrontal cortex and its derived fibroblasts 33. The circRNA host gene RHBDD1, also named RHBDL4, has been implicated in a variety of diseases including Alzheimer’s and Parkinson's disease; it can cleave amyloid precursor protein inside the cell, causing it to bypass amyloidogenic processing and leading to reduced Aβ levels 34. This gene had a significant negative log2 fold change in PD patients compared to the healthy controls in both discovery and replication cohorts. Among the 306 host genes, 53 genes were shared with replicated DEGs. The DEGs IKBIP, LAMP2, and VDR, which are associated with PD as mentioned above, were also among the host genes.

Neutrophil activation and immune pathways were upregulated in PD patient blood

The over-representation enrichment analysis was conducted on GO and WikiPathway terms using the 491 replicated genes with the same change directions.
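Over-representation analysis is commonly based on a one-sided hypergeometric test: given a background of genes, how surprising is the overlap between the selected gene list and a term's members? The sketch below shows that test in plain Python. It is an illustration of the general technique, not the specific enrichment tool or background set used in this study, which this section does not name; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(overlap >= n_overlap).

    n_background: genes in the background universe
    n_term:       background genes annotated to the term
    n_selected:   genes in the tested list (e.g. replicated DEGs)
    n_overlap:    tested genes that are annotated to the term
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Each tested term yields one p-value; the resulting set of p-values would then be adjusted for multiple testing before applying an FDR < 0.05 cutoff to the terms.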
There are five significantly enriched GO biological processes (GO-BP) with FDR < 0.05, as well as five significantly enriched GO cellular component (GO-CC) terms ( Fig. 4 , Extended Data Table 4 ). Both the enriched GO biological processes and cellular components revealed that neutrophil activation and neutrophil degranulation are the key messages derived from the DEGs. Additionally, in the enriched GO-BP terms, we found that the genes were also involved in immune-related pathways, which was also confirmed by the enriched WikiPathways results ( Fig. 4b , Extended Data Table 4 ). By looking into the changes of the leading genes enriched in the neutrophil activation and neutrophil degranulation biological processes, we found that among the 29 DEGs involved in these two GO-BPs, all but one gene were upregulated in PD case samples in both the discovery and the replication datasets ( Fig. 4c , Extended Data Table 4 ). The results indicate that neutrophil activation and neutrophil degranulation was highly regulated in PD patients. Furthermore, the highly expressed neutrophil genes in up-regulation of the neutrophil activation and neutrophil degranulation pathway can serve as biomarkers for PD early diagnosis. We then asked on which human tissue and cell types these marker genes might manifest their impacts. By assessing their cell-type-specificity in 1335 curated single-cell and tissue types with WebCSEA, we found that the 491 DEGs were highly enriched in the blood neutrophil cells of the lymphatic organ system ( Fig. 4d ). This further suggests that the dysregulation of neutrophil cells could be a marker of early PD diagnosis. Among these 29 differentially expressed leading genes in neutrophil activation and neutrophil degranulation biological processes, several have been studied in the context of PD. A pathogenic mutation (p.N855S) in DNAJC13 was linked to autosomal dominant Lewy body PD 35–37 . 
APAF1 (apoptotic peptidase activating factor) was reported as a potential drug target for neurodegenerative diseases and APAF1 dominant negative inhibitor can prevent MPTP toxicity as antiapoptotic gene therapy for Parkinson's disease 38 . FCGR2A and FCGR2B are well known to play a role in modulating inflammatory responses and to be involved in phagocytosis. Two recent causality analysis of cerebrospinal fluid and blood proteomics showed that FCGR2A and FCGR2B are among the top causal proteins to PD risk 39,40 . While there is not much evidence for FCGR2A and FCGR2B ’s role in PD blood, Choi et al. showed that FCGR2B can function as a receptor for α-syn fibrils and regulate prion-like propagation of α-synuclein in neurons, and the FCGR2B-SHP-1/-2 signaling pathway may be a therapeutic target for the progression of PD 41 . Lastly, CD93 participates in pathophysiological processes of central nervous system inflammation 42 . We further validated some of the marker genes with a second digital expression NanoString technology in blood in an independent cohort of the Harvard Biomarker Study (HBS) ( Fig. 5, see Methods). SNCA is considered as the major causative gene involved in the onset of PD, both from a genetics and protein level 43 . We observed a reduction in SNCA RNA expression in PD samples compared to healthy controls across multiple cohorts, including PPMI, PDBP/BioFIND, BRAINcode, and HBS, using samples from both blood and brain on various platforms, such as RNAseq and NanoString. Our findings consistently showed lower SNCA mRNA levels in the blood of early-stage PD patients, which correlated with brain samples and were consistent with findings in an independent cohort 44,45 . Additionally, we replicated pathological levels of VDR and RANBP10 46 . 
Moreover, vitamin D is associated with neuroprotection in animal models of PD 47 and we previously reported reduced levels of the vitamin D receptor (VDR) in an unbiased microarray screen of PD blood samples 46 and found a 25-hydroxy-vitamin D deficiency in 17.6% of PD patients 47 . For these previously identified candidate biomarker RNAs of PD, we observed consistent changes in direction between PD and healthy control samples in both the NanoString (HBS) and RNA-seq data (PPMI and PDBP/BioFIND, Fig. 5 ). Additional to the dichotomic analysis between the PD and control groups, we further tested if any changes in gene expression are associated with the PD motor severity which is indicated by the MDS-UPDRS part III summary score. Our results are shown in the Extended Data Table 5 . We found that 2,236 genes and 4,045 genes were significantly associated (adjust p < 0.05) with the MDS-UPDRS part III summary scores in the discovery and replication datasets, respectively. Among these genes, 1,636 genes were shared by both the discovery and the replication datasets with the same change directions. Functional enrichment analysis conducted on the 1,636 replicated genes showed that the “neutrophil activation”, “neutrophil activation involved in immune response”, “neutrophil-mediated immunity”, and “neutrophil degranulation” are the top enriched GO-BP terms. This is consistent with our conclusion from the main dichotomic analysis between the PD cases and healthy controls. These findings suggest that neutrophil degranulation is also a potential biomarker in the blood for PD motor severity. Replicating blood-based marker genes in brain neurons Next, we wondered if any of the marker genes we detected in blood are also presented in brain neurons, as the neuronal RNAs could pass through the blood-brain barrier via mediators (e.g., exosomes) and be detectable in the blood stream. 
By analyzing the total RNAseq data of dopamine neurons that was laser-captured from >100 human brain samples in the BRAINcode cohort 6 , we identified 575 known genes that were significantly differentially expressed in PD (FDR < 0.05). Compared with the 491 blood marker genes consistently changed in both discovery and replication datasets, 44 genes were further confirmed with the same change direction in dopamine neuron samples ( Fig. 3, Extended Data Table 1 ). Among these 44 brain-blood shared DE genes, the LAMP2 gene has been reported to be differentially expressed between the early stages of PD and controls, and was also reported to be associated with the expression level of SNCA 48 . LAMP2 isoform LAMP2B is also a marker protein expressed on the surface of exosomes, which helps to transport cargos thru the blood-brain barrier. Several neuroinflammation-associated genes were replicated in our brain datasets, such as IKBIP 49 , CXCR2 50,51 , and NFKBIB 52 . Additionally, IL18R1 , a cytokine receptor that belongs to the interleukin 1 receptor family, was significantly increased in both PD blood and brain neurons. While the function of this cytokine receptor in PD is not experimentally verified, an increase in interleukin-1beta ( IL-1β ) was previously reported as a potential mediator of microglia activation in the PD rat model 53 . These genes were consistently upregulated. Note that only five out of the 44 brain-blood shared genes are in the neutrophil activation and neutrophil degranulation biological processes pathway. They are PREX1 , FCGR2A , CAB39 , CXCR2 , and LAMP2 . Discussion PD is a progressive, multisystem neurodegenerative disease that has been a huge burden on our society and the people it affects. Early diagnosis and biomarker discoveries that bolster the therapeutic pipeline for PD are urgently needed 54,55 . 
The Accelerating Medicines Partnership in Parkinson’s Disease (AMP PD) program has provided unprecedented opportunities for investigators, including this opportunity, to utilize the data to build an early diagnosis platform for PD patient diagnosis which could lead to improved treatment response and higher efficacy. Currently, PD diagnosis is mainly based on clinical phenotype detections which can provide high sensitivity for detecting parkinsonism 15,56 . However, clinical observation alone is often insufficient to predict PD status before the onset of the disease. Once symptoms emerge and are detectable, it usually indicates the development of PD(10, 12). It has been reported that in idiopathic PD, there is severe degeneration of the nigrostriatal neurons of the substantia nigra before neurologists can establish the diagnosis according to the widely accepted clinical diagnostic criteria 57 . It is conceivable that neuroprotective therapy starting at such a stage of the disease will fail to stop the degenerative process. Therefore, the identification of patients at risk and earlier stages of the disease appears to be essential for any successful neuroprotection. The observational PD phenotypes are reflections of the changes in transcriptomic profiles which are changing in advance of clinical phenotypes. Analyzing the transcriptomic changes between PD patients and healthy control samples can provide signals for preclinical diagnosis. Utilizing the large cohort datasets from AMP PD in this cross-sectional study, differentially expressed genes were initially discovered and then validated using these large sample-size cohorts. Functional enrichment analysis was conducted, and we found the neutrophil activation and degranulation were significantly enriched, which we recommend as a diagnostic marker 58 . As previously published, neutrophil infiltration plays an important role in the development of PD 59 . 
Studies have indicated that circulating neutrophils are increased in number in PD, while other circulating immune cells have either decreased or not changed in prevalence 60,61 . A study by Craig et al.that utilized the PPMI dataset found an increased number of neutrophils in PD patients compared to controls 24 . While neutrophils have yet to be identified in the brains of PD patients, neutrophils have been identified in the brains of AD patients and mouse models of neuroinflammation 62,63 . Moreover, circulating neutrophils express CD11b , an integrin that responds to aggregated α-synuclein in microglia 64 . Another study revealed that neutrophil degranulation was the most significantly altered molecular pathway in patients, with most genes in the neutrophil degranulation pathway containing nonsense or missense mutations 65 . In our work, we confirmed that the neutrophil activation and degranulation pathway were actively upregulated. By checking the literature and pathway annotation databases 66,67 , we know that neutrophils contain five different types of granules: primary granules, also known as azurophilic granules; secondary granules, also known as specific granules; tertiary granules; secretory vesicles; and ficolin-rich granules. The primary granules are the main storage sites of the most toxic mediators, including elastase, myeloperoxidase, cathepsins, and defensins. The secondary and tertiary granules contain lactoferrin and matrix metalloprotease 9 (also known as gelatinase B), respectively, among other substances. The secretory vesicles in human neutrophils contain human serum albumin, suggesting that they contain extracellular fluid that was derived from the endocytosis of the plasma membrane. Ficolin-rich granules are highly exocytosable, gelatinase-poor granules found in neutrophils and are rich in ficolin-1. Ficolin-1 is released from neutrophil granules by stimulation with fMLP or PMA. 
Granules are retained until receptors in the plasma membrane or phagosomal membrane signal to the cytoplasm to activate their movement to the cell membrane, where their contents are secreted by degranulation. This is an important control mechanism, as the neutrophil is highly enriched in tissue-destructive proteases. There is increasing evidence linking blood cells to PD development. Variants at or near the LRRK2 locus are known to be associated with PD. Reports have shown that full-length LRRK2 is a relatively common constituent of human peripheral blood mononuclear cells (PBMCs), including affinity-isolated CD14+ monocytes, CD19+ B cells, and CD4+ as well as CD8+ T cells 68. There is also evidence that both SNCA mRNA and protein are particularly abundant in erythroid cells 4. Lymphocytes are another category of cells that play important roles in PD. Increased numbers of both CD4+ and CD8+ T cells in the brain parenchyma have been observed in neuropathological studies of PD 69–71. A longitudinal case study of a PD patient found that alpha-synuclein-reactive T cells were most abundant in peripheral blood before the appearance of motor symptoms 72. Taken together, a growing number of studies point to the potential of circulating gene expression profiles as diagnostic biomarkers. The TNF-α signaling pathway is another pathway enriched in our WikiPathways enrichment analysis. TNF-α has been reported to be increased in both the brain and the cerebrospinal fluid of parkinsonian patients and to be involved in the degenerative processes that occur in Parkinson’s disease. TNF-α is the key player in the TNF-α signaling pathway. In our analysis, the leading-edge genes in this pathway include CFLAR, MAPK3, APAF1, PRKCZ, PYGL, MAP2K4, BTRC, NFKBIB, and RAF1. Few studies have focused on these genes so far, and we plan to examine them in future research.
Previous studies have reported an association between SNCA transcript abundance in blood and early-stage, imaging-supported, de novo PD. There is a paradoxical reduction in SNCA transcript counts in the blood of individuals with early-stage, neuroimaging-supported Parkinson’s disease 4,44,45. In our analysis, although SNCA transcript abundance did not differ significantly between patient samples and healthy samples, we confirmed a trend toward reduced abundance in both our discovery and replication cohorts, as well as in our BRAINcode cohort. The literature reports inconclusive changes in SNCA protein in plasma, which is likely due to hemolysis of erythrocytes, in which SNCA is one of the most plentiful proteins 4. Several studies have established machine learning classifiers with different focuses and using different datasets. Scherzer et al. built the first ML classifier in PD using 22 genes. Liu et al. used clinical and genetic information to predict cognitive decline in patients with Parkinson’s disease and the progression of PD 73,74, and Severson et al. identified subtypes of PD based on clinical data 75. Here, to maximize the value of the massive amount of data, we tested several machine learning methods for PD diagnosis classification using clinical, transcriptomic, and genetic data. Our final multi-omics model has high AUC values and high sensitivity and specificity compared to other reports 13,58, which means that our model can not only identify PD patients but also recognize low-risk individuals. In future studies, we will examine more advanced machine learning algorithms, such as deep neural networks (DNN), convolutional neural networks (CNN), and variational autoencoders (VAE), to improve performance and extract further insights from the data. The current analysis has limitations. The analysis focused on the diagnostic classification of PD at baseline in a cross-sectional design.
Future analyses will be important to test diagnostic classifiers prospectively and longitudinally. Moreover, progression biomarkers are needed, and this will require analyses of longitudinal RNA datasets. To begin to translate these candidate classifiers to the clinic, more research is needed to clarify the high and low predictive values in different clinically relevant scenarios, for example, as an aid for augmented medicine in the patient populations of movement disorders clinics, or as a screening tool for high-risk individuals in the general population. These scenarios involve highly distinct incidences of PD, and we need a clearer understanding of the high and low predictive values of the model outputs based on the selected biomarker genes. In this study, we identified a set of DE RNAs and defined neutrophil activation and degranulation as potential early diagnostic biomarkers. We built a high-performance PD classification model that could be helpful for predicting PD diagnosis. We provided a computational framework that will be helpful for PD biomarker discovery and disease risk prediction, a critical step toward better assessment of PD risk and faster diagnosis of Parkinson’s disease.

Methods

Study design

First, we discovered genes and RNAs that are differentially expressed in PD in an analysis of the discovery cohort. The novel eRNAs and circRNAs were also quantified in both the discovery and replication datasets, and the significantly differentially expressed eRNAs (DE eRNAs) and circRNAs (DE circRNAs) are presented in this work. Using the DEGs and DE RNAs, genetic data, and clinical data, we built classifier models to predict PD diagnosis. We further replicated the significant DE genes and novel RNAs in a cross-sectional analysis of the replication cohort.
In our previous study, we probed the transcriptome of dopamine neurons in post-mortem brains with various levels of neuropathology. We then evaluated the blood-based PD-associated genes (discovered and replicated DEGs) for association with PD neuropathology in dopamine neurons using our laser-captured RNA-seq dataset (BRAINcode) 6. Functional enrichment analysis was conducted on the replicated DEGs. In addition, cell type enrichment analysis was carried out to find the cell types enriched for the replicated DEGs (Fig. 1).

Sample and gene expression quality control

Filters were applied to remove participants as shown in Extended Data Fig. 2 and Extended Data Fig. 3. The same filtering strategy was applied to both the discovery and the replication datasets. First, participants without RNA-seq data were removed. Next, only participants with baseline RNA-seq data and a RIN greater than 5.0 were kept for the subsequent analysis. To limit batch effects due to ancestry, we restricted our analysis to participants self-identifying as White. We also restricted our analysis to participants listed as either cases or controls. Lastly, we excluded participants whose diagnosis conflicted with their enrollment group during follow-up visits: PD cases whose diagnosis changed during follow-up were removed, and control participants who developed PD were likewise excluded. Prodromal participants and SWEDD (scans without evidence of dopaminergic deficit) patients were also removed. Participants with missing clinical or genetic data were removed as well, because those data were used in the downstream analysis. Quality control of the expression data was performed to filter out lowly expressed genes and remove sample outliers.
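The participant-level filters described above can be sketched as a single predicate. This is a minimal illustration, not the authors' code: the `Participant` fields, the group labels, and the toy cohort are all hypothetical, and only the rules themselves (RNA-seq available, baseline RIN above 5.0, self-identified White, enrolled as case or control, no diagnosis conflict at follow-up, complete clinical and genetic data) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # All field names are hypothetical; they only mirror the filters in the text.
    pid: str
    has_rnaseq: bool
    baseline_rin: Optional[float]  # None when there is no baseline RNA-seq sample
    race: str
    enrolled_as: str               # e.g. "PD", "HC", "Prodromal", "SWEDD"
    latest_diagnosis: str          # diagnosis at the most recent follow-up visit
    has_clinical: bool
    has_genetics: bool


def passes_filters(p: Participant) -> bool:
    """Apply the sample-level exclusion rules in the order given in the text."""
    if not p.has_rnaseq:
        return False
    if p.baseline_rin is None or p.baseline_rin <= 5.0:
        return False
    if p.race != "White":
        return False
    if p.enrolled_as not in ("PD", "HC"):        # drops prodromal and SWEDD
        return False
    if p.latest_diagnosis != p.enrolled_as:      # diagnosis conflict at follow-up
        return False
    return p.has_clinical and p.has_genetics


# Invented toy cohort, for illustration only.
cohort = [
    Participant("P1", True, 7.8, "White", "PD", "PD", True, True),
    Participant("P2", True, 4.2, "White", "PD", "PD", True, True),        # RIN too low
    Participant("P3", True, 8.1, "White", "HC", "PD", True, True),        # control who developed PD
    Participant("P4", True, 8.4, "White", "SWEDD", "SWEDD", True, True),  # SWEDD
    Participant("P5", True, 8.0, "White", "HC", "HC", True, False),       # missing genetic data
    Participant("P6", True, 8.3, "White", "HC", "HC", True, True),
]
kept = [p.pid for p in cohort if passes_filters(p)]
print(kept)
```

Keeping the rules in one function makes it straightforward to apply the identical filter to the discovery and replication cohorts, as the text requires.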
For the genes, we first removed those with low expression, defined as counts of fewer than 5 reads in more than 90% of samples and a variance of less than 1 across samples. To check whether any sex information was mislabeled, we plotted the expression level of a Y chromosome-specific gene against that of an X chromosome-specific gene. We also verified that biases in the sequencing data arising from the distribution of case and control samples across plates were minimal.

Identification of PD-associated mRNAs

The differential expression analysis was conducted using DESeq2 (v1.36.0) 76. Gene read counts from the Salmon 77 quantification result files were used. The primary differential expression test compared PD cases with healthy controls, and age_at_baseline (continuous variable), sex, plate, RIN, and the top 10 principal components (PCs) of the genotype data were included as covariates in DESeq2. The replicated DEGs were further analyzed using clusterProfiler 78 to find enriched functions. We also performed cell-type-specific enrichment analysis using the WebCSEA online tool 79 to find the human tissue-cell types in which these genes might exert their effects. As a secondary analysis, we examined gene expression changes associated with motor severity, as indicated by the MDS-UPDRS part III summary score. Tests were performed in the same DESeq2 framework, with the MDS-UPDRS score treated as a continuous variable.

Identification of PD-associated enhancer RNAs and circular RNAs

Because AMP PD provides the raw sequencing data, we also examined novel non-coding RNAs, specifically eRNAs and circRNAs, for differences between PD patients and healthy individuals. We called eRNAs and circRNAs in all datasets. We used our previously developed method 6 to identify eRNA candidates in the blood. The circRNAs were called using the CIRCexplorer2 package 80.
Differential expression analysis was then conducted on the eRNA and circRNA read counts using DESeq2. Because circRNAs have relatively low read counts, we used all samples, rather than only the baseline samples, to increase the sample size and thereby the power to discover DE circRNAs. The same covariates as in the DEG analysis were used.

Construction of PD diagnosis classifier models

We built the classifiers using multi-modality data comprising transcriptomics, polygenic risk score (PRS), and clinical data. The PPMI samples (discovery cohort) were randomly split into a training set (80%) and a validation set (20%). We repeated the random split 100 times to test the stability of the models. The training set was used to build the classifiers. The validation set was used to optimize the hyper-parameters of each model through a grid search. Final models were tested on the independent PDBP/BioFIND samples (replication cohort). Three models were built in sequential order using the following feature sets: transcriptomics only (“DEGs”), transcriptomics plus polygenic risk score (“DEGs+PRS”), and transcriptomics, polygenic risk score, and clinical data combined (“DEGs+PRS+Clinical”). The transcriptomics features are the 874 DEGs from the PPMI cohort. The PRS was calculated using PRSice-2 81 based on the 7,057 significant PD-associated variants from the recently published PD GWAS 14. Clinical data include the total UPSIT score, sex, and age at baseline. Because the number of features exceeds the number of samples, we applied feature selection before fitting the classifiers to avoid overfitting. To train the models, the variance-stabilizing-transformed (VST) expression abundances were standardized after log transformation.
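As a rough sketch of how one such 80/20 split might be evaluated, the function below standardizes the features on the training portion only, screens a few LASSO `alpha` values, keeps the features with non-zero coefficients, and scores a linear SVM on the validation portion by AUROC. Several choices here are assumptions made for illustration and are not taken from the paper: the stratified split, the specific `alpha` values, the use of a single linear SVM, and the synthetic data (which merely imitates the "more features than samples" setting).

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_split(X, y, alphas, seed):
    """One 80/20 split: standardize, LASSO-select features, fit an SVM, score AUROC."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed  # stratify: an assumption
    )
    scaler = StandardScaler().fit(X_tr)          # fitted on training data only
    X_tr, X_va = scaler.transform(X_tr), scaler.transform(X_va)

    best_auc, best_n = -np.inf, 0
    for alpha in alphas:
        lasso = Lasso(alpha=alpha, max_iter=10000).fit(X_tr, y_tr)
        keep = np.flatnonzero(lasso.coef_)       # features with non-zero coefficients
        if keep.size == 0:
            continue                             # alpha too strong, nothing selected
        clf = SVC(kernel="linear").fit(X_tr[:, keep], y_tr)
        auc = roc_auc_score(y_va, clf.decision_function(X_va[:, keep]))
        if auc > best_auc:
            best_auc, best_n = auc, keep.size
    return best_auc, best_n


# Synthetic stand-in data: 120 samples, 300 features, 5 of them informative.
rng = np.random.RandomState(0)
X = rng.normal(size=(120, 300))
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=120) > 0).astype(int)

results = [evaluate_split(X, y, alphas=[0.02, 0.05, 0.1], seed=s) for s in range(10)]
aucs = np.array([auc for auc, _ in results])
print(f"validation AUROC over 10 splits: mean={aucs.mean():.2f}, sd={aucs.std():.2f}")
```

Because `alpha` is chosen on the same validation split that is then reported, the validation AUROC in this sketch is optimistic; an untouched test set, such as the independent replication cohort used in the study, is what gives an unbiased estimate.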
Feature selection was conducted on the training set using the LASSO approach, implemented with the sklearn.linear_model.Lasso function; the parameter alpha was screened to select the value giving the best area under the receiver operating characteristic curve (AUROC). Only features with non-zero coefficients were included in the model. To take advantage of different machine learning algorithms, 10 classifiers were trained and compared: support vector machine with linear kernel (SVM), support vector machine with RBF kernel (SVM_rbf), linear regression (LN), logistic regression (LR), stochastic gradient descent (SGD), AdaBoost classifier (ABC), gradient boosting classifier (GBC), random forest (RF), k-nearest neighbors (KNN), and multi-layer perceptron classifier (MLP). To investigate whether the eRNAs or circRNAs are predictive of PD diagnosis, we also tested classifiers using the DE eRNAs and DE circRNAs separately.

Confirmation in brain

We tested blood biomarker transcripts using the BRAINcode dataset. PD-associated RNAs that are also differentially expressed in the brain are highly relevant and were prioritized for validation. We conducted the DE analysis using data from brain neuron samples and compared the blood DEGs with the brain DEGs. In our BRAINcode v2 project, we performed laser-capture microdissection total RNA sequencing (lcRNAseq) 3 on dopamine neurons from the midbrain substantia nigra pars compacta of 104 high-quality human postmortem brains (HC: n = 59; ILB: n = 27; PD: n = 18). Many polyadenylated and non-polyadenylated transcripts were identified with high confidence. DEGs in dopamine neurons were identified between PD samples and healthy control samples. The blood DEGs with the same fold-change direction as in the brain data were retained.
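The blood-versus-brain comparison amounts to keeping genes that are significant in both analyses and whose fold changes have the same sign. A minimal sketch follows, assuming each result set is available as a mapping from gene to (log2 fold change, adjusted p-value). The function name, the 0.05 threshold applied to both tissues, and the placeholder genes and values are illustrative assumptions and are not data from the study.

```python
def concordant_genes(blood, brain, alpha=0.05):
    """Return genes significant in both result sets with the same fold-change sign.

    blood, brain: dict mapping gene -> (log2_fold_change, adjusted_p_value).
    """
    shared = []
    for gene, (lfc_blood, padj_blood) in blood.items():
        if gene not in brain:
            continue
        lfc_brain, padj_brain = brain[gene]
        significant = padj_blood < alpha and padj_brain < alpha
        same_direction = lfc_blood * lfc_brain > 0   # both up or both down
        if significant and same_direction:
            shared.append(gene)
    return sorted(shared)


# Placeholder genes and invented values, for illustration only.
blood = {
    "GENE_A": (0.40, 0.001),
    "GENE_B": (-0.30, 0.010),
    "GENE_C": (0.25, 0.200),   # not significant in blood
    "GENE_D": (-0.50, 0.002),
}
brain = {
    "GENE_A": (0.90, 0.030),
    "GENE_B": (0.60, 0.020),   # opposite direction in brain
    "GENE_C": (0.70, 0.010),
    "GENE_D": (-1.10, 0.040),
}
print(concordant_genes(blood, brain))
```

Multiplying the two fold changes is a compact sign test; genes with a fold change of exactly zero in either tissue are treated as non-concordant.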
Evaluation of a second digital gene expression platform in the Harvard Biomarkers Study

We also compared the expression levels of several PD-associated genes from the blood with our in-house NanoString data to validate our findings. The NanoString dataset of PD cases and healthy controls is nested in the Harvard Biomarkers Study (HBS). Participants’ blood samples with high RNA quality (RIN ≥ 7) were processed for digital expression analysis on the NanoString platform 82 with 33 distinct molecular barcodes (29 PD-associated genes) to count the abundance of selected transcripts directly in RNA from blood cells. A total of 617 PD cases and 618 healthy controls passed normalization processing and were used to validate our findings.

Declarations

Data Availability

The PPMI, PDBP, and BioFIND data can be accessed from the AMP-PD Google Cloud Storage with an approved “Data Use Agreement”. All the up-to-date information and data collection or data processing procedures on the AMP-PD program can be found at https://www.amp-pd.org. The brain neuron data and the NanoString data were produced in our own lab. All the analysis code can be accessed at: .

Acknowledgment

This study was funded in part by NIH grants 1U01NS120637, R01AG057331, and U01 NS082157, the U.S. Department of Defense (to C.R.S.), and the American Parkinson Disease Association (APDA) Research Award (to X.D.). C.R.S.’s work is supported by NIH grants NINDS/NIA R01NS115144, U01NS095736, U01NS100603, and the American Parkinson Disease Association Center for Advanced Parkinson Research. X.D. received funding from the American Parkinson Disease Association (APDA). C.R.S. and X.D.’s work was in part funded by Aligning Science Across Parkinson’s [ASAP-000301] through the Michael J. Fox Foundation for Parkinson’s Research (MJFF). For the purpose of open access, the author has applied a CC BY public copyright license to all Author Accepted Manuscripts arising from this submission.
Data used in the preparation of this article were obtained from the Accelerating Medicines Partnership® (AMP®) Parkinson’s Disease (AMP PD) Knowledge Platform. For up-to-date information on the study, visit https://www.amp-pd.org. The AMP® PD program is a public-private partnership managed by the Foundation for the National Institutes of Health and funded by the National Institute of Neurological Disorders and Stroke (NINDS) in partnership with the Aligning Science Across Parkinson's (ASAP) initiative; Celgene Corporation, a subsidiary of Bristol-Myers Squibb Company; GlaxoSmithKline plc (GSK); The Michael J. Fox Foundation for Parkinson's Research; Pfizer Inc.; AbbVie Inc.; Sanofi US Services Inc.; and Verily Life Sciences. ACCELERATING MEDICINES PARTNERSHIP and AMP are registered service marks of the U.S. Department of Health and Human Services. Clinical data and biosamples used in preparation of this article were obtained from the (i) Michael J. Fox Foundation for Parkinson’s Research (MJFF) and National Institute of Neurological Disorders and Stroke (NINDS) BioFIND study, (ii) NINDS Parkinson's Disease Biomarkers Program (PDBP), and (iii) MJFF Parkinson’s Progression Markers Initiative (PPMI). PPMI is sponsored by The Michael J. Fox Foundation for Parkinson’s Research and supported by a consortium of scientific partners: [list the full names of all of the PPMI funding partners found at https://www.ppmi-info.org/about-ppmi/who-we-are/study-sponsors]. The PPMI investigators have not participated in reviewing the data analysis or content of the manuscript. For up-to-date information on the study, visit . The Parkinson’s Disease Biomarkers Program (PDBP) consortium is supported by the National Institute of Neurological Disorders and Stroke (NINDS) at the National Institutes of Health. A full list of PDBP investigators can be found at https://pdbp.ninds.nih.gov/policy.
The PDBP investigators have not participated in reviewing the data analysis or content of the manuscript.

References

1. Jankovic, J. & Tan, E. K. Parkinson’s disease: Etiopathogenesis and treatment. J Neurol Neurosurg Psychiatry 91 (2020).
2. Dorsey, E. R., Sherer, T., Okun, M. S. & Bloem, B. R. The emerging evidence of the Parkinson pandemic. Journal of Parkinson’s Disease 8 (2018). https://doi.org/10.3233/JPD-181474
3. Dong, X. et al. Circular RNAs in the human brain are tailored to neuron identity and neuropsychiatric disease. Nat Commun 14, 5327 (2023).
4. Scherzer, C. R. et al. GATA transcription factors directly regulate the Parkinson’s disease-linked gene α-synuclein. Proceedings of the National Academy of Sciences 105, 10907–10912 (2008).
5. Zheng, B. et al. PGC-1α, A Potential Therapeutic Target for Early Intervention in Parkinson’s Disease. Sci Transl Med 2 (2010).
6. Dong, X. et al. Enhancers active in dopamine neurons are a primary link between genetic variation and neuropsychiatric disease. Nat Neurosci 21, 1482–1492 (2018).
7. Stefanis, L. α-Synuclein in Parkinson’s disease. Cold Spring Harb Perspect Med 2, a009399 (2012).
8. Carballo-Carbajal, I. et al. Brain tyrosinase overexpression implicates age-dependent neuromelanin production in Parkinson’s disease pathogenesis. Nat Commun 10, 973 (2019).
9. Le, W., Dong, J., Li, S. & Korczyn, A. D. Can Biomarkers Help the Early Diagnosis of Parkinson’s Disease? Neurosci Bull 33, 535–542 (2017).
10. Brooks, D. J. The early diagnosis of parkinson’s disease. Ann Neurol 44, S10–S18 (1998).
11. Bhat, S., Acharya, U. R., Hagiwara, Y., Dadmehr, N. & Adeli, H. Parkinson’s disease: Cause factors, measurable indicators, and early diagnosis. Comput Biol Med 102, 234–241 (2018).
12. Karapinar Senturk, Z. Early diagnosis of Parkinson’s disease using machine learning algorithms. Med Hypotheses 138, 109603 (2020).
13. Makarious, M. B. et al. Multi-modality machine learning predicting Parkinson’s disease. NPJ Parkinsons Dis 8, 35 (2022).
14. Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol 18, 1091–1102 (2019).
15. Hu, C., Ke, C. J. & Wu, C. Identification of biomarkers for early diagnosis of Parkinson’s disease by multi-omics joint analysis. Saudi J Biol Sci 27, 2082–2088 (2020).
16. Byron, S. A., Van Keuren-Jensen, K. R., Engelthaler, D. M., Carpten, J. D. & Craig, D. W. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet 17, 257–271 (2016).
17. Burgos, K. et al. Profiles of Extracellular miRNA in Cerebrospinal Fluid and Serum from Patients with Alzheimer’s and Parkinson’s Diseases Correlate with Disease Status and Features of Pathology. PLoS One 9, e94839 (2014).
18. Santiago, J. A. & Potashkin, J. A. Blood Transcriptomic Meta-analysis Identifies Dysregulation of Hemoglobin and Iron Metabolism in Parkinson’ Disease. Front Aging Neurosci 9, 73 (2017).
19. Chahine, L. M. & Stern, M. B. Parkinson’s Disease Biomarkers: Where Are We and Where Do We Go Next? Mov Disord Clin Pract 4, 796–805 (2017).
20. Marek, K. et al. The Parkinson Progression Marker Initiative (PPMI). Prog Neurobiol 95, 629–635 (2011).
21. Marek, K. et al. The Parkinson’s progression markers initiative (PPMI) – establishing a PD biomarker cohort. Ann Clin Transl Neurol 5, 1460–1477 (2018).
22. Rosenthal, L. S. et al. The NINDS Parkinson’s disease biomarkers program. Movement Disorders 31, 915–923 (2016).
23. Kang, U. J. et al. The BioFIND study: Characteristics of a clinically typical Parkinson’s disease biomarker cohort. Movement Disorders 31, 924–932 (2016).
24. Craig, D. W. et al. RNA sequencing of whole blood reveals early alterations in immune cells and gene expression in Parkinson’s disease. Nat Aging 1, 734–747 (2021).
25. Zhong, L., Liu, P., Fan, J. & Luo, Y. Long non-coding RNA H19: Physiological functions and involvements in central nervous system disorders. Neurochem Int 148, 105072 (2021).
26. Zhang, Y., Xia, Q. & Lin, J. LncRNA H19 Attenuates Apoptosis in MPTP-Induced Parkinson’s Disease Through Regulating miR-585-3p/PIK3R3. Neurochem Res 45, 1700–1710 (2020).
27. Satoh, J., Asahina, N., Kitano, S. & Kino, Y. A Comprehensive Profile of ChIP-Seq-Based PU.1/Spi1 Target Genes in Microglia. Gene Regul Syst Bio 8, GRSB.S19711 (2014).
28. Hossain, Md. B., Islam, Md. K., Adhikary, A., Rahaman, A. & Islam, Md. Z. Bioinformatics Approach to Identify Significant Biomarkers, Drug Targets Shared Between Parkinson’s Disease and Bipolar Disorder: A Pilot Study. Bioinform Biol Insights 16, 117793222210792 (2022).
29. Shen, R. et al. Association of Two Polymorphisms in CCL2 With Parkinson’s Disease: A Case-Control Study. Front Neurol 10 (2019).
30. Cheng, Y. et al. The regulation of macrophage polarization by hypoxia-PADI4 coordination in Rheumatoid arthritis. Int Immunopharmacol 99, 107988 (2021).
31. Zhu, C., Liu, C. & Chai, Z. Role of the PADI family in inflammatory autoimmune diseases and cancers: A systematic review. Front Immunol 14 (2023).
32. Thiam, H. R. et al. NETosis proceeds by cytoskeleton and endomembrane disassembly and PAD4-mediated chromatin decondensation and nuclear envelope rupture. Proceedings of the National Academy of Sciences 117, 7326–7337 (2020).
33. Petrozziello, T. et al. Neuroinflammation and histone H3 citrullination are increased in X-linked Dystonia Parkinsonism post-mortem prefrontal cortex. Neurobiol Dis 144, 105032 (2020).
34. Paschkowsky, S., Hamzé, M., Oestereich, F. & Munter, L. M. Alternative Processing of the Amyloid Precursor Protein Family by Rhomboid Protease RHBDL4. Journal of Biological Chemistry 291, 21903–21912 (2016).
35. Gagliardi, M. et al. DNAJC13 mutation screening in patients with Parkinson’s disease from South Italy. Parkinsonism Relat Disord 55, 134–137 (2018).
36. Lorenzo‐Betancor, O. et al. DNAJC13 p.Asn855Ser mutation screening in Parkinson’s disease and pathologically confirmed Lewy body disease patients. Eur J Neurol 22, 1323–1325 (2015).
37. Vilariño-Güell, C. et al. DNAJC13 mutations in Parkinson disease. Hum Mol Genet 23, 1794–1801 (2014).
38. Mochizuki, H. et al. An AAV-derived Apaf-1 dominant negative inhibitor prevents MPTP toxicity as antiapoptotic gene therapy for Parkinson’s disease. Proceedings of the National Academy of Sciences 98, 10918–10923 (2001).
39. Kaiser, S. et al. A proteogenomic view of Parkinson’s disease causality and heterogeneity. NPJ Parkinsons Dis 9, 24 (2023).
40. Gu, X.-J. et al. Expanding causal genes for Parkinson’s disease via multi-omics analysis. NPJ Parkinsons Dis 9, 146 (2023).
41. Choi, Y. R. et al. Prion-like Propagation of α-Synuclein Is Regulated by the FcγRIIB-SHP-1/2 Signaling Pathway in Neurons. Cell Rep 22, 136–148 (2018).
42. Liu, C., Cui, Z., Wang, S. & Zhang, D. CD93 and GIPC expression and localization during central nervous system inflammation. Neural Regen Res 9, 1995 (2014).
43. Pihlstrøm, L. et al. A comprehensive analysis of SNCA-related genetic risk in sporadic parkinson disease. Ann Neurol 84, 117–129 (2018).
44. Gwinn, K. et al. Parkinson’s disease biomarkers: perspective from the NINDS Parkinson’s Disease Biomarkers Program. Biomark Med 11, 451–473 (2017).
45. Locascio, J. J. et al. Association between α-synuclein blood transcripts and early, neuroimaging-supported Parkinson’s disease. Brain 138, 2659–71 (2015).
46. Scherzer, C. R. et al. Molecular markers of early Parkinson’s disease based on gene expression in blood. Proceedings of the National Academy of Sciences 104, 955–960 (2007).
47. Ding, H. et al. Unrecognized vitamin D3 deficiency is common in Parkinson disease: Harvard Biomarker Study. Neurology 81, 1531–7 (2013).
48. Murphy, K. E. et al. Lysosomal-associated membrane protein 2 isoforms are differentially affected in early Parkinson’s disease. Movement Disorders 30, 1639–1647 (2015).
49. Wu, H. et al. IKIP Negatively Regulates NF-κB Activation and Inflammation through Inhibition of IKKα/β Phosphorylation. J Immunol 204, 418–427 (2020).
50. Wu, F. et al. CXCR2 is essential for cerebral endothelial activation and leukocyte recruitment during neuroinflammation. J Neuroinflammation 12, 98 (2015).
51. Veenstra, M. & Ransohoff, R. M. Chemokine receptor CXCR2: physiology regulator and neuroinflammation controller? J Neuroimmunol 246, 1–9 (2012).
52. Shih, R.-H., Wang, C.-Y. & Yang, C.-M. NF-kappaB Signaling Pathways in Neurological Inflammation: A Mini Review. Front Mol Neurosci 8, 77 (2015).
53. Koprich, J. B., Reske-Nielsen, C., Mithal, P. & Isacson, O. Neuroinflammation mediated by IL-1β increases susceptibility of dopamine neurons to degeneration in an animal model of Parkinson’s disease. J Neuroinflammation 5, 8 (2008).
54. Ugrumov, M. Development of early diagnosis of Parkinson’s disease: Illusion or reality? CNS Neurosci Ther 26, 997–1009 (2020).
55. Chen, X. et al. The early diagnosis of Parkinson’s disease through combined biomarkers. Acta Neurol Scand 140, 268–273 (2019).
56. Katunina, E. A., Ilina, E. P., Sadekhova, G. I. & Gaisenuk, E. I. Approaches to the Early Diagnosis of Parkinson’s Disease. Neurosci Behav Physiol 50, 393–400 (2020).
57. Becker, G. et al. Early diagnosis of Parkinson’s disease. J Neurol 249, 1–1 (2002).
58. Pantaleo, E. et al. A Machine Learning Approach to Parkinson’s Disease Blood Transcriptomics. Genes (Basel) 13 (2022).
59. Wang, H. et al. Identification and Experimental Validation of Parkinson’s Disease with Major Depressive Disorder Common Genes. Mol Neurobiol 60, 6092–6108 (2023).
60. Jensen, M. P. et al. Lower Lymphocyte Count is Associated With Increased Risk of Parkinson’s Disease. Ann Neurol 89, 803–812 (2021).
61. Yacoubian, T. A. et al. Brain and Systemic Inflammation in De Novo Parkinson’s Disease. Movement Disorders 38, 743–754 (2023).
62. Cunningham, C., Wilcockson, D. C., Campion, S., Lunnon, K. & Perry, V. H. Central and Systemic Endotoxin Challenges Exacerbate the Local Inflammatory Response and Increase Neuronal Death during Chronic Neurodegeneration. The Journal of Neuroscience 25, 9275–9284 (2005).
63. Kasen, A. et al. Upregulation of α-synuclein following immune activation: Possible trigger of Parkinson’s disease. Neurobiol Dis 166, 105654 (2022).
64. Wang, S. et al. α-Synuclein, a chemoattractant, directs microglial migration via H₂O₂-dependent Lyn phosphorylation. Proceedings of the National Academy of Sciences 112 (2015).
65. Bandres-Ciga, S. et al. Large-scale pathway specific polygenic risk and transcriptomic community network analysis identifies novel functional pathways in Parkinson disease. Acta Neuropathol 140, 341–358 (2020).
66. Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res 50, D687–D692 (2022).
67. Lacy, P. Mechanisms of Degranulation in Neutrophils. Allergy, Asthma & Clinical Immunology 2, 98 (2006).
68. Hakimi, M. et al. Parkinson’s disease-linked LRRK2 is expressed in circulating and tissue immune cells and upregulated following recognition of microbial structures. J Neural Transm 118, 795–808 (2011).
69. Hobson, B. D. & Sulzer, D. Neuronal Presentation of Antigen and Its Possible Role in Parkinson’s Disease. J Parkinsons Dis 12, S137–S147 (2022).
70. Iba, M. et al. Neuroinflammation is associated with infiltration of T cells in Lewy body disease and α-synuclein transgenic models. J Neuroinflammation 17, 214 (2020).
71. Galiano-Landeira, J., Torra, A., Vila, M. & Bové, J. CD8 T cell nigral infiltration precedes synucleinopathy in early stages of Parkinson’s disease. Brain 143, 3717–3733 (2020).
72. Lindestam Arlehamn, C. S. et al. α-Synuclein-specific T cell reactivity is associated with preclinical and early Parkinson’s disease. Nat Commun 11, 1875 (2020).
73. Liu, G. et al. Prediction of cognition in Parkinson’s disease with a clinical–genetic score: a longitudinal analysis of nine cohorts. Lancet Neurol 16, 620–629 (2017).
74. Liu, G. et al. Genome-wide survival study identifies a novel synaptic locus and polygenic score for cognitive progression in Parkinson’s disease. Nat Genet 53, 787–793 (2021).
75. Severson, K. A. et al. Discovery of Parkinson’s disease states and disease progression modelling: a longitudinal data study using machine learning. Lancet Digit Health 3, e555–e564 (2021).
76. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
77. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14, 417–419 (2017).
78. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–7 (2012).
79. Dai, Y. et al. WebCSEA: web-based cell-type-specific enrichment analysis of genes. Nucleic Acids Res 50, W782–W790 (2022).
80. Zhang, X.-O. et al. Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Res 26, 1277–87 (2016).
81. Choi, S. W. & O’Reilly, P. F. PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience 8 (2019).
82. Geiss, G. K. et al. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol 26, 317–25 (2008).

Tables

Table 1. Summary of clinical data at baseline

| Characteristic | Discovery dataset: PD (n=551) | Discovery dataset: HC (n=437) | Replication dataset: PD (n=760) | Replication dataset: HC (n=452) | p-value |
|---|---|---|---|---|---|
| Sex: male | 318 | 214 | 482 | 204 | 1.59×10⁻⁴ |
| Sex: female | 233 | 223 | 278 | 248 | 0.61 |
| Age, years (mean (sd)) | 62.91 (10.42) | 57.18 (12.53) | 64.99 (8.78) | 62.94 (10.81) | 0.89 |
| Duration of disease (mean (sd)) | 2.37 (4.12) | \ | 6.19 (5.18) | \ | 5.98×10⁻⁴¹ |
| RIN value (mean (sd)) | 8.18 (0.90) | 8.36 (0.84) | 7.44 (0.83) | 7.41 (0.85) | 1.0 |
| MOCA (mean (sd)) | 24.86 (4.17) | 27.94 (1.88) | 25.30 (3.60) | 26.53 (2.54) | 1.0 |
| UPSIT (mean (sd)) | 22.22 (8.03) | 33.6 (4.70) | 19.62 (7.74) | 32.48 (6.00) | 0.84 |

Table 2.
Performance metrics of classifier models on validation and testing datasets

Validation on PPMI held-out samples

| Features (Model) | Accuracy | Sensitivity | Specificity | Precision | BalAcc | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|
| DEGs (SVM_rbf) | 0.67 (0.03)* | 0.77 (0.04) | 0.55 (0.05) | 0.69 (0.04) | 0.66 (0.03) | 0.72 (0.03) | 0.72 (0.04) |
| PRS (LR) | 0.58 (0.03) | 0.90 (0.06) | 0.17 (0.04) | 0.58 (0.03) | 0.53 (0.03) | 0.54 (0.03) | 0.62 (0.04) |
| Clinical (SVM) | 0.78 (0.03) | 0.78 (0.03) | 0.79 (0.04) | 0.82 (0.04) | 0.79 (0.03) | 0.85 (0.02) | 0.86 (0.03) |
| DEGs+PRS (LR) | 0.69 (0.03) | 0.74 (0.03) | 0.62 (0.05) | 0.71 (0.04) | 0.68 (0.03) | 0.75 (0.03) | 0.78 (0.03) |
| DEGs+PRS+Clinical (SVM) | 0.83 (0.02) | 0.83 (0.03) | 0.84 (0.04) | 0.86 (0.04) | 0.83 (0.02) | 0.91 (0.02) | 0.92 (0.02) |
| DEGs+PRS+Clinical (Previous study**) | 0.86 | 0.89 | 0.76 | 0.91 | 0.82 | 0.90 | NA |

Testing on PDBP/BioFIND samples

| Features (Model) | Accuracy | Sensitivity | Specificity | Precision | BalAcc | AUROC | AUPRC |
|---|---|---|---|---|---|---|---|
| DEGs (SVM_rbf) | 0.63 (0.01) | 0.73 (0.02) | 0.47 (0.03) | 0.70 (0.01) | 0.60 (0.01) | 0.64 (0.01) | 0.74 (0.01) |
| PRS (LR) | 0.67 (0.01) | 0.94 (0.03) | 0.21 (0.07) | 0.67 (0.01) | 0.57 (0.02) | 0.70 (0.01) | 0.79 (0.01) |
| Clinical (SVM) | 0.79 (0.01) | 0.76 (0.01) | 0.83 (0.01) | 0.88 (0.01) | 0.79 (0.01) | 0.86 (0.01) | 0.90 (0.01) |
| DEGs+PRS (LR) | 0.66 (0.01) | 0.69 (0.02) | 0.60 (0.02) | 0.74 (0.01) | 0.65 (0.01) | 0.70 (0.01) | 0.79 (0.01) |
| DEGs+PRS+Clinical (SVM) | 0.82 (0.01) | 0.79 (0.01) | 0.87 (0.01) | 0.91 (0.01) | 0.83 (0.01) | 0.89 (0.01) | 0.93 (0.01) |
| DEGs+PRS+Clinical (Previous study**) | 0.75 | 0.93 | 0.43 | 0.74 | 0.68 | 0.85 | NA |

*: Values are means over the 100 random splits; values in parentheses are standard deviations.
**: The previous study performed its independent test on the PDBP dataset.

Additional Declarations

There is NO Competing Interest.

Supplementary Files

ExtendedDataFig1.pdf Extended Data Fig. 1. The distributions of disease duration at enrollment in discovery (a) and replication (b) cohorts.
ExtendedDataFig2.pdf Extended Data Fig. 2. The steps on the discovery dataset.
ExtendedDataFig3.pdf Extended Data Fig. 3. The steps on the replication dataset.
ExtendedDataFig4.pdf Extended Data Fig. 4. Data quality control assessments in the discovery and replication datasets.
(a, b) Sex check and the sample distributions on the plate of discovery dataset. (c, d) Sex check and the sample distributions on the plate of replication dataset.
ExtendedDataFig5.pdf Extended Data Fig. 5. The AUROC, average precision, and balanced accuracy values of the models using DEGs, DE eRNAs, or DE circRNAs as features.
ExtendedDataTable1DEGs.xlsx Extended Data Table 1
ExtendedDataTable2DEeRNAcircRNA.xlsx Extended Data Table 2
ExtendedDataTable3Classifier.xlsx Extended Data Table 3
ExtendedDataTable4Enrichemntanalysis.xlsx Extended Data Table 4
ExtendedDataTable5UPDRSresults.xlsx Extended Data Table 5
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-6837659","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":473226727,"identity":"c4c4047c-5420-4b04-95b1-8f8fcc6f0ddb","order_by":0,"name":"Xianjun Dong","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA9UlEQVRIie3Rv2sCMRTA8RcepMvDrAmKf0Pg4OSQ1j9G6KTrLR0UDuJid4f+HZ2vZMhyXFfBDhXBpR1ukg4dPPHXFm8sNF94EEI+ZHgAodBfDOuRh0NeTwXQgfNFI8IWAHSbXKoJUhOiHW4/E/MBLfdsv+7NAwnAtxV5iMp4TyuzBVWUj/2xGZKa8mHfRwRCLJWxoJejOBobJJ1T3PYRjne7K0nMhAa52HmJQLr8Em2YsaSBuJeojFIpS0uqKGI2Lx1Jy6PkxUP0u3tty9R2W24eVT/pU1fMsvXy20MOoTyug0vG4bjcW7HqRCv4bfA8FAqF/l17iC5ClF7xZ/oAAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0002-8052-9320","institution":"Adams Center of Parkinson's Disease Research and Department of Neurology, Yale School of Medicine, Yale University","correspondingAuthor":true,"prefix":"","firstName":"Xianjun","middleName":"","lastName":"Dong","suffix":""},{"id":473226728,"identity":"954bbfff-e1f1-4f58-a1c6-6c966149ee61","order_by":1,"name":"Ruifeng Hu","email":"","orcid":"","institution":"Yale School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Ruifeng","middleName":"","lastName":"Hu","suffix":""},{"id":473226729,"identity":"d51517a3-2ecf-40f6-88ee-7eca0ff7b7b6","order_by":2,"name":"Ruoxuan Wang","email":"","orcid":"","institution":"Brigham and Women's Hospital, Harvard Medical 
School","correspondingAuthor":false,"prefix":"","firstName":"Ruoxuan","middleName":"","lastName":"Wang","suffix":""},{"id":473226730,"identity":"197bc518-c7e1-47ac-a616-8163eadfa544","order_by":3,"name":"Jie Yuan","email":"","orcid":"","institution":"Yale School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Jie","middleName":"","lastName":"Yuan","suffix":""},{"id":473226731,"identity":"3a9c05cb-922d-4b4d-8c32-0532775159c9","order_by":4,"name":"Zechuan Lin","email":"","orcid":"","institution":"Yale School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Zechuan","middleName":"","lastName":"Lin","suffix":""},{"id":473226732,"identity":"ef744531-3652-4b7c-8436-a1b35245bff0","order_by":5,"name":"Elizabeth Hutchins","email":"","orcid":"https://orcid.org/0000-0003-2543-0798","institution":"Neurogenomics Division, Translational Genomics Research Institute","correspondingAuthor":false,"prefix":"","firstName":"Elizabeth","middleName":"","lastName":"Hutchins","suffix":""},{"id":473226733,"identity":"48844817-3fd7-4817-9af3-d44912449129","order_by":6,"name":"Barry Landin","email":"","orcid":"","institution":"Technome","correspondingAuthor":false,"prefix":"","firstName":"Barry","middleName":"","lastName":"Landin","suffix":""},{"id":473226734,"identity":"c086ca38-bec8-4cb2-b6e8-710f0c3d4193","order_by":7,"name":"Zhixiang Liao","email":"","orcid":"","institution":"Brigham and Women's Hospital, Harvard Medical School","correspondingAuthor":false,"prefix":"","firstName":"Zhixiang","middleName":"","lastName":"Liao","suffix":""},{"id":473226735,"identity":"021b6f28-2edd-4037-a182-2090c71a9595","order_by":8,"name":"Ganqiang Liu","email":"","orcid":"https://orcid.org/0000-0002-1921-9542","institution":"Sun Yat-Sen University","correspondingAuthor":false,"prefix":"","firstName":"Ganqiang","middleName":"","lastName":"Liu","suffix":""},{"id":473226736,"identity":"f8f70099-945d-463d-b3e2-569919c12b07","order_by":9,"name":"Clemens 
Scherzer","email":"","orcid":"","institution":"Yale School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Clemens","middleName":"","lastName":"Scherzer","suffix":""}],"badges":[],"createdAt":"2025-06-06 14:05:50","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6837659/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6837659/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":85073006,"identity":"92b9615e-c027-48d1-871c-30d594083eef","added_by":"auto","created_at":"2025-06-20 15:58:06","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":210500,"visible":true,"origin":"","legend":"\u003cp\u003eThe general workflow of our study. The study used a cross-sectional design. Significant DEGs/DE eRNAs/DE circRNAs were discovered in the PPMI cohort. The PD diagnosis classifiers were built and tested using the multi-omics data. The DEGs/DE eRNAs/DE circRNAs were then confirmed in the PDBP/BioFIND cohort. Further analyses, such as functional enrichment analysis, replication with brain sample data, and cell type enrichment analysis, were conducted on the replicated DEGs.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/e23a2e429e412c897a47d9b7.png"},{"id":85071814,"identity":"07f3010a-10bc-49f6-a80a-f8952d2ff072","added_by":"auto","created_at":"2025-06-20 15:42:06","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":257918,"visible":true,"origin":"","legend":"\u003cp\u003eThe performance of the models. (a-c) The AUROC value distributions of the 10 tested algorithms with stepwise-added feature sets on the testing dataset. (d, e) AUROC curves of the best models with stepwise-added feature sets on the validation dataset and testing dataset. 
(f) Top selected genes during feature selection.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/44d8cb9c89cd2cfe46b2d22f.png"},{"id":85072141,"identity":"0cb520f0-2c93-4330-9977-42f3c2edd931","added_by":"auto","created_at":"2025-06-20 15:50:06","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":218619,"visible":true,"origin":"","legend":"\u003cp\u003eResults of differentially expressed gene analysis. A total of 502 genes were replicated with a nominal p-value \u0026lt; 0.05 in the replication dataset (the green dots). The scatter plot and the Venn diagram show that, among the 502 replicated genes, 491 replicated DEGs have consistent directions of change in both the discovery and replication datasets. Forty-four replicated DEGs with consistent directions of change were further confirmed in dopamine neuron samples.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/25118aed392809379bf8fc56.png"},{"id":85071819,"identity":"83ec0f1b-982b-491b-b5f4-cc64d8b6bad3","added_by":"auto","created_at":"2025-06-20 15:42:06","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":356150,"visible":true,"origin":"","legend":"\u003cp\u003eThe enrichment analysis results of the replicated differentially expressed genes and the replicated differentially expressed neutrophil genes. (a) The enriched GO-BP and GO-CC terms with adjusted p \u0026lt; 0.05. Neutrophil degranulation and neutrophil activation are significantly enriched, and the ficolin-1-rich granule and membrane cell components are also significantly enriched. (b) Both the enriched GO-BP terms and the WikiPathway results show that DEGs are involved in immune-related pathways. 
(c) The volcano plot of replicated differentially expressed neutrophil genes in discovery datasets shows that most neutrophil genes are up-regulated in PD patients (the top 5 ranked genes were shown on the plot, the full list can be found in \u003cstrong\u003eExtended Data Table 2\u003c/strong\u003e). (d) Cell-type specific enrichment analysis shows the replicated genes are enriched in neutrophil cell types in the lymphatic system. The dashed red line in the plot indicates the significant threshold (p = 3.69 × 10\u003csup\u003e-5\u003c/sup\u003e) corrected with 1355 collected tissue-cell types. The solid grey line indicates the nominal significance (p = 0.001).\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/1ee9aa121e65effbdfb9ece0.png"},{"id":85071817,"identity":"96763373-e9ac-4c09-bdef-4bc53e4094c1","added_by":"auto","created_at":"2025-06-20 15:42:06","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":124034,"visible":true,"origin":"","legend":"\u003cp\u003eThe expression levels of three PD-associated genes \u003cem\u003eSNCA, RNABP10, and VDR \u003c/em\u003ein the discovery, replication, BRAINcode datasets, and NanoString dataset.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/46b8387ff89503683f1c7886.png"},{"id":89588752,"identity":"c60cc055-20ac-40ae-84cc-62b6d6fe0af3","added_by":"auto","created_at":"2025-08-21 15:39:35","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2391400,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/11cdf340-91c1-4497-99cf-c1bbb9a57146.pdf"},{"id":85071811,"identity":"28da4b5b-4926-41d5-af4e-08f79ca07f11","added_by":"auto","created_at":"2025-06-20 
15:42:06","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":101688,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig. 1. The distributions of disease duration at enrollment in discovery (a) and replication (b) cohorts.\u003c/p\u003e","description":"","filename":"ExtendedDataFig1.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/015c5d0ad6491fb613ba7491.pdf"},{"id":85072139,"identity":"0a1a797c-aaf0-422f-80eb-98764fbfe147","added_by":"auto","created_at":"2025-06-20 15:50:06","extension":"pdf","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":60708,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig. 2. The steps on the discovery dataset.\u003c/p\u003e","description":"","filename":"ExtendedDataFig2.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/70762e3d602fc4cb0832f65e.pdf"},{"id":85073005,"identity":"6a44426f-5890-49bc-bfb6-dac5a724e551","added_by":"auto","created_at":"2025-06-20 15:58:06","extension":"pdf","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":59773,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig. 3. The steps on the replication dataset.\u003c/p\u003e","description":"","filename":"ExtendedDataFig3.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/7a4430b6921ffb4513a3e19d.pdf"},{"id":85072143,"identity":"98b2cc4e-e238-40d0-b277-0120831435be","added_by":"auto","created_at":"2025-06-20 15:50:06","extension":"pdf","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":560710,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig. 4. Data quality control assessments in the discovery and replication datasets. (a, b) Sex check and the sample distributions on the plate of discovery dataset. 
\u0026nbsp;(c, d) Sex check and the sample distributions on the plate of replication dataset.\u003c/p\u003e","description":"","filename":"ExtendedDataFig4.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/a2020761f9342ac6e49d54a2.pdf"},{"id":85071822,"identity":"29257e52-2718-432d-a52d-3b75dd646a14","added_by":"auto","created_at":"2025-06-20 15:42:06","extension":"pdf","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":267535,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig. 5. The AUROC, average precision, and balanced accuracy values of the models using DEGs, DE eRNAs, or DE circRNAs as features.\u003c/p\u003e","description":"","filename":"ExtendedDataFig5.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/279dc06bbfc5044d78920814.pdf"},{"id":85071825,"identity":"cee740a2-6574-4305-9f57-438a7c069a66","added_by":"auto","created_at":"2025-06-20 15:42:06","extension":"xlsx","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":1696656,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Table 1\u003c/p\u003e","description":"","filename":"ExtendedDataTable1DEGs.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/96a76ea20e9bba158582f3f5.xlsx"},{"id":85071821,"identity":"e757fd1f-92eb-4265-bd08-da391c4242af","added_by":"auto","created_at":"2025-06-20 15:42:06","extension":"xlsx","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":202339,"visible":true,"origin":"","legend":"Extended Data Table 2","description":"","filename":"ExtendedDataTable2DEeRNAcircRNA.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/ee34562d7c49982679038aa9.xlsx"},{"id":85071823,"identity":"2f3c6317-dc04-427b-852c-d9e2e648b745","added_by":"auto","created_at":"2025-06-20 
15:42:06","extension":"xlsx","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":32185,"visible":true,"origin":"","legend":"Extended Data Table 3","description":"","filename":"ExtendedDataTable3Classifier.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/45279652df0a125d27b11b40.xlsx"},{"id":85071818,"identity":"6775abbb-51d9-495d-877d-ceafff2ca67c","added_by":"auto","created_at":"2025-06-20 15:42:06","extension":"xlsx","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":25642,"visible":true,"origin":"","legend":"Extended Data Table 4","description":"","filename":"ExtendedDataTable4Enrichemntanalysis.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/70e3a0c5f974996a0d902255.xlsx"},{"id":85071824,"identity":"a5f732b7-ff6e-416c-a8f3-f99296ea4f65","added_by":"auto","created_at":"2025-06-20 15:42:06","extension":"xlsx","order_by":10,"title":"","display":"","copyAsset":false,"role":"supplement","size":1312049,"visible":true,"origin":"","legend":"Extended Data Table 5","description":"","filename":"ExtendedDataTable5UPDRSresults.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6837659/v1/b720b357cc3d821c9d047267.xlsx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Multi-omics machine learning classifier and blood transcriptomic signature of Parkinson’s disease","fulltext":[{"header":"Main","content":"\u003cp\u003eParkinson’s disease (PD) is a progressive neurodegenerative disease, with an estimated global number of patients exceeding 13 million by 2040 \u003csup\u003e1,2\u003c/sup\u003e. PD is thought to be caused by combinatorial effects of environmental, epigenetic, and genetic contributions that exert many of their effects through \u003cem\u003ecis\u003c/em\u003e- and \u003cem\u003etrans\u003c/em\u003e-acting regulation of transcript abundance\u003csup\u003e3–6\u003c/sup\u003e. 
Progressive loss of dopamine neurons and an increasing burden of α-synuclein-positive neuronal inclusions (the so-called Lewy bodies) are hallmarks of PD\u003csup\u003e7,8\u003c/sup\u003e. Once PD neuropathology crosses a clinically relevant threshold, movement becomes relentlessly more impaired in PD patients. Biomarkers for early detection and quantitative tracking of disease progression are currently lacking\u003csup\u003e9,10\u003c/sup\u003e. By the time a patient is diagnosed with PD based on today’s clinical criteria (e.g., resting tremor, slow movements, and stiffness), up to 70% of vulnerable dopaminergic neurons have been lost\u003csup\u003e11\u003c/sup\u003e. Therefore, developing a panel of biomarkers for early and accurate diagnosis is urgently needed\u003csup\u003e12,13\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAdditionally, PD is a slowly progressive and complex genetic disease that likely results from multiple genetic risk variants, each conferring small increases in susceptibility. PD GWASs have revealed thousands of genetic variants whose mutations are associated with disease risk\u003csup\u003e14\u003c/sup\u003e. However, the genetic germline is static and thus cannot be used quantitatively to track disease progression over time using serial measurements. While the strategy of constructing an aggregate measure from multiple individual markers has been fruitful in genetic studies of PD risk, the use of markers spanning multiple modalities (e.g., genetic, transcriptomic, clinical, and imaging-based markers) is needed to maximize its utility\u003csup\u003e15\u003c/sup\u003e, as it is unlikely that a single biomarker will adequately capture the genetic and environmental heterogeneity of PD. 
Individuals with a high risk of developing PD may show developmental potentials in their transcriptomics profiles before having clinical symptoms, even if they may pass the clinical PD tests\u003csup\u003e16–18\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eLimited sample size remains one of the main pitfalls in current biomarker studies. A systematic review of published studies using α-synuclein species as a PD biomarker found that 84% of studies included 100 PD patients or fewer\u003csup\u003e19\u003c/sup\u003e. Previous efforts have made several individual cohorts available to study PD biomarkers, including the Michael\u0026nbsp;J. Fox Foundation (MJFF) Parkinson’s Progression Markers Initiative\u0026nbsp;(PPMI)\u003csup\u003e20,21\u003c/sup\u003e, the NINDS Parkinson's Disease Biomarkers Program (PDBP)\u003csup\u003e22\u003c/sup\u003e, and the MJFF BioFIND study\u003csup\u003e23\u003c/sup\u003e. All data from these efforts are now integrated into the Accelerating Medicines Partnership in Parkinson’s Disease (AMP PD) program, which to date has generated the largest PD longitudinal RNA sequencing data for about 8,500 samples covering more than 3,200 participants with deep clinical phenotype data. A study by Craig et al\u003cem\u003e.\u003c/em\u003e\u003csup\u003e24\u003c/sup\u003e conducted an overview analysis of the PPMI dataset, finding that the neutrophil cell abundance is higher in patients with PD, while lymphocyte cell abundance is lower in patients with PD. They used all time-point samples in their cross-sectional analysis and treated the multiple visits from one subject as independent data points, which may not be representative of early time-point samples. 
In another study by Makarious et al\u003cem\u003e.,\u003c/em\u003e the authors built multi-modal machine learning models, which have good performances; however, the case and control samples are unbalanced, and the models achieved a modest, balanced accuracy value of 0.68 \u003csup\u003e13\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eIn our work, we leveraged the large-scale, total RNA sequencing baseline data (visit month=0) from the AMP PD to find the most significant differentially expressed RNAs from both known genes (e.g., those annotated mRNAs or lncRNAs in GENCODE) and novel, non-coding RNAs which include circular RNAs (circRNAs) and enhancer RNAs (eRNAs) between PD patients and healthy controls in a defined discovery dataset. We then built multi-gene classifiers using a selective set of PD-associated marker genes, as well as an integrative multi-omics classifier using gene expression data, genetic variant data, and clinical information to achieve better performance. In our study, all models were trained on a discovery dataset and tested on an independent replication dataset to evaluate their respective performance. We also validated PD-associated RNAs in an independent cohort with a secondary method. Enrichment analysis was conducted to find the distinguishable functional pathways or gene ontology terms between PD patients and healthy controls using the replicated differentially expressed protein-coding genes. 
Finally, we developed and tested innovative multi-omics classifiers which provided a reusable computational framework for PD diagnosis in clinical practices.\u003c/p\u003e\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eHigh-quality, large-scale multi-omics cohort for Parkinson’s disease\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eThis study used data from the AMP PD program, which includes eight cohorts from the PPMI, PDBP, and BioFIND study, with PAXgene-based total RNA-seq data, whole genome sequences, and clinical data available for 3,274 participants (release v2). The PPMI is a longitudinal observational study of 1,923 participants with PD, or at risk for PD, and healthy volunteers, thereby contributing comprehensive clinical and imaging data and biological samples from 33 clinical sites around the world. In the PPMI study, patients in the PD cohort have been diagnosed within two years of enrollment. Importantly, they are de novo patients and as such, not yet taking PD medications, which may confound biomarker analyses. The PDBP is another PD program that collects standardized longitudinal clinical data and biospecimens across all stages of PD. The program has 1,604 participants and was developed to accelerate the discovery of promising new diagnostic and progression biomarkers for PD. The PDBP supports basic, translational, and clinical research through hypothesis testing, target and pathway discovery, biomarker development, and disease modeling. The Fox Investigation for New Discovery of Biomarkers (BioFIND) is an observational clinical study designed to discover and verify biomarkers of PD, which includes blood and cerebrospinal fluid, in 118 well-defined, moderately advanced people with PD and 88 control volunteers. (We obtained permission to access the AMP PD datasets on Google Cloud Storage for the analysis. 
All information on data collection and data processing procedures from the AMP PD program can be found at\u0026nbsp;.)\u003c/p\u003e\u003cp\u003eIn our study, we leveraged primarily the baseline data (visit month=0) to systematically delineate genes associated with PD diagnosis at an early stage. The PPMI cohort was used as the discovery population and PDBP/BioFIND cohorts were defined as the replication population (\u003cstrong\u003eFig. 1\u003c/strong\u003e). Most of the PD cases in PPMI were recruited when they were newly diagnosed, while most PD cases in the replication dataset were moderate and advanced (\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 1\u003c/strong\u003e). We wished to establish a set of marker genes exhibiting consistent changes from an early stage of PD and so we intentionally began with a study of individuals recently diagnosed with PD, and then subsequently validated our findings in advanced PD patients. We applied stringent, quality check standards to the samples and the genes (see Methods for details). After the stepwise sample filtrations, 551 PD and 437 control samples remained in the discovery dataset (\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 2\u003c/strong\u003e), and 760 PD case samples and 452 control samples were used for analysis in the replication dataset (\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 3\u003c/strong\u003e). After removing low abundance and low variance genes, 35,900 genes remained in the discovery dataset. The sex check shows high consistency between clinically reported sex and genetically determined sex (\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 4\u003c/strong\u003e\u003cstrong\u003ea\u003c/strong\u003e). 
The case/control sample distributions on the plates showed that the percentages of healthy samples on some plates (P201, P207, P208, P209, P210, P211, P212, P213, P214, and P215) were higher than on others in the discovery dataset, but there was no extreme plate outlier (\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 4\u003c/strong\u003e\u003cstrong\u003eb\u003c/strong\u003e). Therefore, we included the plate as a covariate in our analysis design. Based on our data quality assessments, there were no extreme outliers, so we did not remove any additional samples. For the replication dataset, 31,935 genes were kept. The scatter plot demonstrated consistency between clinically reported sex and genetically determined sex (\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 4c\u003c/strong\u003e). The case/control samples were more evenly distributed across all plates (\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 4d\u003c/strong\u003e). Finally, 988 baseline samples were included in the discovery dataset and 1,212 baseline samples were included in the replication dataset for all downstream analyses. The basic clinical characteristics of the datasets after filtering are summarized in \u003cstrong\u003eTable 1\u003c/strong\u003e.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eOver 1,500 known and novel RNAs were found to be differentially expressed in PD blood samples\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eDifferential expression (DE) analysis of the GENCODE-annotated human genes in the discovery dataset identified 874 known genes that were significantly differentially expressed in PD, with a Benjamini-Hochberg adjusted p-value \u0026lt; 0.05 (\u003cstrong\u003eFig. 1, Extended Data Table 1\u003c/strong\u003e). 
In this study, we also identified a total of 26,035 candidate eRNAs using our in-house scripts\u003csup\u003e6\u003c/sup\u003e, all of which passed the low-expression filter (only eRNAs with read count \u0026gt; 5 in \u0026gt; 10% of samples were kept). Of these candidate eRNAs, 783 were differentially expressed. Meanwhile, we identified 441,811 circRNAs in the blood samples. After applying a filter requiring at least two read counts in 10% of the samples, 3,052 circRNAs remained and were subjected to DE analysis, which identified 35 DE circRNAs (\u003cstrong\u003eExtended Data Table 2\u003c/strong\u003e). In total, we found 1,692 known and novel RNAs differentially expressed in PD blood.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003ePerformance of PD diagnosis classifier models\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eUsing the identified differentially expressed genes, eRNAs, and circRNAs, we constructed machine learning models for PD diagnosis classification. The PPMI samples (discovery population) were randomly split into a training set (80%) and a validation set (20%) 100 times, giving 100 pairs of training-validation datasets. Each training set was used to build the classifiers, which were validated on the corresponding validation set. All the trained models were tested on the independent replication cohorts. Different machine learning classifiers (n = 10) were trained and compared (see Methods for details). Because the number of features was large, we used LASSO (least absolute shrinkage and selection operator) for feature selection to avoid overfitting during model training. The PD diagnosis classifiers were first constructed using the 874 DEGs from PPMI with FDR \u0026lt; 0.05. After feature selection, 23 to 36 DEGs were selected as the final predictors in each random split for model training and validation. 
Among the 10 tested algorithms, the support vector machine with an RBF kernel (SVM_rbf) performed best in terms of the average area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) on the PDBP/BioFIND testing dataset (\u003cstrong\u003eFig. 2a\u003c/strong\u003e). The mean (standard deviation) AUROC and AUPRC values were 0.72 (0.03) and 0.72 (0.04), respectively, on the PPMI 20% withheld validation datasets, and 0.64 (0.01) and 0.74 (0.01) when the model was applied to the PDBP/BioFIND testing dataset (\u003cstrong\u003eTable 2, Extended Data Table 3\u003c/strong\u003e). After adding the polygenic risk score (PRS) as the genetic feature to the selected DEGs in each random split, the logistic regression (LR) model performed best: the AUROC and AUPRC improved to 0.75 (0.03) and 0.78 (0.03), respectively, on the validation datasets, and to 0.70 (0.01) and 0.79 (0.01) on the independent testing dataset (\u003cstrong\u003eFig. 2b, Table 2\u003c/strong\u003e). With the clinical data (UPSIT smell test score, sex, and age) added, the support vector machine (SVM) emerged as the optimal model. The AUROC and AUPRC values rose to 0.91 (0.02) and 0.92 (0.02) on the validation dataset, and to 0.89 (0.01) and 0.93 (0.01) on the testing dataset (\u003cstrong\u003eFig. 2c, Table 2\u003c/strong\u003e). Fig. 2d and Fig. 2e show the progressive improvement in model performance with stepwise addition of the genetic and clinical features on the validation and testing datasets.\u003c/p\u003e\u003cp\u003eA comparison of the PD prediction potential of DEGs, DE eRNAs, and DE circRNAs revealed that DE eRNA models performed comparably to the DE circRNA models, but their predictive power fell below that of the DEG models (\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 
5, Extended Data Table 3\u003c/strong\u003e). Moreover, combining all DE eRNAs, DE circRNAs, and DEGs did not improve model performance. Further exploration of the PD prediction abilities of PRS or clinical data alone showed that the best PRS-based model had AUROC and AUPRC values similar to those of the DEGs+PRS model on the testing dataset (\u003cstrong\u003eTable 2, Extended Data Table 3\u003c/strong\u003e). However, it had lower precision and balanced accuracy and notably low specificity, suggesting a tendency for PRS-based models to yield false positives. Additionally, PRS-based models performed worse than DEG-based models on the validation datasets. DEGs therefore helped compensate for the shortcomings of PRS in predicting PD samples. Clinical data alone distinguished PD cases from healthy controls well, although its performance lagged slightly behind that of our final multi-omics model (\u003cstrong\u003eTable 2, Extended Data Table 3\u003c/strong\u003e).\u003c/p\u003e\u003cp\u003eAcross the features selected in all splits, a total of 99 genes were chosen, with 12 of them recurrently selected as predictive features in more than 80 of the 100 splits (\u003cstrong\u003eFig. 2f, Extended Data Table 3\u003c/strong\u003e). The consistency in feature selection, coupled with low standard deviations in performance values, indicates the high stability of our model. Among the selected genes, \u003cem\u003eH19\u003c/em\u003e, a long non-coding RNA, was selected in all 100 splits. \u003cem\u003eH19\u003c/em\u003e has been reported to be associated with PD progression and correlated with susceptibility to various CNS disorders\u003csup\u003e25,26\u003c/sup\u003e. 
We also found that 7 neutrophil genes were selected as predictors across the 100 splits: \u003cem\u003ePREX1\u003c/em\u003e, \u003cem\u003eSLCO4C1\u003c/em\u003e, \u003cem\u003eCXCR2\u003c/em\u003e, \u003cem\u003eDNAJC3\u003c/em\u003e, \u003cem\u003eCD93\u003c/em\u003e, \u003cem\u003eLAMP1\u003c/em\u003e, and \u003cem\u003eHEBP2\u003c/em\u003e. \u003cem\u003eLAMP1\u003c/em\u003e was selected 98 times. \u003cem\u003ePREX1\u003c/em\u003e and \u003cem\u003eCXCR2\u003c/em\u003e were also the two genes that were replicated in the brain data.\u003c/p\u003e\u003cp\u003eOverall, our final multi-omics model outperformed a recent publication with similar models with respect to accuracy (0.82 vs 0.75), balanced accuracy (0.83 vs 0.68), and AUROC (0.89 vs 0.85) on the testing dataset (\u003cstrong\u003eTable 2\u003c/strong\u003e). The better performance may be because our DEG features included other genes in addition to the protein-coding genes used in the previous study. Additionally, we calculated the PRS using the 7,057 PD-associated significant variants instead of only the 90 SNPs that were used in the previously published paper. Also, the study by Makarious et al. did not address the imbalance between positive and negative samples when training the model, which may also explain why their specificity value is low. Our performance is likely more robust because we used balanced training and testing datasets.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eReplication of discovered DE RNAs\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eTo confirm the DE genes, DE eRNAs, and DE circRNAs found in the discovery dataset, the same differential expression calls were also made in the replication dataset. 
Of the 874 DEGs from discovery dataset, 502 genes were replicated in the replication dataset with a nominal p-value \u0026lt; 0.05 (\u003cstrong\u003eFig. 3a\u003c/strong\u003e), of which over 97.8% (491) of the genes have consistent direction changes in both discovery and replication datasets (\u003cstrong\u003eFig. 3b\u003c/strong\u003e, and the details are available in \u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eTable 1\u003c/strong\u003e). In the 783 initially discovered DE eRNAs, 599 of them were replicated with the same directional changes in the combined PDBP/BioFIND replication dataset. The dataset also revealed that among the replicated DE eRNAs, 396 were up-regulated, and 203 were down-regulated (\u003cstrong\u003eExtended Data Table 2\u003c/strong\u003e). Regarding circRNAs, 21 of the initially discovered DE circRNAs were replicated in the PDBP/BioFIND dataset. Among these replicated DE circRNAs, 15 were up-regulated, and 6 were down-regulated (\u003cstrong\u003eExtended Data Table 2\u003c/strong\u003e). The most significant DE eRNA in discovery dataset is chr5_10486550_10486710_plus (This location falls in the region of gene LINC02212), and the most significant DE circRNA in discovery dataset is chr1_17341942_17342402_plus (This location falls in the intron region of gene PADI4). \u0026nbsp;\u0026nbsp;\u003c/p\u003e\u003cp\u003eWe then further investigated the host genes of these replicated DE eRNAs and DE circRNAs. The 599 DE eRNAs and 21 DE circRNAs were mapped to 306 host genes (289 eRNA host genes, 18 circRNA host genes, 1 shared host gene (ENSG00000159339, \u003cem\u003ePADI4\u003c/em\u003e)). Although their host genes did not share the same enriched GO terms with DEGs, we noticed several PD-associated genes or genes that are involved in neutrophil activation in the host gene list. \u003cem\u003eSPI1\u003c/em\u003e is one of the member genes of GO:0042119 (neutrophil activation). 
It has been reported that \u003cem\u003eSPI1\u003c/em\u003e plays a crucial role in the regulation of the genes relevant to specialized functions of microglia; therefore, dysregulation of \u003cem\u003eSPI1\u003c/em\u003e might contribute to the establishment or development of PD due to the accumulation of activated microglia\u003csup\u003e27–29\u003c/sup\u003e.\u0026nbsp;\u003cem\u003ePADI4\u003c/em\u003e is a gene that can positively regulate \u003cem\u003eTNF-α\u003c/em\u003e and \u003cem\u003eCCL2\u003c/em\u003e, which can lead to the development of neuroinflammation\u003csup\u003e30,31\u003c/sup\u003e. \u003cem\u003ePADI2\u003c/em\u003e coordinates with \u003cem\u003ePADI4\u003c/em\u003e to regulate the assembly of the \u003cem\u003eNLRP3\u003c/em\u003e inflammasome to promote \u003cem\u003eIL-1β\u003c/em\u003e release.\u0026nbsp;Research also showed that \u003cem\u003ePADI4\u003c/em\u003e can participate in all aspects of neutrophil extracellular traps (NETs)\u003csup\u003e32\u003c/sup\u003e.\u0026nbsp;Moreover, X-linked dystonia Parkinson’s disease is aggravated by increased levels of \u003cem\u003ePADI2\u003c/em\u003e, \u003cem\u003ePADI4\u003c/em\u003e, and inflammation in the prefrontal cortex and its derived fibroblasts\u003csup\u003e33\u003c/sup\u003e. The circRNA host gene \u003cem\u003eRHBDD1\u003c/em\u003e, also named \u003cem\u003eRHBDL4\u003c/em\u003e, has been implicated in a variety of diseases including Alzheimer’s and Parkinson's disease. It can cleave amyloid precursor protein inside the cell, causing it to bypass amyloidogenic processing and leading to reduced Aβ levels\u003csup\u003e34\u003c/sup\u003e. This gene had a significant negative log2 fold change in PD patients compared to the healthy controls in both discovery and replication cohorts. Among the 306 host genes, 53 genes were shared with replicated DEGs. 
The DEGs\u0026nbsp;\u003cem\u003eIKBIP, LAMP2, and VDR\u003c/em\u003e, which are associated with PD as mentioned above, were also among the host genes.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eNeutrophil activation and immune pathways were upregulated in PD patient blood\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eThe over-representation enrichment analysis was conducted on GO and WikiPathway terms using the 491 replicated genes with the same change directions. There are five significantly enriched GO biological processes (GO-BP) with FDR \u0026lt; 0.05, as well as five significantly enriched GO cellular component (GO-CC) terms (\u003cstrong\u003eFig. 4\u003c/strong\u003e, \u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eTable 4\u003c/strong\u003e). Both the enriched GO biological processes and cellular components pointed to neutrophil activation and neutrophil degranulation as the main signals in the DEGs. Additionally, the enriched GO-BP terms showed that the genes were also involved in immune-related pathways, which was confirmed by the enriched WikiPathways results (\u003cstrong\u003eFig. 4b\u003c/strong\u003e, \u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eTable 4\u003c/strong\u003e). \u0026nbsp;\u0026nbsp;\u003c/p\u003e\u003cp\u003eLooking into the changes of the leading genes enriched in the neutrophil activation and neutrophil degranulation biological processes, we found that among the 29 DEGs involved in these two GO-BPs, all but one were \u003cem\u003eupregulated\u003c/em\u003e in PD case samples in both the discovery and the replication datasets (\u003cstrong\u003eFig. 4c\u003c/strong\u003e, \u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eTable 4\u003c/strong\u003e). The results indicate that neutrophil activation and neutrophil degranulation were highly regulated in PD patients. 
Furthermore, the upregulated neutrophil genes in the neutrophil activation and neutrophil degranulation pathways may serve as biomarkers for early PD diagnosis.\u0026nbsp;\u003c/p\u003e\u003cp\u003eWe then asked in which human tissues and cell types these marker genes might manifest their impacts. By assessing their cell-type-specificity in 1335 curated single-cell and tissue types with WebCSEA, we found that the 491 DEGs were highly enriched in the blood neutrophil cells of the lymphatic organ system (\u003cstrong\u003eFig. 4d\u003c/strong\u003e). This further suggests that the dysregulation of neutrophil cells could be a marker of early PD diagnosis.\u003c/p\u003e\u003cp\u003eAmong these 29 differentially expressed leading genes in\u0026nbsp;neutrophil activation and neutrophil degranulation biological processes, several have been studied in the context of PD. A pathogenic mutation (p.N855S) in \u003cem\u003eDNAJC13\u003c/em\u003e was linked to autosomal dominant Lewy body PD\u003csup\u003e35–37\u003c/sup\u003e. \u003cem\u003eAPAF1\u003c/em\u003e (apoptotic peptidase activating factor) was reported as a potential drug target for neurodegenerative diseases, and an \u003cem\u003eAPAF1\u003c/em\u003e dominant negative inhibitor can prevent MPTP toxicity as antiapoptotic gene therapy for Parkinson's disease\u003csup\u003e38\u003c/sup\u003e. \u003cem\u003eFCGR2A\u003c/em\u003e and \u003cem\u003eFCGR2B\u003c/em\u003e are well known to play a role in modulating inflammatory responses and to be involved in phagocytosis. Two recent causality analyses of cerebrospinal fluid and blood proteomics showed that \u003cem\u003eFCGR2A\u003c/em\u003e and \u003cem\u003eFCGR2B\u003c/em\u003e are among the top causal proteins for PD risk\u003csup\u003e39,40\u003c/sup\u003e. While there is not much evidence for \u003cem\u003eFCGR2A\u003c/em\u003e and \u003cem\u003eFCGR2B\u003c/em\u003e’s role in PD blood, Choi et al. 
showed that \u003cem\u003eFCGR2B\u003c/em\u003e can function as a receptor for α-syn fibrils and regulate prion-like propagation of α-synuclein in neurons, and the \u003cem\u003eFCGR2B-SHP-1/-2\u003c/em\u003e signaling pathway may be a therapeutic target for the progression of PD\u003csup\u003e41\u003c/sup\u003e. Lastly, \u003cem\u003eCD93\u003c/em\u003e participates in pathophysiological processes of central nervous system inflammation\u003csup\u003e42\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\u003cp\u003eWe further validated some of the marker genes with a second digital expression NanoString technology in blood in an independent cohort of the\u0026nbsp;Harvard Biomarker Study (HBS) (\u003cstrong\u003eFig. 5,\u0026nbsp;\u003c/strong\u003esee Methods). \u003cem\u003eSNCA\u003c/em\u003e is considered as the major causative gene involved in the onset of PD, both from a genetics and protein level\u003csup\u003e43\u003c/sup\u003e. We observed a reduction in SNCA RNA expression in PD samples compared to healthy controls across multiple cohorts, including PPMI, PDBP/BioFIND, BRAINcode, and HBS, using samples from both blood and brain on various platforms, such as RNAseq and NanoString. Our findings consistently showed lower SNCA mRNA levels in the blood of early-stage PD patients, which correlated with brain samples and were consistent with findings in an independent cohort \u003csup\u003e44,45\u003c/sup\u003e. Additionally, we replicated pathological levels of \u003cem\u003eVDR\u003c/em\u003e and \u003cem\u003eRANBP10\u0026nbsp;\u003c/em\u003e\u003csup\u003e46\u003c/sup\u003e. 
Moreover, vitamin D is associated with neuroprotection in animal models of PD\u0026nbsp;\u003csup\u003e47\u003c/sup\u003e and we previously reported reduced levels of the vitamin D receptor (VDR) in an unbiased microarray screen of PD blood samples\u0026nbsp;\u003csup\u003e46\u003c/sup\u003e and\u0026nbsp;found a 25-hydroxy-vitamin D deficiency in 17.6% of PD patients \u003csup\u003e47\u003c/sup\u003e. For these previously identified candidate biomarker RNAs of PD, we observed consistent changes in direction between PD and healthy control samples in both the NanoString (HBS) and RNA-seq data (PPMI and PDBP/BioFIND, \u003cstrong\u003eFig. 5\u003c/strong\u003e).\u003c/p\u003e\u003cp\u003eIn addition to the dichotomous analysis between the PD and control groups, we further tested whether any changes in gene expression are associated with PD motor severity, as indicated by the MDS-UPDRS part III summary score. Our results are shown in \u003cstrong\u003eExtended Data Table 5\u003c/strong\u003e. We found that 2,236 genes and 4,045 genes were significantly associated (adjusted p \u0026lt; 0.05) with the MDS-UPDRS part III summary scores in the discovery and replication datasets, respectively. Among these genes, 1,636 genes were shared by both the discovery and the replication datasets with the same change directions. Functional enrichment analysis conducted on the 1,636 replicated genes showed that “neutrophil activation”, “neutrophil activation involved in immune response”, “neutrophil-mediated immunity”, and “neutrophil degranulation” are the top enriched GO-BP terms. This is consistent with our conclusion from the main dichotomous analysis between the PD cases and healthy controls. 
These findings suggest that neutrophil degranulation is also a potential biomarker in the blood for PD motor severity.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eReplicating blood-based marker genes in brain neurons\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eNext, we wondered if any of the marker genes we detected in blood are also present in brain neurons, as neuronal RNAs could pass through the blood-brain barrier via mediators (e.g., exosomes) and be detectable in the blood stream. By analyzing the total RNAseq data of dopamine neurons laser-captured from \u0026gt;100 human brain samples in the BRAINcode cohort\u003csup\u003e6\u003c/sup\u003e, we identified 575 known genes that were significantly differentially expressed in PD (FDR \u0026lt; 0.05). Of the 491 blood marker genes consistently changed in both discovery and replication datasets, 44 genes were further confirmed\u0026nbsp;with the same change direction in dopamine neuron samples (\u003cstrong\u003eFig. 3,\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eTable 1\u003c/strong\u003e).\u003c/p\u003e\u003cp\u003eAmong these 44 brain-blood shared\u0026nbsp;DE genes, the \u003cem\u003eLAMP2\u003c/em\u003e gene has been reported to be differentially expressed between the early stages of PD and controls, and was also reported to be associated with the expression level of \u003cem\u003eSNCA\u003c/em\u003e\u003csup\u003e48\u003c/sup\u003e. The LAMP2 isoform LAMP2B is also a marker protein expressed on the surface of exosomes, which help to transport cargo through the blood-brain barrier. 
Several neuroinflammation-associated genes were replicated in our brain datasets, such as\u003cem\u003e\u0026nbsp;IKBIP\u003c/em\u003e\u003csup\u003e49\u003c/sup\u003e\u003cem\u003e,\u003c/em\u003e \u003cem\u003eCXCR2\u003c/em\u003e\u003csup\u003e50,51\u003c/sup\u003e\u003cem\u003e,\u0026nbsp;\u003c/em\u003eand\u003cem\u003e\u0026nbsp;NFKBIB\u003c/em\u003e\u003csup\u003e52\u003c/sup\u003e. Additionally, \u003cem\u003eIL18R1\u003c/em\u003e, a cytokine receptor that belongs to the interleukin 1 receptor family, was significantly increased in both PD blood and brain neurons. While the function of this cytokine receptor in PD is not experimentally verified, an increase in interleukin-1beta (\u003cem\u003eIL-1β\u003c/em\u003e) was previously reported as a potential mediator of microglia activation in the PD rat model\u003csup\u003e53\u003c/sup\u003e. These genes were consistently upregulated. Note that only five out of the 44 brain-blood shared genes are in the\u0026nbsp;neutrophil activation and neutrophil degranulation biological processes pathway. They are \u003cem\u003ePREX1\u003c/em\u003e, \u003cem\u003eFCGR2A\u003c/em\u003e, \u003cem\u003eCAB39\u003c/em\u003e, \u003cem\u003eCXCR2\u003c/em\u003e, and \u003cem\u003eLAMP2\u003c/em\u003e.\u0026nbsp;\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003ePD is a progressive, multisystem neurodegenerative disease that has been a huge burden on our society and the people it affects. Early diagnosis and biomarker discoveries that bolster the therapeutic pipeline for PD are urgently needed\u003csup\u003e54,55\u003c/sup\u003e. The Accelerating Medicines Partnership in Parkinson’s Disease (AMP PD) program has provided unprecedented opportunities for investigators, including this opportunity, to utilize the data to build an early diagnosis platform for PD patient diagnosis which could lead to improved treatment response and higher efficacy. 
\u0026nbsp;\u003c/p\u003e\u003cp\u003eCurrently, PD diagnosis is mainly based on clinical phenotype detections which can provide high sensitivity for detecting parkinsonism\u003csup\u003e15,56\u003c/sup\u003e. However, clinical observation alone is often insufficient to predict PD status before the onset of the disease. Once symptoms emerge and are detectable, it usually indicates the development of PD(10, 12). It has been reported that in idiopathic PD, there is severe degeneration of the nigrostriatal neurons of the substantia nigra before neurologists can establish the diagnosis according to the widely accepted clinical diagnostic criteria\u003csup\u003e57\u003c/sup\u003e. It is conceivable that neuroprotective therapy starting at such a stage of the disease will fail to stop the degenerative process. Therefore, the identification of patients at risk and earlier stages of the disease appears to be essential for any successful neuroprotection. The observational PD phenotypes\u0026nbsp;are reflections of the changes in transcriptomic\u0026nbsp;profiles which are changing in advance of clinical phenotypes. Analyzing the transcriptomic changes between PD patients and healthy control samples can provide signals for preclinical diagnosis. Utilizing the large cohort datasets from AMP PD in this cross-sectional study, differentially expressed genes were initially discovered and then validated using these large sample-size cohorts.\u0026nbsp;\u003c/p\u003e\u003cp\u003eFunctional enrichment analysis was conducted, and we found the neutrophil activation and degranulation were significantly enriched, which we recommend as a diagnostic marker\u003csup\u003e58\u003c/sup\u003e. As previously published, neutrophil infiltration plays an important role in the development of PD\u003csup\u003e59\u003c/sup\u003e. 
Studies have indicated that circulating neutrophils are increased in number in PD, while other circulating immune cells have either decreased or not changed in prevalence\u003csup\u003e60,61\u003c/sup\u003e. A study by Craig et al. that utilized the PPMI dataset found an increased number of neutrophils in PD patients compared to controls\u003csup\u003e24\u003c/sup\u003e. While neutrophils have yet to be identified in the brains of PD patients, neutrophils have been identified in the brains of AD patients and mouse models of neuroinflammation\u003csup\u003e62,63\u003c/sup\u003e. Moreover, circulating neutrophils express \u003cem\u003eCD11b\u003c/em\u003e, an integrin that responds to aggregated α-synuclein in microglia\u003csup\u003e64\u003c/sup\u003e. Another study revealed that neutrophil degranulation was the most significantly altered molecular pathway in patients, with most genes in the neutrophil degranulation pathway containing nonsense or missense mutations\u003csup\u003e65\u003c/sup\u003e. In our work, we confirmed that the neutrophil activation and degranulation pathways were upregulated.\u0026nbsp;According to the literature and pathway annotation databases\u003csup\u003e66,67\u003c/sup\u003e, neutrophils contain five different types of granules: primary granules, also known as azurophilic granules; secondary granules, also known as specific granules; tertiary granules; secretory vesicles; and ficolin-rich granules. The primary granules are the main storage sites of the most toxic mediators, including elastase, myeloperoxidase, cathepsins, and defensins. The secondary and tertiary granules contain lactoferrin and matrix metalloprotease 9 (also known as gelatinase B), respectively, among other substances. The secretory vesicles in human neutrophils contain human serum albumin, suggesting that they contain extracellular fluid that was derived from the endocytosis of the plasma membrane. 
Ficolin-rich granules are highly exocytosable, gelatinase-poor granules found in neutrophils and are rich in ficolin-1. Ficolin-1 is released from neutrophil granules by stimulation with fMLP or PMA. Granules are prevented from being released until receptors in the plasma membrane or phagosomal membrane signal to the cytoplasm to activate their movement to the cell membrane for secretion of their contents by degranulation. This is an important control mechanism as the neutrophil is highly enriched in tissue-destructive proteases. \u0026nbsp;\u003c/p\u003e\u003cp\u003eThere is increasing evidence of links between blood cells and PD development. Variants at, or near, the \u003cem\u003eLRRK2\u003c/em\u003e gene locus are known to be associated with PD. Reports have shown that full-length \u003cem\u003eLRRK2\u003c/em\u003e is a relatively common constituent of human peripheral blood mononuclear cells (PBMC), including affinity-isolated CD14+ monocytes, CD19+ B-cells, and CD4+ as well as CD8+ T-cells\u003csup\u003e68\u003c/sup\u003e. There is also evidence that both \u003cem\u003eSNCA\u003c/em\u003e mRNA and protein are particularly abundant in erythroid cells\u003csup\u003e4\u003c/sup\u003e. Lymphocytes are another category of cells that play important roles in PD. Increased numbers of both CD4+ and CD8+ T cells in the brain parenchyma have been observed in neuropathological studies of PD\u003csup\u003e69–71\u003c/sup\u003e. A longitudinal case study of a PD patient found that alpha-synuclein-reactive T cells were most abundant in peripheral blood before the appearance of motor symptoms\u003csup\u003e72\u003c/sup\u003e. Overall, more studies are emerging that show the potential of diagnostic biomarkers in the expression profiles of circulating genes.\u0026nbsp;\u003c/p\u003e\u003cp\u003eThe TNF-α signaling pathway is another enriched pathway from our WikiPathway enrichment analysis. 
TNF-α has been proven to be increased both in the brain and in the cerebrospinal fluid of Parkinsonian patients, and TNF-α is involved in the degenerative processes that occur in Parkinson's disease. TNF-α is the key player in the TNF-α signaling pathway. In our analysis, the leading-edge genes in this pathway include \u003cem\u003eCFLAR, MAPK3, APAF1, PRKCZ, PYGL, MAP2K4, BTRC, NFKBIB,\u0026nbsp;\u003c/em\u003eand \u003cem\u003eRAF1.\u003c/em\u003e Currently, few studies have focused on these genes, so we may examine them in our future research.\u003c/p\u003e\u003cp\u003ePrevious studies have reported an association of \u003cem\u003eSNCA\u003c/em\u003e transcript abundance in blood with early-stage, imaging-supported, de novo PD. There is a paradoxical reduction in \u003cem\u003eSNCA\u003c/em\u003e transcript counts in the blood of individuals with early-stage, neuroimaging-supported Parkinson’s disease\u003csup\u003e4,44,45\u003c/sup\u003e. In our analysis, although the \u003cem\u003eSNCA\u003c/em\u003e transcript abundance did not show significant changes in patient samples compared to healthy samples, we confirmed reduced abundance trends in both our discovery and replication cohorts, as well as in our BRAINcode cohort. Literature reports have shown inconclusive SNCA protein changes in plasma, which is likely due to hemolysis of erythrocytes, in which SNCA is one of the most plentiful proteins\u003csup\u003e4\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eSeveral studies have established machine learning classifiers with different focuses and using different datasets. Scherzer et al. built the first ML classifier in PD using 22 genes. 
Liu et al. used clinical and genetic information for the prediction of cognitive decline in patients with Parkinson’s disease and the progression of PD \u003csup\u003e73,74\u003c/sup\u003e, and Severson et al. identified subtypes of PD based on clinical data\u003csup\u003e75\u003c/sup\u003e. Here, to maximize the value of the massive amount of data, we tested several machine-learning methods for PD diagnosis classification using clinical data, transcriptomics data, and genetics data. Our final multi-omics model has high AUC values and high sensitivity and specificity as compared to other reports\u003csup\u003e13,58\u003c/sup\u003e, which means our model can not only identify the PD patients but also recognize the low-risk individuals. In future studies, we will examine more advanced machine learning algorithms, such as the DNN, CNN, and VAE, to improve performance and explore more meaningful insights behind the data.\u003c/p\u003e\u003cp\u003eThere may be limitations to the current analysis. The analysis was focused on the diagnostic classification of PD at the baseline in a cross-sectional design. Future analyses will be important to prospectively and longitudinally test diagnostic classifiers. Moreover, progression biomarkers are needed, and this will require analyses of longitudinal RNA data sets. To begin to translate these candidate classifiers to the clinic, more research is needed to clarify the high and low predictive values in different clinically relevant scenarios, for example, as an aid for augmented medicine in the patient populations of movement disorders clinics, or as a screening tool for high-risk individuals in the general population. 
These scenarios involve highly distinct incidences of PD patients and we require a clearer understanding of high predictive value and low predictive value in the outputs of the models using the selected biomarker genes.\u003c/p\u003e\u003cp\u003eIn this study, we identified a set of DE RNAs and defined neutrophil activation and degranulation as potential early diagnostic biomarkers. We built a high-performance PD classification model which could be helpful for PD diagnosis prediction. We provided a computational framework that will be helpful for PD biomarker discovery and provide disease risk prediction, which is a critical step for the better assessment of PD risk and accelerating the diagnosis of Parkinson’s disease.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003eStudy design\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eFirst, we discovered genes and RNAs that are differentially expressed in PD in an analysis of the discovery cohort. Also, the novel eRNAs and circRNAs were quantified in both discovery and replication datasets and the significantly differentially expressed eRNAs (DE eRNAs) and circRNAs (DE circRNAs) were presented in this work. Utilizing the DEGs and DE RNAs, genetics, and clinical data, we built the PD diagnosis classifier models for prediction of PD patients. We further replicated those significant DE genes and novel RNAs in a cross-sectional analysis of the replication cohort. In our previous study, we probed the transcriptome of dopamine neurons in post-mortem brains with various levels of neuropathology. We then evaluated the blood-based PD-associated genes (discovered and replicated DEGs) for association with PD neuropathology in dopamine neurons using our laser-captured RNA-seq dataset (BRAINcode,\u0026nbsp;)\u003csup\u003e6\u003c/sup\u003e. \u0026nbsp; Meanwhile, the functional enrichment analysis was conducted on those replicated DEGs. 
In addition, cell type enrichment analysis was carried out to find the enriched cell types of the replicated DEGs (\u003cstrong\u003eFig. 1\u003c/strong\u003e).\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eSample and gene expression quality control\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eFilters were applied to remove participants as shown in \u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 2\u003c/strong\u003e and \u003cstrong\u003eExtended Data\u003c/strong\u003e \u003cstrong\u003eFig. 3\u003c/strong\u003e. The same filtration strategies were applied to both the discovery and the replication datasets. First, participants without RNA-seq were removed. Next, only the patients with baseline RNA-seq data and RIN greater than 5.0 were kept for the following analysis. To limit batch effects due to ancestry, we restricted our analysis to patients self-identifying as White. We also restricted our analysis to patients listed as either cases or controls. Lastly, we excluded participants with diagnosis conflicts during the follow-up visits after the initial enrollment, in case and control groups separately. \u0026nbsp;PD cases whose diagnosis changed during follow-up were removed. Similarly, control participants who developed PD were excluded. Prodromal participants and SWEDD (Scans without evidence of dopaminergic deficit) patients were also removed. Participants with missing clinical or genetic data were also removed, as those data would be used in the following analysis. \u0026nbsp;\u003c/p\u003e\u003cp\u003eQuality control of expression data was performed to filter out lowly expressed genes and remove sample outliers. For the genes, we first removed genes with low expression levels, defined as counts of fewer than 5 reads in more than 90% of samples and variances of less than 1 across the samples. 
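\u003c/p\u003e\u003cp\u003eThe gene-level filter described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that a gene is dropped only when both conditions hold (fewer than 5 reads in more than 90% of samples, and a variance below 1), that the variance is the sample variance of the raw counts, and the function names, gene names, and toy counts are hypothetical.\u003c/p\u003e

```python
from statistics import variance


def is_low_expressed(counts, min_reads=5, max_low_fraction=0.9, min_variance=1.0):
    '''Return True if a gene would be dropped under the assumed filter.

    Assumption: both criteria must hold for removal (low counts in more
    than 90% of samples AND sample variance below 1).
    '''
    low_fraction = sum(c < min_reads for c in counts) / len(counts)
    return low_fraction > max_low_fraction and variance(counts) < min_variance


def filter_genes(count_table):
    '''Keep only the genes that pass the filter; count_table maps gene -> counts.'''
    return {gene: counts for gene, counts in count_table.items()
            if not is_low_expressed(counts)}


# Hypothetical toy data: 20 samples per gene.
toy_counts = {
    'GENE_A': [0] * 19 + [1],          # almost never detected, near-zero variance
    'GENE_B': [0] * 19 + [50],         # rarely detected, but highly variable
    'GENE_C': [120, 95, 143, 110] * 5,  # robustly expressed
}
kept = filter_genes(toy_counts)
print(sorted(kept))
```

\u003cp\u003eUnder these assumptions only GENE_A is removed. If the two criteria were instead meant as alternatives, the `and` in `is_low_expressed` would become `or`, and GENE_B would be removed as well.\u003c/p\u003e\u003cp\u003e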
To check whether any sex information was mislabeled, the expression levels of a Y chromosome-specific gene and an X chromosome-specific gene were shown in a scatter plot. We also verified that biases in the sequencing data arising from case/control sample distributions on the plates were minimal.\u0026nbsp;\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eIdentification of PD-associated mRNAs\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eThe differential expression analysis was conducted using DESeq2 (v1.36.0)\u003csup\u003e76\u003c/sup\u003e. The gene read counts from the Salmon\u003csup\u003e77\u003c/sup\u003e quantification result files were used. The primary differential expression was tested between the PD conditions (PD cases vs. healthy controls), and the age_at_baseline (continuous variable), sex, plate, RIN, and the top 10 principal components (PCs) of the genotype data were included as covariates in DESeq2. The replicated DEGs were further analyzed using clusterProfiler\u003csup\u003e78\u003c/sup\u003e to find the enriched functions. We also performed cell-type-specific enrichment analysis using the WebCSEA online tool\u003csup\u003e79\u003c/sup\u003e to find the human tissue-cell types in which these genes might manifest their impacts.\u0026nbsp;\u003c/p\u003e\u003cp\u003eAs a secondary analysis, we further looked at gene expression changes associated with motor severity, indicated by the MDS-UPDRS part III summary score. Tests were performed in the same DESeq2 framework where the MDS-UPDRS score was treated as a continuous dependent variable.\u0026nbsp;\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eIdentification of PD-associated enhancer RNAs and circular RNAs\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eBecause AMP PD provides the raw sequencing data, we sought to characterize differences in novel non-coding RNAs, especially eRNAs and circRNAs, between PD patients and healthy individuals. We called eRNAs and circRNAs in all datasets. 
We used our previously developed method\u003csup\u003e6\u003c/sup\u003e to identify eRNA candidates in the blood. The circRNAs were called using the CIRCexplorer2 package\u003csup\u003e80\u003c/sup\u003e.\u0026nbsp;Then differential expression analysis was conducted on the eRNA and circRNA reads count using DESeq2. Since the circRNAs have relatively lower reads count in the samples, we used all samples, instead of the baseline samples only, to increase the sample size in order to empower the DE circRNA discovery. The same covariates as in finding the DEGs were used.\u0026nbsp;\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eConstruction of PD diagnosis classifier models\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eWe have built the classifiers utilizing the multi-modality data which includes transcriptomics, polygenic risk score (PRS), and clinical data. The PPMI samples (discovery cohort) were randomly split into a training set (80%) and a validation set (20%). We did the random splits 100 times to test the model's stability. The training set was used to build the classifiers. The validation set was used to optimize the hyper-parameters of each model through a grid search. Final models were tested on the independent PDBP/BioFIND samples (replication cohort).\u003c/p\u003e\u003cp\u003eThree models were built in sequential order using the following feature sets respectively: transcriptomics only (“DEGs”), transcriptomics plus polygenic risk score (“DEGs+PRS”), and transcriptomics, polygenic risk score, and clinical data combined (“DEGs+PRS+Clinical”). The transcriptomics data is the 874 DEGs from the PPMI cohort. The PRS was calculated using PRSice-2\u003csup\u003e81\u003c/sup\u003e based on the 7,057 PD-associated significant variants from the recently published PD GWAS work\u003csup\u003e14\u003c/sup\u003e. Clinical data includes the total UPSIT score, sex, and age at the baseline. 
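\u003c/p\u003e\u003cp\u003eThe evaluation protocol described in this section (100 random 80%/20% splits of the discovery cohort, with three nested feature sets) can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: the splits here are plain random shuffles (whether the original splits were stratified by diagnosis is not stated), and the seed, sample IDs, and helper names are hypothetical.\u003c/p\u003e

```python
import random

# The three nested feature sets named in the text; the block labels are placeholders.
FEATURE_SETS = {
    'DEGs': ['transcriptomics'],
    'DEGs+PRS': ['transcriptomics', 'prs'],
    'DEGs+PRS+Clinical': ['transcriptomics', 'prs', 'clinical'],
}


def repeated_splits(sample_ids, n_repeats=100, train_fraction=0.8, seed=0):
    '''Yield (training, validation) ID lists for repeated random splits.

    Assumption: simple random shuffling without stratification.
    '''
    rng = random.Random(seed)
    n_train = round(len(sample_ids) * train_fraction)
    for _ in range(n_repeats):
        shuffled = list(sample_ids)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Hypothetical discovery-cohort sample IDs.
samples = [f'S{i:03d}' for i in range(50)]
splits = list(repeated_splits(samples))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

\u003cp\u003eIn each split, feature selection and model fitting would use only the training IDs, the validation IDs would be used for hyper-parameter tuning, and the independent replication cohort would stay untouched until final testing.\u003c/p\u003e\u003cp\u003e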
Because the number of features exceeded the number of samples, we applied feature selection before fitting the classifiers to reduce overfitting.\u0026nbsp;\u003c/p\u003e\u003cp\u003eTo train the models, the variance stabilizing transformed (VST) expression abundances were standardized after log transformation. Feature selection was conducted on the training set with the LASSO approach, using the sklearn.linear_model.Lasso function; the alpha parameter was screened to select the value yielding the best area under the receiver operating characteristic curve (AUROC). Only features with non-zero coefficients were included in the model. To take advantage of different machine learning algorithms, 10 classifiers were trained and compared: support vector machine with linear kernel (SVM), support vector machine with RBF kernel (SVM_rbf), linear regression (LN), logistic regression (LR), stochastic gradient descent (SGD), AdaBoost classifier (ABC), gradient boosting classifier (GBC), random forest (RF), k-nearest neighbors (KNN), and multilayer perceptron classifier (MLP).\u003c/p\u003e\u003cp\u003eTo investigate whether the eRNAs or circRNAs are predictive of PD diagnosis, we also tested the classifiers using the DE eRNAs and the DE circRNAs separately.\u0026nbsp;\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eConfirmation in brain\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eWe tested blood biomarker transcripts using the BRAINcode dataset. PD-associated RNAs that are also differentially expressed in the brain are highly relevant and were prioritized for validation. We conducted the DE analysis using the data from brain neuron samples and compared the blood DEGs with the brain DEGs. 
In our BRAINcode v2 project, we performed laser-capture microdissection total RNA-sequencing (lcRNAseq)\u003csup\u003e3\u003c/sup\u003e on dopamine neurons from the midbrain substantia nigra pars compacta of 104 high-quality human postmortem brains (HC: n = 59; ILB: n = 27; PD: n = 18). Many polyadenylated and non-polyadenylated transcripts were identified with high confidence. DEGs in dopamine neurons were identified between PD samples and healthy control samples. Blood DEGs with the same fold-change direction as in the brain data were retained.\u0026nbsp;\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eEvaluation of a second digital gene expression platform in the Harvard Biomarkers Study\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eTo validate our findings, we also compared the blood expression levels of several PD-associated genes with our in-house NanoString data. The NanoString dataset of PD cases and healthy controls is nested in the Harvard Biomarkers Study (HBS). Participants’ blood samples with high RNA quality (RIN\u0026nbsp;≥ 7)\u0026nbsp;were processed for digital expression analysis on the NanoString platform\u003csup\u003e82\u003c/sup\u003e with 33 distinct molecular barcodes (29 PD-associated genes) to count the abundance of selected transcripts directly in RNA from blood cells. A total of 617 PD cases and 618 healthy controls passed normalization and were used to validate our findings.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe PPMI, PDBP, and BioFIND data can be accessed from the AMP-PD Google Cloud Storage with the approved “Data Use Approvement”. All up-to-date information on the AMP-PD program, including data collection and data processing procedures, can be found at https://www.amp-pd.org. The brain neuron data and the NanoString data were produced in our own lab. All the analysis code can be accessed at:\u0026nbsp;. 
\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgment\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was funded in part by NIH grants 1U01NS120637, R01AG057331, and U01NS082157, the U.S. Department of Defense (to C.R.S.), and the American Parkinson Disease Association (APDA) Research Award (to X.D.). C.R.S.’s work is supported by NIH grants NINDS/NIA R01NS115144, U01NS095736, U01NS100603, and the American Parkinson Disease Association Center for Advanced Parkinson Research. X.D. received funding from the American Parkinson Disease Association (APDA). The work of C.R.S. and X.D. was funded in part by Aligning Science Across Parkinson’s [ASAP-000301] through the Michael J. Fox Foundation for Parkinson’s Research (MJFF). For the purpose of open access, the author has applied a CC BY public copyright license to all Author Accepted Manuscripts arising from this submission.\u003c/p\u003e\n\u003cp\u003eData used in the preparation of this article were obtained from the Accelerating Medicines Partnership® (AMP®) Parkinson’s Disease (AMP PD) Knowledge Platform. For up-to-date information on the study, visit https://www.amp-pd.org. The AMP® PD program is a public-private partnership managed by the Foundation for the National Institutes of Health and funded by the National Institute of Neurological Disorders and Stroke (NINDS) in partnership with the Aligning Science Across Parkinson’s (ASAP) initiative; Celgene Corporation, a subsidiary of Bristol-Myers Squibb Company; GlaxoSmithKline plc (GSK); The Michael J. Fox Foundation for Parkinson’s Research; Pfizer Inc.; AbbVie Inc.; Sanofi US Services Inc.; and Verily Life Sciences. ACCELERATING MEDICINES PARTNERSHIP and AMP are registered service marks of the U.S. Department of Health and Human Services.\u003c/p\u003e\n\u003cp\u003eClinical data and biosamples used in preparation of this article were obtained from the (i) Michael J. 
Fox Foundation for Parkinson’s Research (MJFF) and National Institute of Neurological Disorders and Stroke (NINDS) BioFIND study, (ii) NINDS Parkinson’s Disease Biomarkers Program (PDBP), and (iii) MJFF Parkinson’s Progression Markers Initiative (PPMI). PPMI is sponsored by The Michael J. Fox Foundation for Parkinson’s Research and supported by a consortium of scientific partners: [list the full names of all of the PPMI funding partners found at https://www.ppmi-info.org/about-ppmi/who-we-are/study-sponsors]. The PPMI investigators have not participated in reviewing the data analysis or content of the manuscript. For up-to-date information on the study, visit\u0026nbsp;. The Parkinson’s Disease Biomarkers Program (PDBP) consortium is supported by the National Institute of Neurological Disorders and Stroke (NINDS) at the National Institutes of Health. A full list of PDBP investigators can be found at https://pdbp.ninds.nih.gov/policy. The PDBP investigators have not participated in reviewing the data analysis or content of the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eJankovic, J. \u0026amp; Tan, E. K. Parkinson\u0026rsquo;s disease: Etiopathogenesis and treatment. \u003cem\u003eJ Neurol Neurosurg Psychiatry\u003c/em\u003e \u003cstrong\u003e91\u003c/strong\u003e, (2020).\u003c/li\u003e\n\u003cli\u003eDorsey, E. R., Sherer, T., Okun, M. S. \u0026amp; Bloem, B. R. The emerging evidence of the Parkinson pandemic. \u003cem\u003eJ Parkinsons Dis\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, S3\u0026ndash;S8 (2018).\u003c/li\u003e\n\u003cli\u003eDong, X. \u003cem\u003eet al.\u003c/em\u003e Circular RNAs in the human brain are tailored to neuron identity and neuropsychiatric disease. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 5327 (2023).\u003c/li\u003e\n\u003cli\u003eScherzer, C. R. 
\u003cem\u003eet al.\u003c/em\u003e GATA transcription factors directly regulate the Parkinson\u0026rsquo;s disease-linked gene \u0026alpha;-synuclein. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e \u003cstrong\u003e105\u003c/strong\u003e, 10907\u0026ndash;10912 (2008).\u003c/li\u003e\n\u003cli\u003eZheng, B. \u003cem\u003eet al.\u003c/em\u003e PGC-1\u0026alpha;, A Potential Therapeutic Target for Early Intervention in Parkinson\u0026rsquo;s Disease. \u003cem\u003eSci Transl Med\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, (2010).\u003c/li\u003e\n\u003cli\u003eDong, X. \u003cem\u003eet al.\u003c/em\u003e Enhancers active in dopamine neurons are a primary link between genetic variation and neuropsychiatric disease. \u003cem\u003eNat Neurosci\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 1482\u0026ndash;1492 (2018).\u003c/li\u003e\n\u003cli\u003eStefanis, L. \u0026alpha;-Synuclein in Parkinson\u0026rsquo;s disease. \u003cem\u003eCold Spring Harb Perspect Med\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, a009399 (2012).\u003c/li\u003e\n\u003cli\u003eCarballo-Carbajal, I. \u003cem\u003eet al.\u003c/em\u003e Brain tyrosinase overexpression implicates age-dependent neuromelanin production in Parkinson\u0026rsquo;s disease pathogenesis. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 973 (2019).\u003c/li\u003e\n\u003cli\u003eLe, W., Dong, J., Li, S. \u0026amp; Korczyn, A. D. Can Biomarkers Help the Early Diagnosis of Parkinson\u0026rsquo;s Disease? \u003cem\u003eNeurosci Bull\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 535\u0026ndash;542 (2017).\u003c/li\u003e\n\u003cli\u003eBrooks, D. J. The early diagnosis of Parkinson\u0026rsquo;s disease. \u003cem\u003eAnn Neurol\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, S10\u0026ndash;S18 (1998).\u003c/li\u003e\n\u003cli\u003eBhat, S., Acharya, U. R., Hagiwara, Y., Dadmehr, N. 
\u0026amp; Adeli, H. Parkinson\u0026rsquo;s disease: Cause factors, measurable indicators, and early diagnosis. \u003cem\u003eComput Biol Med\u003c/em\u003e \u003cstrong\u003e102\u003c/strong\u003e, 234\u0026ndash;241 (2018).\u003c/li\u003e\n\u003cli\u003eKarapinar Senturk, Z. Early diagnosis of Parkinson\u0026rsquo;s disease using machine learning algorithms. \u003cem\u003eMed Hypotheses\u003c/em\u003e \u003cstrong\u003e138\u003c/strong\u003e, 109603 (2020).\u003c/li\u003e\n\u003cli\u003eMakarious, M. B. \u003cem\u003eet al.\u003c/em\u003e Multi-modality machine learning predicting Parkinson\u0026rsquo;s disease. \u003cem\u003eNPJ Parkinsons Dis\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 35 (2022).\u003c/li\u003e\n\u003cli\u003eNalls, M. A. \u003cem\u003eet al.\u003c/em\u003e Identification of novel risk loci, causal insights, and heritable risk for Parkinson\u0026rsquo;s disease: a meta-analysis of genome-wide association studies. \u003cem\u003eLancet Neurol\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 1091\u0026ndash;1102 (2019).\u003c/li\u003e\n\u003cli\u003eHu, C., Ke, C. J. \u0026amp; Wu, C. Identification of biomarkers for early diagnosis of Parkinson\u0026rsquo;s disease by multi-omics joint analysis. \u003cem\u003eSaudi J Biol Sci\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 2082\u0026ndash;2088 (2020).\u003c/li\u003e\n\u003cli\u003eByron, S. A., Van Keuren-Jensen, K. R., Engelthaler, D. M., Carpten, J. D. \u0026amp; Craig, D. W. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. \u003cem\u003eNat Rev Genet\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 257\u0026ndash;271 (2016).\u003c/li\u003e\n\u003cli\u003eBurgos, K. \u003cem\u003eet al.\u003c/em\u003e Profiles of Extracellular miRNA in Cerebrospinal Fluid and Serum from Patients with Alzheimer\u0026rsquo;s and Parkinson\u0026rsquo;s Diseases Correlate with Disease Status and Features of Pathology. 
\u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, e94839 (2014).\u003c/li\u003e\n\u003cli\u003eSantiago, J. A. \u0026amp; Potashkin, J. A. Blood Transcriptomic Meta-analysis Identifies Dysregulation of Hemoglobin and Iron Metabolism in Parkinson\u0026rsquo; Disease. \u003cem\u003eFront Aging Neurosci\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 73 (2017).\u003c/li\u003e\n\u003cli\u003eChahine, L. M. \u0026amp; Stern, M. B. Parkinson\u0026rsquo;s Disease Biomarkers: Where Are We and Where Do We Go Next? \u003cem\u003eMov Disord Clin Pract\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 796\u0026ndash;805 (2017).\u003c/li\u003e\n\u003cli\u003eMarek, K. \u003cem\u003eet al.\u003c/em\u003e The Parkinson Progression Marker Initiative (PPMI). \u003cem\u003eProg Neurobiol\u003c/em\u003e \u003cstrong\u003e95\u003c/strong\u003e, 629\u0026ndash;635 (2011).\u003c/li\u003e\n\u003cli\u003eMarek, K. \u003cem\u003eet al.\u003c/em\u003e The Parkinson\u0026rsquo;s progression markers initiative (PPMI) \u0026ndash; establishing a PD biomarker cohort. \u003cem\u003eAnn Clin Transl Neurol\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 1460\u0026ndash;1477 (2018).\u003c/li\u003e\n\u003cli\u003eRosenthal, L. S. \u003cem\u003eet al.\u003c/em\u003e The NINDS Parkinson\u0026rsquo;s disease biomarkers program. \u003cem\u003eMovement Disorders\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 915\u0026ndash;923 (2016).\u003c/li\u003e\n\u003cli\u003eKang, U. J. \u003cem\u003eet al.\u003c/em\u003e The BioFIND study: Characteristics of a clinically typical Parkinson\u0026rsquo;s disease biomarker cohort. \u003cem\u003eMovement Disorders\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 924\u0026ndash;932 (2016).\u003c/li\u003e\n\u003cli\u003eCraig, D. W. \u003cem\u003eet al.\u003c/em\u003e RNA sequencing of whole blood reveals early alterations in immune cells and gene expression in Parkinson\u0026rsquo;s disease. 
\u003cem\u003eNat Aging\u003c/em\u003e \u003cstrong\u003e1\u003c/strong\u003e, 734\u0026ndash;747 (2021).\u003c/li\u003e\n\u003cli\u003eZhong, L., Liu, P., Fan, J. \u0026amp; Luo, Y. Long non-coding RNA H19: Physiological functions and involvements in central nervous system disorders. \u003cem\u003eNeurochem Int\u003c/em\u003e \u003cstrong\u003e148\u003c/strong\u003e, 105072 (2021).\u003c/li\u003e\n\u003cli\u003eZhang, Y., Xia, Q. \u0026amp; Lin, J. LncRNA H19 Attenuates Apoptosis in MPTP-Induced Parkinson\u0026rsquo;s Disease Through Regulating miR-585-3p/PIK3R3. \u003cem\u003eNeurochem Res\u003c/em\u003e \u003cstrong\u003e45\u003c/strong\u003e, 1700\u0026ndash;1710 (2020).\u003c/li\u003e\n\u003cli\u003eSatoh, J., Asahina, N., Kitano, S. \u0026amp; Kino, Y. A Comprehensive Profile of ChIP-Seq-Based PU.1/Spi1 Target Genes in Microglia. \u003cem\u003eGene Regul Syst Bio\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, GRSB.S19711 (2014).\u003c/li\u003e\n\u003cli\u003eHossain, Md. B., Islam, Md. K., Adhikary, A., Rahaman, A. \u0026amp; Islam, Md. Z. Bioinformatics Approach to Identify Significant Biomarkers, Drug Targets Shared Between Parkinson\u0026rsquo;s Disease and Bipolar Disorder: A Pilot Study. \u003cem\u003eBioinform Biol Insights\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 117793222210792 (2022).\u003c/li\u003e\n\u003cli\u003eShen, R. \u003cem\u003eet al.\u003c/em\u003e Association of Two Polymorphisms in CCL2 With Parkinson\u0026rsquo;s Disease: A Case-Control Study. \u003cem\u003eFront Neurol\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, (2019).\u003c/li\u003e\n\u003cli\u003eCheng, Y. \u003cem\u003eet al.\u003c/em\u003e The regulation of macrophage polarization by hypoxia-PADI4 coordination in Rheumatoid arthritis. \u003cem\u003eInt Immunopharmacol\u003c/em\u003e \u003cstrong\u003e99\u003c/strong\u003e, 107988 (2021).\u003c/li\u003e\n\u003cli\u003eZhu, C., Liu, C. \u0026amp; Chai, Z. 
Role of the PADI family in inflammatory autoimmune diseases and cancers: A systematic review. \u003cem\u003eFront Immunol\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eThiam, H. R. \u003cem\u003eet al.\u003c/em\u003e NETosis proceeds by cytoskeleton and endomembrane disassembly and PAD4-mediated chromatin decondensation and nuclear envelope rupture. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e \u003cstrong\u003e117\u003c/strong\u003e, 7326\u0026ndash;7337 (2020).\u003c/li\u003e\n\u003cli\u003ePetrozziello, T. \u003cem\u003eet al.\u003c/em\u003e Neuroinflammation and histone H3 citrullination are increased in X-linked Dystonia Parkinsonism post-mortem prefrontal cortex. \u003cem\u003eNeurobiol Dis\u003c/em\u003e \u003cstrong\u003e144\u003c/strong\u003e, 105032 (2020).\u003c/li\u003e\n\u003cli\u003ePaschkowsky, S., Hamz\u0026eacute;, M., Oestereich, F. \u0026amp; Munter, L. M. Alternative Processing of the Amyloid Precursor Protein Family by Rhomboid Protease RHBDL4. \u003cem\u003eJournal of Biological Chemistry\u003c/em\u003e \u003cstrong\u003e291\u003c/strong\u003e, 21903\u0026ndash;21912 (2016).\u003c/li\u003e\n\u003cli\u003eGagliardi, M. \u003cem\u003eet al.\u003c/em\u003e DNAJC13 mutation screening in patients with Parkinson\u0026rsquo;s disease from South Italy. \u003cem\u003eParkinsonism Relat Disord\u003c/em\u003e \u003cstrong\u003e55\u003c/strong\u003e, 134\u0026ndash;137 (2018).\u003c/li\u003e\n\u003cli\u003eLorenzo‐Betancor, O. \u003cem\u003eet al.\u003c/em\u003e DNAJC13 p.Asn855Ser mutation screening in Parkinson\u0026rsquo;s disease and pathologically confirmed Lewy body disease patients. \u003cem\u003eEur J Neurol\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 1323\u0026ndash;1325 (2015).\u003c/li\u003e\n\u003cli\u003eVilari\u0026ntilde;o-G\u0026uuml;ell, C. \u003cem\u003eet al.\u003c/em\u003e DNAJC13 mutations in Parkinson disease. 
\u003cem\u003eHum Mol Genet\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 1794\u0026ndash;1801 (2014).\u003c/li\u003e\n\u003cli\u003eMochizuki, H. \u003cem\u003eet al.\u003c/em\u003e An AAV-derived Apaf-1 dominant negative inhibitor prevents MPTP toxicity as antiapoptotic gene therapy for Parkinson\u0026rsquo;s disease. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e \u003cstrong\u003e98\u003c/strong\u003e, 10918\u0026ndash;10923 (2001).\u003c/li\u003e\n\u003cli\u003eKaiser, S. \u003cem\u003eet al.\u003c/em\u003e A proteogenomic view of Parkinson\u0026rsquo;s disease causality and heterogeneity. \u003cem\u003eNPJ Parkinsons Dis\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 24 (2023).\u003c/li\u003e\n\u003cli\u003eGu, X.-J. \u003cem\u003eet al.\u003c/em\u003e Expanding causal genes for Parkinson\u0026rsquo;s disease via multi-omics analysis. \u003cem\u003eNPJ Parkinsons Dis\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 146 (2023).\u003c/li\u003e\n\u003cli\u003eChoi, Y. R. \u003cem\u003eet al.\u003c/em\u003e Prion-like Propagation of \u0026alpha;-Synuclein Is Regulated by the Fc\u0026gamma;RIIB-SHP-1/2 Signaling Pathway in Neurons. \u003cem\u003eCell Rep\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 136\u0026ndash;148 (2018).\u003c/li\u003e\n\u003cli\u003eLiu, C., Cui, Z., Wang, S. \u0026amp; Zhang, D. CD93 and GIPC expression and localization during central nervous system inflammation. \u003cem\u003eNeural Regen Res\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 1995 (2014).\u003c/li\u003e\n\u003cli\u003ePihlstr\u0026oslash;m, L. \u003cem\u003eet al.\u003c/em\u003e A comprehensive analysis of SNCA-related genetic risk in sporadic parkinson disease. \u003cem\u003eAnn Neurol\u003c/em\u003e \u003cstrong\u003e84\u003c/strong\u003e, 117\u0026ndash;129 (2018).\u003c/li\u003e\n\u003cli\u003eGwinn, K. 
\u003cem\u003eet al.\u003c/em\u003e Parkinson\u0026rsquo;s disease biomarkers: perspective from the NINDS Parkinson\u0026rsquo;s Disease Biomarkers Program. \u003cem\u003eBiomark Med\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 451\u0026ndash;473 (2017).\u003c/li\u003e\n\u003cli\u003eLocascio, J. J. \u003cem\u003eet al.\u003c/em\u003e Association between \u0026alpha;-synuclein blood transcripts and early, neuroimaging-supported Parkinson\u0026rsquo;s disease. \u003cem\u003eBrain\u003c/em\u003e \u003cstrong\u003e138\u003c/strong\u003e, 2659\u0026ndash;71 (2015).\u003c/li\u003e\n\u003cli\u003eScherzer, C. R. \u003cem\u003eet al.\u003c/em\u003e Molecular markers of early Parkinson\u0026rsquo;s disease based on gene expression in blood. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e \u003cstrong\u003e104\u003c/strong\u003e, 955\u0026ndash;960 (2007).\u003c/li\u003e\n\u003cli\u003eDing, H. \u003cem\u003eet al.\u003c/em\u003e Unrecognized vitamin D3 deficiency is common in Parkinson disease: Harvard Biomarker Study. \u003cem\u003eNeurology\u003c/em\u003e \u003cstrong\u003e81\u003c/strong\u003e, 1531\u0026ndash;7 (2013).\u003c/li\u003e\n\u003cli\u003eMurphy, K. E. \u003cem\u003eet al.\u003c/em\u003e Lysosomal-associated membrane protein 2 isoforms are differentially affected in early Parkinson\u0026rsquo;s disease. \u003cem\u003eMovement Disorders\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 1639\u0026ndash;1647 (2015).\u003c/li\u003e\n\u003cli\u003eWu, H. \u003cem\u003eet al.\u003c/em\u003e IKIP Negatively Regulates NF-\u0026kappa;B Activation and Inflammation through Inhibition of IKK\u0026alpha;/\u0026beta; Phosphorylation. \u003cem\u003eJ Immunol\u003c/em\u003e \u003cstrong\u003e204\u003c/strong\u003e, 418\u0026ndash;427 (2020).\u003c/li\u003e\n\u003cli\u003eWu, F. \u003cem\u003eet al.\u003c/em\u003e CXCR2 is essential for cerebral endothelial activation and leukocyte recruitment during neuroinflammation. 
\u003cem\u003eJ Neuroinflammation\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 98 (2015).\u003c/li\u003e\n\u003cli\u003eVeenstra, M. \u0026amp; Ransohoff, R. M. Chemokine receptor CXCR2: physiology regulator and neuroinflammation controller? \u003cem\u003eJ Neuroimmunol\u003c/em\u003e \u003cstrong\u003e246\u003c/strong\u003e, 1\u0026ndash;9 (2012).\u003c/li\u003e\n\u003cli\u003eShih, R.-H., Wang, C.-Y. \u0026amp; Yang, C.-M. NF-kappaB Signaling Pathways in Neurological Inflammation: A Mini Review. \u003cem\u003eFront Mol Neurosci\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 77 (2015).\u003c/li\u003e\n\u003cli\u003eKoprich, J. B., Reske-Nielsen, C., Mithal, P. \u0026amp; Isacson, O. Neuroinflammation mediated by IL-1\u0026beta; increases susceptibility of dopamine neurons to degeneration in an animal model of Parkinson\u0026rsquo;s disease. \u003cem\u003eJ Neuroinflammation\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 8 (2008).\u003c/li\u003e\n\u003cli\u003eUgrumov, M. Development of early diagnosis of Parkinson\u0026rsquo;s disease: Illusion or reality? \u003cem\u003eCNS Neurosci Ther\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 997\u0026ndash;1009 (2020).\u003c/li\u003e\n\u003cli\u003eChen, X. \u003cem\u003eet al.\u003c/em\u003e The early diagnosis of Parkinson\u0026rsquo;s disease through combined biomarkers. \u003cem\u003eActa Neurol Scand\u003c/em\u003e \u003cstrong\u003e140\u003c/strong\u003e, 268\u0026ndash;273 (2019).\u003c/li\u003e\n\u003cli\u003eKatunina, E. A., Ilina, E. P., Sadekhova, G. I. \u0026amp; Gaisenuk, E. I. Approaches to the Early Diagnosis of Parkinson\u0026rsquo;s Disease. \u003cem\u003eNeurosci Behav Physiol\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, 393\u0026ndash;400 (2020).\u003c/li\u003e\n\u003cli\u003eBecker, G. \u003cem\u003eet al.\u003c/em\u003e Early diagnosis of Parkinson\u0026rsquo;s disease. 
\u003cem\u003eJ Neurol\u003c/em\u003e \u003cstrong\u003e249\u003c/strong\u003e, 1\u0026ndash;1 (2002).\u003c/li\u003e\n\u003cli\u003ePantaleo, E. \u003cem\u003eet al.\u003c/em\u003e A Machine Learning Approach to Parkinson\u0026rsquo;s Disease Blood Transcriptomics. \u003cem\u003eGenes (Basel)\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eWang, H. \u003cem\u003eet al.\u003c/em\u003e Identification and Experimental Validation of Parkinson\u0026rsquo;s Disease with Major Depressive Disorder Common Genes. \u003cem\u003eMol Neurobiol\u003c/em\u003e \u003cstrong\u003e60\u003c/strong\u003e, 6092\u0026ndash;6108 (2023).\u003c/li\u003e\n\u003cli\u003eJensen, M. P. \u003cem\u003eet al.\u003c/em\u003e Lower Lymphocyte Count is Associated With Increased Risk of Parkinson\u0026rsquo;s Disease. \u003cem\u003eAnn Neurol\u003c/em\u003e \u003cstrong\u003e89\u003c/strong\u003e, 803\u0026ndash;812 (2021).\u003c/li\u003e\n\u003cli\u003eYacoubian, T. A. \u003cem\u003eet al.\u003c/em\u003e Brain and Systemic Inflammation in De Novo Parkinson\u0026rsquo;s Disease. \u003cem\u003eMovement Disorders\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 743\u0026ndash;754 (2023).\u003c/li\u003e\n\u003cli\u003eCunningham, C., Wilcockson, D. C., Campion, S., Lunnon, K. \u0026amp; Perry, V. H. Central and Systemic Endotoxin Challenges Exacerbate the Local Inflammatory Response and Increase Neuronal Death during Chronic Neurodegeneration. \u003cem\u003eThe Journal of Neuroscience\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 9275\u0026ndash;9284 (2005).\u003c/li\u003e\n\u003cli\u003eKasen, A. \u003cem\u003eet al.\u003c/em\u003e Upregulation of \u0026alpha;-synuclein following immune activation: Possible trigger of Parkinson\u0026rsquo;s disease. \u003cem\u003eNeurobiol Dis\u003c/em\u003e \u003cstrong\u003e166\u003c/strong\u003e, 105654 (2022).\u003c/li\u003e\n\u003cli\u003eWang, S. 
\u003cem\u003eet al.\u003c/em\u003e \u0026alpha;-Synuclein, a chemoattractant, directs microglial migration via H\u003csub\u003e2\u003c/sub\u003eO\u003csub\u003e2\u003c/sub\u003e-dependent Lyn phosphorylation. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e \u003cstrong\u003e112\u003c/strong\u003e, (2015).\u003c/li\u003e\n\u003cli\u003eBandres-Ciga, S. \u003cem\u003eet al.\u003c/em\u003e Large-scale pathway specific polygenic risk and transcriptomic community network analysis identifies novel functional pathways in Parkinson disease. \u003cem\u003eActa Neuropathol\u003c/em\u003e \u003cstrong\u003e140\u003c/strong\u003e, 341\u0026ndash;358 (2020).\u003c/li\u003e\n\u003cli\u003eGillespie, M. \u003cem\u003eet al.\u003c/em\u003e The reactome pathway knowledgebase 2022. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, D687\u0026ndash;D692 (2022).\u003c/li\u003e\n\u003cli\u003eLacy, P. Mechanisms of Degranulation in Neutrophils. \u003cem\u003eAllergy, Asthma \u0026amp; Clinical Immunology\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 98 (2006).\u003c/li\u003e\n\u003cli\u003eHakimi, M. \u003cem\u003eet al.\u003c/em\u003e Parkinson\u0026rsquo;s disease-linked LRRK2 is expressed in circulating and tissue immune cells and upregulated following recognition of microbial structures. \u003cem\u003eJ Neural Transm\u003c/em\u003e \u003cstrong\u003e118\u003c/strong\u003e, 795\u0026ndash;808 (2011).\u003c/li\u003e\n\u003cli\u003eHobson, B. D. \u0026amp; Sulzer, D. Neuronal Presentation of Antigen and Its Possible Role in Parkinson\u0026rsquo;s Disease. \u003cem\u003eJ Parkinsons Dis\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, S137\u0026ndash;S147 (2022).\u003c/li\u003e\n\u003cli\u003eIba, M. \u003cem\u003eet al.\u003c/em\u003e Neuroinflammation is associated with infiltration of T cells in Lewy body disease and \u0026alpha;-synuclein transgenic models. 
\u003cem\u003eJ Neuroinflammation\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 214 (2020).\u003c/li\u003e\n\u003cli\u003eGaliano-Landeira, J., Torra, A., Vila, M. \u0026amp; Bov\u0026eacute;, J. CD8 T cell nigral infiltration precedes synucleinopathy in early stages of Parkinson\u0026rsquo;s disease. \u003cem\u003eBrain\u003c/em\u003e \u003cstrong\u003e143\u003c/strong\u003e, 3717\u0026ndash;3733 (2020).\u003c/li\u003e\n\u003cli\u003eLindestam Arlehamn, C. S. \u003cem\u003eet al.\u003c/em\u003e \u0026alpha;-Synuclein-specific T cell reactivity is associated with preclinical and early Parkinson\u0026rsquo;s disease. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 1875 (2020).\u003c/li\u003e\n\u003cli\u003eLiu, G. \u003cem\u003eet al.\u003c/em\u003e Prediction of cognition in Parkinson\u0026rsquo;s disease with a clinical\u0026ndash;genetic score: a longitudinal analysis of nine cohorts. \u003cem\u003eLancet Neurol\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 620\u0026ndash;629 (2017).\u003c/li\u003e\n\u003cli\u003eLiu, G. \u003cem\u003eet al.\u003c/em\u003e Genome-wide survival study identifies a novel synaptic locus and polygenic score for cognitive progression in Parkinson\u0026rsquo;s disease. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 787\u0026ndash;793 (2021).\u003c/li\u003e\n\u003cli\u003eSeverson, K. A. \u003cem\u003eet al.\u003c/em\u003e Discovery of Parkinson\u0026rsquo;s disease states and disease progression modelling: a longitudinal data study using machine learning. \u003cem\u003eLancet Digit Health\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, e555\u0026ndash;e564 (2021).\u003c/li\u003e\n\u003cli\u003eLove, M. I., Huber, W. \u0026amp; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. 
\u003cem\u003eGenome Biol\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 550 (2014).\u003c/li\u003e\n\u003cli\u003ePatro, R., Duggal, G., Love, M. I., Irizarry, R. A. \u0026amp; Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. \u003cem\u003eNat Methods\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 417\u0026ndash;419 (2017).\u003c/li\u003e\n\u003cli\u003eYu, G., Wang, L.-G., Han, Y. \u0026amp; He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. \u003cem\u003eOMICS\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 284\u0026ndash;7 (2012).\u003c/li\u003e\n\u003cli\u003eDai, Y. \u003cem\u003eet al.\u003c/em\u003e WebCSEA: web-based cell-type-specific enrichment analysis of genes. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, W782\u0026ndash;W790 (2022).\u003c/li\u003e\n\u003cli\u003eZhang, X.-O. \u003cem\u003eet al.\u003c/em\u003e Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. \u003cem\u003eGenome Res\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 1277\u0026ndash;87 (2016).\u003c/li\u003e\n\u003cli\u003eChoi, S. W. \u0026amp; O\u0026rsquo;Reilly, P. F. PRSice-2: Polygenic Risk Score software for biobank-scale data. \u003cem\u003eGigascience\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, (2019).\u003c/li\u003e\n\u003cli\u003eGeiss, G. K. \u003cem\u003eet al.\u003c/em\u003e Direct multiplexed measurement of gene expression with color-coded probe pairs. \u003cem\u003eNat Biotechnol\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 317\u0026ndash;25 (2008).\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTable 1. 
Summary of clinical data at baseline\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"816\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 17.7696%;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 8.70098%;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 32.2304%;\"\u003e\n \u003cp\u003eDiscovery dataset\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 31.7402%;\"\u003e\n \u003cp\u003eReplication dataset\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n \u003cp\u003ep-value\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 17.7696%;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 8.70098%;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 14.7059%;\"\u003e\n \u003cp\u003ePD (n=551)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003eHC (n=437)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003ePD (n=760)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 13.9706%;\"\u003e\n \u003cp\u003eHC (n=452)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 17.7696%;\"\u003e\n \u003cp\u003eSex\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 8.70098%;\"\u003e\n \u003cp\u003eMale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 14.7059%;\"\u003e\n \u003cp\u003e318\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e214\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e482\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 13.9706%;\"\u003e\n \u003cp\u003e204\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n \u003cp\u003e1.59\u0026times;10\u003csup\u003e-4\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 17.7696%;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 8.70098%;\"\u003e\n \u003cp\u003eFemale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 14.7059%;\"\u003e\n \u003cp\u003e\u0026nbsp;233\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e223\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e278\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 13.9706%;\"\u003e\n \u003cp\u003e248\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n \u003cp\u003e0.61\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 26.4706%;\"\u003e\n \u003cp\u003eAge, years (mean (sd))\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 14.7059%;\"\u003e\n \u003cp\u003e62.91 (10.42)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e\u0026nbsp;57.18 (12.53)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e64.99 (8.78)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 13.9706%;\"\u003e\n \u003cp\u003e62.94 (10.81)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n 
\u003cp\u003e0.89\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 26.4706%;\"\u003e\n \u003cp\u003eDuration of disease (mean (sd))\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 14.7059%;\"\u003e\n \u003cp\u003e2.37 (4.12)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e6.19 (5.18)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 13.9706%;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n \u003cp\u003e5.98\u0026times;10\u003csup\u003e-41\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 26.4706%;\"\u003e\n \u003cp\u003eRIN value (mean (sd))\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 14.7059%;\"\u003e\n \u003cp\u003e8.18 (0.90)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e8.36 (0.84)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e7.44 (0.83)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 13.9706%;\"\u003e\n \u003cp\u003e7.41 (0.85)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 26.4706%;\"\u003e\n \u003cp\u003eMOCA (mean (sd))\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 14.7059%;\"\u003e\n \u003cp\u003e24.86 (4.17)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n
\u003cp\u003e\u0026nbsp;27.94 (1.88)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e25.30 (3.60)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 13.9706%;\"\u003e\n \u003cp\u003e26.53 (2.54)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" valign=\"top\" style=\"width: 26.4706%;\"\u003e\n \u003cp\u003eUPSIT (mean (sd))\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 14.7059%;\"\u003e\n \u003cp\u003e22.22 (8.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e\u0026nbsp;33.6 (4.70)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 17.6471%;\"\u003e\n \u003cp\u003e19.62 (7.74)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 13.9706%;\"\u003e\n \u003cp\u003e32.48 (6.00)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 9.55882%;\"\u003e\n \u003cp\u003e0.84\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eTable 2. 
Performance metrics of classifier models on validation and testing datasets\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"971\" class=\"fr-table-selection-hover\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 126px;\"\u003e\n \u003cp\u003eFeatures (Model)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"7\" valign=\"top\" style=\"width: 417px;\"\u003e\n \u003cp\u003eValidation on PPMI withheld samples\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"7\" valign=\"top\" style=\"width: 428px;\"\u003e\n \u003cp\u003eTesting on PDBP/BioFIND samples\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 126px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eAccuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003eSensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003eSpecificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003eBalAcc\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003eAUROC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 54px;\"\u003e\n \u003cp\u003eAUPRC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 67px;\"\u003e\n \u003cp\u003eAccuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003eSensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003eSpecificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003eBalAcc\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003eAUROC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003eAUPRC\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 126px;\"\u003e\n \u003cp\u003eDEGs (SVM_rbf)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e0.67 (0.03)*\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.77 (0.04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.55 (0.05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.69 (0.04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.66 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.72 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 54px;\"\u003e\n \u003cp\u003e0.72 (0.04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 67px;\"\u003e\n \u003cp\u003e0.63 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.73 (0.02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.47 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.70 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.60 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.64 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.74 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 126px;\"\u003e\n \u003cp\u003ePRS (LR)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e0.58 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.90 (0.06)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.17 (0.04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.58 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.53 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.54 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 54px;\"\u003e\n \u003cp\u003e0.62 (0.04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 67px;\"\u003e\n \u003cp\u003e0.67 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.94 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.21 (0.07)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.67 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.57 (0.02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.70 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.79 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 126px;\"\u003e\n \u003cp\u003eClinical (SVM)\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e0.78 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.78 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.79 (0.04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.82 (0.04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.79 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.85 (0.02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 54px;\"\u003e\n \u003cp\u003e0.86 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 67px;\"\u003e\n \u003cp\u003e0.79 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.76 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.83 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.88 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.79 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.86 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.90 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 126px;\"\u003e\n \u003cp\u003eDEGs+ PRS (LR)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e0.69 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.74 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.62 (0.05)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.71 (0.04)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.68 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.75 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 54px;\"\u003e\n \u003cp\u003e0.78 (0.03)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 67px;\"\u003e\n \u003cp\u003e0.66 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.69 (0.02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.60 (0.02)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.74 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.65 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.70 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.79 (0.01)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 126px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDEGs+PRS+Clinical (SVM)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.83 (0.02)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.83 (0.03)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.84 
(0.04)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.86 (0.04)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.83 (0.02)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.91 (0.02)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 54px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.92 (0.02)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 67px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.82 (0.01)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.79 (0.01)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.87 (0.01)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.91 (0.01)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.83 (0.01)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.89 (0.01)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.93 (0.01)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 126px;\"\u003e\n \u003cp\u003eDEGs+PRS+Clinical (Previous study**)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e0.86\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.89\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.91\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.82\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 54px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 67px;\"\u003e\n \u003cp\u003e0.75\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.93\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 66px;\"\u003e\n \u003cp\u003e0.43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.74\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 52px;\"\u003e\n \u003cp\u003e0.68\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e*: Values are means over 100 random splits; values in parentheses are standard deviations.\u003c/p\u003e\n\u003cp\u003e**: The previous study performed its independent test on the PDBP
dataset.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-6837659/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6837659/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"Early diagnosis and biomarker discovery to bolster the therapeutic pipeline for Parkinson’s disease (PD) are urgently needed. In this study, we leverage the large-scale, whole-blood total RNA and DNA sequencing data from the Accelerating Medicines Partnership in Parkinson’s Disease (AMP PD) program to identify PD-associated RNAs, including both known genes and novel circular RNAs (circRNA) and enhancer RNAs (eRNAs). Initially, 874 known genes, 783 eRNAs, and 35 circRNAs were found differentially expressed in PD blood in the PPMI cohort (FDR \u003c 0.05). Based on these findings, a novel multi-omics machine learning model was built to predict PD diagnosis with high performance (AUC = 0.89), which was superior to previous models. We further replicated this discovery in an independent PDBP/BioFIND cohort and confirmed 1,111 significant marker genes, including 491 known genes, 599 eRNAs, and 21 circRNAs. Functional enrichment analysis showed that the PD-associated genes are involved in neutrophil activation and degranulation, as well as the TNF-α signaling pathway. 
By comparing the PD-associated genes in blood with those in human brain dopamine neurons in our BRAINcode cohort, we found only 44 genes (9% of the known genes) showing significant changes with the same direction in both PD brain neurons and PD blood, among which are neuroinflammation-associated genes IKBIP, CXCR2, and NFKBIB. Our findings demonstrated consistently lower SNCA mRNA levels and the increased expression levels of VDR gene in the blood of early-stage PD patients. In summary, this study provides a generally useful computational framework for further biomarker development and early disease prediction. We also delineate a wide spectrum of the known and novel RNAs linked to PD that are detectable in circulating blood cells in a harmonized, large-scale dataset.","manuscriptTitle":"Multi-omics machine learning classifier and blood transcriptomic signature of Parkinson’s disease","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-20 15:42:01","doi":"10.21203/rs.3.rs-6837659/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a58d5a0f-bad6-46fb-b94f-f9f462db8189","owner":[],"postedDate":"June 20th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":50258044,"name":"Biological sciences/Computational biology and bioinformatics/Computational neuroscience"},{"id":50258045,"name":"Biological sciences/Neuroscience/Computational neuroscience"}],"tags":[],"updatedAt":"2025-08-21T15:31:27+00:00","versionOfRecord":[],"versionCreatedAt":"2025-06-20 15:42:01","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6837659","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6837659","identity":"rs-6837659","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}