AI-based mining of biomedical literature: Applications for drug repurposing for the treatment of dementia

doi:10.21203/rs.3.rs-4750719/v1

2024 · doi:10.21203/rs.3.rs-4750719/v1

preprint OA: closed

Aliaksandra Sikirzhytskaya, Ilya Tyagin, S. Scott Sutton, Michael D. Wyatt, and 2 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4750719/v1

This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1 posted.

Abstract

Neurodegenerative pathologies such as Alzheimer's disease, Parkinson's disease, Huntington's disease, Amyotrophic lateral sclerosis, Multiple sclerosis, HIV-associated neurocognitive disorder, and others significantly affect individuals, their families, caregivers, and healthcare systems.
While there are no cures yet, researchers worldwide are actively developing novel treatments that could slow disease progression, alleviate symptoms, and ultimately improve the overall health of patients. The sheer volume of new scientific information calls for new analytical approaches to meaningful hypothesis generation. To enable automatic analysis of biomedical data, we introduced AGATHA, an AI-based literature mining tool that can navigate massive scientific literature databases such as PubMed. The overarching goal of this effort is to adapt AGATHA for drug repurposing by revealing hidden connections between FDA-approved medications and a health condition of interest. Our tool converts the abstracts of peer-reviewed papers from PubMed into a multidimensional space in which each gene and health condition is represented by a specific set of coordinates. We applied advanced statistical analysis to reveal distinct clusters of scientific terms within this virtual space, using AGATHA-calculated parameters for selected health conditions and genes. Partial Least Squares Discriminant Analysis was used to categorize samples (122 diseases and 20,889 genes) and predict their membership in specific classes, yielding a discrimination model from which lists of genes specific to each disease class were extracted. Here we focus on drugs that could be repurposed to treat dementia as an outcome of neurodegenerative diseases. We therefore identified dementia-associated genes that are also statistically highly ranked in other disease classes. Additionally, we report a mechanism for detecting genes common to multiple health conditions. These gene sets were classified by their presence in biological pathways, which aided in selecting candidate genes and biological processes that could be exploited through drug repurposing.
Subject areas: Biological sciences/Drug discovery; Biological sciences/Computational biology and bioinformatics/Data processing; Biological sciences/Computational biology and bioinformatics/Literature mining; Biological sciences/Computational biology and bioinformatics/Machine learning; Health sciences/Diseases/Neurological disorders/Dementia

Keywords: Artificial intelligence, text-mining, dementia, drug repurposing, statistical analysis, AGATHA, classification, pathway analysis

Background

Over the past decade, advances in analytical methods have opened dramatic new opportunities to unveil hidden connections within complex networks (1). Artificial Intelligence (AI) techniques enable researchers to query and analyze massive datasets, simulate experiments virtually, and generate scientific hypotheses. Such tools have been used extensively by the pharmaceutical industry to lower the costs of drug discovery (2, 3). Developing a novel therapeutic agent from idea to FDA approval requires substantial commitments of time and money (4). The FDA strictly evaluates efficacy and safety when approving new therapeutics, and as the costs of new drug development have risen, the costs of newly approved therapeutics have come under intense scrutiny. Repurposing existing FDA-approved drugs for new indications can reduce drug development costs, in part because the drug's safety profile and clinical experience already exist (5). Repurposing also substantially accelerates the process by building on crucial steps already completed during the original FDA approval (6). The key initial step of drug repurposing is to find a connection between an existing drug and a disease of interest that is worth exploring preclinically or clinically as a new therapeutic indication.
In many cases, the necessary knowledge may already be present in the biomedical literature, but the connections between individual pieces of information may not be obvious. To uncover such hidden connections, we developed the AI-based literature analysis tools MOLIERE (7) and, subsequently, the Automatic Graph Mining And Transformer based Hypothesis Generation Approach (AGATHA) (8). Recent developments in AI allow automated extraction of valuable information from unstructured text such as scientific abstracts or articles, enabling efficient and scalable processing of textual data that saves considerable time and effort compared with manual processing (9). AI algorithms can identify themes, topics, and clusters within a collection of documents, helping researchers overcome the challenging, time-consuming analysis of literature (10). In vast databases like PubMed, search results often overwhelm users with more information than can be curated without detailed, time-intensive assessment. Natural language processing (NLP) techniques in AI frameworks not only speed up research but also facilitate the extraction of valuable knowledge from massive datasets, which is difficult to achieve manually (11). Drug repurposing research benefits from these advantages in many ways (5, 12, 13). Studying disease at the gene level remains challenging despite recent advances in genomics and technology. NLP methods have been used successfully for a variety of gene-related tasks, including identifying unique anticancer targets (2), predicting cognitive decline (14), and interpreting microbial genes (15), among others.
Successful NLP modeling requires high-quality training datasets, preprocessing to normalize the data and reduce noise, an appropriate model architecture (such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), or transformers), and optimized hyperparameters such as learning rate, batch size, dropout rate, and architecture-specific configuration. AGATHA is a literature-based discovery tool that extracts relevant information by sifting through immense scientific databases, including PubMed, which grows by over one million papers annually (16). The tool analyzes the lexical elements (e.g., words, phrases, and lemmas) within each research article abstract to identify possible hidden connections among terms specified by the user. The likelihood of a potential connection is estimated using a multi-headed self-attention mechanism that accounts for the spatial relationships between individual terms in a latent vector space, which we will refer to as the “AGATHA space”. The AGATHA pipeline can roughly be split into two stages: (1) construction of a semantic knowledge network and its embedding into a low-dimensional vector space, and (2) training of a transformer-based predictor. These two stages can be used independently of each other, which we leverage in the current work. The first stage produces a large multi-layered knowledge network that connects individual units of information (such as UMLS terms, semantic predicates, or entities) with their corresponding literature sources. For example, a term representing “breast cancer” is connected to all sentences and semantic predicates containing this term. After construction, we perform network embedding, so that each node is assigned a learned 512-dimensional vector representation (or coordinates).
This allows us to establish spatial relationships between individual concepts by computing distances between them (8). Terms that are logically connected, such as different stages or types of the same health condition, tend to cluster together in AGATHA space, whereas logically distant terms are positioned relatively far apart. This representation aims to provide an intuitive view of the relationships between the many terms and concepts in the scientific literature. The second stage produces a transformer-based predictor model trained to rank meaningful associations between biomedical concepts above random noise. For each term-term association, it outputs a score in the unit interval [0, 1] indicating the likelihood that the association is biomedically relevant, based on what it has learned from scientific abstracts. When the AGATHA predictor is used in a one-to-many setting (as in this work), we obtain a distribution of scores over a set of pairs in which the source term is fixed (e.g., a particular disease) and the target terms are concepts of a similar semantic type (e.g., a list of genes). We can therefore identify which genes are more likely to be associated with a particular disease and select the most prominent candidates for downstream analysis. To classify AGATHA outcomes, we applied multivariate statistical methods, including Partial Least Squares Discriminant Analysis (PLSDA) (17, 18). This method helps categorize samples (diseases/genes) and predict their membership in specific classes. PLSDA has advantages over other discriminant methods because it can handle highly variable data and reduce dataset dimensionality through latent variables (19).
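To make the one-to-many setting concrete, the following minimal Python sketch fixes a source vector (a "disease") and ranks a handful of target vectors (the "genes") in a 512-dimensional space. It uses cosine similarity, rescaled to [0, 1], purely as a stand-in assumption for the score produced by AGATHA's transformer-based predictor, which it does not reproduce; the vectors are random placeholders, and `rank_targets` and the GENE labels are hypothetical names.

```python
import numpy as np


def cosine_similarity(source: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Cosine similarity between one source vector and each row of `targets`."""
    source_unit = source / np.linalg.norm(source)
    target_units = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    return target_units @ source_unit


def rank_targets(source: np.ndarray, targets: np.ndarray, names: list) -> list:
    """One-to-many ranking: the source term is fixed and every target is scored.

    Cosine similarity in [-1, 1] is rescaled to [0, 1] only so the scores resemble
    unit-interval outputs; this is a proxy, not AGATHA's transformer predictor.
    """
    scores = (cosine_similarity(source, targets) + 1.0) / 2.0
    order = np.argsort(-scores)  # highest score first
    return [(names[i], float(scores[i])) for i in order]


rng = np.random.default_rng(0)
disease_vec = rng.normal(size=512)                        # placeholder "disease" embedding
gene_vecs = rng.normal(size=(5, 512))                     # placeholder "gene" embeddings
gene_vecs[2] = disease_vec + 0.1 * rng.normal(size=512)   # one gene placed near the disease
ranking = rank_targets(disease_vec, gene_vecs, [f"GENE{i}" for i in range(5)])
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Swapping `cosine_similarity` for a trained model's scoring function would leave the ranking logic unchanged; only the source of the scores differs.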
In addition to the classification analysis, unsupervised clustering was used to reveal latent relationships that cannot be directly measured within the multivariate data. Combining these steps with comprehensive pathway analysis helps explain the biological significance of the classification outcomes and yields a final list of candidate genes for drug repurposing. In this work, we applied our methodology to identify candidate drugs that could be repurposed as treatments for neurodegenerative diseases (20), which pose major healthcare challenges as the seventh leading cause of death worldwide (21). The term “neurodegenerative” covers a wide spectrum of neurocognitive conditions that, despite their different pathologies, often share common symptoms, dementia being a major outcome (22). The analysis therefore included a broad spectrum of neurodegenerative conditions in the dementia domain to search for common themes and pathways. First, the classification model was used to extract “dementia” genes, which were then analyzed in the context of the pathways in which they participate. Once the proposed method was confirmed to extract the necessary data effectively, the same procedure was applied to the remaining, non-neurodegenerative disease classes to obtain genes specific to each group. Next, the genes most likely to be associated with each disease were mapped onto the Dementia class to assess their position on its probability scale. Genes involved in pathways of interest for which interacting small molecules are known were prioritized, to select FDA-approved medications or medications with experimental or investigational status. We selected a total of 38 drugs for potential repurposing, focusing on six of them, all of which have demonstrated effectiveness in treating diseases unrelated to central nervous system function.
Materials and methods

AGATHA is an open-source algorithm available at: https://github.com/IlyaTyagin/AGATHA-C-GP. Its operational principles are detailed in the Additional Information section. Statistical analysis of the multidimensional dataset was performed using MATLAB R2023a (MathWorks, Natick, MA) and the PLS Toolbox (Eigenvector Research, Inc., Wenatchee, WA). Pathway analysis was conducted using the g:Profiler tool (23). Genes were characterized and potential drugs were selected with the help of the GeneCards (24) and CTD (25) databases.

Principal Component Analysis (PCA) (26) was used primarily for dimensionality reduction and preliminary analysis of the data structure. This widely used method transforms the input data into a set of principal components that describe the variance of the data. It involves a series of mathematical steps: calculating the covariance matrix of the data, computing its eigenvalues and eigenvectors, and then projecting the data onto the leading eigenvectors to reduce its dimensionality. The prepared dataset was normalized by the total area and auto-scaled, i.e., each of the 512 columns of the calibration matrix was mean-centered and divided by its standard deviation. Cross-validation was then performed using the Venetian blinds approach with 10 splits and a blind thickness of 1. The resulting model showed a clear separation of the Dementia and SUD classes, with the rest of the data falling into a single cohesive group. The cross-validated root mean square error for this model was very low (0.00231327), indicating good model accuracy.

The classification model was calculated using the PLSDA method (17), which can handle heterogeneous data and describe it with only a few latent variables (LVs). LVs are calculated using regression coefficients determined for each component, and their positions in the PLSDA space are then estimated.
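The preprocessing, PCA, and cross-validation steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the PLS_Toolbox code used in the study: normalization "by the total area" is assumed to divide each row by the sum of its absolute values, autoscaling mean-centers each column and divides it by its standard deviation, and a Venetian-blinds split with thickness 1 is assumed to place sample i in fold i mod 10. The input is random placeholder data with the HCDS shape (122 × 512), and all function names are hypothetical.

```python
import numpy as np


def normalize_total_area(X):
    """Divide each row by the sum of its absolute values (assumed 'total area')."""
    return X / np.abs(X).sum(axis=1, keepdims=True)


def autoscale(X):
    """Mean-center each column and scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std


def pca(X, n_components):
    """PCA of already-centered X via eigendecomposition of its covariance matrix."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    loadings = eigvecs[:, order]
    scores = X @ loadings                             # project samples onto the components
    explained = eigvals[order] / eigvals.sum()        # fraction of total variance per PC
    return scores, loadings, explained


def venetian_blinds(n_samples, n_splits=10, thickness=1):
    """Fold index per sample: blocks of `thickness` samples rotate through the folds."""
    return (np.arange(n_samples) // thickness) % n_splits


rng = np.random.default_rng(1)
X = rng.normal(size=(122, 512))  # placeholder for 122 health-condition embeddings
X_scaled, _, _ = autoscale(normalize_total_area(X))
scores, loadings, explained = pca(X_scaled, n_components=3)
folds = venetian_blinds(X.shape[0])
print(scores.shape, explained.round(3), np.bincount(folds))
```

Keeping three components mirrors the 3D view used for exploratory visualization; in each cross-validation round, the samples of one fold would be held out while the model is built on the other nine.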
To reduce the risk of overfitting, Venetian blinds cross-validation was employed. This method partitions the data into k equally sized segments and uses each segment in turn as the validation set while the remaining segments serve as the training set. In addition to reducing dimensionality, PLSDA ensures that each calculated component carries unique information, because the components are mutually orthogonal. Unsupervised hierarchical cluster analysis was applied to a subset of the Dementia-classified data to group samples into clusters based on shared characteristics. In this analysis, the pair of samples (or clusters) with the smallest distance between them is identified and merged without knowledge of class origin, and the procedure is repeated; the resulting hierarchy of merges is visualized as a dendrogram. Ward's method was used to minimize the variance within each cluster by evaluating the cost of merging two groups of data (27). This approach is especially efficient for high-dimensional data, such as our disease/coordinate or gene/coordinate sets, or when clusters are likely to have similar within-cluster variance. The merge cost was estimated as the increase in the sum of squared deviations from the mean (variance) after merging the clusters, with distances measured using the Mahalanobis distance.

Results

Data description

Diseases and conditions of interest were selected from the Disease Database provided by the Unified Medical Language System (UMLS) (28, 29) and combined into the Health Conditions Data Set (HCDS), comprising a total of 122 terms categorized into seven groups: Dementia (24 conditions), Diabetes (12 conditions), Arthritis (9 conditions), Heart Conditions/Diseases (14 conditions), Hypertension (11 conditions), Cancer (12 conditions), and Substance Use Disorders (SUD) (40 conditions/substances). All selected terms are formal names of diseases and health conditions; the last group (SUD) additionally contains the most common substances of abuse (Table 1).
Table 1 Health Condition Data Set. Diseases, conditions, and substances are grouped by type. Entries are listed as # Name (UMLS code).

Dementia: 1 AIDS Dementia Complex (c0001849); 2 Alcohol Amnestic Disorder (c0001940); 3 Alcohol-Induced Mental Disorders (c0033936); 4 Alzheimer Disease, Early Onset (c0750901); 5 Alzheimer Disease, Late Onset (c0494463); 6 Amyotrophic Lateral Sclerosis (c0002736); 7 Corticobasal Degeneration (c0393570); 8 Dementia Associated with Alcoholism (c0236656); 9 Dementia, Vascular (c0011269); 10 Frontotemporal Dementia (c0338451); 11 HAND (c4285693); 12 HIV Encephalopathy (c0276548); 13 Huntington's Disease (c0020179); 14 Lewy Body Disease (c0752347); 15 Major Neurocognitive Disorder (c4087461); 16 Mild Cognitive Disorder (c1270972); 17 Mild Neurocognitive Motor Disorder (c2609165); 18 Parkinson's Disease (c0030567); 19 Presenile Dementia (c0011265); 20 Prion Disease (c0162534); 21 Senile Dementia (c0002395); 22 Senile Psychosis (c1457889); 23 Subcortical Vascular Dementia (c0393561)

Diabetes: 24 Prediabetes Syndrome (c0362046); 25 Latent Autoimmune Diabetes in Adults (c1739108); 26 Monogenic Diabetes (c3888631); 27 Steroid-Induced Diabetes (c0342269); 28 Type 1 Diabetes Mellitus (c0011854); 29 Type 2 Diabetes Mellitus (c0011860); 30 Maturity Onset Diabetes Mellitus In Young (c0342276); 31 Cystic Fibrosis Related Diabetes (c2242728); 32 Gestational Diabetes (c0085207); 33 Wolfram Syndrome 1 (c4551693)

Arthritis: 34 Arthritis (c0003864); 35 Ankylosing Spondylitis (c0038013); 36 Psoriatic Arthritis (c0003872); 37 Pyogenic Arthritis (c0003869); 38 Rheumatoid Arthritis (c0003873); 39 Fibromyalgia (c0016053); 40 Reactive Arthritis (c0085435); 41 Gout (c0018099); 42 Degenerative Polyarthritis (c0029408)

Heart disease/condition: 43 Angina Pectoris (c0002962); 45 Heart Failure (c0018801); 46 Congenital Heart Defects (c0018798); 47 Heart Valve Disease (c0018824); 48 Coronary Artery Disease (c1956346); 49 Peripheral Artery Disease (c1704436); 50 Coronary Arteriosclerosis (c0010054); 51 Disorder Of Pericardium (c0265122); 52 Atrial Fibrillation (c0004238); 53 Cardiac Arrhythmia (c0003811); 54 Rheumatic Heart Disease (c0035439); 55 Myocardial Infarction (c0027051); 56 Cardiomyopathies (c0878544); 57 Atrial Septal Defects (c0018817)

Hypertension: 58 High Blood Pressure (c0020538); 59 Chronic Hypertension (c0745114); 60 Essential Hypertension (c0085580); 61 Malignant Hypertension (c0020540); 62 Hypertensive Crisis (c0020546); 63 Isolated Systolic Hypertension (c0745133); 64 Resistant Hypertension (c0745130); 65 Gestational Hypertension (c0852036); 66 Secondary Hypertension (c0155616); 67 Cerebrovascular Accident (c0038454); 68 Transient Ischemic Attack (c0007787)

Cancer: 69 Breast Cancer (c0006142); 70 Cervical Cancer (c0007847); 71 Colorectal Cancer (c0009402); 72 Leukemia (c0023418); 73 Lymphoma (c0024299); 74 Malignant Lung Neoplasm (c0242379); 75 Malignant Neoplasm Of Skin (c0007114); 76 Neuroblastoma (c0027819); 77 Osteosarcoma (c0029463); 78 Pancreas Carcinoma (c0235974); 79 Prostate Cancer (c0376358); 80 Melanoma (c0025202)

Substance abuse disorder: 81 Alcohol Abuse (c0085762); 82 Ethanol (c0001962); 83 Amphetamine (c0002658); 84 Cannabicyclohexanol (c3492502); 85 Synthetic Cannabinoids (c0006864); 86 Cannabidiol (c0936079); 87 Central Nervous System Depressants (c0007681); 88 Dextromethorphan (c0011816); 89 Barbiturates, Benzodiazepines (c0004745); 90 Flunitrazepam (c0016296); 91 Rohypnol (c0699927); 92 Opium (c0029112); 93 Sodium Oxybate (c0037537); 94 Cathine (c0069021); 95 Tobacco (c0040329); 96 Nicotine (c0028040); 97 Khat (c1386575); 98 Benzoylmethylecgonine (c0009170); 99 Methylphenidate (c0025810); 100 Codeine (c0009214); 101 Morphine (c0026549); 102 Loperamide (c0023992); 103 Methadone (c0025605); 104 Hydrocodone (c0020264); 105 Meperidine (c0025376); 106 Oxycodone (c0030049); 107 Oxymorphone (c0030073); 108 Diacetylmorphine (c001189); 109 Mitragynine (c0066619); 110 Hydromorphone (c0012306); 111 Loperamide Hydrochloride (c0282221); 112 Delysid (c0024334); 113 Ketamine (c0022614); 114 Mescaline (c0025460); 115 Ecstasy Abuse (c0743247); 116 Psilocybin (c0033850); 117 Salvia (c3668955); 118 Phencyclidine (c0031381); 119 Nandrolone (c0027368); 120 Oxandrolone (c0029995); 121 Oxymetholone (c0030072); 122 Testosterone Cypionate (c0076181)

We hypothesized that the AGATHA space contains spatial clusters corresponding to the different groups of health conditions (Table 1). If so, mapping genes into the AGATHA space and analyzing their positions relative to the disease groups could uncover previously unrecognized links between specific gene sets and health conditions; such genes can then be evaluated for drug repurposing opportunities. The general workflow is represented in Scheme 1. Disease-categorized data from a variety of databases were mapped to the AGATHA space for further characterization. Classification methods were then used to build a discrimination model from which four lists of genes were extracted: 1) genes specific to each disease class; 2) Dementia genes highly ranked in other disease classes; 3) disease genes highly ranked in the Dementia class; and 4) genes common to all diseases. These gene groups were subjected to pathway analysis with the g:Profiler tool (23), which helped select candidates for drug repurposing evaluation.

Exploratory analysis of semantic links revealed by the AGATHA embedding space

The complex relationships between the selected health condition terms (30, 31) (Table 1), described across many diverse scientific articles, were assessed using the AGATHA text mining software. After semantic embedding, these health condition terms are represented as points in a high-dimensional latent space, the embedding space we refer to as the AGATHA space (see Background). The coordinates of these terms reflect their semantic properties, such that words or phrases with similar meanings are represented by points that lie closer to each other. The number of dimensions of an embedding space is typically influenced by the volume and complexity of the text data being analyzed.
However, it is also determined by the specific requirements of the model and the task at hand. While a larger and more complex dataset might benefit from a higher-dimensional space that captures more nuanced semantic relationships, the choice of dimensionality also depends on computational constraints and the desired balance between detail and efficiency. In our case, preliminary studies indicated that the information contained in the PubMed database of scientific abstracts is efficiently embedded in a 512-dimensional space (8). Fig. 1 shows the relative positions of the HCDS groups visualized in 3D, obtained by condensing the original 512-dimensional data into three dimensions using Principal Component Analysis (PCA). Two distinct clusters are formed by two non-overlapping sets of health conditions: SUD (green diamonds) and Dementia (red diamonds). The five remaining sets (Diabetes, Arthritis, Heart Diseases, Hypertension, and Cancer) form a tight spatial cluster that is separated from both the SUD and Dementia clusters. Subclusters corresponding to these five groups remain distinguishable, but they lie close to each other, and certain groups overlap. These observations led us to hypothesize that the AGATHA space contains spatial patterns characteristic of the health condition groups. In the following sections, we apply advanced statistics to identify and characterize such patterns.

Classification analysis of health condition groups mapped to the AGATHA space

The validity of the spatial patterns associated with health condition groups in the AGATHA space was tested using PLSDA, a standard partial least squares classification approach. In this study, we leverage both the interpretability of multiclass PLSDA models and their ability to handle collinear data effectively.
The strong predictive performance of the PLSDA models was subsequently used to investigate gene/health condition associations. PLSDA classification has been shown to be a successful method for multivariate data, offering tunable model complexity (18). We used PLSDA to build a supervised classification model (Fig. 2) with classes defined by the health condition groups (Table 1). Extensive preliminary classification trials (not reported here) identified the optimal data preprocessing and classification parameters. For the final classification model, the input matrix containing the coordinates of all health conditions in the 512-dimensional AGATHA space was preprocessed by normalization to the total area and auto-scaling. Despite the high dimensionality of the input data generated by the AGATHA text mining software, four latent variables were sufficient to produce a robust classification of health conditions. Latent variables are calculated sequentially, so that each subsequent latent variable captures the shared variance remaining after extraction by the previously calculated variables. The first four latent variables together captured 16.31% of the variance in the data. The stability of the PLSDA classification model was verified using Venetian blinds cross-validation with the data divided into ten equally sized folds. The final classification model effectively categorizes health conditions into the seven predefined groups, as shown in Fig. 2A, with cross-validated sensitivity and specificity ranging from 0.786 to 0.990. As expected from the exploratory analysis (Fig. 1), the Dementia and SUD classes showed the best classification performance. The Dementia panel in Fig. 2A reveals that every health condition initially selected for the Dementia group has a probability close to 100% of being classified into the Dementia class.
Note that the 0–1 range on the Y-axis of the panel corresponds to a 0–100% probability range. These observations suggest that all health conditions originally selected for the Dementia group form a distinct spatial cluster in the AGATHA space. Furthermore, if a small portion (one-tenth) of the Dementia set were withheld as 'unknown' health conditions during training, these 'unknowns' would likely be classified correctly by the resulting model. Interestingly, this holds not only for Dementia and SUD but also for Diabetes. In Fig. 1, Diabetes is the group most distant from the Dementia and SUD groups, neighboring but not overlapping with the other groups. Discrimination between the Arthritis, Cancer, Heart Disease/Condition, and Hypertension groups is also achievable despite overlapping regions, as shown in the Discussion section. Two distinct pairs of groups can be identified: Heart Disease/Condition with Hypertension, and Arthritis with Cancer (Fig. 2A, corresponding panels). Health conditions originally selected for the groups in each pair overlap and show a non-zero probability of being assigned to the other class of the pair. The proximity and overlap of the Heart Disease/Condition and Hypertension groups can be explained by the shared physiological characteristics of these disorders (32). There are also connections between Cancer and Arthritis, such as associations with chronic inflammation and paraneoplastic arthritis (33). Although the physiological origins of the connections within these two pairs are beyond the scope of this proof-of-concept study, we later demonstrate that a more detailed analysis of the overlapping groups (see black circles in Figs. 2B and 2C) makes it possible to build a robust classification model that discriminates all groups.
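For readers unfamiliar with PLSDA, the sketch below shows one common formulation: a PLS2 regression (NIPALS algorithm) of a one-hot class-membership matrix on mean-centered coordinates, with each sample assigned to the class with the largest predicted value, followed by per-class sensitivity and specificity and the fraction of X variance captured by the latent variables. It is an assumption-laden illustration rather than the PLS_Toolbox model used in the study, which reports class-membership probabilities (Fig. 2) instead of a plain argmax and evaluates performance by cross-validation. The data are synthetic (seven classes and four latent variables, mirroring the study's setup), metrics are computed on the training data for brevity, and all function names are hypothetical.

```python
import numpy as np


def pls2_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """PLS2 by NIPALS on mean-centered X (n x p) and Y (n x m).

    Returns regression coefficients B (p x m) and the X residual after deflation.
    """
    X, Y = X.copy(), Y.copy()
    n_feat, n_resp = X.shape[1], Y.shape[1]
    W = np.zeros((n_feat, n_components))
    P = np.zeros((n_feat, n_components))
    Q = np.zeros((n_resp, n_components))
    for a in range(n_components):
        u = Y[:, [np.argmax(Y.var(axis=0))]]   # start from the most variable Y column
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)             # X weights
            t = X @ w                          # X scores (latent variable)
            q = Y.T @ t / (t.T @ t)            # Y loadings
            u_new = Y @ q / (q.T @ q)          # Y scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p_a = X.T @ t / (t.T @ t)              # X loadings
        X -= t @ p_a.T                         # deflate X
        Y -= t @ q.T                           # deflate Y
        W[:, a], P[:, a], Q[:, a] = w[:, 0], p_a[:, 0], q[:, 0]
    B = W @ np.linalg.inv(P.T @ W) @ Q.T
    return B, X


def fit_plsda(X, labels, n_classes, n_components=4):
    """Fit PLS-DA by regressing a one-hot class matrix on X."""
    Y = np.eye(n_classes)[labels]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc = X - x_mean
    B, residual = pls2_nipals(Xc, Y - y_mean, n_components)
    x_var_captured = 1.0 - (residual ** 2).sum() / (Xc ** 2).sum()
    return {"x_mean": x_mean, "y_mean": y_mean, "B": B, "x_var_captured": x_var_captured}


def predict_plsda(model, X):
    """Assign each sample to the class with the largest predicted membership value."""
    y_hat = (X - model["x_mean"]) @ model["B"] + model["y_mean"]
    return np.argmax(y_hat, axis=1)


def sensitivity_specificity(y_true, y_pred, n_classes):
    """Per-class (sensitivity, specificity) from true and predicted labels."""
    results = []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        tn = np.sum((y_true != c) & (y_pred != c))
        fp = np.sum((y_true != c) & (y_pred == c))
        results.append((tp / (tp + fn), tn / (tn + fp)))
    return results


rng = np.random.default_rng(2)
n_classes, per_class, n_features = 7, 20, 50
labels = np.repeat(np.arange(n_classes), per_class)
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
X = centers[labels] + rng.normal(size=(labels.size, n_features))  # synthetic data

model = fit_plsda(X, labels, n_classes, n_components=4)
pred = predict_plsda(model, X)
for c, (sens, spec) in enumerate(sensitivity_specificity(labels, pred, n_classes)):
    print(f"class {c}: sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"X variance captured by 4 LVs: {model['x_var_captured']:.2%}")
```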
Assigning human genes to health condition groups using the AGATHA latent space and advanced statistical methods

Text mining algorithms provide a unique opportunity to connect scientific concepts through lexical context. As demonstrated above, the AGATHA algorithm successfully condensed the scientific information in the PubMed database, capturing lexical context characteristic of the health condition groups selected for this proof-of-concept study. In this section, we explore the ability of the AGATHA system to uncover hidden connections between genes and health conditions. This was achieved by mapping all human genes to the AGATHA space and assigning them to health condition groups using the PLSDA classification model, followed by an in-depth analysis of the identified gene clusters in the context of diseases, physiological pathways, and drugs known to interact with these pathways. The complete list of human genes, mapped to the AGATHA space as a matrix with 20,889 rows and 512 columns, was analyzed using the PLSDA model developed for the HCDS. Figure 3 illustrates the distribution of genes among all disease classes, with the color bar showing their attribution to the Dementia class in each category. As seen in Figure 3, the distribution of Dementia genes does not follow the same pattern across all other classes. At this point, the distribution of genes in the Dementia class must be evaluated to show that the model is coherent from a biological point of view. To this end, genes with a Dementia-class probability exceeding 80% were analyzed using hierarchical clustering. This approach helped reveal the internal structure of the data, and the resulting clusters were then subjected to pathway analysis.
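The selection-and-clustering step can be illustrated with a short NumPy sketch: genes whose Dementia-class probability exceeds 0.8 are kept, their coordinates (here, four principal-component scores, as used for Fig. 4) are whitened so that Euclidean distance equals Mahalanobis distance, and Ward's agglomerative method repeatedly merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares until four clusters remain. The probabilities and scores are random placeholders, the naive O(n³) merge loop favors clarity over speed, and the function names are hypothetical; this is not the PLS_Toolbox routine used in the study.

```python
import numpy as np


def mahalanobis_whiten(Z):
    """Linearly transform rows so that Euclidean distance equals Mahalanobis distance."""
    centered = Z - Z.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(centered, rowvar=False))
    L = np.linalg.cholesky(inv_cov)  # inv_cov == L @ L.T
    return centered @ L


def ward_clusters(Z, n_clusters):
    """Naive agglomerative clustering with Ward's merge criterion (for illustration)."""
    clusters = [[i] for i in range(len(Z))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = Z[clusters[a]].mean(axis=0) - Z[clusters[b]].mean(axis=0)
                # Increase in the within-cluster sum of squares caused by merging a and b.
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        clusters[a] = merged
    labels = np.empty(len(Z), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


rng = np.random.default_rng(3)
n_genes = 200
dementia_prob = rng.uniform(size=n_genes)  # placeholder class probabilities
centers = np.array([[0, 0, 0, 0], [8, 0, 0, 0], [0, 8, 0, 0], [0, 0, 8, 0]], dtype=float)
pc_scores = centers[rng.integers(0, 4, size=n_genes)] + rng.normal(size=(n_genes, 4))

selected = dementia_prob > 0.8             # probability threshold from the text
Z = mahalanobis_whiten(pc_scores[selected])
cluster_labels = ward_clusters(Z, n_clusters=4)
print(selected.sum(), "genes selected; cluster sizes:", np.bincount(cluster_labels))
```

Whitening with the Cholesky factor of the inverse covariance is one standard way to reuse a Euclidean-distance clustering routine with the Mahalanobis metric.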
A total of 1079 genes with a high probability of association with dementia were identified by the classification model and subjected to unsupervised cluster analysis using agglomerative Ward's method with four principal components and the Mahalanobis distance, which accounts for the covariance structure of multivariate data (Fig. 4). The selected threshold separated the genes into four distinct clusters, which were then subjected to pathway analysis to establish the biological meaning of the data distribution. The dendrogram in Fig. 4 shows four well-separated gene clusters, each defined by specific biological processes and mechanisms. These assignments, determined through pathway analysis, can be summarized as follows. Cluster 1 separates from the remaining data (Clusters 2–4) at the initial threshold level, indicating its unique characteristics. Pathway analysis revealed that the processes within Cluster 1 have no direct connections to specific physical or behavioral anomalies and cannot be linked to any specific disease category. This information is nevertheless useful when genes from this cluster are mapped as highly ranked in other disease categories. Cluster 2 is strongly connected to several pathways known to be altered in neurodegenerative conditions, including Alzheimer's disease, Amyotrophic lateral sclerosis, Parkinson's disease, and "Apoptosis - multiple species" as labeled by g:Profiler. Cluster 3 has characteristics analogous to those of Cluster 2, such as nervous system development, presynaptic endocytosis, neuron projection organization, visual perception, regulation of neuron projection development, dendrite morphogenesis, regulation of cell projection organization, and neuron projection.
Some of the pathways discussed here are relevant to conditions such as Parkinsonism, disturbances in higher cognitive functions, central motor function disruptions, Ataxia, speech impairments related to the nervous system, and the life cycle of the HIV-1 virus. Cluster 4 is different from the other three by having pathways related to substance abuse. It includes nicotine, cocaine, amphetamine addictions, alcoholism, and some pathways connected to the nervous system such as neuroactive ligand-receptor interaction, dopaminergic synapse, retrograde endocannabinoid signaling, axon guidance, and many more ( Additional Table 1 . Summary of pathways and genes for the Dementia class ). As a result, we acquired a list of genes with a high probability of being connected to Dementia as well as being simultaneously highly ranked in the remaining six classes. After evaluation by the GeneCards database, these genes were separated into one specific group ( Additional Table 2 . Dementia non-specific genes, highly ranked in other disease groups ). Subsequently, highly ranked genes from the Diabetes, Arthritis, Heart, Hypertension, Cancer, and SUD classes were extracted for each disease/condition and were mapped in the Dementia group (Fig. 5 ). For most of the classes, they were not presented at the top of the probability scale, so only the ones with the highest likelihood to be connected to Dementia were combined. ( Additional Table 3 . Disease-specific genes, highly ranked in Dementia ). In addition to the described analyses, we followed the same procedure and extracted the top 1079 genes for each disease resulting in six lists: Diabetes, Arthritis, Heart conditions/diseases, Hypertension, Cancer, and SUD. These lists were reduced by retaining only genes distinct to each specific class of disorders to remove excessive overlapping of biological information. Based on these results, pathway analysis was performed for the specific gene lists and summarized in Table 2 . 
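The classification and ranking steps described in this section can be illustrated with a minimal PLS-DA sketch in Python with NumPy. It fits a PLS2 regression (NIPALS algorithm) of one-hot class indicators on the embedding matrix, then applies the 80% probability cut and a per-class top-k extraction. This is an assumption-laden stand-in, not the authors' implementation. The clip-and-renormalize "probabilities" are a crude simplification, since some PLS-DA software instead derives class probabilities from a Bayesian model of the predicted values. The toy 16-dimensional data also replace the 512-dimensional AGATHA embeddings, and all function names are hypothetical.

```python
import numpy as np


def fit_plsda(X, labels, n_components=2, max_iter=500, tol=1e-10):
    """PLS-DA as PLS2 regression (NIPALS) of one-hot class indicators on X."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # dummy class matrix
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xr, Yr = X - x_mean, Y - y_mean  # residuals, deflated after each component
    W, P, Q = [], [], []
    for _ in range(n_components):
        u = Yr[:, np.argmax(Yr.var(axis=0))]  # start from the most variable Y column
        for _ in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)          # X weights (unit length)
            t = Xr @ w                      # X scores
            q = Yr.T @ t / (t @ t)          # Y loadings
            u_new = Yr @ q / (q @ q)        # Y scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = Xr.T @ t / (t @ t)              # X loadings
        Xr = Xr - np.outer(t, p)
        Yr = Yr - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.column_stack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T    # regression coefficients for centred X
    return {"classes": classes, "x_mean": x_mean, "y_mean": y_mean, "B": B}


def predict_proba(model, X_new):
    """Crude class pseudo-probabilities: clip predicted indicators to [0, 1]
    and renormalise each row (uniform if every entry clips to zero)."""
    Y_hat = (X_new - model["x_mean"]) @ model["B"] + model["y_mean"]
    Y_hat = np.clip(Y_hat, 0.0, 1.0)
    totals = Y_hat.sum(axis=1, keepdims=True)
    uniform = np.full_like(Y_hat, 1.0 / Y_hat.shape[1])
    return np.where(totals > 0, Y_hat / np.maximum(totals, 1e-12), uniform)


def column(model, name):
    """Column index of class `name` in the probability matrix."""
    return int(np.flatnonzero(model["classes"] == name)[0])


def above_threshold(model, proba, name, threshold=0.8):
    """Row indices whose probability for class `name` exceeds `threshold`."""
    return np.flatnonzero(proba[:, column(model, name)] > threshold)


def top_k(model, proba, name, k):
    """Row indices of the k rows ranked highest for class `name`."""
    return np.argsort(-proba[:, column(model, name)], kind="stable")[:k]


# Toy usage (hypothetical data): three classes, 16-dimensional embeddings.
rng = np.random.default_rng(0)
names = np.array(["Dementia", "Diabetes", "SUD"])
centres = rng.normal(scale=5.0, size=(3, 16))
y_train = np.repeat(np.arange(3), 20)
X_train = centres[y_train] + rng.normal(size=(60, 16))   # stand-in disease embeddings
model = fit_plsda(X_train, names[y_train], n_components=2)

y_genes = rng.integers(0, 3, size=200)                   # hidden toy labels
genes = centres[y_genes] + rng.normal(size=(200, 16))     # stand-in gene embeddings
proba = predict_proba(model, genes)
dementia_hits = above_threshold(model, proba, "Dementia")  # > 80% Dementia
sud_top = top_k(model, proba, "SUD", k=10)                 # top SUD genes...
sud_top_in_dementia = proba[sud_top, column(model, "Dementia")]  # ...mapped onto Dementia
```

In the study's setting, `X_train` would be the HCDS disease embeddings with seven class labels, `genes` the 20,889 × 512 gene matrix, and `k` = 1079. Cross-mapping then means reading the Dementia column of `proba` for the top genes of another class, as in the last line.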
Table 2. Main pathways identified for the lists of specific genes for all disease classes.

| Disease | Number of genes | Significant pathways |
|---|---|---|
| Dementia | 518 | Glutamatergic synapse pathway, neuron development, axon development, Alzheimer disease, Amyotrophic lateral sclerosis, Dementia, Brain atrophy, Frontotemporal dementia, Parkinsonism, regulation of synaptic plasticity |
| Diabetes | 109 | Insulin and sugar metabolism-related pathways, glucose transmembrane transporter activity, insulin secretion, monosaccharide transmembrane transport, Type II diabetes mellitus, Abnormal hemoglobin |
| Arthritis | 32 | Immune response-activating cell surface receptor signaling pathway, activation of immune response, regulation of B cell receptor signaling pathway, protein deglutamylation, Abnormal lymphocyte proliferation |
| Heart | 67 | Blood circulation, regulation of blood circulation, regulation of heart contraction, regulation of heart rate, heart development |
| Hypertension | 0 | N/A |
| Cancer | 18 | DNA damage sensor activity, Homologous recombination, RAD51B-RAD51C-RAD51D-XRCC2-XRCC3 complex |
| SUD | 397 | Nicotine addiction, morphine addiction, cocaine addiction, Common pathways underlying drug addiction, Drug metabolism - cytochrome P450, drug ADME, Steroid hormone biosynthesis, dopamine neurotransmitter receptor activity, dopamine secretion, modulation of chemical synaptic transmission, Retrograde endocannabinoid signaling |

Pathway overview

The pathways specific to the Dementia class, based on the selected genes, are described above.

Diabetes (109 genes)
The diabetes group included insulin and sugar metabolism-related pathways, glucose transmembrane transporter activity, insulin secretion, monosaccharide transmembrane transport, Type II diabetes mellitus, abnormal hemoglobin, and others, as well as general pathways that can be assigned to a variety of conditions.
Arthritis (32 genes)
The arthritis group covered a variety of autoimmune and inflammatory diseases, as reflected by its characteristic pathways: immune response-activating cell surface receptor signaling pathway, activation of immune response, regulation of B cell receptor signaling pathway, protein deglutamylation, abnormal lymphocyte proliferation, and others.

Heart condition/disease (67 genes)
This group included pathways such as blood circulation, regulation of blood circulation, regulation of heart contraction, regulation of heart rate, heart development, and others. Some of these pathways are inherently related to hypertension, which may explain the absence of specific genes in the Hypertension class.

Hypertension (0 genes)
No specific genes were identified for this group. Hypertension has many possible causes, including cardiac dysfunction, arterial damage associated with diabetes, and excessive alcohol consumption. Because several of these causes are covered by the other six classes, the absence of hypertension-specific genes is not surprising, although it merits further investigation.

Cancer (18 genes)
The cancer group is characterized by several pathways related to DNA damage sensor activity (34). This class includes the RAD51B-RAD51C-RAD51D-XRCC2-XRCC3 complex, in which inactivating mutations predispose to breast, ovarian, and prostate cancers (35).

SUD (397 genes)
The selected SUD group comprised various addictions and individual substances.
This is reflected in the identified pathways, which included chemical carcinogenesis - DNA adducts, nicotine addiction, morphine addiction, cocaine addiction, common pathways underlying drug addiction, drug metabolism - cytochrome P450, drug ADME (absorption, distribution, metabolism and excretion), steroid hormone biosynthesis, dopamine neurotransmitter receptor activity, dopamine secretion, modulation of chemical synaptic transmission, retrograde endocannabinoid signaling, and many others.

Drug repurposing

The classification model predicted a list of genes with a high potential to be associated with Dementia. These hidden connections are derived from patterns and relationships learned from the data, so they only indirectly reveal associations between terms. Understanding them further will require interpretation and analysis beyond the model itself. We selected potential drugs for repurposing analysis using GeneCards and the Comparative Toxicogenomics Database (CTD) (25), based on the presence of genes specific to the six other disease groups within the Dementia class. The selected medications were additionally checked in the DrugBank database (36) to confirm their approval status and clinical trial stage (Additional Table 4). Table 3 summarizes the most promising candidates for drug repurposing. These candidates are based on the combination of statistically predicted gene/disease connections discovered by the classification model and gene/drug connections identified in the databases listed above. Bosentan, Mecamylamine, and Methylphenidate are the most compelling candidates, ranking at the top of the lists of small molecules for their predicted targets (EDNRB, CHRNA4, SLC6A3). Every drug was selected from FDA-approved medications that target genes predicted by the classification model.
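The selection logic described above keeps approved drugs whose database-annotated targets appear among the model-predicted genes, which amounts to a simple join. The sketch below assumes that the GeneCards/CTD lookups and DrugBank approval checks have already been exported as records. The `DrugGeneLink` type is hypothetical and no database API is called. Apart from the three gene/drug pairs taken from Table 3, all entries are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugGeneLink:
    """Hypothetical record: one drug-gene interaction (as might be exported from
    GeneCards or CTD) plus an approval flag (as might be checked in DrugBank)."""
    drug: str
    gene: str
    fda_approved: bool


def repurposing_candidates(predicted_genes, links):
    """Map each FDA-approved drug to the model-predicted genes it targets."""
    candidates = {}
    for link in links:
        if link.fda_approved and link.gene in predicted_genes:
            candidates.setdefault(link.drug, set()).add(link.gene)
    return dict(sorted(candidates.items()))  # deterministic, alphabetical order


predicted = {"EDNRB", "CHRNA4", "SLC6A3", "GENE_X"}  # placeholder set of predicted genes
links = [
    DrugGeneLink("Bosentan", "EDNRB", True),
    DrugGeneLink("Mecamylamine", "CHRNA4", True),
    DrugGeneLink("Methylphenidate", "SLC6A3", True),
    DrugGeneLink("DRUG_A", "GENE_X", False),  # placeholder: not approved, dropped
    DrugGeneLink("DRUG_B", "GENE_Y", True),   # placeholder: target not predicted, dropped
]
result = repurposing_candidates(predicted, links)
# {'Bosentan': {'EDNRB'}, 'Mecamylamine': {'CHRNA4'}, 'Methylphenidate': {'SLC6A3'}}
```

Ranking the candidates, for example by the model probability of each target gene, would be a natural extension. The text does not specify the ranking rule, so none is implemented here.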
Table 3 summarizes the pharmacological use of each drug and the level of available clinical data, together with a literature-based rationale for its potential relevance.

Table 3. Drugs for repurposing for dementia treatment

| FDA-approved drug | Gene | Class predicted by model | Gene/drug connection based on literature | Completed clinical trial phases |
|---|---|---|---|---|
| Bosentan | EDNRB | Hypertension | Bosentan is a dual endothelin receptor antagonist that blocks EDNRB (as well as EDNRA). Literature data demonstrate that both bosentan and selective EDNRA blockers effectively prevent the hypoxia-induced rise in pulmonary artery pressure, aiding the restoration of oxygen saturation (37). | Phase 4: Pulmonary Arterial Hypertension (PAH); Phase 4: Eisenmenger's Syndrome; Phase 4: Type 2 Diabetes Mellitus |
| Mecamylamine | CHRNA4 | SUD | Mecamylamine, a nonselective nicotinic acetylcholine receptor (nAChR) antagonist, is used to treat moderate to severe essential hypertension and uncomplicated malignant hypertension. The relationship between CHRNA4 and mecamylamine stems from their interaction within nAChR-mediated cholinergic neurotransmission. The α4β2-nAChR is distinguished by heightened sensitivity to antagonism by mecamylamine or DHβE and to the agonist action of EBDN or nicotine (38). | Phase 3: Alcohol Dependency / Depression; Phase 2: Age-Related Macular Degeneration (AMD); Phase 2: Depression / Major Depressive Disorder (MDD); Phase 2: Diabetic Macular Edema (DME); Phase 2: Smoking; Phase 1: Autism Disorder / Pervasive Development Disorder |
| Methylphenidate | SLC6A3 | SUD | Methylphenidate (MPH) is a catecholamine reuptake inhibitor. It improves cognitive performance in impaired patients by indirectly increasing extracellular dopamine (DA) levels in striatal and extrastriatal regions through blockade of dopamine transporters (DAT), while also blocking noradrenaline transporters (39). | Phase 4: Attention Deficit Hyperactivity Disorder (ADHD); Phase 4: Neurofibromatosis, type 1 (von Recklinghausen's disease); Phase 4: Anxiety Disorders; Phase 4: Cocaine Related Disorders / Substance Related Disorders; Phase 4: Dementia of the Alzheimer's Type; Phase 4: Alzheimer's Disease (AD) / Dementia; Phase 4: Depression |
| Tretinoin | CD33 | Arthritis | Tretinoin promptly induced an interferon-dominated inflammatory tumor microenvironment, marked by increased infiltration of CD8+ T cells (40). | Phase 4: Acne; Phase 4: Childhood Acute Promyelocytic Leukemia |
| Imatinib | GAB2 | Arthritis | Imatinib induces distinct alterations in the phosphorylation state and interactome of Gab2 (41). | Phase 4: Breast Cancer; Phase 4: Chronic Myeloid Leukemia (CML); Phase 4: Gastrointestinal Stromal Tumor (GIST) |
| Hydralazine | RABGEF1 | Arthritis | RABGEF1, a guanine nucleotide exchange factor (GEF), activates Rab GTPases by promoting the exchange of GDP for GTP on these small GTP-binding proteins, which play an important role in vesicle-mediated transport. Although hydralazine acts primarily on blood vessels to lower blood pressure, it may indirectly influence cellular processes such as vesicle-mediated transport through its effects on cell signaling and metabolism (42). | Phase 4: Hypertension / Type 2 Diabetes Mellitus; Phase 4: Congestive Heart Failure (CHF); Phase 3: Atherosclerosis |

Finally, we removed the class-specific genes from the top 20% of each disease class and merged the remaining genes into a unified list of common genes, totaling 1616 items (Additional Table 5. Common genes and their pathways). Excluding class-specific information allowed us to identify general biological processes that may apply across diseases or conditions. These include protein binding, catalytic activity, response to stimuli, biological regulation, cell communication, membrane involvement, cytoplasmic activities, and more. At present, this list may not serve as a basis for target selection, but it illustrates the shared biology of these diseases.
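The set logic behind the class-specific lists (Table 2) and the common-gene list can be sketched with Python's standard library. The sketch also includes the kind of over-representation test used in pathway analysis. Two assumptions apply. First, "common" is read here as shared by at least two classes' top lists. Second, the enrichment p-value is a plain one-sided hypergeometric test, without the multiple-testing correction (g:SCS by default) that g:Profiler applies. Function names, gene names, and set sizes are placeholders.

```python
from collections import Counter
from math import comb


def split_specific_common(top_lists):
    """Split per-class top-gene lists into class-specific genes (found in exactly
    one class's list) and common genes (found in two or more lists)."""
    counts = Counter(g for genes in top_lists.values() for g in set(genes))
    specific = {name: {g for g in genes if counts[g] == 1}
                for name, genes in top_lists.items()}
    common = {g for g, c in counts.items() if c >= 2}
    return specific, common


def enrichment_p(query, pathway, universe_size):
    """One-sided hypergeometric over-representation p-value, P(X >= overlap),
    for drawing len(query) genes from a universe containing len(pathway)
    pathway genes. No multiple-testing correction is applied."""
    n, K, N = len(query), len(pathway), universe_size
    k = len(query & pathway)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy usage with placeholder gene names.
top_lists = {
    "Dementia": {"G1", "G2", "G3", "G4"},
    "SUD": {"G3", "G4", "G5"},
    "Diabetes": {"G4", "G6"},
}
specific, common = split_specific_common(top_lists)
# specific == {"Dementia": {"G1", "G2"}, "SUD": {"G5"}, "Diabetes": {"G6"}}
# common == {"G3", "G4"}
p = enrichment_p(specific["Dementia"], {"G1", "G2", "G9"}, universe_size=100)
```

Counting genes across all lists at once gives both outputs in a single pass. A gene's count says directly whether it is unique to one class or shared.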
Discussion

Over the last decade, various literature-mining methods have been introduced for biological analysis. AI technology enables researchers to apply biomedical entity normalization across multiple datasets (43). This facilitates the identification of gene mentions in scientific articles and books (44) and aids drug repurposing efforts (45, 46). We previously introduced literature-mining methods (7, 8, 47) and demonstrated their potential in many areas, including new uses for existing approved drug therapies (48).

In this project, a list of potential drugs for dementia treatment was extracted using AGATHA and advanced statistical analysis. The method identified hidden connections and pathways related to different diseases, and to neurodegeneration specifically. AGATHA-calculated variables for 122 diseases were divided into seven classes to build the PLSDA classification model. The initial discrimination showed that Dementia and SUD separated from the rest of the dataset and formed well-defined clusters, while the other diseases remained in a uniform cloud. However, when these two classes were excluded from the dataset, the remaining diseases separated without any overlap (Fig. 6). This illustrates the potential of the method for future projects studying other diseases or gene combinations. Next, 20,889 genes were classified by the PLSDA model, which revealed different distribution patterns among the classes (Fig. 3). In most of the other classes, Dementia genes lie at the bottom of the probability scale. Some Dementia genes within the top 20% also have a probability of being associated with SUD, whereas only three show a connection to Diabetes. These genes are not prevalent in the top tier of the other four classes (Fig. 5).
As a result, a closer look at the top-ranked genes of the Dementia class revealed elements of both neurodegeneration and substance abuse. A total of 1079 genes (top 20%, Fig. 2 A, Dementia plot) were subjected to pathway analysis, which supported their assignment to that class, since they play crucial roles in numerous vital biological processes. Notably, they are involved in the glutamatergic synapse pathway, which helps ensure proper brain function. Disturbances in glutamate transmission or improper regulation of glutamate receptors have been linked to various neurological disorders, such as epilepsy, Alzheimer's disease, and schizophrenia. At the same time, changes in the metaplasticity of glutamatergic synapses have been shown to play a significant role in the development of chronic SUD (49). In addition, tryptophan metabolism is relevant to substance abuse because of its role in the production of neurotransmitters, including serotonin (50), as shown in studies of patients with alcohol use disorder (51, 52). The same pathway has also been linked to the development of Alzheimer's disease through the inhibition of various enzymes involved in β-amyloid biosynthesis (53). Thus, the genes with a higher probability of association with Dementia may serve as potential targets for future drug repurposing, given the pathways they share between SUD and Dementia. By mapping the statistically allocated Dementia genes onto the remaining classes, we obtained a list of genes highly ranked in other diseases. The same selection procedure was applied to the Diabetes, Arthritis, Heart conditions/diseases, Hypertension, Cancer, and SUD classes, extracting the same number of genes as for Dementia. The highly ranked genes of each group were then mapped onto the Dementia class to evaluate their positions.
This resulted in a separate list of genes that are not necessarily specific to any of the selected types of neurodegenerative disorders but have higher scores in general. Based on this information, the list of potential drugs for repurposing was created using GeneCards and CTD (Table 3). The proposed method enabled us to explore the textual data from several angles. Besides examining the interconnections of genes, it facilitates the identification of genes unique to each type of disease (Table 2). The genes common to all 122 groups tended to participate in pathways involved in many biological processes simultaneously, which is consistent with the proposed method capturing shared biology. The pathways in this list have a wide range of meanings and can be attributed to many processes or disorders. These similarities could potentially be used in future research to discover new hidden connections. In summary, the literature-mining method AGATHA, coupled with advanced statistical analysis, allowed us to separate the genes into several lists:
- Dementia genes highly ranked in other disease classes
- disease genes highly ranked in the Dementia class
- genes specific to each disorder
- genes common to all diseases

This information was used to select potential drugs for repurposing. It could also be used in future experiments to find new common pathways, select specific genes within the same group of diseases, or build a robust automatic prediction method for other queries.

Conclusions

We developed the AI-based literature-mining tool AGATHA and proposed its novel use to discover drugs with potential for repurposing in the context of neurocognitive disorders.
The study accomplished its primary objective of identifying hidden connections between approved medications and specific health conditions through advanced statistical analysis, including techniques such as PLSDA and unsupervised clustering. The methodology involved grouping scientific terms related to different health conditions and genes, followed by building discrimination models to extract lists of disease-specific genes. These genes were explored through pathway analysis to select candidates for drug repurposing. As a result, we selected six main drugs for a subsequent bench study: Bosentan, Mecamylamine, Methylphenidate, Tretinoin, Imatinib, and Hydralazine.

Abbreviations
AI: Artificial Intelligence
AGATHA: Automatic Graph Mining And Transformer based Hypothesis Generation Approach
NLP: Natural language processing
RNNs: recurrent neural networks
CNNs: convolutional neural networks
PLSDA: Partial Least Squares Discriminant Analysis
CTD: Comparative Toxicogenomics Database
PCA: Principal Component Analysis
LV: Latent Variable
HCDS: Health Conditions Data Set
SUD: Substance Use Disorder
ADME: absorption, distribution, metabolism and excretion

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Competing interests: The authors have declared that no competing interests exist.
Funding information: This project was supported by awards from NIH R01DA054992 (MS, MDW).

Author Contribution
IT and IS developed the AGATHA system. AS and IT performed all calculations and wrote the main manuscript. AS, IT, IS and MS conceptualized the project. All authors reviewed and approved the manuscript.

Acknowledgement
We acknowledge the NIH for their financial support and the College of Pharmacy for providing the necessary infrastructure and resources. We would also like to thank Dr. Vitali Sikirzhytski for insightful feedback and suggestions, which significantly improved the quality of this manuscript.
Data Availability
AGATHA is an open-source algorithm available at: https://github.com/IlyaTyagin/AGATHA-C-GP. All statistical models, as well as the data used to calculate them, are available on Zenodo at http://doi.org/10.5281/zenodo.11521211

References
1. Amiri R, Razmara J, Parvizpour S, Izadkhah H. A novel efficient drug repurposing framework through drug-disease association data integration using convolutional neural networks. BMC Bioinformatics. 2023;24(1):442.
2. You Y, Lai X, Pan Y, Zheng H, Vera J, Liu S, et al. Artificial intelligence in cancer target identification and drug discovery. Signal Transduct Target Ther. 2022;7(1):156.
3. Chen X, Zhang J, Zhao Q, Ding L, Wu Z, Jia Z, He D. Application and teaching of computer molecular simulation embedded technology and artificial intelligence in drug research and development. Open Life Sci. 2023;18(1):20220675.
4. Hay M, Thomas DW, Craighead JL, Economides C, Rosenthal J. Clinical development success rates for investigational drugs. Nat Biotechnol. 2014;32(1):40-51.
5. Pushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, et al. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41-58.
6. Nosengo N. Can you teach old drugs new tricks? Nature. 2016;534(7607):314-6.
7. Sybrandt J, Shtutman M, Safro I. MOLIERE: Automatic Biomedical Hypothesis Generation System. KDD: Proceedings International Conference on Knowledge Discovery & Data Mining. 2017;2017:1633-42.
8. Sybrandt J, Tyagin I, Shtutman M, Safro I. AGATHA: Automatic Graph Mining And Transformer based Hypothesis Generation Approach. Proceedings of the 29th ACM International Conference on Information & Knowledge Management; 2020.
9. Extance A. How AI technology can tame the scientific literature. Nature. 2018;561(7722):273-4.
10. Zia A, Aziz M, Popa I, Khan SA, Hamedani AF, Asif AR. Artificial Intelligence-Based Medical Data Mining. J Pers Med. 2022;12(9).
11. Doughty E, Kertesz-Farkas A, Bodenreider O, Thompson G, Adadey A, Peterson T, Kann MG. Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature. Bioinformatics. 2011;27(3):408-15.
12. Jarada TN, Rokne JG, Alhajj R. A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J Cheminform. 2020;12(1):46.
13. Hua Y, Dai X, Xu Y, Xing G, Liu H, Lu T, et al. Drug repositioning: Progress and challenges in drug discovery for various diseases. Eur J Med Chem. 2022;234:114239.
14. Graham SA, Lee EE, Jeste DV, Van Patten R, Twamley EW, Nebeker C, et al. Artificial intelligence approaches to predicting and detecting cognitive decline in older adults: A conceptual review. Psychiatry Res. 2020;284:112732.
15. Miller D, Stern A, Burstein D. Deciphering microbial gene function using natural language processing. Nat Commun. 2022;13(1):5731.
16. Landhuis E. Scientific literature: Information overload. Nature. 2016;535(7612):457-8.
17. Barker M, Rayens W. Partial Least Squares for Discrimination. Journal of Chemometrics. 2003;17(3):166-73.
18. Lee LC, Liong CY, Jemain AA. Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: a review of contemporary practice strategies and knowledge gaps. Analyst. 2018;143(15):3526-39.
19. Bocklitz T. Richard G. Brereton: Chemometrics: data driven extraction for science, 2nd ed. Anal Bioanal Chem. 2019;411(14):2995-6.
20. Davenport F, Gallacher J, Kourtzi Z, Koychev I, Matthews PM, Oxtoby NP, et al. Neurodegenerative disease of the brain: a survey of interdisciplinary approaches. J R Soc Interface. 2023;20(198):20220406.
21. WHO. The top 10 causes of death. 2020. Available from: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death.
22. McKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack CR, Jr., Kawas CH, et al. The diagnosis of dementia due to Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement. 2011;7(3):263-9.
23. Kolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, Peterson H. g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023;51(W1):W207-W12.
24. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. Curr Protoc Bioinformatics. 2016;54:1.30.1-1.30.33.
25. Davis AP, Murphy CG, Johnson R, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, et al. The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Res. 2013;41(Database issue):D1104-14.
26. Hotelling H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology. 1933;24(6):417-41.
27. Ward JH Jr. Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association. 1963;58(301):236-44.
28. Lindberg C. The Unified Medical Language System (UMLS) of the National Library of Medicine. J Am Med Rec Assoc. 1990;61(5):40-2.
29. Unified Medical Language System. Diseases Database Source Information. U.S. National Library of Medicine; 2010. Available from: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/DDB/index.html.
30. Pietzner M, Wheeler E, Carrasco-Zanini J, Cortes A, Koprulu M, Worheide MA, et al. Mapping the proteo-genomic convergence of human diseases. Science. 2021;374(6569):eabj1541.
31. Frijters R, van Vugt M, Smeets R, van Schaik R, de Vlieg J, Alkema W. Literature mining for the discovery of hidden connections between drugs, genes and diseases. PLoS Comput Biol. 2010;6(9).
32. Benjamin EJ, Virani SS, Callaway CW, Chamberlain AM, Chang AR, Cheng S, et al. Heart Disease and Stroke Statistics-2018 Update: A Report From the American Heart Association. Circulation. 2018;137(12):e67-e492.
33. Kim ST, Chu Y, Misoi M, Suarez-Almazor ME, Tayar JH, Lu H, et al. Distinct molecular and immune hallmarks of inflammatory arthritis induced by immune checkpoint inhibitors for cancer therapy. Nat Commun. 2022;13(1):1970.
34. Cybulla E, Vindigni A. Leveraging the replication stress response to optimize cancer therapy. Nat Rev Cancer. 2023;23(1):6-24.
35. Greenhough LA, Liang CC, Belan O, Kunzelmann S, Maslen S, Rodrigo-Brenni MC, et al. Structure and function of the RAD51B-RAD51C-RAD51D-XRCC2 tumour suppressor. Nature. 2023;619(7970):650-7.
36. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):D668-72.
37. Stobdan T, Zhou D, Williams AT, Cabrales P, Haddad GG. Cardiac-specific knockout and pharmacological inhibition of Endothelin receptor type B lead to cardiac resistance to extreme hypoxia. J Mol Med (Berl). 2018;96(9):975-82.
38. Eaton JB, Peng JH, Schroeder KM, George AA, Fryer JD, Krishnan C, et al. Characterization of human alpha 4 beta 2-nicotinic acetylcholine receptors stably and heterologously expressed in native nicotinic receptor-null SH-EP1 human epithelial cells. Mol Pharmacol. 2003;64(6):1283-94.
39. Kasparbauer AM, Rujescu D, Riedel M, Pogarell O, Costa A, Meindl T, et al. Methylphenidate effects on brain activity as a function of SLC6A3 genotype and striatal dopamine transporter availability. Neuropsychopharmacology. 2015;40(3):736-45.
40. Tilsed CM, Casey TH, de Jong E, Bosco A, Zemek RM, Salmons J, et al. Retinoic Acid Induces an IFN-Driven Inflammatory Tumour Microenvironment, Sensitizing to Immune Checkpoint Therapy. Front Oncol. 2022;12:849793.
41. Halbach S, Rigbolt KT, Wohrle FU, Diedrich B, Gretzmeier C, Brummer T, Dengjel J. Alterations of Gab2 signalling complexes in imatinib and dasatinib treated chronic myeloid leukaemia cells. Cell Commun Signal. 2013;11(1):30.
42. Arce C, Segura-Pacheco B, Perez-Cardenas E, Taja-Chayeb L, Candelaria M, Duennas-Gonzalez A. Hydralazine target: from blood vessels to the epigenome. J Transl Med. 2006;4:10.
43. Ji Z, Wei Q, Xu H. BERT-based Ranking for Biomedical Entity Normalization. AMIA Jt Summits Transl Sci Proc. 2020;2020:269-77.
44. Wei CH, Kao HY. Cross-species gene normalization by species inference. BMC Bioinformatics. 2011;12 Suppl 8(Suppl 8):S5.
45. Zhou Y, Hou Y, Shen J, Huang Y, Martin W, Cheng F. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discov. 2020;6:14.
46. Fiscon G, Conte F, Farina L, Paci P. SAveRUNNER: A network-based algorithm for drug repurposing and its application to COVID-19. PLoS Comput Biol. 2021;17(2):e1008686.
47. Sybrandt J, Shtutman M, Safro I. Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking. 2018 IEEE International Conference on Big Data (Big Data). 2018:1494-503.
48. Aksenova M, Sybrandt J, Cui B, Sikirzhytski V, Ji H, Odhiambo D, et al. Inhibition of the Dead Box RNA Helicase 3 Prevents HIV-1 Tat and Cocaine-Induced Neurotoxicity by Targeting Microglia Activation. J Neuroimmune Pharmacol. 2019.
49. Chiamulera C, Piva A, Abraham WC. Glutamate receptors and metaplasticity in addiction. Curr Opin Pharmacol. 2021;56:39-45.
50. Badawy AA. Tryptophan metabolism in alcoholism. Adv Exp Med Biol. 1999;467:265-74.
51. Petrakis IL, Buonopane A, O'Malley S, Cermik O, Trevisan L, Boutros NN, et al. The effect of tryptophan depletion on alcohol self-administration in non-treatment-seeking alcoholic individuals. Alcohol Clin Exp Res. 2002;26(7):969-75.
52. Mechtcheriakov S, Gleissenthall GV, Geisler S, Arnhard K, Oberacher H, Schurr T, et al. Tryptophan-kynurenine metabolism during acute alcohol withdrawal in patients with alcohol use disorder: The role of immune activation. Alcohol Clin Exp Res. 2022;46(9):1648-56.
53. Savonije K, Weaver DF. The Role of Tryptophan Metabolism in Alzheimer's Disease. Brain Sci. 2023;13(2).
Scheme
Scheme 1 is available in the Supplementary Files section.

Additional Declarations
No competing interests reported.

Supplementary Files
Additionalfile1.docx, Additionalfile2.docx, Additionalfile3.docx, Additionalfile4.docx, Additionalfile5.docx, Additionalfile6.docx
Scheme 1. Workflow of gene selection for drug repurposing based on literature-mining strategy.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4750719","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":338200473,"identity":"9ffe1e19-1d1b-4279-9148-875723f44a5f","order_by":0,"name":"Aliaksandra Sikirzhytskaya","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA+UlEQVRIiWNgGAWjYDACZjaGAwxAxA9kH3jYABNmI0KLJFD1gUSitEAkDzAYADUyEKVFvp0t8XDFnzvyxtcOPzyQuMMmsYH/8AOGD2WHcWoxOMx24OAZnmeG226nGRxIPJNmzCCRZsA44xweLczsDQcbJA4zbrudANTSdliOQYKHgZm3DbcW+WaQFoPD9ptnp38AaeFh4D/DwPwXjxYGkMMaEg4nbpDOgdrCkMPAzIhHC9AvCQcbDhxOnnE7pwDsFzagXw72nEvH7bD+Y8YfG/4ctu2fnb75w0dgiPXzH3744EeZNW6HYQBQjBwgQf0oGAWjYBSMAiwAAKOjX0kHrOcgAAAAAElFTkSuQmCC","orcid":"","institution":"University of South Carolina","correspondingAuthor":true,"prefix":"","firstName":"Aliaksandra","middleName":"","lastName":"Sikirzhytskaya","suffix":""},{"id":338200474,"identity":"9daa737b-2d8e-4abf-8f14-3b3654515d4a","order_by":1,"name":"Ilya Tyagin","email":"","orcid":"","institution":"University of Delaware","correspondingAuthor":false,"prefix":"","firstName":"Ilya","middleName":"","lastName":"Tyagin","suffix":""},{"id":338200475,"identity":"6c9be57a-749a-4bcb-b401-6930ec23dde9","order_by":2,"name":"S. Scott Sutton","email":"","orcid":"","institution":"University of South Carolina","correspondingAuthor":false,"prefix":"","firstName":"S.","middleName":"Scott","lastName":"Sutton","suffix":""},{"id":338200476,"identity":"02e6b6c3-4efe-45ab-b172-626d58d7fd65","order_by":3,"name":"Michael D. 
Wyatt","email":"","orcid":"","institution":"University of South Carolina","correspondingAuthor":false,"prefix":"","firstName":"Michael","middleName":"D.","lastName":"Wyatt","suffix":""},{"id":338200477,"identity":"46044be4-bf15-4109-ae61-5f3f0f8d84ea","order_by":4,"name":"Ilya Safro","email":"","orcid":"","institution":"University of Delaware","correspondingAuthor":false,"prefix":"","firstName":"Ilya","middleName":"","lastName":"Safro","suffix":""},{"id":338200478,"identity":"f60f1fb5-4a75-45c6-9ad6-99b31ea2e7fd","order_by":5,"name":"Michael Shtutman","email":"","orcid":"","institution":"University of South Carolina","correspondingAuthor":false,"prefix":"","firstName":"Michael","middleName":"","lastName":"Shtutman","suffix":""}],"badges":[],"createdAt":"2024-07-16 15:08:41","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4750719/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4750719/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":62657055,"identity":"402155f0-adfd-4599-8db3-77e0dc1a6e5f","added_by":"auto","created_at":"2024-08-17 02:06:00","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":97433,"visible":true,"origin":"","legend":"\u003cp\u003ePrincipal Component Analysis of the Health Condition Data Set (HCDS).\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/5f82eddbe4d6def378a056da.png"},{"id":62657057,"identity":"aa1fba08-229e-473c-a732-ef920475bb34","added_by":"auto","created_at":"2024-08-17 02:06:00","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":173704,"visible":true,"origin":"","legend":"\u003cp\u003eResults of cross-validated PLSDA classification analysis. 
\u003cstrong\u003eA\u003c/strong\u003e – Probability of each disease being classified into its assigned class; Q residuals vs. Hotelling T\u003csup\u003e2\u003c/sup\u003e plot. PLSDA plots show disease separation for the first two (\u003cstrong\u003eB\u003c/strong\u003e) and first three (\u003cstrong\u003eC\u003c/strong\u003e) latent variables.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/fc6a2dffda839cba0f1123e1.png"},{"id":62657059,"identity":"9ab483e6-cc8f-48fb-ba18-10e189dbb256","added_by":"auto","created_at":"2024-08-17 02:06:00","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":290248,"visible":true,"origin":"","legend":"\u003cp\u003ePLSDA prediction of gene distribution among diseases: A – Dementia, B – Diabetes, C – Arthritis, D – Heart condition/disease, E – Hypertension, F – Cancer, G – SUD, H – Q residuals vs. Hotelling T\u003csup\u003e2\u003c/sup\u003e plot. Genes are colored by their predicted probability of being assigned to Dementia.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/548c076229134ecdd5fe424c.png"},{"id":62657058,"identity":"36909d66-37fb-4497-a6fe-530fd7c81542","added_by":"auto","created_at":"2024-08-17 02:06:00","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":46281,"visible":true,"origin":"","legend":"\u003cp\u003eHierarchical clustering analysis using Ward’s method.
Different colors illustrate gene groupings characterized by various disease markers.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/0c5b234e6818ef8d820630e8.png"},{"id":62657061,"identity":"6a4503f8-cb98-4444-af29-062c463e947f","added_by":"auto","created_at":"2024-08-17 02:06:00","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1684654,"visible":true,"origin":"","legend":"\u003cp\u003ePLSDA prediction. Genes ranked highly for each disease (pink markers), mapped by their predicted probability of belonging to the Dementia group.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/8c3073936036f99c185e9b80.png"},{"id":62658013,"identity":"840d1932-df82-4af7-86fa-a69b8d153c00","added_by":"auto","created_at":"2024-08-17 02:14:00","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":64212,"visible":true,"origin":"","legend":"\u003cp\u003ePredicted class probabilities for the Diabetes, Arthritis, Heart disease/condition, Hypertension, and Cancer groups.\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/f45bb8b30fc68183fabe5d6c.png"},{"id":66525629,"identity":"12ea63f9-c676-4935-817c-7e6959dde140","added_by":"auto","created_at":"2024-10-14 04:54:18","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3248678,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/d1c18702-4c39-47c4-b2c4-b39cdad7249d.pdf"},{"id":62657056,"identity":"e90b2ba1-e3de-455a-b37c-a6f7c7037126","added_by":"auto","created_at":"2024-08-17
02:06:00","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":21460,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile1.docx","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/ae4253cb575a67210631008c.docx"},{"id":62658014,"identity":"8048c211-94e2-40d4-9e58-ec05b5007177","added_by":"auto","created_at":"2024-08-17 02:14:00","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":21263,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile2.docx","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/3a68eba04175945bee12bb50.docx"},{"id":62658900,"identity":"a42d1027-2fa4-4f09-a484-c911ab2533a5","added_by":"auto","created_at":"2024-08-17 02:22:00","extension":"docx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":25512,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile3.docx","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/71f5c091b61d42a720469a99.docx"},{"id":62658012,"identity":"9f5b085b-63cf-47c3-a9ea-ffe225685a28","added_by":"auto","created_at":"2024-08-17 02:14:00","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":23490,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile4.docx","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/2e238e3d14e6e21c37d056d5.docx"},{"id":62657066,"identity":"542647f2-9f72-456c-91aa-faa6f864e44a","added_by":"auto","created_at":"2024-08-17 
02:06:00","extension":"docx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":85848,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile5.docx","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/f8dd9877114d8fcb3710fee9.docx"},{"id":62658016,"identity":"6aab3007-3ba1-48c2-9f43-fbca37587cd3","added_by":"auto","created_at":"2024-08-17 02:14:00","extension":"docx","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":33592,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile6.docx","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/63afb41f47738397fc5786b7.docx"},{"id":62657064,"identity":"295e0ae6-0914-424e-a476-d86e5e11c81c","added_by":"auto","created_at":"2024-08-17 02:06:00","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":140443,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eScheme 1.\u003c/strong\u003e Workflow of gene selection for drug repurposing based on a literature-mining strategy.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-4750719/v1/b2d7636aebe77e7b0de7efec.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"AI-based mining of biomedical literature: Applications for drug repurposing for the treatment of dementia","fulltext":[{"header":"Background","content":"\u003cp\u003eOver the past decade, advances in analytical methods have opened new opportunities to uncover hidden connections within complex networks (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e). Advances in Artificial Intelligence (AI) enable researchers to query and analyze massive datasets, simulate experiments virtually, and generate scientific hypotheses.
Such tools have been used extensively by the pharmaceutical industry to lower the costs of drug discovery (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e). Developing a novel therapeutic agent, from initial idea to FDA approval, requires substantial commitments of time and money (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). The FDA rigorously evaluates efficacy and safety when approving new therapeutics, and as the costs of new drug development have risen, the costs of newly approved therapeutics have come under intense scrutiny. Repurposing existing FDA-approved drugs for new indications can partly offset development costs because the drug's safety profile and clinical experience already exist (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e). Repurposing also substantially accelerates the process by building on crucial steps already completed during the original FDA approval (\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e). The key initial step in drug repurposing is to find a connection between an existing drug and a disease of interest that is worth exploring preclinically or clinically as a new therapeutic indication. In many cases, the necessary knowledge may already be present in the biomedical literature, but the connections between individual pieces of information may not be obvious. To uncover these hidden connections, we developed AI-based literature analysis tools: MOLIERE (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e), followed by Automatic Graph Mining And Transformer based Hypothesis Generation Approach (AGATHA) (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e).
Recent developments in AI allow automated extraction of valuable information from unstructured text, such as scientific abstracts or articles, enabling efficient and scalable processing of textual data that saves substantial time and effort compared with manual processing (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e). AI algorithms can identify themes, topics, and clusters within a collection of documents, helping researchers overcome challenging, time-consuming literature analysis (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e). In vast databases like PubMed, search results often overwhelm users with information that is hard to curate without detailed, time-intensive assessment. Natural language processing (NLP) techniques in AI frameworks not only speed up research but also improve the extraction of valuable knowledge from massive datasets, which is difficult to achieve manually (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e). Drug repurposing research benefits from these advantages in many ways (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e). Studying disease at the gene level remains challenging despite recent advances in genomics and technology. NLP methods have been used successfully for a variety of gene-related tasks, including, but not limited to, identifying unique anticancer targets (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e), predicting cognitive decline (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e), and interpreting microbial genes (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e).
Successful NLP modeling requires high-quality training datasets, preprocessing to normalize the data and reduce noise, an appropriate model architecture (such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), or transformers), and optimized hyperparameters such as learning rate, batch size, dropout rate, and architecture-specific settings.\u003c/p\u003e \u003cp\u003eAGATHA is an effective literature-based discovery tool capable of extracting relevant information from immense scientific databases, including PubMed, which grows by over one million papers annually (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e). The tool analyzes the lexical elements (e.g., words, phrases, and lemmas) in each research article abstract to identify possible hidden connections among terms specified by the user. The likelihood of a potential connection is estimated using a multi-headed self-attention mechanism that accounts for the spatial relationships between individual terms in a latent vector space, which we will refer to as the \u0026ldquo;AGATHA space\u0026rdquo;.\u003c/p\u003e \u003cp\u003eThe AGATHA system pipeline can roughly be split into two stages: (1) construction of a semantic knowledge network and its embedding into a low-dimensional vector space, and (2) training of a transformer-based predictor. These two stages can be used independently of each other, which we leverage in the current work. The first stage produces a large multi-layered knowledge network, which connects individual units of information (such as UMLS terms, semantic predicates, or entities) with their corresponding literature sources.
For example, a term representing \u0026ldquo;breast cancer\u0026rdquo; is connected to all sentences and semantic predicates containing this term. After construction, we perform network embedding, so that each node is assigned a learned 512-dimensional vector representation (i.e., coordinates). This allows us to establish spatial relationships between individual concepts by computing distances between them (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). Terms that are logically connected, such as different stages or types of the same health condition, tend to cluster together in AGATHA space, whereas logically distant terms are positioned relatively far apart. This approach aims to provide an intuitive view of the relationships among the many terms and concepts in the scientific literature.\u003c/p\u003e \u003cp\u003eThe second stage produces a transformer-based predictor model, which is trained to rank meaningful associations between biomedical concepts above random noise. For each term-term association, it outputs a score in the unit interval [0, 1] indicating how likely the association is to be biomedically relevant, based on what the model has learned from scientific abstracts. When we use the AGATHA predictor in a one-to-many setting (as in this work), we obtain a distribution of scores over a range of pairs in which the source term is fixed (e.g., a particular disease) and the target terms are a group of concepts of similar semantic type (e.g., a list of different genes).
Therefore, we can identify which genes are more likely to be associated with a particular disease and select the most prominent candidates for further downstream analysis.\u003c/p\u003e \u003cp\u003eTo classify AGATHA outcomes, we applied multivariate statistical methods, including Partial Least Squares Discriminant Analysis (PLSDA) (\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e). This method categorizes samples (diseases/genes) and predicts their membership in specific classes. Compared with other discriminant methods, PLSDA can handle data with high variability and reduces dataset dimensionality by using latent variables (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e). In addition to the classification analysis, unsupervised clustering was used to reveal latent relationships that cannot be measured directly in the multivariate data. Combining these steps with a comprehensive pathway analysis helps explain the biological significance of the classification outcomes and produces a final list of candidate genes for drug repurposing.\u003c/p\u003e \u003cp\u003eIn this work, we applied our methodology to identify candidate drugs that could be repurposed to treat neurodegenerative diseases (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e), which pose major healthcare challenges and rank as the seventh leading cause of death worldwide (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e).
The term \u0026ldquo;neurodegenerative\u0026rdquo; covers a wide spectrum of neurocognitive conditions that, despite their different pathologies, often share common symptoms, with dementia as a major outcome (\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e). Therefore, the analysis spanned a broad range of neurodegenerative conditions within the dementia domain to search for common themes and pathways.\u003c/p\u003e \u003cp\u003eFirst, the classification model was used to extract \u0026ldquo;dementia\u0026rdquo; genes, which were then analyzed in the context of the pathways in which they participate. Once the method was confirmed to extract the necessary data effectively, the same procedure was applied to the remaining, non-neurodegenerative disease classes to obtain specific genes for each group. Next, the genes with a high likelihood of association with each disease were mapped onto the Dementia class to assess their positions on its probability scale. Genes for which known small molecules act on the pathways of interest were prioritized, and from these, FDA-approved medications and medications in experimental or investigational status were selected. We selected a total of 38 drugs for potential repurposing, focusing on six of them, all of which have demonstrated effectiveness in treating diseases unrelated to central nervous system function.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cp\u003eAGATHA is an open-source algorithm available at: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/IlyaTyagin/AGATHA-C-GP\u003c/span\u003e\u003cspan address=\"https://github.com/IlyaTyagin/AGATHA-C-GP\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. The operational principles of AGATHA are detailed in the \u003cb\u003eAdditional Information\u003c/b\u003e section.
Statistical analysis of the multidimensional dataset was performed using MATLAB R2023a (MathWorks, Natick, MA) and the PLS Toolbox (Eigenvector Research, Inc., Wenatchee, WA). Pathway analysis was conducted using the g:Profiler tool (\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e). Gene characterization and selection of potential drugs were carried out with the help of the GeneCards (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e) and CTD (\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e) databases.\u003c/p\u003e \u003cp\u003ePrincipal Component Analysis (PCA) (\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e) was used primarily for dimensionality reduction and preliminary analysis of the data structure. This widely used method transforms the input data into a set of orthogonal principal components that capture the variance of the data. It involves a series of mathematical steps: calculating the covariance matrix of the data, computing its eigenvalues and eigenvectors, and projecting the data onto the leading components to reduce its dimensionality. The prepared dataset was normalized by total area and autoscaled, i.e., each of the 512 columns of the calibration matrix was mean-centered and divided by its standard deviation. Cross-validation was then performed using the Venetian blinds approach with 10 splits and a blind thickness of 1. The resulting model showed a clear separation of the Dementia and SUD classes, with the rest of the data falling into a single cohesive group.
The cross-validated root mean square error (RMSECV) of this model was very low (0.00231327), indicating good model accuracy.\u003c/p\u003e \u003cp\u003eThe classification model was built using the PLSDA method (\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e), which can handle heterogeneous data and describe it with only a few latent variables (LVs). The LVs are derived from the regression coefficients determined for each component, after which the positions of the samples in PLSDA space are estimated. To reduce the risk of overfitting, Venetian blinds cross-validation was used: the data are partitioned into k interleaved segments of roughly equal size, and each segment is held out in turn for validation while the model is trained on the others. In addition to reducing dimensionality, PLSDA ensures that each calculated component carries unique information, because the components are mutually orthogonal.\u003c/p\u003e \u003cp\u003eUnsupervised hierarchical clustering analysis was applied to a subset of the Dementia-classified data to group samples by their shared characteristics. In this agglomerative procedure, the pair of samples (or clusters) with the smallest distance between them is identified and merged without knowledge of class membership, and the resulting sequence of merges is visualized as a dendrogram. Ward\u0026rsquo;s method was used, which at each step merges the two clusters whose union produces the smallest increase in within-cluster variance (\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e). This approach is well suited to high-dimensional data, such as our disease/coordinate and gene/coordinate sets, and to cases where clusters are expected to have similar internal variance.
This increase was estimated from the sum of squared deviations from the cluster mean after merging, with distances computed as Mahalanobis distances.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eData description\u003c/h2\u003e \u003cp\u003eDiseases and conditions of interest were selected from the Disease Database provided by the Unified Medical Language System (UMLS) (\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e) and combined into the Health Conditions Data Set (HCDS), which comprises a total of 122 terms categorized into seven groups: Dementia (24 conditions), Diabetes (12 conditions), Arthritis (9 conditions), Heart Conditions/Diseases (14 conditions), Hypertension (11 conditions), Cancer (12 conditions), and Substance Use Disorders (SUD) (40 conditions/substances). All selected terms are formal names of diseases and health conditions, and the last group (SUD) also contains the most common substances of abuse (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eHealth Condition Data Set.
Diseases, conditions, and substances are grouped by type and color-coded by group.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"15\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c13\" colnum=\"13\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c14\" colnum=\"14\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c15\" colnum=\"15\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003e#\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eName\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eUMLS code\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e#\u003c/p\u003e
\u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eName\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eUMLS code\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003e#\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eName\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eUMLS code\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003e#\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eName\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eUMLS code\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c13\"\u003e \u003cp\u003e#\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c14\"\u003e \u003cp\u003eName\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c15\"\u003e \u003cp\u003eUMLS code\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c3\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDementia\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMonogenic Diabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec3888631\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eAtrial Fibrillation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0004238\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e
\u003cp\u003eOsteosarcoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0029463\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e103\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eMethadone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0025605\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAIDS Dementia Complex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0001849\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSteroid-Induced Diabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0342269\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eCardiac Arrhythmia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0003811\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003ePancreas Carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0235974\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e104\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eHydrocodone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0020264\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd
align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlcohol Amnestic Disorder\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0001940\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eType 1 Diabetes Mellitus\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0011854\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eRheumatic Heart Disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0035439\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eProstate Cancer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0376358\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e105\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eMeperidine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0025376\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlcohol-Induced Mental Disorders\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0033936\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eType 2 Diabetes 
Mellitus\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0011860\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMyocardial Infarction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0027051\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eMelanoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0025202\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e106\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eOxycodone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0030049\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlzheimer Disease, Early Onset\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0750901\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaturity Onset Diabetes Mellitus In Young\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0342276\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eCardiomyopathies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0878544\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" 
nameend=\"c12\" namest=\"c10\"\u003e \u003cp\u003e\u003cb\u003eSubstance abuse disorder\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e107\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eOxymorphone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0030073\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlzheimer Disease, Late Onset\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0494463\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCystic Fibrosis Related Diabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec2242728\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eAtrial Septal Defects\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0018817\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eAlcohol Abuse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0085762\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e108\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eDiacetylmorphine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec001189\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAmyotrophic Lateral Sclerosis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0002736\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eGestational Diabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0085207\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c9\" namest=\"c7\"\u003e \u003cp\u003e\u003cb\u003eHypertension\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eEthanol\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0001962\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e109\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eMitragynine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0066619\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCorticobasal Degeneration\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0393570\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eWolfram Syndrome 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec4551693\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eHigh Blood Pressure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0020538\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eAmphetamine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0002658\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e110\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eHydromorphone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0012306\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDementia Associated with Alcoholism\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0236656\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c6\" namest=\"c4\"\u003e \u003cp\u003e\u003cb\u003eArthritis\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eChronic Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0745114\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eCannabicyclohexanol\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec3492502\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e 
\u003cp\u003e111\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eLoperamide Hydrochloride\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0282221\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDementia, Vascular\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0011269\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eArthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0003864\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eEssential Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0085580\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eSynthetic Cannabinoids\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0006864\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e112\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eDelysid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0024334\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFrontotemporal Dementia\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0338451\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAnkylosing Spondylitis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0038013\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMalignant Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0020540\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eCannabidiol\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0936079\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e113\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eKetamine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0022614\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHAND\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec4285693\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePsoriatic Arthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0003872\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e \u003cp\u003eHypertensive Crisis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0020546\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eCentral Nervous System Depressants\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0007681\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e114\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eMescaline\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0025460\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHIV Encephalopathy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0276548\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePyogenic Arthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0003869\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eIsolated Systolic Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0745133\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eDextromethorphan\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0011816\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e115\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eEcstasy Abuse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0743247\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHuntington's Disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0020179\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRheumatoid Arthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0003873\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eResistant Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0745130\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eBarbiturates, Benzodiazepines\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0004745\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e116\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003ePsilocybin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0033850\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003eLewy Body Disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0752347\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFibromyalgia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0016053\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eGestational Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0852036\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eFlunitrazepam\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0016296\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e117\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eSalvia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec3668955\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMajor Neurocognitive Disorder\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec4087461\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eReactive Arthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0085435\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c7\"\u003e \u003cp\u003e66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eSecondary Hypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0155616\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eRohypnol\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0699927\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e118\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003ePhencyclidine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0031381\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMild Cognitive Disorder\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec1270972\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eGout\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0018099\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eCerebrovascular Accident\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0038454\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eOpium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c12\"\u003e \u003cp\u003ec0029112\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e119\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eNandrolone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0027368\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMild Neurocognitive Motor Disorder\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec2609165\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDegenerative Polyarthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0029408\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eTransient Ischemic Attack\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0007787\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eSodium Oxybate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0037537\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e120\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eOxandrolone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0029995\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eParkinson's Disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0030567\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c6\" namest=\"c4\"\u003e \u003cp\u003e\u003cb\u003eHeart disease/condition\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c9\" namest=\"c7\"\u003e \u003cp\u003e\u003cb\u003eCancer\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eCathine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0069021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eOxymetholone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0030072\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePresenile Dementia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0011265\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAngina Pectoris\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0002962\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eBreast Cancer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c9\"\u003e \u003cp\u003ec0006142\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eTobacco\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0040329\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c13\"\u003e \u003cp\u003e122\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c14\"\u003e \u003cp\u003eTestosterone Cypionate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c15\"\u003e \u003cp\u003ec0076181\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrion Disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0162534\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHeart Failure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0018801\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eCervical Cancer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0007847\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eNicotine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0028040\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" morerows=\"6\" nameend=\"c15\" namest=\"c13\" rowspan=\"7\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSenile Dementia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0002395\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCongenital Heart Defects\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0018798\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eColorectal Cancer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0009402\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eKhat\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec1386575\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSenile Psychosis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec1457889\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHeart Valve Disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0018824\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eLeukemia\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0023418\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eBenzoylmethylecgonine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0009170\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSubcortical Vascular Dementia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0393561\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCoronary Artery Disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec1956346\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eLymphoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0024299\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eMethylphenidate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0025810\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c3\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDiabetes\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePeripheral Artery Disease\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec1704436\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMalignant Lung Neoplasm\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0242379\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eCodeine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0009214\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrediabetes Syndrome\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec0362046\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCoronary Arteriosclerosis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0010054\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMalignant Neoplasm Of Skin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0007114\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e101\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eMorphine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0026549\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLatent Autoimmune Diabetes in Adults\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ec1739108\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDisorder Of Pericardium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ec0265122\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eNeuroblastoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ec0027819\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e102\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003eLoperamide\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e \u003cp\u003ec0023992\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eWe hypothesized that the AGATHA space contains spatial clusters corresponding to different groups of health conditions (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). If so, mapping genes into the AGATHA space and analyzing their positions relative to these disease groups could uncover previously unrecognized links between specific gene sets and health conditions. Genes linked to health conditions in this way are then evaluated for potential drug repurposing opportunities.\u003c/p\u003e \u003cp\u003eThe general workflow is shown in Scheme \u003cspan refid=\"Sch1\" class=\"InternalRef\"\u003e1\u003c/span\u003e below. 
Disease-categorized data from a variety of databases were mapped to the AGATHA space for further characterization. Classification methods were then used to build a discrimination model from which four lists of genes were extracted: 1) genes specific to each disease class; 2) Dementia genes that rank highly in other disease classes; 3) genes from other disease classes that rank highly in the Dementia class; and 4) genes common to all disease classes. These gene groups were used for pathway analysis with the \u003cb\u003eg:Profiler\u003c/b\u003e tool (\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e), which guided the selection of candidates for drug repurposing evaluation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eExploratory analysis of semantic links revealed by the AGATHA embedding space.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe complex relationships between the selected health condition terms (\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), as described across many scientific articles, were assessed using the AGATHA text-mining software. After semantic embedding, these health condition terms are represented as points in a high-dimensional latent space, the embedding space we earlier named the AGATHA space (see the first paragraph of the \u003cspan refid=\"Sec3\" class=\"InternalRef\"\u003e\u003cb\u003eResults\u003c/b\u003e\u003c/span\u003e section). The coordinates are computed so that words or phrases with similar meanings lie closer to each other in this space. The number of dimensions of an embedding space is typically influenced by the volume and complexity of the text data being analyzed. 
However, it also depends on the requirements of the model and the task at hand. A larger and more complex dataset might benefit from a higher-dimensional space that captures more nuanced semantic relationships, but the choice of dimensionality also depends on computational constraints and on the desired balance between detail and efficiency. In our case, preliminary studies indicated that the information in the PubMed database of scientific abstracts is embedded efficiently in a space with 512 dimensions (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows the relative positions of the HCDS groups in three dimensions, obtained by condensing the original 512-dimensional data with principal component analysis (PCA). Two non-overlapping sets of health conditions form distinct clusters: SUD (green diamonds) and Dementia (red diamonds). The five remaining sets (Diabetes, Arthritis, Heart Diseases, Hypertension, and Cancer) form a tight spatial cluster that is separated from both the SUD and Dementia clusters. Subclusters corresponding to these five groups remain distinguishable; however, they lie close to each other, and some groups overlap. These observations led us to hypothesize that the AGATHA space contains spatial patterns characteristic of the health condition groups.
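The 3D projection used for Fig. 1 can be illustrated with a minimal PCA sketch in NumPy. It assumes the term embeddings are available as an array with one row per health condition term; the random matrix below is only a placeholder with the shape of 122 terms by 512 dimensions, not AGATHA output, and the function name pca_project is hypothetical rather than part of the authors' software.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                    # center every embedding dimension
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T          # coordinates in the PC space
    explained = S**2 / np.sum(S**2)            # fraction of variance per PC
    return scores, explained[:n_components]


# Random placeholder with the same shape as 122 terms x 512 AGATHA dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(122, 512))
scores, explained = pca_project(X, n_components=3)
print(scores.shape)                            # (122, 3)
```

Each row of scores gives the 3D coordinates of one term, which is what a scatter plot such as Fig. 1 displays; explained reports how much of the total variance the three plotted axes retain.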
In the following sections, we apply multivariate statistical methods to identify and characterize such patterns.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eClassification analysis of health condition groups mapped to the AGATHA space.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWe tested whether the spatial patterns in the AGATHA space reflect the health condition groups using partial least squares discriminant analysis (PLSDA), a standard supervised classification approach. In this study, we leverage both the interpretability of multiclass PLSDA models and their ability to handle collinear data effectively. We subsequently relied on the strong predictive performance of the PLSDA models to investigate gene/health condition associations.\u003c/p\u003e \u003cp\u003ePLSDA has proven effective for multivariate data and offers tunable model complexity (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e). We used PLSDA to build a supervised classification model (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) with classes defined by the health condition groups (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Extensive preliminary classification trials (not reported here) identified the optimal data preprocessing and classification parameters. For the final model, the input matrix containing the 512-dimensional AGATHA coordinates of all health conditions was preprocessed by normalization to total area followed by autoscaling. Despite the high dimensionality of the AGATHA embeddings, four latent variables were sufficient to produce a robust classification of health conditions.
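A minimal NumPy-only sketch of the preprocessing and a PLSDA fit follows. Reading normalization by total area as dividing each row by the sum of its absolute values is an assumption, and the classifier is a textbook NIPALS PLS2 regression on one-hot class labels with an argmax decision rule; the software used in the study may compute class probabilities and decision thresholds differently. All function names are hypothetical and the data are synthetic.

```python
import numpy as np


def normalize_total_area(X):
    # Assumption: "total area" is the sum of absolute values of each row
    # (a 1-norm), because embedding coordinates can be negative.
    return X / np.abs(X).sum(axis=1, keepdims=True)


def autoscale(X, mean=None, std=None):
    # Center each column and scale it to unit variance; reuse the training
    # mean/std when preprocessing new samples (e.g., genes).
    if mean is None:
        mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std


def plsda_fit(X, labels, n_components=4, tol=1e-10, max_iter=500):
    """NIPALS PLS2 regression of a one-hot class matrix on autoscaled X."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    y_mean = Y.mean(axis=0)
    E, F = X.copy(), Y - y_mean
    W, P, C = [], [], []
    for _ in range(n_components):
        u = F[:, np.argmax(F.var(axis=0))]    # start from the most variable class column
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)             # X weights
            t = E @ w                          # X scores
            c = F.T @ t / (t @ t)              # Y loadings
            u_new = F @ c / (c @ c)            # Y scores
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t @ t)                  # X loadings
        E = E - np.outer(t, p)                 # deflate X and Y
        F = F - np.outer(t, c)
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    B = W @ np.linalg.inv(P.T @ W) @ C.T       # coefficients for the original X
    return classes, B, y_mean


def plsda_predict(X, classes, B, y_mean):
    # Assign each sample to the class with the largest predicted response.
    return classes[np.argmax(X @ B + y_mean, axis=1)]


# Synthetic example: three well-separated classes in 50 dimensions
rng = np.random.default_rng(1)
idx = np.repeat(np.arange(3), 30)
labels = np.array(["A", "B", "C"])[idx]
X = rng.normal(scale=3.0, size=(3, 50))[idx] + rng.normal(size=(90, 50))
Xs, mean, std = autoscale(normalize_total_area(X))
classes, B, y_mean = plsda_fit(Xs, labels, n_components=4)
accuracy = np.mean(plsda_predict(Xs, classes, B, y_mean) == labels)
```

New samples, such as gene embeddings, would be normalized, autoscaled with the stored training mean and std, and passed to plsda_predict with the same coefficients.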
Latent variables are computed sequentially, each capturing the variance shared by the predictors and the class labels that remains after the preceding latent variables have been extracted. Together, the first four latent variables captured 16.31% of the variance in the data.\u003c/p\u003e \u003cp\u003eThe stability of the PLSDA model was verified by Venetian blinds cross-validation with ten folds, in which every tenth sample is assigned to the same fold. The final model categorizes health conditions into the seven predefined groups (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u003cb\u003eA\u003c/b\u003e), with cross-validated sensitivity and specificity values ranging from 0.786 to 0.990. As expected from the exploratory analysis (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), the Dementia and SUD classes showed the best classification performance. The Dementia panel in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u003cb\u003eA\u003c/b\u003e shows that all health conditions initially selected for the Dementia group have a probability close to 100% of being classified into the Dementia class (the 0\u0026ndash;1 range on the Y-axis corresponds to probabilities of 0\u0026ndash;100%). These observations suggest that the health conditions originally selected for the Dementia group form a distinct spatial cluster in the AGATHA space. Furthermore, because each cross-validation fold withholds one-tenth of the data from training, these results indicate that Dementia conditions treated as 'unknowns' during training are likely to be classified correctly. Interestingly, this holds not only for Dementia and SUD, but also for Diabetes.
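Venetian blinds cross-validation assigns samples to folds in an interleaved fashion rather than in contiguous blocks. The pure-Python sketch below shows that fold assignment, a generic cross-validated prediction loop, and one-vs-rest sensitivity and specificity. A toy nearest-mean classifier stands in for the PLSDA model; all names are hypothetical, and the study's exact implementation may differ.

```python
def venetian_blinds_folds(n_samples, n_folds=10):
    """Interleaved splits: sample i is held out in fold i % n_folds."""
    return [list(range(k, n_samples, n_folds)) for k in range(n_folds)]


def cross_val_predict(X, y, fit, predict, n_folds=10):
    """Predict every sample with a model trained without that sample's fold."""
    y_pred = [None] * len(y)
    for test_idx in venetian_blinds_folds(len(y), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, label in zip(test_idx, predict(model, [X[i] for i in test_idx])):
            y_pred[i] = label
    return y_pred


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity (recall) and specificity for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy nearest-mean classifier on 1-D data, standing in for PLSDA
def fit_nearest_mean(X, y):
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def predict_nearest_mean(means, X):
    return [min(means, key=lambda c: abs(x - means[c])) for x in X]


X = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
y_cv = cross_val_predict(X, y, fit_nearest_mean, predict_nearest_mean)
print(sensitivity_specificity(y, y_cv, "A"))   # (1.0, 1.0)
```

With 20 samples and ten folds, each fold holds out samples k and k + 10, so every fold contains one member of each toy class.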
In Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Diabetes lies farthest from the Dementia and SUD groups and borders the other groups without overlapping them. Despite overlapping regions, the Arthritis, Cancer, Heart Disease/Condition, and Hypertension groups can also be discriminated, as shown in the \u003cspan refid=\"Sec13\" class=\"InternalRef\"\u003e\u003cb\u003eDiscussion\u003c/b\u003e\u003c/span\u003e section. Two distinct pairs of groups can be identified: Heart Disease/Condition with Hypertension, and Arthritis with Cancer (corresponding panels in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u003cb\u003eA\u003c/b\u003e). Health conditions originally selected for these four groups overlap, and some have a non-zero probability of being assigned to the other class of their pair. The proximity and overlap of the Heart Disease/Condition and Hypertension groups can be explained by the shared physiological characteristics of these disorders (\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e). There are also connections between Cancer and Arthritis, such as their associations with chronic inflammation and paraneoplastic arthritis (\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e).
Although the physiological origins of the connections within these two pairs of groups are beyond the scope of this proof-of-concept study, we show later that a more detailed analysis of the overlapping groups (black circles in Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u003cb\u003eB\u003c/b\u003e and 2\u003cb\u003eC\u003c/b\u003e) makes it possible to build a robust classification model that discriminates all groups.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eAssigning human genes to health condition groups using AGATHA latent space and advanced statistical methods.\u003c/b\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eText mining algorithms provide a unique opportunity to connect scientific concepts using lexical context. As demonstrated above, the AGATHA algorithm successfully condensed the scientific information in the PubMed database, capturing the lexical context characteristic of the health condition groups selected for this proof-of-concept study. In this section, we explore the ability of the AGATHA system to uncover hidden connections between genes and health conditions. To this end, we mapped all human genes to the AGATHA space and assigned them to health condition groups using the PLSDA classification model. This step was followed by an in-depth analysis of the identified gene clusters in the context of diseases, physiological pathways, and drugs known to interact with these pathways.\u003c/p\u003e \u003cp\u003eThe complete list of human genes, mapped to the AGATHA space as a matrix with 20,889 rows and 512 columns, was analyzed using the PLSDA model developed for HCDS.
Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e illustrates the distribution of genes among all disease classes, with the color bar indicating their association with the Dementia class in each category.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eAs Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows, Dementia genes are not distributed in the same pattern across the other classes. We therefore examined the genes assigned to the Dementia class to verify that the model is biologically coherent. Genes with a probability exceeding 80% were analyzed by hierarchical clustering to investigate the internal structure of the data, and the resulting clusters were then subjected to pathway analysis.\u003c/p\u003e \u003cp\u003eIn total, the classification model identified 1079 genes with a high probability of being associated with Dementia. These genes were subjected to unsupervised agglomerative clustering using Ward\u0026rsquo;s method on four principal components, with the Mahalanobis distance to account for the covariance structure of the multivariate data (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Cutting the dendrogram at the selected threshold separated the genes into four distinct clusters, which were then subjected to pathway analysis to assess their biological meaning.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe dendrogram in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows four well-separated gene clusters, each defined by specific biological processes and mechanisms.
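The selection and clustering steps can be sketched as follows, under stated assumptions. Because PCA scores are uncorrelated, scaling each score to unit variance makes Euclidean distances equal to Mahalanobis distances in the principal component space (assuming the covariance of the scores is used), which lets the standard Ward criterion operate in that geometry. The naive O(n³) Ward implementation, the function names, and the synthetic data are all hypothetical; the toy data contain five groups so that four principal components carry only between-group structure, whereas the study cut its dendrogram into four clusters.

```python
import numpy as np


def select_high_probability(probs, threshold=0.8):
    """Indices of genes whose class probability exceeds the threshold."""
    return np.flatnonzero(probs > threshold)


def whitened_pc_scores(X, n_components=4):
    """PCA scores rescaled to unit variance.

    PCA scores are mutually uncorrelated, so Euclidean distances between the
    rescaled scores equal Mahalanobis distances in the principal component
    space (using the covariance of the scores).
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    return scores / scores.std(axis=0, ddof=1)


def ward_clusters(Z, n_clusters):
    """Naive agglomerative Ward clustering, O(n^3): fine for small examples."""
    clusters = [[i] for i in range(len(Z))]
    while len(clusters) > n_clusters:
        centroids = [Z[c].mean(axis=0) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Ward criterion: increase in within-cluster sum of squares
                cost = na * nb / (na + nb) * np.sum((centroids[a] - centroids[b]) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                      # b > a, so index a stays valid
    labels = np.empty(len(Z), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Synthetic data: five well-separated groups of eight points in 20 dimensions
rng = np.random.default_rng(2)
truth = np.repeat(np.arange(5), 8)
X = rng.normal(scale=10.0, size=(5, 20))[truth] + rng.normal(scale=0.1, size=(40, 20))
labels = ward_clusters(whitened_pc_scores(X, n_components=4), n_clusters=5)
```

For real gene sets, a library implementation of Ward linkage applied to the whitened scores would scale far better than this naive loop.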
The processes characterizing each cluster were identified through pathway analysis and can be summarized as follows:\u003c/p\u003e \u003cp\u003e\u003cb\u003eCluster 1\u003c/b\u003e separates from \u003cb\u003eClusters 2\u0026ndash;4\u003c/b\u003e at the first split of the dendrogram, indicating distinct characteristics. Pathway analysis revealed that the processes in Cluster 1 have no direct connection to specific physical or behavioral anomalies and cannot be linked to any particular disease category. However, this cluster remains informative when its genes rank highly in other disease categories. \u003cb\u003eCluster 2\u003c/b\u003e is strongly connected to pathways known to be altered in neurodegenerative conditions, including Alzheimer's disease, Amyotrophic lateral sclerosis, Parkinson's disease, and 'Apoptosis - multiple species' (as labeled by g:Profiler). \u003cb\u003eCluster 3\u003c/b\u003e shares characteristics with Cluster 2, including nervous system development, presynaptic endocytosis, neuron projection organization, visual perception, regulation of neuron projection development, dendrite morphogenesis, regulation of cell projection organization, and neuron projection. Some of these pathways are relevant to conditions such as Parkinsonism, disturbances in higher cognitive functions, central motor function disruptions, Ataxia, and speech impairments related to the nervous system, as well as to the life cycle of the HIV-1 virus. \u003cb\u003eCluster 4\u003c/b\u003e differs from the other three in that it contains pathways related to substance abuse.
These include nicotine, cocaine, and amphetamine addiction and alcoholism, together with nervous system pathways such as neuroactive ligand-receptor interaction, dopaminergic synapse, retrograde endocannabinoid signaling, axon guidance, and many more (\u003cb\u003eAdditional\u003c/b\u003e Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. \u003cb\u003eSummary of pathways and genes for the Dementia class\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eAs a result, we obtained a list of genes that have a high probability of being connected to Dementia while also ranking highly in the remaining six classes. After evaluation against the GeneCards database, these genes were compiled into a separate list (\u003cb\u003eAdditional\u003c/b\u003e Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. \u003cb\u003eDementia non-specific genes, highly ranked in other disease groups\u003c/b\u003e). Next, the highly ranked genes of the Diabetes, Arthritis, Heart, Hypertension, Cancer, and SUD classes were extracted and mapped onto the Dementia class (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). For most classes, these genes did not appear at the top of the Dementia probability scale, so only those most likely to be connected to Dementia were combined (\u003cb\u003eAdditional\u003c/b\u003e Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. \u003cb\u003eDisease-specific genes, highly ranked in Dementia\u003c/b\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFollowing the same procedure, we also extracted the top 1079 genes for each of the other diseases, resulting in six lists: Diabetes, Arthritis, Heart conditions/diseases, Hypertension, Cancer, and SUD.
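Extracting the top-ranked genes for each class from a matrix of predicted class probabilities, and locating where a gene falls in another class's ranking (as in Fig. 5), can be sketched with NumPy as below. The matrix layout (genes in rows, classes in columns), the gene names, and the values are hypothetical; in the study, n_top was 1079.

```python
import numpy as np


def top_genes_per_class(probs, genes, classes, n_top):
    """Set of the n_top highest-probability genes for every class (column)."""
    order = np.argsort(-probs, axis=0)                 # descending within each column
    return {cls: {genes[i] for i in order[:n_top, j]} for j, cls in enumerate(classes)}


def ranks_in_class(probs, j):
    """0-based rank of every gene within class column j (0 = most probable)."""
    order = np.argsort(-probs[:, j])
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks


genes = ["G1", "G2", "G3", "G4"]                        # hypothetical genes
classes = ["Dementia", "SUD"]
probs = np.array([[0.90, 0.80],
                  [0.70, 0.10],
                  [0.20, 0.95],
                  [0.10, 0.30]])
top = top_genes_per_class(probs, genes, classes, n_top=2)
sud_ranks = ranks_in_class(probs, 1)
```

Here the Dementia list contains G1 and G2 and the SUD list contains G1 and G3, so G1 is a Dementia gene that also ranks highly in SUD.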
These lists were then reduced to the genes unique to each class, removing excessive overlap of biological information. Pathway analysis was performed on these class-specific gene lists, and the results are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMain pathways identified for the lists of specific genes for all disease classes.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDisease\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumber of genes\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSignificant pathways\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDementia\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e518\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGlutamatergic synapse pathway, neuron development, axon development, Alzheimer disease, Amyotrophic lateral sclerosis, Dementia, Brain atrophy, Frontotemporal dementia, Parkinsonism, regulation of synaptic plasticity\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e109\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eInsulin and sugar metabolism-related pathways, glucose transmembrane transporter activity, insulin secretion, monosaccharide transmembrane transport, Type II diabetes mellitus, Abnormal hemoglobin\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eArthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eImmune response-activating cell surface receptor signaling pathway, activation of immune response, regulation of B cell receptor signaling pathway, protein deglutamylation, Abnormal lymphocyte proliferation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHeart\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBlood circulation, regulation of blood circulation, regulation of heart contraction, regulation of heart rate, heart development\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eN/A\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCancer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eDNA damage sensor activity, Homologous recombination, RAD51B-RAD51C-RAD51D-XRCC2-XRCC3 complex\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSUD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e397\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNicotine addiction, morphine addiction, cocaine addiction, Common pathways underlying drug addiction, Drug metabolism - cytochrome P450, drug ADME, Steroid hormone biosynthesis, dopamine neurotransmitter receptor activity, dopamine secretion, modulation of chemical synaptic transmission, Retrograde endocannabinoid signaling\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003ePathway overview\u003c/h2\u003e \u003cp\u003ePathways specific to the \u003cb\u003eDementia\u003c/b\u003e class, based on the selected genes, are described above.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eDiabetes (109 genes)\u003c/h2\u003e \u003cp\u003eThe Diabetes group was characterized by insulin and sugar metabolism-related pathways, glucose transmembrane transporter activity, insulin secretion, monosaccharide transmembrane transport, Type II diabetes mellitus, abnormal hemoglobin, and others, as well as by general pathways shared by a variety of conditions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eArthritis (32 genes)\u003c/h2\u003e \u003cp\u003eThe Arthritis group included a variety of autoimmune and inflammatory diseases, which is reflected in its characteristic pathways: immune response-activating cell surface receptor signaling pathway, activation of immune response, regulation of B cell receptor signaling pathway,
protein deglutamylation, abnormal lymphocyte proliferation, and others.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eHeart condition/disease (67 genes)\u003c/h2\u003e \u003cp\u003eThis group included pathways such as blood circulation, regulation of blood circulation, regulation of heart contraction, regulation of heart rate, heart development, and others. Some of these pathways are inherently related to hypertension, which may explain the absence of specific genes in the Hypertension class.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eHypertension (0 genes)\u003c/h2\u003e \u003cp\u003eNo specific genes were identified for this group. Hypertension can be caused by various factors: it is related to heart function, to diabetes through arterial damage, to excessive alcohol consumption, and to other causes. Several of these causes are represented by the other six classes, so the absence of hypertension-specific genes is not surprising, although it warrants further investigation.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eCancer (18 genes)\u003c/h2\u003e \u003cp\u003eThe Cancer group is characterized by several pathways related to DNA damage sensor activity (\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e). This class includes the RAD51B-RAD51C-RAD51D-XRCC2-XRCC3 complex; inactivating mutations in its components predispose to breast, ovarian, and prostate cancers (\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eSUD (397 genes)\u003c/h2\u003e \u003cp\u003eThe SUD group comprised various addictions and individual substances.
This is reflected in the identified pathways, which included chemical carcinogenesis - DNA adducts, nicotine addiction, morphine addiction, cocaine addiction, Common pathways underlying drug addiction, Drug metabolism - cytochrome P450, drug ADME (absorption, distribution, metabolism, and excretion), steroid hormone biosynthesis, dopamine neurotransmitter receptor activity, dopamine secretion, modulation of chemical synaptic transmission, retrograde endocannabinoid signaling, and many others.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eDrug repurposing\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe classification model predicted a list of genes with high potential to be associated with Dementia. These hidden connections are inferred from patterns and relationships learned from the data, which indirectly reveal associations between terms. Understanding these relationships will require interpretation and analysis beyond the model itself. Using GeneCards and the Comparative Toxicogenomics Database (CTD) (\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e), we selected candidate drugs for repurposing analysis based on genes specific to the six other disease groups that were also present within the Dementia class. The selected medications were additionally checked in the DrugBank database (\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e) for approval status and clinical trial stage (\u003cb\u003eAdditional Table\u0026nbsp;4\u003c/b\u003e). Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e summarizes the most promising candidates for drug repurposing, obtained by combining the gene/disease connections predicted by the classification model with the gene/drug connections retrieved from these databases.
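The final selection step is essentially a join between model predictions and database annotations. The sketch below assumes both are available as plain Python dictionaries; the three gene/class/drug triples are taken from Table 3, while GENE_X and Compound_Y are hypothetical entries that show an unapproved compound being filtered out. It does not query GeneCards, CTD, or DrugBank.

```python
def candidate_drugs(predicted, gene_drugs, approved):
    """Approved drugs that target genes assigned to a class by the model."""
    rows = []
    for gene, cls in predicted.items():
        for drug in gene_drugs.get(gene, []):
            if drug in approved:
                rows.append((drug, gene, cls))
    return sorted(rows)


# Gene -> predicted class, using three gene/class pairs from Table 3
predicted = {"EDNRB": "Hypertension", "CHRNA4": "SUD", "SLC6A3": "SUD", "GENE_X": "Cancer"}
# Gene -> drugs (stand-in for a GeneCards/CTD lookup); GENE_X and Compound_Y are hypothetical
gene_drugs = {"EDNRB": ["Bosentan"], "CHRNA4": ["Mecamylamine"],
              "SLC6A3": ["Methylphenidate"], "GENE_X": ["Compound_Y"]}
# Stand-in for a DrugBank approval-status check
approved = {"Bosentan", "Mecamylamine", "Methylphenidate"}

for row in candidate_drugs(predicted, gene_drugs, approved):
    print(row)
```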
Among these, Bosentan, Mecamylamine, and Methylphenidate are the most compelling candidates, as they rank at the top of the small-molecule lists for their predicted targets (EDNRB, CHRNA4, and SLC6A3, respectively). All drugs were selected from FDA-approved medications that target genes predicted by the classification model. For each drug, Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e lists the target gene and its predicted class, a literature-based rationale for the gene/drug connection, and the clinical trial phases completed for various indications.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCandidate drugs for repurposing for dementia treatment\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFDA-approved drug name\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGene name\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eClass predicted by model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eGene/drug connection based on literature\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCompleted clinical trial phase (indication)\u003c/p\u003e \u003c/th\u003e
\u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBosentan\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eEDNRB\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBosentan is a dual endothelin receptor antagonist that blocks both EDNRA and EDNRB. Literature data show that both bosentan and selective EDNRA blockers effectively prevent the hypoxia-induced rise in pulmonary artery pressure and help restore oxygen saturation (\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e).\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4 - Pulmonary Arterial Hypertension (PAH)\u003c/p\u003e \u003cp\u003e4 - Eisenmenger's Syndrome\u003c/p\u003e \u003cp\u003e4 - Type 2 Diabetes Mellitus\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMecamylamine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eCHRNA4\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSUD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMecamylamine, a nicotinic acetylcholine receptor antagonist, is used to treat moderate to severe essential hypertension and uncomplicated malignant hypertension.\u003c/p\u003e \u003cp\u003eCHRNA4 encodes the α4 subunit of neuronal nicotinic acetylcholine receptors (nAChRs), the cholinergic receptors that mecamylamine antagonizes.
Functionally, α4β2-nAChRs are distinguished by heightened sensitivity to antagonism by mecamylamine or DHβE and to the agonist action of EBDN or nicotine (\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e).\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e3 - Alcohol Dependency / Depression\u003c/p\u003e \u003cp\u003e2 - Age-Related Macular Degeneration (AMD)\u003c/p\u003e \u003cp\u003e2 - Depression / Major Depressive Disorder (MDD)\u003c/p\u003e \u003cp\u003e2 - Diabetic Macular Edema (DME)\u003c/p\u003e \u003cp\u003e2 - Smoking\u003c/p\u003e \u003cp\u003e1 - Autism Disorder / Pervasive Development Disorder\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMethylphenidate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eSLC6A3\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSUD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMethylphenidate (MPH) is a catecholamine reuptake inhibitor; it improves cognitive performance in impaired patients by indirectly increasing extracellular dopamine (DA) levels in both striatal and extrastriatal regions through blockade of dopamine transporters (DAT), and it also blocks noradrenaline transporters (\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e).\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4 - Attention Deficit Hyperactivity Disorder (ADHD)\u003c/p\u003e \u003cp\u003e4 - Neurofibromatosis, type 1 (von Recklinghausen's disease)\u003c/p\u003e \u003cp\u003e4 - Anxiety Disorders\u003c/p\u003e \u003cp\u003e4 - Cocaine Related Disorders / Substance Related Disorders\u003c/p\u003e \u003cp\u003e4 - Dementia
of the Alzheimer's Type\u003c/p\u003e \u003cp\u003e4 - Alzheimer's Disease (AD) / Dementia\u003c/p\u003e \u003cp\u003e4 - Depression\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTretinoin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eCD33\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eArthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTretinoin promptly elicited an inflammatory tumor microenvironment dominated by interferon, marked by heightened infiltration of CD8\u003csup\u003e+\u003c/sup\u003e T cells (\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e).\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4 - Acne\u003c/p\u003e \u003cp\u003e4 - Childhood Acute Promyelocytic Leukemia\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eImatinib\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eGAB2\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eArthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eImatinib induces unique alterations in the phosphorylation state and interactome of Gab2 (\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e).\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4 - Breast Cancer\u003c/p\u003e \u003cp\u003e4 - Chronic Myeloid Leukemia (CML)\u003c/p\u003e \u003cp\u003e4 - Gastrointestinal Stromal Tumor (GIST)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHydralazine\u003c/p\u003e \u003c/td\u003e \u003ctd
align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eRABGEF1\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eArthritis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRABGEF1, a guanine nucleotide exchange factor (GEF), activates Rab GTPases by promoting the exchange of GDP for GTP on these small GTP-binding proteins, which play an important role in vesicle-mediated transport.\u003c/p\u003e \u003cp\u003eAlthough hydralazine acts primarily on blood vessels to lower blood pressure, it may indirectly influence cellular processes such as vesicle-mediated transport through its effects on cell signaling and metabolism (\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e).\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4 - Hypertension / Type 2 Diabetes Mellitus\u003c/p\u003e \u003cp\u003e4 - Congestive Heart Failure (CHF)\u003c/p\u003e \u003cp\u003e3 - Atherosclerosis\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFinally, we removed the class-specific genes from the top 20% of each disease class and merged the remaining genes into a unified list of 1616 common genes (\u003cb\u003eAdditional Table \u003cspan refid=\"MOESM5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. Common genes and their pathways\u003c/b\u003e). Excluding the class-specific genes allowed us to identify general biological processes applicable across various diseases and conditions, such as protein binding, catalytic activity, response to stimuli, biological regulation, cell communication, membrane involvement, cytoplasmic activities, and more.
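The split into class-specific and common genes can be sketched as follows. It assumes that the top 20% corresponds to a class probability above 0.8 (the threshold used for the Dementia class) and that a gene is class-specific when it exceeds the threshold in exactly one class; the function name, genes, and probabilities are hypothetical.

```python
from collections import Counter


def split_specific_common(probs_by_class, threshold=0.8):
    """Split genes above the threshold into class-specific and common sets."""
    top = {cls: {g for g, p in gene_probs.items() if p > threshold}
           for cls, gene_probs in probs_by_class.items()}
    counts = Counter(g for members in top.values() for g in members)
    specific = {cls: {g for g in members if counts[g] == 1} for cls, members in top.items()}
    common = {g for g, n in counts.items() if n > 1}
    return specific, common


# Hypothetical per-class probabilities for four genes
probs_by_class = {
    "Diabetes": {"G1": 0.90, "G2": 0.85, "G3": 0.10, "G4": 0.20},
    "Cancer":   {"G1": 0.95, "G2": 0.30, "G3": 0.82, "G4": 0.50},
    "SUD":      {"G1": 0.88, "G2": 0.40, "G3": 0.20, "G4": 0.60},
}
specific, common = split_specific_common(probs_by_class)
```

In this toy example G1 is common to all three classes, G2 and G3 are specific to Diabetes and Cancer, and SUD has no specific genes.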
At present, this list may not serve as a basis for target selection, but it illustrates the biology shared across these diseases.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eOver the last decade, various literature-mining methods have been introduced for biological analysis. AI technology enables researchers to apply biomedical entity normalization across multiple datasets (\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e). This facilitates the identification of gene mentions in scientific articles and books (\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e) and aids drug repurposing efforts (\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e). We previously introduced literature-mining methods (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e) and demonstrated their application in many areas, including new uses for existing approved drug therapies (\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e). In this project, AGATHA combined with advanced statistical analysis was used to extract a list of potential drugs for dementia treatment. The method identified hidden connections and pathways related to a range of diseases, and to neurodegeneration in particular. The 122 diseases, described by AGATHA-calculated variables, were grouped into seven classes to build the PLSDA classification model. The initial discrimination separated Dementia and SUD from the rest of the group as well-defined clusters, whereas the other diseases remained in a uniform cloud. 
However, when these two classes were excluded from the dataset, the remaining diseases separated without any overlap (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). This suggests that the method could be applied in future projects studying other diseases or gene combinations.\u003c/p\u003e \u003cp\u003eNext, the PLSDA model was used to classify 20889 genes, which revealed different distribution patterns among the classes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). In most cases, Dementia genes appear at the bottom of the probability scale. Some Dementia genes within the top 20% also have a probability of being associated with SUD, whereas only three show a connection to Diabetes. These genes are not prevalent in the top tier of the other four classes (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). Accordingly, a closer look at the top-ranked genes of the Dementia class revealed elements of both neurodegeneration and substance abuse. A total of 1079 genes (top 20%, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, \u003cb\u003eDementia plot\u003c/b\u003e) were subjected to pathway analysis, which supported their assignment to that class: these genes play crucial roles in numerous vital biological processes. Notably, they are involved in the glutamatergic synapse pathway, which is important for proper brain function. Disturbances in glutamate transmission or the improper regulation of glutamate receptors have been linked to various neurological disorders, such as epilepsy, Alzheimer's disease, and schizophrenia. 
Changes in the metaplasticity of glutamatergic synapses have also been shown to play a significant role in the development of chronic SUD (\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e). Tryptophan metabolism is likewise relevant to substance abuse because of its role in the production of neurotransmitters, including serotonin (\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e), as shown in studies of patients with alcohol use disorder (\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e). The same pathway has also been implicated in Alzheimer\u0026rsquo;s disease, including through effects on enzymes involved in the biosynthesis of β-amyloid (\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e). Thus, based on the discovered pathways, genes with a higher probability of association with Dementia that are shared between SUD and Dementia may serve as potential targets for future drug repurposing. By mapping the statistically allocated Dementia genes onto the remaining classes, we obtained a list of Dementia genes that are highly ranked in other diseases.\u003c/p\u003e \u003cp\u003eThe same selection procedure was then applied to the Diabetes, Arthritis, Heart conditions/diseases, Hypertension, Cancer, and SUD classes, extracting the same number of genes as for Dementia. The highly ranked genes of each class were mapped onto the Dementia class to evaluate their positions. This resulted in a separate list of genes that are not necessarily specific to any of the selected types of neurodegenerative disorders but have higher scores in general. 
Based on this information, we compiled the list of potential drugs for repurposing using GeneCards and CTD (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The proposed method enabled us to explore the textual data from several angles: in addition to examining the interconnections between genes, it facilitates the identification of genes unique to each type of disease (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eGenes common to all 122 groups tended to occur in pathways involved in many biological processes simultaneously, which supports the validity of the proposed method. The pathways in this list are broad and can be attributed to many processes or disorders. These similarities could potentially be used in future work to discover new hidden connections. To summarize, combining the literature-mining method AGATHA with advanced statistical analysis allowed us to separate four lists of genes: Dementia genes highly ranked in other disease classes; genes of other diseases highly ranked in the Dementia class; genes specific to each disorder; and genes common to all diseases. This information was used to select potential drugs for repurposing and could be used in future experiments to find new common pathways, select specific genes within the same group of diseases, or build a robust automatic prediction method for different queries.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eWe developed AGATHA, an AI-based literature-mining tool, and proposed a novel use of it to discover drugs with repurposing potential in the context of neurocognitive disorders. 
The study accomplished its primary objective of identifying hidden connections between approved medications and specific health conditions through advanced statistical analysis, including techniques such as PLSDA and unsupervised clustering. The methodology involved grouping scientific terms related to different health conditions and genes, followed by building discrimination models to extract lists of disease-specific genes. These genes were explored through pathway analysis to select candidates for drug repurposing. As a result, we selected six lead drugs for subsequent bench studies: Bosentan, Mecamylamine, Methylphenidate, Tretinoin, Imatinib, and Hydralazine.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003e\u003cstrong\u003eAI:\u003c/strong\u003e Artificial Intelligence\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAGATHA:\u003c/strong\u003e Automatic Graph Mining And Transformer based Hypothesis Generation Approach\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNLP:\u003c/strong\u003e Natural Language Processing\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRNNs:\u003c/strong\u003e Recurrent Neural Networks\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCNNs:\u003c/strong\u003e Convolutional Neural Networks\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePLSDA:\u003c/strong\u003e Partial Least Squares Discriminant Analysis\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCTD:\u003c/strong\u003e Comparative Toxicogenomics Database\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePCA:\u003c/strong\u003e Principal Component Analysis\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLV:\u003c/strong\u003e Latent Variable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHCDS:\u003c/strong\u003e Health Conditions Data Set\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSUD:\u003c/strong\u003e Substance Use Disorder\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eADME:\u003c/strong\u003e Absorption, Distribution, Metabolism and 
Excretion\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eEthics approval and consent to participate:\u003c/h2\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003ch2\u003eConsent for publication:\u003c/h2\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003ch2\u003eCompeting interests:\u003c/h2\u003e\n\u003cp\u003eThe authors have declared that no competing interests exist.\u003c/p\u003e\n\u003ch2\u003eFunding information:\u003c/h2\u003e\n\u003cp\u003eThis project was supported by awards from NIH R01DA054992 (MS, MDW).\u003c/p\u003e\n\u003ch2\u003eAuthor Contributions\u003c/h2\u003e\n\u003cp\u003eIT and IS developed the AGATHA system. AS and IT performed all calculations and wrote the main manuscript. AS, IT, IS and MS conceptualized the project. All authors reviewed and approved the manuscript.\u003c/p\u003e\n\u003ch2\u003eAcknowledgement\u003c/h2\u003e\n\u003cp\u003eWe acknowledge the NIH for financial support and the College of Pharmacy for providing the necessary infrastructure and resources. We also thank Dr. Vitali Sikirzhytski for insightful feedback and suggestions, which significantly improved the quality of this manuscript.\u003c/p\u003e\n\u003ch2\u003eData Availability\u003c/h2\u003e\n\u003cp\u003eAGATHA is an open-source algorithm available at: https://github.com/IlyaTyagin/AGATHA-C-GP. All statistical models, as well as the data used to calculate them, are available on Zenodo at http://doi.org/10.5281/zenodo.11521211\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAmiri R, Razmara J, Parvizpour S, Izadkhah H. A novel efficient drug repurposing framework through drug-disease association data integration using convolutional neural networks. BMC Bioinformatics. 2023;24(1):442.\u003c/li\u003e\n\u003cli\u003eYou Y, Lai X, Pan Y, Zheng H, Vera J, Liu S, et al. Artificial intelligence in cancer target identification and drug discovery. 
Signal Transduct Target Ther. 2022;7(1):156.\u003c/li\u003e\n\u003cli\u003eChen X, Zhang J, Zhao Q, Ding L, Wu Z, Jia Z, He D. Application and teaching of computer molecular simulation embedded technology and artificial intelligence in drug research and development. Open Life Sci. 2023;18(1):20220675.\u003c/li\u003e\n\u003cli\u003eHay M, Thomas DW, Craighead JL, Economides C, Rosenthal J. Clinical development success rates for investigational drugs. Nat Biotechnol. 2014;32(1):40-51.\u003c/li\u003e\n\u003cli\u003ePushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, et al. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41-58.\u003c/li\u003e\n\u003cli\u003eNosengo N. Can you teach old drugs new tricks? Nature. 2016;534(7607):314-6.\u003c/li\u003e\n\u003cli\u003eSybrandt J, Shtutman M, Safro I. MOLIERE: Automatic Biomedical Hypothesis Generation System. KDD: Proceedings International Conference on Knowledge Discovery \u0026amp; Data Mining. 2017;2017:1633-42.\u003c/li\u003e\n\u003cli\u003eSybrandt J, Tyagin I, Shtutman M, Safro I. AGATHA: Automatic Graph Mining And Transformer based Hypothesis Generation Approach. Proceedings of the 29th ACM International Conference on Information \u0026amp; Knowledge Management; 2020.\u003c/li\u003e\n\u003cli\u003eExtance A. How AI technology can tame the scientific literature. Nature. 2018;561(7722):273-4.\u003c/li\u003e\n\u003cli\u003eZia A, Aziz M, Popa I, Khan SA, Hamedani AF, Asif AR. Artificial Intelligence-Based Medical Data Mining. J Pers Med. 2022;12(9).\u003c/li\u003e\n\u003cli\u003eDoughty E, Kertesz-Farkas A, Bodenreider O, Thompson G, Adadey A, Peterson T, Kann MG. Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature. Bioinformatics. 2011;27(3):408-15.\u003c/li\u003e\n\u003cli\u003eJarada TN, Rokne JG, Alhajj R. 
A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J Cheminform. 2020;12(1):46.\u003c/li\u003e\n\u003cli\u003eHua Y, Dai X, Xu Y, Xing G, Liu H, Lu T, et al. Drug repositioning: Progress and challenges in drug discovery for various diseases. Eur J Med Chem. 2022;234:114239.\u003c/li\u003e\n\u003cli\u003eGraham SA, Lee EE, Jeste DV, Van Patten R, Twamley EW, Nebeker C, et al. Artificial intelligence approaches to predicting and detecting cognitive decline in older adults: A conceptual review. Psychiatry Res. 2020;284:112732.\u003c/li\u003e\n\u003cli\u003eMiller D, Stern A, Burstein D. Deciphering microbial gene function using natural language processing. Nat Commun. 2022;13(1):5731.\u003c/li\u003e\n\u003cli\u003eLandhuis E. Scientific literature: Information overload. Nature. 2016;535(7612):457-8.\u003c/li\u003e\n\u003cli\u003eBarker M, Rayens W. Partial Least Squares for Discrimination. Journal of Chemometrics. 2003;17(3):166-73.\u003c/li\u003e\n\u003cli\u003eLee LC, Liong CY, Jemain AA. Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: a review of contemporary practice strategies and knowledge gaps. Analyst. 2018;143(15):3526-39.\u003c/li\u003e\n\u003cli\u003eBocklitz T. Richard G. Brereton: Chemometrics: data driven extraction for science, 2nd ed. Anal Bioanal Chem. 2019;411(14):2995-6.\u003c/li\u003e\n\u003cli\u003eDavenport F, Gallacher J, Kourtzi Z, Koychev I, Matthews PM, Oxtoby NP, et al. Neurodegenerative disease of the brain: a survey of interdisciplinary approaches. J R Soc Interface. 2023;20(198):20220406.\u003c/li\u003e\n\u003cli\u003eWHO. The top 10 causes of death. 2020 [Available from: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death].\u003c/li\u003e\n\u003cli\u003eMcKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack CR, Jr., Kawas CH, et al. 
The diagnosis of dementia due to Alzheimer\u0026apos;s disease: recommendations from the National Institute on Aging-Alzheimer\u0026apos;s Association workgroups on diagnostic guidelines for Alzheimer\u0026apos;s disease. Alzheimers Dement. 2011;7(3):263-9.\u003c/li\u003e\n\u003cli\u003eKolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, Peterson H. g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023;51(W1):W207-W12.\u003c/li\u003e\n\u003cli\u003eStelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. Curr Protoc Bioinformatics. 2016;54:1.30.1-1.30.33.\u003c/li\u003e\n\u003cli\u003eDavis AP, Murphy CG, Johnson R, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, et al. The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Res. 2013;41(Database issue):D1104-14.\u003c/li\u003e\n\u003cli\u003eHotelling H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology. 1933;24(6):417-41.\u003c/li\u003e\n\u003cli\u003eWard JH Jr. Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association. 1963;58(301):236-44.\u003c/li\u003e\n\u003cli\u003eLindberg C. The Unified Medical Language System (UMLS) of the National Library of Medicine. J Am Med Rec Assoc. 1990;61(5):40-2.\u003c/li\u003e\n\u003cli\u003eUnified Medical Language System. Diseases Database Source Information. U.S. National Library of Medicine; 2010 [Available from: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/DDB/index.html].\u003c/li\u003e\n\u003cli\u003ePietzner M, Wheeler E, Carrasco-Zanini J, Cortes A, Koprulu M, Worheide MA, et al. Mapping the proteo-genomic convergence of human diseases. Science. 2021;374(6569):eabj1541.\u003c/li\u003e\n\u003cli\u003eFrijters R, van Vugt M, Smeets R, van Schaik R, de Vlieg J, Alkema W. 
Literature mining for the discovery of hidden connections between drugs, genes and diseases. PLoS Comput Biol. 2010;6(9).\u003c/li\u003e\n\u003cli\u003eBenjamin EJ, Virani SS, Callaway CW, Chamberlain AM, Chang AR, Cheng S, et al. Heart Disease and Stroke Statistics-2018 Update: A Report From the American Heart Association. Circulation. 2018;137(12):e67-e492.\u003c/li\u003e\n\u003cli\u003eKim ST, Chu Y, Misoi M, Suarez-Almazor ME, Tayar JH, Lu H, et al. Distinct molecular and immune hallmarks of inflammatory arthritis induced by immune checkpoint inhibitors for cancer therapy. Nat Commun. 2022;13(1):1970.\u003c/li\u003e\n\u003cli\u003eCybulla E, Vindigni A. Leveraging the replication stress response to optimize cancer therapy. Nat Rev Cancer. 2023;23(1):6-24.\u003c/li\u003e\n\u003cli\u003eGreenhough LA, Liang CC, Belan O, Kunzelmann S, Maslen S, Rodrigo-Brenni MC, et al. Structure and function of the RAD51B-RAD51C-RAD51D-XRCC2 tumour suppressor. Nature. 2023;619(7970):650-7.\u003c/li\u003e\n\u003cli\u003eWishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):D668-72.\u003c/li\u003e\n\u003cli\u003eStobdan T, Zhou D, Williams AT, Cabrales P, Haddad GG. Cardiac-specific knockout and pharmacological inhibition of Endothelin receptor type B lead to cardiac resistance to extreme hypoxia. J Mol Med (Berl). 2018;96(9):975-82.\u003c/li\u003e\n\u003cli\u003eEaton JB, Peng JH, Schroeder KM, George AA, Fryer JD, Krishnan C, et al. Characterization of human alpha 4 beta 2-nicotinic acetylcholine receptors stably and heterologously expressed in native nicotinic receptor-null SH-EP1 human epithelial cells. Mol Pharmacol. 2003;64(6):1283-94.\u003c/li\u003e\n\u003cli\u003eKasparbauer AM, Rujescu D, Riedel M, Pogarell O, Costa A, Meindl T, et al. 
Methylphenidate effects on brain activity as a function of SLC6A3 genotype and striatal dopamine transporter availability. Neuropsychopharmacology. 2015;40(3):736-45.\u003c/li\u003e\n\u003cli\u003eTilsed CM, Casey TH, de Jong E, Bosco A, Zemek RM, Salmons J, et al. Retinoic Acid Induces an IFN-Driven Inflammatory Tumour Microenvironment, Sensitizing to Immune Checkpoint Therapy. Front Oncol. 2022;12:849793.\u003c/li\u003e\n\u003cli\u003eHalbach S, Rigbolt KT, Wohrle FU, Diedrich B, Gretzmeier C, Brummer T, Dengjel J. Alterations of Gab2 signalling complexes in imatinib and dasatinib treated chronic myeloid leukaemia cells. Cell Commun Signal. 2013;11(1):30.\u003c/li\u003e\n\u003cli\u003eArce C, Segura-Pacheco B, Perez-Cardenas E, Taja-Chayeb L, Candelaria M, Duenas-Gonzalez A. Hydralazine target: from blood vessels to the epigenome. J Transl Med. 2006;4:10.\u003c/li\u003e\n\u003cli\u003eJi Z, Wei Q, Xu H. BERT-based Ranking for Biomedical Entity Normalization. AMIA Jt Summits Transl Sci Proc. 2020;2020:269-77.\u003c/li\u003e\n\u003cli\u003eWei CH, Kao HY. Cross-species gene normalization by species inference. BMC Bioinformatics. 2011;12 Suppl 8:S5.\u003c/li\u003e\n\u003cli\u003eZhou Y, Hou Y, Shen J, Huang Y, Martin W, Cheng F. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discov. 2020;6:14.\u003c/li\u003e\n\u003cli\u003eFiscon G, Conte F, Farina L, Paci P. SAveRUNNER: A network-based algorithm for drug repurposing and its application to COVID-19. PLoS Comput Biol. 2021;17(2):e1008686.\u003c/li\u003e\n\u003cli\u003eSybrandt J, Shtutman M, Safro I. Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking. 2018 IEEE International Conference on Big Data (Big Data). 2018:1494-503.\u003c/li\u003e\n\u003cli\u003eAksenova M, Sybrandt J, Cui B, Sikirzhytski V, Ji H, Odhiambo D, et al. 
Inhibition of the Dead Box RNA Helicase 3 Prevents HIV-1 Tat and Cocaine-Induced Neurotoxicity by Targeting Microglia Activation. J Neuroimmune Pharmacol. 2019.\u003c/li\u003e\n\u003cli\u003eChiamulera C, Piva A, Abraham WC. Glutamate receptors and metaplasticity in addiction. Curr Opin Pharmacol. 2021;56:39-45.\u003c/li\u003e\n\u003cli\u003eBadawy AA. Tryptophan metabolism in alcoholism. Adv Exp Med Biol. 1999;467:265-74.\u003c/li\u003e\n\u003cli\u003ePetrakis IL, Buonopane A, O\u0026apos;Malley S, Cermik O, Trevisan L, Boutros NN, et al. The effect of tryptophan depletion on alcohol self-administration in non-treatment-seeking alcoholic individuals. Alcohol Clin Exp Res. 2002;26(7):969-75.\u003c/li\u003e\n\u003cli\u003eMechtcheriakov S, Gleissenthall GV, Geisler S, Arnhard K, Oberacher H, Schurr T, et al. Tryptophan-kynurenine metabolism during acute alcohol withdrawal in patients with alcohol use disorder: The role of immune activation. Alcohol Clin Exp Res. 2022;46(9):1648-56.\u003c/li\u003e\n\u003cli\u003eSavonije K, Weaver DF. The Role of Tryptophan Metabolism in Alzheimer\u0026apos;s Disease. Brain Sci. 
2023;13(2).\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Scheme ","content":"\u003cp\u003eScheme 1 is available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Artificial intelligence, text-mining, dementia, drug repurposing, statistical analysis, AGATHA, classification, pathway analysis","lastPublishedDoi":"10.21203/rs.3.rs-4750719/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4750719/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eNeurodegenerative pathologies such as Alzheimer's disease, Parkinson's disease, Huntington's disease, Amyotrophic lateral sclerosis, Multiple sclerosis, HIV-associated neurocognitive disorder, and others significantly affect individuals, their families, caregivers, and healthcare systems. While there are no cures yet, researchers worldwide are actively working on the development of novel treatments that have the potential to slow disease progression, alleviate symptoms, and ultimately improve the overall health of patients. 
Large volumes of new scientific information necessitate new analytical approaches for meaningful hypothesis generation. To enable the automatic analysis of biomedical data, we introduced AGATHA, an effective AI-based literature-mining tool that can navigate massive scientific literature databases, such as PubMed. The overarching goal of this effort is to adapt AGATHA for drug repurposing by revealing hidden connections between FDA-approved medications and a health condition of interest. Our tool converts the abstracts of peer-reviewed papers from PubMed into a multidimensional space in which each gene and health condition is represented by specific metrics. We implemented advanced statistical analysis to reveal distinct clusters of scientific terms within the virtual space created using AGATHA-calculated parameters for selected health conditions and genes. Partial Least Squares Discriminant Analysis was employed to categorize samples (122 diseases and 20889 genes) and predict their membership in specific classes. Advanced statistics were employed to build a discrimination model and extract lists of genes specific to each disease class. Here we focus on drugs that can be repurposed for dementia treatment as an outcome of neurodegenerative diseases. We therefore identified dementia-associated genes that are statistically highly ranked in other disease classes. Additionally, we report a mechanism for detecting genes common to multiple health conditions. 
These sets of genes were classified based on their presence in biological pathways, aiding in selecting candidates and biological processes that are exploitable with drug repurposing.\u003c/p\u003e","manuscriptTitle":"AI-based mining of biomedical literature: Applications for drug repurposing for the treatment of dementia","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-17 02:05:55","doi":"10.21203/rs.3.rs-4750719/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"1db550c9-f1ff-49a3-8b0e-b3522098949e","owner":[],"postedDate":"August 17th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":35831357,"name":"Biological sciences/Drug discovery"},{"id":35831358,"name":"Biological sciences/Computational biology and bioinformatics/Data processing"},{"id":35831359,"name":"Biological sciences/Computational biology and bioinformatics/Literature mining"},{"id":35831360,"name":"Biological sciences/Computational biology and bioinformatics/Machine learning"},{"id":35831361,"name":"Health sciences/Diseases/Neurological disorders/Dementia"}],"tags":[],"updatedAt":"2024-10-14T04:54:04+00:00","versionOfRecord":[],"versionCreatedAt":"2024-08-17 
02:05:55","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4750719","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4750719","identity":"rs-4750719","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
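The cross-class mapping described in the Discussion, in which the top-ranked genes of one class are checked against the top tier of another class and located on the Dementia probability scale, can be illustrated with a minimal Python sketch. It assumes that the PLSDA model yields one predicted probability per gene and class. The function names (`rank_genes`, `top_genes`, `percentile_positions`, `cross_rank`), the toy genes G1 to G5 and their scores, and the reading of "top 20%" as the top fraction of genes by rank are all hypothetical choices, not the authors' implementation or the models deposited on Zenodo.

```python
# Hypothetical sketch (not the authors' code): map top-ranked genes between
# PLSDA classes, assuming one predicted probability per gene and class.

def rank_genes(scores):
    """Genes sorted by descending probability; ties broken by gene name."""
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


def top_genes(scores, top_fraction=0.2):
    """Assumed cutoff rule: the top `top_fraction` of genes by rank."""
    ranked = rank_genes(scores)
    if not ranked:
        return set()
    n_top = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:n_top])


def percentile_positions(genes, scores):
    """Percentile position of each gene in the ranking (0.0 = top of the scale)."""
    ranked = rank_genes(scores)
    index = {gene: i for i, gene in enumerate(ranked)}
    return {g: 100.0 * index[g] / len(ranked) for g in genes if g in index}


def cross_rank(class_scores, focus="Dementia", top_fraction=0.2):
    """For each other class, list focus-class top genes that are also top-ranked
    there, and locate that class's top genes in the focus-class ranking."""
    focus_top = top_genes(class_scores[focus], top_fraction)
    report = {}
    for name, scores in class_scores.items():
        if name == focus:
            continue
        other_top = top_genes(scores, top_fraction)
        report[name] = {
            "shared_top": sorted(focus_top & other_top),
            "positions_in_focus": percentile_positions(other_top, class_scores[focus]),
        }
    return report


# Toy probabilities for hypothetical genes G1-G5 (illustrative values only).
TOY_SCORES = {
    "Dementia": {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": 0.3, "G5": 0.7},
    "SUD": {"G1": 0.85, "G2": 0.4, "G3": 0.05, "G4": 0.2, "G5": 0.9},
    "Diabetes": {"G1": 0.1, "G2": 0.1, "G3": 0.95, "G4": 0.5, "G5": 0.05},
}

if __name__ == "__main__":
    for cls, result in cross_rank(TOY_SCORES, top_fraction=0.4).items():
        print(cls, result)
```

On the toy data with `top_fraction=0.4`, G1 is the only gene top-ranked in both Dementia and SUD, while the two top Diabetes genes sit at the 60th and 80th percentile positions of the Dementia ranking (0 being the top). The cutoff is kept as a parameter because this section does not define exactly how the "top 20%" tier is drawn.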

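The final step of the gene-list construction, removing disease-specific genes from the top-ranked sets and merging the rest into a list of common genes (Additional table 5), can be sketched in the same hypothetical way. `partition_top_sets`, its `min_classes` parameter, and the toy sets are assumptions for illustration; `min_classes` makes explicit a choice the text leaves open, namely whether a "common" gene must be top-ranked in at least two classes or in every class.

```python
# Hypothetical sketch (not the authors' code): separate class-specific genes
# from genes shared by several classes, given each class's top-ranked set.
from collections import Counter


def partition_top_sets(top_sets, min_classes=2):
    """Return (specific, common) from {class name: set of top-ranked genes}.

    specific[name]: genes top-ranked in that class and in no other class.
    common: genes top-ranked in at least `min_classes` classes; the default of 2
    equals the merged top sets with every class-specific gene removed.
    """
    # Each class contributes a set, so a gene is counted at most once per class.
    counts = Counter(gene for genes in top_sets.values() for gene in set(genes))
    specific = {
        name: {gene for gene in genes if counts[gene] == 1}
        for name, genes in top_sets.items()
    }
    common = {gene for gene, n in counts.items() if n >= min_classes}
    return specific, common


# Toy top-ranked sets for hypothetical genes (illustrative only).
TOY_TOP_SETS = {
    "Dementia": {"G1", "G2", "G7"},
    "SUD": {"G1", "G5", "G7"},
    "Diabetes": {"G3", "G7"},
}

if __name__ == "__main__":
    specific, common = partition_top_sets(TOY_TOP_SETS)
    print("specific:", specific)
    print("common (2+ classes):", sorted(common))
    _, strict = partition_top_sets(TOY_TOP_SETS, min_classes=len(TOY_TOP_SETS))
    print("common to every class:", sorted(strict))
```

With the default setting, G1 and G7 are reported as common and each class keeps one specific gene; requiring presence in every class narrows the common list to G7 alone, which shows how strongly the size of such a list depends on the definition chosen.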