GETgene-AI: A Framework for Prioritizing Actionable Cancer Drug Targets

2025 · doi:10.1101/2025.01.21.634201

bioRxiv preprint · CC-BY-ND 4.0


Abstract

Identifying actionable drug targets is a critical challenge in cancer research, where high-dimensional genomic data and the complexity of tumor biology often hinder effective prioritization. To address this, we developed GETgene-AI, a novel computational framework that integrates network-based prioritization, machine learning, and automated literature analysis to identify and rank potential therapeutic targets. Central to GETgene-AI is the G.E.T. strategy, which combines three data streams: mutational frequency (G List), differential expression (E List), and known drug targets (T List). These components are iteratively refined and ranked using the Biological Entity Expansion and Ranking Engine (BEERE), leveraging protein-protein interaction networks, functional annotations, and experimental evidence. Additionally, GETgene-AI incorporates GPT-4o, an advanced large language model, to automate literature-based ranking, reducing manual curation and increasing efficiency.

In this study, we applied GETgene-AI to pancreatic cancer as a case study. The framework successfully identified high-priority targets such as PIK3CA and PRKCA, validated through experimental evidence and clinical relevance. Benchmarking against GEO2R and STRING demonstrated GETgene-AI’s superior performance, achieving higher precision, recall, and efficiency in prioritizing actionable targets. Moreover, the framework mitigated false positives by deprioritizing genes lacking functional or clinical significance.

While demonstrated on pancreatic cancer, the modular design of GETgene-AI enables scalability across diverse cancers and diseases. By integrating multi-omics datasets with advanced computational and AI-driven approaches, GETgene-AI provides a versatile and robust platform for accelerating cancer drug discovery. This framework bridges computational innovations with translational research to improve patient outcomes.

Introduction

Drug target discovery is a multifaceted process central to developing effective therapies for a range of diseases (S.-F. Zhou and Zhong 2017; Lindsay 2003). Traditionally, it involves the manual investigation of scientific literature and biomedical databases to identify biological targets linked to disease mechanisms, followed by evaluations based on relevance, preclinical and clinical success rates, and research popularity (Paananen and Fortino 2020; Y. Zhou et al. 2022; Trajanoska et al. 2023; Santos et al. 2017). While foundational, this approach often suffers from inefficiencies, high failure rates, and uncertainties in translating early discoveries into viable treatments (D. Sun et al. 2022; Singh et al. 2023). These challenges are particularly pronounced in cancer, where the heterogeneity of the disease demands more precise and individualized strategies (L. Zhu et al. 2021; Somarelli et al. 2019). Unlike other conditions, cancers arise from diverse genetic and molecular abnormalities, requiring advanced techniques to identify and prioritize therapeutic targets (Mroz and Rocco 2017). Cancer medications often target rapidly dividing cells by disrupting DNA replication or cell division, as seen with chemotherapeutic agents (Gu, Hickey, and Malkas 2023; Y. Sun et al. 2021). Additionally, some therapies induce apoptosis in cancer cells by targeting pathways that regulate cell survival (Sellers and Fisher 1999; Lim et al. 2019).

bioRxiv preprint doi: https://doi.org/10.1101/2025.01.21.634201; this version posted January 22, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Significant innovations have been developed to streamline, reform, and reduce the cost of drug target discovery, primarily through in silico assessments (Sadybekov and Katritch 2023). These assessments utilize neural networks, genomic datasets, and machine learning algorithms to predict key genes of interest (Serrano Nájera, Narganes Carlón, and Crowther 2021; Yue et al. 2017; Muhammad et al. 2014). These in silico tools offer new methods of analysis, uncovering novel insights and advancing the identification of potential therapeutic targets (Sliwoski et al. 2014; Huan, Wu, and Chen 2010). The standard in silico framework for drug target discovery typically begins with the construction of protein-protein interaction (PPI) networks, where potential targets are ranked based on their network connectivity and centrality (Odongo, Demiroglu-Zergeroglu, and Çakır 2024; Y. Chen and Xu 2016). Network centrality approaches to drug target discovery are commonly referred to as network-based prioritization. By leveraging network-based prioritization, researchers can efficiently analyze and interpret vast genomic datasets to identify critical regulatory genes implicated in cancer development (Chang et al. 2021; Sonehara and Okada 2021). The ability to efficiently process genomic information and derive meaningful insights is pivotal for identifying relevant drug targets, underscoring the importance of network-based prioritization (J. Y. Chen, Piquette-Miller, and Smith 2013; Huang, Li, and Chen 2009). Network-based approaches are particularly favored for this purpose, as they utilize established biological networks to identify genes associated with specific diseases (Shim, Hwang, and Lee 2015; Huang et al. 2012). These approaches prioritize disease-related genes by integrating data from PPI networks and known gene-drug associations (Mohsen et al. 2021; Zhang et al. 2021).
These approaches also lend themselves readily to visualization (Huan, Wu, and Chen 2010). However, traditional network-based approaches often fail to incorporate crucial genomic information, such as protein expression across tissues, gene mutation frequencies, and differential gene expression profiles, which limits their utility (Petti et al. 2020). To address these limitations, supplemental in silico methods have been developed to achieve a more comprehensive and nuanced analysis of drug targets (Nitsch et al. 2010).

Differential gene expression is a critical method for identifying genes significantly altered between conditions, such as cancerous versus normal tissues (Bai et al. 2013; Van de Sande et al. 2023). A common way to quantify differential expression is the "fold change," the ratio of a gene's expression levels between the two conditions (Love, Huber, and Anders 2014; Mutch et al. 2002). However, the fold-change threshold used to call a gene differentially expressed can introduce arbitrariness into the prioritization process, potentially affecting the accuracy of target identification (McCarthy and Smyth 2009). GEO2R, a widely used tool, reports fold change to prioritize genes between user-defined experimental groups, such as cancerous versus normal tissues (Barrett et al. 2013). In addition to fold change, frequency-based prioritization methods focus on genes with higher mutational rates in disease contexts, as these genes may serve as common therapeutic targets (Dinstag and Shamir 2020; López-Cortés et al. 2018). While valuable, statistical significance methods for gene prioritization can be prone to bias, especially from sample selection, which can skew results (Lazzeroni, Lu, and Belitskaya-Lévy 2014). To mitigate these biases, network centrality-based prioritization has been developed, leveraging gene connectivity within biological networks.
This method offers a more balanced approach to gene selection, expands gene lists, and enhances disease association measures, improving the identification of relevant therapeutic targets (Janyasupab, Suratanee, and Plaimas 2021; Magger et al. 2012). Large language models (LLMs), such as GPT-4, have emerged as another transformative in silico approach to drug discovery (Liu et al. 2021; Oniani et al. 2024). LLMs can predict essential information about gene targets, including protein structural domains, protein structure, toxicity and adverse effects, functional significance, clinical and preclinical relevance, and treatment efficacy (Sallam 2023; Tripathi et al. 2024). Furthermore, GPT-4 has demonstrated the ability to rival human performance in conducting literature reviews, thus streamlining a critical component of the drug target discovery process (Khraisha et al. 2024; Li, Zhu, and Chen 2010).

To address challenges in cancer drug target discovery, we developed GETgene-AI, a framework that combines network-based analysis with artificial intelligence to prioritize actionable targets. Central to GETgene-AI is the G.E.T. strategy, which integrates three key data streams: the G List (genes with genetic mutations or variations functionally implicated in genotype-to-phenotype association studies of the disease), the E List (tissue-specific expression of the candidate gene in the disease's target tissue), and the T List (established drug targets based on reports from literature, patents, clinical trials, or existing approved drugs).
Initial gene lists are derived from multiple sources, including fold change, copy number alterations, and mutational rates across various biological databases. To mitigate biases from incomplete data, GETgene-AI incorporates diverse datasets, ensuring robust prioritization. The framework iteratively refines these lists through the network-based tool BEERE, annotating genes with biological context to create a high-quality, prioritized gene list. This iterative process expands and ranks gene candidates based on biological annotations, enhancing the accuracy of target identification. By combining traditional and in silico methods, GETgene-AI bridges gaps in drug discovery and facilitates the development of personalized cancer therapies. Additionally, GPT-4o is integrated into the process to improve literature review efficiency and further annotate the target list, enhancing the overall workflow.

While pancreatic cancer serves as a case study in this framework, the underlying methodology is adaptable to a wide range of cancers and diseases, provided that relevant genomic and clinical data are available. This adaptability underscores the framework's potential as a versatile and powerful tool for drug discovery across diverse disease contexts. By incorporating comprehensive in silico analyses and leveraging high-throughput genomic data, the framework can be applied to identify novel drug targets for various malignancies, thereby accelerating the discovery of therapeutic options. Furthermore, the novel drug targets identified through our case study in pancreatic cancer not only offer insights into the unique molecular mechanisms driving this aggressive cancer but also present promising avenues for therapeutic intervention.
These targets have the potential to facilitate breakthroughs in pancreatic cancer treatment, paving the way for the development of more effective, personalized therapies. Given the significant unmet clinical need in pancreatic cancer, these findings are poised to contribute to the development of targeted therapies that could improve patient outcomes and survival rates.

Methods

Our framework, "GET" (Genes, Expression, Target), provides essential biological context for gene prioritization through a comprehensive, multifaceted approach. This involves assembling and prioritizing three distinct lists: the Gene list (G list), Expression list (E list), and Target list (T list). The G list includes frequently mutated genes, functionally significant genes, and genes associated with established pathways. The E list identifies genes with the highest levels of differential expression, while the T list encompasses genes recognized as significant drug targets in pancreatic cancer, specifically pancreatic ductal adenocarcinoma (PDAC). To generate these lists, disease-specific gene data were collected from diverse genomic databases. An iterative process was employed using a network-based prioritization tool to annotate each gene within biological contexts, including differential expression, protein expression in critical organs, mutational frequency, copy number alterations, and network-based ranking scores. Following the GRIPPs method for compiling statistically relevant candidate genes and iterative network-based processing (Gong and Chen 2023), modality-specific cutoffs were applied to retain high-quality data in each list. Each list was then prioritized and expanded using the BEERE network-ranking tool, selected for its efficacy in refining gene lists and filtering low-quality data (Yue et al. 2019).
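BEERE is described in this section only at a high level: expansion over a HAPPI 2.0 nearest-neighbor network, followed by ranking with an initial algorithm, PageRank, and an ant-colony algorithm. As a rough illustration of the expand-rank-filter loop, the Python sketch below assumes an undirected PPI adjacency map, expansion to direct (1-hop) neighbors, and plain PageRank as the only ranking step. It omits the ant-colony algorithm and BEERE's significance scoring, so it is not BEERE's implementation. The defaults `top_k=500` and `rounds=3` mirror the 500-gene cutoff and three iterations reported for this study; all function names and the toy network are hypothetical.

```python
from typing import Dict, List, Set

# Undirected PPI network as an adjacency map: gene -> set of interacting genes.
Graph = Dict[str, Set[str]]


def expand(seeds: Set[str], ppi: Graph) -> Set[str]:
    """Return the seeds plus all of their direct (1-hop) PPI neighbours."""
    out = set(seeds)
    for gene in seeds:
        out |= ppi.get(gene, set())
    return out


def pagerank(nodes: Set[str], ppi: Graph, d: float = 0.85,
             n_iter: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank on the subnetwork induced by `nodes`."""
    order = sorted(nodes)  # fixed order keeps floating-point sums deterministic
    nbrs = {v: sorted(ppi.get(v, set()) & nodes) for v in order}
    n = len(order)
    score = {v: 1.0 / n for v in order}
    for _ in range(n_iter):
        # Nodes with no neighbours inside the subnetwork spread their mass uniformly.
        dangling = sum(score[v] for v in order if not nbrs[v])
        new = {v: (1.0 - d) / n + d * dangling / n for v in order}
        for v in order:
            for u in nbrs[v]:
                new[u] += d * score[v] / len(nbrs[v])
        score = new
    return score


def expand_rank_filter(seeds: Set[str], ppi: Graph, top_k: int = 500,
                       rounds: int = 3) -> List[str]:
    """Repeat expand -> PageRank -> keep top_k for a fixed number of rounds."""
    current = set(seeds)
    ranked = sorted(current)
    for _ in range(rounds):
        candidates = expand(current, ppi)
        scores = pagerank(candidates, ppi)
        # Highest score first; ties broken alphabetically for reproducibility.
        ranked = sorted(candidates, key=lambda g: (-scores[g], g))[:top_k]
        current = set(ranked)
    return ranked


# Toy network with hypothetical gene IDs (not HAPPI 2.0 data).
TOY_PPI: Graph = {
    "A": {"B", "C", "D"}, "B": {"A", "E"}, "C": {"A", "E"},
    "D": {"A"}, "E": {"B", "C", "F"}, "F": {"E"},
}

if __name__ == "__main__":
    print(expand_rank_filter({"A", "X"}, TOY_PPI, top_k=3, rounds=2))
```

Running PageRank on the expanded subnetwork rather than on the whole interactome is a design choice of this sketch; it keeps each round focused on the current candidates and their immediate neighbors.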
Further annotation categorized genes based on key factors relevant to drug target prioritization, such as differential expression, mutation frequency, and copy number alterations. Candidate genes were subsequently validated through literature review supported by a large language model (GPT-4o). Using PDAC as a case study because of its poor prognosis (J.-X. Hu et al. 2021), our method produced quantitative data and novel insights into potential drug targets.

Compiling the Gene list from Genetic Mutations

For the "GENE" component of our "GET" framework, we compiled three subsets: PAGER-NC, COSMIC-MUT, and CBP-CNA-MUT. The initial "GENE" list was derived from the PAGER (Yue et al. 2018; H. Huang et al. 2012; Yue et al. 2022), cBioPortal (de Bruijn et al. 2023), and COSMIC (Tate et al. 2019) databases. To mitigate potential sample biases and data incompleteness (e.g., studies failing to detect specific genes), multiple datasets from the same databases were used where possible. Genes associated with the term "Pancreatic Cancer" were manually retrieved from these databases, and cutoffs were established to select the genes most relevant to pancreatic cancer.

PAGER was employed to incorporate a biological pathway perspective into gene ranking, offering functional significance (Chowbina et al. 2009). From PAGER, 844 candidate genes were heuristically selected based on an nCoCo score between 5 and 100. The nCoCo score reflects the biological relatedness of gene sets within PAGER (Yue et al. 2022; 2018; H. Huang et al. 2012). The lower cutoff of 5 was chosen because literature review indicated that functional groups below this threshold were not relevant to any disease or function, while scores above 100 were associated with major roles in other significant biological processes. For the cBioPortal and COSMIC cancer databases, cutoffs were set at the point at which mutational frequency no longer showed significance to cancer in the literature. From cBioPortal, 1,000 genes were retrieved using cutoffs of 8.2% for copy number alterations (CNA) and 2.8% for mutational frequency. From COSMIC, 649 genes were compiled using a heuristic cutoff of 20% mutational frequency.
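The three subset filters above are threshold rules, sketched below in Python with hypothetical function names and toy data. Several details are assumptions rather than statements from the study: the nCoCo range is applied to PAGER gene sets and all member genes of qualifying sets are kept; a cBioPortal gene qualifies if either its CNA or its mutation frequency meets its cutoff (the text does not say whether both are required); and all cutoffs are inclusive. The reported G list size (2,493) equals 844 + 1,000 + 649, which suggests the subsets were concatenated without de-duplication; this sketch de-duplicates with a set union, so its count would be lower whenever the subsets overlap.

```python
from typing import Dict, Set


def pager_genes(gene_sets: Dict[str, Set[str]], ncoco: Dict[str, float],
                lo: float = 5.0, hi: float = 100.0) -> Set[str]:
    """Member genes of PAGER gene sets whose nCoCo score lies in [lo, hi]."""
    genes: Set[str] = set()
    for name, members in gene_sets.items():
        score = ncoco.get(name)
        if score is not None and lo <= score <= hi:
            genes |= members
    return genes


def cbioportal_genes(cna_pct: Dict[str, float], mut_pct: Dict[str, float],
                     cna_cut: float = 8.2, mut_cut: float = 2.8) -> Set[str]:
    """Genes meeting the CNA cutoff OR the mutation cutoff (OR is an assumption)."""
    candidates = set(cna_pct) | set(mut_pct)
    return {g for g in candidates
            if cna_pct.get(g, 0.0) >= cna_cut or mut_pct.get(g, 0.0) >= mut_cut}


def cosmic_genes(mut_pct: Dict[str, float], cut: float = 20.0) -> Set[str]:
    """Genes whose COSMIC mutation frequency (in %) meets the cutoff."""
    return {g for g, pct in mut_pct.items() if pct >= cut}


def build_g_list(pager: Set[str], cbio: Set[str], cosmic: Set[str]) -> Set[str]:
    """De-duplicated union of the three subsets."""
    return pager | cbio | cosmic


# Hypothetical toy inputs, not values from PAGER, cBioPortal, or COSMIC.
g_list = build_g_list(
    pager_genes({"setA": {"G1", "G2"}, "setB": {"G3"}}, {"setA": 12.0, "setB": 150.0}),
    cbioportal_genes({"G2": 9.0, "G4": 1.0}, {"G4": 3.1}),
    cosmic_genes({"G5": 25.0, "G6": 4.0}),
)
print(sorted(g_list))  # ['G1', 'G2', 'G4', 'G5']
```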
Candidate genes from cBioPortal, PAGER, and COSMIC were then combined to form the "G list," which comprised a total of 2,493 genes.

The BEERE tool employs an initial ranking algorithm and two iterative ranking algorithms, PageRank and an ant-colony algorithm, both of which have demonstrated success across diverse knowledge domains (Yue et al. 2019). BEERE expands the gene list using the nearest-neighbor network constructed from protein-protein interactions in the HAPPI 2.0 database (X. Wu et al. 2012; Jake Yue Chen, Mamidipalli, and Huan 2009; Jake Y. Chen, Pandey, and Nguyen 2017). Following the GRIPPs method (Gong and Chen 2023), the "G list" was iteratively prioritized and expanded through BEERE, resulting in the "processed G list."

Figure 1: General overview of the GET list compilation and ranking process. A: Initial gene lists from each of the three subsets are compiled: 2,493 genes in the initial G list, 2,000 genes in the initial E list, and 131 genes in the initial T list. B: Each list is prioritized using the BEERE network ranking and expansion tool, then annotated with genomic information and a GPT-4o literature review. C: A weighted score is calculated to rank the list, and genes are manually validated through literature review.

Compiling candidate genes for the "Expression" subset
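Before the GEO2R specifics, a simplified illustration of what ranking genes by differential expression means may help. GEO2R runs the limma package and reports a log2 fold change (logFC) alongside adjusted p-values; the text does not state which of these defined the top 2,000 genes. The sketch below is therefore a stand-in, not GEO2R's method: assuming log2-scale expression values, it ranks genes by the absolute difference between tumor and non-tumor group means (the log2 fold change) and keeps the top `n`. The optional `min_abs_lfc` mimics the fixed fold-change threshold whose arbitrariness is noted in the Introduction. Function names and toy values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# gene -> (tumor sample values, non-tumor sample values), assumed log2-scale.
ExprTable = Dict[str, Tuple[Sequence[float], Sequence[float]]]


def log2_fold_change(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """On log2-scale data, log2 fold change is the difference of group means."""
    return mean(tumor) - mean(normal)


def top_de_genes(expr: ExprTable, n: int = 2000,
                 min_abs_lfc: float = 0.0) -> List[Tuple[str, float]]:
    """Rank genes by |log2 FC| (largest first) and return the top n.

    `min_abs_lfc` mimics a fixed fold-change threshold; 1.0 means 2-fold.
    """
    lfc = {g: log2_fold_change(t, c) for g, (t, c) in expr.items()}
    kept = [g for g in lfc if abs(lfc[g]) >= min_abs_lfc]
    # Largest absolute change first; ties broken by gene name for reproducibility.
    kept.sort(key=lambda g: (-abs(lfc[g]), g))
    return [(g, lfc[g]) for g in kept[:n]]


# Hypothetical toy values, not GSE28735 data.
toy = {
    "G1": ([8.0, 9.0], [6.0, 6.0]),
    "G2": ([5.0, 5.0], [7.0, 7.0]),
    "G3": ([4.0, 4.0], [4.0, 4.2]),
}
print(top_de_genes(toy, n=2))  # [('G1', 2.5), ('G2', -2.0)]
```

Ranking by absolute log2 fold change treats up- and down-regulation symmetrically; a p-value-based ranking, as limma provides, would additionally account for within-group variance.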
Candidate genes were identified by a GEO2R analysis of the GEO dataset GSE28735, titled "Pancreatic ductal adenocarcinoma tumor and adjacent non-tumor tissue" (G. Zhang et al. 2012; 2013). Samples were divided into tumor and non-tumor groups using the "Define groups" feature, with "human pancreatic tumor tissue patient sample" designated as the tumor group and "human pancreatic nontumor patient sample" as the non-tumor group. The dataset included a total of 90 patient samples, evenly divided into 45 tumor samples and 45 non-tumor samples. Differentially expressed genes were identified using the "analyze" function in GEO2R. The top 2,000 genes with the highest differential expression were compiled into the initial "E list." This list was subsequently processed iteratively with BEERE, following the GRIPPs method.

Compiling candidate genes for the "Target" subset

Finally, 131 genes were identified from DrugBank, a comprehensive database containing information on various drugs. Genes were extracted by searching for "Pancreatic Cancer," "Pancreatic Ductal Adenocarcinoma," and "Neuroendocrine Pancreatic Cancer" within the drugs section of DrugBank. Drugs marked as being used to treat pancreatic cancer were identified by reviewing their Summary, Background, Indication, Associated Conditions, and Clinical Trial entries. Each drug was manually reviewed for its mechanism of action, summary, and background to determine whether it was used directly to treat pancreatic cancer or for supportive purposes, such as chemotherapy relief, pain management, or sedation.
For drugs meeting these criteria, all associated gene targets listed in the "Targets" section of DrugBank were compiled. Through this process, 131 unique genes were identified.

Iterative ranking and generation of GET lists

We refined the candidate gene lists through an iterative process employing the BEERE tool for gene prioritization and expansion. A customized pipeline was developed, building on the framework established by the GRIPPs method (Gong and Chen 2023). Using the previously compiled G, E, and T lists, we integrated the data into three distinct categories for gene prioritization. The first, the GET list, provided a comprehensive ranking by combining all three categories: mutational frequency, differential expression, and established drug targets. The second, the GT list, focused on genes that are both frequently mutated and recognized as established drug targets, offering insights into genes with high mutation frequencies and functional relevance in pancreatic cancer. Lastly, the prioritized E list was ranked independently with BEERE to assess and prioritize genes based solely on their differential expression. This systematic approach ensured a robust and multifaceted analysis of candidate genes, enhancing the identification of potential therapeutic targets.

Each list underwent a standardized processing workflow. Initially, BEERE was employed to expand each list by incorporating genes from the nearest-neighbor network of protein-protein interactions derived from the HAPPI 2.0 database. This expansion enhanced the comprehensiveness and quality of the lists. Following gene expansion, BEERE's network and significance ranking algorithms were applied to prioritize the lists, generating statistically significant rankings of candidate genes. Each list was then heuristically filtered to retain only the top 500 genes before undergoing additional rounds of refinement. This iterative process was repeated three times, because further iterations caused the lists to converge into a single pool, diminishing their differentiation; at three iterations, the lists retained distinct gene rankings and compositions. The refined lists were then integrated to create the "Initial GET List," which underwent further prioritization and expansion via BEERE to produce the "Final GET List." Additionally, two derivative lists were constructed: the "GT List," which combined the Genes and Targets lists followed by BEERE-based expansion and prioritization, and the "Expression List," which consisted of the refined E list used for further analyses.

Annotation of GET list: Assessment of candidate genes based on clinical trials

Clinical trials play a crucial role in evaluating the efficacy of targeting specific genes. To assess the clinical relevance of each gene, clinical trial frequency was used as a metric of clinical trial popularity. Genes targeted by drugs cited in pancreatic cancer treatment were manually compiled. This process involved querying the term "Pancreatic Cancer" on ClinicalTrials.gov and collecting all drugs listed for clinical pancreatic cancer treatment. The corresponding target genes for these drugs were identified by querying the "Targets" section in DrugBank. In total, 357 drugs targeting 253 genes were compiled. These genes were then annotated with BEERE scores generated through the previously described methodology.
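The compilation above amounts to a join between two mappings: trials to the drugs they test (from ClinicalTrials.gov) and drugs to their target genes (from DrugBank's "Targets" section). A minimal Python sketch, assuming both mappings have already been exported into dictionaries and that a gene's trial popularity counts each trial once even if several of its drugs hit the same gene; the function name and all identifiers in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def trial_popularity(trial_drugs: Dict[str, Set[str]],
                     drug_targets: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of distinct trials testing at least one drug that targets each gene."""
    trials_per_gene: Dict[str, Set[str]] = defaultdict(set)
    for trial, drugs in trial_drugs.items():
        for drug in drugs:
            # Drugs without recorded targets contribute nothing.
            for gene in drug_targets.get(drug, set()):
                trials_per_gene[gene].add(trial)
    return {gene: len(trials) for gene, trials in trials_per_gene.items()}


# Hypothetical trial, drug, and gene identifiers.
trials = {"TRIAL_1": {"drug_a", "drug_b"}, "TRIAL_2": {"drug_a"}, "TRIAL_3": {"drug_c"}}
targets = {"drug_a": {"GENE_1"}, "drug_b": {"GENE_1", "GENE_2"}}
print(trial_popularity(trials, targets))
```

Counting distinct trials, rather than drug-trial pairs, keeps combination-therapy trials from being counted twice for a gene hit by two of their drugs.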
Additionally, raw, quantifiable genomic data were incorporated into the analysis. These included mutation frequency data from cBioPortal (de Bruijn et al. 2023) and protein expression data from ProteinAtlas (Uhlén et al. 2015), covering key tissues such as the brain, gastrointestinal system, liver, and kidney to account for potential adverse effects of gene inhibition or activation.

GPT-4o aided literature assessment

Recent research has demonstrated that GPT-4 can perform "human-like" literature reviews, particularly in screening and analyzing scientific literature (Khraisha et al. 2024). For this study, abstracts related to pancreatic cancer genes and treatments were downloaded using PubMed's "save" feature. A total of 5,091 abstracts were collected and uploaded for analysis by GPT-4o through a custom GPT interface. Because of GPT-4o's data processing limitations, abstracts were filtered to include only meta-analyses, clinical trials, and systematic reviews on PubMed to ensure high-quality input data.

The custom GPT model was configured with specific instructions to rank genes using a scoring system with a maximum of 400 points, distributed across four categories: functional significance in pancreatic cancer, research popularity, treatment effectiveness when targeting or inhibiting the gene, and protein structure. Each category was allocated 100 points, and the resulting metric was termed the GPT-4 score. To mitigate GPT-4o's known tendency to "hallucinate," that is, to generate inaccurate or nonexistent information, the model was explicitly instructed to base its rankings solely on the uploaded research database.
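The 400-point rubric can be represented as a small data structure that checks that each of the four category scores stays within 0 to 100 before summing, and that flags cited articles missing from the uploaded abstract set so they can be re-checked manually. This is a sketch under stated assumptions: the category keys, the use of PubMed IDs (PMIDs) as citation identifiers, and the class name `GeneAssessment` are hypothetical, and the study's custom GPT configuration is not reproduced.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Four rubric categories, 100 points each (400 total); key names are hypothetical.
CATEGORIES = (
    "functional_significance",
    "research_popularity",
    "treatment_effectiveness",
    "protein_structure",
)
MAX_PER_CATEGORY = 100


@dataclass
class GeneAssessment:
    gene: str
    scores: Dict[str, float]  # category -> points (0-100)
    cited_pmids: List[str] = field(default_factory=list)

    def total(self) -> float:
        """Check that all four categories are present and in range, then sum (0-400)."""
        if set(self.scores) != set(CATEGORIES):
            raise ValueError(f"{self.gene}: expected categories {CATEGORIES}")
        for category, points in self.scores.items():
            if not 0 <= points <= MAX_PER_CATEGORY:
                raise ValueError(f"{self.gene}: {category}={points} is outside 0-100")
        return sum(self.scores.values())

    def unverified_citations(self, corpus_pmids: Set[str]) -> List[str]:
        """Cited PMIDs not found in the uploaded abstract set (flag for manual review)."""
        return [pmid for pmid in self.cited_pmids if pmid not in corpus_pmids]


# Hypothetical example: one citation is absent from the uploaded corpus.
example = GeneAssessment("GENE_X", {c: 50 for c in CATEGORIES}, ["1001", "9999"])
print(example.total(), example.unverified_citations({"1001", "1002"}))  # 200 ['9999']
```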
Additionally, the model was required to cite the articles referenced during the ranking process and to explain the score assigned to each gene in every category. GPT-4o outputs were manually verified against curated datasets to ensure biological relevance and mitigate hallucinations. Citations provided by GPT-4o were cross-referenced with PubMed to confirm their validity. All cited articles were manually verified, and any errors or hallucinations were addressed by instructing the model to re-search the uploaded literature database for accurate mentions of the gene.

Comprehensive gene ranking

Following the ranking of the GET list and the compilation of genes targeted in clinical trials, genes were further annotated with relevant biological information. Mutational frequency, a key category in gene ontology ranking (Timar and Kashofer 2020), was assessed alongside copy number alterations (CNA), another critical category in ontology (Beroukhim et al. 2010). Data for these categories were obtained from cBioPortal (de Bruijn et al. 2023) using the "Pancreatic Cancer (UTSW, Nat Commun 2015)" and "Pancreatic Adenocarcinoma (TCGA, PanCancer Atlas)" studies, both of which used whole-exome sequencing for all samples. Tissue-specific expression was also considered a vital factor in gene prioritization (Beroukhim et al. 2010). Genes with high expression in essential tissues, such as the heart, liver, gastrointestinal system, brain, and kidneys, pose a higher risk of adverse effects when targeted and were therefore deprioritized.
Annotation of tissue expression was performed using the "RNA expression score" provided by ProteinAtlas (Uhlén et al. 2015), a comprehensive database mapping protein expression across organs. This RNA expression score, calculated manually, measures the RNA expression levels of genes across different tissues.

A weighted score, termed the RP score, was developed to integrate multiple factors into the prioritization process. Spearman correlations were calculated between CNA, mutational frequencies, GET list scores, tissue expression levels, E list scores, GT list scores, and clinical trial popularity. Clinical trial popularity was defined as the number of trials testing drugs that target a given gene for cancer treatment. The RP score provided a comprehensive and robust metric for ranking genes as potential therapeutic targets.

Table 1 lists the weight assigned to each ranking modality, ordered from greatest to least.

Modality of ranking                                       Weight
GT list score                                              0.329
CNA (cBioPortal, UTSW Nat Commun 2015)                     0.201
Expression list score                                      0.088
GET list score                                             0.085
Mutation frequency (cBioPortal, TCGA PanCancer Atlas)      0.079
CNA (cBioPortal, TCGA PanCancer Atlas)                     0.048
Mutation frequency (cBioPortal, UTSW Nat Commun 2015)     -0.023
Brain expression score                                    -0.054
Kidney expression score                                   -0.081
Gastrointestinal expression score                         -0.095
Liver expression score                                    -0.101

Table 1: Weights each modality was assigned for calculation of the RP score in GETgene-AI.

Mitigation of Bias and False Positives

To address potential sample biases and data incompleteness, such as studies failing to detect specific genes, multiple datasets from the same databases were used wherever possible. This redundancy ensured a more comprehensive analysis and minimized the impact of dataset-specific variability. For example, multiple cBioPortal studies, such as "Pancreatic Cancer (UTSW, Nat Commun 2015)" and "Pancreatic Adenocarcinoma (TCGA, PanCancer Atlas)," were analyzed concurrently to increase the reliability of mutational frequency and CNA data.

To further enhance the accuracy of the prioritization process, each gene within the top 250 ranked by RP score was manually verified through a literature review to confirm its role in cancer biology. This step was critical in identifying and eliminating false positives. Notably, no genes within the top 250 were found to be false positives, supporting the robustness of the RP scoring methodology.

Additionally, hallucination errors from GPT-4o were mitigated through a structured prompting approach. The model was instructed to explicitly cite the sources used in calculating each gene's ranking score. These citations were manually evaluated for accuracy and relevance, ensuring that the ranking process was grounded in verifiable scientific evidence. This dual-layered validation, automated scoring combined with manual review, was integral to maintaining the integrity and reliability of the gene prioritization framework.

Statistical Methods
Spearman correlation coefficients were computed to assess the alignment of GPT-4o rankings with network-derived rankings. The Spearman correlation between the GPT-4 score and the weighted score was 0.291, indicating a weak positive correlation. The GPT-4 score correlated more strongly with all BEERE list ranking scores: 0.478 with the Expression list score, 0.457 with the combined weighted score of all BEERE lists, 0.454 with the GET list score, and 0.444 with the GT list score. These results indicate that the GPT-4 score more closely resembles standard network prioritization techniques, which may reflect the training data used.

Comparing research relevance to rank on GETgene-AI

To compare research popularity with each gene's rank under both the GPT-4 score and the RP score, the number of PubMed results returned for the query "[gene name] Pancreatic Cancer" was compiled for each gene and used to compute the GPT-LIT and RP-LIT scores. The GPT-LIT score is the GPT-4 score divided by the number of PubMed publications, and the RP-LIT score is the RP score divided by the number of PubMed publications. Genes with no functional relationship to cancer were excluded from the rankings to remove false positives.
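Three quantities from this part of the Methods are simple to compute once the annotations are tabulated: the RP score as a weighted sum using the Table 1 weights, the LIT scores (a score divided by the gene's PubMed publication count), and Spearman's rank correlation (the Pearson correlation of average ranks). The Python sketch below assumes each modality has already been scaled to a comparable range before weighting (the scaling is not described) and that a missing annotation contributes zero; the feature keys are hypothetical labels for the Table 1 rows, and clinical trial popularity, which has no weight listed in Table 1, is omitted.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence

# Table 1 weights; the feature keys are hypothetical labels for its rows.
RP_WEIGHTS: Dict[str, float] = {
    "gt_list_score": 0.329,
    "cna_utsw_2015": 0.201,
    "expression_list_score": 0.088,
    "get_list_score": 0.085,
    "mutation_tcga_pancan": 0.079,
    "cna_tcga_pancan": 0.048,
    "mutation_utsw_2015": -0.023,
    "brain_expression": -0.054,
    "kidney_expression": -0.081,
    "gastrointestinal_expression": -0.095,
    "liver_expression": -0.101,
}


def rp_score(features: Dict[str, float]) -> float:
    """Weighted sum of (assumed pre-scaled) features; a missing feature counts as 0."""
    return sum(w * features.get(name, 0.0) for name, w in RP_WEIGHTS.items())


def lit_score(score: float, n_publications: int) -> float:
    """GPT-LIT or RP-LIT: a score divided by the gene's PubMed hit count."""
    if n_publications <= 0:
        raise ValueError("LIT score is undefined for genes with no publications")
    return score / n_publications


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical numbers for illustration only.
print(round(spearman([1, 2, 3, 4], [10, 30, 20, 40]), 3))  # 0.8
```

For real data, `scipy.stats.spearmanr` computes the same tie-corrected statistic and also reports a p-value, which the plain-Python version above does not.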

Results

Enhancement provided by AI
GPT-4o was employed to conduct a literature assessment of our gene list, although its output was not included in the final weighted score. Nonetheless, the GPT-4 score correlated positively with both the weighted score and all three GET list scores. GPT-4o prioritized genes such as MYC and SRC, reflecting their prominence in the literature, which complemented GETgene-AI’s reliance on network and mutational analysis. To minimize false positives in the GPT-4o scoring process, we instructed GPT-4o to cite articles directly from its database, and the cited articles were subsequently verified manually. Although GPT-4o did not achieve a higher rate of experimental validation, it reduced the time required for the literature review by 80%. The RP-LIT score and the GPT score are highly correlated, with very similar rankings for each gene. By Spearman correlation, the GPT-4o score (out of 400) has a correlation of +0.457 with the combined weighted score of all BEERE lists.
Table 2 compares each gene’s GPT-4 score ranking with its GET ranking.
Gene | GPT-4 score ranking | GET ranking | Experimentally validated? | Citation
MYC | 1 | 2 | Yes | (Huan Zhang et al. 2024)
SRC | 2 | 3 | Yes | (Su et al. 2023)
EGFR | 3 | 4 | Yes | (F. Wu et al. 2023)
TERT | 4 | 27 | Yes | (Campa et al. 2015)
RRM2 | 5 | 21 | Yes | (Li et al. 2022)
PIK3CA | 6 | 1 | Yes | (Payne et al. 2015)
TOP2A | 7 | 16 | Yes | (Pei, Yin, and Liu 2018)
NTRK1 | 8 | 22 | Yes | (Cheng et al. 2013)
PTGS2 | 9 | 25 | Yes | (Hingorani et al. 2003)
EGF | 10 | 30 | Yes | (Sheng et al. 2020)
CDK1 | 11 | 5 | Yes | (J. Huang et al. 2021)
MAPK1 | 12 | 10 | Yes | (Si et al. 2023)
KRAS | 13 | 13 | Yes | (Timar and Kashofer 2020)
MTOR | 14 | 11 | Yes | (Stanciu et al. 2022)
MSLN | 15 | 37 | Yes | (J. Hu et al. 2024)
RET | 16 | 28 | Yes | (Bhamidipati et al. 2023)
AKT1 | 17 | 31 | Yes | (Arasanz et al. 2019)
JAK2 | 18 | 9 | Yes | (B. Huang, Lang, and Li 2022)
MET | 19 | 34 | Yes | (Pothula et al. 2020)
PDCD1 | 20 | 38 | Yes | (Marabelle et al. 2020)
Table 2: The 20 highest-ranked genes by GPT-4 score, compared with their GET rankings and their status as experimentally validated drug targets.
Comparing GETgene-AI to other frameworks
We benchmarked GETgene-AI against two other frameworks. One is based on differential expression and the other on network-based prioritization. GEO2R was chosen for the differential expression comparison, using the GSE28735 dataset that is also incorporated into the Expression list component of our GET lists. Genes were sorted by log fold change (logFC), which represents the difference in gene expression between the tumor and non-tumor groups. In the GEO2R list, the top-ranked genes were PNLIPRP1 and PNLIPRP2. Both are pancreatic lipase-related proteins critical for digestion and fat absorption (G. Zhu et al. 2021), but neither is considered a viable target for pancreatic cancer. The third-ranked gene, IAPP (islet amyloid polypeptide), has been shown not to function as a tumor suppressor, and loss of IAPP signaling is not linked to pancreatic cancer (Taylor et al. 2023). Of the top 50 genes in GEO2R, 30 (60%) were experimentally validated as relevant to pancreatic cancer. In comparison, GETgene-AI identified 49 experimentally validated targets in its top 50 (98%), an improvement of 38 percentage points over GEO2R. GEO2R does not analyze mutational frequency, functional impact, network context, or adverse effects, so it falls short for drug target discovery. In contrast, GETgene-AI benefits from statistical filtering and the incorporation of genomic information, improving both the efficiency of the search and the quality of the genes in each list.
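The GEO2R baseline and the top-50 comparison can be sketched as follows. This is a simplified, hypothetical illustration with three assumptions:
- It starts from a precomputed table of logFC and adjusted p-values. GEO2R itself computes these with limma.
- It orders genes by absolute logFC by default, because the text does not say whether the sign was used.
- It reports the gap between validated fractions both in percentage points and as a relative change, because the two readings of “improvement” differ substantially.

```python
from typing import List, NamedTuple, Sequence, Set, Tuple


class DEResult(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log_fc: float  # log2 fold change, tumor vs. non-tumor
    adj_p: float   # multiple-testing-adjusted p-value


def rank_by_log_fc(rows: Sequence[DEResult], alpha: float = 0.05,
                   by_magnitude: bool = True) -> List[str]:
    """Keep genes with adj_p < alpha and order them by logFC, largest first."""
    significant = [r for r in rows if r.adj_p < alpha]
    key = (lambda r: abs(r.log_fc)) if by_magnitude else (lambda r: r.log_fc)
    return [r.gene for r in sorted(significant, key=key, reverse=True)]


def validated_fraction(ranked: Sequence[str], validated: Set[str], k: int = 50) -> float:
    """Fraction of the top-k ranked genes found in a curated set of validated targets."""
    top = list(ranked[:k])
    if not top:
        raise ValueError("ranking is empty")
    return sum(gene in validated for gene in top) / len(top)


def improvement(new: float, baseline: float) -> Tuple[float, float]:
    """Return (difference in percentage points, relative change in percent)."""
    return (new - baseline) * 100, (new - baseline) / baseline * 100


# Counts reported in the text: 49/50 (GETgene-AI), 30/50 (GEO2R), 46/50 (STRING).
pp_geo2r, rel_geo2r = improvement(49 / 50, 30 / 50)    # about 38 points, about 63% relative
pp_string, rel_string = improvement(49 / 50, 46 / 50)  # about 6 points, about 6.5% relative
```

The reported 38% and 6% figures correspond to percentage-point differences (98% vs. 60% and 98% vs. 92%). Expressed as relative gains, the same counts would be roughly 63% and 6.5%.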
Figure 2: Volcano plot of GSE28735 (microarray gene-expression profiles of 45 matched pairs of tumor and non-tumor tissue), Padj < 0.05. Blue indicates downregulated genes and red indicates upregulated genes.
For the network-based framework, we used STRING, a database that integrates protein-protein interactions (Szklarczyk et al. 2023), and focused on the KEGG pathway hsa0512 (M. Kanehisa and Goto 2000; Minoru Kanehisa et al. 2025; Minoru Kanehisa 2019). To rank the genes, we exported the list and sorted the genes by node degree, which measures the number of interactions a protein has within the network (Bozhilova et al. 2019). The highest-ranked gene in the STRING list was AKT1, a protein kinase that stimulates cell growth and proliferation (Grassilli et al. 2020). However, pancreatic cancer cells have been found to adapt to AKT1 inhibition by shifting their metabolism from glycolysis toward mitochondrial respiration (Arasanz et al. 2019). AKT1 also has a low mutational frequency of just 1% in a sample of 19,784 patients with various tumors (Millis et al. 2016). Because of this low mutational frequency and the challenges of inhibiting AKT1, it was ranked 33rd in GETgene-AI. A literature review of the top 50 genes in the STRING list found that 46 (92%) were experimentally validated for pancreatic cancer. GETgene-AI identified 49 (98%) experimentally validated genes in its top 50, an improvement of 6 percentage points over STRING.
STRING’s limitations result in a narrower focus. These include its failure to account for mutational frequency and other factors important for drug target identification. As a result, STRING identified only 81 targets, compared with GETgene-AI’s more comprehensive analysis.
Figure 3: Network constructed by STRING using the KEGG pathway HG0512. The content inside each node is the known or predicted 3D structure of the protein. Turquoise edges indicate protein-protein interactions from curated databases, and purple edges indicate experimentally determined interactions. Green, red, and dark blue edges indicate predicted protein-protein interactions. Light green edges represent text mining, black edges represent co-expression, and light purple edges represent protein homology.
Table 3 shows the ranking overlap among the top 15 genes from all three frameworks. The 15 highest-ranked targets in both GETgene-AI and STRING have all been experimentally validated in pancreatic cancer, whereas 8 of the 15 highest-ranked targets from the GEO2R approach have not.
GETgene-AI top genes | Experimentally validated? | STRING top genes | Experimentally validated? | GEO2R top genes | Experimentally validated?
PIK3CA | Yes | AKT1 | Yes | PNLIPRP1 | No
MYC | Yes | TP53 | Yes | PNLIPRP2 | No
SRC | Yes | KRAS | Yes | IAPP | No
EGFR | Yes | PTEN | Yes | CTRC | No
CDK1 | Yes | SRC | Yes | GP2 | Yes
PRKCA | Yes | STAT3 | Yes | CEL | No
TNF | Yes | EGFR | Yes | CPA2 | Yes
LCK | Yes | MTOR | Yes | ALB | Yes
JAK2 | Yes | BCL2 | Yes | CUZD1 | Yes
MAPK1 | Yes | PIK3CA | Yes | ERP27 | No
MTOR | Yes | CDKN2A | Yes | CLPS | Yes
AURKB | Yes | HRAS | Yes | SERPINI2 | Yes
KRAS | Yes | CCND1 | Yes | PLA2G1B | Yes
MAPK8 | Yes | NFKB1 | Yes | CELA2A | No
TOP2A | Yes | CDKN1A | Yes | CELA2B | No
Table 3: Top 15 genes from GETgene-AI, STRING, and GEO2R and their status as experimentally validated drug targets.
Figure 4: Bar graph of the percentage of experimentally validated targets among the top 50 genes from each framework (GEO2R, STRING, and GETgene-AI). The y-axis shows the percentage of experimentally validated targets out of the top 50.
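The STRING baseline ranks proteins by node degree. A minimal sketch of that ranking follows. It assumes the exported network is available as an undirected list of (protein, protein) pairs. The function name and the toy edges are hypothetical and do not represent real STRING interactions. Duplicate and self-loop edges are ignored, and ties are broken alphabetically, a choice the text does not specify.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_by_degree(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Return (protein, degree) pairs sorted by degree (highest first), ties alphabetical."""
    seen: Set[Tuple[str, str]] = set()
    degree: Counter = Counter()
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        key = (a, b) if a < b else (b, a)
        if key in seen:
            continue  # count each undirected interaction only once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Toy network with placeholder protein names.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P1"), ("P3", "P4"), ("P1", "P1")]
print(rank_by_degree(toy_edges))  # [('P1', 2), ('P3', 2), ('P2', 1), ('P4', 1)]
```

Degree depends only on how many neighbors a protein has. It ignores mutational frequency and expression, which is the limitation of the STRING baseline discussed above.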
GETgene-AI Rankings
Gene | Weighted score | CHAT GPT score | GT list score | Mutation frequency (cBioPortal TCGA PanCancer Atlas) | RP-LIT score | GPT-LIT score | GET list score | Expression list score
PIK3CA | 34.8 | 310 | 58.7 | 2.8 | 0.199 | 1.771 | 96 | 97
MYC | 30.1 | 330 | 9.5 | 0.0 | 0.032 | 0.349 | 214 | 210
SRC | 20.0 | 320 | 0.0 | 1.1 | 0.044 | 0.711 | 143 | 144
EGFR | 18.2 | 320 | 2.4 | 0.6 | 0.010 | 0.171 | 134 | 133
CDK1 | 15.9 | 305 | 15.3 | 65.4 | 0.134 | 2.563 | 30 | 7
PRKCA | 15.3 | 305 | 3.0 | 0.0 | 1.702 | 25.556 | 101 | 102
TNF | 12.1 | 270 | 2.4 | 0.0 | 0.013 | 0.292 | 83 | 86
LCK | 11.5 | 220 | 1.7 | 0.0 | 1.274 | 24.444 | 62 | 60
JAK2 | 10.6 | 285 | 1.0 | 0.6 | 0.082 | 2.192 | 67 | 67
MAPK1 | 10.3 | 305 | 11.6 | 3.4 | 0.139 | 4.122 | 7 | 7
AURKB | 9.1 | 295 | 0.0 | 0.6 | 0.008 | 0.246 | 70 | 70
KRAS | 8.7 | 220 | 1.7 | 1.7 | 0.335 | 8.462 | 48 | 47
MAPK8 | 7.8 | 295 | 0.0 | 0.0 | 0.002 | 0.068 | 121 | 117
MTOR | 7.1 | 220 | 1.7 | 0.0 | 0.588 | 18.333 | 52 | 52
ITGA4 | 6.9 | 220 | 4.3 | 0.6 | 2.298 | 73.333 | 40 | 37
TOP2A | 6.9 | 310 | 10.2 | 1.1 | 0.215 | 9.688 | 0 | 0
CHEK1 | 6.7 | 220 | 1.7 | 0.0 | 0.128 | 4.231 | 46 | 45
BCL2 | 6.2 | 220 | 1.7 | 0.6 | 0.012 | 0.418 | 41 | 41
PRKCB | 6.0 | 250 | 1.4 | 0.6 | 1.004 | 41.667 | 60 | 58
ERBB4 | 5.5 | 220 | 3.4 | 0.6 | 0.184 | 7.333 | 81 | 83
Table 4: The 20 highest-ranked genes in GETgene-AI. “Weighted score” is the RP score; “CHAT GPT score” is the GPT-4 score.
Validation of gene rankings
During the iterative ranking process, genes with no functional relevance to cancer were systematically excluded. For example, genes that ranked highly because of algorithmic artifacts but lacked experimental validation or literature support were deprioritized.
PIK3CA ranks as the highest gene on the list.
It encodes p110α, the catalytic subunit of the enzyme PI3K, which plays a crucial role in cell growth, metabolism, proliferation, and apoptosis (Conway et al. 2019). In addition, PIK3CA regulates downstream effectors such as AKT and mTOR (Ala 2022). Mutations in PIK3CA have been shown in preclinical studies to make cancers highly sensitive to dual therapy with PI3K/mTOR inhibitors (Huayu Zhang et al. 2021), further emphasizing its potential as a key drug target. Moreover, PIK3CA-null tumors have demonstrated increased susceptibility to T-cell surveillance in vitro (Sivaram et al. 2019). In pancreatic cancer, mutant PIK3CA has been shown to initiate tumorigenesis in experimental studies (Payne et al. 2015), underscoring its importance in both cancer progression and therapeutic targeting.
The next highest-ranked gene on the list, MYC, is an established target in pancreatic cancer. MYC ranks highly because of its top GET list score, which reflects its substantial network centrality among the top 500 most highly expressed, clinically relevant, and most frequently mutated genes. Overexpression of c-MYC is a key marker of aggressive pancreatic cancer, in which c-MYC binds to the promoters of various genes (Hayashi, Hong, and Iacobuzio-Donahue 2021). Although MYC plays a crucial regulatory role in pancreatic cancer, its protein structure makes it difficult to target. However, recent small-molecule drugs have shown high efficacy in preclinical studies (Ala 2022). Despite these promising developments, MYC’s inherent targeting difficulties result in a relatively low GT list score.
SRC ranks third on our list, owing to its high GET list and Expression list scores. Inhibition of SRC in pancreatic cancer has been shown in both in vitro and in vivo studies to reverse chemoresistance to pyroptosis (Su et al. 2023).
Additionally, aberrant SRC activity promotes tumorigenesis and is frequently associated with poor prognosis in pancreatic ductal adenocarcinoma (PDAC) (Poh and Ernst 2023). Several SRC-targeting cancer drugs have been developed and have entered clinical investigation (Hilbig 2008).
EGFR is the fourth-highest-ranked gene, owing to its high GET list and Expression list scores. EGFR also plays a role in tumorigenesis, most notably in lung and breast cancer (Sigismund, Avanzato, and Lanzetti 2018). Anti-EGFR agents have shown significant clinical promise despite adverse effects (Verma et al. 2020).
KRAS ranks twelfth on our list despite its prominence in research, with over 4,545 articles on KRAS mutations in pancreatic cancer available on PubMed. Its lower ranking is attributed to a low expression score. The KRAS oncogene plays a critical role in the initiation and maintenance of pancreatic tumors (Luo 2021). KRAS mutations are present in over 90% of PDAC cases, but therapeutic inhibition remains highly challenging, and inhibitors have only recently been discovered (Bannoura et al. 2021).
CDK1 ranks fifth on our list, largely because of its high GET and Expression list scores. CDK1 expression is strongly correlated with prognosis. CDK1 is highly expressed in pancreatic cancer tissue and in response to gemcitabine, an approved pancreatic cancer drug (Xu et al. 2023).
Additionally, inhibition of CDK1, together with CDK2 and CDK5, has been shown to overcome IFNG-mediated adaptive immune resistance in pancreatic cancer (J. Huang et al. 2021).
PRKCA ranks sixth on our list. It encodes protein kinase C alpha and is mutated in various cancers. PRKCA’s high ranking is attributed to its strong GET and Expression list scores and its extremely low organ expression score. It is strongly associated with activation of the protein translation initiation pathway (Rosenberg et al. 2018) and is a hallmark mutation in chordoid gliomas (Jiang et al. 2019). PRKCA also contributes to pancreatic cancer susceptibility through the peroxisome proliferator-activated receptor (PPAR) signaling pathway, which plays a key role in the development and progression of pancreatic cancer (Liu et al. 2020). Inhibition of PRKCA has shown antitumor activity in patients with advanced non-small cell lung cancer (NSCLC) (Villalona-Calero et al. 2004).
TNF is the seventh-highest-ranked gene on our list. Upregulation of TNF (tumor necrosis factor) is associated with invasion and immunomodulation in pancreatic cancer (Wiedmann et al. 2023). TNF-producing macrophages have also been shown to push cancer cells toward more aggressive behavior through lineage reprogramming (Tu et al. 2021).
LCK is the eighth-highest-ranked gene on our list. LCK has been found to be expressed in tumor cells and is a key gene in T-cell development (Bommhardt, Schraven, and Simeoni 2019). High LCK protein expression has been associated with improved patient survival in cutaneous melanoma (Cancer Genome Atlas Network 2015). As of May 2024, PubMed contained only four publications on LCK in relation to pancreatic cancer. The identification of LCK as a high-priority target demonstrates GETgene-AI’s capability to identify genes with strong biological relevance but low literature prominence.
ITGA4 is ranked 15th on our list and has an extremely low organ expression score. Only four articles on PubMed discuss its role in pancreatic cancer. ITGA4 has potential as an independent prognostic indicator of patient survival and has been linked to the PI3K/AKT pathway (Faleiro et al. 2021). Like LCK, ITGA4 illustrates GETgene-AI’s capability to identify genes with strong biological relevance but low literature prominence.
KCNA is the 34th-highest-ranked gene on our list. KCNA is notable because PubMed contains no publications describing its relation to pancreatic cancer, and only three PubMed publications have ever mentioned its relation to cancer in general. Its identification as a high-priority target is a further example of the framework surfacing biologically relevant genes that the literature has largely overlooked. Differentially high KCNA expression is observed in stomach and lung cancers and correlates positively with immune cell infiltration and survival rate (Angi et al. 2023).
False Positives and Limitations
False positives are an inherent risk in large-scale computational analyses. By incorporating iterative refinement and excluding genes without functional or experimental support, the GETgene-AI framework minimizes such risks. Future validation efforts will focus on further refining these rankings through experimental studies.
The literature assessment provided by generative AI is also expected to improve as AI progresses and as our model is trained on more experimental data to minimize hallucinations.
To mitigate false positives, genes with no functional relevance were systematically excluded. Genes that ranked highly because of algorithmic artifacts but lacked experimental validation or literature support were deprioritized. ITGA4 and PRKCB each have fewer than 10 PubMed articles on their role in pancreatic cancer. Both were ranked below many popular targets because of their low GET, GT, and Expression list scores. These low scores resulted from deprioritizing targets without experimental validation or literature support in each list before RP score calculation.
This study is limited by the lack of direct experimental validation of the top-ranked targets. Additionally, reliance on publicly available datasets may introduce biases due to incomplete annotations.
Broader Implications and Generalizability
The current study focuses on pancreatic cancer; however, the GETgene-AI framework can be readily adapted to other cancers or diseases with similar genomic and clinical data resources. Future studies will explore its application to breast and lung cancers using the same process. By integrating literature review, large-scale sequencing data, and network centrality scores, the framework provides a comprehensive method for drug target prioritization. GETgene-AI is also easily scalable because it relies on computational prioritization and the elimination of statistically insignificant data.

Discussion

Through our application of GETgene-AI to pancreatic cancer, we have highlighted the need to investigate the potential drug targets PIK3CA, PRKCA, LCK, MAPK8, ITGA4, PRKCB, and KCNA1. GETgene-AI’s approach to drug target prioritization integrates literature review, large-scale sequencing data, network-based centrality scoring, and assessment of potential adverse effects through organ expression scores. This provides a scalable and comprehensive approach to drug target prioritization that can be replicated in other cancers with similar data available. GETgene-AI’s ability to deprioritize genes with low mutational relevance demonstrates its superiority in efficiently narrowing down actionable targets.
Contributions and Limitations of GPT-4o
GPT-4o significantly enhanced the efficiency of literature-based ranking by automating abstract reviews and prioritization, increasing the efficiency of literature review by over 80%. However, inherent challenges such as hallucination required manual verification to ensure accuracy. While GPT-4o adds value, its integration should be approached cautiously, with safeguards to mitigate errors. Training GPT-4o on more experimental data may also increase its accuracy when performing prioritization in the future.
Future Directions
The current study does not explore non-cancer applications or other disease contexts. Future studies will validate the framework in additional cancers, such as breast and lung cancer, and assess its applicability to neurodegenerative diseases such as Alzheimer’s and Parkinson’s. Nonetheless, by integrating computational methods with large-scale genomic data, the GETgene-AI framework bridges critical gaps in drug discovery, enabling faster identification of actionable targets and advancing personalized medicine.
Future work will focus on validating the top-ranked targets, such as PIK3CA and PRKCA, using CRISPR-mediated knockouts in pancreatic cancer cell lines. Drug response assays will be performed to assess the therapeutic relevance of these targets.

Conclusions

The GET framework represents a significant advance in computational drug discovery, integrating network-based prioritization with machine learning to identify actionable therapeutic targets efficiently. Genes highlighted in our pancreatic cancer case study, such as PRKCA, LCK, ITGA4, and PRKCB, are novel targets that warrant further exploration. While this study focuses on pancreatic cancer, the GETgene-AI framework is adaptable to other cancers and diseases, offering a modular and versatile approach to target discovery. GPT-4o enhanced the efficiency and accuracy of literature-based ranking, reducing manual workload and aligning well with network-based rankings. However, its reliance on manual verification underscores the need for cautious integration into automated pipelines. By refining target discovery methods, the GETgene-AI framework paves the way for personalized therapeutic strategies and accelerates translational research in oncology. Future work will focus on three areas: expanding the framework to other cancers, improving ranking metrics, and integrating multi-omics datasets. The planned datasets, such as single-cell RNA-seq and metabolomics, are intended to capture greater biological complexity and enhance predictive power.
Data availability
All files containing the genes and drugs used in the GETgene-AI process have been uploaded to our GitHub repository https://github.com/alphamind-club/GETGENE-AI. The files “GEO2R assessment”, “GETGENE-AIvsGET”, “GETvsGPT4vsSTRINGvsGEO2R”, and “STRING Assessment” contain the experimental validation assessments performed to compare the top 50 genes from each approach. The other files are the initial gene lists used to form each of the GET lists before cutoffs were applied. ChatGPT’s GPT-4o model, which requires a monthly subscription, was used for the GPT-4o literature assessment score.
The custom GPT-4o model made for this study is “Research GPT.” ProteinAtlas was used to obtain the expression levels.
Acknowledgment
Both authors thank AlphaMind Club for the administrative support that made this mentored research possible. JYC gratefully acknowledges the startup fund to the Systems Pharmacology AI Research Center at UAB and NIH Common Fund grant award U54-OD036472, which partially supported this research. The authors acknowledge the use of ChatGPT in improving the structure and readability of the manuscript.

References

Ala, Moein. 2022. “Target C-Myc to Treat Pancreatic Cancer.” Cancer Biology & Therapy 23 (1): 34–50. https://doi.org/10.1080/15384047.2021.2017223.
Angi, Beatrice, Silvia Muccioli, Ildikò Szabò, and Luigi Leanza. 2023. “A Meta-Analysis Study to Infer Voltage-Gated K+ Channels Prognostic Value in Different Cancer Types.” Antioxidants (Basel, Switzerland) 12 (3): 573. https://doi.org/10.3390/antiox12030573.
Arasanz, H., M. Zuazo, E. Santamaría, A. I. Bocanegra, M. Gato-Cañas, G. Fernández-Hinojal, C. Hernández-Saez, et al. 2019. “Adaption of Pancreatic Cancer Cells to AKT1 Inhibition Induces the Acquisition of Cancer Stem-Cell like Phenotype through Upregulation of Mitochondrial Functions.” Annals of Oncology 30 (October):v11. https://doi.org/10.1093/annonc/mdz238.036.
Bannoura, Sahar F., Md. Hafiz Uddin, Misako Nagasaka, Farzeen Fazili, Mohammed Najeeb Al-Hallak, Philip A. Philip, Bassel El-Rayes, and Asfar S. Azmi. 2021. “Targeting KRAS in Pancreatic Cancer: New Drugs on the Horizon.” Cancer and Metastasis Reviews 40 (3): 819–35. https://doi.org/10.1007/s10555-021-09990-2.
Beroukhim, Rameen, Craig H. Mermel, Dale Porter, Guo Wei, Soumya Raychaudhuri, Jerry Donovan, Jordi Barretina, et al. 2010. “The Landscape of Somatic Copy-Number Alteration across Human Cancers.” Nature 463 (7283): 899–905. https://doi.org/10.1038/nature08822.
Bhamidipati, Deepak, Sireesha Yedururi, Jason Huse, Sri Veda Chinapuvvula, Jie Wu, and Vivek Subbiah. 2023. “Exceptional Responses to Selpercatinib in RET Fusion-Driven Metastatic Pancreatic Cancer.” JCO Precision Oncology 7 (September):e2300252. https://doi.org/10.1200/PO.23.00252.
Bozhilova, Lyuba V., Alan V. Whitmore, Jonny Wray, Gesine Reinert, and Charlotte M. Deane. 2019. “Measuring Rank Robustness in Scored Protein Interaction Networks.” BMC Bioinformatics 20 (1): 446. https://doi.org/10.1186/s12859-019-3036-6.
Bruijn, Ino de, Ritika Kundra, Brooke Mastrogiacomo, Thinh Ngoc Tran, Luke Sikina, Tali Mazor, Xiang Li, et al. 2023. “Analysis and Visualization of Longitudinal Genomic and Clinical Data from the AACR Project GENIE Biopharma Collaborative in cBioPortal.” Cancer Research 83 (23): 3861–67. https://doi.org/10.1158/0008-5472.CAN-23-0816.
Campa, Daniele, Cosmeri Rizzato, Rachael Stolzenberg-Solomon, Paola Pacetti, Pavel Vodicka, Sean P. Cleary, Gabriele Capurso, et al. 2015. “TERT Gene Harbors Multiple Variants Associated with Pancreatic Cancer Susceptibility.” International Journal of Cancer 137 (9): 2175–83. https://doi.org/10.1002/ijc.29590.
Cancer Genome Atlas Network. 2015. “Genomic Classification of Cutaneous Melanoma.” Cell 161 (7): 1681–96. https://doi.org/10.1016/j.cell.2015.05.044.
Chen, Jake Y., Ragini Pandey, and Thanh M. Nguyen. 2017. “HAPPI-2: A Comprehensive and High-Quality Map of Human Annotated and Predicted Protein Interactions.” BMC Genomics 18 (1): 182. https://doi.org/10.1186/s12864-017-3512-1.
Chen, Jake Yue, SudhaRani Mamidipalli, and Tianxiao Huan. 2009. “HAPPI: An Online Database of Comprehensive Human Annotated and Predicted Protein Interactions.” BMC Genomics 10 (1): S16. https://doi.org/10.1186/1471-2164-10-S1-S16.
Cheng, Yao, Dong-mei Diao, Hao Zhang, Yong-Chun Song, and Cheng-Xue Dang. 2013. “Proliferation Enhanced by NGF-NTRK1 Signaling Makes Pancreatic Cancer Cells More Sensitive to 2DG-Induced Apoptosis.” International Journal of Medical Sciences 10 (5): 634–40. https://doi.org/10.7150/ijms.5547.
Chowbina, Sudhir R., Xiaogang Wu, Fan Zhang, Peter M. Li, Ragini Pandey, Harini N. Kasamsetty, and Jake Y. Chen. 2009. “HPD: An Online Integrated Human Pathway Database Enabling Systems Biology Studies.” BMC Bioinformatics 10 (11): S5. https://doi.org/10.1186/1471-2105-10-S11-S5.
Conway, James Rw, David Herrmann, Tr Jeffry Evans, Jennifer P. Morton, and Paul Timpson. 2019. “Combating Pancreatic Cancer with PI3K Pathway Inhibitors in the Era of Personalised Medicine.” Gut 68 (4): 742–58. https://doi.org/10.1136/gutjnl-2018-316822.
Faleiro, Inês, Vânia Palma Roberto, Secil Demirkol Canli, Nicolas A. Fraunhoffer, Juan Iovanna, Ali Osmay Gure, Wolfgang Link, and Pedro Castelo-Branco. 2021. “DNA Methylation of PI3K/AKT Pathway-Related Genes Predicts Outcome in Patients with Pancreatic Cancer: A Comprehensive Bioinformatics-Based Study.” Cancers 13 (24): 6354. https://doi.org/10.3390/cancers13246354.
Gong, Eric, and Jake Y. Chen. 2023. “Prioritizing Complex Disease Genes from Heterogeneous Public Databases.” Bioinformatics. https://doi.org/10.1101/2023.02.09.527562.
Grassilli, Silvia, Federica Brugnoli, Rossano Lattanzio, Simonetta Buglioni, and Valeria Bertagnolo. 2020. “Vav1 Down-Modulates Akt2 Expression in Cells from Pancreatic Ductal Adenocarcinoma: Nuclear Vav1 as a Potential Regulator of Akt Related Malignancy in Pancreatic Cancer.” Biomedicines 8 (10): 379. https://doi.org/10.3390/biomedicines8100379.
Gu, Long, Robert J. Hickey, and Linda H. Malkas. 2023. “Therapeutic Targeting of DNA Replication Stress in Cancer.” Genes 14 (7): 1346. https://doi.org/10.3390/genes14071346.
Hayashi, Akimasa, Jungeui Hong, and Christine A. Iacobuzio-Donahue. 2021. “The Pancreatic Cancer Genome Revisited.” Nature Reviews Gastroenterology & Hepatology 18 (7): 469–81. https://doi.org/10.1038/s41575-021-00463-z.
Hilbig, Andreas. 2008. “Src Kinase and Pancreatic Cancer.” In Pancreatic Cancer, 177:179–85. Recent Results in Cancer Research. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-71279-4_19.
Hingorani, Sunil R., Emanuel F. Petricoin, Anirban Maitra, Vinodh Rajapakse, Catrina King, Michael A. Jacobetz, Sally Ross, et al. 2003. “Preinvasive and Invasive Ductal Pancreatic Cancer and Its Early Detection in the Mouse.” Cancer Cell 4 (6): 437–50. https://doi.org/10.1016/s1535-6108(03)00309-x.
Hu, Jian-Xiong, Cheng-Fei Zhao, Wen-Biao Chen, Qi-Cai Liu, Qu-Wen Li, Yan-Ya Lin, and Feng Gao. 2021. “Pancreatic Cancer: A Review of Epidemiology, Trend, and Risk Factors.” World Journal of Gastroenterology 27 (27): 4298–4321. https://doi.org/10.3748/wjg.v27.i27.4298.
Hu, Jili, Jia Wang, Xu Guo, Qing Fan, Xinming Li, Kai Li, Zhuoyin Wang, et al. 2024. “MSLN Induced EMT, Cancer Stem Cell Traits and Chemotherapy Resistance of Pancreatic Cancer Cells.” Heliyon 10 (8): e29210. https://doi.org/10.1016/j.heliyon.2024.e29210.
Huan, Tianxiao, Xiaogang Wu, and Jake Y. Chen. 2010. “Systems Biology Visualization Tools for Drug Target Discovery.” Expert Opinion on Drug Discovery 5 (5): 425–39. https://doi.org/10.1517/17460441003725102.
Huang, Bei, Xiaoling Lang, and Xihong Li. 2022. “The Role of IL-6/JAK2/STAT3 Signaling Pathway in Cancers.” Frontiers in Oncology 12:1023177. https://doi.org/10.3389/fonc.2022.1023177.
Huang, Hui, Xiaogang Wu, Madhankumar Sonachalam, Sammed N. Mandape, Ragini Pandey, Karl F. MacDorman, Ping Wan, and Jake Y. Chen. 2012. “PAGED: A Pathway and Gene-Set Enrichment Database to Enable Molecular Phenotype Discoveries.” BMC Bioinformatics 13 (15): S2. https://doi.org/10.1186/1471-2105-13-S15-S2.
Huang, Jin, Pan Chen, Ke Liu, Jiao Liu, Borong Zhou, Runliu Wu, Qiu Peng, et al. 2021. “CDK1/2/5 Inhibition Overcomes IFNG-Mediated Adaptive Immune Resistance in Pancreatic Cancer.” Gut 70 (5): 890–99. https://doi.org/10.1136/gutjnl-2019-320441.
Jiang, Honghong, Qiaofen Fu, Xin Song, Chunlei Ge, Ruilei Li, Zhen Li, Baozhen Zeng, et al. 2019. “HDGF and PRKCA Upregulation Is Associated with a Poor Prognosis in Patients with Lung Adenocarcinoma.” Oncology Letters 18 (5): 4936–46. https://doi.org/10.3892/ol.2019.10812.
Kanehisa, M., and S. Goto. 2000. “KEGG: Kyoto Encyclopedia of Genes and Genomes.” Nucleic Acids Research 28 (1): 27–30. https://doi.org/10.1093/nar/28.1.27.
Kanehisa, Minoru. 2019. “Toward Understanding the Origin and Evolution of Cellular Organisms.” Protein Science: A Publication of the Protein Society 28 (11): 1947–51. https://doi.org/10.1002/pro.3715.
Kanehisa, Minoru, Miho Furumichi, Yoko Sato, Yuriko Matsuura, and Mari Ishiguro-Watanabe. 2025. “KEGG: Biological Systems Database as a Model of the Real World.” Nucleic Acids Research 53 (D1): D672–77. https://doi.org/10.1093/nar/gkae909.
Khraisha, Qusai, Sophie Put, Johanna Kappenberg, Azza Warraitch, and Kristin Hadfield. 2024. “Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT-4’s Efficacy in Screening and Extracting Data from Peer-Reviewed and Grey Literature in Multiple Languages.” Research Synthesis Methods 15 (4): 616–26. https://doi.org/10.1002/jrsm.1715.
Li, Wenyi, Qiwei Chen, Weiwei Gao, and Hui Zeng. 2022. “ARID1A Promotes Chemosensitivity to Gemcitabine in Pancreatic Cancer through Epigenetic Silencing of RRM2.” Die Pharmazie 77 (7): 224–29. https://doi.org/10.1691/ph.2022.1881.
Lim, Bora, Yoshimi Greer, Stanley Lipkowitz, and Naoko Takebe. 2019. “Novel Apoptosis-Inducing Agents for the Treatment of Cancer, a New Arsenal in the Toolbox.” Cancers 11 (8): 1087. https://doi.org/10.3390/cancers11081087.
Lindsay, Mark A. 2003. “Target Discovery.” Nature Reviews Drug Discovery 2 (10): 831–38. https://doi.org/10.1038/nrd1202.
Liu, Xiaowen, Danwen Qian, Hongliang Liu, James L. Abbruzzese, Sheng Luo, Kyle M. Walsh, and Qingyi Wei. 2020. “Genetic Variants of the Peroxisome Proliferator-Activated Receptor (PPAR) Signaling Pathway Genes and Risk of Pancreatic Cancer.” Molecular Carcinogenesis 59 (8): 930–39. https://doi.org/10.1002/mc.23208.
Luo, Ji. 2021. “KRAS Mutation in Pancreatic Cancer.” Seminars in Oncology 48 (1): 10–18. https://doi.org/10.1053/j.seminoncol.2021.02.003.
Marabelle, Aurelien, Dung T. Le, Paolo A. Ascierto, Anna Maria Di Giacomo, Ana De Jesus-Acosta, Jean-Pierre Delord, Ravit Geva, et al. 2020. “Efficacy of Pembrolizumab in Patients With Noncolorectal High Microsatellite Instability/Mismatch Repair-Deficient Cancer: Results From the Phase II KEYNOTE-158 Study.” Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology 38 (1): 1–10. https://doi.org/10.1200/JCO.19.02105.
Millis, Sherri Z., Sadakatsu Ikeda, Sandeep Reddy, Zoran Gatalica, and Razelle Kurzrock. 2016. “Landscape of Phosphatidylinositol-3-Kinase Pathway Alterations Across 19 784 Diverse Solid Tumors.” JAMA Oncology 2 (12): 1565–73. https://doi.org/10.1001/jamaoncol.2016.0891.
Mroz, Edmund A., and James W. Rocco. 2017. “The Challenges of Tumor Genetic Diversity.” Cancer 123 (6): 917–27. https://doi.org/10.1002/cncr.30430.
Paananen, Jussi, and Vittorio Fortino. 2020. “An Omics Perspective on Drug Target Discovery Platforms.” Briefings in Bioinformatics 21 (6): 1937–53. https://doi.org/10.1093/bib/bbz122.
Payne, S. N., M. E. Maher, N. H. Tran, D. R. Van De Hey, T. M. Foley, A. E. Yueh, A. A. Leystra, et al. 2015. “PIK3CA Mutations Can Initiate Pancreatic Tumorigenesis and Are Targetable with PI3K Inhibitors.” Oncogenesis 4 (10): e169–e169. https://doi.org/10.1038/oncsis.2015.28.
Pei, Yao-Fei, Xi-Min Yin, and Xi-Qiang Liu. 2018. “TOP2A Induces Malignant Character of Pancreatic Cancer through Activating β-Catenin Signaling Pathway.” Biochimica Et Biophysica Acta. Molecular Basis of Disease 1864 (1): 197–207. https://doi.org/10.1016/j.bbadis.2017.10.019.
Poh, Ashleigh R., and Matthias Ernst. 2023. “Functional Roles of SRC Signaling in Pancreatic Cancer: Recent Insights Provide Novel Therapeutic Opportunities.” Oncogene 42 (22): 1786–1801. https://doi.org/10.1038/s41388-023-02701-x.
Pothula, Srinivasa P., Zhihong Xu, David Goldstein, Romano C. Pirola, Jeremy S. Wilson, and Minoti V. Apte. 2020. “Targeting HGF/c-MET Axis in Pancreatic Cancer.” International Journal of Molecular Sciences 21 (23): 9170. https://doi.org/10.3390/ijms21239170.
Rosenberg, Shai, Iva Simeonova, Franck Bielle, Maite Verreault, Bertille Bance, Isabelle Le Roux, Mailys Daniau, et al. 2018. “A Recurrent Point Mutation in PRKCA Is a Hallmark of Chordoid Gliomas.” Nature Communications 9 (1): 2371. https://doi.org/10.1038/s41467-018-04622-w.
Santos, Rita, Oleg Ursu, Anna Gaulton, A. Patrícia Bento, Ramesh S. Donadi, Cristian G. Bologa, Anneli Karlsson, et al. 2017. “A Comprehensive Map of Molecular Drug Targets.” Nature Reviews. Drug Discovery 16 (1): 19–34. https://doi.org/10.1038/nrd.2016.230.
Sellers, William R., and David E. Fisher. 1999. “Apoptosis and Cancer Drug Targeting.” Journal of Clinical Investigation 104 (12): 1655–61. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC409892/.
Sheng, Weiwei, Xiaoyang Shi, Yiheng Lin, Jingtong Tang, Chao Jia, Rongxian Cao, Jian Sun, Guosen Wang, Lei Zhou, and Ming Dong. 2020. “Musashi2 Promotes EGF-Induced EMT in Pancreatic Cancer via ZEB1-ERK/MAPK Signaling.” Journal of Experimental & Clinical Cancer Research: CR 39 (1): 16. https://doi.org/10.1186/s13046-020-1521-4.
Si, Hongtao, Ning Zhang, Chang Shi, Zhanjiang Luo, and Senlin Hou. 2023. “Tumor-Suppressive miR-29c Binds to MAPK1 Inhibiting the ERK/MAPK Pathway in Pancreatic Cancer.” Clinical & Translational Oncology: Official Publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico 25 (3): 803–16. https://doi.org/10.1007/s12094-022-02991-9.
Sigismund, Sara, Daniele Avanzato, and Letizia Lanzetti. 2018. “Emerging Functions of the EGFR in Cancer.” Molecular Oncology 12 (1): 3–20. https://doi.org/10.1002/1878-0261.12155.
Singh, Natesh, Philippe Vayer, Shivalika Tanwar, Jean-Luc Poyet, Katya Tsaioun, and Bruno O. Villoutreix. 2023. “Drug Discovery and Development: Introduction to the General Public and Patient Groups.” Frontiers in Drug Discovery 3 (May). https://doi.org/10.3389/fddsv.2023.1201419.
Sivaram, Nithya, Patrick A. McLaughlin, Han V. Han, Oleksi Petrenko, Ya-Ping Jiang, Lisa M. Ballou, Kien Pham, Chen Liu, Adrianus W.M. Van Der Velden, and Richard Z. Lin. 2019. “Tumor-Intrinsic PIK3CA Represses Tumor Immunogenicity in a Model of Pancreatic Cancer.” Journal of Clinical Investigation 129 (8): 3264–76. https://doi.org/10.1172/JCI123540.
Somarelli, Jason A, Amy M Boddy, Heather L Gardner, Suzanne Bartholf DeWitt, Joanne Tuohy, Kate Megquier, Maya U Sheth, et al. 2019. “Improving Cancer Drug Discovery by Studying Cancer across the Tree of Life.” Molecular Biology and Evolution 37 (1): 11–17. https://doi.org/10.1093/molbev/msz254.
Stanciu, Silviu, Florentina Ionita-Radu, Constantin Stefani, Daniela Miricescu, Iulia-Ioana Stanescu-Spinu, Maria Greabu, Alexandra Ripszky Totan, and Mariana Jinga. 2022. “Targeting PI3K/AKT/mTOR Signaling Pathway in Pancreatic Cancer: From Molecular to Clinical Aspects.” International Journal of Molecular Sciences 23 (17): 10132. https://doi.org/10.3390/ijms231710132.
Su, Liangping, Yitian Chen, Cheng Huang, Sangqing Wu, XiaoJuan Wang, Xinbao Zhao, Qiuping Xu, et al. 2023. “Targeting Src Reactivates Pyroptosis to Reverse Chemoresistance in Lung and Pancreatic Cancer Models.” Science Translational Medicine 15 (678): eabl7895. https://doi.org/10.1126/scitranslmed.abl7895.
Sun, Duxin, Wei Gao, Hongxiang Hu, and Simon Zhou. 2022. “Why 90% of Clinical Drug Development Fails and How to Improve It?” Acta Pharmaceutica Sinica. B 12 (7): 3049–62. https://doi.org/10.1016/j.apsb.2022.02.002.
Sun, Ying, Yang Liu, Xiaoli Ma, and Hao Hu. 2021. “The Influence of Cell Cycle Regulation on Chemotherapy.” International Journal of Molecular Sciences 22 (13): 6923. https://doi.org/10.3390/ijms22136923.
Szklarczyk, Damian, Rebecca Kirsch, Mikaela Koutrouli, Katerina Nastou, Farrokh Mehryary, Radja Hachilif, Annika L. Gable, et al. 2023. “The STRING Database in 2023: Protein-Protein Association Networks and Functional Enrichment Analyses for Any Sequenced Genome of Interest.” Nucleic Acids Research 51 (D1): D638–46. https://doi.org/10.1093/nar/gkac1000.
Tate, John G., Sally Bamford, Harry C. Jubb, Zbyslaw Sondka, David M. Beare, Nidhi Bindal, Harry Boutselakis, et al. 2019. “COSMIC: The Catalogue Of Somatic Mutations In Cancer.” Nucleic Acids Research 47 (D1): D941–47. https://doi.org/10.1093/nar/gky1015.
Taylor, Austin J., Evgeniy Panzhinskiy, Paul C. Orban, Francis C. Lynn, David F. Schaeffer, James D. Johnson, Janel L. Kopp, and C. Bruce Verchere. 2023. “Islet Amyloid Polypeptide Does Not Suppress Pancreatic Cancer.” Molecular Metabolism 68 (January): 101667. https://doi.org/10.1016/j.molmet.2023.101667.
Timar, Jozsef, and Karl Kashofer. 2020. “Molecular Epidemiology and Diagnostics of KRAS Mutations in Human Cancer.” Cancer Metastasis Reviews 39 (4): 1029–38. https://doi.org/10.1007/s10555-020-09915-5.
Trajanoska, Katerina, Claude Bhérer, Daniel Taliun, Sirui Zhou, J. Brent Richards, and Vincent Mooser. 2023. “From Target Discovery to Clinical Drug Development with Human Genetics.” Nature 620 (7975): 737–45. https://doi.org/10.1038/s41586-023-06388-8.
Tu, Mengyu, Lukas Klein, Elisa Espinet, Theodoros Georgomanolis, Florian Wegwitz, Xiaojuan Li, Laura Urbach, et al. 2021. “TNF-α-Producing Macrophages Determine Subtype Identity and Prognosis via AP1 Enhancer Reprogramming in Pancreatic Cancer.” Nature Cancer 2 (11): 1185–1203. https://doi.org/10.1038/s43018-021-00258-w.
Uhlén, Mathias, Linn Fagerberg, Björn M. Hallström, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Åsa Sivertsson, et al. 2015. “Proteomics. Tissue-Based Map of the Human Proteome.” Science (New York, N.Y.) 347 (6220): 1260419. https://doi.org/10.1126/science.1260419.
Verma, Henu K., Praveen K. Kampalli, Saikrishna Lakkakula, Gayathri Chalikonda, Lakkakula V.K.S. Bhaskar, and Smaranika Pattnaik. 2020. “A Retrospective Look at Anti-EGFR Agents in Pancreatic Cancer Therapy.” Current Drug Metabolism 20 (12): 958–66. https://doi.org/10.2174/1389200220666191122104955.
Villalona-Calero, Miguel A., Paul Ritch, Jose A. Figueroa, Gregory A. Otterson, Robert Belt, Edward Dow, Sebastian George, et al. 2004. “A Phase I/II Study of LY900003, an Antisense Inhibitor of Protein Kinase C-Alpha, in Combination with Cisplatin and Gemcitabine in Patients with Advanced Non-Small Cell Lung Cancer.” Clinical Cancer Research: An Official Journal of the American Association for Cancer Research 10 (18 Pt 1): 6086–93. https://doi.org/10.1158/1078-0432.CCR-04-0779.
Wiedmann, Lena, Francesca De Angelis Rigotti, Nuria Vaquero-Siguero, Elisa Donato, Elisa Espinet, Iris Moll, Elisenda Alsina-Sanchis, et al. 2023. “HAPLN1 Potentiates Peritoneal Metastasis in Pancreatic Cancer.” Nature Communications 14 (1): 2353. https://doi.org/10.1038/s41467-023-38064-w.
Wu, Fan, Jin He, Qianxi Deng, Jun Chen, Mingyu Peng, Jiayi Xiao, Yiwei Zeng, et al. 2023. “Neuroglobin Inhibits Pancreatic Cancer Proliferation and Metastasis by Targeting the GNAI1/EGFR/AKT/ERK Signaling Axis.” Biochemical and Biophysical Research Communications 664 (July): 108–16. https://doi.org/10.1016/j.bbrc.2023.04.080.
Wu, Xiaogang, Hui Huang, Tao Wei, Ragini Pandey, Christoph Reinhard, Shuyu D. Li, and Jake Y. Chen. 2012. “Network Expansion and Pathway Enrichment Analysis towards Biologically Significant Findings from Microarrays.” Journal of Integrative Bioinformatics 9 (2): 213. https://doi.org/10.2390/biecoll-jib-2012-213.
Xu, Xiaodong, Yimin Ding, Junbin Jin, Chengjie Xu, Wenyi Hu, Songtao Wu, Guoping Ding, Rui Cheng, Liping Cao, and Shengnan Jia. 2023. “Post-Translational Modification of CDK1–STAT3 Signaling by Fisetin Suppresses Pancreatic Cancer Stem Cell Properties.” Cell & Bioscience 13 (1): 176. https://doi.org/10.1186/s13578-023-01118-z.
Yue, Zongliang, Radomir Slominski, Samuel Bharti, and Jake Y. Chen. 2022. “PAGER Web APP: An Interactive, Online Gene Set and Network Interpretation Tool for Functional Genomics.” Frontiers in Genetics 13 (April). https://doi.org/10.3389/fgene.2022.820361.
Yue, Zongliang, Christopher D Willey, Anita B Hjelmeland, and Jake Y Chen. 2019. “BEERE: A Web Server for Biomedical Entity Expansion, Ranking and Explorations.” Nucleic Acids Research 47 (W1): W578–86. https://doi.org/10.1093/nar/gkz428.
Yue, Zongliang, Qi Zheng, Michael T Neylon, Minjae Yoo, Jimin Shin, Zhiying Zhao, Aik Choon Tan, and Jake Y Chen. 2018. “PAGER 2.0: An Update to the Pathway, Annotated-List and Gene-Signature Electronic Repository for Human Network Biology.” Nucleic Acids Research 46 (D1): D668–76. https://doi.org/10.1093/nar/gkx1040.
Zhang, Geng, Peijun He, Hanson Tan, Anuradha Budhu, Jochen Gaedcke, B. Michael Ghadimi, Thomas Ried, et al. 2013. “Integration of Metabolomics and Transcriptomics Revealed a Fatty Acid Network Exerting Growth Inhibitory Effects in Human Pancreatic Cancer.” Clinical Cancer Research: An Official Journal of the American Association for Cancer Research 19 (18): 4983–93. https://doi.org/10.1158/1078-0432.CCR-13-0209.
Zhang, Geng, Aaron Schetter, Peijun He, Naotake Funamizu, Jochen Gaedcke, B. Michael Ghadimi, Thomas Ried, et al. 2012. “DPEP1 Inhibits Tumor Cell Invasiveness, Enhances Chemosensitivity and Predicts Clinical Outcome in Pancreatic Ductal Adenocarcinoma.” PloS One 7 (2): e31507. https://doi.org/10.1371/journal.pone.0031507.
Zhang, Huan, Yan Sun, Zhaokai Wang, Xiaoju Huang, Lu Tang, Ke Jiang, and Xin Jin. 2024. “ZDHHC20-Mediated S-Palmitoylation of YTHDF3 Stabilizes MYC mRNA to Promote Pancreatic Cancer Progression.” Nature Communications 15 (1): 4642. https://doi.org/10.1038/s41467-024-49105-3.
Zhang, Huayu, Amy Ferguson, Grant Robertson, Muchen Jiang, Teng Zhang, Cathie Sudlow, Keith Smith, Kristiina Rannikmae, and Honghan Wu. 2021. “Benchmarking Network-Based Gene Prioritization Methods for Cerebral Small Vessel Disease.” Briefings in Bioinformatics 22 (5): bbab006. https://doi.org/10.1093/bib/bbab006.
Zhou, Shu-Feng, and Wei-Zhu Zhong. 2017. “Drug Design and Discovery: Principles and Applications.” Molecules 22 (2): 279. https://doi.org/10.3390/molecules22020279.
Zhou, Ying, Yintao Zhang, Xichen Lian, Fengcheng Li, Chaoxin Wang, Feng Zhu, Yunqing Qiu, and Yuzong Chen. 2022. “Therapeutic Target Database Update 2022: Facilitating Drug Discovery with Enriched Comparative Data of Targeted Agents.” Nucleic Acids Research 50 (D1): D1398–1407. https://doi.org/10.1093/nar/gkab953.
Zhu, Guoying, Qing Fang, Fengshang Zhu, Dongping Huang, and Changqing Yang. 2021. “Structure and Function of Pancreatic Lipase-Related Protein 2 and Its Relationship With Pathological States.” Frontiers in Genetics 12 (July). https://doi.org/10.3389/fgene.2021.693538.
Zhu, Liang, Minlin Jiang, Hao Wang, Hui Sun, Jun Zhu, Wencheng Zhao, Qiyu Fang, et al. 2021. “A Narrative Review of Tumor Heterogeneity and Challenges to Tumor Drug Therapy.” Annals of Translational Medicine 9 (16): 1351. https://doi.org/10.21037/atm-21-1948.
Campa, Daniele, et al. 2015. “TERT Gene Harbors Multiple Variants Associated with Pancreatic Cancer Susceptibility.” International Journal of Cancer 137 (9): 2175–83. https://doi.org/10.1002/ijc.29590.
Cheng, Yao, et al. 2013. “Proliferation Enhanced by NGF-NTRK1 Signaling Makes Pancreatic Cancer Cells More Sensitive to 2DG-Induced Apoptosis.” International Journal of Medical Sciences 10 (5): 634–40. https://doi.org/10.7150/ijms.5547.
Bhamidipati, Deepak, et al. 2023. “Exceptional Responses to Selpercatinib in RET Fusion-Driven Metastatic Pancreatic Cancer.” JCO Precision Oncology 7: e2300252. https://doi.org/10.1200/PO.23.00252.

Figures

Figure 1. General overview of the GET list compilation and ranking process. A. Initial gene lists from each of the three subsets are compiled: 2493 genes in the initial G list, 2000 genes in the initial E list, and 131 genes in the initial T list. B. Each list is prioritized using the BEERE network ranking and expansion tool, then annotated with genomic information and a GPT-4 literature review. C. A weighted score is calculated to rank the list, and genes are manually validated through literature review.

Table 1 lists the weight assigned to each ranking modality, ordered from greatest to least.

Table 1: Weights assigned to each modality for calculation of the RP score in GETgene-AI.
| Modality of ranking | Weighted score |
|---|---|
| GT list score | 0.329 |
| CNA (cBioPortal UTSW Nat Commun 2015) | 0.201 |
| Expression list score | 0.088 |
| GET list score | 0.085 |
| Mutation frequency (cBioPortal TCGA PanCancer Atlas) | 0.079 |
| CNA (cBioPortal TCGA PanCancer Atlas) | 0.048 |
| Mutation frequency (cBioPortal UTSW Nat Commun 2015) | -0.023 |
| Brain expression score | -0.054 |
| Kidney expression score | -0.081 |
| Gastrointestinal expression score | -0.095 |
| Liver expression score | -0.101 |

Table 2:

| Gene | GPT-4 score ranking | GET ranking | Experimentally validated? | Citation |
|---|---|---|---|---|
| MYC | 1 | 2 | Yes | (Huan Zhang et al. 2024) |
| SRC | 2 | 3 | Yes | (Su et al. 2023) |
| EGFR | 3 | 4 | Yes | (F. Wu et al. 2023) |
| TERT | 4 | 27 | Yes | (Campa et al. 2015) |
| RRM2 | 5 | 21 | Yes | (Li et al. 2022) |
| PIK3CA | 6 | 1 | Yes | (Payne et al. 2015) |
| TOP2A | 7 | 16 | Yes | (Pei, Yin, and Liu 2018) |
| NTRK1 | 8 | 22 | Yes | (Cheng et al. 2013) |
| PTGS2 | 9 | 25 | Yes | (Hingorani et al. 2003) |
| EGF | 10 | 30 | Yes | (Sheng et al. 2020) |
| CDK1 | 11 | 5 | Yes | (J. Huang et al. 2021) |
| MAPK1 | 12 | 10 | Yes | (Si et al. 2023) |
| KRAS | 13 | 13 | Yes | (Timar and Kashofer 2020) |
| MTOR | 14 | 11 | Yes | (Stanciu et al. 2022) |
| MSLN | 15 | 37 | Yes | (J. Hu et al. 2024) |
| RET | 16 | 28 | Yes | (Bhamidipati et al. 2023) |
| AKT1 | 17 | 31 | Yes | (Arasanz et al. 2019) |
| JAK2 | 18 | 9 | Yes | (B. Huang, Lang, and Li 2022) |
| MET | 19 | 34 | Yes | (Pothula et al. 2020) |
| PDCD1 | 20 | 38 | Yes | (Marabelle et al. 2020) |

Table 2: Top 20 highest-ranked genes by GPT-4 score, compared with their GET rankings and their status as experimentally validated drug targets.
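Table 2 pairs each gene's GPT-4 ranking with its GET ranking. One way to summarize how closely the two orderings agree is Spearman's rank correlation. The sketch below is an illustrative calculation on the Table 2 values, not an analysis reported in the paper; the function names `rerank` and `spearman_rho` are hypothetical. Because the GET ranks of these 20 genes run up to 38, they are first re-ranked within the 20-gene subset, and the no-ties formula ρ = 1 - 6Σd² / (n(n² - 1)) is used, which is valid here because no ranks are tied.

```python
# Illustrative sketch: Spearman rank correlation between the GPT-4 ranking
# and the GET ranking of the 20 genes listed in Table 2.
# Assumes no tied ranks (true for Table 2), so the closed-form formula applies.

TABLE2 = {  # gene: (GPT-4 score ranking, GET ranking), copied from Table 2
    "MYC": (1, 2), "SRC": (2, 3), "EGFR": (3, 4), "TERT": (4, 27),
    "RRM2": (5, 21), "PIK3CA": (6, 1), "TOP2A": (7, 16), "NTRK1": (8, 22),
    "PTGS2": (9, 25), "EGF": (10, 30), "CDK1": (11, 5), "MAPK1": (12, 10),
    "KRAS": (13, 13), "MTOR": (14, 11), "MSLN": (15, 37), "RET": (16, 28),
    "AKT1": (17, 31), "JAK2": (18, 9), "MET": (19, 34), "PDCD1": (20, 38),
}


def rerank(values):
    """Map each value to its 1-based rank within the list (smallest = 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no ties."""
    n = len(x)
    rx, ry = rerank(x), rerank(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


gpt_ranks = [gpt for gpt, _ in TABLE2.values()]
get_ranks = [get for _, get in TABLE2.values()]
print(round(spearman_rho(gpt_ranks, get_ranks), 3))  # prints 0.612
```

Because the 20 genes were selected by their GPT-4 score, this statistic describes agreement only within GPT-4's top 20, not across the full GET list.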
Figure 2: Volcano plot of GSE28735, microarray gene-expression profiles of 45 matched pairs of tumor and non-tumor tissue (Padj < 0.05). Blue indicates downregulated genes; red indicates upregulated genes.

Figure 3: Network constructed by STRING using the KEGG pathway HG0512. The content inside each node is the known or predicted 3D structure of the protein. Turquoise edges indicate protein-protein interactions from curated databases, and purple edges indicate experimentally determined interactions. Green, red, and dark blue edges indicate predicted protein-protein interactions. Light green edges represent text mining, black edges represent co-expression, and light purple edges represent protein homology.
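The Figure 2 caption colors genes by direction of change after filtering at Padj < 0.05. The sketch below shows one way such labels could be derived. It assumes the adjusted p-values come from the Benjamini-Hochberg false discovery rate procedure (GEO2R's default adjustment; the caption does not state which method was used) and that direction is taken from the sign of the log2 fold change. The function names, the toy genes, and the optional `lfc_cutoff` parameter are hypothetical; the caption specifies no fold-change threshold.

```python
# Illustrative sketch: Benjamini-Hochberg adjustment followed by the
# up/down/not-significant labelling used to color a volcano plot.
# The adjustment method and the fold-change cutoff are assumptions.


def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def volcano_label(log2fc, padj, alpha=0.05, lfc_cutoff=0.0):
    """Label a gene 'up' (red), 'down' (blue), or 'ns' (not significant)."""
    if padj >= alpha:
        return "ns"
    if log2fc > lfc_cutoff:
        return "up"
    if log2fc < -lfc_cutoff:
        return "down"
    return "ns"


# Hypothetical toy input: (gene, log2 fold change, raw p-value).
genes = [("G1", 2.1, 0.01), ("G2", -1.5, 0.04), ("G3", 0.8, 0.03), ("G4", -0.2, 0.2)]
padj = bh_adjust([p for _, _, p in genes])
for (name, lfc, _), q in zip(genes, padj):
    print(name, round(q, 4), volcano_label(lfc, q))
```

In this toy input three genes have raw p < 0.05, but only G1 stays below 0.05 after adjustment, so only G1 would be colored.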
Table 3:

| GETgene-AI top genes | Experimentally validated? | STRING top genes | Experimentally validated? | GEO2R top genes | Experimentally validated? |
|---|---|---|---|---|---|
| PIK3CA | Yes | AKT1 | Yes | PNLIPRP1 | No |
| MYC | Yes | TP53 | Yes | PNLIPRP2 | No |
| SRC | Yes | KRAS | Yes | IAPP | No |
| EGFR | Yes | PTEN | Yes | CTRC | No |
| CDK1 | Yes | SRC | Yes | GP2 | Yes |
| PRKCA | Yes | STAT3 | Yes | CEL | No |
| TNF | Yes | EGFR | Yes | CPA2 | Yes |
| LCK | Yes | MTOR | Yes | ALB | Yes |
| JAK2 | Yes | BCL2 | Yes | CUZD1 | Yes |
| MAPK1 | Yes | PIK3CA | Yes | ERP27 | No |
| MTOR | Yes | CDKN2A | Yes | CLPS | Yes |
| AURKB | Yes | HRAS | Yes | SERPINI2 | Yes |
| KRAS | Yes | CCND1 | Yes | PLA2G1B | Yes |
| MAPK8 | Yes | NFKB1 | Yes | CELA2A | No |
| TOP2A | Yes | CDKN1A | Yes | CELA2B | No |

Table 3: Top 15 genes from GETgene-AI, STRING, and GEO2R and their status as experimentally validated drug targets.

Figure 4: Bar graph showing the percentage of experimentally validated targets among the top 50 genes from each framework.

[Bar chart: GEO2R, STRING, GETgene-AI; y-axis: Percentage of experimentally validated targets (out of top 50)]
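Figure 4 and Table 3 both score each framework by the share of its top-ranked genes that are experimentally validated drug targets, a quantity usually called precision at k. The top-50 lists behind Figure 4 are not reproduced in this section, so the sketch below uses the GEO2R column of Table 3 (top 15) as input; the helper name `precision_at_k` is hypothetical.

```python
# Illustrative sketch: precision at k, i.e. the fraction of a framework's
# top-k genes that are experimentally validated targets.
# Input: the GEO2R column of Table 3 as (gene, validated?) pairs, in rank order.

GEO2R_TOP15 = [
    ("PNLIPRP1", False), ("PNLIPRP2", False), ("IAPP", False), ("CTRC", False),
    ("GP2", True), ("CEL", False), ("CPA2", True), ("ALB", True),
    ("CUZD1", True), ("ERP27", False), ("CLPS", True), ("SERPINI2", True),
    ("PLA2G1B", True), ("CELA2A", False), ("CELA2B", False),
]


def precision_at_k(ranked, k):
    """Fraction of the first k entries flagged as validated."""
    if not 0 < k <= len(ranked):
        raise ValueError("k must be between 1 and the list length")
    return sum(validated for _, validated in ranked[:k]) / k


for k in (5, 10, 15):
    print(k, f"{precision_at_k(GEO2R_TOP15, k):.1%}")
```

Applied to the GETgene-AI and STRING columns of Table 3, where all 15 genes are marked as validated, the same function returns 1.0 at k = 15.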
Table 4:

| Gene | Weighted score | ChatGPT score | GT list score | Mutation frequency (cBioPortal TCGA PanCancer Atlas) | RP-LIT score | GPT-LIT score | GET list score | Expression list score |
|---|---|---|---|---|---|---|---|---|
| PIK3CA | 34.8 | 310 | 58.7 | 2.8 | 0.199 | 1.771 | 96 | 97 |
| MYC | 30.1 | 330 | 9.5 | 0.0 | 0.032 | 0.349 | 214 | 210 |
| SRC | 20.0 | 320 | 0.0 | 1.1 | 0.044 | 0.711 | 143 | 144 |
| EGFR | 18.2 | 320 | 2.4 | 0.6 | 0.010 | 0.171 | 134 | 133 |
| CDK1 | 15.9 | 305 | 15.3 | 65.4 | 0.134 | 2.563 | 30 | 7 |
| PRKCA | 15.3 | 305 | 3.0 | 0.0 | 1.702 | 25.556 | 101 | 102 |
| TNF | 12.1 | 270 | 2.4 | 0.0 | 0.013 | 0.292 | 83 | 86 |
| LCK | 11.5 | 220 | 1.7 | 0.0 | 1.274 | 24.444 | 62 | 60 |
| JAK2 | 10.6 | 285 | 1.0 | 0.6 | 0.082 | 2.192 | 67 | 67 |
| MAPK1 | 10.3 | 305 | 11.6 | 3.4 | 0.139 | 4.122 | 7 | 7 |
| AURKB | 9.1 | 295 | 0.0 | 0.6 | 0.008 | 0.246 | 70 | 70 |
| KRAS | 8.7 | 220 | 1.7 | 1.7 | 0.335 | 8.462 | 48 | 47 |
| MAPK8 | 7.8 | 295 | 0.0 | 0.0 | 0.002 | 0.068 | 121 | 117 |
| MTOR | 7.1 | 220 | 1.7 | 0.0 | 0.588 | 18.333 | 52 | 52 |
| ITGA4 | 6.9 | 220 | 4.3 | 0.6 | 2.298 | 73.333 | 40 | 37 |
| TOP2A | 6.9 | 310 | 10.2 | 1.1 | 0.215 | 9.688 | 0 | 0 |
| CHEK1 | 6.7 | 220 | 1.7 | 0.0 | 0.128 | 4.231 | 46 | 45 |
| BCL2 | 6.2 | 220 | 1.7 | 0.6 | 0.012 | 0.418 | 41 | 41 |
| PRKCB | 6.0 | 250 | 1.4 | 0.6 | 1.004 | 41.667 | 60 | 58 |
| ERBB4 | 5.5 | 220 | 3.4 | 0.6 | 0.184 | 7.333 | 81 | 83 |

Table 4: The 20 highest-ranked genes in GETgene-AI. Weighted score is the RP score; ChatGPT score is the GPT-4 score.
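Table 4 reports each gene's weighted (RP) score, and Table 1 gives the weight assigned to each ranking modality. The sketch below assumes the RP score is a linear weighted sum of per-modality scores that have already been placed on a common scale. This section does not describe that normalization, so the sketch will not reproduce the Table 4 values; the short modality keys, the toy genes, and the rule that a missing modality contributes zero are all hypothetical.

```python
# Illustrative sketch: rank genes by a weighted sum of modality scores,
# using the weights listed in Table 1. Keys are hypothetical short names;
# modality scores are assumed to be pre-normalized to a common scale.

WEIGHTS = {
    "gt_list": 0.329,
    "cna_utsw_2015": 0.201,
    "expression_list": 0.088,
    "get_list": 0.085,
    "mutfreq_tcga_pancancer": 0.079,
    "cna_tcga_pancancer": 0.048,
    "mutfreq_utsw_2015": -0.023,
    "brain_expression": -0.054,
    "kidney_expression": -0.081,
    "gi_expression": -0.095,
    "liver_expression": -0.101,
}


def weighted_score(features):
    """Sum weight * score over known modalities; absent modalities count as 0."""
    unknown = set(features) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown modalities: {sorted(unknown)}")
    return sum(WEIGHTS[name] * value for name, value in features.items())


def rank_genes(gene_features):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    scored = [(gene, weighted_score(f)) for gene, f in gene_features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy genes with normalized scores in [0, 1].
toy = {
    "GENE_A": {"gt_list": 1.0, "cna_utsw_2015": 0.5},
    "GENE_B": {"gt_list": 0.5, "liver_expression": 1.0},
}
for gene, score in rank_genes(toy):
    print(gene, round(score, 4))
```

Under this linear form, the negative weights in Table 1 (for example on the brain, kidney, gastrointestinal, and liver expression scores) pull a gene down the ranking as those scores increase.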