Introduction
Drug target discovery is a multifaceted process central to developing effective therapies for a range of diseases (S.-F. Zhou and Zhong 2017; Lindsay 2003). Traditionally, it involves the manual investigation of scientific literature and biomedical databases to identify biological targets linked to disease mechanisms, followed by evaluations based on relevance, preclinical and clinical success rates, and research popularity (Paananen and Fortino 2020; Y. Zhou et al. 2022; Trajanoska et al. 2023; Santos et al. 2017). While foundational, this approach often suffers from inefficiencies, high failure rates, and uncertainties in translating early discoveries into viable treatments (D. Sun et al. 2022; Singh et al. 2023). These challenges are particularly pronounced in cancer, where the heterogeneity of the disease demands more precise and individualized strategies (L. Zhu et al. 2021; Somarelli et al. 2019). Unlike other conditions, cancers arise from diverse genetic and molecular abnormalities, requiring advanced techniques to identify and prioritize therapeutic targets (Mroz and Rocco 2017). Cancer medications often target rapidly dividing cells by disrupting DNA replication or cell division, as seen with chemotherapeutic agents (Gu, Hickey, and Malkas 2023; Y. Sun et al. 2021). Additionally, some therapies induce apoptosis in cancer cells by targeting pathways that regulate cell survival (Sellers and Fisher 1999; Lim et al. 2019).
bioRxiv preprint doi: https://doi.org/10.1101/2025.01.21.634201; this version posted January 22, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Significant innovations have been developed to streamline, reform, and reduce the cost of drug target discovery, primarily through in silico assessments (Sadybekov and Katritch 2023). These assessments utilize neural networks, genomic datasets, and machine learning algorithms to predict key genes of interest (Serrano Nájera, Narganes Carlón, and Crowther 2021; Yue et al. 2017; Muhammad et al. 2014). These in silico tools offer new methods of analysis, uncovering novel insights and advancing the identification of potential therapeutic targets (Sliwoski et al. 2014; Huan, Wu, and Chen 2010). The standard in silico framework for drug target discovery typically begins with the construction of protein-protein interaction (PPI) networks, where potential targets are ranked based on their network connectivity and centrality (Odongo, Demiroglu-Zergeroglu, and Çakır 2024; Y. Chen and Xu 2016). Network centrality approaches to drug target discovery are commonly referred to as network-based prioritization. By leveraging network-based prioritization, researchers can efficiently analyze and interpret vast genomic datasets to identify critical regulatory genes implicated in cancer development (Chang et al. 2021; Sonehara and Okada 2021). The ability to efficiently process genomic information and derive meaningful insights is pivotal for identifying relevant drug targets, underscoring the importance of network-based prioritization (J. Y. Chen, Piquette-Miller, and Smith 2013; Huang, Li, and Chen 2009). Network-based approaches are particularly favored for this purpose, as they utilize established biological networks to identify genes associated with specific diseases (Shim, Hwang, and Lee 2015; Huang et al. 2012). These approaches prioritize disease-related genes by integrating data from PPI networks and known gene-drug associations (Mohsen et al. 2021; Zhang et al. 2021), and their results are easy to visualize (Huan, Wu, and Chen 2010). However, traditional network-based approaches often fail to incorporate crucial genomic information, such as protein expression across tissues, gene mutation frequencies, and differential gene expression profiles, which limits their utility (Petti et al. 2020). To address these limitations, supplemental in silico methods have been developed to achieve a more comprehensive and nuanced analysis of drug targets (Nitsch et al. 2010).
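As a minimal illustration of ranking by network connectivity, the sketch below counts distinct interaction partners (degree centrality) in a toy PPI edge list. The protein names P1 to P6 and their interactions are placeholders, not curated data, and degree is only one of several centrality measures that a network-based prioritization tool might use.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Rank proteins by degree (number of distinct interaction partners)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Sort by descending degree, then by name so ties have a stable order.
    return sorted(((p, len(n)) for p, n in neighbors.items()),
                  key=lambda item: (-item[1], item[0]))


# Placeholder edge list (hypothetical interactions).
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
             ("P4", "P2"), ("P4", "P5"), ("P3", "P6")]

ranking = rank_by_degree(toy_edges)
for protein, degree in ranking:
    print(protein, degree)
```

Degree alone ignores expression, mutation frequency, and tissue context, which is the limitation the preceding paragraph describes.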
Differential gene expression analysis is a critical method for identifying genes significantly altered between conditions, such as cancerous versus normal tissues (Bai et al. 2013; Van de Sande et al. 2023). A common approach to calculate differential expression is by measuring "fold change," which represents the ratio of gene expression levels between these conditions (Love, Huber, and Anders 2014; Mutch et al. 2002). However, the threshold value used to define a meaningful fold change can introduce arbitrariness into the prioritization process, potentially affecting the accuracy of target identification (McCarthy and Smyth 2009). GEO2R, a widely used tool, employs fold change to prioritize genes under experimental conditions, for example when comparing gene expression in cancerous versus normal tissues (Barrett et al. 2013). In addition to fold change, frequency-based prioritization methods focus on genes with higher mutational rates in disease contexts, as these genes may serve as common therapeutic targets (Dinstag and Shamir 2020; López-Cortés et al. 2018). While valuable, statistical significance methods for gene prioritization can be prone to bias, especially due to sample selection, which can skew results (Lazzeroni, Lu, and Belitskaya-Lévy 2014). To mitigate these biases, network centrality-based prioritization has been developed, leveraging gene connectivity within biological networks. This method offers a more balanced approach to gene selection, expands gene lists, and enhances disease association measures, improving the identification of relevant therapeutic targets (Janyasupab, Suratanee, and Plaimas 2021; Magger et al. 2012).
Large Language Models (LLMs), such as GPT-4, have emerged as another transformative in silico approach to drug discovery (Liu et al. 2021; Oniani et al. 2024). LLMs can predict essential information about gene targets, including structural domains of proteins, protein structure, toxicity and adverse effects, functional significance, clinical and preclinical relevance, and treatment efficacy (Sallam 2023; Tripathi et al. 2024). Furthermore, GPT-4 has demonstrated the ability to rival human performance in conducting literature reviews, thus streamlining a critical component of the drug target discovery process (Khraisha et al. 2024; Li, Zhu, and Chen 2010).
To address challenges in cancer drug target discovery, we developed GETgene-AI, a framework that combines network-based analysis with artificial intelligence to prioritize actionable targets. Central to GETgene-AI is the G.E.T. strategy, which integrates three key data streams: the G List (genes with genetic mutations or variations functionally implicated in genotype-to-phenotype association studies of the disease), the E List (candidate genes with disease target tissue-specific expression), and the T List (established drug targets based on reports from literature, patents, clinical trials, or existing approved drugs). Initial gene lists are derived from multiple sources, including fold change, copy number alterations, and mutational rates across various biological databases. To mitigate biases from incomplete data, GETgene-AI incorporates diverse datasets, ensuring robust prioritization. The framework iteratively refines these lists through the network-based tool BEERE, annotating genes with biological context to create a high-quality, prioritized gene list. This iterative process expands and ranks gene candidates based on biological annotations, enhancing the accuracy of target identification. By combining traditional and in silico methods, GETgene-AI bridges gaps in drug discovery and facilitates the development of personalized cancer therapies. Additionally, GPT-4o is integrated into the process to improve literature review efficiency and further annotate the target list, enhancing the overall workflow.
While pancreatic cancer serves as a case study in this framework, the underlying methodology is adaptable to a wide range of cancers and diseases, provided that relevant genomic and clinical data are available. This adaptability underscores the framework's potential as a versatile and powerful tool for drug discovery across diverse disease contexts. By incorporating comprehensive in silico analyses and leveraging high-throughput genomic data, the framework can be applied to identify novel drug targets for various malignancies, thereby accelerating the discovery of therapeutic options. Furthermore, the novel drug targets identified through our case study in pancreatic cancer not only offer insights into the unique molecular mechanisms driving this aggressive cancer but also present promising avenues for therapeutic intervention. These targets have the potential to facilitate breakthroughs in pancreatic cancer treatment, paving the way for the development of more effective, personalized therapies. Given the significant unmet clinical need in pancreatic cancer, these findings are poised to contribute to the development of targeted therapies that could improve patient outcomes and survival rates.
Method
produced quantitative data and novel insights into potential drug targets.
Compiling the Gene list from Genetic Mutations
For the "GENE" component of our "GET" framework, we compiled three subsets: PAGER-NC, COSMIC-MUT, and CBP-CNA-MUT. The initial "GENE" list was derived from the PAGER (Yue et al. 2018; H. Huang et al. 2012; Yue et al. 2022), cBioPortal (de Bruijn et al. 2023), and COSMIC (Tate et al. 2019) databases. To mitigate potential sample biases and data incompleteness (e.g., studies failing to detect specific genes), multiple datasets from the same databases were utilized where possible. Genes associated with the term "Pancreatic Cancer" were manually retrieved from these databases, and cutoffs were established to select the genes most relevant to pancreatic cancer.
PAGER was employed to incorporate a biological pathway perspective into gene ranking, offering functional significance (Chowbina et al. 2009). From PAGER, 844 candidate genes were heuristically selected based on an nCoCo score ranging from 5 to 100. The nCoCo score reflects the biological relatedness of gene sets within PAGER (Yue et al. 2022; 2018; H. Huang et al. 2012). A lower cutoff of 5 was chosen because literature review indicated that functional groups below this threshold were not relevant to any disease or function, while a score above 100 was associated with major roles in other significant biological processes. For the cBioPortal and COSMIC cancer databases, cutoffs were set at the point at which mutational frequency no longer demonstrated significance to cancer in the literature. From cBioPortal, 1,000 genes were retrieved using a cutoff of 8.2% for copy number alterations (CNA) and 2.8% for mutational frequency. From COSMIC, 649 genes were compiled using a heuristic cutoff of 20% mutational frequency. Candidate genes from cBioPortal, PAGER, and COSMIC were then combined to form the "G list," which comprised a total of 2,493 genes.
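The cutoff-and-combine step can be sketched as follows. This is an illustrative outline only: the gene names and frequencies are placeholders, the inclusive comparison (`>=`) is an assumption, and the PAGER subset is shown as a fixed placeholder set because the nCoCo-based selection operates on gene sets rather than single genes. How overlapping genes were counted in the reported total is not stated in the text, so the sketch simply records which subsets each gene came from.

```python
def filter_by_cutoff(frequencies, cutoff):
    """Keep genes whose frequency (in percent) is at least the cutoff."""
    return {gene for gene, freq in frequencies.items() if freq >= cutoff}


def combine_subsets(subsets):
    """Map each gene to the names of the subsets in which it appears."""
    provenance = {}
    for name, genes in subsets.items():
        for gene in genes:
            provenance.setdefault(gene, set()).add(name)
    return provenance


# Hypothetical frequencies in percent; not values from COSMIC or cBioPortal.
cosmic_freq = {"GENE_A": 62.0, "GENE_B": 41.0, "GENE_X": 3.0}
cbp_mut_freq = {"GENE_A": 70.0, "GENE_C": 20.0, "GENE_Y": 1.0}

subsets = {
    "COSMIC-MUT": filter_by_cutoff(cosmic_freq, 20.0),
    "CBP-CNA-MUT": filter_by_cutoff(cbp_mut_freq, 2.8),
    "PAGER-NC": {"GENE_B", "GENE_D"},  # placeholder membership
}

g_list = combine_subsets(subsets)
for gene in sorted(g_list):
    print(gene, sorted(g_list[gene]))
```

Keeping the provenance of each gene makes it possible to check later whether a high rank is supported by more than one database.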
The BEERE tool employs an initial ranking algorithm and two iterative ranking algorithms, PageRank and an ant-colony algorithm, both of which have demonstrated success across diverse knowledge domains (Yue et al. 2019). BEERE expands the gene list using the nearest-neighbor network constructed from protein-protein interactions in the HAPPI 2.0 database (X. Wu et al. 2012; Jake Yue Chen, Mamidipalli, and Huan 2009; Jake Y. Chen, Pandey, and Nguyen 2017). Following the GRIPPs method (Gong and Chen 2023), the "G list" was iteratively prioritized and expanded through BEERE, resulting in the "processed G list."
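BEERE's internal implementation is not reproduced here. The sketch below only illustrates the two ideas named above, nearest-neighbor expansion and PageRank, on a toy undirected network. The gene names, the adjacency map, the damping factor of 0.85, and the iteration count are all assumptions made for illustration.

```python
def expand_with_neighbors(seeds, adjacency):
    """Add every direct interaction partner of the seed genes."""
    expanded = set(seeds)
    for gene in seeds:
        expanded.update(adjacency.get(gene, ()))
    return expanded


def pagerank(adjacency, nodes, damping=0.85, iterations=100):
    """Power-iteration PageRank on the subnetwork induced by `nodes`."""
    node_set = set(nodes)
    links = {n: [m for m in adjacency.get(n, ()) if m in node_set and m != n]
             for n in node_set}
    size = len(node_set)
    rank = {n: 1.0 / size for n in node_set}
    for _ in range(iterations):
        # Rank held by nodes without links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in node_set if not links[n])
        new_rank = {n: (1.0 - damping) / size + damping * dangling / size
                    for n in node_set}
        for n in node_set:
            for m in links[n]:
                new_rank[m] += damping * rank[n] / len(links[n])
        rank = new_rank
    return rank


# Toy symmetric adjacency map (hypothetical interactions).
toy_adjacency = {
    "GENE_A": ["GENE_B", "GENE_C", "GENE_D"],
    "GENE_B": ["GENE_A"],
    "GENE_C": ["GENE_A"],
    "GENE_D": ["GENE_A", "GENE_E"],
    "GENE_E": ["GENE_D"],
}

expanded = expand_with_neighbors(["GENE_B", "GENE_C"], toy_adjacency)
scores = pagerank(toy_adjacency, expanded)
top_gene = max(scores, key=scores.get)
```

In this toy case the two seeds share a single neighbor, which enters the list through expansion and then receives the highest PageRank score.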
Figure 1: General overview of the GET list compilation and ranking process. A: Initial gene lists from each of the three subsets are compiled: 2,493 genes in the initial G list, 2,000 genes in the initial E list, and 131 genes in the initial T list. B: Each list is prioritized using the BEERE network ranking and expansion tool before being annotated with genomic information and a GPT-4o literature review. C: A weighted score is calculated to rank the list, and genes are manually validated through literature review.
Compiling candidate genes for the “Expression” subset
Candidate genes were identified through analysis performed using GEO2R on the GEO dataset GSE28735, titled "Pancreatic ductal adenocarcinoma tumor and adjacent non-tumor tissue" (G. Zhang et al. 2012; 2013). Samples were divided into tumor and non-tumor groups using the "Define groups" feature, with "human pancreatic tumor tissue patient sample" designated as the tumor group and "human pancreatic nontumor patient sample" as the non-tumor group. The dataset included a total of 90 patient samples, evenly divided into 45 tumor samples and 45 non-tumor samples. Differentially expressed genes were analyzed using the "analyze" function in GEO2R. The top 2,000 genes with the highest differential expression were compiled into the initial "E list." This list was subsequently iteratively processed using the BEERE software, following the GRIPPs method.
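To make the selection step concrete, the sketch below computes a log2 fold change between tumor and non-tumor means and keeps the genes with the largest absolute change. It does not reproduce GEO2R's statistics; the expression values, the pseudocount, and the choice to rank by absolute log2 fold change are assumptions. For data that are already log-transformed, a difference of group means would be used instead of a ratio.

```python
import math


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor expression to mean non-tumor expression."""
    mean_tumor = sum(tumor) / len(tumor)
    mean_normal = sum(normal) / len(normal)
    return math.log2((mean_tumor + pseudocount) / (mean_normal + pseudocount))


def top_by_abs_lfc(expression, n):
    """Return the n genes with the largest absolute log2 fold change."""
    lfc = {gene: log2_fold_change(t, nt) for gene, (t, nt) in expression.items()}
    return sorted(lfc, key=lambda gene: (-abs(lfc[gene]), gene))[:n]


# Hypothetical linear-scale values: (tumor samples, non-tumor samples).
toy_expression = {
    "GENE_A": ([15.0, 15.0], [3.0, 3.0]),  # higher in tumor
    "GENE_B": ([1.0, 1.0], [7.0, 7.0]),    # lower in tumor
    "GENE_C": ([5.0, 5.0], [5.0, 5.0]),    # unchanged
}

top_two = top_by_abs_lfc(toy_expression, 2)
print(top_two)
```

Ranking on the absolute value keeps both strongly upregulated and strongly downregulated genes in the candidate list.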
Compiling candidate genes for the “Target” subset
The "Target" subset was compiled from DrugBank, a comprehensive database containing information on drugs and their targets. Genes were extracted by searching for "Pancreatic Cancer," "Pancreatic Ductal Adenocarcinoma," and "Neuroendocrine Pancreatic Cancer" within the drugs section of DrugBank. Drugs marked as being used for the treatment of pancreatic cancer were identified by reviewing their descriptions in the Summary, Background, Indication, Associated Conditions, or Clinical Trial categories. Each drug was manually reviewed for its mechanism of action, summary, and background to determine whether it was directly used for treating pancreatic cancer or for supportive purposes, such as chemotherapy relief, pain management, or sedation. For drugs meeting these criteria, all associated gene targets listed under the "Targets" section of DrugBank were compiled. Through this process, 131 unique genes were identified.
Iterative ranking and generation of GET lists
We refined the candidate gene lists through an iterative process employing the BEERE tool for gene prioritization and expansion. A customized pipeline was developed, building upon the framework established by the GRIPPs method (Gong and Chen 2023). Using the previously compiled G, E, and T lists, we integrated the data into three distinct lists for gene prioritization. The first, the GET list, provided a comprehensive ranking by combining all three categories: mutational frequency, differential expression, and established drug targets. The second, the GT list, focused on genes that are both frequently mutated and recognized as established drug targets, offering insights into genes with high mutation frequencies and functional relevance in pancreatic cancer. Lastly, the prioritized E list was independently ranked using BEERE to assess and prioritize genes based solely on their differential expression. This systematic approach provided a multifaceted analysis of candidate genes, enhancing the identification of potential therapeutic targets.
Each list underwent a standardized processing workflow. Initially, BEERE was employed to expand each list by incorporating genes from the nearest-neighbor network of protein-protein interactions derived from the HAPPI 2.0 database. This expansion enhanced the comprehensiveness and quality of the lists. Following gene expansion, BEERE’s network and significance ranking algorithms were applied to prioritize the lists, generating statistically significant rankings of candidate genes. Subsequently, each list was heuristically filtered to retain only the top 500 genes before undergoing additional rounds of refinement. This iterative process was repeated three times, as further iterations led to convergence of the lists into a single pool, diminishing their differentiation. At three iterations, the lists retained distinct gene rankings and compositions. The refined lists were then integrated to create the “Initial GET List,” which underwent further prioritization and expansion via BEERE to produce the “Final GET List.” Additionally, two derivative lists were constructed: the “GT List,” which combined the Genes and Targets lists followed by BEERE-based expansion and prioritization, and the “Expression List,” which consisted of the refined E list used for further analyses.
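The expand, rank, and truncate cycle described above can be outlined as a short loop. The `expand` and `rank` arguments below are hypothetical stand-ins for the network tool (a toy neighbor lookup and an in-list degree count), not BEERE's algorithms; only the loop structure, three rounds with a fixed cutoff per round, follows the text. The demo uses a cutoff of 3 instead of 500 so the toy network is large enough to show truncation.

```python
def refine_list(seed_genes, expand, rank, rounds=3, keep=500):
    """Repeat expand -> rank -> truncate and return the final ordered list."""
    ordered = sorted(seed_genes)
    for _ in range(rounds):
        candidates = expand(set(ordered))
        scores = rank(candidates)
        # Highest score first; gene name breaks ties for a stable order.
        ordered = sorted(candidates, key=lambda g: (-scores[g], g))[:keep]
    return ordered


# Toy network used by the stand-in functions (hypothetical interactions).
TOY_NETWORK = {
    "G1": {"G2", "G3"},
    "G2": {"G1", "G3", "G4"},
    "G3": {"G1", "G2"},
    "G4": {"G2", "G5"},
    "G5": {"G4"},
}


def toy_expand(genes):
    """Add the direct neighbors of every gene in the current list."""
    expanded = set(genes)
    for gene in genes:
        expanded |= TOY_NETWORK.get(gene, set())
    return expanded


def toy_rank(genes):
    """Score each gene by how many of its neighbors are also candidates."""
    return {g: len(TOY_NETWORK.get(g, set()) & genes) for g in genes}


final = refine_list(["G1"], toy_expand, toy_rank, rounds=3, keep=3)
print(final)
```

Because truncation happens after every round, the cutoff bounds how far the list can drift from its seeds, which is consistent with the observation that additional rounds make the lists converge.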
Annotation of GET list: Assessment of candidate genes based on clinical trials
Clinical trials play a crucial role in evaluating the efficacy of targeting specific genes. To assess the clinical relevance of each gene, clinical trial frequency was used as a metric for clinical trial popularity. Genes targeted by drugs cited in pancreatic cancer treatment were manually compiled. This process involved querying the term “Pancreatic Cancer” on ClinicalTrials.gov and collecting all drugs listed for clinical pancreatic cancer treatment. The corresponding target genes for these drugs were identified by querying the "targets" section in DrugBank. In total, 357 drugs targeting 253 genes were compiled. These genes were then annotated with BEERE scores generated through the previously described methodology. Additionally, raw, quantifiable genomic data were incorporated into the analysis. This included mutation frequency data sourced from cBioPortal (de Bruijn et al. 2023) and protein expression data from ProteinAtlas (Uhlén et al. 2015), covering key tissues such as the brain, gastrointestinal system, liver, and kidney to account for potential adverse effects of gene inhibition or activation.
GPT-4o aided literature assessment
Recent research has demonstrated that GPT-4o performs "human-like" literature reviews, particularly in screening and analyzing scientific literature (Khraisha et al. 2024). For this study, abstracts related to pancreatic cancer genes and treatments were downloaded using PubMed's "save" feature. A total of 5,091 abstracts were collected and uploaded for analysis by GPT-4o through a custom GPT-4o interface. Due to the data processing limitations of GPT-4o, abstracts were filtered to include only meta-analyses, clinical trials, and systematic reviews on PubMed to ensure high-quality input data.
The custom GPT-4o model was configured with specific instructions to rank genes based on a scoring system with a maximum score of 400 points, distributed across four categories: functional significance in pancreatic cancer, research popularity, treatment effectiveness when targeting or inhibiting the gene, and protein structure. Each category was allocated 100 points, and the resulting metric was termed the GPT-4 score. To mitigate GPT-4o's known issue of "hallucination," the generation of inaccurate or nonexistent information, the model was explicitly instructed to base its rankings solely on the uploaded research database. Additionally, the model was required to cite articles referenced during the ranking process and provide explanations for the scores assigned to each gene in every category. GPT-4o outputs were manually verified against curated datasets to ensure biological relevance and mitigate hallucinations. Citations provided by GPT-4o were cross-referenced with PubMed to confirm validity. All cited articles were manually verified, and any errors or hallucinations were addressed by instructing the model to re-search the uploaded literature database for accurate mentions of the gene.
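The scoring scheme above (four categories of up to 100 points each, with citations required) could be held in a small record type such as the one sketched here. The field names, the example values, and the `citations_outside` helper are hypothetical; the helper is only a possible pre-check before the manual verification described in the text, not a replacement for it.

```python
from dataclasses import dataclass, field

CATEGORIES = (
    "functional_significance",
    "research_popularity",
    "treatment_effectiveness",
    "protein_structure",
)


@dataclass
class GeneAssessment:
    gene: str
    points: dict                                   # category -> points, each 0 to 100
    citations: list = field(default_factory=list)  # identifiers cited by the model

    def total(self):
        """Sum the four category scores (maximum 400)."""
        for category in CATEGORIES:
            value = self.points[category]
            if not 0 <= value <= 100:
                raise ValueError(f"{category} out of range for {self.gene}: {value}")
        return sum(self.points[category] for category in CATEGORIES)

    def citations_outside(self, uploaded_ids):
        """Cited identifiers that are absent from the uploaded abstract set."""
        return [c for c in self.citations if c not in uploaded_ids]


# Hypothetical record; the scores and identifiers are placeholders.
example = GeneAssessment(
    gene="GENE_A",
    points={"functional_significance": 80, "research_popularity": 60,
            "treatment_effectiveness": 70, "protein_structure": 50},
    citations=["id-001", "id-999"],
)
example_total = example.total()
suspect = example.citations_outside({"id-001", "id-002"})
```

A citation flagged by the helper would be a natural first candidate for the manual cross-referencing step.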
Comprehensive gene ranking
Following the ranking of the GET list and the compilation of genes targeted in clinical trials, genes were further annotated with relevant biological information. Mutational frequency, a key category in gene ontology ranking (Timar and Kashofer 2020), was assessed alongside copy number alterations (CNA), another critical category in ontology (Beroukhim et al. 2010). Data for these categories were obtained from cBioPortal [30] using the “Pancreatic Cancer (UTSW, Nat Commun 2015)” and “Pancreatic Adenocarcinoma (TCGA, PanCancer Atlas)” studies, both of which utilized whole-exome sequencing for all samples. Tissue-specific expression was also considered a vital factor in gene prioritization (Beroukhim et al. 2010). Genes with high expression in essential tissues, such as the heart, liver, gastrointestinal system, brain, and kidneys, pose a higher risk of adverse effects when targeted, necessitating their de-prioritization. Annotation of tissue expression was performed using the “RNA expression score” provided by ProteinAtlas (Uhlén et al. 2015), a comprehensive database mapping protein expression in various organs. This RNA expression score, manually calculated, measures the RNA expression levels of genes across different tissues.
A weighted score, termed the RP score, was developed to integrate multiple factors into the prioritization process. Spearman correlations were calculated between CNA, mutational frequencies, GET list scores, tissue expression levels, E list scores, GT list scores, and clinical trial popularity. Clinical trial popularity was defined as the number of trials testing drugs targeting specific genes for cancer treatment. This RP score provided a comprehensive and robust metric for ranking genes as potential therapeutic targets.
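The exact formula for the RP score is not given in this section. One plausible reading, sketched below under that assumption, is a weighted linear combination of per-gene features using the weights reported in Table 1. The feature key names are hypothetical, features are assumed to be scaled to a common range beforehand, and a missing feature is treated as zero.

```python
# Weights as reported in Table 1; the dictionary keys are illustrative names.
RP_WEIGHTS = {
    "gt_list_score": 0.329,
    "cna_utsw": 0.201,
    "expression_list_score": 0.088,
    "get_list_score": 0.085,
    "mutation_freq_tcga": 0.079,
    "cna_tcga": 0.048,
    "mutation_freq_utsw": -0.023,
    "brain_expression": -0.054,
    "kidney_expression": -0.081,
    "gastrointestinal_expression": -0.095,
    "liver_expression": -0.101,
}


def rp_score(features, weights=RP_WEIGHTS):
    """Weighted sum of feature values (assumed form; missing features count as 0)."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


# Hypothetical gene with every feature scaled to 1.0.
all_ones = {name: 1.0 for name in RP_WEIGHTS}
score_all_ones = rp_score(all_ones)

# Hypothetical gene that is strong on network evidence but highly expressed in liver.
mixed = rp_score({"gt_list_score": 1.0, "liver_expression": 1.0})
```

Under this form, the negative weights on tissue expression lower the score of genes that are highly expressed in essential organs, matching the de-prioritization rationale given earlier.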
Table 1 lists the weight of each ranking modality, ordered from greatest to least.

Modality of ranking                                       Weight
GT list score                                              0.329
CNA (cBioPortal, UTSW, Nat Commun 2015)                    0.201
Expression list score                                      0.088
GET list score                                             0.085
Mutation frequency (cBioPortal, TCGA PanCancer Atlas)      0.079
CNA (cBioPortal, TCGA PanCancer Atlas)                     0.048
Mutation frequency (cBioPortal, UTSW, Nat Commun 2015)    -0.023
Brain expression score                                    -0.054
Kidney expression score                                   -0.081
Gastrointestinal expression score                         -0.095
Liver expression score                                    -0.101

Table 1: Weights each modality was assigned for calculation of the RP score in GETgene-AI.
Mitigation of Bias and False Positives
To address potential sample biases and data incompleteness, such as studies failing to detect specific genes, multiple datasets from the same databases were utilized wherever possible. This redundancy provided a more comprehensive analysis and reduced the impact of dataset-specific variability. For example, multiple studies within cBioPortal, such as “Pancreatic Cancer (UTSW, Nat Commun 2015)” and “Pancreatic Adenocarcinoma (TCGA, PanCancer Atlas),” were analyzed concurrently to increase the reliability of mutational frequency and CNA data.
To further enhance the accuracy of the prioritization process, each gene within the top 250 ranked by RP score was manually verified through a literature review to confirm its role in cancer biology. This step was critical in identifying and eliminating false positives. Notably, no genes within the top 250 were found to be false positives, supporting the robustness of the RP scoring methodology.
Additionally, hallucination errors from GPT-4o were mitigated through a structured instruction approach. The model was instructed to explicitly cite sources used in the calculation of each gene’s ranking score. These citations were manually evaluated for accuracy and relevance, ensuring that the ranking process was grounded in verifiable scientific evidence. This dual-layered validation, automated scoring combined with manual review, was integral to maintaining the integrity and reliability of the gene prioritization framework.
Statistical Methods
Spearman correlation coefficients were computed to assess the alignment of GPT-4o rankings with network-derived rankings. The Spearman correlation between the GPT-4 score and the Weighted Score was 0.291, indicating a weak positive correlation. The GPT-4 score correlated more strongly with each of the BEERE list ranking scores: 0.478 with the Expression list score, 0.457 with the combined weighted score of all BEERE lists, 0.454 with the GET list score, and 0.444 with the GT list score. These results indicate that the GPT-4 score more closely resembles the output of standard network prioritization techniques, which may be a result of the training data utilized.
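For reference, Spearman's coefficient is the Pearson correlation of the rank-transformed values. The sketch below is a dependency-free illustration with average ranks for ties; it is not the software used in the study, and the function names and input lists are placeholders.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder scores for five genes under two ranking methods.
rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Because only ranks enter the calculation, the coefficient is unaffected by the different scales of the GPT-4 score (0 to 400) and the network-derived scores.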
Comparing research relevance to rank on GETgene-AI
To compare the research popularity of each gene with its ranking under both the GPT-4 score and the RP score, the number of results returned by PubMed for the search “Gene name Pancreatic Cancer” was compiled and used to calculate the GPT-LIT score and the RP-LIT score. The GPT-LIT score is the GPT-4 score divided by the number of publications on PubMed, while the RP-LIT score is the RP score divided by the number of publications on PubMed. Genes with no functional relationship to cancer were excluded from the rankings to remove false positives.
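Both literature-normalized scores are simple ratios, as the short sketch below shows. The handling of a gene with zero PubMed results is not described in the text, so returning `None` in that case is an assumption, and the example numbers are placeholders.

```python
def lit_score(score, publication_count):
    """Score per publication; None when there are no publications (assumed handling)."""
    if publication_count <= 0:
        return None
    return score / publication_count


# Hypothetical gene: GPT-4 score of 300, RP score of 0.45, and 150 PubMed results.
gpt_lit = lit_score(300, 150)
rp_lit = lit_score(0.45, 150)
no_literature = lit_score(300, 0)
```

A high ratio points to a gene that ranks well relative to how much has been published about it, which is the situation in which a prioritization method adds information beyond research popularity.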
Results
Enhancement provided by AI
GPT-4o was employed to conduct a literature assessment for our gene list, though its output was not included in the final weighted score. Nonetheless, the GPT-4 score showed positive correlations with both the weighted score and all three GET list scores. GPT-4o prioritized genes such as MYC and SRC, reflecting their prominence in the literature, which complemented GETgene-AI’s reliance on network and mutational analysis. To minimize false positives in the GPT-4o scoring process, we instructed GPT-4o to cite articles directly from the uploaded literature database. Although GPT-4o did not demonstrate a higher rate of experimental validation, it reduced the time required for the literature review by 80%. The cited articles were subsequently manually verified. The RP-LIT score and the GPT-4 score are highly correlated, with very similar rankings for each gene. The Spearman correlation between the GPT-4 score (out of 400) and the combined weighted score of the BEERE lists was +0.457, indicating a moderate positive correlation.
Table 2 shows the ranking differences between the GPT-4 score and the GET ranking score.

Gene     GPT-4 score ranking   GET ranking   Experimental validation?   Citation
MYC      1                     2             Yes                        (Huan Zhang et al. 2024)
SRC      2                     3             Yes                        (Su et al. 2023)
EGFR     3                     4             Yes                        (F. Wu et al. 2023)
TERT     4                     27            Yes                        (Campa et al. 2015)
RRM2     5                     21            Yes                        (Li et al. 2022)
PIK3CA   6                     1             Yes                        (Payne et al. 2015)
TOP2A    7                     16            Yes                        (Pei, Yin, and Liu 2018)
NTRK1    8                     22            Yes                        (Cheng et al. 2013)
PTGS2    9                     25            Yes                        (Hingorani et al. 2003)
EGF      10                    30            Yes                        (Sheng et al. 2020)
CDK1     11                    5             Yes                        (J. Huang et al. 2021)
MAPK1    12                    10            Yes                        (Si et al. 2023)
KRAS     13                    13            Yes                        (Timar and Kashofer 2020)
MTOR     14                    11            Yes                        (Stanciu et al. 2022)
MSLN     15                    37            Yes                        (J. Hu et al. 2024)
RET      16                    28            Yes                        (Bhamidipati et al. 2023)
AKT1     17                    31            Yes                        (Arasanz et al. 2019)
JAK2     18                    9             Yes                        (B. Huang, Lang, and Li 2022)
MET      19                    34            Yes                        (Pothula et al. 2020)
PDCD1    20                    38            Yes                        (Marabelle et al. 2020)

Table 2: The 20 highest-ranked genes by GPT-4 score, compared with their ranks in GET and their status as experimentally validated drug targets.
356
Comparing GETgene-AI to other frameworks
We benchmarked GETgene-AI against two other frameworks: one focused on differential expression and the other on network-based prioritization. GEO2R was chosen for the differential expression comparison, using the GSE28735 dataset that is also incorporated into the 'Expression list' component of our GET lists. Genes were sorted by log-fold change (log-FC), which represents the difference in gene expression between tumor and non-tumor groups. In the GEO2R list, the top-ranked genes were PNLIPRP1 and PNLIPRP2, both pancreatic lipase-related proteins critical for digestion and fat absorption (G. Zhu et al. 2021) but not considered viable targets for pancreatic cancer. The third-ranked gene, IAPP (islet amyloid polypeptide), has been shown not to function as a tumor suppressor, and loss of IAPP signaling is not linked to pancreatic cancer (Taylor et al. 2023). Of the top 50 genes in GEO2R, 30 (60%) were experimentally validated for relevance to pancreatic cancer. In comparison, GETgene-AI identified 49 experimentally validated targets in its top 50 (98%), an improvement of 38 percentage points over GEO2R. Because GEO2R does not analyze mutational frequency, functional impact, network context, or adverse effects, it falls short in drug target discovery. In contrast, GETgene-AI benefits from statistical filtering and the incorporation of genomic information, which improve both the efficiency of the analysis and the quality of the genes in each list.
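The benchmark figures in this section amount to a precision-at-k calculation. The following is a minimal sketch of that arithmetic, using only the counts reported in this section (30, 46, and 49 validated genes out of the top 50 for GEO2R, STRING, and GETgene-AI). The function names are illustrative assumptions and are not part of GETgene-AI. The sketch reports the gap both in percentage points and as a relative gain, because the two are easily conflated.

```python
def precision_at_k(n_validated: int, k: int = 50) -> float:
    """Fraction of the top-k genes that are experimentally validated."""
    return n_validated / k


def compare(baseline: int, candidate: int, k: int = 50) -> dict:
    """Gap between two frameworks' validated counts within the top k."""
    gain = candidate - baseline
    return {
        # Absolute difference between the two percentages.
        "percentage_points": round(100 * gain / k, 1),
        # Gain expressed relative to the baseline's validated count.
        "relative_percent": round(100 * gain / baseline, 1),
    }


# Validated genes in the top 50, as reported in this section.
validated = {"GEO2R": 30, "STRING": 46, "GETgene-AI": 49}

for name, count in validated.items():
    print(f"{name}: {precision_at_k(count):.0%} validated in top 50")

print(compare(validated["GEO2R"], validated["GETgene-AI"]))
print(compare(validated["STRING"], validated["GETgene-AI"]))
```

Under these counts, the gap between GETgene-AI and GEO2R is 38 percentage points, and the gap between GETgene-AI and STRING is 6 percentage points.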

Figure 2: Volcano plot for GSE28735, comprising microarray gene-expression profiles of 45 matched pairs of tumor and non-tumor samples (Padj < 0.05). Blue indicates downregulated genes and red indicates upregulated genes.

For the network-based framework, we used STRING, a database that integrates protein-protein interactions (Szklarczyk et al. 2023), and focused on the KEGG pathway hsa05212 (M. Kanehisa and Goto 2000; Minoru Kanehisa et al. 2025; Minoru Kanehisa 2019). To rank the genes, we exported the list and sorted it by node degree, which measures the number of interactions a protein has within the network (Bozhilova et al. 2019). The highest-ranked gene in the STRING list was AKT1, a protein kinase that stimulates cell growth and proliferation (Grassilli et al. 2020). However, pancreatic cancer cells have been found to adapt to AKT1 inhibition by switching their metabolism from glycolysis to mitochondrial respiration (Arasanz et al. 2019), and AKT1 has a low mutational frequency of just 1% in a sample of 19,784 patients with various tumors (Millis et al. 2016). Because of this low mutational frequency and the challenges associated with inhibiting AKT1, it was ranked 33rd in GETgene-AI. A literature review of the top 50 genes in the STRING list found that 46 (92%) were experimentally validated for pancreatic cancer, whereas GETgene-AI identified 49 (98%) in its top 50, an improvement of 6 percentage points over STRING. Because STRING does not account for mutational frequency and other important factors in drug target identification, its focus is narrower: it identified only 81 targets, compared with GETgene-AI’s more comprehensive analysis.
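Ranking by node degree, as described above, can be sketched in a few lines. This is a minimal illustration and not the export procedure we used with STRING: the function name and the toy edge list are assumptions made for the example, and the edges shown are not real STRING interactions.

```python
from collections import Counter


def rank_by_degree(edges):
    """Rank nodes by degree (number of distinct interaction partners)."""
    # Treat edges as undirected, drop self-loops, and remove duplicates.
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    # Sort by descending degree, breaking ties alphabetically.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Toy edge list for illustration only; these are NOT real STRING edges.
toy_edges = [
    ("AKT1", "TP53"),
    ("AKT1", "KRAS"),
    ("AKT1", "PTEN"),
    ("TP53", "KRAS"),
    ("KRAS", "AKT1"),  # duplicate of AKT1-KRAS in the other direction
]

for gene, deg in rank_by_degree(toy_edges):
    print(gene, deg)
```

Degree counts only how connected a protein is within the chosen network, which is why a ranking of this kind carries no information about mutational frequency or druggability.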
Figure 3: Network constructed by STRING from the KEGG pathway hsa05212. The content inside each node is the known or predicted 3D structure of the protein. Turquoise edges denote protein-protein interactions from curated databases, and purple edges denote experimentally determined interactions. Green, red, and dark blue edges indicate predicted protein-protein interactions. Light green edges represent text mining, black edges represent co-expression, and light purple edges represent protein homology.

Table 3 shows the ranking overlap among the top 15 genes from all three frameworks. The 15 highest-ranked targets in both GETgene-AI and STRING have all been experimentally validated in pancreatic cancer, whereas 8 of the 15 highest-ranked targets from the GEO2R approach have not.

GETgene-AI top genes  Experimentally validated?  STRING top genes  Experimentally validated?  GEO2R top genes  Experimentally validated?
PIK3CA                Yes                        AKT1              Yes                        PNLIPRP1         No
MYC                   Yes                        TP53              Yes                        PNLIPRP2         No
SRC                   Yes                        KRAS              Yes                        IAPP             No
EGFR                  Yes                        PTEN              Yes                        CTRC             No
CDK1                  Yes                        SRC               Yes                        GP2              Yes
PRKCA                 Yes                        STAT3             Yes                        CEL              No
TNF                   Yes                        EGFR              Yes                        CPA2             Yes
LCK                   Yes                        MTOR              Yes                        ALB              Yes
JAK2                  Yes                        BCL2              Yes                        CUZD1            Yes
MAPK1                 Yes                        PIK3CA            Yes                        ERP27            No
MTOR                  Yes                        CDKN2A            Yes                        CLPS             Yes
AURKB                 Yes                        HRAS              Yes                        SERPINI2         Yes
KRAS                  Yes                        CCND1             Yes                        PLA2G1B          Yes
MAPK8                 Yes                        NFKB1             Yes                        CELA2A           No
TOP2A                 Yes                        CDKN1A            Yes                        CELA2B           No

Table 3: Top 15 genes from GETgene-AI, STRING, and GEO2R and their status as experimentally validated drug targets.
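The overlap summarized in Table 3 can be checked with simple set operations. The sketch below transcribes the three top-15 lists and the GEO2R validation flags from Table 3; the variable and function names are illustrative assumptions rather than part of any of the three frameworks.

```python
# Top-15 lists transcribed from Table 3.
getgene_ai = ["PIK3CA", "MYC", "SRC", "EGFR", "CDK1", "PRKCA", "TNF", "LCK",
              "JAK2", "MAPK1", "MTOR", "AURKB", "KRAS", "MAPK8", "TOP2A"]
string_top = ["AKT1", "TP53", "KRAS", "PTEN", "SRC", "STAT3", "EGFR", "MTOR",
              "BCL2", "PIK3CA", "CDKN2A", "HRAS", "CCND1", "NFKB1", "CDKN1A"]
geo2r_top = ["PNLIPRP1", "PNLIPRP2", "IAPP", "CTRC", "GP2", "CEL", "CPA2",
             "ALB", "CUZD1", "ERP27", "CLPS", "SERPINI2", "PLA2G1B",
             "CELA2A", "CELA2B"]

# GEO2R genes marked "No" in the validation column of Table 3.
geo2r_unvalidated = {"PNLIPRP1", "PNLIPRP2", "IAPP", "CTRC", "CEL",
                     "ERP27", "CELA2A", "CELA2B"}


def shared(a, b):
    """Genes present in both lists, in alphabetical order."""
    return sorted(set(a) & set(b))


print("GETgene-AI and STRING:", shared(getgene_ai, string_top))
print("GETgene-AI and GEO2R:", shared(getgene_ai, geo2r_top))
print("STRING and GEO2R:", shared(string_top, geo2r_top))
print("Unvalidated GEO2R genes:", len(geo2r_unvalidated & set(geo2r_top)))
```

On these lists, GETgene-AI and STRING share five genes (EGFR, KRAS, MTOR, PIK3CA, and SRC), while the GEO2R list shares none with either.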

Figure 4: Bar graph displaying the percentage of experimentally validated targets among the top 50 genes for each framework. [Bar chart; x-axis: GEO2R, STRING, GETgene-AI; y-axis: percentage of experimentally validated targets (out of top 50).]
GETgene-AI Rankings

Gene    Weighted score  CHAT GPT score  GT list score  Mutation frequency  RP-LIT score  GPT-LIT score  GET list score  Expression list score
PIK3CA  34.8            310             58.7           2.8                 0.199         1.771          96              97
MYC     30.1            330             9.5            0.0                 0.032         0.349          214             210
SRC     20.0            320             0.0            1.1                 0.044         0.711          143             144
EGFR    18.2            320             2.4            0.6                 0.010         0.171          134             133
CDK1    15.9            305             15.3           65.4                0.134         2.563          30              7
PRKCA   15.3            305             3.0            0.0                 1.702         25.556         101             102
TNF     12.1            270             2.4            0.0                 0.013         0.292          83              86
LCK     11.5            220             1.7            0.0                 1.274         24.444         62              60
JAK2    10.6            285             1.0            0.6                 0.082         2.192          67              67
MAPK1   10.3            305             11.6           3.4                 0.139         4.122          7               7
AURKB   9.1             295             0.0            0.6                 0.008         0.246          70              70
KRAS    8.7             220             1.7            1.7                 0.335         8.462          48              47
MAPK8   7.8             295             0.0            0.0                 0.002         0.068          121             117
MTOR    7.1             220             1.7            0.0                 0.588         18.333         52              52
ITGA4   6.9             220             4.3            0.6                 2.298         73.333         40              37
TOP2A   6.9             310             10.2           1.1                 0.215         9.688          0               0
CHEK1   6.7             220             1.7            0.0                 0.128         4.231          46              45
BCL2    6.2             220             1.7            0.6                 0.012         0.418          41              41
PRKCB   6.0             250             1.4            0.6                 1.004         41.667         60              58
ERBB4   5.5             220             3.4            0.6                 0.184         7.333          81              83
Mutation frequency is taken from the cBioPortal TCGA PanCancer Atlas.
Table 4: The 20 highest-ranked genes in GETgene-AI. The weighted score is the RP score, and the CHAT GPT score is the GPT-4 score.

Validation of gene rankings

During the iterative ranking process, genes with no functional relevance to cancer were systematically excluded. For example, genes that ranked highly because of algorithmic artifacts but lacked experimental validation or literature support were deprioritized.
PIK3CA is the highest-ranked gene on the list. It encodes a catalytic subunit of the enzyme PI3K, which plays a crucial role in cell growth, metabolism, proliferation, and apoptosis (Conway et al. 2019). In addition, PIK3CA regulates various downstream effectors such as AKT and mTOR (Ala 2022). In preclinical studies, mutations in PIK3CA have been shown to make cancers highly sensitive to dual therapy with PI3K/mTOR inhibitors (Huayu Zhang et al. 2021), further emphasizing its potential as a key drug target. Moreover, PIK3CA-null tumors have demonstrated increased susceptibility to T-cell surveillance in vitro (Sivaram et al. 2019). In pancreatic cancer, PIK3CA mutations have been shown to initiate tumorigenesis in experimental studies (Payne et al. 2015), underscoring the gene's importance in both cancer progression and therapeutic targeting.
The next highest-ranked gene on the list, MYC, is an established target in pancreatic cancer. MYC ranks highly because of its top GET list score, which reflects its high network centrality among the top 500 genes by expression, clinical relevance, and mutational frequency. Overexpression of c-MYC, which binds to the promoters of various genes, is a key marker of aggressive pancreatic cancer (Hayashi, Hong, and Iacobuzio-Donahue 2021). Although MYC plays a crucial regulatory role in pancreatic cancer, its protein structure makes it difficult to target. Recent small-molecule drugs have nevertheless shown high efficacy in preclinical studies (Ala 2022). Despite these promising developments, MYC's inherent targeting difficulties result in a relatively low GT list score.
SRC is the third-highest-ranked gene on our list, owing to its high GET list and Expression list scores. Inhibition of SRC in pancreatic cancer has been shown in both in vitro and in vivo studies to reactivate pyroptosis and thereby reverse chemoresistance (Su et al. 2023). Additionally, aberrant SRC activity promotes tumorigenesis and is frequently associated with poor prognosis in pancreatic ductal adenocarcinoma (PDAC) (Poh and Ernst 2023). Several SRC-targeting cancer drugs have been developed and are under clinical investigation (Hilbig 2008).
EGFR is the fourth-highest-ranked gene, owing to its high GET list and Expression list scores. EGFR also plays a role in tumorigenesis, mostly in lung and breast cancer (Sigismund, Avanzato, and Lanzetti 2018). Anti-EGFR agents have shown significant clinical promise despite adverse effects (Verma et al. 2020).
KRAS ranks twelfth on our list despite its prominence in research, with over 4,545 articles on KRAS mutations in pancreatic cancer available on PubMed. Its lower ranking is attributed to a low expression score. The KRAS oncogene plays a critical role in the initiation and maintenance of pancreatic tumors (Luo 2021). KRAS mutations are present in over 90% of pancreatic ductal adenocarcinoma (PDAC) cases, but therapeutic inhibition remains highly challenging, and inhibitors have only recently been discovered (Bannoura et al. 2021).
CDK1 ranks fifth on our list, largely because of its high GET list and Expression list scores. CDK1 is strongly correlated with prognosis and is highly expressed in pancreatic cancer tissue, as well as in response to gemcitabine, an approved pancreatic cancer drug (Xu et al. 2023). Additionally, inhibition of CDK1, together with CDK2 and CDK5, has been shown to overcome IFNG-triggered acquired resistance in pancreatic tumor immunity (J. Huang et al. 2021).
PRKCA ranks seventh on our list. It encodes protein kinase C alpha and is mutated in various cancers. PRKCA’s high ranking is attributed to its strong GET list and Expression list scores, as well as its extremely low organ expression score. It is strongly associated with activation of the protein translation initiation pathway (Rosenberg et al. 2018), and its mutation is a hallmark of chordoid gliomas (Jiang et al. 2019). PRKCA also contributes to pancreatic cancer susceptibility through the peroxisome proliferator-activated receptor (PPAR) signaling pathway, which plays a key role in the development and progression of pancreatic cancer (Liu et al. 2020). Inhibition of PRKCA has shown antitumor activity in patients with advanced non-small cell lung cancer (NSCLC) (Villalona-Calero et al. 2004).
TNF is the eighth-highest-ranked gene on our list. Upregulation of TNF (tumor necrosis factor) is associated with invasion and immunomodulation in pancreatic cancer (Wiedmann et al. 2023). TNF-mutated macrophages have also been shown to push cancer cells toward more aggressive behaviors through lineage reprogramming (Tu et al. 2021).
LCK is the ninth-highest-ranked gene on our list. LCK is expressed in tumor cells and is a key gene in the development of T cells (Bommhardt, Schraven, and Simeoni 2019). High LCK protein expression has been associated with improved patient survival in cancer (Cancer Genome Atlas Network 2015). As of May 2024, only four publications on PubMed address LCK in relation to pancreatic cancer. The identification of LCK as a high-priority target demonstrates GETgene-AI’s capability to identify genes with strong biological relevance but lower literature prominence.
ITGA4 is ranked 15th on our list and has an extremely low organ expression score. Only four articles on PubMed discuss its role in pancreatic cancer. ITGA4 has the potential to be an independent prognostic indicator for patient survival and has been linked to the PI3K/AKT pathway (Faleiro et al. 2021). The identification of ITGA4 as a high-priority target demonstrates GETgene-AI’s capability to identify genes with strong biological relevance but lower literature prominence.
KCNA is the 34th-highest-ranked gene on our list. KCNA is notable because no PubMed publications describe its relation to pancreatic cancer, and only three PubMed publications have ever mentioned its relation to cancer in general. The identification of KCNA as a high-priority target demonstrates GETgene-AI’s capability to identify genes with strong biological relevance but lower literature prominence. Differentially high KCNA expression is observed in stomach and lung cancers and correlates positively with immune cell infiltration and survival rate (Angi et al. 2023).
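The notion of a gene with a high rank but low literature prominence can be made concrete with a simple filter. The sketch below is illustrative only and is not part of the GETgene-AI pipeline: the record type, the function name, and both thresholds (a rank cutoff of 50 and fewer than 10 PubMed articles) are assumptions chosen for the example. The ranks and article counts are those quoted in this section, with the KRAS count being the lower bound stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    symbol: str
    rank: int         # rank in the GETgene-AI list, as quoted in the text
    pubmed_hits: int  # pancreatic cancer articles on PubMed, as quoted


def understudied(records, max_rank=50, max_hits=10):
    """Genes ranked within max_rank that have fewer than max_hits articles.

    Both thresholds are illustrative assumptions.
    """
    picked = [r for r in records
              if r.rank <= max_rank and r.pubmed_hits < max_hits]
    return sorted(picked, key=lambda r: r.rank)


records = [
    GeneRecord("KRAS", 12, 4545),  # "over 4,545" articles; lower bound
    GeneRecord("LCK", 9, 4),
    GeneRecord("ITGA4", 15, 4),
    GeneRecord("KCNA", 34, 0),
]

for record in understudied(records):
    print(record.symbol, record.rank, record.pubmed_hits)
```

With these inputs the filter keeps LCK, ITGA4, and KCNA and drops KRAS, matching the genes singled out above as highly ranked but sparsely studied.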
False Positives and Limitations
False positives are an inherent risk in large-scale computational analyses. By incorporating iterative refinement and excluding genes without functional or experimental support, the GETgene-AI framework minimizes such risks. Future validation efforts will focus on further refining these rankings through experimental studies. The literature assessment provided by generative AI will also improve as AI progresses and as our model is trained on more experimental data to minimize hallucinations.
To mitigate false positives, genes with no functional relevance were systematically excluded. For example, genes that ranked highly because of algorithmic artifacts but lacked experimental validation or literature support were deprioritized. Genes such as ITGA4 and PRKCB, each of which has fewer than 10 research articles on PubMed about its role in pancreatic cancer, were ranked lower than many popular targets due to their low scores in the GET, GT, and Expression lists as a
References

Ala, Moein. 2022. “Target C-Myc to Treat Pancreatic Cancer.” Cancer Biology & Therapy 23 (1): 34–50. https://doi.org/10.1080/15384047.2021.2017223.
Angi, Beatrice, Silvia Muccioli, Ildikò Szabò, and Luigi Leanza. 2023. “A Meta-Analysis Study to Infer Voltage-Gated K+ Channels Prognostic Value in Different Cancer Types.” Antioxidants (Basel, Switzerland) 12 (3): 573. https://doi.org/10.3390/antiox12030573.
Arasanz, H., M. Zuazo, E. Santamaría, A. I. Bocanegra, M. Gato-Cañas, G. Fernández-Hinojal, C. Hernández-Saez, et al. 2019. “Adaption of Pancreatic Cancer Cells to AKT1 Inhibition Induces the Acquisition of Cancer Stem-Cell like Phenotype through Upregulation of Mitochondrial Functions.” Annals of Oncology 30 (October):v11. https://doi.org/10.1093/annonc/mdz238.036.
Bannoura, Sahar F., Md. Hafiz Uddin, Misako Nagasaka, Farzeen Fazili, Mohammed Najeeb Al-Hallak, Philip A. Philip, Bassel El-Rayes, and Asfar S. Azmi. 2021. “Targeting KRAS in Pancreatic Cancer: New Drugs on the Horizon.” Cancer and Metastasis Reviews 40 (3): 819–35. https://doi.org/10.1007/s10555-021-09990-2.
Beroukhim, Rameen, Craig H. Mermel, Dale Porter, Guo Wei, Soumya Raychaudhuri, Jerry Donovan, Jordi Barretina, et al. 2010. “The Landscape of Somatic Copy-Number Alteration across Human Cancers.” Nature 463 (7283): 899–905. https://doi.org/10.1038/nature08822.
Bhamidipati, Deepak, Sireesha Yedururi, Jason Huse, Sri Veda Chinapuvvula, Jie Wu, and Vivek Subbiah. 2023. “Exceptional Responses to Selpercatinib in RET Fusion-Driven Metastatic Pancreatic Cancer.” JCO Precision Oncology 7 (September):e2300252. https://doi.org/10.1200/PO.23.00252.
Bozhilova, Lyuba V., Alan V. Whitmore, Jonny Wray, Gesine Reinert, and Charlotte M. Deane. 2019. “Measuring Rank Robustness in Scored Protein Interaction Networks.” BMC Bioinformatics 20 (1): 446. https://doi.org/10.1186/s12859-019-3036-6.
Bruijn, Ino de, Ritika Kundra, Brooke Mastrogiacomo, Thinh Ngoc Tran, Luke Sikina, Tali Mazor, Xiang Li, et al. 2023. “Analysis and Visualization of Longitudinal Genomic and Clinical Data from the AACR Project GENIE Biopharma Collaborative in cBioPortal.” Cancer Research 83 (23): 3861–67. https://doi.org/10.1158/0008-5472.CAN-23-0816.
Campa, Daniele, Cosmeri Rizzato, Rachael Stolzenberg-Solomon, Paola Pacetti, Pavel Vodicka, Sean P. Cleary, Gabriele Capurso, et al. 2015. “TERT Gene Harbors Multiple Variants Associated with Pancreatic Cancer Susceptibility.” International Journal of Cancer 137 (9): 2175–83. https://doi.org/10.1002/ijc.29590.
Cancer Genome Atlas Network. 2015. “Genomic Classification of Cutaneous Melanoma.” Cell 161 (7): 1681–96. https://doi.org/10.1016/j.cell.2015.05.044.
Chen, Jake Y., Ragini Pandey, and Thanh M. Nguyen. 2017. “HAPPI-2: A Comprehensive and High-Quality Map of Human Annotated and Predicted Protein Interactions.” BMC Genomics 18 (1): 182. https://doi.org/10.1186/s12864-017-3512-1.
Chen, Jake Yue, SudhaRani Mamidipalli, and Tianxiao Huan. 2009. “HAPPI: An Online Database of Comprehensive Human Annotated and Predicted Protein Interactions.” BMC Genomics 10 (1): S16. https://doi.org/10.1186/1471-2164-10-S1-S16.
Cheng, Yao, Dong-mei Diao, Hao Zhang, Yong-Chun Song, and Cheng-Xue Dang. 2013. “Proliferation Enhanced by NGF-NTRK1 Signaling Makes Pancreatic Cancer Cells More Sensitive to 2DG-Induced Apoptosis.” International Journal of Medical Sciences 10 (5): 634–40. https://doi.org/10.7150/ijms.5547.
Chowbina, Sudhir R., Xiaogang Wu, Fan Zhang, Peter M. Li, Ragini Pandey, Harini N. Kasamsetty, and Jake Y. Chen. 2009. “HPD: An Online Integrated Human Pathway Database Enabling Systems Biology Studies.” BMC Bioinformatics 10 (11): S5. https://doi.org/10.1186/1471-2105-10-S11-S5.
Conway, James Rw, David Herrmann, Tr Jeffry Evans, Jennifer P. Morton, and Paul Timpson. 2019. “Combating Pancreatic Cancer with PI3K Pathway Inhibitors in the Era of Personalised Medicine.” Gut 68 (4): 742–58. https://doi.org/10.1136/gutjnl-2018-316822.
Faleiro, Inês, Vânia Palma Roberto, Secil Demirkol Canli, Nicolas A. Fraunhoffer, Juan Iovanna, Ali Osmay Gure, Wolfgang Link, and Pedro Castelo-Branco. 2021. “DNA Methylation of PI3K/AKT Pathway-Related Genes Predicts Outcome in Patients with Pancreatic Cancer: A Comprehensive Bioinformatics-Based Study.” Cancers 13 (24): 6354. https://doi.org/10.3390/cancers13246354.
Gong, Eric, and Jake Y. Chen. 2023. “Prioritizing Complex Disease Genes from Heterogeneous Public Databases.” Bioinformatics. https://doi.org/10.1101/2023.02.09.527562.
Grassilli, Silvia, Federica Brugnoli, Rossano Lattanzio, Simonetta Buglioni, and Valeria Bertagnolo. 2020. “Vav1 Down-Modulates Akt2 Expression in Cells from Pancreatic Ductal Adenocarcinoma: Nuclear Vav1 as a Potential Regulator of Akt Related Malignancy in Pancreatic Cancer.” Biomedicines 8 (10): 379. https://doi.org/10.3390/biomedicines8100379.
Gu, Long, Robert J. Hickey, and Linda H. Malkas. 2023. “Therapeutic Targeting of DNA Replication Stress in Cancer.” Genes 14 (7): 1346. https://doi.org/10.3390/genes14071346.
Hayashi, Akimasa, Jungeui Hong, and Christine A. Iacobuzio-Donahue. 2021. “The Pancreatic Cancer Genome Revisited.” Nature Reviews Gastroenterology & Hepatology 18 (7): 469–81. https://doi.org/10.1038/s41575-021-00463-z.
Hilbig, Andreas. 2008. “Src Kinase and Pancreatic Cancer.” In Pancreatic Cancer, 177:179–85. Recent Results in Cancer Research. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-71279-4_19.
Hingorani, Sunil R., Emanuel F. Petricoin, Anirban Maitra, Vinodh Rajapakse, Catrina King, Michael A. Jacobetz, Sally Ross, et al. 2003. “Preinvasive and Invasive Ductal Pancreatic Cancer and Its Early Detection in the Mouse.” Cancer Cell 4 (6): 437–50. https://doi.org/10.1016/s1535-6108(03)00309-x.
Hu, Jian-Xiong, Cheng-Fei Zhao, Wen-Biao Chen, Qi-Cai Liu, Qu-Wen Li, Yan-Ya Lin, and Feng Gao. 2021. “Pancreatic Cancer: A Review of Epidemiology, Trend, and Risk Factors.” World Journal of Gastroenterology 27 (27): 4298–4321. https://doi.org/10.3748/wjg.v27.i27.4298.
Hu, Jili, Jia Wang, Xu Guo, Qing Fan, Xinming Li, Kai Li, Zhuoyin Wang, et al. 2024. “MSLN Induced EMT, Cancer Stem Cell Traits and Chemotherapy Resistance of Pancreatic Cancer Cells.” Heliyon 10 (8): e29210. https://doi.org/10.1016/j.heliyon.2024.e29210.
Huan, Tianxiao, Xiaogang Wu, and Jake Y. Chen. 2010. “Systems Biology Visualization Tools for Drug Target Discovery.” Expert Opinion on Drug Discovery 5 (5): 425–39. https://doi.org/10.1517/17460441003725102.
Huang, Bei, Xiaoling Lang, and Xihong Li. 2022. “The Role of IL-6/JAK2/STAT3 Signaling Pathway in Cancers.” Frontiers in Oncology 12:1023177. https://doi.org/10.3389/fonc.2022.1023177.
Huang, Hui, Xiaogang Wu, Madhankumar Sonachalam, Sammed N. Mandape, Ragini Pandey, Karl F. MacDorman, Ping Wan, and Jake Y. Chen. 2012. “PAGED: A Pathway and Gene-Set Enrichment Database to Enable Molecular Phenotype Discoveries.” BMC Bioinformatics 13 (15): S2. https://doi.org/10.1186/1471-2105-13-S15-S2.
Huang, Jin, Pan Chen, Ke Liu, Jiao Liu, Borong Zhou, Runliu Wu, Qiu Peng, et al. 2021. “CDK1/2/5 Inhibition Overcomes IFNG-Mediated Adaptive Immune Resistance in Pancreatic Cancer.” Gut 70 (5): 890–99. https://doi.org/10.1136/gutjnl-2019-320441.
Jiang, Honghong, Qiaofen Fu, Xin Song, Chunlei Ge, Ruilei Li, Zhen Li, Baozhen Zeng, et al. 2019. “HDGF and PRKCA Upregulation Is Associated with a Poor Prognosis in Patients with Lung Adenocarcinoma.” Oncology Letters 18 (5): 4936–46. https://doi.org/10.3892/ol.2019.10812.
Kanehisa, M., and S. Goto. 2000. “KEGG: Kyoto Encyclopedia of Genes and Genomes.” Nucleic Acids Research 28 (1): 27–30. https://doi.org/10.1093/nar/28.1.27.
Kanehisa, Minoru. 2019. “Toward Understanding the Origin and Evolution of Cellular Organisms.” Protein Science: A Publication of the Protein Society 28 (11): 1947–51. https://doi.org/10.1002/pro.3715.
Kanehisa, Minoru, Miho Furumichi, Yoko Sato, Yuriko Matsuura, and Mari Ishiguro-Watanabe. 2025. “KEGG: Biological Systems Database as a Model of the Real World.” Nucleic Acids Research 53 (D1): D672–77. https://doi.org/10.1093/nar/gkae909.
Khraisha, Qusai, Sophie Put, Johanna Kappenberg, Azza Warraitch, and Kristin Hadfield. 2024. “Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT-4’s Efficacy in Screening and Extracting Data from Peer-Reviewed and Grey Literature in Multiple Languages.” Research Synthesis Methods 15 (4): 616–26. https://doi.org/10.1002/jrsm.1715.
Li, Wenyi, Qiwei Chen, Weiwei Gao, and Hui Zeng. 2022. “ARID1A Promotes Chemosensitivity to Gemcitabine in Pancreatic Cancer through Epigenetic Silencing of RRM2.” Die Pharmazie 77 (7): 224–29. https://doi.org/10.1691/ph.2022.1881.
Lim, Bora, Yoshimi Greer, Stanley Lipkowitz, and Naoko Takebe. 2019. “Novel Apoptosis-Inducing Agents for the Treatment of Cancer, a New Arsenal in the Toolbox.” Cancers 11 (8): 1087. https://doi.org/10.3390/cancers11081087.
Lindsay, Mark A. 2003. “Target Discovery.” Nature Reviews Drug Discovery 2 (10): 831–38. https://doi.org/10.1038/nrd1202.
Liu, Xiaowen, Danwen Qian, Hongliang Liu, James L. Abbruzzese, Sheng Luo, Kyle M. Walsh, and Qingyi Wei. 2020. “Genetic Variants of the Peroxisome Proliferator-Activated Receptor (PPAR) Signaling Pathway Genes and Risk of Pancreatic Cancer.” Molecular Carcinogenesis 59 (8): 930–39. https://doi.org/10.1002/mc.23208.
Luo, Ji. 2021. “KRAS Mutation in Pancreatic Cancer.” Seminars in Oncology 48 (1): 10–18. https://doi.org/10.1053/j.seminoncol.2021.02.003.
Marabelle, Aurelien, Dung T. Le, Paolo A. Ascierto, Anna Maria Di Giacomo, Ana De Jesus-Acosta, Jean-Pierre Delord, Ravit Geva, et al. 2020. “Efficacy of Pembrolizumab in Patients With Noncolorectal High Microsatellite Instability/Mismatch Repair-Deficient Cancer: Results From the Phase II KEYNOTE-158 Study.” Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology 38 (1): 1–10. https://doi.org/10.1200/JCO.19.02105.
Millis, Sherri Z., Sadakatsu Ikeda, Sandeep Reddy, Zoran Gatalica, and Razelle Kurzrock. 2016. “Landscape of Phosphatidylinositol-3-Kinase Pathway Alterations Across 19 784 Diverse Solid Tumors.” JAMA Oncology 2 (12): 1565–73. https://doi.org/10.1001/jamaoncol.2016.0891.
Mroz, Edmund A., and James W. Rocco. 2017. “The Challenges of Tumor Genetic Diversity.” Cancer 123 (6): 917–27. https://doi.org/10.1002/cncr.30430.
Paananen, Jussi, and Vittorio Fortino. 2020. “An Omics Perspective on Drug Target Discovery Platforms.” Briefings in Bioinformatics 21 (6): 1937–53. https://doi.org/10.1093/bib/bbz122.
Payne, S. N., M. E. Maher, N. H. Tran, D. R. Van De Hey, T. M. Foley, A. E. Yueh, A. A. Leystra, et al. 2015. “PIK3CA Mutations Can Initiate Pancreatic Tumorigenesis and Are Targetable with PI3K Inhibitors.” Oncogenesis 4 (10): e169–e169. https://doi.org/10.1038/oncsis.2015.28.
Pei, Yao-Fei, Xi-Min Yin, and Xi-Qiang Liu. 2018. “TOP2A Induces Malignant Character of Pancreatic Cancer through Activating β-Catenin Signaling Pathway.” Biochimica Et Biophysica Acta. Molecular Basis of Disease 1864 (1): 197–207. https://doi.org/10.1016/j.bbadis.2017.10.019.
Poh, Ashleigh R., and Matthias Ernst. 2023. “Functional Roles of SRC Signaling in Pancreatic Cancer: Recent Insights Provide Novel Therapeutic Opportunities.” Oncogene 42 (22): 1786–1801. https://doi.org/10.1038/s41388-023-02701-x.
Pothula, Srinivasa P., Zhihong Xu, David Goldstein, Romano C. Pirola, Jeremy S. Wilson, and Minoti V. Apte. 2020. “Targeting HGF/c-MET Axis in Pancreatic Cancer.” International Journal of Molecular Sciences 21 (23): 9170. https://doi.org/10.3390/ijms21239170.
Rosenberg, Shai, Iva Simeonova, Franck Bielle, Maite Verreault, Bertille Bance, Isabelle Le Roux, Mailys Daniau, et al. 2018. “A Recurrent Point Mutation in PRKCA Is a Hallmark of Chordoid Gliomas.” Nature Communications 9 (1): 2371. https://doi.org/10.1038/s41467-018-04622-w.
Santos, Rita, Oleg Ursu, Anna Gaulton, A. Patrícia Bento, Ramesh S. Donadi, Cristian G. Bologa, Anneli Karlsson, et al. 2017. “A Comprehensive Map of Molecular Drug Targets.” Nature Reviews. Drug Discovery 16 (1): 19–34. https://doi.org/10.1038/nrd.2016.230.
Sellers, William R., and David E. Fisher. 1999. “Apoptosis and Cancer Drug Targeting.” Journal of Clinical Investigation 104 (12): 1655–61. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC409892/.
Sheng, Weiwei, Xiaoyang Shi, Yiheng Lin, Jingtong Tang, Chao Jia, Rongxian Cao, Jian Sun, Guosen Wang, Lei Zhou, and Ming Dong. 2020. “Musashi2 Promotes EGF-Induced EMT in Pancreatic Cancer via ZEB1-ERK/MAPK Signaling.” Journal of Experimental & Clinical Cancer Research: CR 39 (1): 16. https://doi.org/10.1186/s13046-020-1521-4.
Si, Hongtao, Ning Zhang, Chang Shi, Zhanjiang Luo, and Senlin Hou. 2023. “Tumor-Suppressive miR-29c Binds to MAPK1 Inhibiting the ERK/MAPK Pathway in Pancreatic Cancer.” Clinical & Translational Oncology: Official Publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico 25 (3): 803–16. https://doi.org/10.1007/s12094-022-02991-9.
Sigismund, Sara, Daniele Avanzato, and Letizia Lanzetti. 2018. “Emerging Functions of the EGFR in Cancer.” Molecular Oncology 12 (1): 3–20. https://doi.org/10.1002/1878-0261.12155.
Singh, Natesh, Philippe Vayer, Shivalika Tanwar, Jean-Luc Poyet, Katya Tsaioun, and Bruno O. Villoutreix. 2023. “Drug Discovery and Development: Introduction to the General Public and Patient Groups.” Frontiers in Drug Discovery 3 (May). https://doi.org/10.3389/fddsv.2023.1201419.
Sivaram, Nithya, Patrick A. McLaughlin, Han V. Han, Oleksi Petrenko, Ya-Ping Jiang, Lisa M. Ballou, Kien Pham, Chen Liu, Adrianus W.M. Van Der Velden, and Richard Z. Lin. 2019. “Tumor-Intrinsic PIK3CA Represses Tumor Immunogenicity in a Model of Pancreatic Cancer.” Journal of Clinical Investigation 129 (8): 3264–76. https://doi.org/10.1172/JCI123540.
Somarelli, Jason A, Amy M Boddy, Heather L Gardner, Suzanne Bartholf DeWitt, Joanne Tuohy, Kate Megquier, Maya U Sheth, et al. 2019. “Improving Cancer Drug Discovery by Studying Cancer across the Tree of Life.” Molecular Biology and Evolution 37 (1): 11–17. https://doi.org/10.1093/molbev/msz254.
Stanciu, Silviu, Florentina Ionita-Radu, Constantin Stefani, Daniela Miricescu, Iulia-Ioana Stanescu-Spinu, Maria Greabu, Alexandra Ripszky Totan, and Mariana Jinga. 2022. “Targeting PI3K/AKT/mTOR Signaling Pathway in Pancreatic Cancer: From Molecular to Clinical Aspects.” International Journal of Molecular Sciences 23 (17): 10132. https://doi.org/10.3390/ijms231710132.
Su, Liangping, Yitian Chen, Cheng Huang, Sangqing Wu, XiaoJuan Wang, Xinbao Zhao, Qiuping Xu, et al. 2023. “Targeting Src Reactivates Pyroptosis to Reverse Chemoresistance in Lung and Pancreatic Cancer Models.” Science Translational Medicine 15 (678): eabl7895. https://doi.org/10.1126/scitranslmed.abl7895.
Sun, Duxin, Wei Gao, Hongxiang Hu, and Simon Zhou. 2022. “Why 90% of Clinical Drug Development Fails and How to Improve It?” Acta Pharmaceutica Sinica. B 12 (7): 3049–62. https://doi.org/10.1016/j.apsb.2022.02.002.
Sun, Ying, Yang Liu, Xiaoli Ma, and Hao Hu. 2021. “The Influence of Cell Cycle Regulation on Chemotherapy.” International Journal of Molecular Sciences 22 (13): 6923. https://doi.org/10.3390/ijms22136923.
Szklarczyk, Damian, Rebecca Kirsch, Mikaela Koutrouli, Katerina Nastou, Farrokh Mehryary, Radja Hachilif, Annika L. Gable, et al. 2023. “The STRING Database in 2023: Protein-Protein Association Networks and Functional Enrichment Analyses for Any Sequenced Genome of Interest.” Nucleic Acids Research 51 (D1): D638–46. https://doi.org/10.1093/nar/gkac1000.
Tate, John G., Sally Bamford, Harry C. Jubb, Zbyslaw Sondka, David M. Beare, Nidhi Bindal, Harry Boutselakis, et al. 2019. “COSMIC: The Catalogue Of Somatic Mutations In Cancer.” Nucleic Acids Research 47 (D1): D941–47. https://doi.org/10.1093/nar/gky1015.
Taylor, Austin J., Evgeniy Panzhinskiy, Paul C. Orban, Francis C. Lynn, David F. Schaeffer, James D. Johnson, Janel L. Kopp, and C. Bruce Verchere. 2023. “Islet Amyloid Polypeptide Does Not Suppress Pancreatic Cancer.” Molecular Metabolism 68 (January): 101667. https://doi.org/10.1016/j.molmet.2023.101667.
Timar, Jozsef, and Karl Kashofer. 2020. “Molecular Epidemiology and Diagnostics of KRAS Mutations in Human Cancer.” Cancer Metastasis Reviews 39 (4): 1029–38. https://doi.org/10.1007/s10555-020-09915-5.
Trajanoska, Katerina, Claude Bhérer, Daniel Taliun, Sirui Zhou, J. Brent Richards, and Vincent Mooser. 2023. “From Target Discovery to Clinical Drug Development with Human Genetics.” Nature 620 (7975): 737–45. https://doi.org/10.1038/s41586-023-06388-8.
Tu, Mengyu, Lukas Klein, Elisa Espinet, Theodoros Georgomanolis, Florian Wegwitz, Xiaojuan Li, Laura Urbach, et al. 2021. “TNF-α-Producing Macrophages Determine Subtype Identity and Prognosis via AP1 Enhancer Reprogramming in Pancreatic Cancer.” Nature Cancer 2 (11): 1185–1203. https://doi.org/10.1038/s43018-021-00258-w.
Uhlén, Mathias, Linn Fagerberg, Björn M. Hallström, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Åsa Sivertsson, et al. 2015. “Proteomics. Tissue-Based Map of the Human Proteome.” Science (New York, N.Y.) 347 (6220): 1260419. https://doi.org/10.1126/science.1260419.
Verma, Henu K., Praveen K. Kampalli, Saikrishna Lakkakula, Gayathri Chalikonda, Lakkakula V.K.S. Bhaskar, and Smaranika Pattnaik. 2020. “A Retrospective Look at Anti-EGFR Agents in Pancreatic Cancer Therapy.” Current Drug Metabolism 20 (12): 958–66. https://doi.org/10.2174/1389200220666191122104955.
Villalona-Calero, Miguel A., Paul Ritch, Jose A. Figueroa, Gregory A. Otterson, Robert Belt, Edward Dow, Sebastian George, et al. 2004. “A Phase I/II Study of LY900003, an Antisense Inhibitor of Protein Kinase C-Alpha, in Combination with Cisplatin and Gemcitabine in Patients with Advanced Non-Small Cell Lung Cancer.” Clinical Cancer Research: An Official Journal of the American Association for Cancer Research 10 (18 Pt 1): 6086–93. https://doi.org/10.1158/1078-0432.CCR-04-0779.
Wiedmann, Lena, Francesca De Angelis Rigotti, Nuria Vaquero-Siguero, Elisa Donato, Elisa Espinet, Iris Moll, Elisenda Alsina-Sanchis, et al. 2023. “HAPLN1 Potentiates Peritoneal Metastasis in Pancreatic Cancer.” Nature Communications 14 (1): 2353. https://doi.org/10.1038/s41467-023-38064-w.
Wu, Fan, Jin He, Qianxi Deng, Jun Chen, Mingyu Peng, Jiayi Xiao, Yiwei Zeng, et al. 2023. “Neuroglobin Inhibits Pancreatic Cancer Proliferation and Metastasis by Targeting the GNAI1/EGFR/AKT/ERK Signaling Axis.” Biochemical and Biophysical Research Communications 664 (July): 108–16. https://doi.org/10.1016/j.bbrc.2023.04.080.
Wu, Xiaogang, Hui Huang, Tao Wei, Ragini Pandey, Christoph Reinhard, Shuyu D. Li, and Jake Y. Chen. 2012. “Network Expansion and Pathway Enrichment Analysis towards Biologically Significant Findings from Microarrays.” Journal of Integrative Bioinformatics 9 (2): 213. https://doi.org/10.2390/biecoll-jib-2012-213.
Xu, Xiaodong, Yimin Ding, Junbin Jin, Chengjie Xu, Wenyi Hu, Songtao Wu, Guoping Ding, Rui Cheng, Liping Cao, and Shengnan Jia. 2023. “Post-Translational Modification of CDK1–STAT3 Signaling by Fisetin Suppresses Pancreatic Cancer Stem Cell Properties.” Cell & Bioscience 13 (1): 176. https://doi.org/10.1186/s13578-023-01118-z.
Yue, Zongliang, Radomir Slominski, Samuel Bharti, and Jake Y. Chen. 2022. “PAGER Web APP: An Interactive, Online Gene Set and Network Interpretation Tool for Functional Genomics.” Frontiers in Genetics 13 (April). https://doi.org/10.3389/fgene.2022.820361.
Yue, Zongliang, Christopher D Willey, Anita B Hjelmeland, and Jake Y Chen. 2019. “BEERE: A Web Server for Biomedical Entity Expansion, Ranking and Explorations.” Nucleic Acids Research 47 (W1): W578–86. https://doi.org/10.1093/nar/gkz428.
Yue, Zongliang, Qi Zheng, Michael T Neylon, Minjae Yoo, Jimin Shin, Zhiying Zhao, Aik Choon Tan, and Jake Y Chen. 2018. “PAGER 2.0: An Update to the Pathway, Annotated-List and Gene-Signature Electronic Repository for Human Network Biology.” Nucleic Acids Research 46 (D1): D668–76. https://doi.org/10.1093/nar/gkx1040.
Zhang, Geng, Peijun He, Hanson Tan, Anuradha Budhu, Jochen Gaedcke, B. Michael Ghadimi, Thomas Ried, et al. 2013. “Integration of Metabolomics and Transcriptomics Revealed a Fatty Acid Network Exerting Growth Inhibitory Effects in Human Pancreatic Cancer.” Clinical Cancer Research: An Official Journal of the American Association for Cancer Research 19 (18): 4983–93. https://doi.org/10.1158/1078-0432.CCR-13-0209.
Zhang, Geng, Aaron Schetter, Peijun He, Naotake Funamizu, Jochen Gaedcke, B. Michael Ghadimi, Thomas Ried, et al. 2012. “DPEP1 Inhibits Tumor Cell Invasiveness, Enhances Chemosensitivity and Predicts Clinical Outcome in Pancreatic Ductal Adenocarcinoma.” PloS One 7 (2): e31507. https://doi.org/10.1371/journal.pone.0031507.
Zhang, Huan, Yan Sun, Zhaokai Wang, Xiaoju Huang, Lu Tang, Ke Jiang, and Xin Jin. 2024. “ZDHHC20-Mediated S-Palmitoylation of YTHDF3 Stabilizes MYC mRNA to Promote Pancreatic Cancer Progression.” Nature Communications 15 (1): 4642. https://doi.org/10.1038/s41467-024-49105-3.
Zhang, Huayu, Amy Ferguson, Grant Robertson, Muchen Jiang, Teng Zhang, Cathie Sudlow, Keith Smith, Kristiina Rannikmae, and Honghan Wu. 2021. “Benchmarking Network-Based Gene Prioritization Methods for Cerebral Small Vessel Disease.” Briefings in Bioinformatics 22 (5): bbab006. https://doi.org/10.1093/bib/bbab006.
Zhou, Shu-Feng, and Wei-Zhu Zhong. 2017. “Drug Design and Discovery: Principles and Applications.” Molecules 22 (2): 279. https://doi.org/10.3390/molecules22020279.
Zhou, Ying, Yintao Zhang, Xichen Lian, Fengcheng Li, Chaoxin Wang, Feng Zhu, Yunqing Qiu, and Yuzong Chen. 2022. “Therapeutic Target Database Update 2022: Facilitating Drug Discovery with Enriched Comparative Data of Targeted Agents.” Nucleic Acids Research 50 (D1): D1398–1407. https://doi.org/10.1093/nar/gkab953.
Zhu, Guoying, Qing Fang, Fengshang Zhu, Dongping Huang, and Changqing Yang. 2021. “Structure and Function of Pancreatic Lipase-Related Protein 2 and Its Relationship With Pathological States.” Frontiers in Genetics 12 (July). https://doi.org/10.3389/fgene.2021.693538.
Zhu, Liang, Minlin Jiang, Hao Wang, Hui Sun, Jun Zhu, Wencheng Zhao, Qiyu Fang, et al. 2021. “A Narrative Review of Tumor Heterogeneity and Challenges to Tumor Drug Therapy.” Annals of Translational Medicine 9 (16): 1351. https://doi.org/10.21037/atm-21-1948.
Campa, Daniele, et al. 2015. “TERT Gene Harbors Multiple Variants Associated with Pancreatic Cancer Susceptibility.” International Journal of Cancer 137 (9): 2175–83. https://doi.org/10.1002/ijc.29590.
Li, Wenyi, et al. 2022. “ARID1A Promotes Chemosensitivity to Gemcitabine in Pancreatic Cancer through Epigenetic Silencing of RRM2.” Die Pharmazie 77 (7): 224–29. https://doi.org/10.1691/ph.2022.1881.
Pei, Yao-Fei, et al. 2018. “TOP2A Induces Malignant Character of Pancreatic Cancer through Activating β-Catenin Signaling Pathway.” Biochimica et Biophysica Acta. Molecular Basis of Disease 1864 (1): 197–207. https://doi.org/10.1016/j.bbadis.2017.10.019.
Cheng, Yao, et al. 2013. “Proliferation Enhanced by NGF-NTRK1 Signaling Makes Pancreatic Cancer Cells More Sensitive to 2DG-Induced Apoptosis.” International Journal of Medical Sciences 10 (5): 634–40. https://doi.org/10.7150/ijms.5547.
Hingorani, Sunil R, et al. 2003. “Preinvasive and Invasive Ductal Pancreatic Cancer and Its Early Detection in the Mouse.” Cancer Cell 4 (6): 437–50. https://doi.org/10.1016/s1535-6108(03)00309-x.
Sheng, Weiwei, et al. 2020. “Musashi2 Promotes EGF-Induced EMT in Pancreatic Cancer via ZEB1-ERK/MAPK Signaling.” Journal of Experimental & Clinical Cancer Research: CR 39 (1): 16. https://doi.org/10.1186/s13046-020-1521-4.
Hu, Jili, et al. 2024. “MSLN Induced EMT, Cancer Stem Cell Traits and Chemotherapy Resistance of Pancreatic Cancer Cells.” Heliyon 10 (8): e29210. https://doi.org/10.1016/j.heliyon.2024.e29210.
Bhamidipati, Deepak, et al. 2023. “Exceptional Responses to Selpercatinib in RET Fusion-Driven Metastatic Pancreatic Cancer.” JCO Precision Oncology 7: e2300252. https://doi.org/10.1200/PO.23.00252.
Huang, Bei, et al. 2022. “The Role of IL-6/JAK2/STAT3 Signaling Pathway in Cancers.” Frontiers in Oncology 12 (December): 1023177. https://doi.org/10.3389/fonc.2022.1023177.
Pothula, Srinivasa P, et al. 2020. “Targeting HGF/c-MET Axis in Pancreatic Cancer.” International Journal of Molecular Sciences 21 (23): 9170. https://doi.org/10.3390/ijms21239170.
Marabelle, Aurelien, et al. 2020. “Efficacy of Pembrolizumab in Patients With Noncolorectal High Microsatellite Instability/Mismatch Repair-Deficient Cancer: Results From the Phase II KEYNOTE-158 Study.” Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology 38 (1): 1–10. https://doi.org/10.1200/JCO.19.02105.
Figures

Figure 1. General overview of the GET list compilation and ranking process. A. Initial gene lists from each of the three subsets are compiled: the initial G list contains 2493 genes, the initial E list 2000 genes, and the initial T list 131 genes. B. Each list is prioritized using the BEERE network ranking and expansion tool, then annotated with genomic information and a GPT4 literature review. C. A weighted score is calculated to rank the list, and genes are manually validated through literature review.
Table 1 lists the weight assigned to each ranking modality, ordered from greatest to least.

Table 1: Weight assigned to each modality for calculation of the RP score in GETGENE-AI.

Modality of ranking                                   Weight
GT list score                                          0.329
CNA (cBioPortal UTSW Nat Commun 2015)                  0.201
Expression list score                                  0.088
GET list score                                         0.085
Mutation frequency (cBioPortal TCGA PanCancer Atlas)   0.079
CNA (cBioPortal TCGA PanCancer Atlas)                  0.048
Mutation frequency (cBioPortal UTSW Nat Commun 2015)  -0.023
Brain expression score                                -0.054
Kidney expression score                               -0.081
Gastrointestinal expression score                     -0.095
Liver expression score                                -0.101
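
As a rough illustration of how the weights in Table 1 could be applied, the sketch below computes a score as a plain weighted sum of per-modality values. This is an assumption made for illustration: the exact RP score formula and any normalization of the inputs are not given in this section, the short dictionary keys are hypothetical names, and the feature values in the example are invented rather than taken from the paper.

```python
# Weights copied from Table 1; the short keys are hypothetical names.
WEIGHTS = {
    "gt_list_score": 0.329,
    "cna_utsw_2015": 0.201,
    "expression_list_score": 0.088,
    "get_list_score": 0.085,
    "mutation_freq_tcga": 0.079,
    "cna_tcga": 0.048,
    "mutation_freq_utsw_2015": -0.023,
    "brain_expression": -0.054,
    "kidney_expression": -0.081,
    "gastrointestinal_expression": -0.095,
    "liver_expression": -0.101,
}


def weighted_score(features):
    """Return the weighted sum of modality values (assumed linear model).

    Modalities missing from `features` contribute 0; unknown names raise.
    """
    unknown = set(features) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown modalities: {sorted(unknown)}")
    return sum(WEIGHTS[name] * value for name, value in features.items())


def rank_genes(gene_features):
    """Sort genes from highest to lowest weighted score."""
    scores = {gene: weighted_score(f) for gene, f in gene_features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical inputs, not values from the paper.
    example = {
        "GENE_A": {"gt_list_score": 10.0, "liver_expression": 10.0},
        "GENE_B": {"gt_list_score": 2.0, "cna_utsw_2015": 5.0},
    }
    for gene, score in rank_genes(example):
        print(f"{gene}\t{score:.3f}")
```

Under this linear reading, the negative weights on the brain, kidney, gastrointestinal, and liver expression scores lower a gene's total, while the GT list score contributes the most per unit.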
Gene    GPT4-score ranking  GET ranking  Experimentally validated?  Citation
MYC     1                   2            Yes                        (Huan Zhang et al. 2024)
SRC     2                   3            Yes                        (Su et al. 2023)
EGFR    3                   4            Yes                        (F. Wu et al. 2023)
TERT    4                   27           Yes                        (Campa et al. 2015)
RRM2    5                   21           Yes                        (Li et al. 2022)
PIK3CA  6                   1            Yes                        (Payne et al. 2015)
TOP2A   7                   16           Yes                        (Pei, Yin, and Liu 2018)
NTRK1   8                   22           Yes                        (Cheng et al. 2013)
PTGS2   9                   25           Yes                        (Hingorani et al. 2003)
EGF     10                  30           Yes                        (Sheng et al. 2020)
CDK1    11                  5            Yes                        (J. Huang et al. 2021)
MAPK1   12                  10           Yes                        (Si et al. 2023)
KRAS    13                  13           Yes                        (Timar and Kashofer 2020)
MTOR    14                  11           Yes                        (Stanciu et al. 2022)
MSLN    15                  37           Yes                        (J. Hu et al. 2024)
RET     16                  28           Yes                        (Bhamidipati et al. 2023)
AKT1    17                  31           Yes                        (Arasanz et al. 2019)
JAK2    18                  9            Yes                        (B. Huang, Lang, and Li 2022)
MET     19                  34           Yes                        (Pothula et al. 2020)
PDCD1   20                  38           Yes                        (Marabelle et al. 2020)

Table 2: Top 20 genes ranked by GPT4 score, compared with their ranks in GET and their status as experimentally validated drug targets.
Figure 2: Volcano plot of GSE28735: microarray gene-expression profiles of 45 matched pairs of tumor vs. non-tumor tissue, Padj < 0.05. Blue indicates downregulated genes and red indicates upregulated genes.
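
The coloring rule described in the Figure 2 caption can be sketched as a small classifier. This is an illustrative assumption rather than the authors' plotting code: the function name, the use of the sign of the log2 fold change to decide direction, and the grey color for non-significant genes are not stated in the caption; only the Padj < 0.05 cutoff and the blue/red convention are.

```python
import math

ALPHA = 0.05  # adjusted p-value cutoff taken from the Figure 2 caption


def volcano_color(log2_fold_change, padj, alpha=ALPHA):
    """Return the plot color for one gene.

    red  = significant and upregulated (positive log2 fold change)
    blue = significant and downregulated (negative log2 fold change)
    grey = not significant or missing p-value (assumed convention)
    """
    if math.isnan(padj) or padj >= alpha or log2_fold_change == 0:
        return "grey"
    return "red" if log2_fold_change > 0 else "blue"


if __name__ == "__main__":
    # Hypothetical (log2 fold change, adjusted p-value) pairs.
    for lfc, p in [(2.0, 0.001), (-1.5, 0.01), (3.0, 0.2)]:
        print(lfc, p, volcano_color(lfc, p))
```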
Figure 3: Network constructed by STRING using the KEGG pathway HG0512. Each node shows the known or predicted 3D structure of the protein. Turquoise edges denote protein-protein interactions from curated databases, and purple edges denote experimentally determined interactions. Green, red, and dark blue edges indicate predicted protein-protein interactions. Light green edges represent text mining, black edges represent co-expression, and light purple edges represent protein homology.
GETGENE-AI top genes  Experimentally  STRING top genes  Experimentally  GEO2R top genes  Experimentally
                      validated?                        validated?                       validated?
PIK3CA                Yes             AKT1              Yes             PNLIPRP1         No
MYC                   Yes             TP53              Yes             PNLIPRP2         No
SRC                   Yes             KRAS              Yes             IAPP             No
EGFR                  Yes             PTEN              Yes             CTRC             No
CDK1                  Yes             SRC               Yes             GP2              Yes
PRKCA                 Yes             STAT3             Yes             CEL              No
TNF                   Yes             EGFR              Yes             CPA2             Yes
LCK                   Yes             MTOR              Yes             ALB              Yes
JAK2                  Yes             BCL2              Yes             CUZD1            Yes
MAPK1                 Yes             PIK3CA            Yes             ERP27            No
MTOR                  Yes             CDKN2A            Yes             CLPS             Yes
AURKB                 Yes             HRAS              Yes             SERPINI2         Yes
KRAS                  Yes             CCND1             Yes             PLA2G1B          Yes
MAPK8                 Yes             NFKB1             Yes             CELA2A           No
TOP2A                 Yes             CDKN1A            Yes             CELA2B           No

Table 3: Top 15 genes from GETGENE-AI, STRING, and GEO2R and their status as experimentally validated drug targets.
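
The comparison in Table 3 can be summarized as the share of listed genes marked as experimentally validated. The sketch below transcribes the gene lists and the "No" entries from Table 3 and computes that percentage per framework. The function and variable names are illustrative, and the calculation covers only the 15 genes per framework shown in Table 3, not a larger top-N list.

```python
# Gene lists transcribed from Table 3 (top 15 per framework).
TOP_GENES = {
    "GETGENE-AI": ["PIK3CA", "MYC", "SRC", "EGFR", "CDK1", "PRKCA", "TNF",
                   "LCK", "JAK2", "MAPK1", "MTOR", "AURKB", "KRAS", "MAPK8",
                   "TOP2A"],
    "STRING": ["AKT1", "TP53", "KRAS", "PTEN", "SRC", "STAT3", "EGFR",
               "MTOR", "BCL2", "PIK3CA", "CDKN2A", "HRAS", "CCND1", "NFKB1",
               "CDKN1A"],
    "GEO2R": ["PNLIPRP1", "PNLIPRP2", "IAPP", "CTRC", "GP2", "CEL", "CPA2",
              "ALB", "CUZD1", "ERP27", "CLPS", "SERPINI2", "PLA2G1B",
              "CELA2A", "CELA2B"],
}

# Genes marked "No" in the validation columns of Table 3.
NOT_VALIDATED = {"PNLIPRP1", "PNLIPRP2", "IAPP", "CTRC", "CEL", "ERP27",
                 "CELA2A", "CELA2B"}


def percent_validated(genes, not_validated=NOT_VALIDATED):
    """Percentage of genes in the list that are marked as validated."""
    if not genes:
        raise ValueError("gene list is empty")
    validated = [g for g in genes if g not in not_validated]
    return 100.0 * len(validated) / len(genes)


if __name__ == "__main__":
    for framework, genes in TOP_GENES.items():
        print(f"{framework}: {percent_validated(genes):.1f}%")
```

For the Table 3 entries this gives 15 of 15 validated for GETGENE-AI and STRING, and 7 of 15 (about 46.7%) for GEO2R.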
Figure 4: Bar graph displaying the percentage of experimentally validated targets among the top 50 genes for each framework (GEO2R, STRING, GETgene-AI). Y-axis: percentage of experimentally validated targets (out of top 50).
Gene    Weighted  ChatGPT  GT list  Mutation    RP-LIT  GPT-LIT  GET list  Expression
        score     score    score    frequency   score   score    score     list score
PIK3CA  34.8      310      58.7     2.8         0.199   1.771    96        97
MYC     30.1      330      9.5      0.0         0.032   0.349    214       210
SRC     20.0      320      0.0      1.1         0.044   0.711    143       144
EGFR    18.2      320      2.4      0.6         0.010   0.171    134       133
CDK1    15.9      305      15.3     65.4        0.134   2.563    30        7
PRKCA   15.3      305      3.0      0.0         1.702   25.556   101       102
TNF     12.1      270      2.4      0.0         0.013   0.292    83        86
LCK     11.5      220      1.7      0.0         1.274   24.444   62        60
JAK2    10.6      285      1.0      0.6         0.082   2.192    67        67
MAPK1   10.3      305      11.6     3.4         0.139   4.122    7         7
AURKB   9.1       295      0.0      0.6         0.008   0.246    70        70
KRAS    8.7       220      1.7      1.7         0.335   8.462    48        47
MAPK8   7.8       295      0.0      0.0         0.002   0.068    121       117
MTOR    7.1       220      1.7      0.0         0.588   18.333   52        52
ITGA4   6.9       220      4.3      0.6         2.298   73.333   40        37
TOP2A   6.9       310      10.2     1.1         0.215   9.688    0         0
CHEK1   6.7       220      1.7      0.0         0.128   4.231    46        45
BCL2    6.2       220      1.7      0.6         0.012   0.418    41        41
PRKCB   6.0       250      1.4      0.6         1.004   41.667   60        58
ERBB4   5.5       220      3.4      0.6         0.184   7.333    81        83

Mutation frequency is from cBioPortal (TCGA PanCancer Atlas).

Table 4: Top 20 genes ranked by GETGENE-AI. The weighted score is the RP score; the ChatGPT score is the GPT4 score.