Artificial Intelligence in Drug Discovery: A New Paradigm from Target Identification to Clinical Translation

doi:10.22541/au.176184278.86885419/v1


2025 · doi:10.22541/au.176184278.86885419/v1

preprint OA: closed



Preprint hosted on Authorea. This is a preprint and has not been peer reviewed. Data may be preliminary.

30 October 2025, Version 1 (latest version)

Authors: Pankaj Kumar (ORCID 0009-0006-3422-8881, [email protected]) and Srishti Singh

https://doi.org/10.22541/au.176184278.86885419/v1

832 views, 340 downloads

Abstract

The drug discovery process is historically characterized by high costs, protracted timelines, and substantial attrition rates. Artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), is now fundamentally reshaping this landscape by introducing unprecedented speed, efficiency, and predictive power. This comprehensive review delineates the transformative impact of AI across the entire drug discovery pipeline.
We begin by examining foundational breakthroughs in structural biology, such as AlphaFold2 and RoseTTAFold, which have democratized accurate protein structure prediction, and their subsequent evolution into complex-aware systems like AlphaFold3. We then critically analyze the revolution in virtual screening and molecular docking, where geometric deep learning models (e.g., EquiBind, DiffDock) and GPU-accelerated platforms (e.g., Uni-Dock) enable the ultra-large-scale exploration of chemical space. A significant focus is placed on generative AI for de novo molecular design, where transformer and diffusion models facilitate the creation of novel, optimized drug candidates. We further explore AI's role in predicting Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties, phenotypic drug discovery via high-content imaging, and the emergence of privacy-preserving collaborative frameworks like federated learning. The review underscores translational milestones, including AI-prioritized drug repurposing (e.g., baricitinib for COVID-19) and the entry of AI-designed molecules into clinical trials. Finally, we provide a critical assessment of persistent challenges (data quality, model generalizability, interpretability, and regulatory harmonization) and outline a pragmatic roadmap for the field's future, emphasizing multimodal integration, robust validation, and human-AI collaboration to translate algorithmic advances into tangible patient benefits.

Authors & Affiliation

Pankaj Kumar¹*, Srishti Singh¹
¹ICAR – Indian Agricultural Research Institute, New Delhi, India (https://www.iari.res.in)
*Corresponding Author: [email protected]
Co-author Email: [email protected]

Declarations: The authors declare no conflicts of interest.

Ethical Compliance: This work adheres to the ethical guidelines of ICAR–IARI, New Delhi.
Institutional approval was obtained for the study.

Keywords: Artificial Intelligence, Machine Learning, Drug Discovery, AlphaFold, Molecular Docking, Generative Models, De Novo Design, ADMET Prediction, Clinical Translation.

1. Introduction

The pursuit of new therapeutic agents is a cornerstone of modern medicine, yet the traditional drug discovery pipeline remains a daunting endeavor, often spanning over a decade and costing billions of dollars, with a failure rate exceeding 90% in clinical stages [1, 2]. This inefficiency stems from the immense complexity of biological systems and the limitations of empirical, low-throughput experimental methods. The advent of artificial intelligence (AI) offers a paradigm shift, leveraging computational power to learn from vast, multi-modal datasets (including genomic sequences, protein structures, chemical libraries, high-content cellular images, and clinical records) to guide decision-making with enhanced precision and speed [3, 4].

Landmark achievements have catalyzed this transformation. The accurate prediction of protein structures from amino acid sequences by AlphaFold2 [5] and RoseTTAFold [6] has resolved a five-decade-old grand challenge in biology, fundamentally altering the initial stages of target identification and validation. Subsequent models like AlphaFold3 have extended this capability to protein-ligand complexes, opening new avenues for structure-based drug design [7]. In parallel, deep learning has revolutionized virtual screening [8, 9], enabled the generative design of novel molecular entities [10, 11], and improved the prediction of pharmacokinetic and toxicological profiles [12, 13]. Furthermore, consortia such as MELLODDY have demonstrated the feasibility of privacy-preserving, cross-institutional collaboration through federated learning, unlocking the value of proprietary data without sharing it [14, 15].
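
To make the federated learning idea concrete, the following is a minimal sketch of federated averaging (FedAvg), a generic aggregation scheme used here purely for illustration. It is not the MELLODDY implementation, and every name in it (`local_update`, `federated_average`, `make_client_data`, `run_fedavg`) as well as the synthetic data is a hypothetical assumption. Each simulated partner fits a small linear model on its own private data, and only the fitted weights are sent for aggregation.

```python
"""Minimal federated averaging (FedAvg) sketch on synthetic data.

Illustrative only: the partners, data, and model are hypothetical.
"""
import random


def local_update(weights, data, lr=0.05, epochs=20):
    """Run a few epochs of gradient descent on one partner's private data.

    The model is y = w0 + w1 * x with mean squared-error loss. Only the
    updated weights are returned; the data never leave this function.
    """
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in data) * 2 / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) * 2 / n
        w0 -= lr * g0
        w1 -= lr * g1
    return (w0, w1)


def federated_average(client_weights, client_sizes):
    """Aggregate client weights, weighting each client by its dataset size."""
    total = sum(client_sizes)
    return tuple(
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    )


def make_client_data(rng, n, x_low, x_high):
    """Hypothetical private dataset following y = 1 + 2x plus small noise."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, 1.0 + 2.0 * x + rng.gauss(0, 0.05)) for x in xs]


def run_fedavg(rounds=100, seed=0):
    """Simulate three partners whose data cover different input ranges."""
    rng = random.Random(seed)
    clients = [
        make_client_data(rng, 40, 0.0, 1.0),
        make_client_data(rng, 60, 0.5, 1.5),
        make_client_data(rng, 30, 1.0, 2.0),
    ]
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each partner starts from the shared model and trains locally.
        updates = [local_update(global_weights, d) for d in clients]
        # The coordinator sees weights and dataset sizes, never raw data.
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


if __name__ == "__main__":
    print(run_fedavg())
```

In this toy setting the aggregated model should approach the shared underlying relationship even though no partner sees the full input range. Production systems typically add protections such as secure aggregation on top of this basic loop.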
This review provides a systematic and critical examination of the methods, milestones, and future trajectory of AI in drug discovery. We deconstruct the AI-empowered pipeline, evaluate the core algorithmic approaches, highlight seminal case studies demonstrating clinical impact, and confront the significant challenges that remain. Our objective is to offer a comprehensive resource for researchers and clinicians, illustrating how AI is not merely an auxiliary tool but is rapidly becoming the central engine driving the next generation of therapeutic innovation.

2. The AI-Empowered Drug Discovery Pipeline

Figure 1. Schematic of the integrated AI-driven drug discovery pipeline, illustrating the key stages from target identification to clinical candidate selection and the feedback loops enabled by AI.

2.1. Target Identification and Structural Biology

The initial step of identifying and validating a druggable target has been profoundly accelerated by AI.

Protein Structure Prediction: The release of AlphaFold2, which demonstrated near-experimental accuracy in the Critical Assessment of Protein Structure Prediction (CASP14), provided a vast and expanding database of reliable protein structures [5]. RoseTTAFold offered a complementary, three-track neural network architecture achieving similar performance [6].

Critical Analysis: While these models excel at monomeric structures, their initial limitations in modeling complexes, dynamics, and allosteric sites were significant. The recent AlphaFold3 model addresses many of these limitations by predicting structures of proteins, nucleic acids, small molecules, and post-translational modifications in a single integrated system, providing a more holistic view for drug design [7].

Target Prioritization: AI models now integrate multi-omics data (genomics, transcriptomics, proteomics) to identify novel disease-associated targets and infer their druggability and potential safety profiles [16, 17]. Network-based algorithms and knowledge graphs can uncover non-obvious relationships between genes, diseases, and existing drugs, suggesting new repurposing opportunities [18].

2.2. Virtual Screening and Molecular Docking

AI is overcoming the speed-accuracy trade-off that has long plagued conventional virtual screening.

• Classical Docking Augmented by DL: Tools like GNINA integrate 3D convolutional neural networks into the docking scoring function, significantly improving pose prediction and virtual screening enrichment over classical scoring functions [8].
• Geometric Deep Learning for Docking: A paradigm shift has occurred with models that treat docking as a direct prediction problem. EquiBind, an SE(3)-equivariant geometric deep learning model, predicts ligand binding poses and locations orders of magnitude faster than search-based methods, though with variable accuracy [19]. DiffDock uses a diffusion model to generate candidate ligand poses and ranks them with a learned confidence score, reporting state-of-the-art pose-prediction performance [9].
• Ultra-Large Virtual Screening: The computational bottleneck of screening massive chemical libraries is being removed by GPU-accelerated docking engines like Uni-Dock, which can screen billions of molecules in feasible timeframes, dramatically expanding the explorable chemical space [20, 21].

2.3. De Novo Molecular Design

Generative AI has moved beyond virtual screening to the creation of novel molecular structures.

• Model Architectures: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and especially Transformers and Diffusion Models have become the workhorses for molecular generation [10, 11, 22]. These models learn the underlying distribution of chemical space and can sample from it to propose new molecules.
• Optimization and Conditioning: A key advancement is the ability to steer generation towards molecules with desired properties, a process known as goal-directed or conditional generation. Models can be conditioned on target properties (e.g., high affinity, solubility) or even on 3D structural constraints from a protein pocket, leading to more relevant and optimizable hits [23, 24].
• Closing the DMTA Cycle: When integrated with automated synthesis planning and high-throughput experimentation, generative models can create fully automated Design-Make-Test-Analyze (DMTA) cycles, rapidly iterating on molecular designs based on experimental feedback [25].

2.4. ADMET and Toxicity Prediction

Early and accurate prediction of ADMET properties is crucial for reducing late-stage attrition.

• Benchmark Studies: DeepTox, the winning pipeline of the Tox21 Data Challenge, demonstrated that deep learning methods could outperform traditional methods in predicting complex toxicity endpoints [12].
• Integrated Platforms: Web servers like ADMETlab 2.0 and SwissADME provide comprehensive in silico profiling of numerous ADMET endpoints, becoming an integral part of the medicinal chemist’s toolkit for triaging compounds [13, 26].
• Analysis: While highly useful, these models are only as good as the data they are trained on. Sparse and noisy experimental data for certain endpoints (e.g., human pharmacokinetics) remain a major limitation, and predictions often lack reliable uncertainty estimates [27].

2.5. Phenotypic Drug Discovery (PDD)

AI is extracting rich, mechanistic information from cellular images.

High-Content Imaging: Assays like Cell Painting generate high-dimensional morphological profiles. Deep learning models, particularly Convolutional Neural Networks (CNNs), can mine these images to identify bioactive compounds, predict their mechanisms of action (MoA), and detect subtle phenotypic signatures [28, 29].

Increased Scale and Efficiency: Innovations such as pooled/encoded phenotypic screening with optical barcoding allow for the multiplexed testing of thousands of conditions, with AI models deconvoluting the complex image data to assign bioactivity [30, 31].

2.6. Privacy-Preserving Collaboration and Clinical Translation

• Federated Learning (FL): The MELLODDY consortium showed that multiple pharmaceutical companies can collaboratively improve AI models without centralizing their sensitive chemical data. Each party trains a model on its local data, and only model parameter updates are shared and aggregated [14, 15]. This framework is critical for building robust, generalizable models across proprietary domains.
• Clinical Milestones:
  • Repurposing: AI-driven analysis identified the JAK inhibitor baricitinib as a potential treatment for COVID-19, due to its predicted ability to inhibit viral endocytosis. This was subsequently validated in randomized controlled trials (ACTT-2, COV-BARRIER), leading to emergency use authorization [32, 33].
  • AI-Designed Molecules: INS018_055, a small-molecule inhibitor for idiopathic pulmonary fibrosis designed by Insilico Medicine’s generative AI platform, has progressed into Phase II clinical trials, representing a landmark for end-to-end AI-driven drug discovery [34, 35].

3. Core Methodological Foundations

Figure 2. The iterative AI-driven Design-Make-Test-Analyze (DMTA) cycle, highlighting the role of AI at each stage.

3.1. Data Curation and Management

The performance of AI models is intrinsically linked to the quality, quantity, and diversity of the training data.

Data Sources: Key resources include public databases like the Protein Data Bank (PDB), ChEMBL, and PubChem, and imaging repositories like the Cell Painting Gallery and JUMP-CP [36, 37].

Data Leakage: A critical challenge is ensuring that model evaluation is performed on truly novel data. Inappropriate data splitting that allows close analogues of test-set compounds to appear in the training set leads to grossly over-optimistic performance estimates. Rigorous time-based or scaffold-based splitting is essential [38, 39].

3.2. Key AI/ML Model Classes

• Geometric Deep Learning: Models like Graph Neural Networks (GNNs) and SE(3)-equivariant networks operate directly on graph and 3D structural data, making them ideal for molecules and proteins [19, 40].
• Diffusion Models: These generative models have taken the field by storm, achieving state-of-the-art results in image generation, molecular generation, and molecular docking by iteratively denoising samples drawn from a random noise distribution [9, 22].
• Transformers: Originally developed for natural language processing, Transformers are exceptionally powerful at modeling sequences (e.g., protein sequences, SMILES strings) and have been adapted for molecular design, reaction prediction, and retrosynthesis [41, 42].
• Multimodal Models: The next frontier involves models that can fuse and reason over diverse data types simultaneously, such as protein structure, ligand chemistry, and cellular imaging data, to make more robust and context-aware predictions [43, 44].

3.3. Evaluation and Prospective Validation

Moving from retrospective benchmarks to prospective real-world validation is the true test of an AI model.

Best Practices: These include using held-out test sets, external validation, and rigorous statistical analysis. For generative models, metrics like validity, uniqueness, novelty, and diversity are used [45].

The Gold Standard: The most compelling evidence comes from prospective studies where AI-generated hypotheses (e.g., a set of predicted active molecules) are tested in blinded experiments. The success of companies like Exscientia and Insilico Medicine in moving compounds to the clinic provides the ultimate validation of this approach [34, 46].

4. Critical Analysis of Limitations and Risks

Despite the remarkable progress, significant hurdles remain before AI can realize its full potential.
• Data Biases and Quality: Models trained on public data can inherit historical biases, such as over-representation of certain protein families or chemical scaffolds, limiting their applicability to novel target classes [47].
• Generalization and Domain Shift: A model performing excellently on a benchmark set may fail dramatically when applied to a different cellular context, assay technology, or chemical series. Developing models that are robust to such domain shifts is an active area of research [48].
• The "Black Box" Problem: The complexity of deep learning models often makes it difficult to understand the rationale behind their predictions. This lack of interpretability can hinder trust and slow adoption by medicinal chemists. Methods like SHAP and attention mechanisms are being developed to provide insights [49].
• High Cost of Validation: The computational cost of training large models and, more significantly, the experimental cost of synthesizing and testing AI-generated molecules remain substantial bottlenecks.
• Regulatory Uncertainty: Regulatory agencies like the FDA and EMA are still developing frameworks for the evaluation and approval of AI/ML-derived drugs and software. Clear guidelines on model transparency, validation, and lifecycle management are needed [50].

5. The Regulatory and Ethical Landscape

The integration of AI into a highly regulated industry necessitates careful consideration.

FDA Perspectives: The U.S. FDA has released discussion papers outlining a potential framework for AI/ML in drug development, emphasizing principles of data quality, model robustness, interpretability, and representative datasets to avoid bias [50, 51].

Ethical Considerations: Key issues include patient data privacy (addressed by techniques like federated learning), algorithmic fairness to ensure therapies are effective across diverse populations, and the need for human oversight throughout the process [52, 53].

6. Future Outlook and Pragmatic Roadmap

The trajectory of AI in drug discovery points towards deeper integration and increasing automation.

1. Generative Chemistry Becomes Ubiquitous: De novo design will become the default starting point for new projects, heavily conditioned on structural information from AlphaFold3 and other sources.
2. Multimodal AI for Systems Pharmacology: Models will integrate diverse data, from atomic structures to clinical trial outcomes, to predict drug effects in the context of entire biological systems, improving efficacy and safety predictions.
3. Automated and Autonomous Discovery: The integration of AI with lab automation and robotics will lead to highly efficient, self-optimizing "closed-loop" discovery systems.
4. AI for Biomarker and Companion Diagnostic Development: AI will be increasingly used to identify patient stratification biomarkers from omics and clinical data, enabling precision medicine approaches.
5. Focus on Trust and Collaboration: Developing explainable AI and intuitive interfaces will be crucial for fostering effective collaboration between AI systems and human experts.

7. Conclusion

AI is no longer a futuristic promise but a present-day reality that is actively reshaping the science and business of drug discovery. From solving fundamental problems in structural biology to generating novel drug candidates and guiding clinical repurposing, its impact is being felt across the entire pipeline. The entry of AI-designed molecules into clinical trials marks a historic inflection point. While formidable challenges in data quality, model generalizability, and regulatory alignment persist, the ongoing research and collaboration between computational scientists, biologists, and clinicians are steadily addressing these issues. The future of drug discovery is inextricably linked to the intelligent application of AI, heralding an era of accelerated, more efficient, and more targeted therapeutic development for patients in need.

References

1. DiMasi, J. A., Grabowski, H. G., & Hansen, R. W. (2016). Innovation in the pharmaceutical industry: New estimates of R&D costs. Journal of Health Economics, 47, 20-33.
2. Scannell, J. W., Blanckley, A., Boldon, H., & Warrington, B. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery, 11(3), 191-200.
3. Schneider, P., Walters, W. P., Plowright, A. T., et al. (2020). Rethinking drug design in the artificial intelligence era. Nature Reviews Drug Discovery, 19(5), 353-364.
4. Vamathevan, J., Clark, D., Czodrowski, P., et al. (2019). Applications of machine learning in drug discovery and development. Nature Reviews Drug Discovery, 18(6), 463-477.
5. Jumper, J., Evans, R., Pritzel, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.
6. Baek, M., DiMaio, F., Anishchenko, I., et al. (2021). Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557), 871-876.
7. Abramson, J., Adler, J., Dunger, J., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630(8016), 493-500.
8. McNutt, A. T., Francoeur, P., Aggarwal, R., et al. (2021). GNINA 1.0: molecular docking with deep learning. Journal of Cheminformatics, 13(1), 43.
9. Corso, G., Stärk, H., Jing, B., Barzilay, R., & Jaakkola, T. (2023). DiffDock: Diffusion steps, twists, and turns for molecular docking. In International Conference on Learning Representations (ICLR).
10. Sanchez-Lengeling, B., & Aspuru-Guzik, A. (2018). Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400), 360-365.
11. Elton, D. C., Boukouvalas, Z., Fuge, M. D., & Chung, P. W. (2019). Deep learning for molecular design—a review of the state of the art. Molecular Systems Design & Engineering, 4(4), 828-849.
12. Mayr, A., Klambauer, G., Unterthiner, T., & Hochreiter, S. (2016). DeepTox: toxicity prediction using deep learning. Frontiers in Environmental Science, 3, 80.
13. Xiong, G., Wu, Z., Yi, J., et al. (2021). ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Research, 49(W1), W5-W14.
14. Heyndrickx, W., Vandael, D., Vanderschueren, J., et al. (2023). MELLODDY: Cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. Journal of Chemical Information and Modeling, 63(8), 2331-2344.
15. MELLODDY Consortium. (2024). Official project results on CORDIS EU.
16. Nelson, M. R., Tipney, H., Painter, J. L., et al. (2015). The support of human genetic evidence for approved drug indications. Nature Genetics, 47(8), 856-860.
17. Zarin, D. A., Tse, T., Williams, R. J., Califf, R. M., & Ide, N. C. (2011). The ClinicalTrials.gov results database—update and key issues. New England Journal of Medicine, 364(9), 852-860.
18. Himmelstein, D. S., Lizee, A., Hessler, C., et al. (2017). Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife, 6, e26726.
19. Stärk, H., Ganea, O., Pattanaik, L., Barzilay, R., & Jaakkola, T. (2022). EquiBind: Geometric deep learning for drug binding structure prediction. In International Conference on Machine Learning (pp. 20503-20521). PMLR.
20. Yu, Y., Chen, J., Li, Y., et al. (2023). Uni-Dock: GPU-accelerated docking enables ultralarge virtual screening. Journal of Chemical Theory and Computation, 19(6), 3336-3345.
21. Gorgulla, C., Boeszoermenyi, A., Wang, Z. F., et al. (2020). An open-source drug discovery platform enables ultra-large virtual screens. Nature, 580(7805), 663-668.
22. Hoogeboom, E., Satorras, V. G., Vignac, C., & Welling, M. (2022). Equivariant diffusion for molecule generation in 3D. In International Conference on Machine Learning (pp. 8867-8887). PMLR.
23. Ragoza, M., Masuda, T., & Koes, D. R. (2022). Generating 3D molecules for target protein binding. Journal of Chemical Information and Modeling, 62(9), 2281-2295.
24. Imrie, F., Bradley, A. R., van der Schaar, M., & Deane, C. M. (2020). Deep generative models for 3D linker design. Journal of Chemical Information and Modeling, 60(4), 1983-1995.
25. Segler, M. H., Preuss, M., & Waller, M. P. (2018). Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555(7698), 604-610.
26. Daina, A., Michielin, O., & Zoete, V. (2017). SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Scientific Reports, 7(1), 42717.
27. Norinder, U., & Bergström, C. A. (2020). Machine learning and AI in ADME and PK-PD. In Computational Toxicology (pp. 133-155). Humana, New York, NY.
28. Bray, M. A., Singh, S., Han, H., et al. (2016). Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nature Protocols, 11(9), 1757-1774.
29. Caie, P. D., Walls, R. E., Ingleston-Orme, A., et al. (2010). High-content phenotypic profiling of drug response signatures for distinct clinical classes of breast cancer. Molecular Cancer Therapeutics, 9(6), 1913-1926.
30. Liu, N., Xu, X., Wang, T., et al. (2024). Scalable, compressed phenotypic screening using pooled optical barcoding. Nature Biotechnology, 42(11), 1701-1712.
31. Feldman, D., Singh, A., Schmid-Burgk, J. L., et al. (2019). Optical pooled screens in human cells. Cell, 179(3), 787-799.
32. Stebbing, J., Krishnan, V., de Bono, S., et al. (2020). Mechanism of baricitinib supports artificial intelligence-predicted testing in COVID-19 patients. EMBO Molecular Medicine, 12(8), e12697.
33. Kalil, A. C., Patterson, T. F., Mehta, A. K., et al. (2021). Baricitinib plus remdesivir for hospitalized adults with Covid-19. New England Journal of Medicine, 384(9), 795-807.
34. Insilico Medicine. (2023). INS018_055 enters Phase 2 clinical trials for idiopathic pulmonary fibrosis. Company press release.
35. Tong, X., Liu, X., Tan, X., et al. (2024). Generative AI for designing novel therapeutic antibodies. Nature Biotechnology [Preprint].
36. Gaudelet, T., Day, B., Jamasb, A. R., et al. (2021). Utilizing graph machine learning within drug discovery and development. Briefings in Bioinformatics, 22(6), bbab159.
37. Chandrasekaran, S. N., Ceulemans, H., Boyd, J. D., & Carpenter, A. E. (2021). Image-based profiling for drug discovery: due for a machine-learning upgrade?. Nature Reviews Drug Discovery, 20(2), 145-159.
38. Sieg, J., Flachsenberg, F., & Rarey, M. (2019). In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. Journal of Chemical Information and Modeling, 59(3), 947-961.
39. Chen, L., Tan, X., Wang, D., et al. (2020). Transformer-based molecular optimization beyond matched molecular pairs. Journal of Cheminformatics, 12, 1-15.
40. Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., & Vandergheynst, P. (2017). Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4), 18-42.
41. Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
42. Schwaller, P., Laino, T., Gaudin, T., et al. (2019). Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 5(9), 1572-1583.
43. Zitnik, M., Nguyen, F., Wang, B., Leskovec, J., & Goldenberg, A. (2019). Multimodal deep learning for predicting drug–target interactions. Bioinformatics, 35(14), i501-i509.
44. Sturm, N., Mayr, A., Le Van, T., et al. (2021). Application of bioactivity profile-based fingerprints for building machine learning models. Journal of Chemical Information and Modeling, 61(1), 415-426.
45. Brown, N., Fiscato, M., Segler, M. H., & Vaucher, A. C. (2019). GuacaMol: benchmarking models for de novo molecular design. Journal of Chemical Information and Modeling, 59(3), 1096-1108.
46. Exscientia. (2021). Exscientia announces first AI-designed immuno-oncology drug to enter the clinic. Company press release.
47. Adamson, C. S., Chibale, K., Goss, R. J., et al. (2021). Antiviral drug discovery: preparing for the next pandemic. Chemical Society Reviews, 50(6), 3647-3655.
48. Koh, P. W., Sagawa, S., Marklund, H., et al. (2021). WILDS: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning (pp. 5637-5664). PMLR.
49. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
50. U.S. Food and Drug Administration. (2023). Artificial Intelligence and Machine Learning in Drug Development: Discussion Paper and Request for Feedback.
51. European Medicines Agency. (2021). Data-driven modelling in medicinal product development.
52. Price, W. N., & Cohen, I. G. (2019). Privacy in the age of medical big data. Nature Medicine, 25(1), 37-43.
53. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.

Information

Version history: Version 1, 30 October 2025.

Copyright: This work is licensed under a Non Exclusive No Reuse License.

Cited by

Mengqi Cai, Tiancai Liu, From Algorithms to Assets: A Comprehensive Review of AI’s Role in Preclinical Drug Discovery and the Hurdles to Clinical Translation, Pharmaceuticals, 19, 5, (696), (2026). https://doi.org/10.3390/ph19050696

Sumeet Dwivedi, Prerna Chaturvedi, From Tradition to Biochemical Innovation: Phytobioactive Compounds in Modern Neuropharmacology, International Journal of Pharmacology, 22, 1, (2026). https://doi.org/10.31083/IJP48014


Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-06-16T06:25:30.133384+00:00