PCPpred: Prediction of Chemically Modified Peptide Permeability Across Multiple Assays for Oral Delivery

preprint OA: closed
Abstract

Chemically modified peptides, including cyclic peptides, have emerged as promising candidates for oral delivery, yet they face the challenge of low membrane permeability. In this study, datasets were sourced from CycPeptMPDB, a database of peptide membrane permeability measured with different assays. Our quantitative analysis showed a clear discordance between permeability measured using PAMPA and using cell-based assays (Caco-2, MDCK, and RRCK), which explains the limits of PAMPA as a surrogate for cell-based assays. We therefore developed assay-specific predictive models to capture the permeability determinants of each system more accurately. We systematically computed diverse features of modified peptides using open-source software and used fine-tuned peptide embeddings generated with pretrained chemical language models. Baseline models were developed on the resulting multi-hierarchical molecular features. We also developed a stacked ensemble architecture that uses models trained on the multi-hierarchical features as base learners. The ensemble model achieved the best PAMPA test-set performance, with an MSE of 0.200, R2 of 0.685, and PCC of 0.830, and an R2 of 0.783 on the Caco-2 test set. A model trained on 2D Mordred descriptors attained the highest performance on the Caco-2 test set, with an MSE of 0.129, R2 of 0.793, and PCC of 0.892, surpassing state-of-the-art approaches such as CPMP. To support widespread adoption, we developed an open-access web server (https://webs.iiitd.edu.in/raghava/pcppred/) where users can design modified peptides in the human-comprehensible MAP (Modifications and Annotations of Proteins) format, convert MAP to SMILES format, and predict permeability across assays with result visualization. For reproducibility, we also provide a standalone version on GitHub (https://github.com/raghavagps/pcppred).

Competing Interest Statement

The authors have declared no competing interest.
Footnotes

Mailing Address of Authors
Akshay Shendre: akshays{at}iiitd.ac.in
Pushpendra Singh Gahlot: pushpendrag{at}iiitd.ac.in
Gajendra P. S. Raghava (GPSR): raghava{at}iiitd.ac.in

Abbreviations
- PAMPA - Parallel Artificial Membrane Permeability Assay
- RRCK - Ralph Russ Canine Kidney
- MDCK - Madin-Darby Canine Kidney
- KDE - Kernel density estimation
- LGBM - Light Gradient Boosting Machine
- XGBoost - eXtreme Gradient Boosting
- AdaBoost - Adaptive Boosting
- SVR - Support Vector Regressor
- KNN - K-Neighbors Regressor
- MLP - Multi-Layer Perceptron
- MSE - Mean Squared Error
- RMSE - Root Mean Squared Error
- MAE - Mean Absolute Error
- R2 - Coefficient of Determination
- PCC - Pearson Correlation Coefficient
- SCC - Spearman Correlation Coefficient
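
The three metrics reported in the abstract (MSE, R2, and PCC, as abbreviated above) can be computed as in the following minimal sketch. The function names and the toy permeability values are illustrative assumptions; they are not taken from the PCPpred code or data.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical observed and predicted log-permeability values (toy data).
observed = [-6.0, -5.5, -5.0, -4.5]
predicted = [-5.8, -5.6, -5.1, -4.4]

print("MSE:", mse(observed, predicted))
print("R2: ", r2(observed, predicted))
print("PCC:", pcc(observed, predicted))
```

Note that R2 and PCC measure different things: PCC captures only linear association, so a model with a systematic offset can have a high PCC and a lower R2, which is why reporting both alongside MSE is informative.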


Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2026) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00