Abstract
Chemically modified peptides, including cyclic peptides, have emerged as promising candidates for oral delivery, yet they face the challenge of low membrane permeability. In this study, the datasets were sourced from CycPeptMPDB, a database of peptide membrane permeability measured with different assays. Our quantitative analysis showed a clear discordance between permeability measured using PAMPA and that measured using cell-based assays (Caco-2, MDCK, and RRCK), which highlights the limits of PAMPA as a surrogate for cell-based assays. Therefore, we developed assay-specific predictive models to capture the permeability determinants of each system more accurately. We systematically computed diverse features of modified peptides using open-source software and used fine-tuned peptide embeddings generated by pretrained chemical language models. Baseline models were developed using the generated multi-hierarchical molecular features. We also developed a stacked ensemble architecture in which models built on the multi-hierarchical features serve as base learners. The ensemble model achieved the best PAMPA test-set performance, with an MSE of 0.200, R2 of 0.685, and PCC of 0.830, and an R2 of 0.783 on the Caco-2 test set. The model trained on 2D Mordred descriptors attained the highest performance on the Caco-2 test set, with an MSE of 0.129, R2 of 0.793, and PCC of 0.892, surpassing state-of-the-art approaches such as CPMP. To support widespread adoption, we developed an open-access web server (https://webs.iiitd.edu.in/raghava/pcppred/) where users can design modified peptides in the human-readable MAP (Modifications and Annotations of Proteins) format, convert MAP to SMILES, and predict permeability across assays with result visualization. To support reproducibility, we also provide a standalone version on GitHub (https://github.com/raghavagps/pcppred).
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Mailing Address of Authors
Akshay Shendre: akshays{at}iiitd.ac.in
Pushpendra Singh Gahlot: pushpendrag{at}iiitd.ac.in
Gajendra P. S. Raghava (GPSR): raghava{at}iiitd.ac.in
13. Abbreviations
- PAMPA: Parallel Artificial Membrane Permeability Assay
- RRCK: Ralph Russ Canine Kidney
- MDCK: Madin-Darby Canine Kidney
- KDE: Kernel density estimation
- LGBM: Light Gradient Boosting Machine
- XGBoost: eXtreme Gradient Boosting
- AdaBoost: Adaptive Boosting
- SVR: Support Vector Regressor
- KNN: K-Neighbors Regressor
- MLP: Multi-Layer Perceptron
- MSE: Mean Squared Error
- RMSE: Root Mean Squared Error
- MAE: Mean Absolute Error
- R2: Coefficient of Determination
- PCC: Pearson Correlation Coefficient
- SCC: Spearman Correlation Coefficient
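The regression metrics abbreviated above (MSE, RMSE, MAE, R2, PCC, SCC) follow standard textbook definitions. The sketch below computes them in plain Python as an illustration only; it is not taken from the authors' code. The function names are hypothetical, and the values in the usage note are a toy example, not data from the study.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pcc(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def scc(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pcc(average_ranks(x), average_ranks(y))
```

For instance, with the toy vectors `y_true = [1, 2, 3, 4]` and `y_pred = [1.5, 2, 2.5, 4]`, calling `mse(y_true, y_pred)` and `r2(y_true, y_pred)` evaluates the squared-error and explained-variance views of the same predictions, while `pcc` and `scc` measure linear and rank agreement respectively.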