DeepEmbCas9: Cas9 coevolution and sgRNA structural information for CRISPR-Cas9 cleavage activity prediction

doi:10.1101/2025.10.08.681228

2025 · doi:10.1101/2025.10.08.681228

preprint OA: closed

Abstract

The development of CRISPR-Cas9 cleavage activity prediction tools hinges on data produced from high-throughput guide-target lentiviral library screens for different Cas9 variants. However, the majority of such tools remain limited to predictions for one or a few Cas9 variants, making it difficult to quantify the effects of Cas9 residues on cleavage activity. To bridge the gap, we introduce four interpretable DeepEmbCas9 models for predicting the cleavage activity of 40 type II-A and II-C Cas9 variants (DeepEmbCas9, DeepEmbCas9-MVE, DeepEnsEmbCas9 naive, and DeepEnsEmbCas9), which use protein and RNA language model embeddings to encode Cas9 and sgRNA, respectively. Among the four neural network models, DeepEnsEmbCas9 naive performed best in both in-distribution and out-of-distribution settings: it outperformed individual Cas9 cleavage activity prediction tools on 18 out of 51 and 17 out of 48 benchmark test sets, respectively, and performed comparably otherwise. Concerning uncertainty quantification, DeepEnsEmbCas9 yields quantile-calibrated uncertainty estimates with only a minimal performance drop compared to DeepEnsEmbCas9 naive. SHAP importance analysis on DeepEmbCas9 reaffirms the importance of Cas9-target PAM binding as a first step for Cas9 cleavage, and reveals the L2 linker and PLL-WED-PI as important Cas9 domains modulating DeepEmbCas9’s predicted activity change when introducing increased-fidelity and PAM-altering Cas9 mutations, respectively. Our findings demonstrate the usefulness of protein language model embeddings in uncertainty-aware Cas9 cleavage activity prediction. More generally, the DeepEmbCas9 models serve as an initial step towards cleavage activity prediction modelling for the whole Cas9 protein family.

Competing Interest Statement

The authors have declared no competing interest.
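The abstract refers to mean-variance estimation (MVE), ensembling, and quantile-calibrated uncertainty without giving implementation details. The following is a minimal sketch of what those terms generally mean, not the authors' method. It assumes each ensemble member outputs a Gaussian mean and variance per guide-target pair, that members are combined as a uniform mixture moment-matched to one Gaussian, and that calibration is checked by comparing nominal quantile levels to empirical coverage. The function names `combine_ensemble` and `quantile_coverage` are hypothetical.

```python
from statistics import NormalDist, fmean


def combine_ensemble(means, variances):
    """Moment-match a uniform mixture of Gaussian members to one Gaussian.

    Assumption (not stated in the abstract): each member predicts a mean
    and a strictly positive variance for the same input.
    """
    mu = fmean(means)
    # Law of total variance: average member variance (aleatoric part)
    # plus the spread of the member means (epistemic part).
    spread = fmean([(m - mu) ** 2 for m in means])
    return mu, fmean(variances) + spread


def quantile_coverage(y_true, mus, variances, levels=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Fraction of observed targets at or below each predicted quantile.

    For a quantile-calibrated model, coverage[q] should be close to q.
    """
    coverage = {}
    for q in levels:
        hits = 0
        for y, mu, var in zip(y_true, mus, variances):
            predicted_quantile = NormalDist(mu, var ** 0.5).inv_cdf(q)
            hits += y <= predicted_quantile
        coverage[q] = hits / len(y_true)
    return coverage
```

A large gap between a nominal level `q` and `coverage[q]` on held-out data would indicate over- or under-confident variance estimates, which is the kind of miscalibration a recalibration step is meant to reduce.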

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00