MoCETSE: A mixture-of-convolutional experts and transformer-based model for predicting Gram-negative bacterial secreted effectors

doi:10.1101/2025.08.06.668857

2025 · doi:10.1101/2025.08.06.668857

preprint OA: closed

Abstract

Identifying effector proteins of Gram-negative bacterial secretion systems is crucial for understanding their pathogenic mechanisms and guiding antimicrobial strategies. However, existing studies often rely directly on the outputs of protein language models for learning, which can make it difficult to accurately recognize complex sequence features and long-range dependencies, and thereby degrades prediction performance. In this study, we propose a deep learning model named MoCETSE to predict Gram-negative bacterial effector proteins. Specifically, MoCETSE first uses the pre-trained protein language model ESM-1b to transform raw amino acid sequences into context-aware vector representations. Then, in a target preprocessing network based on a mixture of convolutional experts, multiple sets of convolutional kernel "experts" process the data in parallel to separately learn local motifs and short-range dependencies as well as broader contextual information, generating more expressive sequence representations. In the transformer module, MoCETSE incorporates relative positional encoding to explicitly model the relative distances between residues, enabling the attention mechanism to precisely recognize the sequential relationships and long-range functional dependencies among amino acids, thereby achieving high-accuracy prediction of secreted effectors. MoCETSE demonstrated outstanding predictive ability in 5-fold cross-validation and independent testing. Benchmark tests show that MoCETSE outperforms strong existing binary and multi-class classifiers.

Author Summary

Gram-negative bacteria inject effector proteins into host cells via secretion systems, disrupting normal cellular functions and inducing disease. Accurately identifying these virulent proteins is key to understanding bacterial pathogenic mechanisms and developing therapies. However, existing methods suffer from feature redundancy, inadequate capture of long-range dependent signals, and low computational efficiency. We developed MoCETSE, a novel computational method that enables end-to-end prediction of effector proteins from raw sequences. Because position-specific scoring matrix encoding is computationally expensive, we use pre-trained protein language models to extract structural, evolutionary, and functional features from sequences, providing biologically meaningful inputs for the subsequent deep learning models. Our hybrid convolutional expert network reduces the dimensionality of the high-dimensional embeddings and extracts multi-scale features, effectively overcoming feature redundancy and information loss and improving model performance and efficiency. In learning secretion signal features, relative positional encoding models amino acid order, capturing critical long-range dependent signals and enhancing the biological interpretability of predictions. MoCETSE outperforms existing tools such as DeepSecE in cross-category predictions, offering a high-throughput method for effector protein prediction and clues for studying bacterial infections and developing therapies.

Competing Interest Statement

The authors have declared no competing interest.
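
Illustrative sketch (not the authors' implementation). The mixture-of-convolutional-experts step described in the abstract can be pictured roughly as follows: several 1D convolutions with different kernel sizes run in parallel over the per-residue embeddings, and their outputs are combined. The abstract does not say how the expert outputs are combined, so the per-residue softmax gate used here is an assumption, as are the kernel sizes, the toy dimensions, the random weights, and every function name. Pure Python is used only to keep the example self-contained.

```python
import math
import random


def conv1d_same(x, weight, bias):
    """1D convolution along the sequence axis with zero 'same' padding.

    x:      list of L vectors, each of length d_in
    weight: weight[k][i][o] for kernel offset k, input i, output o
    bias:   list of d_out floats
    """
    seq_len, kernel_size = len(x), len(weight)
    d_in, d_out = len(weight[0]), len(bias)
    half = kernel_size // 2
    out = []
    for t in range(seq_len):
        y = list(bias)
        for k in range(kernel_size):
            s = t + k - half
            if 0 <= s < seq_len:  # positions outside the sequence act as zeros
                for i in range(d_in):
                    for o in range(d_out):
                        y[o] += x[s][i] * weight[k][i][o]
        out.append(y)
    return out


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def init_expert(kernel_size, d_in, d_out, rng):
    """Random (untrained) weights for one hypothetical convolutional expert."""
    w = [[[rng.gauss(0.0, 0.1) for _ in range(d_out)]
          for _ in range(d_in)] for _ in range(kernel_size)]
    return w, [0.0] * d_out


def mixture_of_conv_experts(x, experts, gate_w):
    """Run every expert over x, then mix outputs with per-residue gates.

    gate_w[e] is a length-d_in vector scoring expert e (assumed gating scheme).
    """
    expert_outs = [conv1d_same(x, w, b) for w, b in experts]
    mixed, gates = [], []
    for t, x_t in enumerate(x):
        logits = [sum(xi * gw[i] for i, xi in enumerate(x_t)) for gw in gate_w]
        g = softmax(logits)
        gates.append(g)
        d_out = len(expert_outs[0][t])
        mixed.append([sum(g[e] * expert_outs[e][t][o] for e in range(len(experts)))
                      for o in range(d_out)])
    return mixed, gates


# Toy usage: small dimensions stand in for real language-model embeddings
# (ESM-1b per-residue embeddings are 1280-dimensional).
rng = random.Random(0)
d_in, d_out, seq_len = 8, 4, 12
kernel_sizes = (3, 7, 15)  # assumed: small kernels for motifs, large for context
x = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(seq_len)]
experts = [init_expert(k, d_in, d_out, rng) for k in kernel_sizes]
gate_w = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in kernel_sizes]
mixed, gates = mixture_of_conv_experts(x, experts, gate_w)
print(len(mixed), len(mixed[0]))  # one reduced vector per residue
```

In this sketch the output keeps one vector per residue but with a smaller width (d_out less than d_in), which mirrors the dimensionality reduction mentioned in the Author Summary; the result would then be passed to the transformer module.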

