SRSA-VAE: Self-Attention-Based Feature Learning for Single-Cell Multimodal Clustering

doi:10.64898/2026.05.06.723212

2026 · doi:10.64898/2026.05.06.723212

preprint OA: closed

AI-generated deep summary by claude@2026-07, 2026-07-04 · read from full text

The paper addresses clustering of multimodal single-cell data, where high dimensionality, sparsity, and the combination of scRNA-seq and CITE-seq (gene and protein) measurements make feature and representation learning difficult. It proposes SRSA-VAE, a scalable variational autoencoder whose residual self-attention encoder contextualizes gene and protein representations and captures inter-cell relationships, while a residual connection stabilizes training and preserves input information. On five large public single-cell datasets, SRSA-VAE is reported to outperform established deep generative clustering models in Adjusted Rand Index (ARI), with particularly strong gains on complex immune cell populations; ablation studies attribute the gains to the self-attention mechanism and the residual connection. The provided text does not discuss limitations, and the evidence it offers is benchmark evaluation plus ablation. The paper does not explicitly discuss endometriosis or adenomyosis; it was included in the corpus via a keyword match in the upstream search index.
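Since ARI is the metric behind the reported comparison, a minimal sketch of how it is computed may help in reading the results. This is the standard pair-counting definition written in plain Python; the function name `adjusted_rand_index` and the toy labels are illustrative and are not taken from the paper or its repository.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: cells sharing (true label, predicted label).
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of cell pairs grouped together in both / each labeling.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical): cluster IDs are arbitrary, so a pure
# relabeling of the true clusters still counts as perfect agreement.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

ARI is 1 for identical partitions, close to 0 for chance-level agreement, and can be negative when agreement is worse than chance, which is why it is preferred over raw accuracy when cluster IDs carry no meaning.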

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Full text 2,109 characters · extracted from oa-doi-fallback

Abstract

Clustering plays a critical role in the analysis of single-cell omics data for identifying cellular heterogeneity and uncovering biological mechanisms. However, the high dimensionality, sparsity, and multimodal nature of single-cell datasets such as single-cell RNA sequencing (scRNA-seq) and Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) pose significant challenges for effective feature learning and representation learning. Traditional dimensionality reduction methods often rely on linear transformations and fail to capture complex nonlinear relationships between gene and protein expression profiles. In this work, we propose SRSA-VAE, a scalable variational autoencoder framework that integrates a residual self-attention encoder for context-aware feature learning and multimodal representation learning. The proposed model dynamically contextualizes gene and protein representations through a self-attention mechanism, enabling the encoder to capture inter-cell relationships and emphasize biologically informative signals. A scalable residual connection further stabilizes training and preserves essential input information during latent representation learning. We evaluate SRSA-VAE on five large-scale publicly available single-cell datasets, including both scRNA-seq and CITE-seq data, and compare its performance with established deep generative models. Experimental results demonstrate that SRSA-VAE consistently outperforms existing methods in Adjusted Rand Index (ARI) across benchmark datasets, with particularly strong gains on complex immune cell populations. Ablation studies further confirm the importance of the self-attention mechanism and residual connection in enhancing model stability and clustering accuracy. The proposed model offers a generalizable, robust, and scalable solution for single-cell clustering tasks.
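The abstract gives no equations for the encoder, so the following is only a rough sketch of what a self-attention step with a scaled residual connection can look like. Everything specific here is an assumption: rows of `x` are treated as cells in a mini-batch, queries, keys, and values are the raw inputs (no learned projections), and the residual scale `alpha` and the function names are hypothetical. It is not the authors' implementation, which is in the linked repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (assumption)."""
    dim = len(x[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in x:
        # Similarity of this row to every row, including itself.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in x]
        weights = softmax(scores)
        # Each output row is a weighted average of all input rows.
        attended.append(
            [sum(w * row[c] for w, row in zip(weights, x)) for c in range(dim)]
        )
    return attended


def residual_self_attention(x, alpha=0.5):
    """Add the attended signal back onto the input, scaled by alpha.

    alpha is a hypothetical stand-in for the paper's "scalable residual
    connection"; alpha = 0 passes the input through unchanged.
    """
    attended = self_attention(x)
    return [
        [xi + alpha * ai for xi, ai in zip(x_row, a_row)]
        for x_row, a_row in zip(x, attended)
    ]


# Toy input (hypothetical): three "cells" with two features each.
toy_cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
contextualized = residual_self_attention(toy_cells, alpha=0.5)
```

The residual term keeps the original features available to later layers even when the attention weights are uninformative, which is one common reading of how such a connection preserves input information and stabilizes training.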
Code Repository
https://github.com/rangan2510/srsa-vae

Competing Interest Statement
The authors have declared no competing interest.

Footnotes
(e-mail: ujjwal.maulik{at}jadavpuruniversity.in).
(e-mail: sanghami{at}isical.ac.in).

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00