Abstract
Motivation
Accurately inferring transcription factor (TF) activity from single-cell RNA sequencing (scRNA-seq) data remains a fundamental challenge in computational biology. Existing methods rely on statistical models, motif enrichment, or prior-based inference, but they often make deterministic assumptions about regulatory relationships and depend on static regulatory databases. Few approaches effectively integrate prior biological knowledge with data-driven inference to capture novel, dynamic, and context-specific regulatory interactions.
Results
To address these limitations, we develop scRegulate, a generative deep learning framework that uses variational inference to estimate TF activities, guided by experimentally derived TF-target gene relationships and progressively adapted to the input scRNA-seq data. By integrating structured biological constraints with a probabilistic latent space model, scRegulate offers scalable and biologically grounded estimates of TF activity and gene regulatory networks (GRNs). Comprehensive benchmarking on public experimental and synthetic datasets demonstrates scRegulate's superior performance. Further, scRegulate accurately recapitulates experimentally validated TF knockdown effects for key TFs in a Perturb-seq dataset. Applied to experimental human PBMC scRNA-seq data, scRegulate infers cell-type-specific GRNs and identifies differentially active TFs consistent with known regulatory pathways. scRegulate's TF activity representations capture transcriptional heterogeneity, enabling accurate clustering of cell types. scRegulate is also highly efficient, often running an order of magnitude faster than common baselines. Collectively, our results establish scRegulate as a powerful, interpretable, and scalable framework for inferring TF activities and GRNs from single-cell transcriptomics.
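The abstract does not spell out the model architecture, so the following is only a minimal sketch of the general idea it describes: a variational model whose latent dimensions are TFs and whose decoder is constrained by a prior TF-target matrix. Everything here is a hypothetical illustration, not scRegulate's implementation (see the GitHub repository for that): the toy TF and gene names, the `off_prior` weight as one possible way to let data adapt the prior, the squared-error reconstruction term (real scRNA-seq models often use count likelihoods), and all function names.

```python
import math
import random

# Hypothetical toy prior: TF -> set of known target genes (illustrative only).
PRIOR = {"TF_A": {"g1", "g2"}, "TF_B": {"g2", "g3"}}
GENES = ["g1", "g2", "g3", "g4"]
TFS = sorted(PRIOR)  # latent dimensions, one per TF


def prior_mask(prior, tfs, genes, off_prior=0.0):
    """Gene x TF mask: 1.0 for listed TF-target edges, `off_prior` elsewhere.

    A nonzero `off_prior` lets unlisted edges contribute, one hypothetical way
    to let the data adapt the prior; it is an assumption, not scRegulate's scheme.
    """
    return [[1.0 if g in prior[tf] else off_prior for tf in tfs] for g in genes]


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the VAE reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent (TF) dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def decode(z, weights, mask):
    """Reconstruct each gene as a masked linear combination of TF activities z."""
    return [
        sum(w * m * zt for w, m, zt in zip(w_row, m_row, z))
        for w_row, m_row in zip(weights, mask)
    ]


def loss(x, x_hat, mu, logvar, beta=1.0):
    """Squared-error reconstruction plus beta-weighted KL, a common VAE-style objective."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_to_standard_normal(mu, logvar)


# Usage: one forward pass for a single toy cell.
rng = random.Random(0)
mask = prior_mask(PRIOR, TFS, GENES)
weights = [[rng.uniform(0.5, 1.5) for _ in TFS] for _ in GENES]
mu, logvar = [0.3, -0.2], [0.0, 0.0]  # per-TF posterior parameters (an encoder would output these)
z = reparameterize(mu, logvar, rng)    # sampled TF activities for this cell
x_hat = decode(z, weights, mask)       # g4 has no listed regulator, so x_hat[3] is zero
print(loss([1.0, 2.0, 0.5, 0.0], x_hat, mu, logvar))
```

With `off_prior=0.0` the decoder can only use listed TF-target edges, so the latent coordinates stay interpretable as TF activities; raising `off_prior` trades some of that interpretability for flexibility to pick up unlisted interactions.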
Availability
Results and scripts are available at github.com/YDaiLab/scRegulate.
Supplementary information
Supplementary data are available at Bioinformatics online.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
This revised version incorporates all updates made during peer review and matches the manuscript accepted (in press) at Bioinformatics. We added new benchmarking of scRegulate versus pySCENIC on mouse embryonic stem cell scRNA-seq data, with results reflected in the updated Figure S5 and Table S2, and clarified performance comparisons using experimental human PBMC data. We also expanded the related-work section and updated Table S4 to include Dictys and scMTNI as recent dynamic GRN methods. The distinction between synthetic PBMC benchmarking (GRouNdGAN) and experimental human PBMC data has been clarified, and terminology throughout the manuscript now consistently states that scRegulate infers, rather than reconstructs, GRNs.