Abstract
Understanding how cells respond to perturbations such as viral infections requires models that capture coordinated gene dynamics. However, current gene expression foundation models rely predominantly on single-cell data and static gene expression, which limits their applicability in real clinical scenarios. We present CellPulse, a direction-aware foundation model trained on the Virus Stimulated Atlas (VISTA), a newly curated atlas of over 23 million bulk RNA-sequencing differential expression profiles from viral infections. CellPulse models the direction and magnitude of gene expression changes via a structured representation of differential expression and a direction-aware attention mechanism, enabling it to learn coherent regulatory programs. It shows strong diagnostic capability, accurately classifying 31 distinct virus types across diverse clinical and laboratory samples solely from host transcriptional signatures. Crucially, without any injection of prior knowledge, interpreting CellPulse reveals virus-associated host factors that mediate infection. In silico drug screening using a selection of these host factors yielded numerous compounds whose efficacy was confirmed in wet-lab assays, while cell-based and animal experiments further verified the causal relationship between host targets and viral infections. Overall, CellPulse represents a generalizable foundation model for deciphering coordinated gene dynamics from bulk transcriptomics, bridging host response modeling with clinical relevance and therapeutic discovery for infectious diseases and beyond.
Competing Interest Statement
Wuhan Institute of Virology, on behalf of the authors X.Z., Y.R., and X-X.Z., and Institute of Software, on behalf of the authors Y.W., L.Z., and D.L., have filed a patent application for a method for disease diagnosis and drug discovery based on modeling of large-scale gene expression data. All other authors declare no competing interests.