Fluctuation structure predicts genome-wide perturbation outcomes

Research Square preprint (Article)

Yogesh Goyal, Benjamin Kuznets-Speck, Leon Schwartz, Hanxiao Sun, Madeline Melzer, Nitu Kumari, Benjamin Haley, Ekta Prashnani, Suriyanarayanan Vaikuntanathan

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7304871/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 (latest preprint version).

Abstract

Pooled single-cell perturbation screens are powerful experimental platforms for functional genomics, yet drawing meaningful biological conclusions from these rich datasets remains challenging. Most current methods fall at one of two extremes: opaque deep learning models that obscure biological meaning, or simplified frameworks that treat genes as isolated units. Both approaches overlook a crucial insight: gene co-fluctuations in unperturbed cellular states can be harnessed to model perturbation responses.
Here we present CIPHER (Covariance Inference for Perturbation and High-dimensional Expression Response), a conceptual framework that uses linear response theory from statistical physics to predict transcriptome-wide perturbation outcomes from gene co-fluctuations in unperturbed cells. We validated CIPHER on synthetic regulatory networks before applying it to 11 large-scale single-cell perturbation datasets covering 4,234 perturbations and over 1.36M cells. CIPHER robustly recapitulated genome-wide responses to single and double perturbations by exploiting baseline gene covariance structure. Importantly, eliminating gene-gene covariances while retaining gene-intrinsic variances reduced model performance 11-fold, demonstrating the rich information stored within baseline fluctuation structures. Moreover, gene-gene correlations transferred successfully across independent studies of the same cell type, revealing stereotypic fluctuation structures. Furthermore, CIPHER outperformed conventional differential expression metrics in identifying true perturbations while providing uncertainty-aware effect size estimates through Bayesian inference. Finally, most genome-wide responses propagated through the covariance matrix along approximately three independent and global gene modules. CIPHER underscores the importance of theoretically grounded models in capturing complex biological responses, highlighting fundamental design principles encoded in cellular fluctuation patterns.

Subject areas: Biological sciences/Computational biology and bioinformatics; Biological sciences/Systems biology/Information theory

Keywords: single-cell perturbations, genome-wide responses, linear response theory, fluctuations, Bayesian statistics

Additional Declarations

There is NO Competing Interest.
Table 1 is available in the Supplementary Files section.
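The abstract's central claim, that baseline covariance predicts perturbation responses, can be illustrated with a toy calculation. The following is a minimal sketch and not the authors' implementation: it assumes a symmetric linear network in which the stationary covariance equals M^{-1} and the mean shift under a constant forcing u equals M^{-1}u, so that the linear response relation ΔX = Σu holds exactly. The network, the sample sizes, the uncentered R² score, and all function names are assumptions made for illustration. It also mimics the control in which gene-gene covariances are removed but per-gene variances are kept.

```python
import numpy as np

def r_squared(delta_x, prediction):
    # Uncentered score 1 - ||dX - pred||^2 / ||dX||^2 (an assumed convention).
    return 1.0 - np.sum((delta_x - prediction) ** 2) / np.sum(delta_x ** 2)

def predict_known_target(sigma, delta_x, k):
    # Forward problem with a known target gene k: u = a * e_k, so the
    # prediction is a * sigma[:, k]; the amplitude a is fit by least squares.
    column = sigma[:, k]
    a = (column @ delta_x) / (column @ column)
    return a * column

rng = np.random.default_rng(0)
n_genes, n_cells, g, k = 30, 20000, 0.8, 3

# Hypothetical symmetric interaction matrix, rescaled to spectral radius 1.
R = rng.normal(size=(n_genes, n_genes))
W = (R + R.T) / 2
W /= np.max(np.abs(np.linalg.eigvalsh(W)))

# Toy dynamics dx = (-M x + u) dt + sqrt(2) dW with M = I - g W (stable for g < 1).
# For symmetric M the stationary covariance is M^{-1}.
M = np.eye(n_genes) - g * W
sigma_true = np.linalg.inv(M)

# "Control cells": samples of the unperturbed steady state.
X0 = rng.multivariate_normal(np.zeros(n_genes), sigma_true, size=n_cells)
sigma_hat = np.cov(X0, rowvar=False)

# "Perturbation": a constant push on gene k shifts the mean by M^{-1} u.
u = np.zeros(n_genes)
u[k] = 1.0
delta_x = sigma_true @ u

# Prediction from the estimated covariance versus a variance-only control
# (off-diagonal covariances removed, per-gene variances retained).
sigma_diag = np.diag(np.diag(sigma_hat))
r2_real = r_squared(delta_x, predict_known_target(sigma_hat, delta_x, k))
r2_mean_field = r_squared(delta_x, predict_known_target(sigma_diag, delta_x, k))

print(f"R^2 with full covariance: {r2_real:.3f}")
print(f"R^2 with variances only:  {r2_mean_field:.3f}")
```

In this toy setting the variance-only control can only explain the perturbed gene's own change, so any response in other genes is attributable to the off-diagonal covariance terms. Real data are noisier and nonlinear, so the numbers here say nothing about the performance reported in the preprint.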
Supplementary Files

Table1KuznetsSpeckCIPHER27June2025.pdf (Table 1)
FiguresSIKuznetsSpeckCIPHER5Aug2025.pdf (Supplementary Figures)
KuznetsSpeckCIPHERMaintextMethods5Aug2025.pdf (Article File)

Posted on Research Square: August 12th, 2025. ISSN 2693-5015 (online).

Authors and affiliations

Yogesh Goyal (corresponding author), Northwestern University and CZ Biohub, https://orcid.org/0000-0003-3502-6465
Benjamin Kuznets-Speck, Northwestern University
Leon Schwartz, Northwestern University
Hanxiao Sun, Northwestern University, https://orcid.org/0009-0006-2987-642X
Madeline Melzer, Northwestern University
Nitu Kumari, Northwestern University
Benjamin Haley, Université de Montréal
Ekta Prashnani, NVIDIA
Suriyanarayanan Vaikuntanathan, University of Chicago

Figures

Figure 1. Schematic of the CIPHER workflow and applications to synthetic and real experimental datasets.
(Column 1) The gene-gene covariance matrix informs on changes to gene expression upon perturbation. (Column 2) Perturb-seq interrogates thousands of perturbations by combining RNA-seq with CRISPR screens. (Column 3) Linear response can apply to progressively complex synthetic regulatory networks. (Column 4) Given covariance and expression changes, CIPHER has three complementary modalities. The forward problem predicts changes in expression; the reverse problem predicts the driving perturbation(s); and the framework is fully interpretable, showing 1) how each gene's change is made up of contributions from itself and every other gene and 2) how the response is propagated along dominant or subdominant modes of the covariance.

Figure 2. Predicting genome-wide response to perturbations in synthetic regulatory networks.
A) Random linear regulatory networks of N genes: nodes represent genes and edge thickness represents the (absolute) strength of interactions between genes. Subcritical, critical and supercritical networks are shown.
B) Steady-state gene-gene correlations for the networks in A).
C) Percent variance explained by the first principal component as a function of the gene-gene interaction parameter.
D) Participation ratio (effective dimension) as a function of the scaled interaction parameter for networks of different sizes.
E) Points are the typical error in linear response, $\|\Delta X - \Sigma u\|^2 / \|\Delta X\|^2$, across all genes in the three regimes, averaged over 500 trajectories. Standard error bars shown. (One-sided Wilcoxon signed-rank tests: subcritical error < critical error, p-value = 7.6×10^-7; critical error < supercritical error, p-value = 3.0×10^-51.)
F) Non-linear networks with activating Hill function interactions, n = 10.
G) Error in linear response as a function of effective interaction strength, Geff. Points represent individual simulation runs and lines guide the eye.
H) Error in linear response as a function of perturbation magnitude for several different Hill coefficients (n).
I) A prototypical 'teams' network in which groups of genes mutually activate within teams and inhibit across them. Each gene has its own promoter whose bursting activity is modulated by nonlinear Hill-type reactions with all other genes.
J) Time series of the average expression for the two teams.
K) Steady-state correlations over the teams time series.
L) Empirical expression change and corresponding linear response predictions. Each point represents an individual gene in the network.

Figure 3. The forward problem: to what extent can covariance structure explain response to known gene perturbations from Perturb-seq.
A) We consider 10 Perturb-seq datasets, comprising CRISPRi (8 datasets) and CRISPRa (2 datasets) gene perturbations (outermost ring) for 8 different cell types (middle ring). The innermost ring shows the log-number of control (grey) and perturbed (colored) samples for each perturbation in each dataset.
B) Control gene expression matrices before and after shuffling rows and columns, and the corresponding covariance matrices.
C) Distributions of R² values over all CRISPRi perturbations, optimizing over the known perturbation with either the real or shuffled covariance matrix (KS test stat = 0.881, p-value = 5.444×10^-79).
D) Distributions of R² values over all CRISPRa perturbations, with real or shuffled covariance (KS test stat = 0.876, p-value = 0).
E) Average R² values over all CRISPRi perturbations (black points, black line) and per dataset (colored points, grey lines),
F) and analogously for CRISPRa (p-value calculated using combined one-sided Wilcoxon signed-rank tests including all 10 datasets: p-value = 0.00098).
G) The top scoring perturbation for Tian_19b is PPP2R1A. We show the empirical ΔX (lines) as well as that predicted by linear response (dots).
H) The top scoring perturbation for Tian_21b is TUSC1.
I) Predicted ΔX = Σu vs experimental data for PPP2R1A; each point is a gene,
J) and similarly for TUSC1.
K) Real and 'mean-field' (shuffled over cells, not genes) gene expression count matrices, and corresponding covariance matrices.
L) Distribution of R² values over all datasets for mean-field and real covariance matrices (KS test stat = 0.654, p-value = 0).
M) The average R² values for CRISPRi/a datasets with the real Σ compared to mean-field. Dark points are averages over all CRISPRi (green) and CRISPRa (yellow) perturbations, and light points are per-dataset averages. (Combined one-sided Wilcoxon signed-rank tests including all 10 datasets: p-value = 0.00098.)
N) Histograms of R² values over all double perturbations to different genes A and B. We compare the full double perturbation response to that achieved for shuffled X0, perturbations A or B alone, and the additive solution. (Shuffled vs Single A: KS stat = 0.5250, p-value = 1.823×10^-15; Shuffled vs Single B: KS stat = 0.5167, p-value = 5.753×10^-15; Shuffled vs Additive: KS stat = 0.6333, p-value = 6.627×10^-23; Single A vs Single B: KS stat = 0.0833, p-value = 8.012×10^-1; Additive vs True Σ: KS stat = 0.3750, p-value = 6.651×10^-8.)
O) Mean R² across the conditions in N) (Shuffled vs Single A: p-value = 7.9×10^-6; Shuffled vs Single B: p-value = 0.00044; Shuffled vs Additive: p-value = 2.6×10^-6; Additive vs True Σ: p-value = 9.9×10^-22). Orange points are over Norman19 perturbations, blue points over Tian_19a perturbations, and black points are the averages over both datasets.
P) Responses in a host study from inter-study covariances of either the same or a different cell type.
Q) R² values across neuron datasets: R² values using the true covariance matrix from the host study compared to those from a different dataset of the same cell type. Each point is a perturbed gene.
R) Average R² values across all pairs of datasets and conditions, using a covariance matrix from either a shuffled X0 (across datasets), the mean-field approximation (across datasets), the unshuffled X0 (across datasets), or the unshuffled X0 from the same dataset. Thin lines correspond to individual dataset pairs (averages over all perturbations in the host dataset with a Σ of the same cell type in red or different cell types in orange) and points are averages over the typical R² values for these pairs of datasets. Note that there are only three lines connecting the across- and within-dataset real covariance for the same cell type. This is because we are comparing the three datasets shown in Q, and there is no true covariance from the same dataset and a different cell type. (Same cell type, shuffled vs mean-field: p-value = 0.0020; same cell type, mean-field vs cross-dataset covariance: p-value = 0.0020; same cell type, cross-dataset covariance vs same-dataset covariance: p-value = 0.016. Different cell type, shuffled vs mean-field: p-value = 4.8×10^-7; different cell type, mean-field vs true: p-value = 4.8×10^-7.)

Figure 4. The inverse problem: predicting causal drivers of perturbation response in transcriptome-wide screens.
A) CIPHER ranks genes responsible for a measured perturbation using knowledge of the control-cell gene expression fluctuations. Each point corresponds to a transcriptome from a single cell.
B) Left: posterior probability of a nonzero perturbation effect across genes (points). True perturbed genes A and B have high posterior probabilities. Right: distributions of effect size for genes A and B, as well as a composite distribution over other unperturbed genes.
C) Receiver operating characteristic curves for each CRISPRi dataset. Each curve corresponds to a dataset.
D) Receiver operating characteristic curves for each CRISPRa dataset.
E) AUROC comparison across metrics: p-value vs PIP. (One-sided Wilcoxon signed-rank test p-value = 0.042.)
F) AUROC comparison across metrics: log fold change vs maximum posterior effect size distribution spread vs PIP. Each line connects AUROC scores for the same dataset over different conditions. (One-sided Wilcoxon signed-rank test p-value = 0.042.)
G) AUROC comparison across covariance conditions: shuffled, ZINB, mean-field and the true covariance. Note that ZINB only has 4 datasets associated with it, as reparameterization for the other datasets led to pathological results (negative mean, see Methods 4). (Shuffled Σ vs ZINB (n=4): one-sided p-value = 0.31; ZINB vs mean-field (shuffled X0) (n=4): one-sided p-value = 0.063; mean-field (shuffled X0) vs real Σ (n=10): one-sided p-value = 0.042.)
H) Receiver operating characteristic curves across datasets containing double (2-gene) perturbations.

Figure 5. The effective dimensions of transcriptome-wide response.
A) Distributions of participation ratio (effective dimension) across all perturbations for three Perturb-seq datasets.
B) Clustered heatmaps of the fraction of the response that falls along each principal component for the datasets in A).
C) Gene-ontology significance heatmaps (clustered) from gene sets with high loadings in each principal component.
D) Mean participation ratios vs mean R² values over the CRISPRi/a Perturb-seq datasets considered. Each point corresponds to a dataset and we show the best-fit line describing the linear trend.
E) The effective number of genes driving global change, calculated from the entropy of the fractional contributions of each gene's change from every other gene.
F) The effective number of genes impacting the expression change of the true perturbed gene across datasets, calculated in the same way as E).
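The Figure 5 legend refers to a participation ratio (effective dimension), to the fraction of the response along each principal component, and to an entropy-based effective number of genes. A minimal sketch of how such quantities are commonly computed follows. The exact definitions used in the preprint are not given on this page, so the formulas below (participation ratio as (Σw)²/Σw², effective number as the exponential of the Shannon entropy) and all function names are assumptions for illustration only.

```python
import numpy as np

def participation_ratio(weights):
    # (sum w)^2 / sum w^2: equals n for n equal weights and 1 for a single
    # nonzero weight. Assumed definition of "effective dimension".
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def entropy_effective_number(weights):
    # exp(Shannon entropy) of the normalized weights; an assumed reading of
    # "calculated from the entropy of the fractional contributions".
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()
    p = p[p > 0]  # drop zeros so that 0 * log(0) is treated as 0
    return float(np.exp(-np.sum(p * np.log(p))))

def response_fractions(sigma, delta_x):
    # Fraction of the squared response that lies along each principal
    # component (eigenvector) of the covariance matrix sigma.
    _, eigenvectors = np.linalg.eigh(sigma)  # eigenvalues in ascending order
    projections = eigenvectors.T @ delta_x
    return projections ** 2 / np.sum(delta_x ** 2)

# Hypothetical 3-gene covariance with eigenvalues 0.5, 1 and 3.
sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 0.5]])

# A response aligned with the leading principal component (1, 1, 0) / sqrt(2).
delta_x = np.array([1.0, 1.0, 0.0])

fractions = response_fractions(sigma, delta_x)
pr_response = participation_ratio(fractions)
pr_spectrum = participation_ratio(np.linalg.eigvalsh(sigma))

print("fraction of response per component:", np.round(fractions, 3))
print("participation ratio of the response:", round(pr_response, 3))
print("participation ratio of the covariance spectrum:", round(pr_spectrum, 3))
```

A response confined to one principal component has a participation ratio of 1, whereas a response spread evenly over d components has a participation ratio of d. Under these assumed definitions, the abstract's "approximately three independent and global gene modules" would correspond to a value near 3.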