FLASH-MM: fast and scalable single-cell differential expression analysis using linear mixed-effects models

preprint OA: closed CC-BY-4.0
Authors
Gary Bader (corresponding author), University of Toronto, https://orcid.org/0000-0003-0185-8861
Changjiang Xu, Terrence Donnelly Centre for Cellular and Biomedical Research, University of Toronto
Delaram Pouyabahar, University of Toronto, https://orcid.org/0000-0002-8686-8067
Veronique Voisin, University of Toronto, https://orcid.org/0000-0002-3405-9532
Hamed Heydari, University of Toronto, https://orcid.org/0000-0001-6708-9116

Research Square preprint, version 1, posted April 15th, 2025. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6372285/v1
This work is licensed under a CC BY 4.0 License (https://creativecommons.org/licenses/by/4.0/).
Status: Published. Journal publication published 05 Feb, 2026 in Nature Communications: https://doi.org/10.1038/s41467-026-69063-2

Abstract

Single-cell RNA sequencing (scRNA-seq) enables detailed comparisons of gene expression across cells and conditions. Single-cell differential expression analysis faces challenges like sample correlation, individual variation, and scalability. We developed a fast and scalable linear mixed-effects model (LMM) estimation algorithm, FLASH-MM, to address these issues. We reformulate aspects of the model estimation procedure to make it faster, by reducing computational complexity and memory use in the case of working with a gene by cell matrix. Simulation studies with scRNA-seq data show that FLASH-MM is accurate, computationally efficient, effectively controls false positive rates, and maintains high statistical power in differential expression analysis. Tests on tuberculosis immune and kidney single-cell data demonstrate FLASH-MM’s utility in accelerating single-cell differential expression analysis across diverse biological contexts.

Subject areas
Biological sciences/Computational biology and bioinformatics/Statistical methods
Biological sciences/Computational biology and bioinformatics/Software

Figures

Figure 1. FLASH-MM workflow for single-cell differential expression analysis. A. Data: gene expression matrix Y = log(1 + counts), with each row corresponding to a gene’s expression profile and each column corresponding to a cell (gene x cell matrix). Metadata includes various variables such as log-library size, batch effects, biological conditions of interest, and interactions between conditions and cell types, which could be modeled as fixed effects, and individual subjects, which could be modeled as random effects. B. Model: the linear mixed-effects model (LMM) for each gene by design matrices X and Z, which are constructed based on prior knowledge about the covariates and the biological question. C. Model fitting: comprises LMM estimation and tests. LMM estimation is implemented by a gradient descent algorithm over summary statistics. The summary statistics are computed as $X^T X$, $X^T Y^T$, $Z^T X$, $Z^T Y^T$, and $Z^T Z$. LMM tests perform hypothesis tests on the fixed effects and their contrasts using t-statistics.
https://assets-eu.researchsquare.com/files/rs-6372285/v1/04c4a77848b2ad0449dd302b.png

Figure 2. Computational and statistical performance of FLASH-MM in differential expression analysis of simulated scRNA-seq data. A) Boxplots of differences: comparison of variance components, coefficients, and p-values between FLASH-MM and lmer fitting across various sample sizes. B) Computation time: comparison of computation time (in minutes) for FLASH-MM, lmer, and NEBULA across different sample sizes. C) QQ-plots of non-DE genes (negative controls) p-values for FLASH-MM and NEBULA. The grey area represents the 95% confidence interval, indicating the expected range under the null hypothesis. D) ROC curves for FLASH-MM and NEBULA.
https://assets-eu.researchsquare.com/files/rs-6372285/v1/7337ee5508e43bdaa7b96324.png

Figure 3. FLASH-MM identifies sex-specific variations in a healthy human kidney map. A) UMAP projection of the human healthy kidney transcriptomic map shows connecting tubule (CNT) cells in purple, while other populations are shown in lighter tones for contrast. B) Bar plot indicates the fraction of cells from each sex within different cell types. C) The top male-specific differentially expressed genes within the CNT population are identified. Score is calculated as -log(adjusted p-value) x coefficient. Genes are sorted by their score, and dot colors indicate -log(adjusted p-value). D) The top female-specific differentially expressed genes within the CNT population are shown. Pathway enrichment results based on male-specific and female-specific DE genes in the CNT population are shown in panels E and F, respectively.
https://assets-eu.researchsquare.com/files/rs-6372285/v1/7c71bc0cadf1c8640299a2a1.png

Figure 4. FLASH-MM identifies TB-enriched signatures within T cell populations while accounting for confounding variables. A) Bar plots indicate the number of differentially expressed genes for each cell type. B) UMAP projection of single-cell RNA-seq data from 500K memory T cells across 259 individuals in a TB progression cohort. CD4+ activated and CD8+ activated T cell populations are highlighted in blue and dark orange, respectively, while other populations are shown in lighter tones for contrast. C) The top TB-associated differentially expressed genes within the CD4+ activated and CD8+ activated T cell populations are identified. Score is calculated as -log(adjusted p-value) x coefficient. Genes are sorted by their score and dot colors indicate -log(adjusted p-value). D) Pathway enrichment results are presented for the TB-enriched differentially expressed genes within the activated CD8+ and CD4+ T cell populations.
https://assets-eu.researchsquare.com/files/rs-6372285/v1/6179a085484e511b82f4f0de.png

Additional Declarations

Yes there is potential Competing Interest. Gary Bader is on the Scientific Advisory Board of Adela Bio. No other competing interests are declared.

Supplementary Files

Article File: FLASHMMmanuscriptdraftfull.pdf
https://assets-eu.researchsquare.com/files/rs-6372285/v1_covered_663bd7eb-6576-4c40-9d6d-34467400d10b.pdf
Extended data: FLASHMMExtendedData.pdf
https://assets-eu.researchsquare.com/files/rs-6372285/v1/e7dace247f54a1ec0323a25a.pdf
Supplementary information: FLASHMMSupplementaryInformationfinal.pdf
https://assets-eu.researchsquare.com/files/rs-6372285/v1/cddc7e48105aef24a707ca88.pdf
Software Policy Checklist: nrsoftwarepolicy2.pdf
https://assets-eu.researchsquare.com/files/rs-6372285/v1/4d03108e1f5d6ca635323e5d.pdf
Reporting Summary: nrreportingsummary2.pdf
https://assets-eu.researchsquare.com/files/rs-6372285/v1/819e5ff3601c1246fb8fecee.pdf
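
The Figure 1 legend states that LMM estimation runs over five summary statistics: the cross-products of X, Z, and the transposed expression matrix Y. The following is a minimal NumPy sketch of how such cross-products could be formed, and of the fact that they can be accumulated over blocks of cells. It is not FLASH-MM's API: the function names, the chunked accumulation, the toy dimensions, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def summary_statistics(Y, X, Z):
    """Cross-products named in the Figure 1 legend (hypothetical helper).

    Y: genes x cells matrix of log(1 + counts)
    X: cells x p fixed-effects design matrix
    Z: cells x q random-effects design matrix
    """
    return {
        "XtX": X.T @ X,      # p x p
        "XtYt": X.T @ Y.T,   # p x genes
        "ZtX": Z.T @ X,      # q x p
        "ZtYt": Z.T @ Y.T,   # q x genes
        "ZtZ": Z.T @ Z,      # q x q
    }


def summary_statistics_chunked(Y, X, Z, chunk_size):
    """Accumulate the same cross-products over blocks of cells.

    Each cross-product is a sum over cells, so block-wise sums add up
    to the full result. This chunking is an illustrative assumption.
    """
    total = None
    for start in range(0, X.shape[0], chunk_size):
        stop = start + chunk_size
        part = summary_statistics(Y[:, start:stop], X[start:stop], Z[start:stop])
        if total is None:
            total = part
        else:
            total = {key: total[key] + part[key] for key in total}
    return total


# Toy data: every size and value below is made up for illustration.
rng = np.random.default_rng(0)
n_genes, n_cells, n_subjects = 5, 1000, 4
counts = rng.poisson(2.0, size=(n_genes, n_cells))
Y = np.log1p(counts)                                  # Y = log(1 + counts)
condition = rng.integers(0, 2, size=n_cells)
X = np.column_stack([np.ones(n_cells), condition])    # intercept + condition
subject = rng.integers(0, n_subjects, size=n_cells)
Z = np.eye(n_subjects)[subject]                       # one indicator column per subject

full = summary_statistics(Y, X, Z)
chunked = summary_statistics_chunked(Y, X, Z, chunk_size=128)
```

In this sketch the shapes of all five matrices are set by the number of covariates, subjects, and genes rather than by the number of cells, which is one way to read the abstract's statement about reduced memory use when working with a gene by cell matrix.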

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-05-23T02:00:01.238055+00:00
License: CC-BY-4.0