Abstract
Cell states are modulated by intrinsic driving forces such as gene expression noise and extrinsic
signals from the tissue microenvironment. The distinction between intrinsic and extrinsic cell
state determinants is essential for understanding the regulation of cell fate in tissues during
development, homeostasis and disease. The rapidly growing availability of single-cell
resolution spatial transcriptomics makes it possible to meet this challenge. However, available
computational methods to infer topological tissue domains, spatially variable gene expression,
or ligand-receptor interactions are limited in capturing cell state changes driven by crosstalk
between individual cell types within the same niche. We present NiCo , a computational
framework for integrating single-cell resolution spatial transcriptomics with matched single -
cell RNA-sequencing reference data to infer the influence of the spatial niche on the cell state.
By applying NiCo to mouse embryo genesis, adult small intestine and liver data, we
demonstrate the capacity to predict novel niche interactions that govern cell state variation
underlying tissue development and homeostasis . In particular, NiCo predicts a feedback
mechanism between Kupffer cells and neighboring stellate cells limiting stellate cell activation
in the normal liver. NiCo pr ovides a powerful tool to elucidate tissue architecture and to
identify drivers of cellular states in local niches.
Introduction
Cellular states are marked by distinct molecular configuration s which determine fate and
function 1,2, and are intricately regulated by a complex interplay of myriad molecules,
interacting stochastically, only biased by their affinities. Despite this complexity, cell states
exhibit remarkable robustness ensuring reproducible cell fate decision and differentiation.
Within tissues, cell state variability can be dissected into intrinsic and extrinsic determinants.
For instance, stochastic binding of transcription factors to regulatory sites may lead to strong
fluctuations of mRNA molecules across cells within the same state . This transcriptional
bursting represents a cell-intrinsic source of variability and could even be actively modulated
and exploited for the regulation of developmental processes 3. Extrinsic molecular
determinants, on the other hand, arise from cell-cell communication (CCC) in tissues,
encompassing molecular signaling, physical interactions, and competition for metabolites.
CCC is pivotal for physiological processes such as development, homeostasis, disease
progression, and regeneration. Deciphering the gene programs orchestrating spatiotemporal
communication between cells within tissues remains an open challenge for single-cell biology4.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
2
The ability to dissect intrinsic and extrinsic determinants of a cell state would allow us to
acquire a deeper understanding of cell state regulation within tissues. Although single -cell
RNA-sequencing (scRNA-seq) permits the characterization of cell state heterogeneity and gene
expression noise 5–7, and the prediction of ligand -receptor interactions 8,9, spatial context is
lacking, and a distinction between intrinsic and extrinsic drivers of cell state changes is not
possible.
Over the last decade, spatially resolved high-resolution transcriptomics technologies have
rapidly advanced 10. Following the development of sequencing-based spatial transcriptomics
with a resolution of ~100 µm 11, several improved methods have been introduced increasing
the resolution to the sub-micrometer range (e.g., Stereo-seq 12, seqScope 13). However, owing
to the low sensitivity of transcript detection, pixels typically need to be aggregated into larger
bins to deconvolve local profiles into cell types by utilizing scRNA-seq reference data thereby
sacrificing single-cell resolution 10. As a complementary approach, highly-multiplexed single-
molecule FISH (smFISH), pioneered by MERFISH 14 and seqFISH 15, or in situ sequencing 16–
18 rely on imaging -based quantification of individual mRNA molecules in tissue sections,
typically for a few hundred genes . Crucially, the assignment of single mRNA molecules to
specific cells is a prerequisite for measuring covariation of gene expression in neighboring
cells to understand extrinsic divers of cell state changes, and can currently only be achieved by
imaging-based methods allowing cell segmentation. Availability of commercial solutions for
multiplexed smFISH and in situ sequencing is rapidly growing enabling the broader
community to explore fundamental aspects of cell state regulation in tissues.
Extrinsic determinants of the transcriptional state of a cell originating from the
microenvironment can be revealed by gene expression covariance in neighboring cells. We
developed NiCo with the goal to infer Niche Covariation of gene expression programs at cell
type resolution on a transcriptome-wide scale by integrating imaging -based spatial
transcriptomics with matched scRNA-seq reference data. Available computational approaches
for the analysis of spatial transcriptomics are focused on the inference of spatially variable gene
(e.g., SpatialDE 19), on spatial modeling of ligand -receptor interactions (e.g., SpaOTsc 20 and
COMMOT 21) or local gene-gene dependencies (e.g., MISTy 22 or GCNG 23), or on explaining
intra-cell-type variance by niche composition (NCEM) 24. These methods are typically
restricted to the set of genes directly measured in the spatial modality, and none of these
Methods
was designed towards inferring cell state dependencies of co-localized cell types. To
overcome these limitations NiCo integrates sensitive genome -wide quantification of gene
expression by scRNA-seq data with single -cell resolution spatial transcriptomics. NiCo does
not rely on mapping individual cells from the scRNA -seq into space or vice versa, which is
computationally expensive and inherently noisy. Instead , NiCo derives interpretable latent
factors reflecting cell state variability of each cell type captured within both scRNA -seq
Reference
and spatial data. These latent factors are then used to infer covariation of gene
programs in neighboring cell types from the spatial modality and to associate these factors with
transcriptome-wide expression patterns by utilizing the scRNA-seq data. We demonstrate that
this approach has the capacity to recover well -known functional niche interactions in the
developing mouse embryo, as well as in mouse intestine and liver. NiCo provides novel
insights into the cell states involved in these interactions and their spatial configuration. NiCo
complements existing methods specialized for spatial ligand -receptor interaction analysis to
infer cell state covariation in local niches and illuminates the consequences of local niche
interactions for cell state control.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
3
Results
Inferring covariation of gene programs in local tissue niches at single-cell resolution with
NiCo
NiCo integrates imaging-based spatial transcriptomics data, obtained by multiplexed smFISH
or in situ sequencing technologies such as MERFISH 14, seqFISH 15, or STARmap 18, and
commercial implementations of these approaches (Vizgen MERSCOPE, 10x Xenium,
Nanostring CosMx) which typically measure targeted libraries of up to 500 genes, with
scRNA-seq reference data that provide genome -wide coverage but lack spatial information.
NiCo requires a cell -by-gene count matrix and two -dimensional cell-center coordinates from
imaging-based spatial transcriptomics data that have undergone cell segmentation, as well as a
cell-by-gene count matrix of a scRNA -seq reference dataset along wit h cell type labels
comprising all cell types expected to occur in the spatial data.
NiCo was designed as a multi -step pipeline, facilitating (1) cell type annotation of the spatial
modality by label transfer from the scRNA -seq reference data, (2) recovery of niche
architecture for each cell type, (3) inference of covarying latent factors in co -localized cell
types, and association of these latent factors with functional pathways and molecular signaling
interactions (Fig. 1).
NiCo employs a n iterative approach to perform the cell type annotations (Fig. 1,
“Annotations”). First, mutual nearest neighbor s are identified in the spatial and scRNA -seq
modalities after normalization to eliminate technical variability. After pruning scattered
anchors utilizing Leiden clusters of the spatial modality, non-anchors are iteratively annotated
based on anchors among their k-nearest neighbors (Methods).
After annotating cell types in the spatial domain, NiCo interrogates niche architecture for each
“central” cell type (CC) in order to identify neighboring niche-cell types with a high predictive
capacity for the central cell -type identity. NiCo trains a regularized logistic regression
classifier to predict the identity of the central cell type from the normalized cell -type
frequencies in local-niche neighborhoods (Methods). The regression coefficients of all niche-
cell types permit prioritization of potential interaction partners within local neighborhoods
(Fig. 1, “Interactions”).
As a first step towards inferring covar iation of gene programs within locally interacting cell
types, NiCo has implemented different versions of non-negative matrix factorization ( NMF).
For each cell type, NiCo infers latent factors by integrative NMF 25,26 to capture common
variability in the spatial and the scRNA -seq data modality on the shared subset of genes.
Alternatively, if gene expression variability in the spatial modality is dominated by technical
artefacts, e.g., due to imperfect segmentation giving rise the “spill -over” between cell types,
NiCo offers conventional NMF on the shared set of genes only for the scRNA -seq modality
and infers the cell loadings of these factors in the spatial modality. The output of this step is a
set of latent factors capturing the cell state variability of each cell type based on the shared set
of genes between both modalities (Fig. 1, “Covariations”).
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
4
Fig. 1| Exploring cell state covariation in local niches with NiCo. The schematic illustrates the three
modules of the NiCo pipeline. Annotations: query imaging -based spatial transcriptomics data are
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
5
annotated by label transfer from reference scRNA -seq data using a soft mutual nearest neighbor
approach to derive anchors followed by iterative annotation of non-anchors. Interactions: For each cell
type a logistic regression classifier is trained to predict the cell-type identity from the niche composition.
Predicted coefficients for each cell type are informative on predictive cell types within the niche and a
cell type neighborhood graph is derived from the regression coefficients. Covariations: First, l atent
variables are inferred for each cell type to capture cell state variability using non -negative matrix
factorization (NMF). The gene -by-factor matrix is learned simultaneously from reference and query
data; if the cell segmentation is imperfect as indicated, e.g., by “spill over” of cell type-specific markers,
it is learned only from reference data and transferred to the spatial modality. Factors can be associated
with full transcriptome information based on gene-factor correlations derived from the scRNA-seq data.
A ridge regression infers the dependence of each factor of the central cell type on all factors of the
niche-cell types. Significance and magnitude of the regression coefficients indicating factor covariation
in co-localized cell types, can be inspected in a dot plot. Covarying factors are interrogated for enriched
pathways and ligand-receptor pairs to functionally interpret niche interactions.
Finally, NiCo performs ridge regression of each latent factor from the central cell on the latent
factors of all predictive niche -cell types obtained by step 2 (“Interactions”). Significant
regression coefficients indicate either positive or negative covariation between the respective
factors (Fig. 1, “Covariations”). Functional gene modules can be associated with covarying
factors by performing pathway enrichment analysis on the top (anti-)correlating genes for each
factor obtained from the transcriptome-wide scRNA-seq data. In addition, these gene s can be
interrogated for covarying ligands and receptors as potential mediators of this crosstalk. This
multifaceted integration strategy overcomes limitations of available methods and permits
identification of covarying gene programs in local niches at single cell type resolution.
NiCo enables accurate and scalable cell type annotation of imaging -based spatial
transcriptomics data
Cell type annotation of imaging-based spatial transcriptomics data on limited subsets of genes
is a challenging problem. Available computational methods addressing these tasks were
primarily optimized for spot -deconvolution of sequencing -based spatial transcriptomics.
Nonetheless, state -of-the-art methods such as cell2location 27, Tangram 28, TACCO29, and
uniPort 30, can be applied to imaging -based data that have undergone cell segmentation. In a
rigorous benchmarking analysis, we conducted a comprehensive comparison of NiCo
annotations against annotations generated by Tangram, uniPort, TACCO and cell2location on
published datasets, comprising mouse small intestinal MERFISH data 31, primary motor cortex
MERFISH data 32, and mouse embryo seqFISH data 33 (Extended Data Fig. 2c). Considering
the authors’ annotation as ground truths, we evaluated the performance of these methods with
well-established scoring metrics , i.e., adjusted Rand index and Jaccard similarity ( Methods).
Overall, NiCo outperformed Tangram and uni Port on all three datasets and exhibited higher
ground truth consistency on Primary Motor cortex and the embryo data than TACCO (Fig. 2a)
On the intestinal data, NiCo yielded a higher Jaccard similarity but a lower adjusted Rand index
than TACCO. Cell2location exhibited higher overlap with the ground truths on the intestinal
dataset, and showed comparable performance on the Primary Motor Cortex data, while NiCo
was more consistent with the ground truths for the embryo data (Fig. 2a). However, on large
datasets such as the MER SCOPE liver dataset with ~400k cells ( Data Availability ),
cell2location took more than two days to run, while NiCo finished in less than three hours on
an AMD Ryzen 9, 5950X, 16-core processor with 128 GiB memory (Fig. 2b). Therefore, NiCo
enables scalable reference-based cell type annotation of imaging-based spatial transcriptomics
data on a standard laptop.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
6
Fig. 2| Benchmarking NiCo annotation and interaction inference . a, Benchmarking of NiCo
annotations with uniPort, Tangram, TACCO, and cell2location for published datasets (mouse intestine
and primary motor cortex MERFISH data, and mouse embryo seqFISH data; see text for details) with
available ground truths annotations (authors’ annotation). Consistency of annotatio ns between two
Methods
was evaluated by Jaccard similarity (JAC) and adjusted rand index (ARI). b, Memory
requirement (left) and run time (right) for cell type annotation and niche prediction task on intestinal
MERFISH data (7,417 cells, top) and liver MER SCOPE data (391,679 cells, bottom). SpaGCN was
aborted on the liver datasets due to exhausted memory. Runtime is shown on a log 2 scale. (c-d),
Simulation experiments for testing NiCo’s interactions module. The locations of cells were simulated
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
7
for six cell types in 2D using Lennard-Jones (LJ) pairwise interaction potentials (first panel). Simulated
cell type locations were used as input to NiCo’s interactions module (second panel). The confusion
matrix (third panel) highlights the predictive cap acity of the neighborhood composition. Classifier
coefficients (fourth panel) are ordered by magnitude, reflecting co -localization preference, or
interaction strength. A more uniform (c) and a more biased (d) scenario were simulated and assessed
based on direct neighbors. See Methods for explanations of the parameters. e, Same as (d) but using a
larger neighborhood radius (R=5). f, Niche cell type interactions inferred with MISTy for direct
neighborhoods (left: juxtacrine view, right: paraview radius 5) on the more complex simulation scenario
(d). The two -dimensional map indicates the importance of neighboring cell types (Predictor) for
predicting central cell types (Target). (g-h), Comparison of cell type interactions predicted by NiCo
with cell type enrichment in topological tissue domains predicted by tissue domain detection methods
(CellCharter, SpaGCN, Stagate, Banksy, SpatialPCA, Seurat) on (g) Allen brain MERFISH atlas data
and (h) STARmap visual cortex data. The Pearson correlation between predicted c ell type enrichment
Z-score and cell type enrichment in best matching ground truths tissue domains is compared (Methods).
For NiCo, the central cell type with the highest correlation was selected and the interaction coefficients
were correlated to the Z-score.
NiCo identifies predictive niche interactions
To test NiCo’s ability to predict niche interactions, we simulated spatial architectures of tissues
with six cell types. To simulate preferential niche interactions between these cell types, we
modelled mutual affinities by specifying Lennar d-Jones potentials of varying attractive
strengths ε between specific pairs of cell types (Methods). The simulated particle
configurations thus reflect preferential niche interactions, which can be ranked by the strengths
of their relative attractive interactions. We assessed NiCo’s regularized logistic regression
classifier on these simulated data. Inspection of the confusion matrix, comparing predicted and
ground truths cell type s for two scenarios, demonstrates NiCo’s capacity to predict cell type
identities based on niche composition as a function of the simulated interaction strength (Fig.
2c-e). Magnitude and signs of the regression coefficients were consistent with the simulated
interactions, confirming NiCo’s capability for niche prediction. In the first scenario (Fig . 2c),
NiCo predicted the strongest interaction for the central cell type T3 with T5 (ε = 5), and for
central cell type T0, NiCo predicted the strongest interaction with T2 ( ε = 3). Testing a more
complex configuration (Fig. 2d), NiCo predicted the strongest interaction for the central cell
type T3 with T2 ( ε = 10), the second -strongest interaction with T1 ( ε = 8), and the third-
strongest with T5 ( ε = 5). Similar conclusions hold true for the other cell types . For a larger
neighborhood radius (R=5, Fig. 2e), the relative rankings are not always mirrored by the
inferred predictors due to stronger second -order effects. In general, we recommend running
NiCo with radius R=0 to focus on juxtacrine interactions before exploring larger radii. MISTy22
is an available machine learning framework for predicting structural and functional interactions
from spatial omics data. To benchmark with NiCo’s interaction module, we applied MISTy to
the second simulated scenario for juxtacrine interactions (Fig. 2d, Methods). MISTy recovered
the relative ranking by interaction strength with some exceptions (Fig. 2f). For instance, the
best predictor of T5 were T0 and T4 instead of T3. However, since MISTy interaction
coefficients reflect predictor importance derived from random forests regression, a high value
could indicate positive or negative interactions, which makes a direct interpretation difficult.
Similarly, MISTy only picks up T3 as predictor of T2, but fails to detect T0. Moreover, T5 was
not inferred as predictor of T3. The predictive contribution of higher order interactions becomes
apparent when inspecting MIST y’s paraview and comparing to NiCo with radius R=5 (Fig.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
8
2e), e.g., for predicting T5: both methods now detect T4 and T2 as best predictors. However,
NiCo reveals that T4 is depleted while T2 is enriched in the paraview.
Overall, NiCo’s predictions were more consistent with the ranking by simulated interaction
strengths (Fig. 2d).
Predicted niche interactions are consistent with tissue domain architecture
A key objective of spatial transcriptomics analysis is the identification of topological tissue
domains with distinct cell type architectures. Although NiCo was not designed specifically for
this task, its predicted cell type interactions may be reflective of tissue domain architecture. To
benchmark NiCo with tissue domain detection methods, including Stagate 34, Seurat
BuildNicheAssay35, CellCharter 36, SpaGCN 37, SpatialPCA 38, and Banksy 39 we ran each
Method
on spatial MERFISH whole mouse brain data with available ground truths annotation
of six regional tissue domains comprising 17 cell types in total (Methods) 40. For each domain,
we selected the NiCo cell type with the highest correlation of the regression coefficients to the
domain-specific cell type frequencies, and compared to the cell type enrichments computed for
the domains predicted by each method (Extended Data Fig. 1 and 2a,b ). Overall, the
interactions predicted by NiCo were on average more consistent with ground truths domain
annotations than the domain predictions obtained by available methods (Fig. 2g). On another
dataset of mouse visual cortex STARmap data with ground truths annotation of neocortical
layers 18, NiCo’s performance was comparable to the domain detection methods (Fig. 2h and
Extended Data Fig. 3). We note that NiCo predicts global interaction coefficients, while
domain detection methods predict local tissue domains. The observation that NiCo interaction
coefficients of specific cell types frequently reflect domain architecture indicates that these cell
types dominate particular domains. Benchmarking of memory usage and computation time for
niche domain detection on an AMD Ryzen 9, 5950X, 16-core processor (Fig. 2b) suggests that
NiCo is both fast and memory -efficient compared to available niche detection methods and
does not require any pre-specified number of clusters or cluster resolution parameters.
NiCo deciphers cellular crosstalk during mouse organogenesis
In the following, w e demonstrate the predictive capacity of NiCo on three different tissue
environments with varying complexities and distinct architectural features. As a first scenario,
we interrogate niche architecture and covariation of gene programs between local neighbors in
the context of mouse embryonic development. We applied NiCo to seqFISH data of E8.5
embryos 33, where a diversity of organ tissues has already developed, and utilized an scRNA-
seq reference data of the same developmental stage 41. To facilitate data integration from two
modalities, we annotated cell types within the seqFISH dataset using the NiCo annotation
Method
with default parameters to transfer the cell type labels from the scRNA-seq clusters to
the seqFISH dataset (Fig. 3a-b). With the emergence of organ structures and the establishment
of 3D spatial tissue domains, embryonic tissues pose a unique challenge when deciphering
niche interactions 42,43. We applied the interaction module of NiCo to interrogate the dynamic
interplay within and across these tissue domains. The level of confusion of NiCo’s cell type
classifier reflects the predictive capacity of the niche composition for each cell type. The
majority of cell types in the E8.5 embryo are predicted well from their niche composition (Fig.
3c) with a weighted average precision of 0.71. This observation is not unexpected given the
topological ordering of many cell types into tissue domains (Fig . 3a), e.g., gut,
forebrain/midbrain/hindbrain, and heart tissue ( cardiomyocytes). In these cases, the relative
magnitude of regression coefficients suggests that the highest predictive capacity is contributed
by neighbors of the same cell -type identity (Fig. 3d). Conversely, elevated values in the off -
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
9
diagonal entries of the confusion matrix reflect similarities in the niche composition of distinct
cell types, e.g., for endothelium and haematoendothelial progenitors (Fig. 3c). Hence,
inspection of the confusion matrix enables delineation of cell types with highly specific niche
composition, and pairs or even larger groups of cell types with similar niche composition.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
10
Fig. 3| NiCo identifies niche architecture and covariation during mouse embryogenesis. a, Spatial
representation of cell type annotations obtained with NiCo. See legend in (b). b, Illustrates the UMAP
representation based on gene expression of the spatial data. Cell type annotations obtained with NiCo
are highlighted. The legend indicates the number of cells for each cell type. c, Confusion matrix
highlighting predictive capacity of the niche for all cell types. d, Regression coefficients reflecting local
niche interactions for cardiomyocytes, gut, and haematoendothelial progenitors. Regression coefficients
(y-axis) were ordered by decreasing magnitude (x -axis). Error bars, see Methods. e, Cell-type
interaction map derived from the regression coefficients of NiCo’s interaction module (Methods). A
cutoff of c=0.01 was applied. Arrows point from predictive niche -cell types towards the central cell
type. f, Covariation between cardiomyocyte factors (y -axis) and co -localized neighborhood cell type
factors (x -axis). Circle si ze scales linearly with -log10(p-value), and circle color indicates ridge
regression coefficients. S denotes the normalized niche coefficient score. g, Spearman correlations (top)
and average expression in the corresponding cell type (bottom) for the top 20 positively and negatively
correlated genes from scRNA-seq reference data for pharyngeal mesoderm Fa1 and cardiomyocyte Fa2.
h, Ligand-receptor pairs correlated with pharyngeal mesoderm Fa1 and cardiomyocyte Fa2 (cc, central
cell; nc, niche cell). The rectangle’s north and south faces represent ligand and receptor correlation to
the factors, while west and east faces represent the proportion of ligand and receptor expressing cells.
Ligands, y-axis; receptors, x-axis. See Methods.
The regression coefficients of each cell type can be utilized to construct a spatial cell type
neighborhood graph highlighting dominant niche interactions. The sparsity of this graph can
be controlled by a cutoff on the regression coefficients. After applying a cutoff of 0.01, the
neighborhood graph (Fig. 3e) unveils well known tissue co -localization patterns, such as
extraembryonic (ExE) endoderm – definitive endoderm – gut reflecting the endodermal origin
of gut epithelium , or endothelium – haematoendothelium – blood progenitors (Fig. 3d)
consistent with the emergence of primitive blood cells from yolk sac endothelium at this stage.
The high connectivity of blood progenitors and erythroid cells is likely due to presence of these
cells within the vasculature.
NiCo’s covariation module identifies covarying molecular programs in co-localized cell types
encoded by cell type -specific latent variables, or factors. To explore covariation between co -
localized cell types in common niches, we inferred three latent factors for each cell type,
capturing major intra -cell-type variability. Using NiCo’s ridge regression step, we detected
covariation between these factors across pairs of co-localized cells.
Focusing on cardiac tissue, this analysis revealed strong positive covariation of each
cardiomyocyte factor with the same factor in neighboring cardiomyocytes, suggesting cell state
synchronization in neighboring cardiomyocytes (Fig. 3f). Another significa nt covariation
(log10P = -3.92) was detected between cardiomyocyte factor (Fa) 2 in and pharyngeal
mesoderm Fa1 (Fig. 3f). During mouse embryonic development, the vital role of the
pharyngeal mesoderm is well documented, contributing significantly to exten sive regions of
both head and heart musc le 44. Heart formation involves two distinct cardiac progenitor cell
populations: the first heart field (FHF) and the second heart field (SHF) 45. The T -box
transcription factor Tbx5 marks the FHF, which is responsible for the left ventricle and atria
development 46. Conversely, around E8.0, the SHF, originating from the pharyngeal mesoderm
and characterized by Islet1 (Isl1) expression, populates the cardiac outflow tract and right
ventricle 47,48. Studies suggest that canonical and noncanonical WNT signaling are essential for
progenitor cell proliferation in pharyngeal mesoderm and the development of SHF heart
differentiation 49. Anomalies in the signaling communication between the pharyngeal
mesoderm and cardiomyocytes are associated with congenital heart diseases 50.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
11
To interpret this covariation pattern, we inspected the top positively and negatively correlating
genes with cardiomyocyte Fa2 and pharyngeal mesoderm Fa1, computed from the scRNA-seq
Reference
data to leverage genome-wide information (Fig. 3g). While key markers for cardiac
progenitors such as Nkx2-5, Isl1, Mef2c, and Fgf8 were negatively correlated with pharyngeal
mesoderm Fa1, mature cardiac genes involved in muscle contraction , e.g., cardiac troponins
(Tnnt2, Tnni3, Tnnc1, Tnnt1, Tnni1), and other mature cardiomyocyte markers such as Myl2,
Myl3 and Acta anticorrelated with cardiomyocyte Fa2. These associations were also reflected
by enriched biological processes among the top positively and negatively correlating genes
which further highlights an association of pharyngeal mesoderm Fa1 with translational activity
(Extended Data Fig. 4a-b). To explore signaling interactions that may mediate this interaction,
we inspected ligands and receptors correlated to the covarying factors, and identified a potential
interaction between Wnt2 on cardiomyocytes and Fzd2 on pharyngeal mesodermal cells (Fig.
3h), consistent with the known requirement of WNT signaling for SHF progenitor proliferation
and differentiation from pharyngeal mesoderm 49. Moreover, the secreted signaling mediator
Hmgb1 is among the top three genes most highly correlated to Pharyngeal mesoderm Fa1.
Early knockout of this gene in the developing hearts leads to cardiac growth retardation 51.
Therefore, Hmgb1 signaling from the pharyngeal mesoderm could be an inducer of
cardiomyocyte maturation (Fig. 3g). These observations indicate the co-localization of
immature cardiomyocytes with translationally active putative cardiac progenitors in the
pharyngeal mesoderm , consistent with the SHF developmental origin , highlighting the
capability of NiCo to detect cell state covariation in local niches underlying embryonic tissue
development.
NiCo unveils covariation of intestinal stem cell and Paneth cell states
The intestinal epithelium undergoes permanent homeostatic turnover fueled by an active stem
cell compartment. This renewal process maintains the relative proportions of various cell types
along the crypt -villus axis. Lgr5+ stem cells, localized at the bottom of the intestinal crypts,
give rise to proliferating transient-amplifying (TA) progenitors and eventually differentiate into
mature epithelial cell types 52. This differentiation process is orchestrated with an upward
migration from the crypt bottom to the tip of the villus. Recent observations indicate that Paneth
cells, a known stem cell niche, could differentiate directly from stem cells 53.
To conduct a spatially resolved analysis of intercellular crosstalk within the mouse small
intestine, we applied NiCo to published MERFISH data 31. Integration with a comprehensive
scRNA-seq dataset 54 facilitated cell type annotation in the spatial domain reflecting known
localization of cell types along the crypt -villus axis (Fig. 4a-b). While stem and TA cells
(stem/TA) as well as Paneth cells were localized to the crypt bottom, enterocytes segregated
into bottom, medium, and top zone enterocytes. Goblet, tuft, and enteroendocrine cells were
distributed more randomly along the crypt villus axis. NiCo’s interaction analysis revealed a
predictive capacity of the niche composition for the majority of cel l types (Fig . 4c). Off -
diagonal elements in the confusion matrix, indicative of a similar niche composition for the
respective cell types, suggest that the goblet cells share a similar niche with the stem/TA cells
and bottom zone enterocytes, in line with their preferenti al localization to the lower crypt
region. Inspection of the regression coefficients indicated that stem/TA cells were predicted by
the presence of goblet and Paneth cells within their niche, and, vice versa, Paneth cells were
predicted by co-localized Paneth and stem/TA cells (Fig. 4d). The cell type neighborhood graph
derived from the regression coefficients segregated into a module of e pithelial cells, and a
module connecting different types of immune cells, lymphatic endothelial cells, as well as
vascular cells (Fig. 4e).
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
12
Fig. 4| NiCo infers cell state covariation between mouse intestinal stem cells and Paneth cell
progenitors. a, Spatial representation of cell type annotations obtained with NiCo. See legend in (b).
b, UMAP representation based on gene expression of the spatial data. Cell type annotations obtained
with NiCo are highlighted. The legend indicates the number of cells for each cell type. cDC,
conventional dendritic cell; pDC, plasmacytoid dendritic cell. c, Confusion matrix highlighting
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
13
predictive capacity of the niche for all cell types. d, Regression coefficients reflecting local niche
interactions for stem/TA cells (top) and Paneth cells (bottom). Regression coefficients (y -axis) were
ordered by decreasing magnitude (x -axis). The red dotted line indicates the threshold for cell type
interaction (c=0.1) applied in (e). Error bars, see Methods. e, Cell type interaction map derived from
the regression coefficients of NiCo’s interaction module (Methods). A cutoff of c=0.1 was applied.
Arrows point from predictive niche -cell types towards the ce ntral cell type. f, Covariation between
stem/TA factors (y-axis) and co -localized neighborhood cell type factors (x -axis). Circle size scales
linearly with -log10(p-value), and circle color indicates ridge regression coefficients. S denotes the
normalized niche coefficient score. g, Spearman correlations (left) and average expression (right) for
the top 20 positively correlated genes from scRNA-seq reference data for stem/TA Fa1 and Paneth Fa1.
Dashed line demarcates the genes associated with each cell type . h, Ligand-receptor pairs correlated
with stem/TA Fa1 and Paneth Fa1 (cc, central cell; nc, niche cell). The rectangle’s north and south faces
represent ligand and receptor correlation to the factors, while west and east faces represent the
proportion of ligand and receptor expressing cells. Ligands, y -axis; receptors, x-axis. See Methods. i,
UMAP visualization of intestinal cell types (left) and normalized expression of Wnt3 (center) and Fzd7
(right). Data from Böttcher et al. 53. Cell counts are indicated in the legend.
Focusing on the stem/TA cell niche, NiCo’s covariation module inferred a significant
covariation of stem/TA cell Fa1 with goblet cell Fa2 (log10P = -4.00) and Paneth cell Fa1
(log10P = -1.43) (Fig. 4f). Inspection of the top -correlating genes with stem/TA Fa1 in the
scRNA-seq data revealed several known stem cell markers such as Olfm4, Lgr5, and Hopx.
Paneth cell Fa1, on the other hand, strongly correlated with markers of Paneth cell progenitors
(Fig. 4g and Extended Data Fig. 5a-b). This pattern suggests that the stemness signature
covaries with a progenitor signature in neighboring Paneth cells. We note that Paneth cells can
emerge directly from stem cells and the nascent Paneth cell progenitors may thus send the Wnt
signal to the sister stem cells in order to maintain their stemness 53 (Extended Data Fig. 5c-d).
Interrogation of ligand -receptor pairs correlated to the covarying factors recovered the Wnt3
ligand correlating to Paneth cell Fa1 and Fzd2 as well as Fzd7 receptors correlating to stem/TA
cell Fa1 (Fig. 4h). Canonical Wnt signaling through the Wnt3 -Fzd7 axis is essential for the
survival of Lgr5+ intestinal stem cells 55. Reassuringly, regulation of canonical Wnt signaling
as well as negative regulation of development were identified among the enriched gene
ontology (GO) biological processes enriched in top correlating genes of stem/TA Fa1
(Extended Data Fig. 5e). Our spatial covariation analysis extends previous findings by
identifying Paneth cell progenitors that likely directly emerged from the stem cell pool as the
source of the critical Wnt ligand required for maintenance of the stem cell 53. Consistently,
Wnt3 is downregulated during Paneth cell maturation following the trend of Fa1 (Fig. 4i). We
note, that existing methods for the inference of spatial cell-cell crosstalk, such as Niche-DE56,
COMMOT21, stLearn 57and CellNeighborEX 58did not recover this interaction due to their
Limitation
to directly measured gene sets (Methods and Extended Data Fig. 6). In summary,
NiCo recovers the critical Paneth cell niche of intestinal stem cell maintenance without any
prior information and predicts nascent Paneth cell progenitors as a critical Wnt ligand source.
NiCo predicts covariation of stellate cell and Kupffer cell states in the mouse liver
The liver is the major metabolic organ in the body and possesses a remarkable regenerative
capacity 59. Liver tissue is functionally organized into hexagonally shaped units termed lobules
60. Within each lobule, hepatocytes are organized into distinct zones along the axis connecting
the central vein in the center with the portal vessels at the boundary. Metabolic and signaling
pathways exhibit zonal gradients, a phenomenon also reflected by transcriptome zonation in
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
14
hepatocytes and sinusoids connecting the central vein with the portal vein and the hepatic artery
61,62. Achieving spatial single-cell resolution is imperative to unravel the intricate cellular
interactions within zonated liver niches. We applied NiCo to publicly available highly-
multiplexed smFISH data on 347 genes in the mouse liver generated with the MERSCOPE
platform (Data Availability) and integrated these data with a comprehensive single-cell/single-
nucleus RNA-seq mouse liver atlas 63.
Fig. 5| NiCo recovers niche architecture of the mouse liver. a, UMAP representation based on gene
expression of the spatial data. Cell type annotations obtained with NiCo are highlighted. The legend
indicates the number of cells for each cell type. EC, endothelial cell; cDC, conventional dendritic cell;
pDC, plasmacyt oid dendritic cell; HsPC, haematopoietic stem and progenitor cell; ILC1, innate
lymphoid cell type 1. b, Select cell types are highlighted for a representative tissue area. Top, central
vein and portal vein endothelial cells (EC). Bottom, hepatocytes of di fferent zones, i.e., central (Hep
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
15
C), mid-zonal (Hep M) and portal (Hep P1/P2). c, Cell types from (b) are highlighted in the expression
UMAP representation from (a). d, Confusion matrix highlighting predictive capacity of the niche for
all cell types. e, Regression coefficients reflecting local niche interactions for stellate cells (left), central
vein EC (center) and central hepatocytes (right). Regression coefficients (y -axis) were ordered by
decreasing magnitude (x -axis). The red dotted line indicates the threshold for cell type interac tion
(c=0.15) applied in (f). Error bars, see Methods. f, Cell type interaction map derived from the regression
coefficients of NiCo’s interaction module (Methods). A cutoff of c=0.15 was applied. Arrows point from
predictive niche-cell types towards the central cell type.
NiCo could annotate 375,161 out of 391,679 cells (95.8%) in the spatial modality (Fig . 5a).
Since no author annotation was available as ground truths, we inspected spatial cell type
localization within the lobule. According to NiCo’s annotation , portal vein endothelial cells
localized to the margin of tissue areas that morphologically resemble large blood vessels,
distant from smaller area s demarcated by the presence of central vein endothelial cells.
Reassuringly, portal hepatocytes were annotated in vicinity to portal vein endothelial cells,
while central hepatocytes co-localized with central vein endothelial cells, and the intermediate
zone was populated by mid -zonal hepatocytes (Fi g. 5b-c). This observation demonstrates
NiCo’s capacity to correctly annotate cell states with more subtle transcriptional differences
such as hepatocytes within different zones. A comparison with alternative state -of-the-art
annotation tools such as cell2location, Tangram, TACCO and uni Port reveals that these
Methods
struggle with the recovery of zonated hepatocyte states (Extended Data Fig. 7). Apart
from NiCo, TACCO and cell2location managed to annotate hepatocytes of different zones
correctly, although the separation of mid -zonal and portal hepatocytes was less clear.
Moreover, on this dataset it took more than two days of computation time on a standard
machine to run cell2location, while NiCo finished in few hours (Fig. 2b).
The logistic regression classifier of NiCo’s interaction module identified cell types with a
highly predictive niche s, such as portal and central vein endothelial cells, or cholangiocytes
(Fig. 5d), and the regression coefficients confirmed known associations such as co-localization
of central hepatocytes and central vein endothelial cells (Fig . 5e). Various immune cell
populations, on the other hand, such as dendritic cells, monocytes, or neutrophils, were not
predicted by the cell type composition of their niche suggesting a variable neighborhood
composition for these motile cells types . The cell type neighborhood graph derived from the
regression coefficients segregated into a portal cell type module, comprising portal vein and
lymphatic endothelial cells, as well as cholangiocytes, and portal hepatocytes, and a mid -
central module of central and mid central hepatocytes, central vein and sinusoidal endothelial
cells (Fig. 5f). Both modules were connected to fibroblasts, which are known to reside around
the central and portal blood vessels.
To understand whether the niche interactions identifie d from the regression coefficients were
meaningful even in situations where the predictive capacity was reduced, we focused on the
stellate cell niche, which exhibited a classification accuracy of 0.06 . Nonetheless, the logistic
regression classifier suggested specific positive associations with sinusoidal endothelial cells,
Kupffer cells, and central/mid-zonal hepatocytes (Fig . 5e). This niche composition aligns
perfectly with a recent characterization of the stellate cell niche 64. We demonstrate robustness
of this niche prediction a cross techinal replicate s. Running NiCo on a second liver slice
analyzed with MERSCOPE recovers liver cell type interactions in general, and the stellate cell-
Kupffer cell niche covariation in particular (Extended Data Fig. 8). Moreover, to test robustness
of NiCo to sample size, we repeated the analysis on a smaller sub -region containing only
88,772 (2 2.7%) cells. Again, NiCo recovered liver cell type interactions and the stellate -
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
16
Kupffer cell covariation (Extended Data Fig. 9). To explore the consequence of these niche
interactions for the state of stellate cells, we applied NiCo’s covariation module.
Fig. 6| NiCo predicts crosstalk between Stellate cells and Kupffer cells in the mouse liver. a,
Covariation between stellate cells factors (y -axis) and co-localized neighborhood cell type factors (x -
axis). Circle size scales linearly with -log10(p-value), and circle color indicates ridge regression
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
17
coefficients. S denotes the normalized niche coefficient score. b, Spearman correlations (left) and
average expression (right) for the top 20 positively correlated genes from scRNA-seq reference data for
Kupffer cell (KC) Fa1 and stellate cells Fa2. Dashed line demarcates the genes associated with each
cell types. c, Ligand-receptor pairs correlated with KC Fa1 and stellate cells Fa2. The rectangle’s north
and south faces represent ligand and receptor correlation to the factors, while west and east faces
represent the proportion of ligand and receptor expressing cells. KC ligands, y -axis; stellate cells
receptors, x-axis. See Methods. (d-e), UMAP visualization of mouse liver cell types (d) and normalized
expression of Tgfb1, Tgfbr3, Dcn, Clec4f (e). scRNA-seq/single-nucleus (sn) RNA-seq/CITE-seq data
from Guilliams et al. 63. Cell counts are indicated in the legend. f, Single-molecule fluorescence in situ
hybridization (smFISH) analysis of stellate cells and KC marker genes in mouse liver tissue sections.
Representative field of view with nuclei staining (DAPI, blue), and single mRNA molecule signals for
Tgfb1 (yellow), Dcn (red), and Clec4f (green). Asterisk, co-localized pairs of a stellate cell and a KC.
g, Scatterplot highlighting correlation of left ( Dcn and Tgfb1), right (Dcn and Clec4f) in co-localized
stellate cells an d KC, respectively. A regression line (broken dashed line) and Pearson’s correlation
coefficient (top left) are indicated. h, qPCR quantification ( DDCq) of five genes for HSC+Dcn,
HSC+Tgfb, and HSC+Tgfb+Dcn conditions against control (HSC). Biological replicates are
represented by different symbols. Paired t-test p-values comparing HSC+Tgfb against HSC+Tgfb+Dcn
for each gene are indicated on top of each bar.
The most significant covariation (log10P = -3.08) was inferred between stellate cell Fa2 and
Kupffer cell Fa1 (Fig. 6a). Stellate cells are quiescent in the normal liver, but undergo activation
and transdifferentiate into myofibroblasts producing alpha-smooth muscle actin (SMA) and
collagen after chronic liver damage 65,66. Kupffer cells function as resident macrophages and
shield the liver from bacterial infections 67. The top correlating genes with stellate cell s Fa2
(Fig. 6b) were enriched in viral defense and antigen presentation pathways (Extended Data Fig.
10a-b), indicative of a proposed function as innate immune cell 68. On the other hand ,
proteoglycans such as Dcn and Bgn were among the top correlating genes. Antigen presentation
and complement activation pathways were highly enriched among the top correlating genes
with Kupffer cell Fa1 reflecting an activated Kupffer cell state (Fig. 6b and Extended Data Fig.
10a-e). Focusing on ligand -receptor pairs correlated with the covarying factors, we detected
the ligand Tgfb1 on Kupffer cells and the receptor Tgfbr3 on stellate cells (Fig 6c). Tgf-β acts
as a potent inducer of fibrosis in the liver 69,70. Proteoglycans such as Dcn and Bgn are produced
by stellate cells and exert known anti -fibrogenic function s; they inhibit Tgf-β signaling by
sequestering Tgf-β ligands at the extracellular matrix or cell surface 71. The predicted
correlation of the Tgfb1-Tgfbr3 interaction with antigen presentation and immune activation in
Kupffer cells and similar functions as well as proteoglycan production in co -localized stellate
cells (Fig. 6d-e and Extended Data Fig. 10c-e), may indicate a functional dependence of these
processes in the healthy liver. Sensing of low levels of viruses and other pathogens may lead
to activation of Kupffer cells and induction of Tgf-β signaling, which is received by interacting
stellate cells. In stellate cells, this signal could induce proteoglycans which sequester Tgf -β
ligands to dampen pro -fibrogenic signaling in order to avoid stellate cell s activation at low
pathogen levels in the healthy liver . Upon chronic stimulation, this mechanism may be
insufficient to inhibit Tgf -β signaling surpassing a critical threshold , and thus leading to full
stellate cells activation and collagen production.
Since Tgfb1 and Dcn were not directly measured in the original MERSCOPE experiment, we
independently validated the inferred covariation between Tgfb1 in Kupffer cells marked by
Clec4f and Dcn in stellate cells by single-molecule hybridization chain reaction (smHCR) 72
(Methods). Indeed, this highly sensitive assay confirmed a strong correlation ( Pearson’s
correlation coefficient r =0.59) between Tgfb1 in Dcn in co -localized pairs of stellate and
Kupffer cells within healthy liver tissue (Fig. 6f-g and Methods). In comparison, correlation of
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
18
Dcn in stellate cells with the known marker Clec4f in co-localized Kupffer cells was markedly
reduced (Pearson’s correlation coefficient r=0.24) indicating the specificity of the covariation
between Dcn and Tgfb1 in pairs of stellate and Kupffer cells (Fig . 6g). We finally validated
dampening of hepatic stellate cell (HSC) activation by Dcn in vitro (Methods). qPCR
quantification after in vitro culture of HSC s stimulated with Tgf -b in the presence
(HSC+Tgfb+Dcn) or absence (HSC+Tgfb) of Dcn suggested dampened activation when Dcn
was administered to the culture. Specifically, HSC stimulation by Tgf-b in the presence of Dcn
led to significantly lower induction of activation markers Col1a1 (P<0.01) and Pdgfrb
(P<0.04), while the quiescent HSC marker gene Reln remained unaffected. (Methods, Fig. 6h
and Extended Data Fig. 10f). Induction of activation markers Lox and Acta2 was also reduced
although the fold -reduction did not reach significance . These observations support the
hypothesis that secreted Dcn may in deed dampen HSC activation by sequestering Tgf -b,
showcasing the capacity of NiCo to generate insightful hypotheses about the functional
implications of cell type-specific niche interactions in the liver.
NiCo detects covariation patterns in sequencing -based mouse cerebellum spatial
transcriptomics data
Although NiCo was designed for single-cell resolution spatial transcriptome data, we evaluated
its capabilities on a mouse cerebellum dataset generated with Slide-seqV2, a sequencing-based
spatial transcriptomics method with 10 μm resolution 74. Since pixels typically consist of 1 -2
cells, interaction and covariation inference may be confounded to a certain extent. Utilizing
cell type annotations for each pixel generated by the authors, we first ran NiCo’s interaction
module, which recovered wel l known interactions, e.g., between Purkinje neurons and a
specialized form of astrocytes called Bergmann glial cells 75 (Extended Data Fig. 11).
Covariation analysis between these co -localized cell types recovered covarying secretory
programs, e.g., calcium signaling from Bergmann cells to Purkinje neurons 76 (Extended Data
Fig. 11e). This vignette showcases the general applicability of NiCo to sequencing -based
spatial transcriptomics.
Discussion
With the availability of single -cell sequencing methods, our understanding of the molecular
definition of cell types has substantially advanced. As a key insight, it was recognized that a
cell type cannot be defined by a unique transcriptome state, but rather populates a continuous
region within the high -dimensional transcriptome space, reflecting systematic cell state
variability. Such variability could in theory be accounted for by cell -intrinsic factors such as
the architecture of the gene regulatory network governing a cell type, which could give rise to
multi-stability due to stochasticity of molecular interactions 77. However, cells within tissue are
exposed to extrinsic inputs from the microenvironment, which could be another mechanism to
induce cell state modulations. The ability to deconvolve intrinsic and extrinsic cell state
determinants represents a critical step forward towards understanding robustness and plasticity
of cell types in multicellular tissue environments.
Application of single-cell resolution spatial transcriptomics permits addressing this challenge.
NiCo explains cell state variability by gene expression covariates within cell types populating
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
19
the same niche. To avoid strong assumptions on the signaling mediators, we did not incorporate
ligand-receptor interactions as a constraint. Hence, predicted covariations may result from
direct signaling interactions but also from indirect sources such as cellular adaptation towards
biomechanical cues, or competition for metabolites 78. Nonetheless, NiCo proposes candidate
signaling pathways mediating predicted covariations by identifying ligands and receptors
correlating to the covarying factors within the interacting cell types. However, we believe that
alternative drivers of cell state covariation in local niches are important to consider, and
therefore provide a tool to detect the downstream effects on the transcriptional state of a cell
unconstrained by assumptions on the mediators of this interaction.
Since imaging-based technologies typically only measure few hundred genes, integration with
scRNA-seq data is critical for the functional interpretation of predicted covariations between
co-localized cells. NiCo solves this challenge by inferring a small set of latent factors capturing
dominant cell state variability for each cell type , and utilizes these factors to infer cell -cell
covariation and to associate these covariates with full transcriptome variability. It is important
to stress that cell state covariation inference requires mapping of individual mRNA molecules
to the cell of origin. This is currently not possible with sequencing -based spatial
transcriptomics, which require local pixel aggregation and subsequent deconvolution into cell
types due to limited resolution and sensitivity. This process destroys local covariance structure
between cell types in aggregated spots. With technological advancements, sequencing -based
Methods
may be able to achieve single -cell resolution in the future, and NiCo will be directly
applicable to this data type.
By applying NiCo to seqFISH data of E8.5 mouse embryos undergoing hierarchical dynamic
organization into tissue domains , we showcase that NiCo identifies cell -cell interactions
underlying the emergence of cardiac progenitors from the sub -pharyngeal mesoderm and
recovers Wnt signaling as a critical pathway regulating this process. For mouse intestinal
MERFISH data, NiCo provides deeper insights into the well-known Paneth cell niche of Lgr5+
intestinal stem cells, and associates Wnt activity with the Paneth c ell progenitor state. In the
mouse liver, NiCo captures cell state covariation between stellate cells and Kupffer cells, which
could be essential for limiting stellate cell activation in the healthy liver , supported by Dcn -
mediated dampening of HSC activation in vitro. These examples demonstrate that NiCo can
illuminate the effect of cell-cell interactions in local tissue environments.
NiCo relies on regularized regression analysis to identify predictive niche interaction s and to
infer factor covariation. Both steps are limited in their capacity to infer higher -order
interactions and the covariation analysis is insensitive to strongly non -linear covariation
patterns. While NiCo’s regression framework in principle could accommodate for higher-order
interaction terms this would massively increase model complexity, which would be detrimental
for robustness and interpretability. Future extensions should explore the capacity of non-linear
Methods
such as graph attention network s 79 for the exploration of higher-order interactions
and non-linear niche covariation.
In summary, NiCo fills a critical gap for identifying spatial dependencies between cell states
in common tissue niches and represents a significant step forward towards learning and
interpreting extrinsic driving forces of cellular states. With the rapidly increasing availability
of commercial platforms for high -resolution spatial transcriptomics we anticipate that NiCo
will be a key component to gain biological insights from such data.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
20
Methods
Dataset Description
We analyzed spatial data and reference scRNA-seq of four different mouse tissues across seven
studies.
Organogenesis: We acquired spatial transcriptomics data at single-cell resolution from mouse
organogenesis generated by seqFISH 33. The spatial data was captured during the embryonic
time point E(8.5–8.75) and comprised 19,451 individual cells and 351 genes. For this dataset,
24 distinct cell types were annotated by the authors, which were used as ground truths for
benchmarking cell type annotation. As reference dataset, a published scRNA-seq resource
covering different stages of mouse organogenesis was used 41, which was subset to the E8.5
timepoint corresponding to the spatial dataset . Cell designated as NaN cluster type were
removed, resulting in 16,909 cells and 19,371 genes across 29 clusters. After merging the two
blood progenitor clusters and three erythroid subtypes, 26 distinct cell types were identified by
the authors, which were utilized for NiCo’s annotation module.
Small intestine: We obtained single-cell resolution spatial transcriptomics data of the mouse
small intestine generated with the MERFISH technology 31. Corresponding scRNA -seq
Reference
data of the same tissue were obtained from an independent study 54. The published
MERFISH annotation comprises 19 unique cell types, profiling 241 genes across 7 ,416
individual cells. The scRNA -seq data encompass 32 ,287 genes, and 2 ,239 cells that are
stratified into two clustering partitions, i.e., high -resolution cell states and coarse cell types.
Within the cell state category, 26 distinct cell states were identified, while the coarse category
encompasses 17 unique cell types. To apply the NiCo annotation module, we merged all
populations of stem cell -related states, g oblet-related cells, and stroma -related populations,
respectively, resulting in 19 distinct cell type clusters. The intersection of the scRNAseq and
MERFISH data comprises 240 common genes.
Liver: We utilized single-cell resolution spatial transcriptomics data of the mouse adult liver
obtained with the Vizgen MERSCOPE platform (available from https://vizgen.com/data-
release-program/). Corresponding scRNA -seq reference data from the Liver Cell Spatial
Proteogenomics atlas were utilized 63. The MER SCOPE dataset includes 347 genes and
391,679 individual cells, with cell filtration based on a minimum count of 5. The scRNA-seq
atlas comprises 185,894 cells and 24,857 genes. We utilized the ‘mouseStStAll’ cell type
annotation provided by the authors. This annotation was modified by merging specific immune
cell types (CD8 Effector Memory T cells, CTLs, NKT cells, Naïve CD4+ T cells, Naïve CD8+
T cells, Th1s, TRegs, and Th17 cells into single T cell cluster; Patrolling Monocytes, Trans.
Monocytes and Monocytes-derived cells into single Monocyte cluster; and MoMac1, MoMac2,
and Peritoneal Macrophages into single Macrophage cluster) and by annotating subclusters of
hepatocytes. The hepatocyte population in ‘mouseStStAll’ was divided into eight sub-clusters,
which we categorized into four subpopulations of zonal hepatocytes based on their expression
of zonal markers 61.
Primary motor cortex: We accessed MERFISH data of the mouse primary motor cortex 32 and
corresponding scRNA -seq data 80. The MERFISH dataset comprises 280 ,186 cells,
encompassing 254 genes and 24 cell types as defined by the authors. These cells were sourced
from 64 tissue slides derived from two mice litter. From this dataset, we selected a specific
slide (mouse1, slice153) due to its continuous structure a nd the highest cell count, totaling
7,626 cells. The scRNA-seq data, termed ‘BICCN MOp scRNA 10X v3 Analysis AIBS’ were
obtained from the NeMo archive ; we removed ‘Low Quality’ and ‘doublet’ subclass labels.
This dataset consisted of 71,183 cells and 30,982 genes, and we retained 20 subclass labels as
defined by the authors. Nineteen of these cell types mirrored those identified in the MERFISH
data.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
21
Cerebellum Slide-seqV2: The Slide-seqV2 mouse cerebellum dataset 74 comprises 44,091 cells
and 5 ,160 genes, with cell types annotated using the RCTD 81 method by the authors. We
retained the singlet and doublet_uncertain categories, resulting in 30,575 cells and 18 cell
types. The NiCo interaction module was applied to this dataset with default parameters and the
covariation module was applied with the unimodal version of ordinary NMF.
STARmap mouse visual cortex dataset: STARmap mouse visual cortex data 18 comprise 1,207
cells, 1,020 genes, 16 cell types, annotated into 7 neocortical layers L1, L2/3, L4, L5, L6, HPC
(hippocampus) and cc (corpus callosum).
MERFISH Mouse brain atlas: The MERFISH atlas of the whole mouse brain 40 comprises 59
brain sections with approximately 4 million cells. We analyzed z-slice 11.2, containing 44,699
cells and 500 non-blank genes, falling into 17 cell type classes and 6 neighborhood classes with
regional specificity. The six neighborhood class are denoted as D1 (HY-EA-Glut-GABA), D2
(NN-IMN-GC), D3 (Pallium -Glut), D4 (Subpallium -GABA), D5 ( Subpallium-GABA; HY-
EA-Glut-GABA), D6 (Subpallium-GABA; NN-IMN-GC).
Niche Covariation (NiCo) algorithm
The NiCo cell type annotation module
We adopted a multistep approach to integrate single-cell resolution spatial transcriptomics with
scRNA-seq reference data, assuming a given cell type annotation of the reference data available
as clustering partition . We first give a brief overview of the pipeline, followed by a detailed
explanation of each step: (1) Determining anchors : Following preprocessing, NiCo first
identifies anchor cells between the spatial query data and the scRNA-seq reference data by the
mutual nearest neighbor (MNN) method 82. (2) Anchor filtering : W e made use of gene
expression-based clustering of the spatial data using the Leiden community detection algorithm
to filter out dispersed anchors mapping to reference clusters at low frequencies . (3) Iterative
annotation of non-anchors: Non-anchor cells in the spatial query data receive a cell type label
based on the cell-type frequency of anchors among their K-nearest neighbors (KNN) and join
the anchor set. This process is iterated.
(1) Determining anchors : NiCo’s annotation module takes cell-by-gene count matrices of
query and reference data, denoted as Mq and Mr, as input data. After sub-setting to the shared
set of genes measured in both modalities, normalization using the scTransform method 83 yields
Pearson’s residuals for downstream analysis . Next, we extracted the top 50 principal
components (PCs) from each normalized matrix, transforming them into latent dimension
matrices, dq and dr. These 50 PCs retained the majority of the variance in the data . We
standardized dq and dr and employed a KD -tree algorithm to find mutual nearest neighbor
(MNN) pairs applying a soft criterion. Specifically, we identify for each query cell q the set of
KNN in the reference data, termed sq. Similarly, we identify for each reference cell r the set of
KNN in the query data, termed sr. By default, we use K=50. Query cell q and reference cell r
are considered MNN, if qÎsr and rÎsq. This neighborhood-based definition of MNNs permits
for the existence of multiple mutual nearest neighbor cells in the scRNA-seq reference dataset
for a given query cell. In this case, the query cell is considered a “confused anchor”. This is
resolved by drawing information from the non-confused anchor cells among the KNNs (K=50
by default) of the query cell. After pruning neighbouring anchors that do not belong to the same
spatial guiding cluster as the query cell, a unique cell type is assigned based on the largest
proportion of pruned neighboring cells by majority vote. If this results in a tie or if no filtered
neighboring cells belong to the same spatial guiding cluster as the query cell, it is designated
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
22
‘NM' (Not mapped). If this procedure results in too many cells that could not be mapped, we
implemented a weighted score instead of a majority vote for cell type assignment based on
neighboring anchor information. The weighted score is computed based on the inverse distance
between the confused query cell and its neighboring anchors after pruning. Scores are
aggregated across neighbors with the same assigned cell types, and the cell type with the largest
score is assigned to the query cell.
(2) Anchor filtering: For each pair of MNN the cell type label is transferred from the scRNA -
seq reference annotation to the spatial query data. We next perform low resolution Leiden
clustering of the spatial query data dq to infer clusters for anchor pruning (spatial guide
clustering). These guiding clusters are utilized to filter out noisy anchors. Anchors of a given
cell type label are considered noisy, or dispersed, if they map to a spatial cluster with a
frequency lower than the dispersion parameter r (r=0.15 by default). This step implements
information sharing between anchors of the same cell type label by keeping anchors falling
into the same spatial guiding cluster at high frequency and removing anchors falling into spatial
guiding clusters at low frequency. The resolution of the spatial guide clustering can be adjusted;
lower values retain more anchors but may decrease specificity of the annotation.
(3) Iterative annotation of non -anchors: Anchor cells serve as a seed for annotating the
remaining non-anchor cells in the spatial query data. Each non -anchor cell is annotated by a
majority vote based on cell type proportion s of anchor cells among its KNN. If the cell type
with the highest proportion by majority vote and the queried non -anchor cell end up in same
guiding cluster, the queried non-anchor cell is assigned to this cell type. Moreover, if a query
cell’s majority vote for cell type assignment result s in a tie, NiCo mark s it as ‘NM’ (Not
Mapped). This process is iterated, and after each iteration anchors are updated with the
annotated non -anchors. This process is only iterated three times to avoid low confidence
annotations. NiCo successfully mapped ~96% of query cells in embryo and liver data, ~99%
of query cells in intestine data, and ~90% of quer y cells in brain data. To further reduce the
number of non-mapped cells, cell type annotations of neighbors can optionally be weighted by
their relative distances as explained in the step 1 to the query cell, which avoids ties b ut may
reduce specificity and is therefore not used as default setting.
Benchmarking cell type annotations
In our comparative analysis, we benchmarked NiCo against four state-of-the-art methods:
cell2location 27, Tangram 28, TACCO29 and uniPort30. The evaluation was conducted based on
two primary criteria: the adjusted rand score (ARI) and Jaccard similarity (JAC). The author’s
annotation of each dataset was considered as ground truth. To compute these metrics, the cell
type names for the reference annotation had to match exactly the author-defined annotation.
We identified 14, 20, and 17 matched cell types for intestine, embryo, and cortex data,
respectively, out of 19, 26, and 20 total scRNA-seq reference clusters. Initially, all datasets
were preprocessed with a filtering step, setting the minimum total transcript count to 5 in order
to eliminate low quality cells . For NiCo, the resolution parameter for spatial guide Leiden
clustering was set to 0.4 for intestine, cortex, and embryo datasets, and 0.5 for the liver data.
Tangram was executed with default parameters, mapping cells to space using the ‘clusters
mode’ and reference cluster labels from scRNA -seq data. uniPort was run with default
parameters, employing label transfer from reference data to query data for spatial cell type
annotations. TACCO was run with default parameters. For cell2location, parameters “N cells
per location=1” and “detection alpha=20” were applied, with all other parameters set to their
default values. To avoid memory overflow, cell abundance was estimated from the posterior
distribution summary in the large liver dataset by directly using quantile computation. For other
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
23
datasets, the quantile computation uses 1 ,000 “num_samples” to generate a posterior
distribution and use the 0.05 quantiles to obtain spatial cell type annotations, following the
developer’s recommendation.
Calculation of computation time and memory usage for cell type annotation and niche
prediction tasks
The computation time and me mory requirement for different tasks were measured using
Python’s time and memory_profiler modules. Specifically, the se module were called at the
beginning and end of the script to calculate the total execution time and memory usage.
The NiCo interaction module
To infer significant cell type interaction s based on spatial co -localization, NiCo employs a
logistic regression classifier to predict the cell -type identity of a “central” cell from the cell
type composition of its neighborhood. In short, the method proceeds in three steps. (1)
Inference of niche composition : We first determine the local -niche neighbors for each cell .
Subsequently, we infer global expected cell-type frequencies and determine fold enrichments
for every cell type in each neighborhood. (2) Logistic regression classifier: We then train a
logistic regression classifier to predict the central cell-type identity from the cell type
enrichment ratios in the local niche. (3) Cell type interaction graph: The regression coefficients
are then utilized to construct a cell type interaction graph.
(1) Inference of niche composition: Local neighborhood information is deduced based on cell
centroids. Two distinct schemes are employed for defining neighborhood composition. The
first scheme is based on juxt acrine signalling and includes only direct neighbors. This
approach, also referred to as ‘radius=0’ is executed through Delaunay triangulation. Since
Delaunay triangulation can include distant neighbors due to wide intercellular space in the
tissue we define a cutoff of 100 µm and discard neighbor with a centroid-to-centroid distance
beyond this limit . The second scheme accommodates paracrine signaling by including all
neighbors whose centroid is localized within a given radius R around the central cell’s centroid.
Once the neighbors are identified, we create a cell type count matrix Λij and a vector yi with
i=1,…,m and j=1,…,n. n corresponds to the number of cell types, and m is equal to the total
number of cells in the dataset. Λij denotes the count of cell type j in the neighborhood of the
central cell i. The cell-type identity of cell i is stored in yi Î (1, ..., K). The number of classes
K corresponds to the number of cell types and is therefore identical to the number of features
n. The counts Λij are normalized by expected global cell -type frequencies to obtain the
neighborhood enrichment matrix
𝑋!" = Λ!" / ∑ 𝑓" ∙ Λ!## (1)
where fj corresponds to the global frequency of cell type j and ∑ 𝑓" = 1" .
(2) Logistic regression classifier: NiCo employs a multi-class logistic regression classifier to
learn a set of coefficients reflecting the relevance of each niche cell type for predicting a given
central cell type. This classifier computes the probability 𝑃(𝑦! = 𝑘|𝑥!; 𝑊) of central cell i to
be of cell type k given the enrichment ratios 𝑥! of all cell types in the neighborhood of cell i:
𝑃(𝑦! = 𝑘|𝑥!; 𝑤) =
$%& () !*"+) #,!)
∑ $%& () %*"+) #,%)&
%'(
(2)
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
24
𝑥! corresponds to the i-th row of the neighborhood enrichment matrix 𝑋 and 𝑤# corresponds
to the k-th column of 𝑊.
The algorithm incorporates an L2 penalty for regularization, with its hyperparameter, denoted
as C, determined using the GridSearchCV function of the scikit -learn python library by
minimizing the negative log-likelihood
𝑚𝑖𝑛) 7−𝐶 ∑ ∑ [𝑦! = 𝑘] log?𝑃(𝑦! = 𝑘|𝑥!; 𝑊)@ +
.
/
0
#1.
2
!1. ∑ ∑ 𝑤",!
/0
"1.
2
!1. B (3)
Stratified 5-fold cross-validation is applied. The input feature matrix 𝑋 is normalized using the
StandardScaler function. Furthermore, class weights are adjusted using the balanced mode to
account for class imbalance. The ‘class_weight’ function of the scikit -learn python library
assigns higher weights to minority class es, allowing the model to pay more attention to its
patterns and reducing bias towards majority class es. In this mode, values of y adjust weights
inversely proportional to class frequencies in the input data.
The L-BFGS method implemented in the ‘lbfgs’ solver function of the scikit -learn python
library is employed during the training step, alongside the ‘multinomial’ logistic loss function
encompassing the entire probability distribution.
The vector 𝑤!
4 = ?𝑤!.
4 , … , 𝑤!2
4 @ corresponds to the i-th column of the coefficient matrix 𝑊4for
cross fold 𝑓, i.e., contains the cell type coefficients for predicting each central cell type i from
its niche composition 𝑥!. It is averaged across all data cross folds 𝑓 = 1, … , 𝐹:
𝛽! =
.
5 ∑ 𝑤!
45
41. (4)
The standard deviation of 𝑤!
4 reflects the uncertainty of the coefficients and serves to estimate
error bars. For follow-up analysis we recommend disregarding coefficients with error bars of
similar magnitude to the coefficient value itself.
To assess the predictive capacity of the niche composition for each cell type, we inspect the
confusion matrix, which compares the predicted cell type label to the ground truths label
corresponding to the NiCo annotation of the central cell . We report the average confusion
scores across data folds and normalize each row of the confusion matrix (ground truths cell
type labels) to one.
To facilitate the visualization of the coefficients for each central cell type i, we plot the mean
𝛽! in descending order of their absolute value along the x-axis and indicate uncertainty by the
standard deviation across data folds. Positive coefficients 𝛽!" > 0 indicate that the neighboring
cell type j preferentially co-localizes with central cell type i, whereas negative coefficients
𝛽!" < 0 signify preferential absence of cell type j from the niche of cell type i.
(3) Cell type interaction graph: To create a cell type interaction map highlighting cell type co-
localization patterns, we normalize 𝛽! by dividing it by the maximum absolute value of its
components 𝛽!". After applying a cutoff on 𝛽!", we draw edges connecting the central cell type
i and niche cell type j with directionality pointing from j to i, resulting in a graph structure
comprising all cell types and their interaction partners. A global cutoff can be applied to 𝛽!" to
control the sparsity of the graph. The edge thickness scales with 𝛽!". We employed a force -
directed layout within the framework of the Graphviz layout.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
25
Testing the NiCo interaction module on simulated ground truths data
To test the capacity of NiCo’s logistic regression classifier to predict the central cell type’s class
from the cell type neighborhood, we conducted a simulation where we create six cell types,
each containing 200 cells, in a 2D space using the LAMMPS molecular dynamics package 84.
This simulation was carried out with a varied Lennard -Jones (LJ) pairwise potential for
modeling interactions between cell types. LJ potentials describe interactions that are repulsive
on short range ( particle size σ), to simulate physical cell boundaries, and attractive at longer
distances with a given range (interaction cutoff range parameter rc) and strength (potential well
ε), to simulate preferential co-localization of defined pairs of cell types.
Figure 2C: We first initialized all (6 + 6 choose 2)=21 pairwise interactions, including self -
interactions, with the following parameters: potential well ε = 1, particle size σ = 2, and
interaction cutoff range parameter rc = 5. Subsequently, we introduced stronger interaction
potentials between T0-T2 (ε = 3) and T3-T5 (ε = 5). This allowed us to create a scenario where
T0 was expected to be in the neighborhood of T2, and vice versa, and T3 should be in proximity
to T5, and vice versa. The simulation box ha d dimensions of -50 to +50 in both X and Y , and
periodic boundary conditions were assumed. All masses were set to 1, initial velocities were
assigned to achieve a temperature of T = 1 at the normalized kB unit, and equilibration was
attained using NVE and Langevin’s thermostat. This ensured that each cell maintained the
desired temperature of T = 1, and the enforce2d command was employed to constrain cell
movement along Z direction to mimic the simulated locations in 2D plane. The final
configuration of the simulation served as the input for NiCo to predict neighborhood niche
interactions. The confusion matrix and coefficients for predicting each cell type were computed
for the default parameter of radius=0, considering only immediate neighbors.
Figure 2D: To test a more challenging situation, we simulated a more complex scenario . All
six pairwise interactions, including self-interactions, were initialized with parameters ε = 1, σ
= 2, and rc = 5. We then altered the ε parameters for T0 -T2, T3 -T5, T2 -T3, and T1 -T3
interactions to 3, 5, 10, and 8, respectively, to establish a distinct order among the cell types.
The confusion matrix and coefficients for predicting each cell type were computed for the
default parameter of radius=0, considering only immediate neighbors.
Benchmarking the NiCo interaction module with MistyR
In our comparative study between NiCo and M ISTy 22 for niche prediction, we transformed
cell types for the more complex, second simulated scenario into a one-hot encoding scheme.
Subsequently, we input these data into MISTy, generating a juxtaview with default parameters
and paraview with radius five. Inferred importance, i.e., z-transformed total variance reduction
scaled by (1 - p-value), was interpreted as indicating cell type interactions between a central
cell type (target) and niche-cell types (predictors).
Detecting covariation of gene expression in co-localized cells with NiCo
To detect covariation of co -localized cell states in local niches, NiCo’s covariation module
implements the following steps: (1) Latent variable inference: Identification of a set of latent
variables for each cell type that capture intra-cell type variability. (2) Latent variable
annotation: Inference of genes and pathways correlated to the latent factors, including ligand
and receptor pairs to predict signaling mediators. In this step, we focus on the scRNA-seq data
to obtain full transcriptome coverage. (3) Identification of niche covariation : NiCo applies
regularized linear regression to identify latent factors in co -localized cell types exhibiting
covariation.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
26
(1) Latent variable inference: NiCo employs an integrated Non-Negative Matrix Factorization
(iNMF) approach alongside ordinary NMF (oNMF) to identify latent factors from the shared
set of genes in the spatial transcriptomics and the scRNA-seq data. Both datasets are scaled but
not mean -centered. These NMF methods learn a lower -dimensional embedding space ,
encompassing spatial cell factors 𝐻67 ∈ ℝ+
8)* x 95
, scRNA-seq cell factors 𝐻68 ∈ ℝ+
8)+ x 95, and
shared gene embedding factors 𝑊 ∈ ℝ+95 x : , where 𝐿𝐹 corresponds to the user-defined number
of factors, 𝐺 denotes the number of shared genes across modalities, and 𝑐67 and 𝑐68 denote the
number of cells for a particular cell type in the spatial and the scRNA -seq data, respectively.
The iNMF methodology 25,26 draws inspiration from the LIGER package 25,26, and within
NiCo, it simultaneously learns common gene factors 𝑊, 𝐻68 factors and 𝐻67 factors from cell
type-specific scRNA-seq reference cluster 𝐸68 and query spatial cluster 𝐸67 after sub-setting to
the shared set of genes. The objective function of iNMF is designed to learn a shared factor 𝑊
across all datasets while capturing data heterogeneity through factors 𝐻* and 𝑉* with 𝑥 denoting
the modality:
𝑎𝑟𝑔𝑚𝑖𝑛( ∥ 𝐸68 − 𝐻68 (𝑊 + 𝑉68) ∥5
/ + 𝜆 ∥ 𝐻68 𝑉68 ∥5
/ + ∥ 𝐸67 − 𝐻67?𝑊 + 𝑉67@ ∥5
/
+ 𝜆 ∥ 𝐻67𝑉67 ∥5
/ ) (5)
Optimization incorporates penalizing the Frobenius norm (denoted by F) after accounting for
heterogeneous effects across modalities, 𝐻68 𝑉68 and 𝐻67𝑉67, and regularization of these
components with parameter 𝜆. The matrix product 𝐻*𝑉* encodes data-specific constraints for
modality 𝑥 tailored for heterogenous multi-modal data, following the principles outlined in ref.
26. Importantly, all learned matrices, 𝐻*, 𝑉*, and 𝑊, are constrained to be non-negative. The
selection of 𝜆, a tuning parameter, is critical and preferably small for a mixture of homogenous
datasets and larger for a mixture of heterogeneous datasets. To determine the optimal 𝜆 from
𝐸68 and 𝐸67, we execute iNMF for increasing values of 𝜆 set to even integers (𝜆 = 0,2,4,6, … ),
and selecting the 𝜆 that stabilize s the alignment score (equation ( 6)), following the
methodology outlined in ref. 85. In brief, we first randomly downsampled the larger dataset
(𝐻68 or 𝐻67) to match the size of the smaller dataset and merged them into a unified dataset.
We determine 𝐾 as 1% of the total number of cells in the smaller dataset, denoted as 𝑛, and
identified the 𝐾-nearest neighbors for each cell in the unified dataset. For every cell, we
calculated how many of its KNN belonged to the same dataset and average this across all cells
to obtain 𝑥̅. In an ideal scenario where 𝐻68 and 𝐻67 are learned perfectly, each cell’s K-nearest
neighbors would be evenly distributed across the two datasets. We then normalized this value
by the expected number of cells from the same dataset and scaled it to range from 0 to 1.
Ultimately, the alignment score between two successive 𝜆 iterations was considered stable
when the difference between them fell below the threshold of 0.001, determining our final 𝜆.
𝑠𝑐𝑜𝑟𝑒 = 1 −
*̅ < &
,
0 < &
,
(6)
For the MERSCOPE liver dataset, we noticed intermingling of all cell types in the dimensional
reduction representation of the spatial data ( Figure 5a). Potential explanations for this
observations are imperfect cell segmentation or high background. This leads to a challenge
during the factorization process of iNMF, since factors could become strongly associated with
spillover and/or background genes, which would not be indicative of their actual prevalence
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
27
within certain cell types. To address this issue, we implemented an alternative approach known
as ordinary NMF (oNMF). In oNMF, the matrices 𝐻68 and 𝑊 are only learned from reference
data 𝐸68. We use the iterative element -wise multiplicative update solver 86. The objective
function in oNMF aims to minimize the ‘Kullback-Leibler’ divergence, denoted as 𝑑09, which
measures the dissimilarity between matrices 𝑋 and 𝑌: 𝑑09 (𝑋, 𝑌) = ∑ (𝑋!," log a
=",%
>",%
b − 𝑋!," +!,"
𝑌!,"). We initialize the matrices 𝑊 and 𝐻68 using the nndsvda method of the scikit-learn python
library (Non-negative Double Singular Value Decomposition with zeros filled with the average
of Esc). Our optimization process runs for a maximum of 1,000 iterations, and no regularization
parameters are applied to matrices 𝐻68 and 𝑊. Following the derivation of 𝑊 from the scRNA-
seq data, we fix these factors for the spatial data while optimizing 𝐻67. This is achieved through
an analogous element-wise iterative multiplicative update rule keeping 𝑊 constant:
𝐻67[𝑖, 𝑗] =
?)*[!,"]∗(C-D)*- )[!,"]
EC-C?)*F[!,"] (7)
Importantly, we do not update the matrix W during this process. This strategy enables us to
infer spatial cell factors reflecting the gene expression variability information contained in the
scRNA-seq data.
The number of factors 𝐿𝐹 per cell type is a user-defined parameter. Since cell type populations
are already homogenous and intra -cell-type variability is typically limited, we execute NiCo
with a small number of factors per cell types ( 𝐿𝐹 = 3 by default). By performing consensus-
NMF analysis 87 on the intestinal datasets, we found that factor stability frequently drops
significantly for larger numbers of factors, and tuning the number of factors individually for
each cell type would require substantial user intervention. We recommend running NiCo with
three to five factors per cell type.
The factors obtained from NMF lack an inherent order. To address this, we introduce a
quantification of disorder within each factor 𝑖 of cell type 𝑘, referred to as entropy:
𝐸#[𝑖] =
< ∑ (?)+! [:,!]∗HIJ.(?)+! [:,!]))"
HIJ.(HK2JLME?)+ ! [:,!]F) (8)
This normalized entropy scales between 0 and 1. We calculate the entropy of scRNA-seq cell
factors (𝐻68) and reorder the factors by increasing entropy. Subsequently, the same ordering is
applied to the spatial cell factors (𝐻67). We found that the normalized entropy of iNMF-derived
factors is typically higher than for oNMF-derived factors. This sorting process provides a more
structured and informative representation of the factors.
(2) Latent variable annotation: Finally, both NMF approaches yield a factors-by-genes matrix
𝑊; each factor thus corresponds to a linear combination of genes, represent ing specific gene
modules or meta -genes for each cell type . Since factors are derived from the shared set of
genes, they do not provide transcriptome -wide information. To conduct comprehensive gene
covariation analysis, pathway enrichment, and ligand-receptor analysis, we require information
associating the whole transcriptome with each of these factors. To achieve this, we utilize two
distinct metrics, the Spearman correlation and cosine similarity . Specifically, the Spearman
correlation measures the association of each gene with every factor by calculating correlations
between each column of 𝐸68 (representing cell type-specific scRNA-seq whole gene expression
profiles) to each column of 𝐻68. Accordingly, the cosine similarity is obtained by taking the
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
28
dot product between a column of 𝐸68 and the column of 𝐻68. Top positively or negatively
correlating genes for each factor could then be inspected and functionally annotated, e.g., by
pathway enrichment analysis. Ligands and receptors among these genes can be nominated for
inferring intercellular signaling underlying observed covariations of factors between cells.
(3) Identification of niche covariation : We applied regularized linear regression analysis to
elucidate covariation between latent factors of co-localized cell types residing in the same
tissue niche. NiCo explores these relationships between factors corresponding to a central cell
type and those pertaining to co-localized neighboring cell types as identified by the interaction
module. By design , NiCo is sensitive to approximately linear relationships and returns
regression coefficients between the dependent variable (representing factors of central cell
types) and the independent variables (comprising weighted averages of factors from
neighboring cell types).
Given a central cell type 𝑘, we include all neighboring cell types 𝑗 occurring across instances
of cell type 𝑘 with regression coefficients 𝛽#" > 𝑐 for a given cutoff 𝑐. By increasing 𝑐,
neighboring cell types of limited predictive capacity as inferred by NiCo’s interaction module
can be discarded.
For this analysis, let 𝐻67
# [𝑗, 𝑖] denote the i-th factor of the j-th instance of the central cell type ,
and consider cell factors 𝐻67
N of neighboring cell types 𝑚 ∈ {1, … , 𝑀} with 𝛽#N > 𝑐. We infer
covariation between 𝐻67# [, 𝑖] and all factors 𝐻67N across all instances 𝑗 by ridge regression:
𝑚𝑖𝑛O h∑ ∑ 𝛼H,N
#,! .
P%
!,0
95
H1.N ∑ 𝐻67
N [ℎ, 𝑙]M∈R%
!,0 − 𝐻67
# [𝑗, 𝑖]h
/
/
+ 𝜂m𝛼H,N
#,! m/
/
(9)
Here, 𝛼H,N
#,! denotes the regression coefficient s between factor 𝑖 of the central cell type 𝑘 and
the latent factor 𝑙 of neighboring cell type 𝑚. 𝑆"
#,N denotes the set of all cells of cell type 𝑚 in
the neighborhood of the j-th instance of central cell type 𝑘 and 𝑁"
#,N denotes the number of
these cells. Hence, NiCo averages a latent factor across all cells of a given neighboring cell
type.
The ridge regression regularization parameter h is optimized through the following procedure:
The initial h parameter range spans powers of 2, ranging from [2 −10, 2 −9, ..., 2 10].
GridSearchCV calculates the negative mean square d error using the l eave-one-out cross-
validation strategy to determine the optimal h parameter.
Ligand-receptor interaction analysis
To identify ligand -receptor signaling interactions as potential mediators of the inferred
covariation of factors in co-localized cells, we first compiled a ligand -receptor database from
available resources. We created NiCoLRdb by merging ligand-receptor pairs from Omnipath88,
NATMI89, and CellPhoneDB 90,91. Multimeric complexes from CellPhoneDB were not
included. From this database we extract ligand-receptor (LR) cognate pairs within the cell type
of interest and interacting neighboring cell types included in the covariation analysis . Since
only limited numbers of ligand and receptor genes are typically directly measured in the spatial
data, we leverage information from the genome-wide scRNA-seq data. For a pair of interacting
cell types with covarying factors inferred by NiCo, we only retain ligand-receptor pairs where
the ligand and receptor genes are expressed in a minimum fraction 𝑓9S of cells for the
interacting cell types. Furthermore, we require a minimum absolute Spearman correlation 𝑐9S
of the ligand and receptor genes with the respective factor in the covarying cell type. To
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
29
increase sensitivity, 𝑓9S and 𝑐9S can be lowered, and higher values of these thresholds select
candidates with stronger associations with the covarying factors. However, it is not necessarily
expected that both ligand and receptor correlate with the covarying factors. For example, the
receptor could be constitutively expressed in the receiving cell type, while the ligand is induced
together with a gene program associated with a factor in the sending cell type. Therefore, it is
recommended to apply low cutoff values for 𝑓9S and 𝑐9S and visually inspect the metrics for
each pair. For visualization, we devised a quadrilateral (rectangle) plot uniting the fraction of
expressing cells and factor correlation values within a dot -plot-like visualization to highlight
LR crosstalk. Four isosceles triangles form this unit, each oriented towards one of the
rectangle’s edges and joined in the center. The western and eastern faces represent the fraction
of cells expressing the ligand and receptor, respectively, while the northern and southern faces
highlight ligand and receptor factor correlation values. Ligands are consistently mapped on the
y-axis, while receptors are denoted on the x-axis.
Pathway enrichment analysis in the high gene associated factors
We conducted pathway enrichment analysis using the Enrichr module within the gseapy
package 92. The objective was to assess whether a predefined set of genes displayed statistical
significance or relevance with respect to specific gene sets or pathways from the Enrichr
library. We supplied the top positively or negatively correlated genes for a given factor as our
input ‘gene list ’. This gene list was then queried against various Enrichr libraries, such as
‘BioPlanet 2019’, ‘Reactome 2016’, or ‘GO Biological Process 2021’ to assess the enrichment
within diverse cellular processes or pathways. Enrich r visually presents significant cellular
processes as a dot plot, where the Y-axis represents these processes as a function of a combined
score. The size of each dot indicates the percentage of genes shared with in specific cellular
processes, while the color code reflects the associated p -value. The combined score, denoted
as ‘c’ is defined as the product of the logarithm of the p -value and the z-score. The p-value is
calculated using the Enrichr Fisher ’s exact test, while the z -score is determined through the
Enrichr correction applied to the Fisher’s exact test 93.
smFISH experiment
Sample preparation and Image Acquisition: Single molecule FISH by smHCR was performed
on 5 μm cut, snap-frozen, methanol fixed mouse liver tissue sections from C57B/6J males with
an age of 12 -16 weeks (n=3). Coding -sequences of Clec4f (NM 016751.3), Tgfb1 (NM
011577.2), Dcn (NM 001190451.2) were used to design smFISH probes of 20 nt length. Probes
were designed by Molecular Instruments (Los Angeles, United States) and the hybridization
protocol was performed according to the manufacturer’s protocol and as previously described
94. Samples were imaged using an Olympus IX83 CSUW1 -TS2 spinning disc confocal
microscope equipped with 405, 488, 561 and 640 nm LED lasers using a 60x oil immersion
objective. Image acquisition was performed with a resolution of (149.76 x 149.76) microns
equivalent to 2304 x2304 px (binning 1 x 1) and images opened with the Bio-Formats Importer
and further processed using FIJI 95.
Detection of RNA Spots and Cell Segmentation : We employed the STARFISH package 96 for
finding the precise localization of RNA spots and harnessed QuPath software 97 to perform the
cell segmentation based on extended DAPI regions with default parameters. Our smFISH data
acquisition leveraged four lasers, each capturing distinct molecular signals across different
areas of tissue into four channels: Tgfb1(Red 561nm laser), Dcn (Far-red 639nm laser), Clec4f
(Green 488 nm laser), and DAPI (Blue 405nm laser). We initiated our analysis with a white top
hat filter to remove the background noise with a masking radius of 3. Subsequently, we
normalized and enhanced each channel’ s intensit y using the STARFISH
‘ClipPercentileToZero’ function within the percentile intensities parameter range of p min=80
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
30
to p max=100. To ensure the robustness of detected spots, we wrote a custom script to convert
detected spot coordinates into Tif images and then systematically overlaid the Tif images of
raw signals. This way we verified detected spots for a range of parameters in FIJI 95 for each
channel across different fields of view (FOV). Once we achieved a satisfactory level of spot
detection, we proceeded to select optimal parameter values. These included the minimum and
maximum sigma parameters (1.5, 3), numerical sigma (10), and the intensity threshold
parameter (95th percentile), facilitating spot detection across all FOVs. We saved all the
detected spots centroid parameter (Xc, Yc) and th e radius parameter r of the spots for further
analysis.
Analysis of expression correlation in co -localized pairs of s tellate cells and KC : The cell
segmentation data, derived from QuPath, was processed using the regionprops command from
the scikit -image python library. This command stores the pixel coordinates of the true cell
morphological region as a labeled mask. We generated discrete region masks associated with
individual spots bounded in x dimension within the interval [ceil(Xc − r), floor(Xc + r)] and
bounded in y dimension within [ceil(Yc − r), floor(Yc + r)]. By assessing the containment of a
spot’s mask within a cell’s mask, we effectively determined the complete overlap of the spots
region with the cell region. Spots failing to satisfy this criterion were excluded from further
analysis. This process was iteratively applied, resulting in a tally of associated spots for each
cell across the Clec4f, Dcn, and Tgfb1 gene transcripts.
To ascertain whether the cell’s identity belongs to the KC or Stellate populations, we first
selected all the cells exhibiting over 20 transcripts for either Dcn or Clec4f, along with
satisfactory segmentation accuracy. The maximum number of expressed transcripts in a single
cell could be over 150 in such a setting. We labeled cells as stellate cells if Dcn transcript counts
exceed Clec4f transcript counts, and as KC if Clec4f transcript counts are greater than Dcn
transcript counts. This soft criterion accounts for imperfect segmentation due to the difficult -
to-segment shape of stellate cells with long pr otrusions. Cells not meeting these criteria
remained unclassified. To explore spatial relationships, we constructed cell neighborhoods
based on Delaunay Triangulation using the centroid positions. Complex units comprising
adjacent KC and stellate cells were identified from the colocalized neighborhood. In total, 93
such complex units were discerned for three FOVs. From each complex unit, we generated
scatter plots, with Dcn transcript levels from stellate cells represented on the y-axis and Tgfb1
transcripts from KC on the x-axis. Pearson correlation coefficients were computed, and linear
regression was applied for quantification.
Hepatic Stellate Cell culture experiment
Animal work : Male C57BL/6 mice were used and all the animal experiments that were
performed, were in accordance with the local and institutional regulations for the Protection of
Animal Welfare (Regierung Unterfranken, Bavaria, Germany).
Isolation and cultivation of mouse hepatic stellate cells (HSCs) and real-time PCR: HSCs were
isolated from 25 to 35 weeks old, male C57BL/6 mice as previously described 73 . In brief, in
situ perfusion with EGTA buffer and Collagenase solution was followed by a successive
Collagenase and Pronase in vitro digest. HSCs were washed and enriched by non-ionic gradient
centrifugation and seeded in 24 wells (6 x 10 4 cells per well; Thermo Scientific, Waltham,
United States ). Prior to cell stimulation, HSCs were cultured for 72 h in DMEM (Thermo
Scientific) supplemented with 10% FBS (Corning, New York, United States) and 1%
penicillin–streptomycin (Thermo Scientific). To activate TGF-β signaling, cells were washed
with PBS (Thermo Scientific) and treated with recombinant mouse TGF -β1 (10 ng/ml;
Biolegend, San Diego, United States), under FBS free conditions and compared with pretreated
HSCs, which received recombinant mouse Decorin (10 μg/ml; R&D Systems, Minneapolis,
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
31
United States) for 1 h before TGF-β1 was added. After 72 h of treatment, cells were harvested
for RNA extraction, cDNA synthesis and qPCR analysis, as previously described 98. Total RNA
was extracted using Arcturus PicoPure RNA Isolation Kit and cDNA was synthesized using
RevertAid First Strand cDNA Synthesis kit (both Thermo Scientific) according to the
manufacturer's instructions. For qPCR data analysis, the relative gene expression values for the
target mRNA were normalized to Gapdh, which was selected from a set of 3 housekeeping
genes (Gapdh, Hprt, Ppia), based on the normqPCR algorithm 99. Used primers are listed in
Table 1.
Table 1: Primer list
Primers qPCR(m) Genbank ID Synthesis direction (5'-3'): Sequences
Acta2 (m) NM_007392 Forward: CTATTCAGGCTGTGCTGTCCCTC
Reverse: CACGTTGTGAGTCACACCATCTC
Col1a1 (m) NM_007742 Forward: GTCTGGTTTGGAGAGAGCATGAC
Reverse: CGCAGGAAGGTCAGCTGGATAG
Gapdh (m NM_008084 Forward: TGTCCGTCGTGGATCTGAC
Reverse: CCTGCTTCACCACCTTCTTG
Hprt (m) NM_013556 Forward: TCCTCCTCAGACCGCTTTT
Reverse: CCTGGTTCATCATCGCTAATC
Lox (m) NM_010728 Forward: CTATGGGTACCACAGGCGCTTTG
Reverse: CCGCATAGGTGTCATAACATCCAG
Pdgfrb (m) NM_001146268 Forward: GAGCTCAGTGAGAGGAAGCGTATC
Reverse: GAACAGGTCCTCGGAGTCCATAG
Ppia (m) NM_008907 Forward: GCATACAGGTCCTGGCATCT
Reverse: AGCTGTCCACAGTCGGAAAT
Reln (m) NM_011261 Forward: CAGCCGAAGGACTTCACACAAG
Reverse: GCAGTAGAACTCCTTTCATCCGAGC
Analysis of qPCR data: We performed three technical replicates and five biological replicates
for the experiment, measuring the Cq (Quantification Cycle) values of Acta2, Col1a1, Lox,
Pdgfrb, and Reln genes. For each biological replicate, we first averaged the technical replicates
and quantified the relative expression of each gene i of interest using: ∆𝑐T!
" = 2
UV123*45
% <V1"
%W
,
where j represents the condition from the following groups: control (HSC), HSC+Tgfb,
HSC+Dcn, and HSC+Tgfb+Dcn. We then quantif ied the relative gene expression over the
control group: ∆∆𝑐T!
" =
∆81"
%
∆81"
+6,7869. Finally, we performed a two sample paired t-test between
∆∆𝑐T!
?RV+YJ4Z and ∆∆𝑐T!
?RV+YJ4Z+[VPacross the data point s of biological replicates and
reported the p-value in the figure.
Benchmarking NiCo cell type niche interactions with tissue niche detection methods
To compare NiCo predictions for cell -cell interactions derived from logistic regression
coefficients with topological tissue niche domains predicted by alternative methods, we define
two scoring schemes:
Niche composition Z-score: For a given tissue domain 𝑑, we compute the frequency 𝑓\" of cell
type 𝑗, and mean 𝜇" and standard deviation 𝜎" of cell type 𝑗 across all domains. From these
numbers, we compute a z-score for the enrichment of cell type 𝑗 in domain 𝑑:
𝑍\" =
44%<]%
^%
(10)
Cellcharter neighborhood enrichment score (CC enrichment): As alternative scoring metric we
compute the function cc.gr.enrichment from the CellCharter method. cc.gr.enrichment takes an
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
32
anndata object along with ‘group_key’ and ‘label_key’ as input and computes the enrichment
of label_key in group_key. The function returns the ratio of observed to expected values and
indicates whether a cell type is over- or under-represented in a given tissue domain.
We compare cell type composition between the ground truth (GT) domains and method-
detected domains using the Z-score and the CC enrichment. In both scenarios, we compute the
Pearson correlation of each of these quantities between a GT domain and all method-detected
domains and consider the method-detected domains with the highest correlation as best match.
We then compare the distribution of maximum correlations for all GT domains across methods.
Niche-detection b enchmarking on Allen Brain spatial MERFISH atlas data and S TARmap
mouse visual cortex data: We used the default parameters for CellCharter, Banksy, SpaGCN,
SpatialPCA, SeuratV5, and Stagate, except for the clustering resolution parameter or the
required number of cluster s. We adjusted this parameter to retrieve a number of predicted
domains close to the number of GT domains (6). For the Allen Brain MERFISH data,
CellCharter identif ied 6 clusters that maximize d the stability parameter. Banksy , with the
default Leiden clustering resolution of 0.9 and lambda of 0.8, identified 15 clusters, which
exceeds the ground truth (GT) of 6 domains. We also tested another clustering resolution of 0.2
with a lambda of 0.8, which resulted in 9 clusters. SpaGCN , with the default Louvain
parameter, identified 23 clusters, while resolutions of 0.2, 0.1, and 0.05 identified 13, 9, and 3
clusters, respectively. For SpatialPCA, we specif ied the clusternum parameter as 6 at the
louvain_clustering step. Similarly , in SeuratV5 , we s et the niches.k parameter to 6 in the
BuildNicheAssay step. Stagate , with the default resolution of 1, identified 22 cluster s;
resolution of 0.5, 0.2, and 0.1 identified 15, 9, and 4 clusters, respectively.
After adjustment of clustering resolution or cluster number, a ll methods identified 7 clusters
for STARmap mouse visual cortex data , equal to the number of GT domains, except for
CellCharter, which identified only five clusters. All methods were used with default parameters
except for the clustering resolution parameter or the required number of cluster s. We set the
niches.k parameter to 7 for Seurat and clusternum parameter to 7 for SpatialPCA. We used
Leiden clustering with a resolution of 0.5 in Stagate, a resolution of 0.9 in Banksy, and a
resolution of 1 in SpaGCN. CellCharter identified 5 clusters from stability analysis that w ere
used to define the tissue domains.
Cell-cell interaction benchmarking on Intestinal data.
We benchmarked Niche-DE56, COMMOT21, cellNeighborEX58, and stLearn57 on the intestinal
data to predict ligand-receptor pairs enriched in pairs of interacting cell types. Niche-DE56
identified 6 genes (Mptx2, Slc12a2, Maoa, Clca3b, Apob, Stmn1 ) between Paneth niche cells
and the stem/TA index cell type, and 4 genes ( Stmn1, H2 -Eb1, Maoa, Slc12a2 ) between
stem/TA niche cells and the Paneth index cell. CellNeighborEX58 identified Mptx2, Lgr5, and
Kit as cell-contact dependent differentially expressed genes between stem/TA and Paneth cells.
COMMOT21 using default parameters that filter LR pairs where both ligand and receptor are
expressed in at least 5% of the spots, identified 0 pairs with the CellChat database and 5 pairs
(Apob-Sdc1, Cadm1 -Cadm1, Cd14 -Itgam, Vcan -Cd44, Vim-Cd44) with the NiCoLRdb
database. stLearn57 detected five pairs of LR interactions ( Vim-Cd44, Vcan -Cd44, Cadm1 -
Cadm1, Gzmb-Chrm3, Cd34-Sell) using the connectomeDB2020 database.
Data Availability
All scRNA-seq, spatial transcriptomics and respective cell type annotations datasets analyzed
in this study are publicly available in the referenced studies. The mouse liver MERFISH data
is downloaded from Vizgen MERSCOPE website (https://vizgen.com/data-release-program/).
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
33
Code Availability
The open-source NiCo package is available at a PyPI repository: https://pypi.org/project/nico-
sc-sp/. We uploaded all codes and scripts used for this study's analyses and figure plotting
functions to the NiCo package. The documentation, usage instructions and other relevant
information related to NiCo can be accessed through the tutorial link contained in the
repository.
Acknowledgements
We thank the members of the Grün lab for their advice and encouragement throughout this
project. We also thank Julio Saez -Rodriguez, Jovan Tanevski, Moritz Lampert and Ingo
Scholtes for interesting discussion s and suggestion s. We thank Carlos Talavera -Lopéz for
critical reading and feedback on the manuscript. This work was supported by the German
Research Foundation (DFG) ( SPP1937 GA 2129/2-2 and SFB1425-Project #422681845), by
the CZI Seed Networks for the Human Cell Atlas, by the ERC (818846 — ImmuNiche —
ERC-2018-COG), and by the Bundesministerium für Bildung und Forschung (BMBF)
(TissueNet - 031L0311A and CureFib - 01EJ2201C).
Author Contributions
D.G. conceived the project and proposed the algorithm. D.G. and A.A. designed the algorithm.
A.A. implemented the algorithm, collected and analyzed all data. S.B. performed in vitro
culture experiments. S.T. carried out smFISH experiments. D.G. and A.A. wrote the
manuscript. D. G. supervised the project. All authors reviewed the manuscript.
Competing Interests
D.G. serves on the scientific advisory board of Gordian Biotechnology.
References
1. Stein-O’Brien, G. L., Ainslie, M. C. & Fertig, E. J. Forecasting cellular states: from
descriptive to predictive biology via single-cell multiomics. Current opinion in systems biology
26, 24–32 (2021).
2. Sagar & Grün, D. Deciphering cell fate decision by integrated single -cell sequencing
analysis. Annual review of biomedical data science 3, 1–22 (2020).
3. Leyes Porello, E. A., Trudeau, R. T. & Lim, B. Transcriptional bursting: stochasticity
in deterministic development. Development 150, (2023).
4. Velten, B. & Stegle, O. Principles and challenges of modeling temporal and spatial
omics data. Nature Methods 1–13 (2023).
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
34
5. Grün, D. Revealing dynamics of gene expression variability in cell state space. Nature
Methods
17, 45–49 (2020).
6. Rosales-Alvarez, R. E. et al. VarID2 quantifies gene expression noise dynamics and
unveils functional heterogeneity of ageing hematopoietic stem cells. Genome Biology 24, 1–
30 (2023).
7. Vallejos, C. A., Marioni, J. C. & Richardson, S. BASiCS: Bayesian analysis of single-
cell sequencing data. PLoS computational biology 11, e1004333 (2015).
8. Browaeys, R., Saelens, W. & Saeys, Y . NicheNet: modeling intercellular
communication by linking ligands to target genes. Nature methods 17, 159–162 (2020).
9. Hu, Y ., Peng, T., Gao, L. & Tan, K. CytoTalk: De novo construction of signal
transduction networks using single -cell transcriptomic data. Science Advances 7, eabf1356
(2021).
10. Williams, C. G., Lee, H. J., Asatsuma, T., Vento-Tormo, R. & Haque, A. An introduction
to spatial transcriptomics for biomedical research. Genome Medicine 14, 1–18 (2022).
11. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by
spatial transcriptomics. Science 353, 78–82 (2016).
12. Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA
nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
13. Cho, C.-S. et al. Microscopic examination of spatial transcriptome using Seq -Scope.
Cell 184, 3559–3572 (2021).
14. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved,
highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
15. Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells
reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).
16. Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nature
Methods
10, 857–860 (2013).
17. Lee, J. H. et al. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression
profiling in intact cells and tissues. Nature protocols 10, 442–458 (2015).
18. Wang, X. et al. Three-dimensional intact -tissue sequencing of single -cell
transcriptional states. Science 361, eaat5691 (2018).
19. Svensson, V ., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially
variable genes. Nature methods 15, 343–346 (2018).
20. Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from
single cell transcriptomic data. Nature communications 11, 2084 (2020).
21. Cang, Z. et al. Screening cell –cell communication in spatial transcriptomics via
collective optimal transport. Nature Methods 20, 218–228 (2023).
22. Tanevski, J., Flores, R. O. R., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable
multiview framework for dissecting spatial relationships from highly multiplexed data.
Genome biology 23, 1–31 (2022).
23. Yuan, Y . & Bar-Joseph, Z. GCNG: graph convolutional networks for inferring gene
interaction from spatial transcriptomics data. Genome biology 21, 1–16 (2020).
24. Fischer, D. S., Schaar, A. C. & Theis, F. J. Modeling intercellular communication in
tissues using spatial graphs of cells. Nature Biotechnology 41, 332–336 (2023).
25. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features
of brain cell identity. Cell 177, 1873–1887 (2019).
26. Yang, Z. & Michailidis, G. A non -negative matrix factorization method for detecting
modules in heterogeneous omics multi-modal data. Bioinformatics 32, 1–8 (2016).
27. Kleshchevnikov, V . et al. Cell2location maps fine -grained cell types in spatial
transcriptomics. Nature biotechnology 40, 661–671 (2022).
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
35
28. Biancalani, T. et al. Deep learning and alignment of spatially resolved single -cell
transcriptomes with Tangram. Nature methods 18, 1352–1362 (2021).
29. Mages, S. et al. TACCO unifies annotation transfer and decomposition of cell identities
for single-cell and spatial omics. Nature biotechnology 41, 1465–1473 (2023).
30. Cao, K., Gong, Q., Hong, Y . & Wan, L. A unified computational framework for single-
cell data integration with optimal transport. Nature Communications 13, 7419 (2022).
31. Petukhov, V . et al. Cell segmentation in imaging-based spatial transcriptomics. Nature
biotechnology 40, 345–354 (2022).
32. Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by
MERFISH. Nature 598, 137–143 (2021).
33. Lohoff, T. et al. Integration of spatial and single -cell transcriptomic data elucidates
mouse organogenesis. Nature biotechnology 40, 74–85 (2022).
34. Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved
transcriptomics with an adaptive graph attention auto -encoder. Nature communications 13,
1739 (2022).
35. Hao, Y . et al. Dictionary learning for integrative, multimodal and scalable single -cell
analysis. Nature biotechnology 42, 293–304 (2024).
36. Varrone, M., Tavernari, D., Santamaria -Martínez, A., Walsh, L. A. & Ciriello, G.
CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity.
Nature Genetics 56, 74–84 (2024).
37. Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to
identify spatial domains and spatially variable genes by graph convolutional network. Nature
Methods
18, 1342–1351 (2021).
38. Shang, L. & Zhou, X. Spatially aware dimension reduction for spatial transcriptomics.
Nature communications 13, 7203 (2022).
39. Singhal, V . et al. BANKSY unifies cell typing and tissue domain segmentation for
scalable spatial omics data analysis. Nature Genetics 1–11 (2024).
40. Yao, Z. et al. A high-resolution transcriptomic and spatial atlas of cell types in the whole
mouse brain. Nature 624, 317–332 (2023).
41. Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early
organogenesis. Nature 566, 490–495 (2019).
42. Qiu, C. et al. Systematic reconstruction of cellular trajectories across mouse
embryogenesis. Nature genetics 54, 328–341 (2022).
43. Qu, F. et al. Three-dimensional molecular architecture of mouse organogenesis. Nature
Communications 14, 4599 (2023).
44. Kelly, R. G. & Evans, S. M. The second heart field. Heart Development and
Regeneration 143–169 (2010).
45. Kuhn, E. N. & Wu, S. M. Origin of cardiac progenitor cells in the developing and
postnatal heart. Journal of cellular physiology 225, 321–325 (2010).
46. Palpant, N. J. et al. Transmembrane protein 88: a Wnt regulatory protein that specifies
cardiomyocyte development. Development 140, 3799–3808 (2013).
47. Cai, C.-L. et al. Isl1 identifies a cardiac progenitor population that proliferates prior to
differentiation and contributes a majority of cells to the heart. Developmental cell 5, 877–889
(2003).
48. Zhuang, S. et al. Expression of Isl1 during mouse development. Gene Expression
Patterns 13, 407–412 (2013).
49. Kelly, R. G., Buckingham, M. E. & Moorman, A. F. Heart fields and cardiac
morphogenesis. Cold Spring Harbor perspectives in medicine 4, (2014).
50. Bruneau, B. G. The developmental genetics of congenital heart disease. Nature 451,
943–948 (2008).
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
36
51. Yu, P. et al. Cardiomyocyte-restricted high-mobility group box 1 (HMGB1) deletion
leads to small heart and glycolipid metabolic disorder through GR/PGC -1α signalling. Cell
death discovery 6, 106 (2020).
52. Beumer, J. & Clevers, H. Regulation and plasticity of intestinal stem cells during
homeostasis and regeneration. Development 143, 3639–3649 (2016).
53. Böttcher, A. et al. Non-canonical Wnt/PCP signalling regulates intestinal stem cell
lineage priming towards enteroendocrine and Paneth cell fates. Nature cell biology 23, 23–31
(2021).
54. Niec, R. E. et al. Lymphatics act as a signaling hub to regulate intestinal stem cell
activity. Cell stem cell 29, 1067–1082 (2022).
55. Flanagan, D. J. et al. Frizzled7 functions as a Wnt receptor in intestinal epithelial Lgr5+
stem cells. Stem cell reports 4, 759–767 (2015).
56. Mason, K. et al. Niche-DE: niche -differential gene expression analysis in spatial
transcriptomics data identifies context-dependent cell-cell interactions. Genome Biology 25, 14
(2024).
57. Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions
in healthy and diseased tissues. Nature communications 14, 7739 (2023).
58. Kim, H. et al. CellNeighborEX: deciphering neighbor-dependent gene expression from
spatial transcriptomics data. Molecular Systems Biology 19, e11670 (2023).
59. Michalopoulos, G. K. & Bhushan, B. Liver regeneration: biological and pathological
mechanisms and implications. Nature reviews Gastroenterology & hepatology 18, 40 –55
(2021).
60. Arroyo, V . et al. Acute-on-chronic liver failure in cirrhosis. Nature reviews Disease
primers 2, 1–18 (2016).
61. Halpern, K. B. et al. Single-cell spatial reconstruction reveals global division of labour
in the mammalian liver. Nature 542, 352–356 (2017).
62. Aizarani, N. et al. A human liver cell atlas reveals heterogeneity and epithelial
progenitors. Nature 572, 199–204 (2019).
63. Guilliams, M. et al. Spatial proteogenomics reveals distinct and evolutionarily
conserved hepatic macrophage niches. Cell 185, 379–396 (2022).
64. Bonnardel, J. et al. Stellate cells, hepatocytes, and endothelial cells imprint the Kupffer
cell identity on monocytes colonizing the liver macrophage niche. Immunity 51, 638 –654
(2019).
65. Friedman, S. L. Hepatic stellate cells: protean, multifunctional, and enigmatic cells of
the liver. Physiological reviews 88, 125–172 (2008).
66. Yang, W. et al. Single-cell transcriptomic analysis reveals a hepatic stellate cell –
activation roadmap and myofibroblast origin during liver fibrosis in mice. Hepatology 74,
2774–2790 (2021).
67. Nguyen-Lefebvre, A. T. & Horuzsko, A. Kupffer cell metabolism and function. Journal
of enzymology and metabolism 1, (2015).
68. Wang, Y ., Li, J., Wang, X., Sang, M. & Ho, W. Hepatic stellate cells, liver innate
immunity, and hepatitis C virus. Journal of gastroenterology and hepatology 28, 112–115
(2013).
69. Baghy, K., Iozzo, R. V . & Kovalszky, I. Decorin –TGFβ axis in hepatic fibrosis and
cirrhosis. Journal of Histochemistry & Cytochemistry 60, 262–268 (2012).
70. Dewidar, B., Meyer, C., Dooley, S. & Meindl-Beinker, N. TGF-β in hepatic stellate cell
activation and liver fibrogenesis—updated 2019. Cells 8, 1419 (2019).
71. Li, Y ., Fan, W., Link, F., Wang, S. & Dooley, S. Transforming growth factor β latency:
A mechanism of cytokine storage and signalling regulation in liver homeostasis and disease.
JHEP Reports 4, 100397 (2022).
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
37
72. Shah, S. et al. Single-molecule RNA detection at depth by hybridization chain reaction
and tissue hydrogel embedding and clearing. Development 143, 2862–2867 (2016).
73. Mederacke, I., Dapito, D. H., Affò, S., Uchinami, H. & Schwabe, R. F. High-yield and
high-purity isolation of hepatic stellate cells from normal and fibrotic mouse livers. Nature
protocols 10, 305–315 (2015).
74. Cable, D. M. et al. Cell type -specific inference of differential expression in spatial
transcriptomics. Nature methods 19, 1076–1087 (2022).
75. Bellamy, T. C. Interactions between Purkinje neurones and Bergmann glia. The
cerebellum 5, 116–126 (2006).
76. Metea, M. R. & Newman, E. A. Calcium signaling in specialized glial cells. Glia 54,
650–655 (2006).
77. Raj, A. & Van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression
and its consequences. Cell 135, 216–226 (2008).
78. Esteban-Martínez, L. & Torres, M. Metabolic regulation of cell competition.
Developmental Biology 475, 30–36 (2021).
79. Veličković, P. et al. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
80. Yao, Z. et al. A transcriptomic and epigenomic cell atlas of the mouse primary motor
cortex. Nature 598, 103–110 (2021).
81. Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial
transcriptomics. Nature biotechnology 40, 517–526 (2022).
82. Haghverdi, L., Lun, A. T., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell
RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature
biotechnology 36, 421–427 (2018).
83. Hafemeister, C. & Satija, R. Normalization and variance stabilization of single -cell
RNA-seq data using regularized negative binomial regression. Genome biology 20, 296 (2019).
84. Plimpton, S. Fast parallel algorithms for short -range molecular dynamics. Journal of
computational physics 117, 1–19 (1995).
85. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single -cell
transcriptomic data across different conditions, technologies, and species. Nature
biotechnology 36, 411–420 (2018).
86. Lee, D. D. & Seung, H. S. Learning the parts of objects by non -negative matrix
factorization. Nature 401, 788–791 (1999).
87. Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular
activity with single-cell RNA-Seq. Elife 8, e43803 (2019).
88. Türei, D., Korcsmáros, T. & Saez-Rodriguez, J. OmniPath: guidelines and gateway for
literature-curated signaling pathway resources. Nature methods 13, 966–967 (2016).
89. Hou, R., Denisenko, E., Ong, H. T., Ramilowski, J. A. & Forrest, A. R. Predicting cell-
to-cell communication networks using NATMI. Nature communications 11, 5011 (2020).
90. Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. CellPhoneDB:
inferring cell–cell communication from combined expression of multi-subunit ligand–receptor
complexes. Nature protocols 15, 1484–1506 (2020).
91. Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature
607, 540–547 (2022).
92. Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene
set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
93. Chen, E. Y . et al. Enrichr: interactive and collaborative HTML5 gene list enrichment
analysis tool. BMC bioinformatics 14, 1–14 (2013).
94. Nina, P. et al. Tracking spatio-temporal dynamics of the endothelial niche during fetal
hematopoiesis. unpublished unpublished, 0.
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
38
95. Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nature
Methods
9, 676–682 (2012).
96. Axelrod, S. et al. starfish: scalable pipelines for image-based transcriptomics. Journal
of Open Source Software 6, 2440 (2021).
97. Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis.
Scientific reports 7, 1–7 (2017).
98. Thomann, S. et al. YAP orchestrates heterotypic endothelial cell communication via
HGF/c-MET signaling in liver tumorigenesis. Cancer Research 80, 5502–5514 (2020).
99. Perkins, J. R. et al. ReadqPCR and NormqPCR: R packages for the reading, quality
checking and normalisation of RT-qPCR quantification cycle (Cq) data. BMC genomics 13, 1–
8 (2012).
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted September 8, 2024. ; https://doi.org/10.1101/2024.09.08.611848doi: bioRxiv preprint
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.