{"paper_id":"1c62f56d-7f33-4e63-b186-aa95f3e27f51","body_text":"SCALE: Unsupervised Multi-Scale Domain Identification \nin Spatial Omics Data \n \nBehnam Yousefi1,2,*, Darius P. Schaub1,3,*,  Robin Khatri1,3, Nico Kaiser1,3, Malte Kuehl1,6,7, \nCedric Ly1,4, Victor G. Puelles3,5,6,7, Tobias B. Huber3,5,8, Immo Prinz4,8, Christian F. Krebs3,5,8, Ulf \nPanzer3,5,8, Stefan Bonn1,2,5,8 \n1 Institute of Medical Systems Bioinformatics, Center for Biomedical AI (bAIome), Center for \nMolecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf, Hamburg, \n20251, Germany. \n2 German Center for Child and Adolescent Health (DZKJ), partner site Hamburg, University \nMedical Center Hamburg-Eppendorf, Germany. \n3 III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, \nGermany. \n4 Institute of Systems Immunology, University Medical Center Hamburg-Eppendorf, Hamburg, \nGermany \n5 Hamburg Center for Kidney Health (HCKH), University Medical Center Hamburg-Eppendorf, \nHamburg, 20251, Germany. \n6 Department of Clinical Medicine, Aarhus University, Aarhus, Denmark. \n7 Department of Pathology, Aarhus University Hospital, Aarhus, Denmark. \n8 Hamburg Center for Translational Immunology (HCTI), University Medical Center \nHamburg-Eppendorf, Hamburg, 20251, Germany.  \n*equal contribution \nCorrespondence: sbonn@uke.de  \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nAbstract \nSingle-cell spatial transcriptomics enables precise mapping of cellular states and functional \ndomains within their native tissue environment. These functional domains often exist at multiple \nspatial scales, with larger domains encompassing smaller ones, reflecting the hierarchical \norganization of biological systems. However, the identification of these functional domain \nhierarchies has been hardly explored due to the lack of appropriate computational methods. In \nthis work, we present SCALE, an unsupervised algorithm for multi-scale domain identification \nin spatial transcriptomics data. SCALE combines neural graph representation learning with an \nentropy-based search algorithm to detect functional domains at different scales. It reaches \nstate-of-the-art performance in single- and multi-scale domain detection on simulated and murine \nbrain Xenium and MERFISH data, as well as patient-derived kidney tissue, highlighting its \nrobustness and scalability across diverse tissue types and platforms. SCALE’s ease of use makes \nit a powerful aid for advancing our understanding of tissue organization and function in health \nand disease. \nKeywords: single-cell spatial transcriptomics, domain identification, deep learning, multi-scale, \nhyperparameter optimization. \n \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nIntroduction \nThe rapidly growing field of single-cell spatial transcriptomics (ST) enables the identification of \ncellular states and signaling in their native microenvironments. Within a tissue, cells are often \norganized into distinct spatially related regions that correspond to specific anatomical structures \nor functional areas. These anatomical and functional domains exist across multiple scales, often \nin a nested structure, ranging from local cellular domains to broader anatomical domains (Fig. \n1a). Identifying spatial domains across multiple scales can improve our understanding of organ \nfunction, development, and disease pathology. However, despite significant advancements in \nspatial molecular imaging technologies, such as MERFISH1 and Xenium2, multi-scale domain \nidentification remains an underexplored area, and current methodologies still face substantial \nchallenges in accurately identifying spatial domains, particularly in an unsupervised manner. \nTraditional approaches in ST have often adapted clustering algorithms initially designed for \nsingle-cell RNA sequencing3–5, which do not take spatial information into account. Recent works \nhave introduced spatially informed algorithms ranging from graph neural network-based \nmethods6–12 to Bayesian methods13. Despite these advances, these methods are not designed for \ncapturing multi-scale spatial structures. To identify domains across multiple scales, existing \nmethods usually require manual tuning of multiple hyperparameters, while the results may \nhighly depend on the user’s choice. Currently, the only method for the identification of \nhierarchical domains is NeST14, which searches regions of highly co-expressed genes. NeST is \nbased on an assumption that domains in a tissue are characterized by specific gene coexpression \npatterns; however, this assumption may not be generalizable across different tissues, specifically \nwhere distinct domains include overlapping cell types, potentially leading to suboptimal \nperformance. \nThis manuscript introduces SCALE (Spatial Clustering At multiple LEvels), a method that \nemploys a graph neural network-based (GNN-based) encoder-decoder architecture with a \nbi-objective function integrating both cell transcriptomic data and spatial relationships among \ncells. GNN representations are subsequently subjected to a novel entropy-based search algorithm \nthat enables the identification of optimal domains across multiple scales. SCALE is easily \napplicable and identifies spatial domains in simulated and real-world spatial single-cell data \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nobtained from a variety of tissue types and technologies with best-of-breed accuracy and \nreliability. \nResults \nOverview of SCALE \nOur algorithm is derived from three key assumptions about spatial domain organization. The \ncoherence assumption states that cells within a spatial domain have similar gene expression in \ntheir neighborhood. The spatial continuity and scale relevance assumption asserts that biological \ndomains form contiguous regions at specific biologically meaningful scales. The hierarchical \norganization premise posits that functional regions are organized in a spatially nested manner.   \nDeep learning model architecture. SCALE employs a GNN-based encoder-decoder architecture \nto map cells into an embedding space and subsequently utilizes a clustering algorithm to identify \nspatial domains. The overall framework of SCALE is illustrated in Fig. 1b-d. Having a spatially \nresolved sample biopsy, we first construct a spatial graph,  with nodes representing cells and \ntheir gene expression profiles as node attributes. The edges represent spatial adjacency between \ncells defined by a spatial distance threshold . In the encoder, a graph attention network (GAT)15 𝑟\nis trained to perform message passing between adjacent cells and embed them into an embedding \nspace. The decoder is subsequently trained to reconstruct both the gene expression profiles of the \ncells and the cell-cell adjacency relationships (Fig. 1b). This approach ensures that, in the \nembedding space, cells that are spatially close and share similar gene expression neighborhoods \nwill be positioned close to each other. Therefore, to train the model, we proposed to use a \nbi-objective cost function optimized along a Pareto front: \n 𝐿 = 𝐿1 + λ𝐿2\nwhere  is a binary cross-entropy loss for predicting pairs of cells within a neighborhood and  𝐿1 𝐿2\nis a mean squared error for gene expression prediction. The hyperparameter  enables navigation λ\nacross the Pareto front, balancing the trade-offs between the objectives. For a given , we train 𝑟\nour model with different values of  and choose the one that maximizes the correlation of the λ\nGNN embedding space and spatial adjacencies, as quantified by Moran’s I (MI, Supplementary \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nAlgorithm 1). After the model is trained and cells are represented in the embedding space, we \napply Leiden clustering16 to identify spatial domains (Fig. 1c). \nMulti-scale domain identification. The identified domains largely depend on the choice of  and 𝑟\nthe Leiden resolution . Selecting these parameters is challenging for two reasons: (i) domain γ\nidentification is an unsupervised task with no ground truth in real-life scenarios, and (ii) there is \noften no single optimal scale. In reality, however, biological tissues often contain multiple \nbiologically meaningful spatial domains that are organized hierarchically across different scales. \nTo address this challenge, we developed an entropy-based search algorithm that identifies a set of \noptimal  and  values, enabling multi-scale domain identification (Fig. 1d). We start by 𝑟 γ\ngenerating clusters multiple times across a grid of  and  values, then assess the stability of the 𝑟 γ\nresulting clusters. This approach operates on the assumption that more robust clusters are likely \nto be better aligned with underlying biological structures. To achieve this, we compute the \nadjusted rand index (ARI), i.e., a measure for evaluating the similarity between two data \nclusterings, for each pair of clusterings and then calculate the average ARI across all pairs, \nproducing a stability matrix. Pairs of  and  values associated with higher stability were 𝑟 γ\nretained, with each pair representing a domain clustering at a specific scale. Next, we use our \nentropy-based search algorithm to find spatial domains across multiple scales, optimizing for a \nnested structure, where lower-scale domains are contained within those at higher scales. \nSpecifically, given an arbitrary cell in a cluster from a lower-level domain clustering, the \nalgorithm maximizes the probability that the same cell is assigned to a single domain inside a \nhigh-level domain clustering (Supplementary Fig. 1). The mathematical details of SCALE are \ndescribed in the Methods section. \n \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \n \nFig. 1. Schematic representation of spatial domain hierarchies and SCALE’s architecture. (a) Spatial \ndomains in the brain are organized in a hierarchical, nested structure. (b) Given spatial transcriptomic data \nfrom a tissue biopsy, a spatial graph is constructed where cells are represented as nodes and edges \nrepresent cell-cell adjacencies within a threshold . For a grid of different thresholds , a graph neural 𝑟 𝑟\nnetwork-based encoder-decoder learns cell representations by reconstructing the cellular adjacencies and \ngene expression. (c) Leiden clustering with several resolutions  is used for clustering nodes in γ\nembedding space, and a cluster similarity metric measures the stability of clusters generated by each pair \nof  and . The base assumption is that functional domains should be more stable. (d) Our entropy-based 𝑟 γ\nsearch algorithm finds values of  and  for a desired number of levels. 𝑟 γ\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nSCALE achieves state-of-the-art single-scale domain identification performance \nWe first assessed SCALE’s single-scale domain identification performance by comparing it to \nsix state-of-the-art competitors: NichePCA17, MENDER18, Space-Flow10, SCAN-IT6, \nCellCharter19, Banksy20, and NEST14. Since all the existing methods, except for NeST, lack \nsupport for multi-scale domain identification, our comparison in this section is limited to \nsingle-scale domain identification. For the data, we assembled two independent spatial \nsingle-cell mouse brain datasets. Dataset 1 contains four samples measured with the MERFISH \ntechnology1 with 483 genes and ~80,000 cells per sample. The ground-truth domain annotations \nwere performed using the same automated annotation workflow as in Schaub et al. (2024)17, \nobtaining annotations for major brain regions (Level 1) and subregions (Level 2) based on the \nAllen mouse brain atlas21. For single-scale domain identification, we used Level 2 annotations as \nthey resemble the annotations used in Dataset 2 more closely. Dataset 2 contains three mouse \nbrain samples measured with Xenium technology2, featuring 248 unique genes and ~150,000 \ncells per sample. The ground-truth annotations for cells and spatial domains were obtained from \nthe original publication22. \nThe identification of spatial domains was assessed using the ground truth annotations based on \ndifferent cluster similarity metrics. First, we compared SCALE with its automatic \nhyperparameter selection to competing methods with default parameters. However, all \nLeiden-based competitors still required supervised tuning of their resolution parameter to \noptimize for the best match to the ground truth labels, while SCALE selected it automatically in \nan unsupervised manner. The only constraint we imposed on SCALE was a minimum of 30 \nclusters in the selectable clusterings. In this setting, SCALE outperformed all competing \nalgorithms with NichePCA being a close second (Fig. 2a and b, and Supplementary Fig. 2).  On \nDataset 1, SCALE achieved a median AMI of 0.67, surpassing NichePCA’s, MENDER’s, and \nNeST’s median AMI by 4.9, 7.6, and 211.7 percentage points, respectively. On Dataset 2, \nSCALE achieved a median AMI of 0.66, which is almost on par with NichePCA's AMI of 0.67, \nwhile surpassing MENDER’s and NeST’s median AMI by 6.2 and 180.8 percentage points, \nrespectively. \nIn our next analysis, we tuned all available hyperparameters of the competing methods (e.g., the \nnumber of nearest neighbors for constructing the spatial graph) in a supervised manner to \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nachieve the best possible single-scale domain detection performance. In this case, SCALE, \nalthough effectively unsupervised with the only constraint that candidate clusterings contain at \nleast 30 clusters, performs on par with NichePCA and outperforms all other methods \n(Supplementary Fig. 3).  The identified spatial domains by different methods, along with their \nground truth annotation for a selected sample of Dataset 1, are shown in Fig. 2c. \nIn summary, SCALE provides state-of-the-art supervised and unsupervised single-scale domain \ndetection performance, indicating that its GNN-based architecture and dual optimization function \nare well-suited for the task. \n \nFig. 2. Single-scale domain detection performance comparison on Xenium and MERFISH mouse brain \ndata. (a) Benchmarking performance across different methods for Xenium Dataset 1. Dots show \nindividual sample performance, while the boxplot displays the median, the 25th and 75th percentiles, and \nwhiskers extending 1.5 times the interquartile range. (b) Benchmarking performance across different \nmethods for MERFISH Dataset 2. (c) Example images of the ground truth domains for the left brain \nhemisphere of a selected sample of the MERFISH dataset and the identified domains for SCALE, \nNichePCA, MENDER, and SpaceFlow. \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nSCALE can identify domains across multiple scales \nWhile SCALE achieves state-of-the-art single-scale domain detection performance, it is \nspecifically designed for multi-scale, unsupervised domain detection. In this section, we use one \nsimulated and two real-world datasets to compare SCALE to NEST, the only other algorithm \ncapable of multi-scale domain identification. \nMulti-scale analysis in simulated data. In our simulation study, we manually crafted an image of \nspatial domains with predefined structures at two levels. To generate spatial single-cell data, we \nrandomly sampled 1,000 points uniformly across a hand-crafted image representing the cell \ncoordinates (Fig. 3a). A gene expression vector was assigned to each cell using the mouse brain \ndata presented in Moffitt et al. (2018)23. At the finest scale, we considered each domain to be \ncomposed of a homogeneous mixture of randomly picked cell types. Details of the algorithm \nused to assign gene expression profiles to individual cells are provided in Supplementary \nAlgorithm 2. We then applied both NeST and SCALE (with L = 2) to the simulated data to \nidentify spatial domains (Fig. 3b, c). As can be seen, the domains identified by SCALE align \nalmost perfectly with the ground truth labels, while NeST fails to correctly recognize the \ndomains and leaves most cells unassigned. For a quantitative comparison of the predicted with \nthe ground truth domains, we used AMI, HOM, and COM scores. As the results in Fig. 3d show, \nSCALE reliably identifies the ground truth domains at different scales, while NeST performs \npoorly. For high-level domain identification, SCALE achieves a median AMI score of 0.85, \nsurpassing NeST by 158.4 percentage points. In the low-level case, SCALE reaches an AMI \nscore of 0.75, again outperforming NeST by 42.5 percentage points. \nMulti-scale analysis of mouse brain data. As an example of real-world data, we used our Dataset \n1 considering both the Level 1 (high-level) and Level 2 (low-level) domains for the ground truth \nannotations (Fig. 4a). We used NeST and SCALE (with L = 2) to identify domains across two \nscales and the results are shown in Fig. 4b and 4c. NeST only detected a single scale with many \ncells not assigned to any cluster. Moreover, it failed to correctly recognize certain domains, in \nparticular hippocampal subfields. SCALE, on the other hand, showed promising results in \nidentifying multiple scales in the mouse brain data. In particular, as the high-level domains, \nSCALE identified the brain cortex, hippocampus, and thalamus, which are functionally \nimportant areas the brain. Each of these domains was then divided into lower-level scales such as \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \ncortical layers and hippocampal subfields. Figure 4d shows the similarity of the identified \ndomains with the ground truth annotations at high- and low-level scales. Similar to the above \nanalysis, we used AMI, HOM, and COM scores as the clustering metrics. SCALE demonstrates \nnotably higher similarity to the ground-truth labels at both scales compared to NeST. \nSpecifically, it achieves a median AMI score of 0.53 for high-level domain detection, exceeding \nNeST by 104.0 percentage points. For low-level domains, SCALE reaches a median AMI score \nof 0.62, again outperforming NeST by 191.1 percentage points.   \n \nFig. 3. Multi-scale domain identification results on simulated data. (a) Ground truth high-level and \nlow-level domain annotations, (b) identified domains by NeST, and (c) SCALE for a representative \nsample from our 50 simulated samples. Cells not assigned to any domain by NeST are labeled as “NA”. \n(d) Performance comparison between SCALE and NeST across all 50 simulated samples at high- and \nlow-level scales in terms of AMI, HOM, and COM scores. On one sample, NeST and SCALE assigned \nall cells to a single domain, leading to near-zero scores. \n \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \n \nFig. 4. Multi-scale domain identification results on MERFISH mouse brain data. (a) Ground truth domain \nannotations for a selected sample of Dataset 1 were obtained at two levels based on the Allen mouse brain \natlas. (b) domains identified by NeST, and (c) domains identified by SCALE at each scale. The inlets \nshow the domain clusters inside the hippocampal domain; all other cells are grayed out. Cells not assigned \nto any domain by NeST are labeled as “NA”. Since NeST does not allow the detection of a predefined \nnumber of domain levels, it only outputs results for one level on this sample. (d) The similarity of \nidentified domains with ground truth annotations at high- and low-level scales as measured by AMI, \nHOM, and COM scores. NeST identified domains at two scales only for two of the four samples. \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nMulti-scale analysis in human kidney data. We further demonstrate SCALE’s ability to delineate \ndistinct domains in patient-derived kidney biopsies.. For this, we used the data from Sultana et \nal.24, which consists of gene expression information for a custom gene panel of 480 genes at \nsubcellular resolution measured on tissue sections of the human kidney using the Xenium \ntechnology. We employed NeST and SCALE (L=2) to identify domains on a control sample \n(healthy regions of a tumor nephrectomy specimen) shown in Supplementary Fig. 5. Then, we \nused genetic biomarkers to identify different kidney compartments, including glomeruli, \nproximal convoluted tube, distal convoluted tube, loop of Henle, and vasculature25,26 \n(Supplementary Fig. 4). Fig. 5 shows domains annotated for the higher- and lower-level scales, \nrespectively. NeST only detected a single scale with 58 clusters and a large region of the tissue \nwas not assigned to any cluster, here labeled as “NA” (Supplementary Fig. 5a), which did not \nalign well with the marker genes related to the kidney compartments (Supplementary Fig. 6a). \nNevertheless, in the results obtained by SCALE (Fig. 5b), the higher-level scale distinguishes the \nglomeruli domain from the tubulointerstitial regions; the lower-level scale further identifies finer \ndomains, including proximal convoluted tube, distal convoluted tube, loop of Henle, and \nvasculature (Supplementary Fig. 6b,c). In addition, we assessed the performance of our method \nin the recognition of glomerular regions in comparison with expert manual annotations \n(Supplementary Fig. 7). Our method demonstrated a sensitivity of 100% and specificity of 88%, \nshowing the capability of SCALE to reliably identify relevant domains in human biopsy data. \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \n \nFig. 5. Multi-scale domain identification by SCALE and NEST on a human kidney biopsy. All \nannotations are based on marker genes (e.g., Glomeruli in red). Domains not assigned to any kidney \ncompartments are denoted as ‘Unannotated’. (a) Domains identified by NeST, and (b) by SCALE at \nhigh-level and low-level scales. H&E staining images are provided to showcase kidney compartments. \nWhile SCALE reliably captures a high-level representation of glomerular and extra-glomerular regions, \nNEST fails to find a high-level representation.  \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nDiscussion \nUnderstanding tissue organization requires the identification of spatial domains and capturing of \ntheir hierarchical structure, as biological processes operate across a range of spatial scales — \nfrom small, localized niches to broader tissue-level patterns. This hierarchical organization of \nfunctional domains is essential for organ function and unsupervised charting of these intricate \nrelationships has to date hardly been investigated. Current domain detection algorithms mostly \nlack the ability to flexibly detect domains at multiple scales without manual intervention, \npotentially overlooking important layers of biological organization. In this manuscript, we \npresent SCALE, a deep learning framework designed for the identification of multi-scale spatial \ndomains in single-cell spatial transcriptomics data. Leveraging a bi-objective training scheme \nthat anchors both transcriptomic similarity and spatial proximity, SCALE learns an embedding \nspace to simultaneously reflect the gene expression neighborhoods of cells and their spatial \ncellular adjacencies. In addition, SCALE introduces an innovative entropy-based search \nalgorithm that automatically adapts its key parameters  and , enabling robust domain detection 𝑟 γ\nacross scales with minimal manual calibration.  Together, these advances position SCALE as a \nversatile and scalable tool for decoding tissue organization from spatial omics data. \nTo the best of our knowledge, NeST is the only method specifically designed to identify spatial \ndomains across multiple scales. However, in all of our analyses, NeST resulted in suboptimal \ndomain identification performance. In contrast, SCALE showed promising results when tested \nfor multi-scale domain identification based on simulated data, mouse brain tissue, and kidney \ntissue with both MERFISH and Xenium technologies. These results prove SCALE’s versatility \nacross different tissues, ST technologies, and organisms. In the mouse brain tissue, SCALE \nidentified the primary anatomical domains as high-level domains, while simultaneously detecting \nfiner-scale organization, such as cortical layers and hippocampal compartments at a lower level. \nIn the kidney tissue, SCALE annotated glomerular and tubulointerstitial regions as the two \nhigh-level domains. At the lower level, SCALE delineated proximal convoluted tube, distal \nconvoluted tube, loop of Henle, and vasculature as distinct compartments.   \nWe benchmarked SCALE against the state-of-the-art in a single-scale domain identification \nsetting and showed its accuracy and robustness in identifying spatial domains for two \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nindependent mouse brain datasets. In particular, we conducted two experiments, both with \nSCALE optimized in an unsupervised manner. In the first experiment, all methods were run with \ndefault parameters, tuning only the Leiden resolution, where SCALE outperformed the \ncompeting algorithms. In the second experiment, SCALE was compared against methods \noptimized using a supervised approach. Although it showed slightly lower performance than \nNichePCA, it still outperformed or matched all other algorithms. These results demonstrate \nSCALE's robustness and adaptability, as it maintains superior performance even when optimized \nin an unsupervised manner, highlighting its potential for real-world applications where \nsupervised optimization is not feasible. \nDespite the superior performance and applicability of SCALE, it does have certain limitations. \nWhile its automatic hyperparameter tuning enables effective multi-scale domain identification, \nsetting it apart from other methods, it can significantly increase computational time. Users, \nhowever, have the flexibility to bypass our hyperparameter tuning and input their own \nparameters. Future work could focus on enhancing the efficiency of this process by integrating \nfaster search algorithms, such as evolutionary search methods, to improve the scalability and \nspeed of SCALE further. We also acknowledge that some low-level domains, such as \nvasculature, may span multiple high-level domains, making it difficult for SCALE to capture \nthem as coherent units. This hierarchical overlap may lead to the exclusion of biologically \nmeaningful structures that are not confined to a single higher-level domain. Future work will be \nnecessary to better handle these corner cases. \nOverall, SCALE represents a significant advancement in unsupervised, multi-scale spatial \ndomain identification by leveraging graph neural networks and information theory to uncover \nnested functional structures within spatial transcriptomic data. Unlike existing methods, SCALE \nreliably detects hierarchical domains across diverse tissue types and technologies, demonstrating \nsuperior performance in both simulated and real-world datasets. Its ability to generalize across \ndifferent spatial omics platforms and tissue types underscores its adaptability and real-world \napplicability. SCALE provides a powerful framework for advancing our understanding of tissue \norganization in both health and disease, with potential implications for precision medicine and \nclinical decision-making. \n \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nOnline Methods \nSCALE Description \nGiven a set of  cells  in a tissue, each with  dimensional spatial coordinates 𝑛 𝐶 = {𝑐1, ..., 𝑐𝑛} 𝑠\n and  dimensional gene expression profiles , we developed a SCALE to identify 𝑦𝑖 ∈ 𝑅\n𝑠\n𝑔 𝑥𝑖 ∈ 𝑅\n𝑔\nspatial domains  at scale , where the number of domains  is unknown a 𝐷𝑝 = {𝑑1\n𝑝\n, ..., 𝑑𝑘\n𝑝\n} 𝑝 𝑘\npriori. Each domain  is a subset of  such that . 𝑑𝑖\n𝑝\n𝐶 𝑑𝑖 ∩ 𝑑𝑗 = ∅;     ∀ 𝑖≠𝑗\nAssumptions \nOur approach is derived from three minimal assumptions about the nature of spatial domains in \nbiological tissues: \n1. Local coherence: cells in a spatial domain have similar gene expression profiles in their \nneighborhood. \n2. Spatial continuity and scale relevance: biological domains form spatially contiguous \nregions, with their structure being most pronounced at specific biologically meaningful \nscales. \n3. Hierarchical organization: functional domains generally exhibit a spatially nested \nstructure. \nTo satisfy our assumptions, we first defined a spatial graph and then used GNN-based \nrepresentation learning to embed cells into a vector space. We introduced a bi-objective function \nto integrate both the gene expression similarity and spatial proximity representation while \ntraining the model. We then presented a scale-tuning algorithm to identify domains at multiple \nnested tissue scales. \nRepresenting spatial proximity as a spatial graph \nLet  be a graph with the sets of nodes (vertices) , edges , 𝐺𝑟 = 𝐺 𝑉, 𝐸𝑟, 𝑋( ) 𝑉 = {𝑣𝑖} 𝐸𝑟 = {𝑒\n(𝑟)\n𝑖𝑗 }\nand node features .  To define a spatial graph, we let   represent the cells as 𝑋 ∈ 𝑅\n𝑛×𝑔\n𝑉 = 𝐶\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nnodes and  denote the expression profiles of cells.  represents spatial proximity within the 𝑋 𝐸𝑟\ndistance threshold . In particular, the existence of an edge  between cells  and  is 𝑟 𝑒\n(𝑟)\n𝑖𝑗 𝑐𝑖 𝑐𝑗\ndetermined by: \n 𝑒\n(𝑟)\n𝑖𝑗 ∈ 𝐸 ⇔ ‖ 𝑥𝑖 − 𝑥𝑗 ‖≤ 𝑟\nGNN-based representation learning \nWhile several architectures can be used to incorporate our assumptions, a GNN-based model is \nnaturally suitable as it inherently takes into account spatial proximity as a graph and is flexible to \nensure local coherence. We defined the model architecture below: \nLet  be the dimension of the learned embedding space . The network architecture ℎ 𝑍 ∈ 𝑅\n𝑛×ℎ\nconsists of an encoder for graph representation learning, a link prediction decoder, and a gene \nexpression decoder. The encoder learns a mapping function  that maps  into the embedding 𝑓θ 𝐺𝑟\nspace , 𝑍𝑟\n 𝑓θ = 𝑓𝑙𝑖𝑛𝑒𝑎𝑟 ◦ 𝑓𝑅𝑒𝐿𝑈 ◦ 𝑓𝐺𝐴𝑇 : 𝑅\n𝑛×(𝑛+𝑔) \n→ 𝑅\n𝑛×ℎ\n,\nThe link prediction decoder maps the embeddings of a pair of cells onto a scalar value as a \nprobability representing the existence of an edge, \n 𝑔ϕ = 𝑔𝑆𝑖𝑔𝑚𝑜𝑖𝑑 ◦ 𝑔𝑙𝑖𝑛𝑒𝑎𝑟 ◦ 𝑔𝑠𝑖𝑚: 𝑅\n𝑛\n2\n×ℎ\n→ 𝑅\n𝑛\n2\n×1\nwhere .   Finally, the gene expression 𝑔𝑠𝑖𝑚(𝑖, 𝑗) =− ||𝑓θ(𝑖) − 𝑓θ(𝑗)||² ∀𝑖, 𝑗  𝑠. 𝑑.  𝑒𝑖𝑗 ∈ 𝐸\ndecoder maps the embedding of each cell onto a reconstructed gene expression vector, \n ℎψ = ℎ𝑙𝑖𝑛𝑒𝑎𝑟: 𝑅\n𝑛×ℎ\n→ 𝑅\n𝑛×𝑔\nObjective function \nWe defined a bi-objective function that incorporates both decoders of spatial and molecular \nobjectives as: \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \n 𝐿 θ, ϕ,  ψ( ) = 𝐿1 θ, ϕ,  ψ( ) + λ𝐿2 θ, ϕ,  ψ( )\nwhere: \n•  is the binary cross-entropy (BCE) loss for link prediction, 𝐿1 = 𝐵𝐶𝐸 𝑔ϕ 𝑓θ 𝐺𝑟 ( )( ), 𝐴( )\nencouraging the model to capture local spatial structure. \n•  is the mean squared error (MSE) for gene expression 𝐿2 = 𝑀𝑆𝐸 ℎψ 𝑓θ 𝐺𝑟 ( )( ), 𝑌( )\nprediction, ensuring the embeddings capture global gene expression patterns. \nPareto Optimization for  λ\nTo find optimal trade-offs between spatial and molecular objectives, we trained multiple models \nwith different  values: 0.000001, 0.000005, 0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, λ\n0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000. For each value of , we calculated the λ\ncorrelation between the GNN embedding space and spatial adjacencies based on Moran’s I (MI)  \n(using the morans_i function from the Scanpy v1.10.1 Python package). To determine the \noptimal , we first identified the saturation point on the -  curve. If no saturation point was λ 𝑀𝐼 λ\nobserved, we selected the  value corresponding to the maximum observed MI. The algorithm is λ\npresented in Supplementary Algorithm 1. \nDomain identification by clustering \nFor a given radius , we constructed  and computed the embeddings . Next, we 𝑟 𝐺𝑟 𝑍𝑟 = 𝑓θ(𝐺𝑟)\napplied a hard clustering method on  to identify spatial domains satisfying 𝑍𝑟\n. Here, we used the Leiden clustering algorithm as implemented in the 𝑑𝑖 ∩ 𝑑𝑗 = ∅;     ∀ 𝑖≠𝑗\nScanpy Python package. An important hyperparameter of Leiden is the resolution  that controls γ\nthe number of clusters. Hence, the final clustering  is a function of . 𝐷 (𝑟,  γ)\nMulti-scale domain identification by optimizing  and   𝑟 γ\nEach value of  can represent a potential solution at a different scale. To identify an (𝑟, γ)\noptimized set of  values, we implemented a two-step scale-tuning algorithm approach (𝑟, γ)\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \ndesigned to ensure: (i) cluster stability, and (ii) a nested structure of the clusters from different \nscales. These two steps are described in the following: \nThe final clustering , to some degree, can depend on algorithmic initializations. We 𝐷(𝑟, γ)\ndenote this variation using an index  as  (  between 15  and 55  and  between 𝑖 𝐷\n(𝑖)\n(𝑟, γ) 𝑟 µ𝑚 µ𝑚 γ\n0.01 and 1.2, see Supplementary Table 1). Among the clusterings obtained from different scales \n, we selected those with larger stability, i.e., being less sensitive to random initializations27. (𝑟, γ)\nTo define the stability of clusters generated based on  we used a measure of cluster (𝑟, γ)\nsimilarity, in particular ARI, as \n 𝑆 𝑟, γ( ) =\n𝑖≠𝑗\n∑ 𝐴𝑅𝐼 𝐷\n(𝑖)\n(𝑟, γ), 𝐷\n(𝑗)\n(𝑟, γ)( )\nHigh  indicates stable domain identification, while low  suggests the scale  𝑆 𝑟, γ( ) 𝑆 𝑟, γ( ) 𝑟, γ( )\nmay not be appropriate for domain identification. Calculating the cluster stability allows us to \ndetermine a set of the most stable clusterings for the data. We selected the top k (15%) most \nstable solutions for subsequent analysis, assuming all biologically meaningful domain scales are \ncontained in this set. Next, we considered that domains identified at lower-level scales will \ngenerally be nested within those identified at higher-level scales. Accordingly, we introduced a \nnovel entropy-search algorithm to find sets of  for a desired number of scales, optimizing 𝑟, γ( )\nfor a nested structure where lower-scale domains align with those at higher scales. The algorithm \nsearches for a pair of domain clusterings  and  at scales  and 𝐷𝑝 = {𝑑1\n𝑝\n,  ..., 𝑑𝑟\n𝑝\n} 𝐷𝑞 = {𝑑1\n𝑞\n,  ..., 𝑑𝑠\n𝑞\n} 𝑝\n that best adhere to the nestedness assumption. We achieve this by maximizing the following 𝑞\nentropy score: \n 𝐻(𝑝, 𝑞) =− 1\n𝑠 𝑗=1\n𝑠\n∑\n𝑖∈1\n𝑟\n∑ 𝑃𝑖𝑗𝑙𝑜𝑔2(𝑃𝑖𝑗)\nwhere  is the probability that any cell  inside the lower-level cluster  is also contained in 𝑃𝑖𝑗 𝑥 𝑑𝑗\n𝑞\nthe higher-level cluster  defined as 𝑑𝑖\n𝑝\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \n. 𝑃𝑖𝑗 = 𝑃(𝑥 ∈ 𝑑𝑖\n𝑝\n|𝑥 ∈ 𝑑𝑗\n𝑞\n) =\n|𝑑𝑖\n𝑝\n ∩ 𝑑𝑗\n𝑞\n|\n|𝑑𝑗\n𝑞\n|\n \nThis approach naturally extends to  scales by iterating over all possible -tuples of , 𝑙 𝑙 𝑟, γ( )\nordered by the number of clusters. For each -tuple, we compute the entropy score  across all 𝑙 𝐻\n cluster pairs it contains. The -tuple with the lowest average entropy score is then selected 𝑙 − 1 𝑙\nas the final multi-scale domain solution. The user can also specify the minimum number of \nclusters to be added at each scale (we set this to 20 for the mouse brain data, and 5 for the kidney \ndata). \nDomain annotation based on marker genes \nAfter domain identification, we annotated the kidney compartments using the \ncompartment-specific marker gene sets (Supplementary Fig. 4). For each gene set, we calculated \na score by computing the average expression of its constituent genes. These scores were \naveraged across all cells in a domain to obtain domain-specific kidney compartment scores. \nFinally, we assigned the kidney compartment with the highest score to a domain (Supplementary \nFig. 6). \nData \nDataset 1. This publicly available dataset was generated using MERFISH technology by the \nmanufacturer Vizgen. It contains measurements for 483 genes, with cell segmentation performed \nusing Vizgen's default pipeline. We applied the same domain annotation workflow as Schaub et \nal. (2024)17 to the four most symmetric samples. Each of these four samples contains \napproximately 80,000 cells. Links to the original data are provided in Supplementary Table 2. \nDataset 2. This publicly available dataset comprises three mouse brain samples measured using \nXenium technology by 10x Genomics. Each sample includes approximately 150,000 cells with \nexpression data for 248 unique genes. Cells were segmented using the proprietary pipeline from \n10x Genomics. The dataset was recently annotated by Bhuva et al. (2024)22, who mapped spatial \ndomains at the transcript level based on the Allen Brain Reference Atlas21. We followed the same \npreprocessing steps as in Schaub et al. (2024)17 and excluded dissociated cells by constructing a \ncell graph using a 60-micron distance threshold and retaining only cells within the largest \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nconnected component. The data was downloaded from the resource provided by Bhuva et al. \n(2024)22, and the corresponding links are provided in Supplementary Table 2. \nFor all datasets, we first removed cells containing fewer than 10 transcripts and genes expressed \nin fewer than 5 cells. Then, we normalized the raw spatial transcriptomics data to sum to their \nmedians and applied a log-transformation, respectively, using the normalize_total and log1p \nfunctions of the Scanpy (v1.10.1) Python package3. \nDataset 3. Sultana et al.24 generated spatially resolved gene expression data from 64 kidney \nbiopsies. Gene expression was profiled at subcellular resolution using the 10x Xenium platform, \nemploying a custom panel of 480 genes on tissue sections. \nEvaluation metrics \nTo evaluate the performance of SCALE in recognizing the “ground-truth” labels and compare it \nwith other methods, we used the following metrics. All of these metrics were implemented via \nthe scitkit-learn (v1.3.0) Python package28. \nAdjusted mutual information score (AMI): AMI quantifies the similarity between two \nclusterings  and  (i.e. in our case, the model-generated clusters and the ground truth) based on 𝑈 𝑉\ntheir mutual information  while accounting for the possibility that some agreement 𝑀𝐼(𝑈,  𝑉)\nbetween clusterings could happen randomly. AMI is defined as \n 𝐴𝑀𝐼(𝑈, 𝑉) = 𝑀𝐼(𝑈, 𝑉) − 𝐸(𝑀𝐼(𝑈, 𝑉))\n𝑎𝑣𝑔(𝐻(𝑈), 𝐻(𝑉)) − 𝐸(𝑀𝐼(𝑈, 𝑉))\nwhere ,  and  represent the mutual information, expectation, and average, 𝑀𝐼() 𝐸() 𝑎𝑣𝑔()\nrespectively. The score is normalized, with a value of 1 indicating perfect clustering \ncorrespondence, a value of 0 indicating that the similarity is no better than random, and negative \nvalues showing worse than random correspondence. \nHomogeneity score (HOM): HOM is a score evaluating the purity of clusters with respect to the \nground-truth labels. It measures whether each cluster primarily contains data points belonging to \na single class defined as \n 𝐻𝑂𝑀(𝑈, 𝑉) = 1 − 𝐻(𝑈|𝑉)\n𝐻(𝑈)\n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nwhere  is the Shannon entropy. A HOM of 1 indicates perfect homogeneity, meaning each 𝐻()\ncluster is entirely composed of a single class, while a score of 0 indicates poor homogeneity. \nCompleteness score (COM): COM is a score evaluating the extent to which all data points \nbelonging to a given ground-truth label are assigned to the same cluster recognized by the \nmodels  defined as \n 𝐻𝑂𝑀(𝑈, 𝑉) = 1 − 𝐻(𝑉|𝑈)\n𝐻(𝑉)\nA COM of 1 indicates perfect completeness, meaning all members of each class are grouped \nwithin a single cluster, while a score of 0 suggests that members of the same class are distributed \nacross multiple clusters. \nAdjusted rand index (ARI): ARI is a measure used to evaluate the similarity between two data \nclusterings by considering both the agreements and disagreements between cluster pairs and \ncounting pairs that are assigned in the same or different clusters in the predicted and ground-truth \nclusterings. ARI is defined as \n 𝐴𝑅𝐼(𝑈, 𝑉) = 𝑅𝐼−𝐸(𝑅𝐼)\n𝑚𝑎𝑥(𝑅𝐼)−𝐸(𝑅𝐼)\nwhere  is the rand score defined as 𝑅𝐼\n 𝑅𝐼 = 𝑇𝑃+𝑇𝑁\n𝑇𝑃+𝑇𝑁+𝐹𝑃+𝐹𝑁\nwhere: \n● TP is the number of pairs that are in the same cluster in both U and V (true positives).   \n● TN is the number of pairs that are in different clusters in both U and V (true negatives). \n● FP is the number of pairs that are in the same cluster in U but in different clusters in V \n(false positives). \n● FN is the number of pairs that are in different clusters in U but in the same cluster in V \n(false negatives). \nAn ARI  of 1 indicates perfect clustering correspondence, and 0 indicates random labeling. \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nBenchmarking workflow \nWe used the ground truth annotations for Datasets 1 and 2 to quantitatively compare the \nperformance of SCALE with the following existing spatial domain identification methods: \nMENDER18, Banksy20, CellCharter19, SCAN-IT6, SpaceFlow10. We additionally attempted to \ncompare our method against STAGATE8, BASS13, SpatialPCA29, and SpaGCN7, but we were not \nable to execute them on either Dataset within a memory budget of 128 GB. \nFor SCALE, we applied our automatic cluster-stability-based search procedure per sample to \ndetermine the distance cutoff for spatial graph construction and the resolution for Leiden \nclustering (with the only restriction that the number of clusters should be at least 30). No other \nmethod allows for this kind of unsupervised parameter selection. CellCharter offers a similar \nprocedure, called AutoK, but it only determines the number of clusters. For their algorithm, we \nlimited the number of clusters to be between 20 and 60. To make the results for the existing \nmethods comparable, we selected the default distance cutoff for graph construction (or similar \nparameters, depending on the method) while choosing the cluster resolution (or number of \nclusters) in a supervised fashion per sample, i.e., we selected the resolution resulting in the best \nAMI score. Note that this comparison favors the competing methods over SCALE. \nMoreover, we compared SCALE against competing methods with tuned hyperparameters, i.e., \nnot only the resolution (or number of clusters) but also another method-specific parameter was \nselected to maximize the AMI score. The parameter space for each method is provided in \nSupplementary Table 3. \nComputational resources \nComputations were performed on an AMD EPYC 7742 central processing unit (2.25 GHz, 512 \nMB L3 cache, 64 cores, 128 GB RAM) and an NVIDIA A100 Tensor Core GPU with 40 GB \nVRAM. \nData availability \nAll the raw data can be downloaded from the links provided in Supplementary Table 2. The \nworkflows for further processing and domain annotation are described in the Methods section. \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nCode availability \nThe code for SCALE is available at https://github.com/imsb-uke/scale. All scripts to reproduce \nthe benchmarking and analysis are available at https://github.com/imsb-uke/scale-analysis. \nAcknowledgments \nWe would like to thank the members of the Institute of Medical Systems Bioinformatics for their \nfeedback and Sven Heins and Vadim Ustinov for IT support. This study was supported by grants \nfrom the Deutsche Forschungsgemeinschaft (DFG) to UP (SFB 1192 A1 and C3), CFK (SFB \n1192 A5 and C3; KR 3483/3-1), SB (SFB 1192 A2, B8, and C3), RK (FOR5068 P9), CL (SFB \n1286 Z2), DPS (SFB 1192 A1). BY was supported by the Federal Ministry of Education and \nResearch (BMBF) as part of the German Center for Child and Adolescent Health (DZKJ). NK \nwas supported by DFG SFB1192 B8 and CDL FLIGHT of the University of Hamburg. \nAuthor contributions \nConceptualization and methodology: BY, DPS, and SB. Formal analysis: BY, DPS, RK, CL, \nMK, and NK. Writing original draft: BY, DSP, RK, CL. Reviewing and editing of the \nmanuscript: BY, DSP, RK, CL, MK, NK, and SB. Visualization: BY, DSP, RK, CL, and NK. \nSupervision: VGP, TBH, CFK, UP, and SB. Funding acquisition: VGP, TBH, CFK, UP, and SB. \nConflict of interest \nThe authors declare that no conflict of interest exists. \n \n \n \n \n \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nReferences \n1. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. RNA imaging. Spatially \nresolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015). \n2. Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated \nsingle-cell, spatial and in situ analysis. Nat. Commun. 14, 8353 (2023). \n3. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data \nanalysis. Genome Biol. 19, 15 (2018). \n4. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic \ndata across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018). \n5. Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39, \n1375–1384 (2021). \n6. Cang, Z., Ning, X., Nie, A., Xu, M. & Zhang, J. SCAN-IT: Domain segmentation of spatial \ntranscriptomics images by graph neural network. BMVC 32, (2021). \n7. Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial \ndomains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 \n(2021). \n8. Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an \nadaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022). \n9. Li, J., Chen, S., Pan, X., Yuan, Y. & Shen, H.-B. Cell clustering for spatial transcriptomics data with \ngraph neural networks. Nat. Comput. Sci. 2, 399–408 (2022). \n10. Ren, H., Walker, B. L., Cang, Z. & Nie, Q. Identifying multicellular spatiotemporal organization of \ncells with SpaceFlow. Nat. Commun. 13, 4076 (2022). \n11. Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial \ntranscriptomics with GraphST. Nat. Commun. 14, 1155 (2023). \n12. Liu, T. et al. A comprehensive overview of graph neural network-based approaches to clustering for \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nspatial transcriptomics. Comput. Struct. Biotechnol. J. 23, 106–128 (2024). \n13. Li, Z. & Zhou, X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering \nand spatial domain detection in spatial transcriptomic studies. Genome Biol. 23, 168 (2022). \n14. Walker, B. L. & Nie, Q. NeST: nested hierarchical structure identification in spatial transcriptomic \ndata. Nat. Commun. 14, 6554 (2023). \n15. Veličković, P. et al. Graph Attention Networks. arXiv [stat.ML] (2017) \ndoi:10.48550/ARXIV.1710.10903. \n16. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected \ncommunities. Sci. Rep. 9, 5233 (2019). \n17. Schaub, D. P. et al. PCA-based spatial domain identification with state-of-the-art performance. \nBioinformatics 41, (2024). \n18. Yuan, Z. MENDER: fast and scalable tissue structure identification in spatial omics data. Nat. \nCommun. 15, 207 (2024). \n19. Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals \nspatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024). \n20. Singhal, V. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial \nomics data analysis. Nat. Genet. 56, 431–441 (2024). \n21. Wang, Q. et al. The Allen Mouse Brain Common Coordinate Framework: A 3D reference atlas. Cell \n181, 936–953.e20 (2020). \n22. Bhuva, D. D. et al. Library size confounds biology in spatial transcriptomics data. Genome Biol. 25, \n99 (2024). \n23. Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic \npreoptic region. Science 362, eaau5324 (2018). \n24. Sultana, Z. et al. Spatio-temporal interaction of immune and renal cells determines glomerular \ncrescent formation in autoimmune kidney disease. Immunology (2024). \n25. Balzer, M. S., Rohacs, T. & Susztak, K. How many cell types are in the kidney and what do they do? \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint \n\n \nAnnu. Rev. Physiol. 84, 507–531 (2022). \n26. Engesser, J. et al. Immune profiling-based targeting of pathogenic T cells with ustekinumab in \nANCA-associated glomerulonephritis. Nat. Commun. 15, 8220 (2024). \n27. Yousefi, B. & Schwikowski, B. Consensus clustering for robust bioinformatics analysis. bioRxiv \n(2024) doi:10.1101/2024.03.21.586064. \n28. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. arXiv [cs.LG] (2012). \n29. Shang, L. & Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat. \nCommun. 13, 7203 (2022). \n23. Sultana, Z. et al. Spatio-temporal interaction of immune and renal cells determines glomerular \ncrescent formation in autoimmune kidney disease. biorxiv (2024). \n \n.CC-BY 4.0 International licenseavailable under a \nwas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \nThe copyright holder for this preprint (whichthis version posted May 27, 2025. ; https://doi.org/10.1101/2025.05.21.653987doi: bioRxiv preprint","source_license":"CC-BY-4.0","license_restricted":false}