Accurate, Scalable and Cross-platform Cell Identification for High-resolution Spatial Transcriptomics

Research Square preprint. Technical Report

Chenfei Wang, Dongqing Sun, Lele Zhang, Tong Han, Qiu Wu, Peng Zhang

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-4428586/v1

This work is licensed under a CC BY 4.0 License.

Status: Under Review (Version 1, the latest version).

Abstract

Recent advances in spatial transcriptomics (ST) have brought unparalleled opportunities to unravel cellular diversity and cell-cell interactions within their spatial context. High-resolution ST techniques, including barcoding-based and imaging-based platforms, have achieved remarkable sub-cellular resolution. However, the precise segmentation of cells remains a significant challenge and hampers effective single-cell spatial analysis.
Existing methods targeting specific techniques lack compatibility across different high-resolution ST platforms, and many of them do not scale well to ST data with large fields of view. Here, we introduce Cellist, a multi-modal cell-segmentation method that combines image and expression information, enabling comprehensive investigations at the cell level. Applying Cellist to mouse brain Stereo-seq data, we showed that it yields higher within-cell homogeneity than existing methods. Furthermore, Cellist facilitated accurate spatial domain identification and cell-type annotation. Importantly, Cellist is applicable to various ST techniques, including Seq-Scope, seqFISH+, STARmap, and 10x Xenium, and achieves high accuracy across diverse high-resolution ST platforms and biological systems while maintaining high computational efficiency. Finally, we applied Cellist to non-small cell lung cancer (NSCLC) samples collected after neoadjuvant immunotherapy. Our analyses revealed the spatial heterogeneity of tumor clones and identified therapy response-related myeloid subtypes and structures. These findings highlight the potential of Cellist to enhance high-resolution ST techniques for characterizing the spatial organization of cells and unraveling intricate tissue architectures. Cellist is publicly available at https://github.com/wanglabtongji/Cellist.

Subject areas: Biological sciences/Computational biology and bioinformatics/Genome informatics; Biological sciences/Computational biology and bioinformatics/Software; Biological sciences/Cancer/Cancer microenvironment

Figures: Figure 1, Figure 2, Figure 3, Figure 4, Figure 5

Additional Declarations: There is NO competing interest.
Supplementary Files: Supplementaryinfomationfinal.pdf

Author information

Chenfei Wang (corresponding author), Tongji University. ORCID: https://orcid.org/0000-0001-7573-3768
Dongqing Sun, Tongji University
Lele Zhang, Shanghai Pulmonary Hospital, Tongji University School of Medicine. ORCID: https://orcid.org/0000-0002-5595-3103
Tong Han, Tongji University
Qiu Wu, Tongji University
Peng Zhang, Shanghai Pulmonary Hospital, School of Medicine, Tongji University. ORCID: https://orcid.org/0000-0003-1771-7545

Version 1 (posted May 30th, 2024) is associated with the journal Nature Genetics.

Figure legends

Figure 1. Characteristics of the Stereo-seq data and workflow of Cellist. (A) Illustration showing the issue of transcript diffusion in the Stereo-seq mouse brain data. Stained single-stranded DNA (ssDNA) and sequenced unique molecular identifiers (UMIs) are depicted in red and green, respectively. (B) Distribution of expression correlation between nucleus and cytoplasm in the Stereo-seq mouse brain data. Nucleus and cytoplasm regions for each cell are defined using the Expansion method. Transcripts within each region are aggregated to represent its expression, and the Pearson's correlation of expression between nucleus and cytoplasm is calculated for each cell. (C) Changes in spot-nucleus correlation with distance in a selected cell from the Stereo-seq mouse brain data. The x-axis represents binned distances between each spot outside the nucleus and the nucleus center, while the y-axis represents the Pearson's correlation between each spot's augmented expression and the nucleus's expression. (D) Illustration of the Cellist workflow. With spatial expression and a paired ssDNA staining image as input, registration is performed to align the two modalities. Watershed segmentation then identifies all nuclei in the image, which serve as potential cells. Each spot outside nuclei is assigned either to a cell or to the background by combining the physical distance and the expression dissimilarity between the spot and nearby nuclei. Following cell segmentation, spatially aware expression imputation is supported by leveraging information from cells that are both close in physical space and similar in expression. Lastly, cell-level analyses can be performed to understand the spatial organization of cells, including cell clustering, cell-type annotation, spatial domain identification, and cell-cell interaction analysis.

Figure 2. Application of Cellist and downstream analyses in the Stereo-seq mouse brain dataset. (A) Number of cells generated by each segmentation method. Only cells with more than 50 genes are counted. (B) Distribution of the number of covered genes in cells generated by different segmentation methods. Only cells with more than 50 genes are considered. (C) Evaluation of cell segmentation using the random-correlation metric. Schematic of the random division of all spots in a cell into two parts (top). In each cell, the expression correlation between the two parts is calculated to measure expression purity; a higher correlation indicates higher expression purity. The distribution of random correlation for each method is displayed as a box plot (bottom). The correlation calculation is limited to cells with more than 100 spots. (D) Evaluation of the agreement between Cellist segmentation and other methods. Schematic of the cross-evaluation strategy (top). For any two segmentations from methods A and B, the intersection region and the difference regions are computed, and the Pearson's correlation between each difference region and the intersection region is calculated. A higher correlation indicates more accurate segmentation (bottom). (E) Anatomic regions provided by the original study (left) and the spatial domain identification result based on the Cellist cell segmentation (right). (F) Consistency of cell-based spatial domains with manually annotated bin-based anatomic regions. The adjusted Rand index (ARI) is calculated between cell-based spatial domains and anatomic regions at the cell level to measure consistency. (G) Spatial distribution of different cell types identified by Cellist. Spatial distribution of all cell types and of two representative cell types, DGGRC2 and OPC (top). Moran's I scores of different cell types (bottom). Excitatory and inhibitory neurons are labelled in red and blue, respectively. A higher score indicates stronger spatial enrichment. (H) Consistency between Stereo-seq and reference scRNA-seq. For each cell type identified in Stereo-seq based on each segmentation method, the Pearson's correlation of averaged gene expression is calculated between Stereo-seq and scRNA-seq. Each dot represents a cell type, and the black line indicates the median value among all cell types.

Figure 3. Application of Cellist on different high-resolution ST platforms and tissues. (A) Cellist cell segmentation results of Tile 2104 (left) and Tile 2105 (right) in the Seq-Scope mouse liver dataset. The background is the grayscale H&E image, and the foreground dots depict measured ST spots, colored by cell labels. (B) Box plot showing the distribution of within-cell random correlation generated by different segmentation methods in four tiles from the Seq-Scope mouse liver dataset. (C) Manual cell segmentation results from the original study (left) and cell segmentation results generated by Cellist (right) in FOV 0 of the seqFISH+ NIH/3T3 dataset. (D) Consistency of different segmentation results with manual labels in cells from all seven FOVs of the seqFISH+ NIH/3T3 dataset. For each cell segmented by each method, the intersection over union (IoU) between it and the manual segmentation labels is calculated. Higher IoU indicates more accurate segmentation. (E) Cellist cell segmentation in one cropped FOV of the STARmap mouse primary visual cortex dataset. (F) Evaluation of the agreement between Cellist segmentation and other methods using the cross-correlation metric in all segmented cells from the STARmap mouse primary visual cortex dataset. (G) Cell-type annotation based on Cellist segmentation in the 10x Xenium human melanoma dataset. (H) Lollipop plots showing the log-transformed fold change (logFC) of top marker genes in T cells (top) and endothelial cells (bottom) versus all other cells. Higher logFC indicates stronger differential expression and thus more accurate segmentation.

Figure 4. Application of Cellist in the characterization of the post-therapy tumor microenvironment (TME) of NSCLC. (A) Workflow of the research design for studying the TME in NSCLC. Six specimens are collected from six NSCLC patients (5 LUSC and 1 LUAD) after neoadjuvant immunotherapy and are subjected to Stereo-seq and matched scRNA-seq. After Cellist performs cell segmentation on the Stereo-seq data, cell-level analyses are performed to characterize the TME, including cell-type annotation, tumor clone detection, spatial domain identification, and cell-cell interaction analysis. (B) Copy number profiles in sample ST2 estimated from Stereo-seq (top) and scRNA-seq (bottom) data using CopyKAT. (C) UMAP plot of the ST2 Stereo-seq data colored by cell types annotated using a marker-based method combined with CNV inference. (D) Spatial distribution of cell types identified in Fig. 4C. (E) Clonal substructure of malignant cells delineated by hierarchical clustering. (F) Spatial distribution of tumor clones identified in Fig. 4E. (G) Hallmark enrichment analysis using up-regulated genes in each tumor clone. The color of the dot represents the enrichment significance, and the size represents the number of genes enriched in the term.

Figure 5. Spatial distribution of myeloid subtypes in the TME of NSCLC. (A) Spatial distribution of cell types (left) and spatial domains identified using STAGATE (right) in sample ST1. (B) Cross-sample ecotype definition based on cell-type compositions. The bar plot displays the cell-type compositions in each domain from all samples. Hierarchical clustering on the cell-type enrichment profile classifies all domains into 6 ecotypes: tumor, airway, stroma, and lymphocyte-, myeloid-, or plasma-enriched ecotypes. (C) UMAP plot of myeloid cells extracted from all samples after Harmony integration, colored by marker-defined subtypes. (D) Dot plot showing the marker genes for each myeloid subtype. The color of the dot represents the scaled expression, and the size represents the percentage of cells expressing the marker. (E) Heatmap showing the signature scores of different functional pathways in different myeloid subtypes. (F) Heatmap showing the enrichment of different myeloid subtypes in different domains. Only domains with over 100 myeloid cells are included in this analysis. Raw cell numbers are centered and scaled within each column. (G) Spatial distribution of myeloid subtypes in sample ST1, across the entire slide (left) and at the tumor-stroma boundary (center). Illustration of outer and inner cell layers determined according to the distance to the tumor domain (right). Each layer is approximately 10 μm wide. (H) Line plot showing cell numbers of different myeloid subtypes in different cell layers of ST1 (left) and ST5 (right). (I) Box plot showing the cytotoxic scores of cells located in different cell layers of ST1 and ST5. (J) Chord diagram showing the collagen-related interactions between myeloid subtypes and other cell types.
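The Figure 1D legend states that each spot outside nuclei is assigned to a cell or to the background by combining physical distance and expression dissimilarity to nearby nuclei. The sketch below illustrates one way such a combined score could work; it is not Cellist's implementation (see the GitHub repository for that). The function name assign_spots, the linear weighting alpha, the search radius, the max_score background cutoff, and the use of 1 minus Pearson's r as the dissimilarity are all hypothetical choices made for illustration. It also uses raw spot expression, whereas the Figure 1C legend refers to an "augmented" spot expression.

```python
import numpy as np


def pearson(a, b):
    """Pearson's r between two 1-D vectors; returns 0.0 if either is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def assign_spots(spot_xy, spot_expr, nuc_xy, nuc_expr,
                 radius=15.0, alpha=0.5, max_score=0.6):
    """Assign each non-nuclear spot to a nucleus index, or -1 for background.

    Hypothetical score (lower is better):
        alpha * (distance / radius) + (1 - alpha) * (1 - r)
    where r is Pearson's r between spot and nucleus expression.
    radius, alpha and max_score are illustrative values, not Cellist's.
    """
    labels = np.full(len(spot_xy), -1, dtype=int)
    for i, (xy, expr) in enumerate(zip(spot_xy, spot_expr)):
        dist = np.linalg.norm(nuc_xy - xy, axis=1)
        candidates = np.flatnonzero(dist <= radius)
        if candidates.size == 0:
            continue  # no nucleus within the search radius: background
        best_score, best_j = np.inf, -1
        for j in candidates:
            dissimilarity = 1.0 - pearson(expr, nuc_expr[j])
            score = alpha * dist[j] / radius + (1 - alpha) * dissimilarity
            if score < best_score:
                best_score, best_j = score, j
        if best_score <= max_score:
            labels[i] = best_j  # close enough and similar enough
    return labels
```

In a toy example with two nuclei 20 units apart, a spot exactly halfway between them is assigned to the nucleus whose expression profile it resembles, and a spot far from both is left as background. This shows why combining the two signals can resolve cases that distance alone cannot.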
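The random-correlation metric in the Figure 2C legend splits the spots of a segmented cell into two parts and correlates their aggregated expression. The sketch below computes it for one cell from a spots-by-genes count matrix. The equal-halves split, summation as the aggregation step, and the function name random_split_correlation are assumptions. The min_spots=100 cutoff mirrors the legend's "more than 100 spots" filter.

```python
import numpy as np


def random_split_correlation(cell_counts, rng, min_spots=100):
    """Within-cell random correlation for a single segmented cell.

    cell_counts: (n_spots, n_genes) counts of the spots assigned to the cell.
    Spots are shuffled and split into two halves (equal halves are an
    assumption), counts are summed within each half, and Pearson's r between
    the two aggregated profiles is returned. Cells with min_spots or fewer
    spots are skipped and return None.
    """
    n_spots = cell_counts.shape[0]
    if n_spots <= min_spots:
        return None
    order = rng.permutation(n_spots)
    half = n_spots // 2
    part_a = cell_counts[order[:half]].sum(axis=0).astype(float)
    part_b = cell_counts[order[half:]].sum(axis=0).astype(float)
    a = part_a - part_a.mean()
    b = part_b - part_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

On simulated data, a cell whose spots share one expression profile gives a correlation close to 1. A "cell" made of spots with flat, profile-free noise gives a much lower value, which is the behavior the metric relies on when comparing segmentations.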
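The Figure 3D legend scores each segmented cell by its intersection over union (IoU) with the manual segmentation labels. The sketch below computes a per-cell IoU from two integer label masks in which 0 marks background. Matching each predicted cell to the manual cell it overlaps best is an assumption, since the legend does not state the matching rule. The function name per_cell_iou is hypothetical.

```python
import numpy as np


def per_cell_iou(pred, truth):
    """IoU of each predicted cell with its best-overlapping manual cell.

    pred, truth: integer label masks of equal shape; 0 is background.
    Returns {predicted_label: IoU}; a cell overlapping no manual cell scores 0.
    """
    scores = {}
    for label in np.unique(pred):
        if label == 0:
            continue
        cell = pred == label
        overlapping = np.unique(truth[cell])
        overlapping = overlapping[overlapping != 0]
        best = 0.0
        for t in overlapping:
            other = truth == t
            inter = np.logical_and(cell, other).sum()
            union = np.logical_or(cell, other).sum()
            best = max(best, float(inter / union))
        scores[int(label)] = best
    return scores
```

For example, a predicted cell covering 6 of the 8 pixels of a manual cell, and no pixels outside it, scores 6/8 = 0.75. A predicted cell lying entirely in manual background scores 0.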