A visual–omics foundation model to bridge histopathology image with transcriptomics

Guangyu Wang, Weiqing Chen, Pengzhi Zhang, Tu Tran, Yiwei Xiao, and 15 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5183775/v1
This work is licensed under a CC BY 4.0 License.
Status: Published in Nature Methods on 29 May 2025 (https://doi.org/10.1038/s41592-025-02707-1).
Version 1 posted 16 April 2025.

Abstract
Artificial intelligence has revolutionized computational biology. Recent developments in omics technologies, including single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST), provide detailed transcriptomic data, and ST additionally pairs these data with tissue histology. However, current computational models focus on either omics or image analysis rather than integrating the two.
To address this, we developed OmiCLIP, a visual–omics foundation model that links hematoxylin and eosin (H&E) images with transcriptomics using tissue patches from Visium data. We transformed transcriptomic data into “sentences” by concatenating the top-expressed gene symbols of each patch. We curated a dataset of 2.2 million paired tissue images and transcriptomic profiles across 32 organs to train OmiCLIP to integrate histology and transcriptomics. Building on OmiCLIP, our Loki platform offers five key functions: tissue alignment, tissue annotation via bulk RNA-seq or marker genes, cell type decomposition, image–transcriptomics retrieval, and ST gene expression prediction from H&E images. Compared with 22 state-of-the-art models on 5 simulated, 19 public, and 4 in-house experimental datasets, Loki demonstrated consistent accuracy and robustness.

Subject areas: Biological sciences/Computational biology and bioinformatics/Computational models; Biological sciences/Computational biology and bioinformatics/Software; Biological sciences/Computational biology and bioinformatics/Machine learning; Biological sciences/Computational biology and bioinformatics/Computational platforms and environments

Additional Declarations
The authors declare no competing interests.
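The abstract's step of turning each Visium patch's transcriptome into a “sentence” of top-expressed gene symbols can be sketched in Python as below. This is an illustration under stated assumptions, not OmiCLIP's actual preprocessing: the function name `expression_to_sentence`, the cutoff `top_n`, dropping zero-expression genes, and the tie-breaking rule are all hypothetical choices, and the normalization and number of genes the authors use are not given in this text.

```python
def expression_to_sentence(expression, gene_symbols, top_n=50):
    """Return a space-separated 'gene sentence' for one spot or patch.

    Hypothetical helper: genes are ordered from highest to lowest expression,
    genes with zero expression are dropped, and at most top_n symbols are kept.
    """
    if len(expression) != len(gene_symbols):
        raise ValueError("expression and gene_symbols must have the same length")
    # Keep only expressed genes as (value, original_index) pairs.
    expressed = [(value, i) for i, value in enumerate(expression) if value > 0]
    # Highest expression first; ties fall back to the original gene order.
    expressed.sort(key=lambda pair: (-pair[0], pair[1]))
    return " ".join(gene_symbols[i] for _, i in expressed[:top_n])


# Toy spot with five genes (made-up values, for illustration only).
genes = ["EPCAM", "KRT8", "PTPRC", "COL1A1", "ACTB"]
spot = [5.0, 12.0, 0.0, 3.0, 12.0]
print(expression_to_sentence(spot, genes, top_n=3))  # KRT8 ACTB EPCAM
```

Because the result is ordinary text, it can be passed to a text encoder in the same way as an image caption; ranking within each spot also makes the sentence insensitive to the spot's overall sequencing depth.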

Supplementary Files
NMETHA58071AWangGuidanceauthors17430077791.docx: Checklist
SupplementaryTable.xlsx: Supplementary table
nreditorialpolicychecklist.pdf: Editorial Policy checklist
nrreportingsummary.pdf: Reporting Summary
SupplymentaryInformationFile.pdf: Supplementary information file
InventoryofSupportingInformation.docx: Inventory of supporting information
PublicationLicenseApr022025.pdf: Publication License of BioArt
ExtendedDataFigures.pptx: All extended data figures
687292supp749614sb5hbp.docx: Inventory of supporting information
thirdpartyrightstable.docx: Third-party rights table
NMETHA58071AWangGuidanceauthors17430077791.docx: Checklist

Authors and affiliations
Guangyu Wang (corresponding author), Houston Methodist Research Institute, ORCID 0000-0003-4803-7200
Weiqing Chen, Cornell University, ORCID 0000-0003-3539-9210
Pengzhi Zhang, Houston Methodist Research Institute, ORCID 0000-0001-6920-1490
Tu Tran, Houston Methodist Research Institute
Yiwei Xiao, Houston Methodist Research Institute
Shengyu Li, Houston Methodist Research Institute
Vrutant Shah, Houston Methodist Research Institute
Kristopher Brannan, Houston Methodist Research Institute
Keith Youker, Houston Methodist Research Institute, ORCID 0000-0003-2535-7973
Li Lai, Houston Methodist Research Institute, ORCID 0000-0002-5731-2705
Longhou Fang, Houston Methodist Academic Institute, ORCID 0000-0003-1653-5221
Yu Yang, University of Florida College of Medicine
Nhat-Tu Le, Houston Methodist Research Institute
Jun-Ichi Abe, The University of Texas MD Anderson Cancer Center, ORCID 0000-0001-7439-7774
Shu-Hsia Chen, Houston Methodist Research Institute
Qin Ma, The Ohio State University, ORCID 0000-0002-3264-8392
Ken Chen, The University of Texas MD Anderson Cancer Center, ORCID 0000-0003-4013-5279
Qianqian Song, University of Florida, ORCID 0000-0002-4455-5302
John Cooke, Houston Methodist Research Institute
Hao Cheng, The Ohio State University

Figures

Figure 1. Overview of the study.
a, Workflow for pre-training the OmiCLIP model on a paired image–transcriptomics dataset via contrastive learning.
b, Workflow of the Loki platform, which uses the OmiCLIP foundation model as its engine. The left diagram shows the size of the training data for each organ. The right diagram lists the current modules of the Loki platform: tissue alignment, cell type decomposition, tissue annotation, ST gene expression prediction, and histology image–transcriptomics retrieval. Created in BioRender.
c, Heatmap of the similarity between image embeddings and transcriptomic embeddings across various organs and disease conditions. Color reflects OmiCLIP embedding similarity, with red indicating high similarity and blue indicating low similarity.
d, Schematic of the Loki platform with transfer learning for 3D tissue analysis. Created in BioRender.

Figure 2. Tissue alignment.
a, Schematic of tissue alignment using ST and histology images with Loki Align. Created in BioRender.
b, Tissue alignment performance on 100 low-noise and 100 high-noise simulated datasets, measured as the distance between the ground truth and the aligned simulated sample, for Loki (ST-to-ST and image-to-ST) and the baseline methods PASTE (ST-to-ST) and GPSA (ST-to-ST). P-values were calculated using a one-sided Wilcoxon test.
c, Alignment of 8 adjacent normal human small intestine samples using Loki (ST-to-ST and image-to-ST) and the baseline methods PASTE (ST-to-ST), GPSA (ST-to-ST), and CPD (ST-to-ST). Samples are colored by the top three principal components of the OmiCLIP transcriptomic embeddings, mapped to the red, green, and blue channels, respectively. For visualization, the 8 samples are stacked along the perpendicular axis before and after each alignment method and viewed from the side. Source 2, for which GPSA selected no spatially variable genes and therefore could not be run, is marked as N/A. Box plots compare alignment performance on the 7 source samples, individually and combined, measured as the Pearson correlation coefficient (PCC; Kendall's tau coefficient in Extended Fig. 4a) of highly variable gene expression between the target and source samples at the same location after alignment, for Loki and the baseline methods (PASTE, GPSA, and CPD using PCA embeddings as input). In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5× the interquartile range.
d, Alignment of 2 adjacent human ovarian carcinosarcoma samples using Loki (ST-to-ST and image-to-ST) and the baseline methods PASTE (ST-to-ST), GPSA (ST-to-ST), and CAST (ST-to-ST). Samples are colored as described in c.
e, Alignment performance measured by PCC and Kendall's tau coefficient of highly expressed gene expression between the target and source samples at aligned locations, for Loki (ST-to-ST and image-to-ST) and the baseline methods PASTE (ST-to-ST), GPSA (ST-to-ST), and CAST (ST-to-ST). Box plots as in c; n = 147.

Figure 3. Tissue annotation using bulk RNA-seq data.
a, Schematic of tissue annotation using an H&E image and reference bulk RNA-seq data from different sources, based on OmiCLIP paired image and transcriptomic embeddings.
b, Histology whole-slide images (WSIs) of breast cancer, heart failure, and normal breast samples. Major tumor regions, fibroblast-enriched regions, and adipose regions are outlined in black by pathology experts. Heatmaps show the similarity of each WSI to the corresponding reference bulk RNA-seq data of tumor, fibroblast, and adipose tissue, with red indicating high similarity and blue indicating low similarity. CLAM attention heatmaps were generated using CLAM with default parameters.

Figure 4. Tissue annotation using marker genes.
a, Schematic of tissue annotation using an H&E image and reference marker genes. Each image is assigned the label of the candidate text with the highest similarity score to the image query. For Loki, the candidate texts are the marker gene symbols of each tissue type; for the PLIP model, they are natural-language descriptions of each tissue type.
b, Example image–text similarity scores calculated by Loki and the OpenAI CLIP model.
c, Zero-shot performance, measured by weighted F1 score, on four datasets for Loki and the OpenAI CLIP model. Numbers of test samples: CRC7K (n = 6,333), WSSS4LUAD (n = 10,091), LC25000 (n = 15,000), and PatchCamelyon (n = 32,768).
d, Zero-shot performance, measured by weighted F1 score, on the four datasets for Loki, the PLIP model, and the combination of Loki and PLIP by averaged similarity (shown in a; Methods).
e, Zero-shot performance, measured by weighted F1 score for each tissue type in the CRC7K dataset, for the OpenAI CLIP model, Loki, the PLIP model, and the combination of Loki and PLIP.
f, Confusion matrices for the CRC7K dataset using Loki (left), the PLIP model (middle), and the combination of Loki and PLIP (right). Rows are ground-truth labels and columns are predicted labels. ADI, adipose tissue; NOR, normal colon mucosa; TUM, colorectal carcinoma epithelium; LYM, lymphocytes; MUC, mucus; DEB, debris; MUS, smooth muscle; STR, cancer-associated stroma.

Figure 5. Cell type decomposition.
a, Schematic of cell type decomposition using ST, reference scRNA-seq data, and histology images, with OmiCLIP paired transcriptomic and image embeddings after fine-tuning.
b, H&E image of our in-house triple-negative breast cancer (TNBC) patient sample, with cells classified by Xenium into three major cell types: cancer epithelial, immune, and stromal cells.
c, Performance of 12 decomposition methods measured by Jensen–Shannon (JS) divergence, structural similarity index (SSIM), and impact score. Z-scores of JS divergence (or SSIM) across methods were calculated from each method's average JS divergence (or SSIM) over cell types. The impact score of each method is the average of its JS divergence and SSIM z-scores (Methods). Green indicates decomposition tools; blue indicates the performance obtained by replacing OmiCLIP embeddings with embeddings from other transcriptomic foundation models.
d, Decomposition of the three major cell types in the TNBC sample by Loki (using the image) and Tangram (using ST), with Xenium data as ground truth. Heatmap color reflects the z-score computed from the probability distribution of each cell type.
e, H&E image of the human colorectal cancer sample and the cell type distribution within the Visium HD capture area.
f, Decomposition accuracy on four major cell types for Loki (using ST or the image) and Tangram (using ST). Error bars show the standard deviation around the mean. For both JS divergence and SSIM, adjusted P > 0.1 (two-sided Wilcoxon test).
g, Whole-slide (20 mm × 13 mm) cell type decomposition of human colorectal cancer. Tissue regions annotated by a pathologist serve as ground truth. Heatmaps show the distribution of fibroblast, tumor, intestinal epithelial, smooth muscle, and immune/inflammatory cells, with color reflecting the density of each cell type. CLAM attention heatmaps were generated using CLAM with default parameters.
h, Cell type decomposition of the brain sample. Left: brain anatomical references with zoomed-in H&E image patches of L1 (VLMCs, astrocytes), L2/3, L4/5, L6, and WM (oligodendrocytes). Created in BioRender. Right: heatmaps of the distribution of VLMCs, astrocytes, L2/3, L4/5, L6, and oligodendrocytes.

Figure 6. Image-to-transcriptomics retrieval.
a, Schematic of image-to-transcriptomics retrieval on the ST-bank dataset.
b, Example image-to-transcriptomics retrieval results. For each example image of adipose tissue, colorectal adenocarcinoma epithelium, lymphocytes, smooth muscle, and normal colon mucosa, the 50 most similar retrieved transcriptomic profiles are shown via their paired images from the ST-bank dataset.
c, Image-to-transcriptomics retrieval similarity scores on the four validation datasets (CRC7K, WSSS4LUAD, LC25000, and PatchCamelyon) for Loki, OpenAI CLIP, and PLIP. In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5× the interquartile range.
d, Image-to-transcriptomics retrieval similarity scores on 8 in-house patient tissues from heart failure (HF), Alzheimer's disease (AD), metaplastic breast cancer (MPBC), and triple-negative breast cancer (TNBC) for Loki, OpenAI CLIP, and PLIP. Box plots as in c.
e, Image-to-transcriptomics retrieval evaluation on four validation datasets and one test dataset for Loki, OpenAI CLIP, and PLIP, with a random baseline. Transcriptomic profiles within the top-K quantile of similarity were retrieved, and Recall@K is reported for K ∈ {5%, 10%} (Methods).
f, Example image-to-transcriptomics retrieval results, with retrieved transcriptomic profiles shown via their paired images.
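Figure 4a and 4d describe zero-shot annotation as choosing the candidate text with the highest image–text similarity, plus a Loki + PLIP combination by average similarity. A minimal Python sketch of that decision rule follows. It assumes cosine similarity between already-computed embeddings and a plain arithmetic mean of the two models' per-label scores; the helper names and toy vectors are hypothetical, and the legends do not say whether scores are rescaled before averaging.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarities(image_emb, text_embs):
    """Similarity of one image embedding to each candidate text embedding."""
    return [cosine(image_emb, t) for t in text_embs]


def zero_shot_label(scores, labels):
    """Assign the label whose candidate text scores highest."""
    best = max(range(len(labels)), key=lambda k: scores[k])
    return labels[best]


def ensemble_label(scores_a, scores_b, labels):
    """Average two models' per-label scores, then take the best label."""
    averaged = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    return zero_shot_label(averaged, labels)


# Toy example: labels reuse CRC7K abbreviations; all vectors are made up.
labels = ["TUM", "LYM", "ADI"]
model_a = similarities([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
model_b = [0.2, 0.3, 0.9]  # pretend scores from a second model
print(zero_shot_label(model_a, labels))          # TUM
print(ensemble_label(model_a, model_b, labels))  # ADI
```

The toy numbers show how averaging lets one model's confident score override the other's top choice; with real models the two score ranges may differ, which is why the rescaling question above matters.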
2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":46994453,"name":"Biological sciences/Computational biology and bioinformatics/Computational models"},{"id":46994454,"name":"Biological sciences/Computational biology and bioinformatics/Software"},{"id":46994455,"name":"Biological sciences/Computational biology and bioinformatics/Machine learning"},{"id":46994456,"name":"Biological sciences/Computational biology and bioinformatics/Computational platforms and environments"}],"tags":[],"updatedAt":"2025-05-30T07:10:57+00:00","versionOfRecord":{"articleIdentity":"rs-5183775","link":"https://doi.org/10.1038/s41592-025-02707-1","journal":{"identity":"nature-methods","isVorOnly":false,"title":"Nature Methods"},"publishedOn":"2025-05-29 04:00:00","publishedOnDateReadable":"May 29th, 2025"},"versionCreatedAt":"2025-04-16 09:22:01","video":"","vorDoi":"10.1038/s41592-025-02707-1","vorDoiUrl":"https://doi.org/10.1038/s41592-025-02707-1","workflowStages":[]},"version":"v1","identity":"rs-5183775","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5183775","identity":"rs-5183775","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}