Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment

Hongyi Wang, Zhengjie Zhu, Jiabo Ma, Fang Wang, Yue Shi, Bo Luo, Jili Wang, Qiuyu Cai, Xiuming Zhang, Yen-Wei Chen, Lanfen Lin, Hao Chen

Research Square preprint. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8222041/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1.

Abstract

The rapid digitization of histopathology slides has opened up new possibilities for computational tools in clinical and research workflows. Among these, content-based slide retrieval stands out, enabling pathologists to identify morphologically and semantically similar cases, thereby supporting precise diagnoses, enhancing consistency across observers, and assisting example-based education.
However, effective retrieval of whole slide images (WSIs) remains challenging due to their gigapixel scale and the difficulty of capturing subtle semantic differences amid abundant irrelevant content. To overcome these challenges, we present PathSearch, a retrieval framework that unifies fine-grained attentive mosaic representations with global-wise slide embeddings aligned through vision-language contrastive learning. Trained on a corpus of 6,926 slide-report pairs, PathSearch captures both fine-grained morphological cues and high-level semantic patterns to enable accurate and flexible retrieval. The framework supports two key functionalities: (1) mosaic-based image-to-image retrieval, ensuring accurate and efficient slide research; and (2) multi-modal retrieval, where text queries can directly retrieve relevant slides. PathSearch was rigorously evaluated on four public pathology datasets and three in-house cohorts, covering tasks including anatomical site retrieval, tumor subtyping, tumor vs. non-tumor discrimination, and grading across diverse organs such as breast, lung, kidney, liver, and stomach. External results show that PathSearch outperforms traditional image-to-image retrieval frameworks by up to 10.9% Top-1 accuracy on subtyping tasks and 7.4% Top-1 accuracy on grading tasks, while surpassing multimodal foundation models by an average of 20% on multi-modal retrieval benchmarks. A multi-center reader study further demonstrates that PathSearch improves diagnostic accuracy, boosts confidence, and enhances inter-observer agreement among pathologists in real clinical scenarios. These results establish PathSearch as a scalable and generalizable retrieval solution for digital pathology. Beyond advancing retrieval accuracy, it strengthens digital pathology infrastructure by facilitating clinical decision support, enabling intuitive archive exploration, and enriching educational platforms for medical trainees. 
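
The abstract reports results as Top-1 accuracy for embedding-based retrieval. The sketch below illustrates that metric on toy data; it is not the PathSearch implementation. It assumes that queries (image or text) and slides have already been mapped to vectors in a shared space and that similarity is measured by cosine similarity. The function names, the 2-D vectors, and the labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, database, k=1):
    """Return indices of the k database embeddings most similar to the query."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: cosine(query, database[i]),
        reverse=True,
    )
    return ranked[:k]


def top1_accuracy(queries, query_labels, database, db_labels):
    """Fraction of queries whose nearest database item shares their label."""
    hits = 0
    for query, label in zip(queries, query_labels):
        best = retrieve(query, database, k=1)[0]
        hits += int(db_labels[best] == label)
    return hits / len(queries)


# Toy data (hypothetical): 2-D vectors standing in for slide embeddings.
slide_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
slide_labels = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]

# Queries could come from an image encoder or a text encoder, as long as
# both map into the same space (an assumption of this sketch).
query_embeddings = [[1.0, 0.0], [0.0, 1.0]]
query_labels = ["subtype_A", "subtype_B"]

accuracy = top1_accuracy(
    query_embeddings, query_labels, slide_embeddings, slide_labels
)
top2 = retrieve([1.0, 0.0], slide_embeddings, k=2)
print(accuracy, top2)
```

A brute-force scan like this is only practical for small archives; a real system would more likely use an approximate nearest-neighbour index, which the abstract does not describe.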

Subject areas: Biological sciences/Cancer; Biological sciences/Computational biology and bioinformatics; Health sciences/Health care; Physical sciences/Mathematics and computing

Additional Declarations: No competing interests reported.

Posted: December 11th, 2025 (Version 1)
Research Square, ISSN 2693-5015 (online)

Author affiliations
Hongyi Wang, Zhengjie Zhu, Jiabo Ma, Hao Chen (corresponding author): Hong Kong University of Science and Technology
Fang Wang: Union Hospital
Yue Shi: Sir Run Run Shaw Hospital
Bo Luo: Central Hospital of Wuhan
Jili Wang, Qiuyu Cai, Xiuming Zhang: First Affiliated Hospital Zhejiang University
Yen-Wei Chen: Ritsumeikan University
Lanfen Lin: Zhejiang University

