A Versatile Foundation Model for AI-enabled Mammogram Interpretation | Research Square

Article
Hao Chen, Fuxiang Huang, Jiayi Zhu, Yunfang Yu, Yu Xie, Yuan Guo, and 17 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7765691/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted October 9th, 2025

Abstract
Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer-related mortality in women globally. Mammography is essential for the early detection and diagnosis of breast lesions. Despite recent progress in foundation models (FMs) for mammogram analysis, their clinical translation remains constrained by several fundamental limitations, including insufficient diversity in training data, limited model generalizability, and a lack of comprehensive evaluation across clinically relevant tasks.
Here, we introduce VersaMammo, a versatile foundation model for mammograms, designed to overcome these limitations. We curated the largest multi-institutional mammogram dataset to date, comprising 706,239 images from 21 sources. To improve generalization, we propose a two-stage pre-training strategy to develop VersaMammo, a mammogram foundation model. First, a teacher model is trained via self-supervised learning to extract transferable features from unlabeled mammograms. Then, supervised learning combined with knowledge distillation transfers both features and clinical knowledge into VersaMammo.

To ensure a comprehensive evaluation, we established a benchmark comprising 92 specific tasks, including 68 internal tasks and 24 external validation tasks, spanning 5 major clinical task categories: lesion detection, segmentation, classification, image retrieval, and visual question answering. VersaMammo achieves state-of-the-art performance, ranking first in 50 out of 68 specific internal tasks and 20 out of 24 external validation tasks, with average ranks of 1.5 and 1.2, respectively. These results demonstrate its superior generalization and clinical utility, offering a substantial advancement toward reliable and scalable breast cancer screening and diagnosis.

Subject areas: Health sciences/Diseases/Cancer/Breast cancer; Health sciences/Health care/Medical imaging/Radiography
Keywords: Artificial Intelligence, Breast Cancer, Mammogram, Foundation Model, Knowledge Distillation, Generalizability, Multi-institutional Dataset

Additional Declarations
There is NO Competing Interest.
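The abstract summarizes benchmark results as first-place counts and average ranks across tasks. As a minimal sketch of how such a summary could be computed, the snippet below ranks models per task and averages one model's ranks. Everything in it is an assumption for illustration: the function names, the model and task names, the toy scores, the "higher is better" convention, and the tie handling (tied models share the best rank) are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ranks_for_task(scores):
    """Map model name -> rank (1 = best) for a single task.

    Assumes a higher score is better. Tied models share the best rank
    (competition ranking); the paper's tie handling is not stated.
    """
    return {
        model: 1 + sum(other > score for other in scores.values())
        for model, score in scores.items()
    }


def summarize(task_scores, model):
    """Count first places and compute the average rank of `model`."""
    per_task = [ranks_for_task(scores)[model] for scores in task_scores.values()]
    return {
        "first_place": sum(rank == 1 for rank in per_task),
        "num_tasks": len(per_task),
        "average_rank": mean(per_task),
    }


# Toy, made-up scores for three hypothetical models on four hypothetical tasks.
toy_scores = {
    "task_a": {"model_x": 0.91, "model_y": 0.88, "model_z": 0.80},
    "task_b": {"model_x": 0.75, "model_y": 0.79, "model_z": 0.70},
    "task_c": {"model_x": 0.66, "model_y": 0.60, "model_z": 0.64},
    "task_d": {"model_x": 0.83, "model_y": 0.81, "model_z": 0.85},
}

summary = summarize(toy_scores, "model_x")
print(summary)
```

Under this convention an average rank close to 1 means the model is first, or nearly first, on most tasks, which is how the reported values of 1.5 and 1.2 can be read.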
Authors and affiliations
1. Hao Chen (corresponding author), The Hong Kong University of Science and Technology, https://orcid.org/0000-0002-8400-3780
2. Fuxiang Huang, The Hong Kong University of Science and Technology
3. Jiayi Zhu, The Hong Kong University of Science and Technology (Guangzhou)
4. Yunfang Yu, Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, https://orcid.org/0000-0003-2579-6220
5. Yu Xie, The Third Affiliated Hospital of Kunming Medical University
6. Yuan Guo, Guangzhou First People’s Hospital, South China University of Technology
7. Qingcong Kong, The Third Affiliated Hospital, Sun Yat-Sen University
8. MingXiang Wu, Shenzhen People’s Hospital
9. Xinrui Jiang, The Hong Kong University of Science and Technology
10. Shu Yang, The Hong Kong University of Science and Technology
11. Jiabo MA, Hong Kong University of Science and Technology, https://orcid.org/0000-0001-8532-4466
12. Ziyi LIU, The Hong Kong University of Science and Technology
13. Zhe Xu, The Hong Kong University of Science and Technology
14. Zhixuan Chen, The Hong Kong University of Science and Technology, https://orcid.org/0000-0001-8767-7177
15. Yujie Tan, Sun Yat-sen Memorial Hospital of Sun Yat-sen University
16. Zifan He, Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation
17. Luhui Mao, Sun Yat-sen Memorial Hospital, Sun Yat-sen University
18. Xi Wang, The Hong Kong University of Science and Technology
19. Junlin Hou, The Hong Kong University of Science and Technology
20. Lei Zhang, Chongqing University
21. Qiong Luo, The Hong Kong University of Science and Technology (Guangzhou)
22. Zhenhui Li, The Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Peking University Cancer Hospital Yunnan, https://orcid.org/0009-0009-3613-9403
23. Herui Yao, Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Department of Medical Oncology, Breast Tumor Center and Phase I Clinical Trial Center, https://orcid.org/0000-0001-5520-6469

Preprint record
Posted: October 9th, 2025 (version 1)
Status: under review
Journal (from page metadata): Nature Portfolio; the version 1 record lists Nature Biomedical Engineering
DOI: https://doi.org/10.21203/rs.3.rs-7765691/v1
License: CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
Manuscript PDF: https://assets-eu.researchsquare.com/files/rs-7765691/v1_covered_c372115f-b9c0-4374-a42a-87a709a6ddc7.pdf
Platform: Research Square, ISSN 2693-5015 (online)