{"paper_id":"14c3d800-1c8c-4f5c-a71a-537e5a9367cb","body_text":"Overcoming Data Heterogeneity in Breast Ultrasound: A ResNet50V2-Based Solution with Enhanced Class Balancing. Research Article. Marwen SAKLI, Bassem BEN SALAH, Chaker ESSID, Rayhane HAMMEMI, and Hedi SAKLI. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7855980/v1 This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1). Abstract: In breast cancer diagnosis, the accurate and efficient analysis of ultrasound images is essential, particularly for distinguishing benign from malignant tumors. Ultrasound imaging is a widely used, non-invasive diagnostic tool that provides valuable insights into tumor characteristics.
This study presents a deep learning approach for breast cancer classification using transfer learning (TL) models across three distinct datasets: BUSI, BUSI-UCLM, and GDPH&SYSUCC. A significant challenge addressed was the inherent class imbalance within these datasets. To mitigate this, a comprehensive evaluation of various data sampling techniques was conducted, with Random Oversampling (ROS) emerging as the most effective method for balancing the data. Among multiple TL models assessed, the ResNet50V2 architecture consistently demonstrated superior performance across all metrics. When trained and validated on the individual datasets, the ResNet50V2 model achieved an accuracy of 95.44%, an F1-score of 95.25%, and an AUC of 99.21% on the BUSI dataset; 97.22% accuracy, 97.32% F1-score, and 99.90% AUC on the BUSI-UCLM dataset; and 95.72% accuracy, 95.72% F1-score, and 98.37% AUC on the GDPH&SYSUCC dataset. Following this individual evaluation, a combined dataset was created, which consisted of 3971 images distributed across benign (1497), malignant (1819), and normal (247) classes. On this combined, more challenging dataset, the ROS-enhanced ResNet50V2 model maintained its strong performance, achieving a final accuracy of 93.96%, an F1-score of 93.99%, and an AUC of 98.70%. These results highlight the efficacy of using ROS to address class imbalance and the robustness of ResNet50V2 as a transfer learning backbone for breast cancer classification across heterogeneous ultrasound datasets. The findings underscore the potential of this approach to enhance diagnostic accuracy and support clinical decision-making. Keywords: Breast Cancer Classification, Ultrasound Imaging, ResNet50V2, Random Oversampling (ROS), Transfer Learning. Additional Declarations: No competing interests reported.
Author affiliations: Marwen SAKLI (SERCOM Laboratory); Bassem BEN SALAH (DGIM, National Institute of Applied Science and Technology, University of Carthage, Centre Urbain Nord; corresponding author); Chaker ESSID (University of Tunis El Manar); Rayhane HAMMEMI (University of Tunis El Manar); Hedi SAKLI (EITA Consulting). Posted November 13th, 2025 on Research Square, ISSN 2693-5015 (online).","source_license":"CC-BY-4.0","license_restricted":false}