Breast Mammary Gland Dataset (BMGD): DAPI-Stained Fluorescent Images for Nuclei Segmentation

doi:10.21203/rs.3.rs-8263420/v1

2026 · doi:10.21203/rs.3.rs-8263420/v1

preprint OA: closed

Data descriptor

Zabina Tasneem, Jinwei Fan, Aishwarya Shrestha, Joy Zhao, Qingsu Cheng

This is a preprint; it has not been peer reviewed by a journal. Version 1, posted on Research Square. https://doi.org/10.21203/rs.3.rs-8263420/v1

This work is licensed under a CC BY 4.0 License.

Abstract

Accurate segmentation of nuclei images is essential for analyzing cellular responses to perturbation in both in vitro and in vivo experiments. Although traditional methods, including watershed, thresholding, clustering, morphological operations, and active contour models, have long been used in segmenting nuclei in digital images, these methods are labor-intensive and time-consuming. Therefore, current research has shifted to deep learning techniques for improved nuclei segmentation.
However, training deep learning models requires high-quality annotated ground truth datasets, which are often scarce and not available for public use. In this study, we introduce the Breast Mammary Gland Dataset (BMGD), an annotated collection of DAPI-stained nuclei images of mammary organoids. The dataset contains 819 image patches with more than 9,500 manually segmented nuclei from organoids cultured under various stiffness conditions. Each original image in the BMGD is paired with one carefully annotated ground truth mask. This dataset will enable researchers to develop and evaluate automated nuclei segmentation algorithms, particularly for studying cellular responses in breast cancer research and treatment.

Background & Summary

Confocal fluorescent microscopy, a staple in life science research for its high-quality images, often requires the labor-intensive and error-prone task of manually analyzing each nucleus. This challenge underscores the growing need for an automatic approach to analyze biological images [1]. In computational biology, nuclei segmentation plays a fundamental role in analyzing morphological changes and quantifying molecular expressions [2]. These detailed biological data can be used for cell and tissue identification, cancer diagnosis, and therapeutic assessment [3, 4]. Researchers can further map molecular activities, organelle information, cellular phenotypes, and multicellular structures linked to cellular migration, division, and tissue development under environmental stimuli [5].

Although nuclei segmentation and quantification can be performed manually, these processes are labor-intensive, time-consuming, and prone to error [6, 7]. Additionally, even if the semantic segmentation of cell nuclei is done properly, several challenges can still arise.
For example, unconventional morphologies in diseased environments cannot be differentiated; a low signal-to-noise ratio and heterogeneous staining degrade segmentation accuracy, limiting the reliability of the produced masks; and overlapping nuclei cannot be delineated, leading to fragmented or merged instance predictions [8–10]. To enhance the accuracy and efficacy of automated nuclei segmentation, deep learning algorithms should be applied to overcome these limitations [11].

In recent years, convolutional neural networks (CNNs) trained on large supervised image datasets have achieved state-of-the-art results in medical image classification and segmentation. However, a shortage of well-annotated histopathology datasets limits their advancement, and among publicly available breast-tissue sets, annotation quality varies, with only a few providing exhaustive nucleus-boundary markings [12]. Many labs are working to incorporate deep learning image segmentation into day-to-day use, but open-source datasets of nuclei images for training and validating models remain scarce [10, 13]. To address this issue, we release our dataset to benefit the broader research community and to support new developments in the field of nuclei segmentation.

We present the Breast Mammary Gland Dataset (BMGD), an annotated dataset for nuclei segmentation in DAPI-stained fluorescent images. The dataset includes high-quality 40X images acquired with a Zeiss LSM 710 confocal microscope, featuring mammary epithelial cell cultures exposed to different microenvironmental stiffnesses [14]. The dataset contains 819 images with more than 9,500 cell nuclei boundaries, annotated manually to ensure precision and accuracy for a reliable learning process.
The BMGD can be used to train, evaluate, and test machine learning algorithms for nuclei segmentation, and to estimate the transferability and adaptability of previously developed nuclei segmentation methods.

Methods

Data Collection

The dataset proposed in this study originates from one of our prior research projects [14]. The three-dimensional volume of a single mammary colony was captured using a Zeiss LSM 710 confocal microscope equipped with a Zeiss Apochromat 40X/1.1 (0.8 mm working distance) water-immersion objective lens. Excitation filters were configured at 405 nm, while emission filters were set to detect signals between 420 and 480 nm. The laser intensity was maintained at 1%, and a twin-gate main beam splitter featuring two wheels, each containing 10 filter positions (resulting in 100 possible combinations), was employed to separate the excitation and emission beams. The images were taken at 12-bit resolution. The pinhole aperture was set at "1", and digital gain was adjusted to approximately ¾ of the maximum gain, ensuring a dynamic range of pixel values between 500 and 2000. The voxel size was set to 0.25 µm × 0.25 µm × 1 µm, yielding high-resolution capture of the cellular structures. For each colony, Z-stack images were acquired to encompass the entire volume of the cellular structure. The resulting image files were saved in Laser Scanning Microscope (.lsm) format with their corresponding metadata.

Data Processing and Labeling

To facilitate the development and evaluation of comprehensive nuclei segmentation algorithms, we employed the Labkit extension within the FIJI platform for careful annotation of all images [15]. This tool provides intuitive manual and semi-automated image segmentation capabilities. Each image in our collection is paired with a hand-crafted ground-truth mask that precisely outlines cellular structures of interest.
The dataset encompasses delineations of more than 9,500 nuclei perimeters, and the annotation phase alone consumed over 800 hours of work. The complete annotation workflow was designed to ensure maximum precision and reproducibility in the segmentation process, as shown in Fig. 1.

The first step is data preprocessing. A Python script was used to isolate 2D image slices from the 3D dataset [13]. We then applied an intensity threshold of 1500 or higher to preprocess the data for better visualization. This was followed by manual filtering to remove noise, using mean filters with a radius ranging from 0.5 to 2.0 and intensity subtraction to ensure optimal data quality. Gaussian blur filters were also applied to enhance edge detection. Second, foreground and background regions were separated to enable nuclei boundary detection: the annotation process involved pixel-based delineation of nuclei boundaries, with the nuclei marked as foreground in red and the background colored in blue. Third, a random forest classifier integrated into Labkit was used to generate preliminary masks, followed by manual verification against the original image [16]. Lastly, we binarized the images to finalize the mask. In the final binary mask produced from these annotations, white pixels represent segmented nuclei and black pixels represent background.

Data Records

The BMGD ( https://github.com/zt089/Breast-Mammary-Gland-Dataset-BMGD ) is publicly available on GitHub. The dataset includes 819 DAPI-stained fluorescent microscopy images of mammary gland cells cultured under different stiffness conditions, ranging from 250Pa to 1800Pa. In total, more than 9,500 nuclei were manually annotated, distributed across four stiffness conditions: 250Pa (453 images, 5,426 nuclei), 950Pa (54 images, 453 nuclei), 1200Pa (114 images, 1,538 nuclei), and 1800Pa (198 images, 2,144 nuclei).
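The per-condition counts listed above can be tallied with a few lines of Python as a consistency check. This is a minimal sketch: only the numbers come from the text, while the names `BMGD_COUNTS` and `summarize` are illustrative and are not part of the BMGD repository.

```python
# Image and nucleus counts per stiffness condition, copied from the text above.
# The names BMGD_COUNTS and summarize are illustrative, not part of the BMGD repository.
BMGD_COUNTS = {
    "250Pa": (453, 5426),
    "950Pa": (54, 453),
    "1200Pa": (114, 1538),
    "1800Pa": (198, 2144),
}


def summarize(counts):
    """Return total images, total nuclei, and mean nuclei per image per condition."""
    total_images = sum(images for images, _ in counts.values())
    total_nuclei = sum(nuclei for _, nuclei in counts.values())
    per_image = {cond: nuclei / images for cond, (images, nuclei) in counts.items()}
    return total_images, total_nuclei, per_image


total_images, total_nuclei, per_image = summarize(BMGD_COUNTS)
print(total_images, total_nuclei)  # 819 9561
for cond, mean in per_image.items():
    print(f"{cond}: {mean:.2f} nuclei per image")
```

The totals come to 819 images and 9,561 nuclei, which agrees with the stated 819 images and "more than 9,500" nuclei.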
On average, each image contains between 8 and 14 nuclei, with the lowest average density observed in the 950Pa condition (8.4 nuclei per image) and the highest in the 1200Pa condition (13.49 nuclei per image). Table 1 shows the quantification of nuclei across different substrate stiffness conditions. Each image is paired with corresponding binary masks and labeled segmentation data, making it suitable for both instance and semantic segmentation tasks. All images and their associated annotations were standardized to 256 × 256 pixels and maintained their original 12-bit dynamic range.

Table 1. Quantification of nuclei across different substrate stiffness conditions.

Stiffness condition    Images    Nuclei    Avg. nuclei per image
250Pa                  453       5,426     11.98
950Pa                  54        453       8.4
1200Pa                 114       1,538     13.49
1800Pa                 198       2,144     10.83

Technical Validation

To validate the dataset's reliability for supervised segmentation tasks, we benchmarked several convolutional neural networks as encoder backbones within the U-Net architecture, including ResNet50, MobileNetV2, Inception-ResNetV2, InceptionV3, DenseNet121, and VGG19 [13]. We divided the dataset into training, validation, and testing subsets following an 80:10:10 ratio, corresponding to 655 images for training, 82 for validation, and 82 images reserved for testing. Model performance was evaluated using the F1-score, Intersection over Union (IoU), and validation loss, enabling a direct comparison of pixel-level segmentation accuracy and region-level agreement. Figure 2 presents the generated mask and overlay images produced by our in-house segmentation algorithm with Inception-ResNetV2 as the encoder, compared with the ground truth mask. Segmentation results for the different stiffness conditions are shown to demonstrate the generalization of the code, which is published on GitHub ( https://github.com/zt089/BMGD-nuclei-segmentation ). The performance of each model, along with its evaluation metrics, is shown in Table 2.
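The 80:10:10 split described above can be reproduced in outline with the standard library. This is a sketch under stated assumptions: the source does not say how the split was drawn, so the plain random shuffle, the seed, the function name `split_80_10_10`, and the placeholder file names are all hypothetical.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/validation/test lists (80:10:10).

    The seed and the plain random shuffle are assumptions for illustration;
    the source does not state how the split was drawn.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.8 * len(items))       # 655 for 819 images
    n_val = (len(items) - n_train) // 2   # 82 for 819 images
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 819 image patches.
names = [f"image_{i:04d}.tif" for i in range(819)]
train, val, test = split_80_10_10(names)
print(len(train), len(val), len(test))  # 655 82 82
```

Because the four stiffness conditions are unevenly represented (54 to 453 images each), a split stratified by condition may be worth considering; whether the authors did so is not stated.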
The comparative results demonstrate consistently strong performance across all models, with F1 scores in a narrow range from 92.90% to 93.66% and IoU values between 86.73% and 88.08%. Among the tested backbones, Inception-ResNetV2 achieved the highest overall segmentation accuracy, yielding an F1 score of 93.66% and the best IoU of 88.08%, indicating the closest pixel-wise agreement with the ground-truth annotations. MobileNetV2 and DenseNet121 also performed competitively, maintaining F1 scores above 93% with IoU values exceeding 87%, while VGG19, although slightly lower in accuracy, obtained the lowest validation loss (0.06006), suggesting more stable optimization on this dataset. These results indicate that the dataset supports robust training across multiple architectures, with Inception-ResNetV2 providing the most reliable and consistent segmentation performance. The reported metrics serve as a quantitative baseline for future methodological comparisons and for assessing improvements from advanced architectures or training strategies.

Table 2. Dataset performance on different models.

Model                 F1 score    IoU       Validation loss
ResNet50              93.22%      87.31%    0.06650
MobileNetV2           93.31%      87.45%    0.07459
Inception-ResNetV2    93.66%      88.08%    0.08286
InceptionV3           93.61%      87.99%    0.07875
DenseNet121           93.55%      87.89%    0.07849
VGG19                 92.90%      86.73%    0.06006

The dataset was previously evaluated with EfficientNetB5, ResNet50, InceptionResNetV2, VGG19, DenseNet121, and MobileNet as U-Net backbone encoders; in that evaluation, EfficientNetB5 showed the most promising result, with an F1-score of 87.11% and a mean IoU of 80.89% [13]. The new benchmarks from this study show clear improvement over those earlier results: all tested backbones now achieve F1-scores above 92% and IoU values over 86%. These results show that the dataset is robust and the updated training strategy works well, leading to much stronger segmentation accuracy than previously reported.
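For readers who want to compute comparable scores on their own predictions, the two metrics reported above can be sketched for a pair of binary masks as follows. This is a minimal NumPy illustration, not the authors' evaluation code: the function name `f1_and_iou` is hypothetical, and the source does not state how predictions were thresholded or whether scores were averaged per image or pooled over the test set.

```python
import numpy as np


def f1_and_iou(pred, truth):
    """Pixel-wise F1 (Dice) and IoU for two binary masks of equal shape."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = np.logical_and(pred, truth).sum()    # nucleus pixels found
    fp = np.logical_and(pred, ~truth).sum()   # background predicted as nucleus
    fn = np.logical_and(~pred, truth).sum()   # nucleus pixels missed
    denom_f1 = 2 * tp + fp + fn
    denom_iou = tp + fp + fn
    # Two empty masks agree perfectly, so both scores are defined as 1.0 here.
    f1 = 2 * tp / denom_f1 if denom_f1 else 1.0
    iou = tp / denom_iou if denom_iou else 1.0
    return float(f1), float(iou)


# Toy 4x4 example: the prediction misses one nucleus pixel and adds one false pixel.
truth = np.array([[0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 0]])
f1, iou = f1_and_iou(pred, truth)
print(f"F1={f1:.3f}, IoU={iou:.3f}")  # F1=0.750, IoU=0.600
```

For a single pair of masks the two scores are linked by F1 = 2·IoU / (1 + IoU); in the toy example, 2 × 0.6 / 1.6 = 0.75. The values in Table 2 follow this relation approximately (for example, an IoU of 88.08% corresponds to an F1 of about 93.66%).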
To compare each model's performance qualitatively, we selected one image at random, and the results are presented in Fig. 3. The original DAPI image with contrast enhancement (top) and the corresponding ground truth mask are shown as references. Predicted masks and image-mask overlays are presented for U-Net models with different encoders: ResNet50, MobileNetV2, InceptionResNetV2, InceptionV3, DenseNet121, and VGG19. Segmentation quality differs across backbone architectures in nucleus shape preservation, boundary sharpness, and detection completeness.

Usage Notes

The BMGD, with the raw images and corresponding binary and segmented masks, is accessible through our published repository on GitHub ( https://github.com/zt089/Breast-Mammary-Gland-Dataset-BMGD ). To ensure effective use of our dataset, we will provide comprehensive documentation (Read Me files) and supporting materials to assist researchers in maximizing the value of these resources. When applying data augmentation, we suggest following our documented protocol, which includes horizontal flipping, random cropping, elastic transformations, and brightness and contrast adjustments. These augmentation techniques have been validated to improve model generalization without introducing artifacts that could compromise segmentation accuracy. For initial data processing, we recommend using the provided Python scripts, which standardize image dimensions and normalize intensity values. The dataset is organized for compatibility with widely used deep learning frameworks, with all images pre-processed to 256×256 pixels. Users should be aware that the images retain their original 12-bit dynamic range, which preserves detailed intensity information essential for accurate nuclei segmentation. The training and evaluation pipelines were implemented using TensorFlow and Keras, with encoders derived from the Segmentation Models library.
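As an illustration of the intensity normalization and paired horizontal flipping mentioned above, the following NumPy-only sketch shows one way to handle 12-bit data. It is not the provided Python scripts: the function names are hypothetical, and it assumes the 12-bit values are stored in a 16-bit container with a maximum of 4095, which should be checked against the actual files.

```python
import numpy as np

# Assumption: 12-bit data stored in a 16-bit container, so valid values are 0..4095.
MAX_12BIT = 4095.0


def normalize_12bit(image):
    """Scale a 12-bit image to float32 values in [0, 1]."""
    scaled = np.asarray(image, dtype=np.float32) / MAX_12BIT
    return np.clip(scaled, 0.0, 1.0)


def flip_pair(image, mask):
    """Flip an image and its mask left-to-right together so they stay aligned."""
    return np.fliplr(image).copy(), np.fliplr(mask).copy()


# Synthetic 256 x 256 patch standing in for a real BMGD image and mask.
rng = np.random.default_rng(0)
image = rng.integers(0, 4096, size=(256, 256), dtype=np.uint16)
mask = (image > 2048).astype(np.uint8)

norm = normalize_12bit(image)
flipped_image, flipped_mask = flip_pair(norm, mask)
print(norm.dtype, norm.shape, flipped_mask.shape)
```

Dividing by 255, as is common for 8-bit images, would leave most 12-bit values far above 1, so the divisor should match the true bit depth. Any geometric augmentation has to be applied to the image and its mask together, which is why the flip takes both.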
Image augmentation was performed using Albumentations, and image input/output and preprocessing used OpenCV and NumPy. The encoder backbone can be readily exchanged among ResNet50, MobileNetV2, InceptionResNetV2, InceptionV3, DenseNet121, and VGG19 without requiring modifications to the core training loop. We recommend using our Inception-ResNetV2 implementation, as it provides an optimal balance between performance and computational efficiency, particularly for researchers with limited computational resources.

The dataset can be used either independently for training, validation, and testing of segmentation algorithms, or as a complementary dataset to assess model generalization. Researchers can freely incorporate the dataset into their own machine-learning or deep-learning pipelines for nuclei segmentation tasks. The dataset format is lightweight, enabling seamless integration into custom analysis environments and different computational setups.

Declarations

Competing Interests

There is no competing interest.

Funding

This research is supported in part by grant 80NSSC23K0989 from NASA and grant DE-SC0025403 from DOE. We greatly appreciate additional support from the UWM Research Foundation and the University of Wisconsin-Milwaukee.

Author Contribution

Q.C. conceived the project concept and acquired microscopy data. Z.T., J.F., and A.S. performed the image annotations and created the segmentation masks. A.S. initially developed the code, with Z.T. refining it significantly. J.F. drafted the initial manuscript, after which Z.T. and Q.C. revised and edited it. J.Z. contributed to mask generation and further manuscript editing.

Acknowledgement

We greatly appreciate Dr. Bahram Parvin from the University of Nevada, Reno, for his advice in the past decade.

Data Availability

The BMGD, with original images and corresponding binary and segmented masks, is accessible through GitHub ( https://github.com/zt089/Breast-Mammary-Gland-Dataset-BMGD ).
All images under different stiffness conditions, with their underlying mask images, are uploaded to separate folders.

Code Availability

The code implemented for this dataset can also be found on GitHub ( https://github.com/zt089/BMGD-nuclei-segmentation ). It includes a systematic script to evaluate the BMGD for training, validation, testing, and generating new masks.

References

1. Xu, X. et al. Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 8300–8308 (IEEE Computer Society, 2018). doi:10.1109/CVPR.2018.00866.
2. Mergenthaler, P. et al. Rapid 3D phenotypic analysis of neurons and organoids using data-driven cell segmentation-free machine learning. PLoS Comput Biol 17, e1008630 (2021).
3. Shi, F. et al. Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation, and Diagnosis for COVID-19. IEEE Reviews in Biomedical Engineering 14, 4–15 (2021). https://doi.org/10.1109/RBME.2020.2987975
4. Minaee, S. et al. Image Segmentation Using Deep Learning: A Survey. IEEE Trans Pattern Anal Mach Intell 44, 3523–3542 (2022).
5. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. http://arxiv.org/abs/1505.04597 (2015).
6. Ng, H. P., Ong, S. H., Foong, K. W. C., Goh, P. S. & Nowinski, W. L. Medical image segmentation using k-means clustering and improved watershed algorithm. in Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation vol. 2006 61–65 (2006).
7. Chang, S. et al. Deformable multi-level feature network applied to nucleus segmentation. Front Microbiol 15, (2024).
8. Zhang, W. et al. Keep it accurate and robust: An enhanced nuclei analysis framework. Comput Struct Biotechnol J 24, 699–710 (2024).
9. Gabdullin, M. T. et al. Automatic cancer nuclei segmentation on histological images: comparison study of deep learning methods. Biotechnology and Bioprocess Engineering 29, 1034–1047 (2024).
10. Mahbod, A. et al. NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images. Sci Data 11, (2024).
11. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition vols 2016-December 770–778 (IEEE Computer Society, 2016).
12. Lagree, A. et al. A review and comparison of breast tumor cell nuclei segmentation performances using deep convolutional neural networks. Sci Rep 11, (2021).
13. Shrestha, A., Bao, X., Cheng, Q. & McRoy, S. CNN-Modified Encoders in U-Net for Nuclei Segmentation and Quantification of Fluorescent Images. IEEE Access 12, 107089–107097 (2024).
14. Cheng, Q. et al. Stiffness of the microenvironment upregulates ERBB2 expression in 3D cultures of MCF10A within the range of mammographic density. Sci Rep 6, (2016).
15. Schindelin, J. et al. Fiji: An open-source platform for biological-image analysis. Nature Methods 9, 676–682 (2012). https://doi.org/10.1038/nmeth.2019
16. Hollandi, R., Diósdi, Á., Hollandi, G., Moshkov, N. & Horváth, P. AnnotatorJ: An ImageJ plugin to ease hand annotation of cellular compartments. Mol Biol Cell 31, 2179–2186 (2020).
Figure legends

Figure 1. The nuclei segmentation workflow shows preprocessing, boundary detection, mask generation, and manual verification, with corresponding example outputs. The top panel illustrates the automated preprocessing and segmentation pipeline, while the bottom panel shows example outputs at each stage: contrast-enhanced input, preprocessing, foreground & background separation, manual annotation, and final binary mask.

Figure 2. Segmentation performance of the proposed model under varying substrate stiffness.

Figure 3. Qualitative comparison of nuclei segmentation using U-Net models with different encoder backbones. The original fluorescence image and ground truth mask are shown at the top. Algorithm-predicted masks and overlays from six encoders demonstrate differences in nuclei segmentation performance.

Author affiliations

Zabina Tasneem, Jinwei Fan, Aishwarya Shrestha, and Qingsu Cheng (corresponding author): University of Wisconsin-Milwaukee.
The BMGD can be further utilized in evaluating, training, and testing machine learning algorithms for nuclei segmentation methods, and additionally estimating the transferability and adaptability of previously developed nuclei segmentation methods.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eData Collection\u003c/h2\u003e \u003cp\u003eThe dataset proposed in this study originates from one of our prior research projects\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. The three-dimensional volume of a single mammary colony was captured utilizing a Zeiss LSM 710 confocal microscope equipped with a Zeiss Apochromat 40X/1.1 (0.8 mm working distance) water-immersion objective lens. Excitation filters were configured at 405 nm, while emission filters were set to detect signals between 420\u0026ndash;480 nm. The laser intensity was maintained at 1%, and a twin-gate main beam splitter featuring two wheels, each containing 10 filter positions (resulting in 100 possible combinations), was employed to separate the excitation and emission beams. The images were taken at 12-bit resolution. The pinhole aperture was set at \"1\", and digital gain was adjusted to approximately \u0026frac34; of the maximum gain, ensuring a dynamic range of pixel values between 500\u0026ndash;2000. The voxel size was set to 0.25 \u0026micro;m \u0026times; 0.25 \u0026micro;m \u0026times; 1 \u0026micro;m, yielding high-resolution capture of the cellular structures. For each colony, Z-stack images were acquired to encompass the entire volume of the cellular structure.
The resulting image files were saved in Laser Scanning Microscope (.lsm) format with their corresponding metadata.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eData Processing and Labeling\u003c/h3\u003e\n\u003cp\u003eTo facilitate the development and evaluation of comprehensive nuclei segmentation algorithms, we employed the Labkit extension within the FIJI platform for careful annotation of all images\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. This tool provides intuitive manual and semi-automated image segmentation capabilities. Each image in our collection is paired with a hand-crafted ground-truth mask that precisely outlines cellular structures of interest. The dataset encompasses delineations of more than 9,500 nuclei perimeters, and the annotation phase alone consumed over 800 hours of work. The complete annotation workflow was designed to ensure maximum precision and reproducibility in the segmentation process, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eThe first step is data preprocessing. Initially, a Python script was used to isolate 2D image slices from the 3D dataset\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. Then, we applied an intensity threshold of 1500 or higher to preprocess the data for better visualization. This was followed by manual filtering to remove noise, using mean filters with a radius ranging from 0.5 to 2.0 and intensity subtraction to ensure optimal data quality. Gaussian blur filters were also applied to enhance edge detection. Second, foreground and background regions were separated to enable nuclei boundary detection: nuclei boundaries were delineated pixel by pixel, with the nuclei marked as foreground in red and the background in blue.
Third, a random forest classifier integrated into Labkit was used to generate preliminary masks, which were then manually verified against the original image\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. Lastly, the annotations were binarized to produce the final mask, in which white pixels represent segmented nuclei and black pixels represent background.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Data Records","content":"\u003cp\u003eThe BMGD (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zt089/Breast-Mammary-Gland-Dataset-BMGD\u003c/span\u003e\u003cspan address=\"https://github.com/zt089/Breast-Mammary-Gland-Dataset-BMGD\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) is publicly available on GitHub. The dataset includes 819 DAPI-stained fluorescent microscopy images of mammary gland cells cultured under different stiffness conditions, ranging from 250Pa to 1800Pa. In total, \u0026gt;\u0026thinsp;9,500 manually annotated nuclei are distributed across four stiffness conditions: 250Pa (453 images, 5,426 nuclei), 950Pa (54 images, 453 nuclei), 1200Pa (114 images, 1,538 nuclei), and 1800Pa (198 images, 2,144 nuclei). On average, each image contains between 8 and 14 nuclei, with the lowest average density observed in the 950Pa condition (8.4 nuclei per image) and the highest in the 1200Pa condition (13.49 nuclei per image). Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows the quantification of nuclei across different substrate stiffness conditions. Each image is paired with corresponding binary masks and labeled segmentation data, making it suitable for both instance and semantic segmentation tasks.
All images and their associated annotations were standardized to 256 \u0026times; 256 pixels and maintained their original 12-bit dynamic range.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eQuantification of nuclei across different substrate stiffness conditions.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStiffness Condition\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eImages\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNuclei\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAvg. 
Nuclei per image\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e250Pa\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e453\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5426\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e11.98\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e950Pa\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e453\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8.4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1200Pa\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e114\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1538\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e13.49\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1800Pa\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e198\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2144\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e10.83\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e"},{"header":"Technical Validation","content":"\u003cp\u003eTo 
validate the dataset's reliability for supervised segmentation tasks, we benchmarked several convolutional neural networks as encoders within the U-Net architecture, including ResNet50, MobileNetV2, Inception-ResNetV2, InceptionV3, DenseNet121, and VGG19\u003csup\u003e13\u003c/sup\u003e. We divided the dataset into training, validation, and testing subsets following an 80:10:10 ratio, corresponding to 655 images for training, 82 for validation, and 82 images reserved for testing. Model performance was evaluated using the F1-score, Intersection over Union (IoU), and validation loss, enabling a direct comparison of pixel-level segmentation accuracy and region-level agreement. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e presents the generated mask and overlay images produced by our in-house segmentation algorithm with Inception-ResNetV2 as the encoder, compared with the ground truth mask. The segmented results of different stiffness conditions are shown to demonstrate the generalization of the code, which is published on GitHub (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zt089/BMGD-nuclei-segmentation\u003c/span\u003e\u003cspan address=\"https://github.com/zt089/BMGD-nuclei-segmentation\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe performance of each model, along with its evaluation metrics, is shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The comparative results demonstrate consistently strong performance across all models, with F1 scores in a narrow range from 92.90% to 93.66% and IoU values between 86.73% and 88.08%. Among the tested backbones, Inception-ResNetV2 achieved the highest overall segmentation accuracy, yielding an F1 score of 93.66% and the best IoU of 88.08%, indicating the closest pixel-wise agreement with the ground-truth annotations.
MobileNetV2 and DenseNet121 also performed competitively, maintaining F1 scores above 93% with IoU values exceeding 87%, while VGG19, although slightly lower in accuracy, obtained the lowest validation loss (0.06006), suggesting more stable optimization on this dataset. These results indicate that the dataset supports robust training across multiple architectures, with Inception-ResNetV2 providing the most reliable and consistent segmentation performance. The reported metrics serve as a quantitative baseline for future methodological comparisons and for assessing improvements from advanced architecture or training strategies.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDataset performance on different models.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eF1 score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eValidation loss\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e93.22%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e87.31%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.06650\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMobileNetV2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e93.31%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e87.45%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.07459\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInception-ResNetV2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e93.66%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e88.08%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.08286\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInceptionV3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e93.61%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e87.99%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.07875\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDenseNet121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e93.55%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e87.89%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.07849\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVGG19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e92.90%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e86.73%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.06006\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe dataset was previously evaluated with EfficientNetB5, ResNet50, InceptionResNetV2, VGG19, DenseNet121, and MobileNet as U-Net backbone encoders; EfficientNetB5 gave the best result, with an F1-score of 87.11% and a mean IoU of 80.89%\u003csup\u003e13\u003c/sup\u003e. The new benchmarks from this study show clear improvement over these earlier results: all tested backbones exceed the previous baseline, with F1-scores above 92% and IoU values over 86%. These results show that the dataset is robust and that the updated training strategy is effective. To compare the models qualitatively, we selected one image at random, and the results are presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. The original DAPI image with contrast enhancement (top) and the corresponding ground truth mask are shown as references. Predicted masks and image-mask overlays are presented for U-Net models with different encoders: ResNet50, MobileNetV2, InceptionResNetV2, InceptionV3, DenseNet121, and VGG19.
Segmentation quality differs across backbone architectures in nucleus shape preservation, boundary sharpness, and detection completeness.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eUsage Notes\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe BMGD, comprising the raw images and their corresponding binary and segmented masks, is accessible through our published repository on GitHub (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zt089/Breast-Mammary-Gland-Dataset-BMGD\u003c/span\u003e\u003cspan address=\"https://github.com/zt089/Breast-Mammary-Gland-Dataset-BMGD\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). To ensure effective use of our dataset, we will provide comprehensive documentation (README files) and supporting materials to assist researchers in maximizing the value of these resources. When applying data augmentation, we suggest following our documented protocol, which includes horizontal flipping, random cropping, elastic transformations, and brightness and contrast adjustments. These augmentation techniques have been validated to improve model generalization without introducing artifacts that could compromise segmentation accuracy. For initial data processing, we recommend utilizing the provided Python scripts, which standardize image dimensions and normalize intensity values. The dataset is organized for compatibility with widely used deep learning frameworks, with all images pre-processed to 256\u0026times;256 pixels. Users should be aware that the images retain their original 12-bit dynamic range, which preserves detailed intensity information essential for accurate nuclei segmentation. The training and evaluation pipelines were implemented using TensorFlow and Keras, with encoders derived from the Segmentation Models library.
Image augmentation was performed using Albumentations, and image input/output and preprocessing utilized OpenCV and NumPy. The encoder backbone can be readily exchanged among ResNet50, MobileNetV2, InceptionResNetV2, InceptionV3, DenseNet121, and VGG19 without requiring modifications to the core training loop. We recommend using our Inception-ResNetV2 implementation, as it provides an optimal balance between performance and computational efficiency, particularly for researchers with limited computational resources. It should be highlighted that the dataset can be used either independently for training, validation, and testing of segmentation algorithms, or as a complementary dataset to assess model generalization. Researchers can freely incorporate the dataset into their own machine-learning or deep-learning pipelines for nuclei segmentation tasks. The dataset format is lightweight, enabling seamless integration into custom analysis environments and different computational setups.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting Interests\u003c/h2\u003e \u003cp\u003eThere is no competing interest.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis research is supported in part by grant 80NSSC23K0989 from NASA and grant DE-SC0025403 from DOE. We greatly appreciate additional support from the UWM Research Foundation and the University of Wisconsin-Milwaukee.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eQ.C. conceived the project concept and acquired microscopy data. Z.T., J.F., and A.S. performed the image annotations and created the segmentation masks. A.S. initially developed the code, with Z.T. refining it significantly. J.F. drafted the initial manuscript, after which Z.T. and Q.C. revised and edited it. J.Z. 
contributed to mask generation and further manuscript editing.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e \u003cp\u003eWe thank Dr. Bahram Parvin from the University of Nevada, Reno, for his advice over the past decade.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe BMGD, with original images and their corresponding binary and segmented masks, is accessible through GitHub (https://github.com/zt089/Breast-Mammary-Gland-Dataset-BMGD). All images under different stiffness conditions, with their corresponding mask images, are uploaded to separate folders.\u003c/p\u003e\n \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eCode Availability\u003c/h2\u003e \u003cp\u003eThe code implemented for this dataset can also be found on GitHub (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zt089/BMGD-nuclei-segmentation\u003c/span\u003e\u003cspan address=\"https://github.com/zt089/BMGD-nuclei-segmentation\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). It includes a systematic script to evaluate the BMGD for training, validation, testing, and generating new masks.\u003c/p\u003e \u003c/div\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eXu, X. \u003cem\u003eet al.\u003c/em\u003e Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation. in \u003cem\u003eProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition\u003c/em\u003e 8300\u0026ndash;8308 (IEEE Computer Society, 2018). doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/CVPR.2018.00866\u003c/span\u003e\u003cspan address=\"10.1109/CVPR.2018.00866\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMergenthaler, P.
\u003cem\u003eet al.\u003c/em\u003e Rapid 3D phenotypic analysis of neurons and organoids using data-driven cell segmentation-free machine learning. \u003cem\u003ePLoS Comput Biol\u003c/em\u003e 17, e1008630 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi, F. \u003cem\u003eet al.\u003c/em\u003e Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation, and Diagnosis for COVID-19. \u003cem\u003eIEEE Reviews in Biomedical Engineering\u003c/em\u003e vol. 14 4\u0026ndash;15 Preprint at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/RBME.2020.2987975\u003c/span\u003e\u003cspan address=\"10.1109/RBME.2020.2987975\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMinaee, S. \u003cem\u003eet al.\u003c/em\u003e Image Segmentation Using Deep Learning: A Survey. \u003cem\u003eIEEE Trans Pattern Anal Mach Intell\u003c/em\u003e 44, 3523\u0026ndash;3542 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRonneberger, O., Fischer, P. \u0026amp; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/1505.04597\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/1505.04597\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNg, H. P., Ong, S. H., Foong, K. W. C., Goh, P. S. \u0026amp; Nowinski, W. L. Medical image segmentation using k-means clustering and improved watershed algorithm. in \u003cem\u003eProceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation\u003c/em\u003e vol. 2006 61\u0026ndash;65 (2006).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang, S. 
\u003cem\u003eet al.\u003c/em\u003e Deformable multi-level feature network applied to nucleus segmentation. \u003cem\u003eFront Microbiol\u003c/em\u003e 15, (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, W. \u003cem\u003eet al.\u003c/em\u003e Keep it accurate and robust: An enhanced nuclei analysis framework. \u003cem\u003eComput Struct Biotechnol J\u003c/em\u003e 24, 699\u0026ndash;710 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGabdullin, M. T. \u003cem\u003eet al.\u003c/em\u003e Automatic cancer nuclei segmentation on histological images: comparison study of deep learning methods. \u003cem\u003eBiotechnology and Bioprocess Engineering\u003c/em\u003e 29, 1034\u0026ndash;1047 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMahbod, A. \u003cem\u003eet al.\u003c/em\u003e NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H\u0026amp;E-stained histological images. \u003cem\u003eSci Data\u003c/em\u003e 11, (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe, K., Zhang, X., Ren, S. \u0026amp; Sun, J. Deep residual learning for image recognition. in \u003cem\u003eProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition\u003c/em\u003e vols 2016-December 770\u0026ndash;778 (IEEE Computer Society, 2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLagree, A. \u003cem\u003eet al.\u003c/em\u003e A review and comparison of breast tumor cell nuclei segmentation performances using deep convolutional neural networks. \u003cem\u003eSci Rep\u003c/em\u003e 11, (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShrestha, A., Bao, X., Cheng, Q. \u0026amp; McRoy, S. CNN-Modified Encoders in U-Net for Nuclei Segmentation and Quantification of Fluorescent Images. 
IEEE Access 12, 107089–107097 (2024).
Cheng, Q. et al. Stiffness of the microenvironment upregulates ERBB2 expression in 3D cultures of MCF10A within the range of mammographic density. Sci Rep 6, (2016).
Schindelin, J. et al. Fiji: An open-source platform for biological-image analysis. Nature Methods 9, 676–682 (2012). https://doi.org/10.1038/nmeth.2019
Hollandi, R., Diósdi, Á., Hollandi, G., Moshkov, N. & Horváth, P. AnnotatorJ: An ImageJ plugin to ease hand annotation of cellular compartments. Mol Biol Cell 31, 2179–2186 (2020).

Posted January 16th, 2026 (version 1); licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

