A modular deep learning pipeline for standardised analysis of pelvic ultrasound images for gynaecology

preprint OA: gold CC-BY-4.0
George Adams, Lorna Brightmore, Jennifer Barcroft, Nina Cooper, and 4 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7997734/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 3

Abstract

Pelvic ultrasound is the principal imaging modality for evaluating the female reproductive tract, yet its clinical utility is constrained by substantial inter-observer variability and dependence on operator expertise.
The anatomical complexity and physiological variability of the female pelvis present a uniquely demanding environment for automated image analysis: the uterus, ovaries, and endometrium differ markedly between individuals and change continuously across the menstrual cycle, making consistent, reproducible measurement a persistent challenge even for experienced practitioners. To address these limitations, we present a modular artificial intelligence pipeline for automated extraction of quantitative morphological measurements from pelvic ultrasound DICOM images. The pipeline comprises five sequential analytical stages. A ResNet-50 convolutional neural network, trained on 7,721 images across five anatomical categories, classifies each image frame with high discriminative accuracy (AUC 0.90 to 1.00). Complementary modules using optical character recognition and HSV-based colour thresholding determine ovarian laterality and detect Colour Doppler signals, while a random forest classifier identifies biplane split-screen views required for volumetric estimation (AUC = 0.96). A YOLOv5 object detection model, trained on 5,496 annotated images encompassing 25,942 follicle and 6,896 ovary regions, localises ovarian structures exclusively within ovary-classified frames to reduce false-positive detection of pelvic fluid collections. Two dedicated U-Net segmentation models, trained on 807 expert-annotated images, delineate ovarian and uterine structures independently, achieving Dice similarity coefficients of 0.856 and 0.899 respectively. Ellipse fitting applied to segmentation contours, calibrated using DICOM pixel spacing metadata, enables standardised extraction of endometrial thickness, ovary volume, and uterus volume. End-to-end evaluation in 100 patients demonstrated that automated measurements correlate meaningfully with both manual-caliper and manual-contour reference standards. 
Importantly, agreement between the two manual reference methods was itself moderate for several parameters, illustrating the inherent subjectivity of morphological measurement in pelvic ultrasound and providing essential context for interpreting automated performance. Automated endometrial thickness approached the level of agreement observed between the two manual methods, whilst ovarian volume concordance was comparable to inter-method human variability for that parameter. This pipeline provides a reproducible, scalable foundation for automated gynaecological imaging assessment. By decoupling morphological measurement from individual operator judgement and enabling structured data extraction at scale, it advances the goal of consistent, accessible, and high-quality imaging analysis as a cornerstone of women’s health care.

Health sciences/Anatomy; Biological sciences/Computational biology and bioinformatics; Physical sciences/Engineering; Health sciences/Health care; Physical sciences/Mathematics and computing; Health sciences/Medical research

Introduction

Ultrasound is the cornerstone imaging modality for evaluating the female reproductive tract, playing a central role in the detection and monitoring of a wide spectrum of gynaecological conditions, from abnormal uterine bleeding to malignancy [8–10]. The female pelvis presents a uniquely demanding imaging environment: the reproductive organs are anatomically complex, highly variable between individuals, and change substantially in size, morphology, and echogenicity across the menstrual cycle and with age. These properties make pelvic ultrasound one of the most technically and interpretatively challenging domains of diagnostic imaging, and despite its widespread clinical use, image interpretation remains highly operator-dependent, resulting in substantial inter-observer variability [11–16].
This variability undermines diagnostic reproducibility and accuracy, contributing to delayed or missed diagnoses of prevalent conditions such as uterine fibroids and endometriosis [17–23]. The clinical consequences are significant, encompassing diminished quality of life, increased healthcare costs, and diagnostic inequities across patient populations [24–26]. These challenges are compounded by a global shortage of trained sonographers, which limits access to timely and high-quality imaging, particularly in resource-constrained settings [27–29].

Evidence-based reporting frameworks including IOTA, IETA, MUSA, and O-RADS have substantially improved standardisation and consistency in ultrasound interpretation, particularly for the characterisation of malignant pathology [16, 30, 31]. However, even within these frameworks, the subjective placement of calipers and the manual delineation of anatomical boundaries introduce measurement variability that is inherent to the imaging domain itself, and that persists even among experienced practitioners applying the same methodology to the same images. Artificial intelligence offers a compelling opportunity to complement these frameworks by enabling automated, reproducible extraction of clinically relevant morphological information from ultrasound DICOM images at scale [32, 33]. By removing dependence on individual operator judgement for discrete measurement tasks, AI-driven pipelines can improve consistency, facilitate structured data collection, and extend standardised analysis to settings where specialist expertise is limited.
Realising this potential in women’s health imaging, however, requires solutions that are explicitly designed for the complexity of pelvic anatomy: capturing the structural diversity of the uterus, ovaries, and endometrium across individuals and physiological states demands models that are robust to high intra- and inter-individual variability, ambiguous tissue boundaries, and the full range of acquisition conditions encountered in routine clinical practice.

Here, we present a comprehensive, modular AI pipeline for automated analysis of pelvic ultrasound DICOM images, designed to perform image classification, anatomical localisation, semantic segmentation, and quantitative measurement extraction in a generalisable and scalable manner. In contrast to prior end-to-end approaches focused on specific pathologies such as ovarian tumours, uterine fibroids, or polycystic ovary syndrome [34–37], our pipeline decomposes the complex task of ultrasound interpretation into a sequence of purpose-trained, interoperable modules. We evaluate the pipeline both at the level of individual components and in an end-to-end concordance analysis against manual reference measurements in 100 patients, providing a realistic assessment of what automated analysis can achieve when applied to the full complexity of real-world pelvic ultrasound data. The findings offer both a practical demonstration of current capabilities and a clear characterisation of the challenges that remain in bringing fully automated reproductive imaging to clinical practice.

Methods

Data

This was a retrospective study of anonymised pelvic ultrasound DICOM images and associated demographic data, collected from individuals who completed an online health assessment and underwent diagnostic imaging through Hertility Health Ltd. At the time of participation, all individuals provided informed consent for their health and imaging data to be used in anonymised form for research purposes.
Consent was obtained electronically via an opt-in process embedded within the health assessment, with clear information provided regarding data usage, privacy, and confidentiality. All DICOM files and associated metadata were fully anonymised prior to access by the research team. No identifiable information, including patient names, dates of birth, or imaging timestamps, was retained in the dataset, and no re-identification was possible at any stage of analysis. The research protocol underpinning the online health assessment, testing pathway, and data governance processes was reviewed and approved by the London-Surrey Research Ethics Committee, part of the UK National Health Service (NHS) Health Research Authority (REC reference: 20/LO/0265).

Pipeline overview

We developed an end-to-end computational pipeline to extract structured quantitative information from raw pelvic ultrasound DICOM images acquired for women’s health assessments. The pipeline comprises five main stages: image content categorisation, text and Colour Doppler detection, biplane image identification, anatomical object localisation, and segmentation-based measurement extraction. The primary clinical outputs of the pipeline are endometrial thickness, ovary volume, and uterus volume.

Image categorisation

Each raw ultrasound image was preprocessed and resized to a fixed dimension of 150 × 150 pixels with three colour channels. To categorise image content, we trained a convolutional neural network (CNN) using PyTorch [2]. The model used a ResNet-50 backbone pretrained on ImageNet [1], with the original fully connected layer replaced by a custom multi-layer head incorporating a linear layer, ReLU activation, and dropout regularisation to improve generalisation. The network classified images into five predefined categories: ovary, embedded text, trans-abdominal uterus, trans-vaginal uterus (transverse view), and trans-vaginal uterus (sagittal view).
Training data were augmented using random rotations, flips, and affine transformations (probability = 0.5). Class imbalance was addressed using class-specific weighting within the cross-entropy loss function. An Adam optimiser with weight decay was used to minimise training loss.

Text and Doppler detection

Images classified as containing ovarian structures were further analysed to determine laterality and the presence of Colour Doppler signals. To extract text indicating left or right ovary, an optical character recognition (OCR) system was implemented using the PaddleOCR toolkit [3]. Doppler detection was achieved by thresholding characteristic red and blue pixel intensities in HSV colour space using OpenCV [4]. Images in which Colour Doppler flow was detected were flagged and excluded from downstream morphological analysis, as Doppler overlays corrupt the greyscale pixel intensities used by the segmentation models. Both laterality and Doppler status were recorded as image-level metadata and propagated through the remainder of the pipeline.

Biplane image classification

Estimation of ovary and uterus volumes requires three orthogonal diameter measurements: height, width, and depth. To obtain these, split-screen images capturing the same organ in two orthogonal planes (biplane images) were identified automatically using a random forest classifier trained on 2,000 annotated images. Candidate organ pairs across the two planes were matched by spatial proximity and relative size. This step was applied to both ovary and uterus image categories to enable volumetric estimation downstream.

Object detection

To localise ovarian structures, we trained a YOLOv5 object detection network [5] and applied it exclusively to images classified as containing ovarian tissue. Restricting object detection to this image subset was a deliberate design choice to prevent misclassification of small pelvic fluid collections as follicles, which was observed when the detector was applied more broadly.
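The gating described here can be illustrated with a minimal sketch. It assumes each frame already carries a classifier label and a Colour Doppler flag; the `Frame` class, the `"ovary"` label string, the other label names, and the stand-in detector are all hypothetical and are not taken from the authors' code. Excluding Doppler-flagged frames at this step is also an assumption, based on their stated exclusion from downstream morphological analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

OVARY = "ovary"  # hypothetical label string for the ovary category


@dataclass(frozen=True)
class Frame:
    frame_id: str
    category: str              # label assigned by the image classifier
    has_doppler: bool = False  # flag set by the Colour Doppler check


def select_for_detection(frames: List[Frame], exclude_doppler: bool = True) -> List[Frame]:
    """Return only the frames that the ovary/follicle detector should see."""
    return [
        f for f in frames
        if f.category == OVARY and not (exclude_doppler and f.has_doppler)
    ]


def run_detection(frames: List[Frame], detector: Callable[[Frame], list]) -> Dict[str, list]:
    """Apply `detector` to eligible frames; other frames are never passed to it."""
    return {f.frame_id: detector(f) for f in select_for_detection(frames)}


frames = [
    Frame("a", "ovary"),
    Frame("b", "tv_uterus_sagittal"),        # hypothetical label name
    Frame("c", "ovary", has_doppler=True),
]
# Stand-in detector that returns one fake detection per frame.
results = run_detection(frames, detector=lambda f: [("follicle", 0.9)])
print(sorted(results))  # ['a']
```

Because the filter runs before the detector is called, a uterus frame containing pelvic fluid cannot produce a spurious follicle detection in this arrangement.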
The model was trained to detect two object classes: ovary regions and follicles. Training data comprised 5,496 annotated images containing 25,942 annotated follicle regions and 6,896 annotated ovary regions. Training images and corresponding bounding boxes were augmented with random horizontal and vertical flips, affine transformations, and HSV colour shifts to enhance robustness. The YOLOv5 implementation and training pipeline followed the open-source Ultralytics repository.

Image segmentation

Detected ovarian regions and all uterus images were processed by two dedicated semantic segmentation models, each implemented in PyTorch using a U-Net architecture with a ResNet-34 encoder pretrained on ImageNet [6, 7]. One model was trained to segment ovary parenchyma and follicle boundaries; the second was trained to segment the myometrium and endometrium. Two separate models were used in preference to a single multi-class model because ovarian and uterine structures are rarely present within the same image frame, and task-specific training was found to improve segmentation fidelity for each tissue class. Both models were deployed from a shared Amazon SageMaker endpoint using a routing switch that directed each image to the appropriate model based on its classification label. The two models were trained on a combined annotated dataset of 807 images: 399 ovary images for the ovary-follicle model and 408 uterus images for the uterus-endometrium model. Training employed a hybrid loss function combining binary cross-entropy and Dice losses, with an AdamW optimiser and a cosine annealing learning rate scheduler. Model performance was monitored by validation loss and Dice similarity coefficient on held-out data.

Model evaluation and performance metrics

Model performance was evaluated using standard metrics for medical image analysis [38–41].
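One of these metrics, the Dice similarity coefficient used to monitor and evaluate the segmentation models, can be sketched in a few lines. This is a plain-Python illustration on small binary masks, not the authors' implementation; the function name and the handling of two empty masks are assumptions made for the example.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Dice = 2 * |P and T| / (|P| + |T|) for two binary (0/1) masks of equal shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    # Count pixels that are foreground in both masks.
    intersection = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    # Total foreground pixels across the two masks.
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Convention chosen here (an assumption): two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(round(dice_coefficient(pred, target), 3))  # 0.667
```

In the toy example each mask has three foreground pixels and two of them overlap, giving 2 × 2 / (3 + 3) ≈ 0.667.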
Discrimination for classification and detection models was assessed using the area under the receiver operating characteristic curve (AUC), with sensitivity, specificity, and F1 scores calculated at optimal thresholds. Object detection accuracy was further quantified by average precision (AP) and mean average precision (mAP) at an intersection-over-union threshold of 0.5. For segmentation, agreement with expert annotations was measured using the Dice similarity coefficient. All metrics were computed using scikit-learn (v1.3) and PyTorch utilities, following current best-practice recommendations for validation in biomedical imaging [41].

Quantitative measurement extraction

Segmented masks were post-processed to extract anatomical contours. Ellipse fitting was applied to each contour to estimate the primary and secondary axes and orientation angle, from which uterine length, uterine width, endometrial thickness, and maximum diameters of the left and right ovaries were derived. All measurements were converted from pixels to centimetres using the pixel spacing values embedded in the DICOM metadata of each source image. For ovary and uterus volume estimation, three orthogonal diameter measurements were obtained by pairing biplane images identified by the random forest classifier, with organ pairs matched between planes by spatial proximity and relative size. Volumes were calculated from the three diameters using the standard prolate ellipsoid formula.

Concordance analysis of manual and automated measurements

Agreement between automated pipeline-derived measurements and manual reference measurements was assessed in R for three clinically relevant parameters: endometrial thickness, ovary volume, and uterus volume, in a cohort of 100 patients who had undergone pelvic ultrasound examinations.
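Agreement between paired measurements of this kind is summarised in this analysis by Pearson's r and Lin's concordance correlation coefficient (CCC). The sketch below shows how the two differ, using the usual definition of CCC with population (1/n) moments. The study's analysis was run in R; this Python version, the function name, and the four made-up value pairs are illustrative assumptions only and are not study data.

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson_and_ccc(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson r, Lin's CCC) for paired measurements x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Population (1/n) variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("both series must vary for r to be defined")
    r = cov_xy / sqrt(var_x * var_y)
    # CCC penalises any shift in mean or scale away from the line of identity.
    ccc = 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
    return r, ccc


# Hypothetical paired values: the second method reads exactly 1 unit higher.
manual = [4.0, 6.0, 8.0, 10.0]
automated = [5.0, 7.0, 9.0, 11.0]
r, ccc = pearson_and_ccc(manual, automated)
print(round(r, 3), round(ccc, 3))  # 1.0 0.909
```

A constant offset leaves r at 1.0 but lowers CCC to 10/11, which is why a CCC below the corresponding r indicates systematic bias rather than weaker linear association.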
Automated model-derived outputs were compared against two manual reference approaches: (i) the original manual-caliper measurements recorded during clinical scan review and (ii) measurements derived from manually drawn contours applied to the same images. Agreement between the two manual approaches was also quantified to provide a reference estimate of consistency between human-derived measurement methods. For endometrial thickness, comparisons used the original manual-caliper thickness values and the corresponding manually derived contour-based measurements. For ovarian volume, left and right ovaries were matched separately across datasets and then pooled to generate a combined ovary-volume analysis. For uterine volume, paired measurements were matched at the case level across the manual-caliper, manual-contour, and automated outputs. Automated measurements were extracted from segmentation-derived contour summaries and exported as structured tabular outputs prior to statistical comparison. Concordance was quantified using Pearson’s correlation coefficient (r) to assess linear association and Lin’s concordance correlation coefficient (CCC) to assess agreement relative to the line of identity, capturing both correlation and systematic bias between paired measurements.

Discussion

Reliable and standardised interpretation of pelvic ultrasound is fundamental to the diagnosis and management of a wide range of gynaecological conditions that collectively affect a substantial proportion of women across their reproductive lives. Yet pelvic ultrasound remains one of the most operator-dependent domains of diagnostic imaging, with downstream measurement and interpretation highly susceptible to inter-observer variability [8, 11, 12].
Evidence-based frameworks including IOTA, IETA, and MUSA, alongside structured reporting systems such as O-RADS, have advanced consistency in image interpretation and reduced subjectivity, particularly for the assessment of malignant pathology [16, 30, 31]. AI-driven approaches offer a complementary strategy to further enhance reproducibility and standardisation at scale, and represent an important direction for progress in women’s health imaging [32, 33].

In this study, we present a comprehensive, modular AI pipeline for automated analysis of pelvic ultrasound DICOM images, designed to perform classification, localisation, segmentation, and measurement extraction in a generalisable and scalable manner. Unlike previous single-task approaches focused on specific pathologies [34–37], our pipeline decomposes the complex task of ultrasound interpretation into a sequence of interoperable modules, each designed to address a distinct analytical challenge. This modular architecture improves interpretability through the extraction of discrete, clinically relevant features, enhances adaptability across imaging protocols and clinical contexts, and simplifies iterative refinement as clinical needs evolve.

The models were trained on a diverse, unfiltered dataset of real-world pelvic ultrasound images sourced from Hertility Health, encompassing the broad spectrum of normal and pathological appearances encountered in women aged 20 to 60 years. Unlike research datasets enriched for single conditions or idealised acquisition quality, this dataset reflects the genuine complexity and heterogeneity of clinical practice, including anatomical variation, imaging artefacts, and suboptimal acquisition settings, and as such provides a demanding but ecologically valid training environment.

Individual pipeline components demonstrated strong performance across all analytical stages.
The classification model achieved high discriminative accuracy across five clinically relevant ultrasound views (AUC 0.90 to 1.00). The YOLOv5 object detection model, trained on 5,496 annotated images encompassing over 25,000 follicle and 6,800 ovary regions, showed strong follicle detection (AUC = 0.980) and good ovary localisation (AUC = 0.777). Segmentation of uterine and ovarian structures yielded robust Dice similarity coefficients of 0.899 and 0.856 for the two dedicated models, with parenchymal structures achieving validation scores of up to 0.918. These results are consistent with, and in many cases exceed, comparable published benchmarks for automated pelvic ultrasound analysis. End-to-end evaluation in a cohort of 100 patients demonstrated that the pipeline produces automated measurements of endometrial thickness, ovary volume, and uterus volume that correlate meaningfully with both manual-caliper and manual-contour reference standards. A key contextual finding is that agreement between the two manual reference methods was itself moderate for several parameters, with Pearson r values of 0.60 for ovary volume and 0.64 for endometrial thickness between caliper and contour approaches applied to the same images. This inter-method variability between human annotators is an important reference point: it reflects the inherent difficulty of defining and measuring structures in pelvic ultrasound, and illustrates that even experienced practitioners making measurements from identical images will not always agree. The reproductive tract presents particular challenges in this regard, with structures that vary substantially in size, echogenicity, and boundary definition across individuals and across the menstrual cycle, such that no single measurement approach is immune to subjectivity. 
Against this backdrop, the automated pipeline performed most strongly for endometrial thickness, where model agreement with manual-caliper measurements (r = 0.69, CCC = 0.53) approached the level of agreement between the two manual methods themselves. Ovary volume showed comparable concordance with manual-contour references (r = 0.64, CCC = 0.56) to that observed between the two human approaches, consistent with the segmentation-based nature of the automated method. Uterus volume, which requires accurate biplane identification, segmentation across orthogonal planes, ellipse fitting, and organ matching before a volume can be derived, was the most demanding parameter for end-to-end automation, with moderate agreement against both reference standards. Taken together, these patterns indicate that automated measurement quality is closely tied to the number and complexity of upstream processing steps, rather than to any single model’s performance in isolation.

This technical advance addresses persistent challenges in women’s health imaging by reducing operator dependence and improving consistency and reproducibility of morphological measurement. Critically, by processing de-identified DICOM images from any acquisition site, the pipeline offers a scalable and hardware-agnostic solution that could extend standardised measurement capability to resource-limited settings and help mitigate inequities in access to high-quality gynaecological imaging. It also supports longitudinal monitoring of conditions such as uterine fibroids, endometriosis, and adnexal pathology that require consistent, measurement-based tracking over time [17, 19, 20]. By extracting structured phenotypic data at scale, the system further provides a foundation for digital biomarker discovery, genotype-phenotype mapping, and large-scale reproductive health research [32, 33].

Several limitations merit consideration.
The training dataset, though representative of real-world clinical imaging, primarily reflects women aged 20 to 60 years and may under-represent rare pathologies or post-menopausal anatomy. Performance remains contingent on input image quality, and the ellipsoidal volume approximation introduces a degree of geometric simplification that is shared with conventional caliper-based measurement but is not eliminated by automation. Prospective validation across broader demographic groups, hardware environments, and clinical workflows is needed, and ensuring equitable performance will require continued expansion of training data across age, ethnicity, and body habitus strata.

In conclusion, this modular AI pipeline provides a reproducible and scalable foundation for automated morphological measurement in pelvic ultrasound, with performance that compares favourably with inter-method variability between human approaches. The findings demonstrate what can be achieved by applying carefully designed, task-specific models to a domain as complex and variable as women’s pelvic anatomy, and point clearly to the directions in which further development is most needed. By complementing established evidence-based frameworks with automated, operator-independent measurement, the pipeline advances the goal of standardised, high-quality imaging assessment as a cornerstone of women’s health care.

Declarations

Ethical considerations

This work was undertaken as a partnership with Hertility Health and the Zinc Innovation Fellowship, funded with an NIHR grant. The resulting model will be used by Hertility Health, but there are no financial or non-financial competing interests.

Competing Interests

Lorna Brightmore, Natalie Getreu and Helen O’Neill are employees of Hertility Health Ltd. Helen O’Neill and Natalie Getreu are founders and employees of Hertility Health Ltd.
The resulting model will be used by Hertility Health, but there are no financial or non-financial competing interests to declare for any authors. Funding: No additional funding used for this work. Author Contribution The design of the framework and code and writing the manuscript was done by GA and LB, with GA also doing the data labelling, model training and validation. Review, advice and direction on the project, as well as review of the manuscript was provided by HON, JB, AT, SS, NG and NC. Acknowledgement This work was undertaken as a partnership with Hertility Health and the National Institute for Health Research (NIHR)/Zinc Innovation Fellowship. We also acknowledge the National Institute for Health Research (NIHR) Biomedical Research Centre based at Imperial College Healthcare NHS Trust and Imperial College London. The opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NHS or the NIHR or Zinc. Data Availability The datasets generated and/or analysed during the current study are not publicly available. Access to the data is restricted to authorised researchers under data sharing agreements and governed by ethical oversight. Requests for access to de-identified data for research purposes may be considered on a case-by-case basis and should be directed to the corresponding author and Hertility Health.The image data used in this study are not publicly available due to privacy concerns and study-specific data sharing agreements that prohibit further sharing. References He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in 770–778 (2016). Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32, (2019). Du, Y. PaddleOCR: An ultra-lightweight OCR system. (2020). Bradski, G. The OpenCV library. Dr. Dobb’s Journal of Software Tools (2000). Jocher, G. 
YOLOv5 by ultralytics. (2020). Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. in 234–241 (2015). Iglovikov, V. I. & Shvets, A. A. TernausNet: U-net with VGG11 encoder pre-trained on ImageNet for image segmentation. (2018). Epstein, E. & Valentin, L. Interobserver agreement in measurements of endometrial thickness and volume. Ultrasound Obstet Gynecol 20, 486–492 (2002). Van den Bosch, T. & Dueholm, M. Ultrasound for the diagnosis of adenomyosis: A systematic review and meta-analysis. Ultrasound Obstet Gynecol 53, 552–560 (2019). Alcazar, J. L. Transvaginal sonographic diagnosis of endometrial cancer in women with postmenopausal bleeding: A meta-analysis. Ultrasound Obstet Gynecol 30, 519–525 (2007). Kaijser, J., Van Belle, V., Jensen, A. & al., et. Observer agreement between ultrasound experts on features used to classify adnexal masses. Ultrasound Obstet Gynecol 44, 89–95 (2014). Timmerman, D., Schwarzler, P. & al., et. Subjective assessment of adnexal masses with the use of ultrasonography: An analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 25, 110–114 (2005). Kaijser, J., Van Belle, V., Yazbek, J. & al., et. Agreement between expert ultrasound operators on the diagnosis of adnexal masses. Ultrasound Obstet Gynecol 41, 439–444 (2013). Salomon, L. J. & al., et. Recommendations and guidelines for training in obstetric ultrasound in europe. Ultrasound Obstet Gynecol 54, 815–817 (2019). European Federation of Societies for Ultrasound in Medicine and Biology. EFSUMB Position Paper: Recommendations for Basic Training Requirements in Ultrasound Practice . (2015). Van den Bosch, T., Dueholm, M., Leone, F. P. & al., et. Terms, definitions and measurements to describe sonographic features of the myometrium and uterine masses: A consensus opinion from the morphological uterus sonographic assessment (MUSA) group. Ultrasound Obstet Gynecol 46, 284–298 (2015). Baird, D. D., Dunson, D. 
B., Hill, M. C., Cousins, D. & Schectman, J. M. High cumulative incidence of uterine leiomyoma in black and white women: Ultrasound evidence. Am J Obstet Gynecol 188, 100–107 (2003). World Health Organization. Endometriosis Fact Sheet . (2023). Hudelist, G., Friedl, F., Traschler, J. & al., et. Diagnostic delay for endometriosis in austria and germany: Causes and possible consequences. Hum Reprod 27, 3412–3416 (2012). Giudice, L. C. Clinical practice. endometriosis. N Engl J Med 362, 2389–2398 (2010). Van Hanegem, N., Prins, M. M., Bongers, M. Y. & al., et. The accuracy of endometrial thickness measurement in the diagnosis of endometrial carcinoma in women with postmenopausal bleeding: A systematic review and meta-analysis. BJOG 123, 293–301 (2016). National Institute for Health and Care Excellence. Heavy Menstrual Bleeding: Assessment and Management . (2021). Denny, E. Women’s experiences of abnormal uterine bleeding and the help-seeking journey: A qualitative study. BJOG 121, 623–632 (2014). De Graaff, A. A., D’Hooghe, T. M., Dunselman, G. A. & al., et. The significant effect of endometriosis on physical, mental and social wellbeing: Results from an international cross-sectional survey. Hum Reprod 28, 2677–2685 (2013). Simoens, S., Dunselman, G., Dirkx, S. & al., et. The burden of endometriosis: Costs and quality of life of women with endometriosis and treated in referral centres. Hum Reprod 27, 1292–1299 (2012). Soliman, A. M., Yang, H., Du, E. X., Katz, N. P. & Castelli-Haley, J. The direct and indirect costs associated with endometriosis: A systematic literature review. Hum Reprod 32, 712–722 (2017). World Health Organization. Global Health Workforce Statistics Database . (2022). World Health Organization. Global Atlas of Medical Devices . (2017). International Atomic Energy Agency. Diagnostic Radiology Facilities Resources Database . (2014). Van den Bosch, T., Van Schoubroeck, D., Timmerman, D. & al., et. 
Terms, definitions and measurements to describe the sonographic features of the endometrium and intrauterine lesions: A consensus opinion from the international endometrial tumor analysis (IETA) group. Ultrasound in Obstetrics & Gynecology (2015) doi:10.1002/uog.14806. Andreotti, R. F. et al. O-RADS US risk stratification and management system: A consensus guideline from the ACR ovarian-adnexal reporting and data system committee. Radiology (2020) doi:10.1148/radiol.2020191150. Nunes, N. et al. The accuracy of artificial intelligence for the diagnosis of gynecological conditions: A systematic review. Hum Reprod Update 29, 60–73 (2023). Nunes, N. & Valentin, L. Artificial intelligence in gynecological ultrasound: Current state and future directions. Ultrasound Obstet Gynecol 58, 676–684 (2021). Liu, H. et al. Computer-aided diagnosis of ovarian tumors based on ultrasound images: A review. Computers in Biology and Medicine 109, 250–262 (2019). Grimpen, F. B. et al. Deep learning for ultrasound diagnosis of breast cancer: Systematic review and meta-analysis. Diagnostics 11, 2288 (2021). Ali, H. A. et al. Automated diagnosis of uterine fibroids in ultrasound images using deep learning techniques. Iraqi Journal for Electrical and Electronic Engineering 18, 76–84 (2022). Jafari, M., Karimi, N., Ayatollahi, A. & Johari, M. Computer-aided diagnosis of PCOS based on ultrasound image using CNN deep learning. Journal of Medical Systems 44, 1–11 (2020). Taha, A. A. & Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Medical Imaging 15, 29 (2015). Saito, T. & Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 10, e0118432 (2015). Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020).
Maier-Hein, L., Reinke, A., Kozubek, M., et al. Metrics reloaded: Recommendations for image analysis validation in biomedical imaging. Nature Methods 19, 375–394 (2022).
Additional Declarations
Competing interest reported. Lorna Brightmore, Natalie Getreu and Helen O’Neill are employees of Hertility Health Ltd. Helen O’Neill and Natalie Getreu are founders and employees of Hertility Health Ltd. The resulting model will be used by Hertility Health, but there are no financial or non-financial competing interests to declare for any authors.
Status: Under Review. Version 1 posted. Reviewers invited by journal: 28 Apr, 2026. Submission checks completed at journal: 06 Mar, 2026. First submitted to journal: 06 Mar, 2026.
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-7997734","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":631303453,"identity":"58178263-d3e5-4d12-b40b-37057bef72ef","order_by":0,"name":"George Adams","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABPElEQVRIie2RMWuDQBSA3yE4He160hB/QUERQgu2Gfs3ToRMkawZMgQEs5TMSob+BSHgfEUwQ4+6WtIh0LUBg6U4STVNhlKlHQv1g+PuPe7jvccDaGn507Dy0LGO9wGFi/2Npj8pKR8cFfI7BXlOeAybFfnGjnZvY5BPVg9mhsW4c76wo3QzIXA6Y6LkfldUHpmLDgfV41ZwhvEa954j06URAcKpKPk1ijvUBMkB5DMrEDAplWSoARXLxhIQpU2NcvdaKgX0/Xi7zLDyWCqjDGhBQG5QZII1tJuC4SeWL7mUVVUEMBwCSqXUNKbggSlAREwv2QYkZWY1iwbGnGCVG/ZlzfjyLAxRPtGv5rG1TI3iut9b2y8of9e73VV4/3RbU4UBCPiwhU8Ob9y0SLnMovxLitT9a2lpafnPfADMamw2Fnd+QQAAAABJRU5ErkJggg==","orcid":"","institution":"Imperial College London","correspondingAuthor":true,"prefix":"","firstName":"George","middleName":"","lastName":"Adams","suffix":""},{"id":631303459,"identity":"b3aa850e-9e30-4c9d-8fba-51e5404b1df1","order_by":1,"name":"Lorna Brightmore","email":"","orcid":"","institution":"Hertility Health","correspondingAuthor":false,"prefix":"","firstName":"Lorna","middleName":"","lastName":"Brightmore","suffix":""},{"id":631303460,"identity":"ea108a69-903b-4bf6-b14e-250d340f749e","order_by":2,"name":"Jennifer Barcroft","email":"","orcid":"","institution":"Imperial College
London","correspondingAuthor":false,"prefix":"","firstName":"Jennifer","middleName":"","lastName":"Barcroft","suffix":""},{"id":631303462,"identity":"3412f232-e7d3-4471-88bf-4e209755088b","order_by":3,"name":"Nina Cooper","email":"","orcid":"","institution":"Imperial College London","correspondingAuthor":false,"prefix":"","firstName":"Nina","middleName":"","lastName":"Cooper","suffix":""},{"id":631303463,"identity":"e0b57b9b-1c16-44f4-896d-a38cb5cefa13","order_by":4,"name":"Adrian Timpson","email":"","orcid":"","institution":"Hertility Health","correspondingAuthor":false,"prefix":"","firstName":"Adrian","middleName":"","lastName":"Timpson","suffix":""},{"id":631303465,"identity":"9d536757-07d7-49ac-98d3-69501ecba989","order_by":5,"name":"Srdjan Saso","email":"","orcid":"","institution":"Imperial College London","correspondingAuthor":false,"prefix":"","firstName":"Srdjan","middleName":"","lastName":"Saso","suffix":""},{"id":631303467,"identity":"3be8898a-540f-4955-a86c-24435dfa97de","order_by":6,"name":"Natalie Getreu","email":"","orcid":"","institution":"Hertility Health","correspondingAuthor":false,"prefix":"","firstName":"Natalie","middleName":"","lastName":"Getreu","suffix":""},{"id":631303468,"identity":"a89414b1-ca10-4d17-8068-80cdb2d66c1a","order_by":7,"name":"Helen C O’Neill","email":"","orcid":"","institution":"Hertility Health","correspondingAuthor":false,"prefix":"","firstName":"Helen","middleName":"C","lastName":"O’Neill","suffix":""}],"badges":[],"createdAt":"2025-10-31 11:08:13","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7997734/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7997734/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":108950613,"identity":"9bb39d1c-0b46-4911-9bef-46885c93eab0","added_by":"auto","created_at":"2026-05-11 07:06:39","extension":"jpg","order_by":1,"title":"Figure 
1","display":"","copyAsset":false,"role":"figure","size":52802,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFlowchart of the computer vision pipeline for automated pelvic ultrasound image analysis.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe pipeline ingests raw DICOM files and routes each image through a series of sequential analytical modules. In the first stage, a convolutional neural network classifies each image into one of five predefined categories: ovary, text, trans-abdominal uterus or [trans-vaginal] uterus view (transverse or sagittal, shown here as a single class for simplicity). Images classified as containing ovarian structures are then passed to a YOLOv5 object detection model, which localises ovary and follicle regions within the frame; this step is applied exclusively to ovary images to reduce the risk of misidentifying small pelvic fluid collections as follicles. In parallel, an optical character recognition module extracts laterality information (left or right ovary) from on-screen annotations, and a colour thresholding algorithm operating in HSV colour space detects the presence of Colour Doppler signals; images containing Doppler flow are flagged and excluded from downstream morphological analysis. A random forest biplane classifier identifies split-screen images in which the same ovary is captured in two orthogonal planes, enabling retrieval of three orthogonal diameter measurements required for volume estimation.\u003c/p\u003e\n\u003cp\u003eOvary and uterus images are subsequently processed by two dedicated semantic segmentation models deployed from the same endpoint via a routing switch. One model segments ovary parenchyma and follicle boundaries; the other segments the myometrium and endometrium. Separate models were adopted because ovarian and uterine structures rarely co-occur within the same image frame, and task-specific training improves segmentation fidelity for each tissue class. 
Segmentation contours are post-processed by ellipse fitting to extract primary and secondary axes, and pixel-to-centimetre scaling is derived from embedded DICOM metadata. The final output comprises structured tabular data containing image-level classifications, detected object coordinates, segmentation-derived contours, and quantitative morphological measurements including endometrial thickness, ovary volume, and uterus volume.\u003c/p\u003e","description":"","filename":"Picture1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7997734/v1/a7be5c943a1c0db748c715c7.jpg"},{"id":108950615,"identity":"8df95e0f-ff18-452c-bcf6-17cc3ba270c1","added_by":"auto","created_at":"2026-05-11 07:06:39","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":52964,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePerformance metrics across the sub-models of the ultrasound analysis pipeline.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Confusion matrix from the image classification convolutional neural network. The model was validated on a held-out dataset of 1,000 images and classified each image into one of five categories: OVARY, SA-UTE (sagittal uterus), TR-UTE (transverse uterus), TRANSA (transabdominal view), and TEXT. True labels are shown on the y-axis and predicted labels on the x-axis. Classification performance was high across most categories. Misclassifications were most frequent between the SA-UTE and TR-UTE classes, with 43 sagittal images incorrectly predicted as transverse and 2 transverse images misclassified as sagittal. The TEXT class achieved near-perfect discrimination with a single misclassification from 164 examples. Class imbalance, most pronounced for the OVARY category (507 images), was addressed during training using class-specific loss weighting.\u003c/p\u003e\n\u003cp\u003e(B) Receiver operating characteristic (ROC) curves for each class from the image classification model. 
Area under the curve (AUC) values were: OVARY 0.98, SA-UTE 0.97, TR-UTE 0.90, TRANSA 0.98, and TEXT 1.00, demonstrating strong discriminative performance across all categories with a modest reduction for the TR-UTE class.\u003c/p\u003e\n\u003cp\u003e(C) ROC curve for the random forest biplane classifier. This model, trained to identify split-screen ovarian images in which the ovary is visualised in two orthogonal planes, was trained on 2,000 images and evaluated on an independent test set of 864 images, achieving an AUC of 0.96.\u003c/p\u003e\n\u003cp\u003e(D) Precision-recall curves for the YOLOv5 object detection model. The detector was trained on 5,496 annotated ovary images and evaluated on a held-out test set of 550 images containing 25,942 annotated follicle regions and 6,896 annotated ovary regions. Precision-recall AUC values were 0.980 for follicles and 0.777 for ovaries.\u003c/p\u003e\n\u003cp\u003e(E) Dice similarity coefficient (DSC) over training epochs for the two segmentation models. DSC trajectories are shown for both training and test partitions for the ovary-follicle model (OVA-FOL) and the uterus-endometrium model (UTE-END). By the final training epoch, the UTE-END model achieved a test DSC of 0.899 and the OVA-FOL model achieved a test DSC of 0.856. The two models were trained on a combined set of 807 annotated images: 408 uterus images for the UTE-END model and 399 ovary images for the OVA-FOL model.\u003c/p\u003e\n\u003cp\u003eMean validation DSC by tissue class from the two segmentation models. Uterus and ovary parenchyma achieved validation DSC values of 0.918 and 0.834, respectively. 
Endometrium and follicle classes yielded more moderate scores of 0.727 and 0.634, reflecting the greater morphological variability and smaller size of these structures.\u003c/p\u003e","description":"","filename":"Picture2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7997734/v1/2b76cd93a1b3e15a9f79ac6d.jpg"},{"id":108978311,"identity":"784252dd-9e32-4201-be20-616c4b79a0c2","added_by":"auto","created_at":"2026-05-11 11:36:09","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":111908,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eRepresentative outputs of object detection, segmentation, and morphological measurement extraction from pelvic ultrasound images.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Bounding box and segmentation output for a transvaginal uterus image in transverse view. Model-predicted bounding boxes are superimposed on the raw ultrasound image: the uterus is enclosed by a blue box (indicated by a filled blue arrow) and the endometrium by an orange box (indicated by a filled orange arrow). Within each bounding box, segmentation contours are rendered as discrete points: magenta points delineate the uterine boundary (open magenta arrow) and orange points delineate the endometrial contour.\u003c/p\u003e\n\u003cp\u003e(B) Biplane ovary image shown across three sequential processing stages. Three vertically arranged sub-panels display the same split-screen ovary image at successive stages of analysis. The uppermost sub-panel (labelled “baseline” in the lower-left corner) shows the unprocessed ultrasound image with the original sonographer annotations and no model outputs. The middle sub-panel (labelled “with contours”) overlays model-derived segmentation contours for the ovary boundary (blue contour, open blue arrow) and individual follicles (yellow contours, open yellow arrows). 
The lowermost sub-panel (labelled “with ellipses”) displays the same segmentation contours with fitted ellipses superimposed for both the ovary and follicles; ellipse axes are indicated by bidirectional pale-yellow or white arrows representing the extracted morphological parameters used for volume estimation.\u003c/p\u003e\n\u003cp\u003e(C) Contour and ellipse fitting for a transvaginal uterus image in transverse view. Segmentation outputs are displayed for the uterus (magenta outline) and endometrium (cyan outline), with ellipses fitted to each structure and their major and minor axes indicated by bidirectional arrows. Open magenta and cyan arrows identify both the contours and their corresponding fitted ellipses.\u003c/p\u003e\n\u003cp\u003e(D) Contour and ellipse fitting for the same uterus shown in panel C, visualised in sagittal view. Segmentation and ellipse fitting are applied identically, demonstrating consistent model performance across orthogonal imaging planes for a morphologically complex anatomical structure.\u003c/p\u003e","description":"","filename":"Picture3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7997734/v1/baf3b097e4f35386c6f981e2.jpg"},{"id":108950616,"identity":"c2101f20-3304-403e-9c31-f4f96bb55f23","added_by":"auto","created_at":"2026-05-11 07:06:39","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1399339,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eConcordance between manual-caliper, manual-contour, and model-derived measurements for endometrial thickness, ovary volume, and uterus volume in 100 patients.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eColumns correspond to endometrial thickness, ovary volume, and uterus volume. Rows 1 to 3 present pairwise scatter plots for manual-caliper versus manual-contour, manual-contour versus model, and manual-caliper versus model comparisons, respectively. 
Each scatter plot includes the line of identity (grey dashed line), a linear regression fit (orange line), and the Pearson correlation coefficient (r), Lin’s concordance correlation coefficient (CCC), and sample size (n). The bottom row summarises Pearson r and CCC values for each pairwise comparison as grouped bar plots. Endometrial thickness is reported in millimetres; ovary and uterus volumes are reported in cubic centimetres.\u003c/p\u003e\n\u003cp\u003eAgreement between manual-caliper and manual-contour measurements was strongest for uterus volume (r = 0.93, CCC = 0.75, n = 56), moderate for endometrial thickness (r = 0.64, CCC = 0.60, n = 69) and ovary volume (r = 0.60, CCC = 0.60, n = 55), establishing a reference estimate of inter-method variability between the two human-derived approaches. Model-derived measurements showed moderate concordance with manual-contour references for ovary volume (r = 0.64, CCC = 0.56, n = 33), with lower agreement for uterus volume (r = 0.52, CCC = 0.41, n = 24) and endometrial thickness (r = 0.58, CCC = 0.27, n = 67). When compared against manual-caliper measurements, the model achieved its strongest agreement for endometrial thickness (r = 0.69, CCC = 0.53, n = 87), with lower concordance for uterus volume (r = 0.47, CCC = 0.36, n = 37) and ovary volume (r = 0.36, CCC = 0.33, n = 38). 
Collectively, these findings indicate that model-derived endometrial measurements aligned more closely with manual-caliper than with manual-contour references, ovary volume showed better agreement with manual-contour than with manual-caliper measurements, and uterus volume displayed the highest consistency between the two manual reference methods, with automated estimates falling below the level of agreement observed between human annotators for all three parameters.\u003c/p\u003e","description":"","filename":"Picture4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7997734/v1/cf7ac5f85c106a996b8a9c24.jpg"},{"id":108979921,"identity":"09a1dbbd-6c3d-4d3a-80ca-91243663d4fd","added_by":"auto","created_at":"2026-05-11 12:02:19","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1863304,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7997734/v1/e11d80f9-ac09-45eb-b910-37b0157b2240.pdf"}],"financialInterests":"Competing interest reported. Lorna Brightmore, Natalie Getreu and Helen O’Neill are employees of Hertility Health Ltd. Helen O’Neill and Natalie Getreu are founders and employees of Hertility Health Ltd. . 
The resulting model will be used by Hertility Health, but there are no financial or non-financial competing interests to declare for any authors.","formattedTitle":"A modular deep learning pipeline for standardised analysis of pelvic ultrasound images for gynaecology","fulltext":[{"header":"Introduction","content":"\u003cp\u003eUltrasound is the cornerstone imaging modality for evaluating the female reproductive tract, playing a central role in the detection and monitoring of a wide spectrum of gynaecological conditions, from abnormal uterine bleeding to malignancy\u003csup\u003e\u003cspan additionalcitationids=\"CR9\" citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. The female pelvis presents a uniquely demanding imaging environment: the reproductive organs are anatomically complex, highly variable between individuals, and change substantially in size, morphology, and echogenicity across the menstrual cycle and with age. These properties make pelvic ultrasound one of the most technically and interpretatively challenging domains of diagnostic imaging, and despite its widespread clinical use, image interpretation remains highly operator-dependent, resulting in substantial inter-observer variability\u003csup\u003e\u003cspan additionalcitationids=\"CR12 CR13 CR14 CR15\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. 
This variability undermines diagnostic reproducibility and accuracy, contributing to delayed or missed diagnoses of prevalent conditions such as uterine fibroids and endometriosis\u003csup\u003e\u003cspan additionalcitationids=\"CR18 CR19 CR20 CR21 CR22\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. The clinical consequences are significant, encompassing diminished quality of life, increased healthcare costs, and diagnostic inequities across patient populations\u003csup\u003e\u003cspan additionalcitationids=\"CR25\" citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThese challenges are compounded by a global shortage of trained sonographers, which limits access to timely and high-quality imaging, particularly in resource-constrained settings\u003csup\u003e\u003cspan additionalcitationids=\"CR28\" citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. Evidence-based reporting frameworks including IOTA, IETA, MUSA, and O-RADS have substantially improved standardisation and consistency in ultrasound interpretation, particularly for the characterisation of malignant pathology\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e,\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. 
However, even within these frameworks, the subjective placement of calipers and the manual delineation of anatomical boundaries introduce measurement variability that is inherent to the imaging domain itself, and that persists even among experienced practitioners applying the same methodology to the same images.\u003c/p\u003e \u003cp\u003eArtificial intelligence offers a compelling opportunity to complement these frameworks by enabling automated, reproducible extraction of clinically relevant morphological information from ultrasound DICOM images at scale\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e,\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. By removing dependence on individual operator judgement for discrete measurement tasks, AI-driven pipelines can improve consistency, facilitate structured data collection, and extend standardised analysis to settings where specialist expertise is limited. Realising this potential in women\u0026rsquo;s health imaging, however, requires solutions that are explicitly designed for the complexity of pelvic anatomy: capturing the structural diversity of the uterus, ovaries, and endometrium across individuals and physiological states demands models that are robust to high intra- and inter-individual variability, ambiguous tissue boundaries, and the full range of acquisition conditions encountered in routine clinical practice.\u003c/p\u003e \u003cp\u003eHere, we present a comprehensive, modular AI pipeline for automated analysis of pelvic ultrasound DICOM images, designed to perform image classification, anatomical localisation, semantic segmentation, and quantitative measurement extraction in a generalisable and scalable manner. 
In contrast to prior end-to-end approaches focused on specific pathologies such as ovarian tumours, uterine fibroids, or polycystic ovary syndrome\u003csup\u003e\u003cspan additionalcitationids=\"CR35 CR36\" citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e, our pipeline decomposes the complex task of ultrasound interpretation into a sequence of purpose-trained, interoperable modules. We evaluate the pipeline both at the level of individual components and in an end-to-end concordance analysis against manual reference measurements in 100 patients, providing a realistic assessment of what automated analysis can achieve when applied to the full complexity of real-world pelvic ultrasound data. The findings offer both a practical demonstration of current capabilities and a clear characterisation of the challenges that remain in bringing fully automated reproductive imaging to clinical practice.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eData\u003c/h2\u003e \u003cp\u003eThis was a retrospective study of anonymised pelvic ultrasound DICOM images and associated demographic data, collected from individuals who completed an online health assessment and underwent diagnostic imaging through Hertility Health Ltd. At the time of participation, all individuals provided informed consent for their health and imaging data to be used in anonymised form for research purposes. Consent was obtained electronically via an opt-in process embedded within the health assessment, with clear information provided regarding data usage, privacy, and confidentiality.\u003c/p\u003e \u003cp\u003eAll DICOM files and associated metadata were fully anonymised prior to access by the research team. 
No identifiable information, including patient names, dates of birth, or imaging timestamps, was retained in the dataset, and no re-identification was possible at any stage of analysis. The research protocol underpinning the online health assessment, testing pathway, and data governance processes was reviewed and approved by the London-Surrey Research Ethics Committee, part of the UK National Health Service (NHS) Health Research Authority (REC reference: 20/LO/0265).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003ePipeline overview\u003c/h3\u003e\n\u003cp\u003eWe developed an end-to-end computational pipeline to extract structured quantitative information from raw pelvic ultrasound DICOM images acquired for women\u0026rsquo;s health assessments. The pipeline comprises five main stages: image content categorisation, text and Colour Doppler detection, biplane image identification, anatomical object localisation, and segmentation-based measurement extraction. The primary clinical outputs of the pipeline are endometrial thickness, ovary volume, and uterus volume.\u003c/p\u003e\n\u003ch3\u003eImage categorisation\u003c/h3\u003e\n\u003cp\u003eEach raw ultrasound image was preprocessed and resized to a fixed dimension of 150 x 150 pixels with three colour channels. To categorise image content, we trained a convolutional neural network (CNN) using PyTorch\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. The model used a ResNet-50 backbone pretrained on ImageNet\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e, with the original fully connected layer replaced by a custom multi-layer head incorporating a linear layer, ReLU activation, and dropout regularisation to improve generalisation. 
The network classified images into five predefined categories: ovary, embedded text, trans-abdominal uterus, trans-vaginal uterus (transverse view), and trans-vaginal uterus (sagittal view).\u003c/p\u003e \u003cp\u003eTraining data were augmented using random rotations, flips, and affine transformations (probability\u0026thinsp;=\u0026thinsp;0.5). Class imbalance was addressed using class-specific weighting within the cross-entropy loss function. An Adam optimiser with weight decay was used to minimise training loss.\u003c/p\u003e\n\u003ch3\u003eText and Doppler detection\u003c/h3\u003e\n\u003cp\u003eImages classified as containing ovarian structures were further analysed to determine laterality and the presence of Colour Doppler signals. To extract text indicating left or right ovary, an optical character recognition (OCR) system was implemented using the PaddleOCR toolkit\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Doppler detection was achieved by thresholding characteristic red and blue pixel intensities in HSV colour space using OpenCV\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Images in which Colour Doppler flow was detected were flagged and excluded from downstream morphological analysis, as Doppler overlays corrupt the greyscale pixel intensities used by the segmentation models. Both laterality and Doppler status were recorded as image-level metadata and propagated through the remainder of the pipeline.\u003c/p\u003e\n\u003ch3\u003eBiplane image classification\u003c/h3\u003e\n\u003cp\u003eEstimation of ovary and uterus volumes requires three orthogonal diameter measurements: height, width, and depth. To obtain these, split-screen images capturing the same organ in two orthogonal planes (biplane images) were identified automatically using a random forest classifier trained on 2,000 annotated images. 
Candidate organ pairs across the two planes were matched by spatial proximity and relative size. This step was applied to both ovary and uterus image categories to enable volumetric estimation downstream.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eObject detection\u003c/h2\u003e \u003cp\u003eTo localise ovarian structures, we trained a YOLOv5 object detection network\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e applied exclusively to images classified as containing ovarian tissue. Restricting object detection to this image subset was a deliberate design choice to prevent misclassification of small pelvic fluid collections as follicles, which was observed when the detector was applied more broadly. The model was trained to detect two object classes: ovary regions and follicles. Training data comprised 5,496 annotated images containing 25,942 annotated follicle regions and 6,896 annotated ovary regions. Training images and corresponding bounding boxes were augmented with random horizontal and vertical flips, affine transformations, and HSV colour shifts to enhance robustness. The YOLOv5 implementation and training pipeline followed the open-source Ultralytics repository.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eImage segmentation\u003c/h3\u003e\n\u003cp\u003eDetected ovarian regions and all uterus images were processed by two dedicated semantic segmentation models, each implemented in PyTorch using a U-Net architecture with a ResNet-34 encoder pretrained on ImageNet\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e,\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e. One model was trained to segment ovary parenchyma and follicle boundaries; the second was trained to segment the myometrium and endometrium. 
Two separate models were used in preference to a single multi-class model because ovarian and uterine structures are rarely present within the same image frame, and task-specific training was found to improve segmentation fidelity for each tissue class. Both models were deployed from a shared Amazon SageMaker endpoint using a routing switch that directed each image to the appropriate model based on its classification label.\u003c/p\u003e \u003cp\u003eThe two models were trained on a combined annotated dataset of 807 images: 399 ovary images for the ovary-follicle model and 408 uterus images for the uterus-endometrium model. Training employed a hybrid loss function combining binary cross-entropy and Dice losses, with an AdamW optimiser and a cosine annealing learning rate scheduler. Model performance was monitored by validation loss and Dice similarity coefficient on held-out data.\u003c/p\u003e\n\u003ch3\u003eModel evaluation and performance metrics\u003c/h3\u003e\n\u003cp\u003eModel performance was evaluated using standard metrics for medical image analysis\u003csup\u003e\u003cspan additionalcitationids=\"CR39 CR40\" citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e. Discrimination for classification and detection models was assessed using the area under the receiver operating characteristic curve (AUC), with sensitivity, specificity, and F1 scores calculated at optimal thresholds. Object detection accuracy was further quantified by average precision (AP) and mean average precision (mAP) at an intersection-over-union threshold of 0.5. For segmentation, agreement with expert annotations was measured using the Dice similarity coefficient. 
All metrics were computed using scikit-learn (v1.3) and PyTorch utilities, following current best-practice recommendations for validation in biomedical imaging\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eQuantitative measurement extraction\u003c/h2\u003e \u003cp\u003eSegmented masks were post-processed to extract anatomical contours. Ellipse fitting was applied to each contour to estimate the primary and secondary axes and orientation angle, from which uterine length, uterine width, endometrial thickness, and maximum diameters of the left and right ovaries were derived. All measurements were converted from pixels to centimetres using the pixel spacing values embedded in the DICOM metadata of each source image. For ovary and uterus volume estimation, three orthogonal diameter measurements were obtained by pairing biplane images identified by the random forest classifier, with organ pairs matched between planes by spatial proximity and relative size. Volumes were calculated from the three diameters using the standard prolate ellipsoid formula.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eConcordance analysis of manual and automated measurements\u003c/h2\u003e \u003cp\u003eAgreement between automated pipeline-derived measurements and manual reference measurements was assessed in R for three clinically relevant parameters: endometrial thickness, ovary volume, and uterus volume, in a cohort of 100 patients who had undergone pelvic ultrasound examinations. Automated model-derived outputs were compared against two manual reference approaches: (i) the original manual-caliper measurements recorded during clinical scan review and (ii) measurements derived from manually drawn contours applied to the same images. 
Agreement between the two manual approaches was also quantified to provide a reference estimate of consistency between human-derived measurement methods.\u003c/p\u003e \u003cp\u003eFor endometrial thickness, comparisons used the original manual-caliper thickness values and the corresponding manually derived contour-based measurements. For ovarian volume, left and right ovaries were matched separately across datasets and then pooled to generate a combined ovary-volume analysis. For uterine volume, paired measurements were matched at the case level across the manual-caliper, manual-contour, and automated outputs. Automated measurements were extracted from segmentation-derived contour summaries and exported as structured tabular outputs prior to statistical comparison.\u003c/p\u003e \u003cp\u003eConcordance was quantified using Pearson\u0026rsquo;s correlation coefficient (r) to assess linear association and Lin\u0026rsquo;s concordance correlation coefficient (CCC) to assess agreement relative to the line of identity, capturing both correlation and systematic bias between paired measurements.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eReliable and standardised interpretation of pelvic ultrasound is fundamental to the diagnosis and management of a wide range of gynaecological conditions that collectively affect a substantial proportion of women across their reproductive lives. Yet pelvic ultrasound remains one of the most operator-dependent domains of diagnostic imaging, with downstream measurement and interpretation highly susceptible to inter-observer variability\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e,\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. 
Evidence-based frameworks including IOTA, IETA, and MUSA, alongside structured reporting systems such as O-RADS, have advanced consistency in image interpretation and reduced subjectivity, particularly for the assessment of malignant pathology\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e,\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. AI-driven approaches offer a complementary strategy to further enhance reproducibility and standardisation at scale, and represent an important direction for progress in women\u0026rsquo;s health imaging\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e,\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIn this study, we present a comprehensive, modular AI pipeline for automated analysis of pelvic ultrasound DICOM images, designed to perform classification, localisation, segmentation, and measurement extraction in a generalisable and scalable manner. Unlike previous single-task approaches focused on specific pathologies\u003csup\u003e\u003cspan additionalcitationids=\"CR35 CR36\" citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e, our pipeline decomposes the complex task of ultrasound interpretation into a sequence of interoperable modules, each designed to address a distinct analytical challenge. 
This modular architecture improves interpretability through the extraction of discrete, clinically relevant features, enhances adaptability across imaging protocols and clinical contexts, and simplifies iterative refinement as clinical needs evolve.\u003c/p\u003e \u003cp\u003eThe models were trained on a diverse, unfiltered dataset of real-world pelvic ultrasound images sourced from Hertility Health, encompassing the broad spectrum of normal and pathological appearances encountered in women aged 20 to 60 years. Unlike research datasets enriched for single conditions or idealised acquisition quality, this dataset reflects the genuine complexity and heterogeneity of clinical practice, including anatomical variation, imaging artefacts, and suboptimal acquisition settings, and as such provides a demanding but ecologically valid training environment.\u003c/p\u003e \u003cp\u003eIndividual pipeline components demonstrated strong performance across all analytical stages. The classification model achieved high discriminative accuracy across five clinically relevant ultrasound views (AUC 0.90 to 1.00). The YOLOv5 object detection model, trained on 5,496 annotated images encompassing over 25,000 follicle and 6,800 ovary regions, showed strong follicle detection (AUC\u0026thinsp;=\u0026thinsp;0.980) and good ovary localisation (AUC\u0026thinsp;=\u0026thinsp;0.777). Segmentation of uterine and ovarian structures yielded robust Dice similarity coefficients of 0.899 and 0.856 for the two dedicated models, with parenchymal structures achieving validation scores of up to 0.918. 
These results are consistent with, and in many cases exceed, comparable published benchmarks for automated pelvic ultrasound analysis.\u003c/p\u003e \u003cp\u003eEnd-to-end evaluation in a cohort of 100 patients demonstrated that the pipeline produces automated measurements of endometrial thickness, ovary volume, and uterus volume that correlate meaningfully with both manual-caliper and manual-contour reference standards. A key contextual finding is that agreement between the two manual reference methods was itself moderate for several parameters, with Pearson r values of 0.60 for ovary volume and 0.64 for endometrial thickness between caliper and contour approaches applied to the same images. This inter-method variability between human annotators is an important reference point: it reflects the inherent difficulty of defining and measuring structures in pelvic ultrasound, and illustrates that even experienced practitioners making measurements from identical images will not always agree. The reproductive tract presents particular challenges in this regard, with structures that vary substantially in size, echogenicity, and boundary definition across individuals and across the menstrual cycle, such that no single measurement approach is immune to subjectivity.\u003c/p\u003e \u003cp\u003eAgainst this backdrop, the automated pipeline performed most strongly for endometrial thickness, where model agreement with manual-caliper measurements (r\u0026thinsp;=\u0026thinsp;0.69, CCC\u0026thinsp;=\u0026thinsp;0.53) approached the level of agreement between the two manual methods themselves. Ovary volume showed comparable concordance with manual-contour references (r\u0026thinsp;=\u0026thinsp;0.64, CCC\u0026thinsp;=\u0026thinsp;0.56) to that observed between the two human approaches, consistent with the segmentation-based nature of the automated method. 
Uterus volume, which requires accurate biplane identification, segmentation across orthogonal planes, ellipse fitting, and organ matching before a volume can be derived, was the most demanding parameter for end-to-end automation, with moderate agreement against both reference standards. Taken together, these patterns indicate that automated measurement quality is closely tied to the number and complexity of upstream processing steps, rather than to any single model\u0026rsquo;s performance in isolation.\u003c/p\u003e \u003cp\u003eThis technical advance addresses persistent challenges in women\u0026rsquo;s health imaging by reducing operator dependence and improving consistency and reproducibility of morphological measurement. Critically, by processing de-identified DICOM images from any acquisition site, the pipeline offers a scalable and hardware-agnostic solution that could extend standardised measurement capability to resource-limited settings and help mitigate inequities in access to high-quality gynaecological imaging. It also supports longitudinal monitoring of conditions such as uterine fibroids, endometriosis, and adnexal pathology that require consistent, measurement-based tracking over time\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e,\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e,\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e. By extracting structured phenotypic data at scale, the system further provides a foundation for digital biomarker discovery, genotype-phenotype mapping, and large-scale reproductive health research\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e,\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eSeveral limitations merit consideration. 
The training dataset, though representative of real-world clinical imaging, primarily reflects women aged 20 to 60 years and may under-represent rare pathologies or post-menopausal anatomy. Performance remains contingent on input image quality, and the ellipsoidal volume approximation introduces a degree of geometric simplification that is shared with conventional caliper-based measurement but is not eliminated by automation. Prospective validation across broader demographic groups, hardware environments, and clinical workflows is needed, and ensuring equitable performance will require continued expansion of training data across age, ethnicity, and body habitus strata.\u003c/p\u003e \u003cp\u003eIn conclusion, this modular AI pipeline provides a reproducible and scalable foundation for automated morphological measurement in pelvic ultrasound, with performance that compares favourably with inter-method variability between human approaches. The findings demonstrate what can be achieved by applying carefully designed, task-specific models to a domain as complex and variable as women\u0026rsquo;s pelvic anatomy, and point clearly to the directions in which further development is most needed. By complementing established evidence-based frameworks with automated, operator-independent measurement, the pipeline advances the goal of standardised, high-quality imaging assessment as a cornerstone of women\u0026rsquo;s health care.\u003c/p\u003e "},{"header":"Declarations","content":"\u003cp\u003eEthical considerations\u003c/p\u003e\n\u003cp\u003eThis work was undertaken as a partnership with Hertility Health and the Zinc Innovation Fellowship, funded with an NIHR grant. 
The resulting model will be used by Hertility Health, but there are no financial or non-financial competing interests.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003cp\u003eLorna Brightmore is an employee of Hertility Health Ltd. Helen O\u0026rsquo;Neill and Natalie Getreu are founders and employees of Hertility Health Ltd. The resulting model will be used by Hertility Health, but there are no financial or non-financial competing interests to declare for any authors.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eFunding:\u003c/h2\u003e \u003cp\u003e \u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eNo additional funding was used for this work.\u003c/span\u003e \u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eGA and LB designed the framework and code and wrote the manuscript; GA also performed the data labelling, model training and validation. Review, advice and direction on the project, as well as review of the manuscript, were provided by HON, JB, AT, SS, NG and NC.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThis work was undertaken as a partnership with Hertility Health and the National Institute for Health Research (NIHR)/Zinc Innovation Fellowship. We also acknowledge the National Institute for Health Research (NIHR) Biomedical Research Centre based at Imperial College Healthcare NHS Trust and Imperial College London. The opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NHS or the NIHR or Zinc.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated and/or analysed during the current study are not publicly available. Access to the data is restricted to authorised researchers under data sharing agreements and governed by ethical oversight. 
Requests for access to de-identified data for research purposes may be considered on a case-by-case basis and should be directed to the corresponding author and Hertility Health. The image data used in this study are not publicly available due to privacy concerns and study-specific data sharing agreements that prohibit further sharing.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eHe, K., Zhang, X., Ren, S. \u0026amp; Sun, J. Deep residual learning for image recognition. in 770\u0026ndash;778 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaszke, A. \u003cem\u003eet al.\u003c/em\u003e PyTorch: An imperative style, high-performance deep learning library. \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e 32, (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDu, Y. PaddleOCR: An ultra-lightweight OCR system. (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBradski, G. The OpenCV library. \u003cem\u003eDr. Dobb\u0026rsquo;s Journal of Software Tools\u003c/em\u003e (2000).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJocher, G. YOLOv5 by Ultralytics. (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRonneberger, O., Fischer, P. \u0026amp; Brox, T. U-net: Convolutional networks for biomedical image segmentation. in 234\u0026ndash;241 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIglovikov, V. I. \u0026amp; Shvets, A. A. TernausNet: U-net with VGG11 encoder pre-trained on ImageNet for image segmentation. (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEpstein, E. \u0026amp; Valentin, L. Interobserver agreement in measurements of endometrial thickness and volume. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 20, 486\u0026ndash;492 (2002).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan den Bosch, T. \u0026amp; Dueholm, M. 
Ultrasound for the diagnosis of adenomyosis: A systematic review and meta-analysis. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 53, 552\u0026ndash;560 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlcazar, J. L. Transvaginal sonographic diagnosis of endometrial cancer in women with postmenopausal bleeding: A meta-analysis. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 30, 519\u0026ndash;525 (2007).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaijser, J., Van Belle, V., Jensen, A. \u003cem\u003eet al.\u003c/em\u003e Observer agreement between ultrasound experts on features used to classify adnexal masses. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 44, 89\u0026ndash;95 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTimmerman, D., Schwarzler, P. \u003cem\u003eet al.\u003c/em\u003e Subjective assessment of adnexal masses with the use of ultrasonography: An analysis of interobserver variability and experience. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 25, 110\u0026ndash;114 (2005).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaijser, J., Van Belle, V., Yazbek, J. \u003cem\u003eet al.\u003c/em\u003e Agreement between expert ultrasound operators on the diagnosis of adnexal masses. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 41, 439\u0026ndash;444 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSalomon, L. J. \u003cem\u003eet al.\u003c/em\u003e Recommendations and guidelines for training in obstetric ultrasound in Europe. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 54, 815\u0026ndash;817 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEuropean Federation of Societies for Ultrasound in Medicine and Biology. \u003cem\u003eEFSUMB Position Paper: Recommendations for Basic Training Requirements in Ultrasound Practice\u003c/em\u003e. 
(2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan den Bosch, T., Dueholm, M., Leone, F. P. 
\u003cem\u003eet al.\u003c/em\u003e Terms, definitions and measurements to describe sonographic features of the myometrium and uterine masses: A consensus opinion from the Morphological Uterus Sonographic Assessment (MUSA) group. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 46, 284\u0026ndash;298 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBaird, D. D., Dunson, D. B., Hill, M. C., Cousins, D. \u0026amp; Schectman, J. M. High cumulative incidence of uterine leiomyoma in black and white women: Ultrasound evidence. \u003cem\u003eAm J Obstet Gynecol\u003c/em\u003e 188, 100\u0026ndash;107 (2003).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWorld Health Organization. \u003cem\u003eEndometriosis Fact Sheet\u003c/em\u003e. (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHudelist, G., Friedl, F., Traschler, J. \u003cem\u003eet al.\u003c/em\u003e Diagnostic delay for endometriosis in Austria and Germany: Causes and possible consequences. \u003cem\u003eHum Reprod\u003c/em\u003e 27, 3412\u0026ndash;3416 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGiudice, L. C. Clinical practice. Endometriosis. \u003cem\u003eN Engl J Med\u003c/em\u003e 362, 2389\u0026ndash;2398 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan Hanegem, N., Prins, M. M., Bongers, M. Y. \u003cem\u003eet al.\u003c/em\u003e The accuracy of endometrial thickness measurement in the diagnosis of endometrial carcinoma in women with postmenopausal bleeding: A systematic review and meta-analysis. \u003cem\u003eBJOG\u003c/em\u003e 123, 293\u0026ndash;301 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNational Institute for Health and Care Excellence. \u003cem\u003eHeavy Menstrual Bleeding: Assessment and Management\u003c/em\u003e. (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDenny, E. 
Women\u0026rsquo;s experiences of abnormal uterine bleeding and the help-seeking journey: A qualitative study. 
\u003cem\u003eBJOG\u003c/em\u003e 121, 623\u0026ndash;632 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDe Graaff, A. A., D\u0026rsquo;Hooghe, T. M., Dunselman, G. A. \u003cem\u003eet al.\u003c/em\u003e The significant effect of endometriosis on physical, mental and social wellbeing: Results from an international cross-sectional survey. \u003cem\u003eHum Reprod\u003c/em\u003e 28, 2677\u0026ndash;2685 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSimoens, S., Dunselman, G., Dirkx, S. \u003cem\u003eet al.\u003c/em\u003e The burden of endometriosis: Costs and quality of life of women with endometriosis and treated in referral centres. \u003cem\u003eHum Reprod\u003c/em\u003e 27, 1292\u0026ndash;1299 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSoliman, A. M., Yang, H., Du, E. X., Katz, N. P. \u0026amp; Castelli-Haley, J. The direct and indirect costs associated with endometriosis: A systematic literature review. \u003cem\u003eHum Reprod\u003c/em\u003e 32, 712\u0026ndash;722 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWorld Health Organization. \u003cem\u003eGlobal Health Workforce Statistics Database\u003c/em\u003e. (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWorld Health Organization. \u003cem\u003eGlobal Atlas of Medical Devices\u003c/em\u003e. (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInternational Atomic Energy Agency. \u003cem\u003eDiagnostic Radiology Facilities Resources Database\u003c/em\u003e. (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan den Bosch, T., Van Schoubroeck, D., Timmerman, D. \u003cem\u003eet al.\u003c/em\u003e Terms, definitions and measurements to describe the sonographic features of the endometrium and intrauterine lesions: A consensus opinion from the International Endometrial Tumor Analysis (IETA) group. 
\u003cem\u003eUltrasound in Obstetrics \u0026amp; Gynecology\u003c/em\u003e (2015) doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/uog.14806\u003c/span\u003e\u003cspan address=\"10.1002/uog.14806\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAndreotti, R. F. \u003cem\u003eet al.\u003c/em\u003e O-RADS US risk stratification and management system: A consensus guideline from the ACR Ovarian-Adnexal Reporting and Data System Committee. \u003cem\u003eRadiology\u003c/em\u003e (2020) doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1148/radiol.2020191150\u003c/span\u003e\u003cspan address=\"10.1148/radiol.2020191150\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNunes, N. \u003cem\u003eet al.\u003c/em\u003e The accuracy of artificial intelligence for the diagnosis of gynecological conditions: A systematic review. \u003cem\u003eHum Reprod Update\u003c/em\u003e 29, 60\u0026ndash;73 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNunes, N. \u0026amp; Valentin, L. Artificial intelligence in gynecological ultrasound: Current state and future directions. \u003cem\u003eUltrasound Obstet Gynecol\u003c/em\u003e 58, 676\u0026ndash;684 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, H. \u003cem\u003eet al.\u003c/em\u003e Computer-aided diagnosis of ovarian tumors based on ultrasound images: A review. \u003cem\u003eComputers in Biology and Medicine\u003c/em\u003e 109, 250\u0026ndash;262 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrimpen, F. B. \u003cem\u003eet al.\u003c/em\u003e Deep learning for ultrasound diagnosis of breast cancer: Systematic review and meta-analysis. 
\u003cem\u003eDiagnostics\u003c/em\u003e 11, 2288 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAli, H. A. \u003cem\u003eet al.\u003c/em\u003e Automated diagnosis of uterine fibroids in ultrasound images using deep learning techniques. \u003cem\u003eIraqi Journal for Electrical and Electronic Engineering\u003c/em\u003e 18, 76\u0026ndash;84 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJafari, M., Karimi, N., Ayatollahi, A. \u0026amp; Johari, M. Computer-aided diagnosis of PCOS based on ultrasound image using CNN deep learning. \u003cem\u003eJournal of Medical Systems\u003c/em\u003e 44, 1\u0026ndash;11 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTaha, A. A. \u0026amp; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. \u003cem\u003eBMC Medical Imaging\u003c/em\u003e 15, 29 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaito, T. \u0026amp; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. \u003cem\u003ePLOS ONE\u003c/em\u003e 10, e0118432 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChicco, D. \u0026amp; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. \u003cem\u003eBMC Genomics\u003c/em\u003e 21, 6 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaier-Hein, L., Reinke, A., Kozubek, M., \u003cem\u003eet al.\u003c/em\u003e Metrics reloaded: Recommendations for image analysis validation in biomedical imaging. 
\u003cem\u003eNature Methods\u003c/em\u003e 19, 375\u0026ndash;394 (2022).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"journal":{"identity":"npj-womens-health","title":"npj Women's Health"},"lastPublishedDoi":"10.21203/rs.3.rs-7997734/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7997734/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003ePelvic ultrasound is the principal imaging modality for evaluating the female reproductive tract, yet its clinical utility is constrained by substantial inter-observer variability and dependence on operator expertise. 
The anatomical complexity and physiological variability of the female pelvis present a uniquely demanding environment for automated image analysis: the uterus, ovaries, and endometrium differ markedly between individuals and change continuously across the menstrual cycle, making consistent, reproducible measurement a persistent challenge even for experienced practitioners.\u003c/p\u003e \u003cp\u003eTo address these limitations, we present a modular artificial intelligence pipeline for automated extraction of quantitative morphological measurements from pelvic ultrasound DICOM images. The pipeline comprises five sequential analytical stages. A ResNet-50 convolutional neural network, trained on 7,721 images across five anatomical categories, classifies each image frame with high discriminative accuracy (AUC 0.90 to 1.00). Complementary modules using optical character recognition and HSV-based colour thresholding determine ovarian laterality and detect Colour Doppler signals, while a random forest classifier identifies biplane split-screen views required for volumetric estimation (AUC\u0026thinsp;=\u0026thinsp;0.96). A YOLOv5 object detection model, trained on 5,496 annotated images encompassing 25,942 follicle and 6,896 ovary regions, localises ovarian structures exclusively within ovary-classified frames to reduce false-positive detection of pelvic fluid collections. Two dedicated U-Net segmentation models, trained on 807 expert-annotated images, delineate ovarian and uterine structures independently, achieving Dice similarity coefficients of 0.856 and 0.899 respectively. Ellipse fitting applied to segmentation contours, calibrated using DICOM pixel spacing metadata, enables standardised extraction of endometrial thickness, ovary volume, and uterus volume.\u003c/p\u003e \u003cp\u003eEnd-to-end evaluation in 100 patients demonstrated that automated measurements correlate meaningfully with both manual-caliper and manual-contour reference standards. 
Importantly, agreement between the two manual reference methods was itself moderate for several parameters, illustrating the inherent subjectivity of morphological measurement in pelvic ultrasound and providing essential context for interpreting automated performance. Automated endometrial thickness approached the level of agreement observed between the two manual methods, whilst ovarian volume concordance was comparable to inter-method human variability for that parameter.\u003c/p\u003e \u003cp\u003eThis pipeline provides a reproducible, scalable foundation for automated gynaecological imaging assessment. By decoupling morphological measurement from individual operator judgement and enabling structured data extraction at scale, it advances the goal of consistent, accessible, and high-quality imaging analysis as a cornerstone of women\u0026rsquo;s health care.\u003c/p\u003e","manuscriptTitle":"A modular deep learning pipeline for standardised analysis of pelvic ultrasound images for gynaecology","postedDate":"May 11th, 2026","status":"under-review"},"version":"v1","identity":"rs-7997734"}}}
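The pixel-to-centimetre conversion and ellipsoid volume calculation described under "Quantitative measurement extraction" can be illustrated with a minimal sketch. This is not the authors' code. It assumes isotropic pixel spacing given in millimetres per pixel and takes the "standard prolate ellipsoid formula" to be the common form V = (π/6) · d1 · d2 · d3; the function names `px_to_cm` and `ellipsoid_volume_cm3` are hypothetical.

```python
import math


def px_to_cm(length_px, pixel_spacing_mm):
    """Convert a length in pixels to centimetres.

    Assumption: pixel spacing is isotropic and expressed in mm per pixel.
    """
    return length_px * pixel_spacing_mm / 10.0


def ellipsoid_volume_cm3(d1_cm, d2_cm, d3_cm):
    """Ellipsoid volume from three orthogonal diameters (in cm).

    Uses V = (pi / 6) * d1 * d2 * d3, assumed here to be the formula
    the text refers to.
    """
    return math.pi / 6.0 * d1_cm * d2_cm * d3_cm


# Illustrative values only: three fitted diameters in pixels at 0.15 mm/pixel.
diameters_px = [200.0, 133.0, 140.0]
diameters_cm = [px_to_cm(d, 0.15) for d in diameters_px]
volume_cm3 = ellipsoid_volume_cm3(*diameters_cm)
print(f"diameters (cm): {diameters_cm}, volume (cm^3): {volume_cm3:.2f}")
```

Because the volume is a product of three diameters, a relative error in any one diameter carries through proportionally to the volume. This is consistent with the Discussion's observation that uterus volume, which depends on the most upstream steps, was the hardest parameter to automate.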

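The concordance analysis reports both Pearson's r and Lin's concordance correlation coefficient (CCC). The sketch below shows why the two can differ: a constant offset between two measurement methods leaves r at 1 but lowers the CCC. The authors state that their analysis was run in R, so this Python version is an illustrative assumption rather than their implementation. It uses the population (1/n) form of the variances and covariance, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: linear association, insensitive to offset or scale."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lin_ccc(x, y):
    """Lin's CCC: agreement with the line of identity (y = x).

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2),
    using population (1/n) moments.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Illustrative data: the second method reads exactly 1 unit higher.
manual = [1.0, 2.0, 3.0, 4.0]
automated = [2.0, 3.0, 4.0, 5.0]
print(pearson_r(manual, automated))  # 1.0 up to rounding: perfectly correlated
print(lin_ccc(manual, automated))    # about 0.714: penalised for the offset
```

This is why a pair such as r = 0.69 with CCC = 0.53 for endometrial thickness can arise: the gap between the two values reflects systematic bias or scale differences rather than scatter alone.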
License: CC-BY-4.0