Automated Diagnosis of Developmental Dysplasia of the Hip in Ultrasound Using an End-to-End Deep Learning Approach

Łukasz Pulik, Jadwiga Kaliszewska, Bartłomiej Mulewicz, Maciej Pykosz, Joanna Wiszniewska, Paweł Czech, Paweł Łęgosz

Research Square preprint. Status: Posted (Version 1). This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9439534/v1
This work is licensed under a CC BY 4.0 License.

Abstract

Ultrasound-based diagnosis of developmental dysplasia of the hip (DDH) according to Graf’s method is operator-dependent, with variability arising from image acquisition, frame selection, and landmark identification. Building on our previous study, we developed and evaluated an automated end-to-end artificial intelligence system for DDH detection. The system integrates preprocessing, quality assessment, segmentation, landmark identification, measurement of α and β angles, and classification.
The model, based on a PP-LiteSeg-B architecture with an STDC2 backbone, was developed using a dataset comprising 2,062 ultrasound video sequences (31,572 annotated images) from 880 infants. Performance was evaluated on an independent set of 412 additional video sequences (5,303 annotated images) from 169 patients. The system achieved an accuracy of 0.961, a sensitivity of 0.857, and a specificity of 0.969. After exclusion of cases within a predefined diagnostic uncertainty interval (α = 57–63°), accuracy increased to 0.983. The decrease in performance within the interval highlights the impact of borderline measurements on classification reliability. An automated end-to-end approach may improve consistency and reproducibility in DDH screening according to the Graf method. It may further serve as a basis for future clinical decision support systems (CDSS) integrating clinical and imaging data, aimed at improving standardization and supporting clinical decision-making across the diagnostic pathway.

Subject areas: Biological sciences/Computational biology and bioinformatics; Health sciences/Diseases; Health sciences/Health care; Physical sciences/Mathematics and computing; Health sciences/Medical research

Keywords: developmental dysplasia of the hip, Graf method, hip ultrasound, artificial intelligence, deep learning, image segmentation

1. Introduction

Ultrasound-based diagnosis of developmental dysplasia of the hip (DDH) according to Graf’s method relies on the identification of specific anatomical landmarks and the measurement of α and β angles within a standardized imaging plane. Diagnostic classification is determined by predefined angular thresholds, making the accuracy of landmark localization and measurement consistency critical for reliable interpretation [1]. The Graf method is a well-established and reliable reference standard used in many countries worldwide [2]. However, an important limitation of the method is its operator dependency.
Even small variations in probe orientation during hip ultrasound (US) can substantially influence classification. Probe deviations averaging 24° can result in differences of up to 19° in the α angle [3]. Accurate acquisition of the standard plane, selection of diagnostically valid frames, and precise placement of measurement lines require substantial experience and technical proficiency. Consequently, inter- and intra-observer variability has been reported even among trained clinicians, particularly in cases with angle values close to diagnostic cut-off points, where minor measurement differences may affect final classification [4, 5]. Moreover, systematic evaluations of the literature indicate that a considerable proportion of published studies do not fully adhere to the essential quality criteria of the Graf protocol. Inconsistent image acquisition and landmark identification under real-world conditions likely contribute to the observed variability [6].

Recent developments in artificial intelligence, particularly deep learning–based semantic segmentation, enable automated identification of anatomical structures in US images [7]. Such approaches offer the potential to standardize landmark detection, apply consistent measurement rules, and reduce operator-related variability across heterogeneous imaging conditions. Previous studies applying artificial intelligence to DDH have explored selected components of the diagnostic workflow, including automated segmentation and angle estimation, often in controlled settings or on relatively homogeneous datasets [8]. Building on previous work on automated segmentation of hip ultrasound images [9], the present study integrates preprocessing, quality control, anatomical segmentation, landmark detection, angle measurement, and final hip classification into a unified AI-based system. The primary aim of this study was to assess the diagnostic performance of this system within a unified framework based on the Graf protocol.

2. Materials and Methods

2.1 Study Materials

This retrospective cross-sectional observational study used ultrasound material obtained during routine hip screening and follow-up examinations performed at a private orthopaedic clinic. All scans were conducted between January 2022 and December 2025 according to the Graf method by six orthopaedic surgeons trained in this technique. Ultrasound images were acquired using Samsung V7 (Samsung Medison Co., Ltd., Seoul, Korea), DC-60S (Shenzhen Mindray Bio-Medical Electronics Co., Ltd., Shenzhen, China), and E-Cube 8 Diamond (Alpinion Medical Systems, Seoul, Korea) systems, with imaging parameters adjusted individually by the examining physician. The recordings were stored as DICOM video files on a PACS server (Mini PACS server HP ProLiant DL20 G10; Hewlett Packard Enterprise, Houston, TX, USA) running the Mini mEdivum electronic ultrasound archiving system (mEdivum Sp. z o.o., Warsaw, Poland). All DICOM data were anonymized using a dedicated tool.

Image annotation was performed by six independent orthopaedic surgeons trained in the Graf method and certified through a Graf-accredited course, following a previously described protocol [9]. Anatomical structures were annotated on selected frames using dedicated software. For each video sequence, up to 25 frames of the highest diagnostic quality were selected. Annotation followed Graf’s Checklists I and II [10], and frames were included only when at least five relevant structures were identifiable, with at least one frame fulfilling the criteria of the Graf standard plane. To reduce potential bias, the software provided automated feedback to detect common technical errors, and all annotated frames were independently reviewed.

A total of 880 patients, 2,062 US video sequences, and 31,572 images were included in the dataset and divided into training, validation, and test sets. The distribution of patients, videos, and images across the datasets is presented in Table 1.
An additional, independent test set comprising 412 US video sequences (5,303 annotated images) was obtained from 169 patients whose data were not used at any stage of system development or segmentation model training.

Table 1. Distribution of patients, ultrasound videos, and image frames across training, validation, and test datasets (column percentages).

| Dataset | Patients, n (%) | Videos, n (%) | Images, n (%) |
|---|---|---|---|
| Test set | 249 (28.3%) | 711 (34.5%) | 11,382 (36.1%) |
| Validation set | 105 (11.9%) | 238 (11.5%) | 3,348 (10.6%) |
| Training set | 526 (59.8%) | 1,113 (54.0%) | 16,842 (53.3%) |

2.2 AI-based System for DDH Diagnosis

This section describes the architecture of the artificial intelligence system developed to support the diagnosis of DDH based on the Graf method (Fig. 1). The system integrates ultrasound imaging, image processing, and deep learning to enable automated analysis of hip US images.

Data preprocessing

Data preprocessing is a key stage in preparing DICOM images and a precondition for the effective operation of the machine learning model. It begins with extraction of the ultrasound region of interest (ROI): the field of view was reduced to the ultrasound imaging area to remove non-diagnostic elements, such as background and device interface components. Black pixel verification involved quantitative assessment of the proportion of black pixels within each frame. Frames exceeding a predefined threshold were excluded as non-diagnostic. To reduce data redundancy, frames were filtered based on visual similarity using Average Hash and Hamming distance, retaining only unique representations. Image sharpness was assessed using Gaussian filtering and Laplacian-based analysis, and frames with insufficient sharpness were excluded. To ensure compatibility with the segmentation model, images were resized to 1024 × 890 pixels. Subsequently, pixel value normalization was performed for individual color channels.
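As a rough illustration of the screening and normalization steps described above, the sketch below implements a black-pixel check, an average-hash duplicate filter with Hamming distance, and per-channel normalization in plain Python. The thresholds (`black_level`, `max_distance`), the 8 × 8 hash size, and the mean and standard deviation values are assumptions chosen for illustration; the values used in the actual pipeline are not reported in the text. The sharpness check is omitted for brevity.

```python
from typing import List, Sequence, Tuple

Gray = Sequence[Sequence[float]]  # rows of grayscale intensities in 0..255


def black_pixel_ratio(frame: Gray, black_level: float = 10) -> float:
    """Fraction of pixels at or below black_level (assumed cut-off)."""
    pixels = [px for row in frame for px in row]
    return sum(px <= black_level for px in pixels) / len(pixels)


def average_hash(frame: Gray, hash_size: int = 8) -> int:
    """Block-average the frame to hash_size x hash_size cells, then set one
    bit per cell that is brighter than the mean of all cells.

    Assumes the frame has at least hash_size pixels in each dimension.
    """
    h, w = len(frame), len(frame[0])
    cells: List[float] = []
    for i in range(hash_size):
        r0, r1 = i * h // hash_size, (i + 1) * h // hash_size
        for j in range(hash_size):
            c0, c1 = j * w // hash_size, (j + 1) * w // hash_size
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def unique_frame_indices(frames: Sequence[Gray], max_distance: int = 4) -> List[int]:
    """Keep a frame only if its hash differs from every kept hash by more
    than max_distance bits (hypothetical similarity threshold)."""
    kept: List[int] = []
    hashes: List[int] = []
    for idx, frame in enumerate(frames):
        h = average_hash(frame)
        if all(hamming(h, other) > max_distance for other in hashes):
            kept.append(idx)
            hashes.append(h)
    return kept


def normalize(pixel: Tuple[float, float, float],
              mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)) -> Tuple[float, ...]:
    """Scale an RGB pixel to 0..1, then standardize each channel.
    The mean/std vectors here are placeholders, not the study's values."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(pixel, mean, std))
```

In a real pipeline these operations would normally run on arrays through an image library; the list-based version only keeps the logic visible.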
This procedure uses established statistical parameters (a vector of channel means and a vector of channel standard deviations), and the normalized image forms the final input to the neural network.

Image segmentation

Images returned from the preprocessing stage undergo segmentation. The artificial intelligence model was developed as a semantic segmentation system designed for the automated extraction of eight key anatomical structures visible in hip US images [9]. The deep neural network accepts images in the RGB color space as input. During inference, the network generates logits: a multi-channel prediction map that encodes, for each pixel, a score for each defined anatomical class. Each pixel is then assigned the class with the highest score, and therefore the highest probability (argmax operation). The resulting segmentation mask is subsequently rescaled to the original dimensions of the source image. This inverse transformation restores the spatial consistency of the result with the original US image, enabling precise visualization, overlaying of the mask on the image, and further analysis of the segments.

A segmentation model based on the PP-LiteSeg-B architecture [11] with an STDC2 (Short-Term Dense Concatenate) backbone was used, offering a trade-off between operational speed and segmentation quality (Fig. 2). The model was trained for 51,200 iterations using the SGD optimization algorithm, with a momentum coefficient of 0.9 and a weight decay of 5e-4. A PolynomialDecay schedule with warmup for the first 1,000 iterations was applied to regulate the learning rate, helping to avoid a sudden increase in error at the beginning of training. To enhance the stability of the training process and the detection accuracy of smaller structures, the OhemCrossEntropyLoss function was employed.
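To make the training setup above more concrete, here is a minimal sketch of a polynomial-decay schedule with linear warmup and of the pixel selection behind online hard example mining (OHEM). It is not the PaddleSeg implementation; the base learning rate, decay power, probability threshold, and minimum number of kept pixels are not stated in the text and are assumed values.

```python
import math
from typing import Sequence


def learning_rate(step: int, base_lr: float = 0.01, total_steps: int = 51200,
                  warmup_steps: int = 1000, power: float = 0.9,
                  end_lr: float = 0.0) -> float:
    """Linear warmup from 0 to base_lr, then polynomial decay to end_lr.

    base_lr, power and end_lr are illustrative assumptions; only the
    iteration counts (51,200 and 1,000) come from the text.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = min(step, total_steps) / total_steps
    return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr


def ohem_cross_entropy(true_class_probs: Sequence[float],
                       thresh: float = 0.7, min_kept: int = 2) -> float:
    """Mean cross-entropy over 'hard' pixels only.

    true_class_probs holds, for each pixel, the predicted probability of
    its correct class. A pixel counts as hard when that probability is at
    or below the cut-off; the cut-off is raised if needed so that at least
    min_kept pixels contribute. thresh and min_kept are assumed values.
    """
    probs = sorted(true_class_probs)
    if not probs:
        return 0.0
    k = min(min_kept, len(probs))
    cutoff = max(thresh, probs[k - 1])
    hard = [p for p in probs if p <= cutoff]
    # Clamp to avoid log(0) for pixels predicted with zero probability.
    return sum(-math.log(max(p, 1e-12)) for p in hard) / len(hard)
```

With the defaults, pixels already predicted with probability above 0.7 are ignored as long as enough harder pixels remain, so the gradient is dominated by poorly segmented regions.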
This type of loss function preferentially focuses on more difficult examples (hard-to-classify pixels), bypassing those that are already classified correctly early in training. Consequently, the model learns to identify less distinct or poorly visible anatomical elements in US images. During training, the images were augmented with random horizontal flips and with brightness, contrast, and saturation distortions. These operations aimed to improve model generalization and increase its robustness to the variability in image quality between ultrasound devices.

Postprocessing and Frame filtering

After the segmentation masks are generated, a post-processing stage follows, which is of key importance for the reliability of the subsequent geometric analysis. The raw predictions are filtered to eliminate artifacts, defined as small, isolated clusters of pixels detached from the main object of a given class. The system identifies the dominant segment (the largest cluster of pixels) and treats it as the proper representation of the anatomy. Any smaller, scattered fragments are treated as prediction errors and removed.

The next stage aims to eliminate frames in which the geometry of the recognized anatomical structures is incomplete or distorted to a degree that prevents a reliable diagnosis. The first selection criterion concerns the dimensions of the bony roof. A filter checks the vertical span of the extracted segment: for a frame to be considered correct, the length of this structure must exceed a threshold of 45 pixels. This parameter was determined experimentally. The second criterion verifies the spatial relationship between the bony roof and the lower edge of the ilium (lower limb). Verification is performed through morphological operations (dilation), which make it possible to check whether these areas border each other.
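The mask clean-up and the two geometric checks described above can be sketched as follows, with each class mask represented as a set of (row, column) pixels. The 4-connectivity and the dilation radius of one pixel are assumptions; only the 45-pixel span threshold comes from the text.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def components(pixels: Set[Pixel]) -> List[Set[Pixel]]:
    """Split a mask into 4-connected components (connectivity is assumed)."""
    remaining = set(pixels)
    found: List[Set[Pixel]] = []
    while remaining:
        seed = remaining.pop()
        comp = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        found.append(comp)
    return found


def keep_largest(pixels: Set[Pixel]) -> Set[Pixel]:
    """Keep the dominant segment and discard smaller isolated fragments."""
    comps = components(pixels)
    return max(comps, key=len) if comps else set()


def vertical_span(pixels: Set[Pixel]) -> int:
    """Vertical extent of a segment in pixels."""
    rows = [r for r, _ in pixels]
    return max(rows) - min(rows) + 1 if rows else 0


def dilate(pixels: Set[Pixel], radius: int = 1) -> Set[Pixel]:
    """Morphological dilation with a square structuring element."""
    offsets = range(-radius, radius + 1)
    return {(r + dr, c + dc) for r, c in pixels for dr in offsets for dc in offsets}


def in_contact(a: Set[Pixel], b: Set[Pixel], radius: int = 1) -> bool:
    """True if segment a, once dilated, overlaps segment b."""
    return bool(dilate(a, radius) & b)


def passes_preliminary_checks(bony_roof: Set[Pixel], lower_limb: Set[Pixel],
                              min_span: int = 45) -> bool:
    roof = keep_largest(bony_roof)
    return vertical_span(roof) > min_span and in_contact(roof, lower_limb)
```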
A lack of contact between the bony roof and the lower limb suggests a segmentation error or an incorrect ultrasound view, and the frame is disqualified.

Masks that pass this preliminary validation are directed to the next stage, the determination of key landmarks: the baseline, the lower limb, the bony rim, the centroid of the labrum, and the tangent point (the point of contact between the bony roof and the straight line drawn through the lower limb point in the direction of the baseline).

After the anatomical points and the baseline have been identified, the masks undergo a secondary validation procedure. This process verifies three criteria: the spatial configuration of the acetabular floor, the orientation of the baseline, and the signal quality in the lower limb area. The first criterion evaluates acetabular morphology based on the presence of a characteristic concave (“hockey stick”) configuration defined by the spatial relationship between the lower limb, tangent point, and bony roof. Frames not meeting this condition are excluded. The second criterion verifies probe positioning by assessing baseline orientation, which should be vertical, with an allowable deviation of ±3°. Images exceeding this threshold are rejected. The third criterion assesses local image quality near the lower limb by analyzing pixel intensity to exclude frames affected by noise or acoustic artifacts. Only frames meeting all criteria are retained. If more than 10 valid images are available, up to 10 frames are selected based on the leftward position of the lower limb point, favoring deeper visualization of the acetabular floor.

Measurement of α and β angles

The final stage of the analytical procedure is the determination of the α and β angles, which constitute the basis for clinical diagnosis. The algorithm, operating on reference points, constructs the direction vectors of the lines and determines the angles between them.
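A minimal sketch of this computation is given below. It assumes the standard Graf construction: α is the angle between the baseline and the bony roof line (taken here through the lower limb and the tangent point), and β is the angle between the baseline and the cartilage roof line (taken here through the bony rim and the labrum centroid). The function names and the choice to fold every result into the range 0–90° are illustrative assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


def line_angle(p1: Point, p2: Point, q1: Point, q2: Point) -> float:
    """Acute angle in degrees between line p1-p2 and line q1-q2.

    Lines are treated as undirected, so the result lies in 0..90.
    """
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    dot = ux * vx + uy * vy
    cross = ux * vy - uy * vx
    return math.degrees(math.atan2(abs(cross), abs(dot)))


def graf_angles(baseline: Tuple[Point, Point], lower_limb: Point,
                tangent_point: Point, bony_rim: Point,
                labrum_centroid: Point) -> Tuple[float, float]:
    """Return (alpha, beta) in degrees from the detected landmarks."""
    alpha = line_angle(*baseline, lower_limb, tangent_point)
    beta = line_angle(*baseline, bony_rim, labrum_centroid)
    return alpha, beta
```

Using `atan2` on the absolute cross and dot products avoids the loss of precision that `acos` suffers near 0° and makes the result independent of the order in which the two points of each line are given.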
The results are expressed in degrees. From the set of valid frames, the single frame with the highest α angle is selected, and the final diagnostic classification is based on this frame. In accordance with the Graf standard, the following decision rule was adopted: hips with an α angle ≥ 60° were classified as healthy, whereas hips with an α angle < 60° were classified as dysplastic or physiologically immature.

2.3 Statistical methods

Accurate evaluation of model performance is a critical aspect of image segmentation research. In this study, two widely adopted metrics were employed: Intersection over Union (IoU) and the Dice Similarity Coefficient (DSC). Additionally, for classes representing points (bony rim and lower limb), the Euclidean distance from the centroid of the model's segment to the centroid of the physician-annotated segment was calculated. For the baseline class, the angle of deviation between the physician-annotated baseline and the model's baseline was computed. Segmentation performance was also summarized as a success rate, defined as the proportion of frames with an IoU ≥ 0.5 between model predictions and annotations.

Diagnostic performance was evaluated as a binary classification task (non-dysplastic vs. dysplastic/physiologically immature hip), with accuracy as the primary metric, alongside sensitivity and specificity. Confidence intervals for diagnostic performance metrics were calculated using exact binomial methods (Clopper–Pearson). Expert manual segmentation of the test dataset served as the reference standard, from which the α angle and corresponding diagnostic class were derived. Patient-level separation was ensured across datasets to avoid data leakage.

A dual evaluation approach was applied. First, standard metrics were calculated for the entire test set. Second, an interval of α = 57–63° was introduced to account for measurement uncertainty reported in the literature [3].
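The frame selection, the 60° decision rule, and the uncertainty interval can be combined in a few lines. The sketch below is illustrative only: the function name and return format are hypothetical, and treating both interval bounds as inclusive is an assumption, since the text does not say how the endpoints are handled.

```python
from typing import Iterable, Tuple


def classify_video(frame_alphas: Iterable[float], cutoff: float = 60.0,
                   uncertain: Tuple[float, float] = (57.0, 63.0)
                   ) -> Tuple[float, str, bool]:
    """Return (alpha, label, needs_expert_review) for one video sequence.

    frame_alphas holds the alpha angle of every frame that passed
    validation; the frame with the highest alpha drives the decision.
    """
    alphas = list(frame_alphas)
    if not alphas:
        raise ValueError("no valid frames in this sequence")
    alpha = max(alphas)
    label = "healthy" if alpha >= cutoff else "dysplastic or immature"
    low, high = uncertain
    needs_review = low <= alpha <= high  # inclusive bounds are an assumption
    return alpha, label, needs_review
```

For example, `classify_video([55.0, 61.5, 58.2])` selects 61.5°, labels the hip as healthy, and flags it for review because 61.5° lies inside the interval.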
Cases within this range were excluded from automated performance assessment and considered as requiring expert verification.

The differences between expert-based and algorithm-based measurements of the α angle on video sequences were assessed using the Wilcoxon signed-rank test for paired samples. Tests were two-sided, and a p value < 0.05 was considered statistically significant. All statistical analyses were performed using Python version 3.9.13 with the SciPy library (version 1.7.3). A large language model (GPT-5.4 Thinking) was used exclusively for language editing. All scientific content was verified by the authors. The reporting of this study was informed by the TRIPOD-AI statement, where applicable, although strict adherence to all checklist items was not intended.

3. Results

The demographic characteristics of the study population are summarized in Table 2. The median age at examination was 6.7 weeks (IQR 5.6–9.4). Age-related statistics were calculated for 847 patients, as 33 were excluded due to missing age data.

Table 2. Demographic characteristics of the study population across datasets.

| Dataset | Age, days* | Female** |
|---|---|---|
| Test set | 47 (39–64) | 133 (53.4) |
| Validation set | 52 (38–72) | 56 (53.3) |
| Training set | 47 (39–66) | 273 (51.9) |

* Median (IQR)
** Number of females in the set (percentage of females in the set)

Segmentation performance across anatomical structures is summarized in Supplementary Table 1. The highest accuracy was observed for the femoral head, whereas slightly lower performance was noted for the labrum and the chondro-osseous border, which are more challenging and less clearly defined structures in ultrasound imaging. The accuracy of landmark localization and reference line estimation is presented in Supplementary Table 2. The model demonstrated stable performance in identifying key points required for angle calculation, with low deviations in both point localization and baseline orientation.
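For reference, the overlap and localization metrics behind these supplementary results can be sketched as follows, with each segment represented as a set of (row, column) pixels. Returning 1.0 when both masks are empty is a convention assumed here, not something stated in the text.

```python
import math
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def iou(pred: Set[Pixel], ref: Set[Pixel]) -> float:
    """Intersection over Union of two pixel sets."""
    union = pred | ref
    return len(pred & ref) / len(union) if union else 1.0


def dice(pred: Set[Pixel], ref: Set[Pixel]) -> float:
    """Dice Similarity Coefficient of two pixel sets."""
    total = len(pred) + len(ref)
    return 2 * len(pred & ref) / total if total else 1.0


def success_rate(pairs: Iterable[Tuple[Set[Pixel], Set[Pixel]]],
                 threshold: float = 0.5) -> float:
    """Proportion of (prediction, annotation) pairs with IoU >= threshold."""
    scores = [iou(p, r) for p, r in pairs]
    return sum(s >= threshold for s in scores) / len(scores)


def centroid_distance(pred: Set[Pixel], ref: Set[Pixel]) -> float:
    """Euclidean distance in pixels between two segment centroids."""
    def centroid(px: Set[Pixel]) -> Tuple[float, float]:
        return (sum(r for r, _ in px) / len(px), sum(c for _, c in px) / len(px))
    (r1, c1), (r2, c2) = centroid(pred), centroid(ref)
    return math.hypot(r1 - r2, c1 - c2)
```

The two overlap metrics are monotonically related (DSC = 2·IoU / (1 + IoU)), so they rank predictions identically; Dice simply gives higher values for the same overlap.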
The diagnostic evaluation was conducted on the independent test set comprising 412 US scans. The α-angle measurements based on anatomical landmark detection by the clinician and by the model were compared. The mean α angle was 66.40° ± 5.0° for the clinician and 66.73° ± 4.9° for the model. The mean difference between methods was −0.32° ± 3.05° (p = 0.004). Although statistically significant, the observed mean difference of 0.32° falls within the expected measurement error and is unlikely to be clinically meaningful [12].

In the first approach (standard evaluation), the system correctly classified 396 of 412 cases, which corresponds to an overall accuracy of 96.1% (Table 3). Performance by class was as follows:

- Dysplastic: The system correctly detected 24 of the 28 dysplastic cases, resulting in a true positive rate of 85.7%.
- Healthy: Among healthy individuals, 372 of 384 cases were accurately classified, corresponding to a true negative rate of 96.9%.

In the second approach, after accounting for the diagnostic uncertainty interval, 63 cases (15.3% of the set) were excluded from the analysis as diagnostically ambiguous. In the reduced set of 349 cases, the system reached a correct diagnosis for 343 scans, an accuracy of 98.3% (Table 3). This result suggests high system reliability in cases lying outside the narrow measurement error margin. Accuracy remained high in both classes:

- Dysplastic: The system correctly identified 10 of 10 dysplastic cases, achieving a true positive rate of 100%. This indicates high sensitivity in cases outside the borderline range, although the number of such cases is small (n = 10).
- Healthy: In the group of healthy patients, the system correctly classified 333 of 339 cases, yielding a true negative rate of 98.2%.
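The reported rates can be reproduced directly from the counts given above. The short sketch below treats "dysplastic" as the positive class and derives the borderline subgroup by subtraction; the helper name `metrics` is illustrative.

```python
def metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Standard evaluation: 24 of 28 dysplastic and 372 of 384 healthy cases correct.
full = metrics(tp=24, fn=4, tn=372, fp=12)

# After excluding alpha = 57-63 degrees: 10 of 10 and 333 of 339 correct.
outside = metrics(tp=10, fn=0, tn=333, fp=6)

# Cases inside the interval follow by subtraction from the two sets above.
inside_total = 412 - 349                     # 63 borderline cases
inside_correct = (24 + 372) - (10 + 333)     # 53 classified correctly
inside_accuracy = inside_correct / inside_total

for name, m in (("full", full), ("outside interval", outside)):
    print(name, {k: round(v, 3) for k, v in m.items()})
print("inside interval accuracy", round(inside_accuracy, 3))
```

The subtraction shows that 53 of the 63 borderline cases were classified correctly, which corresponds to an accuracy of about 84.1% inside the interval.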
Table 3. Diagnostic performance of the AI-based DDH classification model before and after exclusion of cases within the diagnostic uncertainty interval (α = 57–63°).

| Metric | Standard evaluation | Exclusion of interval (α = 57–63°) |
|---|---|---|
| Accuracy | 0.961 (95% CI 0.938–0.976) | 0.983 (95% CI 0.963–0.992) |
| Sensitivity | 0.857 (95% CI 0.685–0.943) | 1.000 (95% CI 0.722–1.000) |
| Specificity | 0.969 (95% CI 0.946–0.982) | 0.982 (95% CI 0.962–0.992) |

An additional assessment was conducted for the isolated group of 63 cases inside the diagnostic uncertainty interval of 57–63°. In this zone of increased error risk, the overall accuracy of the model dropped to 84.1%. The noticeable drop in accuracy within this interval (84.1% vs. 98.3% for the rest of the set) supports treating this zone separately. Referring patients with borderline results for detailed expert verification can substantially reduce the risk of diagnostic error.

A supplementary analysis was performed to assess the sensitivity of α angle measurements to small variations in landmark localization. Horizontal displacements of the lower limb point (−10 to +10 pixels) were simulated. The results demonstrated a high sensitivity of the Graf method, with an average change of approximately 0.66° in the α angle per 1-pixel shift (Supplementary Fig. 1). This analysis was performed on an image at its original resolution (1260 × 910 pixels).

4. Discussion

The findings of the present study should be interpreted within the context of the rapidly evolving field of AI-assisted ultrasound in DDH. Most existing approaches rely on supervised deep learning models, typically CNN-based architectures, and are often developed on relatively limited datasets or focus on selected components of the diagnostic workflow [13].
In contrast, the present study leverages a substantially larger dataset and implements an integrated pipeline reflecting the full sequence of Graf-based assessment, thereby advancing toward a clinically applicable diagnostic framework.

Compared with our previous work [9], a modest decrease in segmentation and landmark localization performance was observed. This likely reflects the incorporation of additional preprocessing, quality control, and rule-based validation steps, which impose stricter constraints on the input data. While the earlier study focused on segmentation under controlled conditions, the current approach prioritizes end-to-end diagnostic reliability. This trade-off suggests that slightly lower intermediate performance metrics may be acceptable in exchange for improved robustness and clinical consistency of the overall system.

The introduction of a diagnostic uncertainty interval for α angle values (57–63°) provides a structured approach to handling borderline DDH cases and reflects the inherent uncertainty near diagnostic thresholds. This is consistent with previously reported variability in both human and AI-based image interpretation, where small measurement differences may lead to different classifications [14]. Importantly, such uncertainty represents a challenge even at the expert level and is likely to be amplified in routine clinical practice, particularly among less experienced operators. In this context, the proposed system, by explicitly identifying uncertain cases, may function as a decision-support tool, improving standardization and reducing operator dependency rather than replacing expert judgment.

Furthermore, the sensitivity analysis demonstrated that α angle measurements are highly dependent on precise landmark localization, with small positional shifts translating into measurable changes in angle values.
This finding highlights the critical importance of robust preprocessing and quality control mechanisms in ensuring reliable automated measurements and supports the rationale for incorporating strict validation steps within the pipeline.

Several limitations of this study should be acknowledged. First, the model was developed and evaluated on a single-center dataset, which may limit generalizability across different populations, ultrasound systems, and acquisition protocols. Second, the reference standard was based on expert annotations, which are inherently subject to inter- and intra-observer variability. Third, although the system demonstrated high performance under controlled conditions, its real-world clinical applicability, particularly in prospective, multi-operator settings, requires further validation. Future studies should focus on prospective validation, comparison with less experienced operators, and integration of the system into real-time clinical workflows. The system is intended to support screening rather than replace expert-level diagnosis.

Declarations

Ethics Approval and Consent to Participate

This study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee of the Medical University of Warsaw (protocol code: AKBE/07/2022; date: 17 January 2022) approved the study protocol and waived the requirement for informed consent due to its retrospective design.

Data Availability Statement

The data that support the findings of this study are available from Pentacomp Systemy Informatyczne S.A., but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Pentacomp Systemy Informatyczne S.A.

Author Contributions

Ł.P. and J.K. conceptualized the study. P.C., B.M., M.P. and J.W. developed the methodology and software. Ł.P., P.Ł. and J.K.
provided resources. Ł.P., P.C., J.K., B.M., M.P. and J.W. wrote the original draft. Ł.P., P.C., J.K., B.M., M.P., P.Ł. and J.W. reviewed and edited the manuscript. B.M. prepared the figures. J.K. and Ł.P. supervised the project. J.K. and P.C. coordinated project administration. J.K. secured funding. All authors approved the final version of the manuscript.

Competing interests

P.C., B.M., M.P., J.W., and J.K. are employees of Pentacomp Systemy Informatyczne S.A. Ł.P. collaborates with Pentacomp Systemy Informatyczne S.A. The company may benefit from the development of products related to this work. P.Ł. declares no competing interests.

Funding

This research was funded by the Polish Medical Research Agency (ABM), grant number 2022/ABM/02/00004.

References

1. Graf, R. The use of ultrasonography in developmental dysplasia of the hip. Acta Orthop. Traumatol. Turc. 41 Suppl 1, 6–13 (2007).
2. Krysta, W., Dudek, P., Pulik, Ł. & Łęgosz, P. Screening of developmental dysplasia of the hip in Europe: a systematic review. Children 11, 97 (2024).
3. Jaremko, J. L. et al. Potential for change in US diagnosis of hip dysplasia solely caused by changes in probe orientation: patterns of alpha-angle variation revealed by using three-dimensional US. Radiology 273, 870–878 (2014).
4. Ömeroglu, H., Biçmoglu, A., Koparal, S. & Seber, S. Assessment of variations in the measurement of hip ultrasonography by the Graf method in developmental dysplasia of the hip. J. Pediatr. Orthop. B 10, 89–95 (2001).
5. Dogruel, H., Baymurat, A. C., Ismail, C. T. & Atalar, H. Evaluation of interobserver and intraobserver differences of Graf method in developmental hip dysplasia. Gazi Med. J. 34, 425 (2023).
6. Walter, S. G. et al. Four decades of developmental dysplastic hip screening according to Graf: what have we learned? Front. Pediatr. 10, 990806 (2022).
7. Asgari Taghanaki, S., Abhishek, K., Cohen, J. P., Cohen-Adad, J. & Hamarneh, G. Deep semantic segmentation of natural and medical images: a review. Artif. Intell. Rev. 54, 137–178 (2021).
8. Chen, M., Cai, R., Zhang, A., Chi, X. & Qian, J. The diagnostic value of artificial intelligence-assisted imaging for developmental dysplasia of the hip: a systematic review and meta-analysis. J. Orthop. Surg. 19, 522 (2024).
9. Pulik, Ł. et al. Artificial intelligence algorithm supporting the diagnosis of developmental dysplasia of the hip: automated ultrasound image segmentation. J. Clin. Med. 14, 6332 (2025).
10. Graf, R. & Synder, M. Hip sonography worldwide: experience, results, problems. Chir. Narządów Ruchu Ortop. Pol. 85, 29–34 (2020).
11. Peng, J. et al. PP-LiteSeg: a superior real-time semantic segmentation model. arXiv preprint arXiv:2204.02681 (2022).
12. Hendrickx, J. et al. Can artificial intelligence-driven cephalometric analysis replace manual tracing? A systematic review and meta-analysis. Eur. J. Orthod. 46, cjae029 (2024).
13. Yonis, R. et al. Artificial intelligence to support ultrasound in the detection of developmental dysplasia of the hip. Bone Jt. Open 7, 223–233 (2026).
14. Lenskjold, A. et al. Artificial intelligence tools trained on human-labeled data reflect human biases: a case study in a large clinical consecutive knee osteoarthritis cohort. Sci. Rep. 14, 26782 (2024).

Supplementary Files

SupplementaryFigure1.docx
SupplementaryTable1.docx
SupplementaryTable2.docx
© Research Square 2026 | ISSN 2693-5015 (online)

Author affiliations

Łukasz Pulik (corresponding author), Paweł Łęgosz: Medical University of Warsaw
Jadwiga Kaliszewska: Gustav Clinic
Bartłomiej Mulewicz, Maciej Pykosz, Joanna Wiszniewska, Paweł Czech: Pentacomp Systemy Informatyczne S.A.

Figure 1. Overview of the workflow for automated analysis of hip ultrasound images and the processing pipeline from raw ultrasound acquisition to the final diagnostic output with annotated α and β angles. After preprocessing and deep learning–based segmentation, frames are filtered to retain diagnostically relevant images. Anatomical landmarks and the baseline are identified, enabling automated angle calculation.

Figure 2. Architecture of the PP-LiteSeg-based model. The encoder extracts image features at progressively lower resolutions (from 1/4 to 1/32 of the input size), capturing both fine details and global context.
The decoder then combines these features and reconstructs the image to produce the final segmentation output.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-9439534/v1/9879f7d6ff96541ec62468f2.png"},{"id":109081151,"identity":"6786ecb8-b960-4406-b1ee-1655aced8604","added_by":"auto","created_at":"2026-05-12 12:02:25","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":508792,"visible":true,"origin":"","legend":"\u003cp\u003eModel segments for anatomical structures with marked landmarks and baseline, as well as α and β angles\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-9439534/v1/98034d14c980337db139a249.png"},{"id":109249388,"identity":"0f2b3586-548c-4339-bb6f-2a01d88ffff7","added_by":"auto","created_at":"2026-05-14 08:50:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1240586,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9439534/v1/e0fae275-133b-4c16-9236-0d60986243b1.pdf"},{"id":108839970,"identity":"eb7b9b51-8111-415f-ac1e-ab196407556f","added_by":"auto","created_at":"2026-05-09 00:51:31","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":34175,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFigure1.docx","url":"https://assets-eu.researchsquare.com/files/rs-9439534/v1/355fa68b031222ee4b5581fb.docx"},{"id":108977086,"identity":"fb4d0624-3723-40ce-bb48-d2659242147f","added_by":"auto","created_at":"2026-05-11 
11:30:20","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":14856,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTable1.docx","url":"https://assets-eu.researchsquare.com/files/rs-9439534/v1/7e4a0463ad2696142fcd4ec0.docx"},{"id":108839974,"identity":"86c7d755-fd99-4a7d-a3bd-1bbc47985493","added_by":"auto","created_at":"2026-05-09 00:51:31","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":14837,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTable2.docx","url":"https://assets-eu.researchsquare.com/files/rs-9439534/v1/42ff38dffc9b6724e84bbf78.docx"}],"financialInterests":"Competing interest reported. P.C., B.M., M.P., J.W., and J.K. are employees of Pentacomp Systemy Informatyczne S.A. Ł.P. collaborates with Pentacomp Systemy Informatyczne S.A. The company may benefit from the development of products related to this work. P.Ł. declares no competing interests.","formattedTitle":"Automated Diagnosis of Developmental Dysplasia of the Hip in Ultrasound Using an End- to-End Deep Learning Approach","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eUltrasound-based diagnosis of developmental dysplasia of the hip (DDH) according to Graf\u0026rsquo;s method relies on the identification of specific anatomical landmarks and the measurement of α and β angles within a standardized imaging plane. 
Diagnostic classification is determined by predefined angular thresholds, making the accuracy of landmark localization and measurement consistency critical for reliable interpretation\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe Graf method is a well-established and reliable reference standard used across many countries worldwide\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. However, an important limitation of the method is its operator dependency. Even small variations in probe orientation during hip ultrasound (US) can substantially influence classification. Probe deviations averaging 24\u0026deg; can result in differences of up to 19\u0026deg; in α-angle\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Accurate acquisition of the standard plane, selection of diagnostically valid frames, and precise placement of measurement lines require substantial experience and technical proficiency. Consequently, inter- and intra-observer variability has been reported even among trained clinicians, particularly in cases with angle values close to diagnostic cut-off points, where minor measurement differences may affect final classification\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. Moreover, systematic evaluations of the literature indicate that a considerable proportion of published studies do not fully adhere to the essential quality criteria of the Graf protocol. 
Inconsistent image acquisition and landmark identification under real-world conditions likely contribute to the observed variability\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eRecent developments in artificial intelligence, particularly deep learning\u0026ndash;based semantic segmentation, enable automated identification of anatomical structures in US images\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e. Such approaches offer the potential to standardize landmark detection, apply consistent measurement rules, and reduce operator-related variability across heterogeneous imaging conditions.\u003c/p\u003e \u003cp\u003ePrevious studies applying artificial intelligence to DDH have explored selected components of the diagnostic workflow, including automated segmentation and angle estimation, often in controlled settings or on relatively homogeneous datasets\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eBuilding on previous work on automated segmentation of hip ultrasound images\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e, the present study integrates preprocessing, quality control, anatomical segmentation, landmark detection, angle measurement, and final hip classification into a unified AI-based system. The primary aim of this study was to assess the diagnostic performance of this system within a unified framework based on the Graf protocol.\u003c/p\u003e"},{"header":"2. 
Materials and Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003e2.1 Study Materials\u003c/h2\u003e\n \u003cp\u003eThis retrospective cross-sectional observational study used ultrasound material obtained during routine hip screening and follow-up examinations performed at a private orthopaedic clinic. All scans were conducted between January 2022 and December 2025 according to the Graf method by six orthopaedic surgeons trained in this technique.\u003c/p\u003e\n \u003cp\u003eUltrasound images were acquired using Samsung V7 (Samsung Medison Co., Ltd., Seoul, Korea), DC-60S (Shenzhen Mindray Bio-Medical Electronics Co., Ltd., Shenzhen, China), and E-Cube 8 Diamond (Alpinion Medical Systems, Seoul, Korea) with imaging parameters adjusted individually by the examining physician.\u003c/p\u003e\n \u003cp\u003eThe recordings were stored as DICOM video files on a PACS server (Mini PACS server HP ProLiant DL20 G10; Hewlett Packard Enterprise, Houston, TX, USA) running the Mini mEdivum electronic ultrasound archiving system (mEdivum Sp. z o.o., Warsaw, Poland). All DICOM data were anonymized using a dedicated tool.\u003c/p\u003e\n \u003cp\u003eImage annotation was performed by six independent orthopaedic surgeons trained in the Graf method and certified through a Graf-accredited course following a previously described protocol\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. Anatomical structures were annotated on selected frames using dedicated software.\u003c/p\u003e\n \u003cp\u003eFor each video sequence, up to 25 frames of the highest diagnostic quality were selected. Annotation followed Graf\u0026rsquo;s Checklists I and II\u003csup\u003e10\u003c/sup\u003e, and frames were included only when at least five relevant structures were identifiable, with at least one frame fulfilling the criteria of the Graf standard plane. 
To reduce potential bias, the software provided automated feedback to detect common technical errors, all annotated frames were independently reviewed.\u003c/p\u003e\n \u003cp\u003eA total of 880 patients, 2,062 US video sequences, and 31,572 images were included in the dataset and divided into training, validation, and test sets. The distribution of patients, videos, and images across the datasets is presented in Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Additional, independent test set comprising 412 US video sequences (5,303 annotated images) was obtained from 169 patients whose data were not used at any stage of system development or segmentation model training.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eDistribution of patients, ultrasound videos, and image frames across training, validation, and test datasets (column percentages).\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\n \u003cp\u003eTest set\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003ePatients n (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eVideos n (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eImages n (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e249 (28.3%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e711 (34.5%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" 
colname=\"c4\"\u003e\n \u003cp\u003e11382 (36.1%)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eValidation set\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e105 (11.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e238 (11.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e3348 (10.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eTraining set\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e526 (59.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e1113 (54.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e16842 (53.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003e2.2 AI-based System for DDH Diagnosis\u003c/h2\u003e\n \u003cp\u003eThis section describes the architecture of the artificial intelligence system developed to support the diagnosis of DDH based on the Graf method (Fig. \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The system integrates ultrasound imaging, image processing and deep learning to enable automated analysis of hip US images.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eData preprocessing\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eData preprocessing constitutes a key stage in preparing DICOM images, conditioning the effective operation of the machine learning model. 
This procedure includes the extraction of the proper ultrasound area (Region of Interest\u0026mdash;ROI).\u003c/p\u003e\n \u003cp\u003eThe field of view was reduced to the ultrasound imaging area to remove non-diagnostic elements, such as background and device interface components. Black pixel verification involved quantitative assessment of the proportion of black pixels within each frame. Frames exceeding a predefined threshold were excluded as non-diagnostic. To reduce data redundancy, frames were filtered based on visual similarity using Average Hash and Hamming distance, retaining only unique representations. Image sharpness was assessed using Gaussian filtering and Laplacian-based analysis, and frames with insufficient sharpness were excluded.\u003c/p\u003e\n \u003cp\u003eTo ensure compatibility with the segmentation model, images were resized to 1024\u0026times;890 pixels. Subsequently, pixel value normalization is performed for individual color channels. This procedure utilizes established statistical parameters: a vector of mean values and a vector of standard deviations, forming the final input to the neural network.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eImage segmentation\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eImages returned from the preprocessing stage undergo segmentation. The artificial intelligence model was developed as a semantic segmentation system designed for the automated extraction of eight key anatomical structures visible in hip US images\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n \u003cp\u003eThe deep neural network model accepts images in the RGB color space as input. During the inference process, the network generates logits, constituting a multi-channel prediction map in which the probabilities of each pixel belonging to the defined anatomical classes are encoded. 
Final classification occurs through the selection of the class with the highest probability value (argmax operation). The segmentation mask obtained in this manner is subsequently subjected to an inverse transformation\u0026mdash;rescaling to the original dimensions of the source image. This procedure restores the spatial consistency of the result with the original US, enabling precise visualization, overlaying of the mask on the image, and further analysis of the segments.\u003c/p\u003e\n \u003cp\u003eA segmentation model based on the PP-LiteSeg-B architecture\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e containing an STDC2 (Short-Term Dense Concatenate) backbone was utilized, ensuring a compromise between operational speed and segmentation quality (Fig. \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The model was trained for 51,200 iterations using the SGD optimization algorithm, with a momentum coefficient of 0.9 and weight decay of 5e-4. A PolynomialDecay schedule with warmup for the first 1000 iterations was applied to regulate the learning rate, helping to avoid a sudden increase in error at the beginning of training. To enhance the stability of the training process and the detection accuracy of smaller structures, the OhemCrossEntropyLoss function was employed. This type of loss function preferentially focuses on more difficult examples (hard-to-classify pixels), bypassing those that are correctly classified in the early training phase. Consequently, the model learns to effectively identify less distinct or poorly visible anatomical elements in US images.\u003c/p\u003e\n \u003cp\u003eDuring the training process, the images underwent a set of transformations, such as: random horizontal flip, as well as brightness, contrast, and saturation distortions. 
These operations aimed to improve model generalization and increase its robustness to the qualitative variability of images from different ultrasound devices.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003ePostprocessing and Frame filtering\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eAfter generating the segmentation masks, a post-processing stage follows, which is of key importance for the reliability of the subsequent geometric analysis. The raw prediction results undergo filtration to eliminate artifacts, defined as small, isolated clusters of pixels detached from the main object of a given class. The system identifies the dominant segment (the largest cluster of pixels), recognizing it as the proper representation of the anatomy. Any smaller, scattered fragments are treated as prediction errors and are removed. Next stage aims to eliminate frames in which the geometry of the recognized anatomical structures is incomplete or distorted to a degree that prevents a reliable diagnosis.\u003c/p\u003e\n \u003cp\u003eThe first selection criterion concerns the dimensions of the bony roof. A filter is applied here that checks the vertical span of the extracted segment. For a frame to be considered correct, the length of this structure, measured in pixels, must exceed a defined threshold value of 45 pixels. This parameter was determined experimentally. The second criterion verifies the spatial relationship between the bony roof and the lower edge of the ilium (lower limb). Verification is realized through morphological operations (dilation), which allow for checking whether these areas border each other. 
A lack of contact between the bony roof and the lower limb suggests a segmentation error or an incorrect ultrasound view, resulting in the frame disqualification.\u003c/p\u003e\n \u003cp\u003eMasks that have successfully passed the preliminary validation are directed to the next stage, which is the determination of key landmarks: baseline, lower limb bony rim, centroid of labrum and tangent point (the point of contact between the bony roof and the straight line drawn through the lower limb point in the direction of the baseline).\u003c/p\u003e\n \u003cp\u003eAfter the successful identification of anatomical points and baseline, the masks undergo a secondary validation procedure. This process includes the verification of three detailed criteria: the spatial configuration of the acetabular floor, the orientation of the baseline, and the signal quality in the lower limb area. The first criterion evaluates acetabular morphology based on the presence of a characteristic concave (\u0026ldquo;hockey stick\u0026rdquo;) configuration defined by the spatial relationship between the lower limb, tangent point, and bony roof. Frames not meeting this condition are excluded. The second criterion verifies probe positioning by assessing baseline orientation, which should be vertical, with an allowable deviation of \u0026plusmn;\u0026thinsp;3\u0026deg;. Images exceeding this threshold are rejected. The third criterion assesses local image quality near the lower limb by analyzing pixel intensity to exclude frames affected by noise or acoustic artifacts. Finally, only frames meeting all criteria are retained. 
If more than 10 valid images are available, up to 10 frames are selected based on the leftward position of the lower limb point, favoring deeper visualization of the acetabular floor.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eMeasurement of \u0026alpha; and \u0026beta; angles\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eThe final stage of the analytical procedure is the determination of the \u0026alpha; and \u0026beta; angles, which constitute the basis for clinical diagnosis. The algorithm, operating on reference points, constructs directional vectors of the lines and determines the angles contained between them. The results of the operation are expressed in degrees. From the set of valid frames, a single frame with the highest \u0026alpha; angle is selected, and the final diagnostic classification is based on this frame. In accordance with the Graf standard, the following decision rule was adopted: hips with an \u0026alpha; angle\u0026thinsp;\u0026ge;\u0026thinsp;60\u0026deg; were classified as healthy, whereas hips with an \u0026alpha; angle\u0026thinsp;\u0026lt;\u0026thinsp;60\u0026deg; were classified as dysplastic or physiologically immature.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003e2.3 Statistical methods\u003c/h2\u003e\n \u003cp\u003eAccurate evaluation of model performance is a critical aspect of image segmentation research. In this study, two widely adopted metrics were employed: Intersection over Union (IoU) and the Dice Similarity Coefficient (DSC). Additionally, for classes representing points (bony rim and lower limb), the Euclidean distance from the centroid of the model\u0026apos;s segment to the centroid of the physician-annotated segment was calculated. For the baseline class, the angle of deviation between the physician-annotated baseline and the model\u0026apos;s baseline was computed. 
Segmentation performance was assessed using a success rate defined as the proportion of frames with an Intersection over Union (IoU)\u0026thinsp;\u0026ge;\u0026thinsp;0.5 between model predictions and annotations.\u003c/p\u003e\n \u003cp\u003eDiagnostic performance was evaluated as a binary classification task (non-dysplastic vs. dysplastic/physiologically immature hip), with accuracy as the primary metric, alongside sensitivity and specificity. Confidence intervals for diagnostic performance metrics were calculated using exact binomial methods (Clopper-Pearson). Expert manual segmentation of the test dataset served as the reference standard, from which the \u0026alpha; angle and corresponding diagnostic class were derived. Patient-level separation was ensured across datasets to avoid data leakage.\u003c/p\u003e\n \u003cp\u003eA dual evaluation approach was applied. First, standard metrics were calculated for the entire test set. Second, a interval of \u0026alpha;\u0026thinsp;=\u0026thinsp;57\u0026ndash;63\u0026deg; was introduced to account for measurement uncertainty reported in the literature\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Cases within this range were excluded from automated performance assessment and considered as requiring expert verification.\u003c/p\u003e\n \u003cp\u003eThe differences between expert-based and algorithm-based measurement of \u0026alpha; angle on video sequences were assessed using the Wilcoxon signed-rank test for paired samples. Tests were two-sided and a p value\u0026thinsp;\u0026lt;\u0026thinsp;0.05 was considered statistically significant. All scientific content was verified by the authors. All statistical analyses were performed using Python version 3.9.13 with the SciPy library (version 1.7.3)\u003c/p\u003e\n \u003cp\u003eA large language model was used exclusively for language editing (GPT-5.4 Thinking). 
The reporting of this study was informed by the TRIPOD-AI statement, where applicable, although strict adherence to all checklist items was not intended.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"3. Results","content":"\u003cp\u003eThe demographic characteristics of the study population are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The median age at examination was 6.7 weeks (IQR 5.6\u0026ndash;9.4). Age-related statistics were calculated for 847 patients, as 33 were excluded due to missing age data.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDemographic characteristics of the study population across datasets.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eTest set\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAge, days*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFemale **\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e47 (39\u0026ndash;64)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003e133 (53.4)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eValidation set\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e52 (38\u0026ndash;72)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e56 (53.3)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTraining set\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e47 (39\u0026ndash;66)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e273 (51.9)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"3\"\u003e* \u003cem\u003eMedian (IQR)\u003c/em\u003e\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"3\"\u003e\u003cem\u003e** Number of females in the set (percentage of females in the set)\u003c/em\u003e\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eSegmentation performance across anatomical structures is summarized in Supplementary Table\u0026nbsp;1. The highest accuracy was observed for the femoral head, whereas slightly lower performance was noted for the labrum and the chondro-osseous border, which represent more challenging and less clearly defined structures in ultrasound imaging. The accuracy of landmark localization and reference line estimation is presented in Supplementary Table\u0026nbsp;2. The model demonstrated stable performance in identifying key points required for angle calculation, with low deviations in both point localization and baseline orientation.\u003c/p\u003e \u003cp\u003eThe experiment was conducted on an independent test set comprising 412 US scans. The α-angle measurements based on anatomical landmark detection by the clinician and by the model were compared. 
The mean α-angle was 66.40\u0026deg; \u0026plusmn; 5.0 for the clinician and 66.73\u0026deg; \u0026plusmn; 4.9 for the model. The mean difference between methods was \u0026minus;\u0026thinsp;0.32\u0026deg; \u0026plusmn; 3.05 (p\u0026thinsp;=\u0026thinsp;0.004). Although statistically significant, the observed mean difference of 0.32\u0026deg; falls within the expected measurement error and is unlikely to be clinically meaningful\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWithin the framework of the first approach (standard evaluation), the system correctly classified 396 out of 412 cases, which translates to an overall accuracy of 96.1% (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Moreover, the model demonstrated high accuracy across both categories:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eDysplastic: The system correctly detected 24 of the 28 dysplastic cases, resulting in a true positive rate of 85.7%.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eHealthy: Among healthy individuals, 372 out of 384 cases were accurately classified, corresponding to a true negative rate of 96.9%.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eIn the second approach, after accounting for the diagnostic uncertainty interval, 63 cases (constituting 15.3% of the set) were excluded from the analysis, having been qualified as diagnostically ambiguous. In the reduced set of 349 cases, the system achieved a correct diagnosis for 343 scans. Accuracy in this approach reaches the level of 98.3% (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). This result suggests high system reliability in cases lying outside the narrow measurement error margin. 
Furthermore, high accuracy is maintained in both classes:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eDysplastic: The system correctly identified 10 out of 10 dysplastic cases, achieving a true positive rate of 100%. This testifies to the high sensitivity of the model in clinically significant cases.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eHealthy: In the group of healthy patients, the system correctly classified 333 out of 339 cases, yielding a true negative rate of 98.2%.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDiagnostic performance of the AI-based DDH classification model before and after exclusion of cases within the diagnostic uncertainty interval (α\u0026thinsp;=\u0026thinsp;57\u0026ndash;63\u0026deg;).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMetric\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStandard evaluation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eExclusion of interval (α\u0026thinsp;=\u0026thinsp;57\u0026ndash;63\u0026deg;)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003e0.961 (95% CI 0.938\u0026ndash;0.976)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.983 (95% CI 0.963\u0026ndash;0.992)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSensitivity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.857 (95% CI 0.685\u0026ndash;0.943)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.000 (95% CI 0.722\u0026ndash;1.000)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.969 (95% CI 0.946\u0026ndash;0.982)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.982 (95% CI 0.962\u0026ndash;0.992)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAn additional assessment was conducted for the 63 cases located inside the diagnostic uncertainty interval of 57\u0026ndash;63 degrees. In this zone of increased error risk, the overall accuracy of the model dropped to 84.1%. This drop (84.1% vs. 98.3% for the rest of the set) supports treating the interval as a separate category. Referring patients with borderline results for detailed expert verification may substantially reduce the risk of diagnostic error.\u003c/p\u003e \u003cp\u003eA supplementary analysis was performed to assess the sensitivity of α angle measurements to small variations in landmark localization. Horizontal displacements of the lower limb point (\u0026minus;\u0026thinsp;10 to +\u0026thinsp;10 pixels) were simulated. 
The results demonstrated a high sensitivity of the Graf method, with an average change of approximately 0.66\u0026deg; in the α angle per 1-pixel shift (Supplementary Fig.\u0026nbsp;1). This analysis was performed on an image at its original resolution (1260 \u0026times; 910 pixels).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe findings of the present study should be interpreted within the context of the rapidly evolving field of AI-assisted ultrasound in DDH. Most existing approaches rely on supervised deep learning models, typically CNN-based architectures, and are often developed on relatively limited datasets or focus on selected components of the diagnostic workflow\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. In contrast, the present study leverages a substantially larger dataset and implements an integrated pipeline reflecting the full sequence of Graf-based assessment, thereby advancing toward a clinically applicable diagnostic framework.\u003c/p\u003e \u003cp\u003eCompared with our previous work\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e, a modest decrease in segmentation and landmark localization performance was observed. This likely reflects the incorporation of additional preprocessing, quality control, and rule-based validation steps, which impose stricter constraints on the input data. While the earlier study focused on segmentation under controlled conditions, the current approach prioritizes end-to-end diagnostic reliability. 
This trade-off suggests that slightly lower intermediate performance metrics may be acceptable in exchange for improved robustness and clinical consistency of the overall system.\u003c/p\u003e \u003cp\u003eThe introduction of a diagnostic uncertainty interval for α angle values (57\u0026ndash;63\u0026deg;) provides a structured approach to handling borderline DDH cases and reflects the inherent uncertainty near diagnostic thresholds. This is consistent with previously reported variability in both human and AI-based image interpretation, where small measurement differences may lead to different classifications\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Importantly, such uncertainty represents a challenge even at the expert level and is likely to be amplified in routine clinical practice, particularly among less experienced operators. In this context, the proposed system, by explicitly identifying uncertain cases, may function as a decision-support tool, improving standardization and reducing operator dependency rather than replacing expert judgment.\u003c/p\u003e \u003cp\u003eFurthermore, the sensitivity analysis demonstrated that α angle measurements are highly dependent on precise landmark localization, with small positional shifts translating into measurable changes in angle values. This finding highlights the critical importance of robust preprocessing and quality control mechanisms in ensuring reliable automated measurements and supports the rationale for incorporating strict validation steps within the pipeline.\u003c/p\u003e \u003cp\u003eSeveral limitations of this study should be acknowledged. First, the model was developed and evaluated on a single-center dataset, which may limit generalizability across different populations, ultrasound systems, and acquisition protocols. 
Second, the reference standard was based on expert annotations, which are inherently subject to inter- and intra-observer variability. Third, although the system demonstrated high performance under controlled conditions, its real-world clinical applicability - particularly in prospective, multi-operator settings - requires further validation.\u003c/p\u003e \u003cp\u003eFuture studies should focus on prospective validation, comparison with less experienced operators, and integration of the system into real-time clinical workflows. The system is intended to support screening rather than replace expert-level diagnosis.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics Approval and Consent to Participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee of the Medical University of Warsaw (protocol code: AKBE/07/2022 and date: 17 January 2022) approved the study protocol and waived the requirement for informed consent due to its retrospective design.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data that support the findings of this study are available from Pentacomp Systemy Informatyczne S.A., but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Pentacomp Systemy Informatyczne S.A.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eŁ.P. and J.K. conceptualized the study. P.C., B.M., M.P. and J.W. developed the methodology and software. Ł.P., P.Ł. and J.K. provided resources. Ł.P., P.C., J.K., B.M., M.P. and J.W. wrote the original draft. Ł.P., P.C., J.K., B.M., M.P., P.Ł. and J.W. reviewed and edited the manuscript. 
B.M. prepared the figures. J.K. and Ł.P. supervised the project. J.K. and P.C. coordinated project administration. J.K. secured funding. All authors approved the final version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eP.C., B.M., M.P., J.W., and J.K. are employees of Pentacomp Systemy Informatyczne S.A. Ł.P. collaborates with Pentacomp Systemy Informatyczne S.A. The company may benefit from the development of products related to this work. P.Ł. declares no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research was funded by the Polish Medical Research Agency (ABM), grant number 2022/ABM/02/00004.\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eGraf, R. The use of ultrasonography in developmental dysplasia of the hip. \u003cem\u003eActa Orthop. Traumatol. Turc.\u003c/em\u003e \u003cstrong\u003e41 Suppl 1\u003c/strong\u003e, 6\u0026ndash;13 (2007).\u003c/li\u003e\n \u003cli\u003eKrysta, W., Dudek, P., Pulik, Ł. \u0026amp; Łęgosz, P. Screening of developmental dysplasia of the hip in Europe: a systematic review. \u003cem\u003eChildren\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 97 (2024).\u003c/li\u003e\n \u003cli\u003eJaremko, J. L. \u003cem\u003eet al.\u003c/em\u003e Potential for change in US diagnosis of hip dysplasia solely caused by changes in probe orientation: patterns of alpha-angle variation revealed by using three-dimensional US. \u003cem\u003eRadiology\u003c/em\u003e \u003cstrong\u003e273\u003c/strong\u003e, 870\u0026ndash;878 (2014).\u003c/li\u003e\n \u003cli\u003e\u0026Ouml;meroglu, H., Bi\u0026ccedil;moglu, A., Koparal, S. \u0026amp; Seber, S. Assessment of Variations in the Measurement of Hip Ultrasonography by the Graf Method in Developmental Dysplasia of the Hip \u0026sect;. \u003cem\u003eJ. Pediatr. Orthop. 
B\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 89\u0026ndash;95 (2001).\u003c/li\u003e\n \u003cli\u003eDogruel, H., Baymurat, A. C., Ismail, C. T. \u0026amp; Atalar, H. Evaluation of Interobserver and Intraobserver Differences of Graf Method in Developmental Hip Dysplasia. \u003cem\u003eGazi Med. J.\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, 425 (2023).\u003c/li\u003e\n \u003cli\u003eWalter, S. G. \u003cem\u003eet al.\u003c/em\u003e Four decades of developmental dysplastic hip screening according to Graf: What have we learned? \u003cem\u003eFront. Pediatr.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 990806 (2022).\u003c/li\u003e\n \u003cli\u003eAsgari Taghanaki, S., Abhishek, K., Cohen, J. P., Cohen-Adad, J. \u0026amp; Hamarneh, G. Deep semantic segmentation of natural and medical images: a review. \u003cem\u003eArtif. Intell. Rev.\u003c/em\u003e \u003cstrong\u003e54\u003c/strong\u003e, 137\u0026ndash;178 (2021).\u003c/li\u003e\n \u003cli\u003eChen, M., Cai, R., Zhang, A., Chi, X. \u0026amp; Qian, J. The diagnostic value of artificial intelligence-assisted imaging for developmental dysplasia of the hip: a systematic review and meta-analysis. \u003cem\u003eJ. Orthop. Surg.\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 522 (2024).\u003c/li\u003e\n \u003cli\u003ePulik, Ł. \u003cem\u003eet al.\u003c/em\u003e Artificial Intelligence Algorithm Supporting the Diagnosis of Developmental Dysplasia of the Hip: Automated Ultrasound Image Segmentation. \u003cem\u003eJ. Clin. Med.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 6332 (2025).\u003c/li\u003e\n \u003cli\u003eGraf, R. \u0026amp; Synder, M. Hip sonography worldwide\u0026ndash;experience, results, problems. \u003cem\u003eChir. Narząd\u0026oacute;w Ruchu Ortop. Pol.\u003c/em\u003e \u003cstrong\u003e85\u003c/strong\u003e, 29\u0026ndash;34 (2020).\u003c/li\u003e\n \u003cli\u003ePeng, J. 
\u003cem\u003eet al.\u003c/em\u003e Pp-liteseg: A superior real-time semantic segmentation model. \u003cem\u003eArXiv Prepr. ArXiv220402681\u003c/em\u003e (2022).\u003c/li\u003e\n \u003cli\u003eHendrickx, J. \u003cem\u003eet al.\u003c/em\u003e Can artificial intelligence-driven cephalometric analysis replace manual tracing? A systematic review and meta-analysis. \u003cem\u003eEur. J. Orthod.\u003c/em\u003e \u003cstrong\u003e46\u003c/strong\u003e, cjae029 (2024).\u003c/li\u003e\n \u003cli\u003eYonis, R. \u003cem\u003eet al.\u003c/em\u003e Artificial intelligence to support ultrasound in the detection of developmental dysplasia of the hip. \u003cem\u003eBone Jt. Open\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 223\u0026ndash;233 (2026).\u003c/li\u003e\n \u003cli\u003eLenskjold, A. \u003cem\u003eet al.\u003c/em\u003e Artificial intelligence tools trained on human-labeled data reflect human biases: a case study in a large clinical consecutive knee osteoarthritis cohort. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 26782 (2024).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"developmental dysplasia of the hip, Graf method, hip ultrasound, artificial intelligence, deep learning, image segmentation","lastPublishedDoi":"10.21203/rs.3.rs-9439534/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9439534/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eUltrasound-based diagnosis of developmental dysplasia of the hip (DDH) according to Graf\u0026rsquo;s method is operator-dependent, with variability arising from image acquisition, frame selection, and landmark identification.\u003c/p\u003e \u003cp\u003eBuilding on our previous study, we developed and evaluated an automated end-to-end artificial intelligence system for DDH detection. The system integrates preprocessing, quality assessment, segmentation, landmark identification, measurement of α and β angles, and classification.\u003c/p\u003e \u003cp\u003eThe model, based on a PP-LiteSeg-B architecture with an STDC2 backbone, was developed using a dataset comprising 2,062 ultrasound video sequences (31,572 annotated images) from 880 infants. Performance was evaluated on an independent set of additional 412 video sequences (5,303 annotated images) derived from 169 patients.\u003c/p\u003e \u003cp\u003eThe system achieved accuracy of 0.961, a sensitivity of 0.857 and specificity of 0.969. After exclusion of cases within a predefined diagnostic uncertainty interval (α\u0026thinsp;=\u0026thinsp;57\u0026ndash;63\u0026deg;), accuracy increased to 0.983. 
The decrease in performance within the interval highlights the impact of borderline measurements on classification reliability.\u003c/p\u003e \u003cp\u003eAn automated end-to-end approach may improve consistency and reproducibility in DDH screening according to the Graf method. It may further serve as a basis for future clinical decision support systems (CDSS) integrating clinical and imaging data, aimed at improving standardization and supporting clinical decision-making across the diagnostic pathway.\u003c/p\u003e","manuscriptTitle":"Automated Diagnosis of Developmental Dysplasia of the Hip in Ultrasound Using an End- to-End Deep Learning Approach","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-09 00:51:26","doi":"10.21203/rs.3.rs-9439534/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"aa979cab-b96b-49da-a5e7-b033eefaed31","owner":[],"postedDate":"May 9th, 2026","published":true,"recentEditorialEvents":[{"type":"decision","content":"Withdrawn","date":"2026-05-06T12:06:13+00:00","index":"","fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":66800454,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":66800455,"name":"Health sciences/Diseases"},{"id":66800456,"name":"Health sciences/Health care"},{"id":66800457,"name":"Physical sciences/Mathematics and computing"},{"id":66800458,"name":"Health sciences/Medical research"}],"tags":[],"updatedAt":"2026-05-09T00:51:41+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-09 00:51:26","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9439534","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9439534","identity":"rs-9439534","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
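The accuracy, sensitivity, and specificity in Table 3 follow directly from the case counts given in the Results (24 of 28 dysplastic and 372 of 384 healthy cases in the standard evaluation; 10 of 10 and 333 of 339 after exclusion of the uncertainty interval). The sketch below recomputes them. The source does not state how the 95% confidence intervals were obtained; the Wilson score interval is assumed here because it appears to reproduce the reported bounds, and the function names `wilson_interval` and `diagnostic_metrics` are illustrative rather than taken from the authors' software.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, tuple[float, float, float]]:
    """Return (estimate, lower, upper) for accuracy, sensitivity, specificity."""
    counts = {
        "accuracy": (tp + tn, tp + fn + tn + fp),
        "sensitivity": (tp, tp + fn),  # true positive rate
        "specificity": (tn, tn + fp),  # true negative rate
    }
    return {
        name: (k / n, *wilson_interval(k, n))
        for name, (k, n) in counts.items()
    }


if __name__ == "__main__":
    # Counts derived from the Results text; "positive" means dysplastic.
    scenarios = {
        "standard evaluation": dict(tp=24, fn=4, tn=372, fp=12),
        "interval excluded": dict(tp=10, fn=0, tn=333, fp=6),
    }
    for label, cm in scenarios.items():
        print(label)
        for name, (est, lo, hi) in diagnostic_metrics(**cm).items():
            print(f"  {name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With only 10 dysplastic cases left after exclusion, the lower bound for sensitivity remains wide even at 10 of 10 correct, which is why the interval in Table 3 matters more than the point estimate of 1.000.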