Explainable AI Models on Radiographic Images Integrated with Clinical Measurements: Prediction for Unstable Hips in Infants

doi:10.21203/rs.3.rs-3805622/v1

2024 · doi:10.21203/rs.3.rs-3805622/v1

preprint OA: closed

Article

Hirokazu Shimizu, Ken Enda, Hidenori Koyano, Tomohiro Shimizu, and 6 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-3805622/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 01 Aug, 2024; the published version is in Scientific Reports.
Version 1 (latest preprint version)

Abstract

Although explainability is crucial in medical artificial intelligence, technologies to quantify Grad-CAM heatmaps and perform automatic integration based on domain knowledge remain lacking.
Hence, we created an end-to-end model that produces CAM scores on regions of interest (CSoR), a measure of relative CAM activity, and feature importance scores from automatic algorithms for clinical measurement (aaCM) followed by LightGBM. In this multicenter research project, the diagnostic performance of the model was investigated with 813 radiographic hip images of infants at risk of unstable hips, with the ground truth defined by provocative examinations. The results indicated that the accuracy of aaCM was higher than that of specialists, and the model with ad hoc adoption of aaCM outperformed the image-only model. Subgroup analyses in positive cases indicated significant differences in CSoR between the unstable and contralateral sides, even though the labels were only binary (positive or negative). In conclusion, aaCM reinforces the performance, and CSoR potentially indicates model reliability.

Health sciences/Diseases
Health sciences/Medical research/Paediatric research

Introduction

The medical imaging field is currently seeing increased usage of machine learning (ML) using networks such as convolutional neural networks (CNNs). Although CNNs perform better than specialized physicians in classification models [1], the rationale behind the prediction is lacking. To address this issue, explainable artificial intelligence (XAI) techniques such as Grad-CAM and feature selection have been introduced [2–4]. XAI explains a model's internal operation by visualizing the weighted image regions or by ranking the tabular features used for classification. While previous studies have reported that weighted regions of the Grad-CAM heatmaps matched areas of clinical concern in representative images [4–6], technologies to quantify Grad-CAM heatmaps remain lacking [7].
Developmental dysplasia of the hip (DDH) is one of the most common congenital abnormalities of the musculoskeletal apparatus affecting infants, ranging from mild dysplasia to dislocated hips [8]. Provocative maneuvers have been used widely for screening unstable hips [9–12]. Therapeutic interventions should be performed on the unstable hips of infants under 6 months [13], whereas infants with stable hips can be observed. Notably, a randomized study found that interventions for stable dysplasia did not affect acetabular growth [14]. Because therapeutic strategies differ according to hip instability, identifying unstable hips is crucial. Imaging can also assist in diagnosing DDH using the following four parameters of the hip region: acetabular index, O-edge angle, Yamamuro A, and Yamamuro B [15, 16]. Currently, automatic integration based on domain knowledge using these clinical parameters is rarely performed for end-to-end models.

Here, we create an end-to-end model that produces CAM Scores on a region of interest (CSoR), a measure of the relative CAM activity on the hip area, as well as feature importance scores from an automatic algorithm of clinical measurements (aaCM) followed by LightGBM; aaCM can estimate the four parameters on each side. This study aims to evaluate the diagnostic performance on infantile hip images by integrating clinical measurements and quantifying the inner processes through CSoR on the images and the feature importance scores.

Results

Model overview and data collection

The settings were prepared as follows: Setting A had pure CNNs; Setting B had aaCM followed by LightGBM; Setting C was the integrated model (Fig. 1). Data were collected from infants who visited our and related hospitals and were at risk of DDH based on the Japanese Pediatric Orthopaedic Association's guidelines [17]. A total of 813 radiographic images were enrolled, with no data excluded.
Demographic data showed that the left hip was affected more often than the right hip (Table 1). As a binary ground truth, infants were defined as having an unstable hip if provocative examinations were positive.

Table 1 Demographic data (our and related hospitals)

Number of infants                        813
Mean age, month                          4.4 (± 0.02)
Female, %                                83.3 (677/813)
Breech presentation, %                   28.9 (235/813)
Family history of first relatives, %     23.2 (189/813)
Skin laterality, %                       48.3 (393/813)
Limb limitation, %                       40.6 (330/813)
Unstable hip, %                          28.2 (232/813)
Being unstable in left hip, %            76.7 (178/232)

Mean ± standard error

Performance of end-to-end models for unstable hips

Six-fold cross-validation was applied so that a combination of models could be used to reach a majority decision, as previously described [18, 19]. With EfficientNetB4 under six-fold cross-validation, the average accuracy of the models for predicting unstable hips was 80.8% (± 0.97%) for Setting A and 80.5% (± 1.12%) for Setting B (Fig. 2a-c). The average area under the precision-recall curve (AUPRC) was 0.733 (± 0.050) for Setting A and 0.706 (± 0.048) for Setting B. The average area under the receiver operating characteristic curve (AUROC) was 0.840 (± 0.029) for Setting A and 0.789 (± 0.036) for Setting B. The average F1 score was 0.791 (± 0.027) for Setting A and 0.754 (± 0.035) for Setting B. Setting C was trained on all images simultaneously, obtaining an average accuracy of 83.2% (± 1.84%), average AUPRC of 0.804 (± 0.060), average AUROC of 0.885 (± 0.022), and average F1 score of 0.825 (± 0.017). Setting C thus performed significantly better than either Setting A or B (P ≤ 0.05) for accuracy, AUPRC, AUROC, and F1 score (Fig. 2a-c). With EfficientNetB0 and EfficientNetB8, both the average AUPRC and the average AUROC in Setting C were also significantly higher than those of Setting A (P ≤ 0.05).
In EfficientNetB0, the average AUPRC was 0.722 (± 0.036) for Setting A and 0.778 (± 0.034) for Setting C, and the average AUROC was 0.826 (± 0.035) for Setting A and 0.864 (± 0.028) for Setting C. In EfficientNetB8, the average AUPRC was 0.758 (± 0.032) for Setting A and 0.795 (± 0.029) for Setting C, and the average AUROC was 0.848 (± 0.031) for Setting A and 0.869 (± 0.029) for Setting C.

Accuracy of the estimated parameters by aaCM

The mean absolute error of the estimated acetabular index was 1.76° on the right and 1.91° on the left, lower than previously reported [20, 21] (Table 2). Moreover, the estimated acetabular index produced by aaCM had a significantly smaller error than that produced by the orthopedic specialists (Table 3).

Table 2 Mean absolute error of acetabular index

Method           Right    Left
Bashir [20]      3.78°    2.95°
Liu [21]         2.25°    2.23°
Our model        1.76°    1.91°

Table 3 Accuracy of the estimated parameters produced by aaCM compared with that of orthopedic specialists

Parameter          Side    Our model        Orthopedic specialists    P value
Acetabular index   right   1.76 (± 0.16)    2.40 (± 0.12)             < 0.01
                   left    1.91 (± 0.12)    2.48 (± 0.16)             0.01
O-edge angle       right   4.37 (± 0.39)    5.84 (± 0.31)             < 0.01
                   left    4.84 (± 0.36)    6.77 (± 0.39)             < 0.01
Yamamuro A         right   0.98 (± 0.06)    1.08 (± 0.06)             0.18
                   left    0.91 (± 0.07)    0.97 (± 0.06)             0.23
Yamamuro B         right   0.69 (± 0.04)    0.72 (± 0.06)             0.20
                   left    0.76 (± 0.08)    0.74 (± 0.05)             0.73

Mean ± standard error

ROI detection

The right and left sides of the acetabulum, proximal femur, and ischium were annotated. As those ROIs were considered bone ROIs (bROIs), each image had six bROIs. The six bROIs in all images were extracted using YOLOv5 models. The mean value of mAP50 for hROIs was 0.978 (± 0.011) in YOLOv5 S, 0.987 (± 0.004) in YOLOv5 M, and 0.990 (± 0.003) in YOLOv5 L (Table 4). Using the ipsilateral three bROIs, a rectangle with hip ROIs (hROIs) was built.
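The construction of an hROI from the three ipsilateral bROIs can be illustrated with a minimal Python sketch. It assumes, following the Methods description (height from the upmost to the bottommost point of the bROIs, width from the lateralmost to the innermost point), that the hROI is the smallest axis-aligned rectangle enclosing the three boxes. The `Box` class, the `build_hroi` function, and the pixel coordinates are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


def build_hroi(brois: Iterable[Box]) -> Box:
    """Return the smallest rectangle enclosing all given bone ROIs."""
    boxes = list(brois)
    if not boxes:
        raise ValueError("at least one bROI is required")
    return Box(
        x_min=min(b.x_min for b in boxes),
        y_min=min(b.y_min for b in boxes),
        x_max=max(b.x_max for b in boxes),
        y_max=max(b.y_max for b in boxes),
    )


# Made-up bROIs for one side: acetabulum, proximal femur, ischium.
acetabulum = Box(100, 80, 180, 140)
proximal_femur = Box(60, 120, 150, 220)
ischium = Box(150, 160, 210, 240)

hroi = build_hroi([acetabulum, proximal_femur, ischium])
print(hroi, hroi.width, hroi.height)
```

In practice the three input boxes would come from the YOLOv5 detections for one side of the image; the sketch only covers the geometric step.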
Table 4 Mean Average Precision (mAP)

             YOLOv5 S           YOLOv5 M           YOLOv5 L
mAP50        0.978 (± 0.011)    0.987 (± 0.004)    0.990 (± 0.003)
mAP50-95     0.688 (± 0.010)    0.699 (± 0.009)    0.699 (± 0.008)

Mean ± 95% confidence interval

Evaluation by CSoR

Grad-CAM heatmaps overlaid with hROIs were generated for all cases (Fig. 3a). The heatmaps were plotted in 3D and cropped using the hROIs (Fig. 3b), and CSoR, a measure of relative CAM activity in an ROI, was quantified. The mean value of the normalized CAM on each hROI was defined as CSoR_mean, and the maximum value was defined as CSoR_max. Both had similar tendencies; CSoR_max and CSoR_mean in the positive cases were significantly higher than those in the negative cases (Fig. 3c,d). In subgroup analyses of the positive cases, CSoR_max and CSoR_mean on the affected sides were significantly higher than those on the contralateral sides (Fig. 3c,d). In contrast, no significant differences were observed in CSoR_max or CSoR_mean between the left and right sides in the negative cases (Fig. 3c,d).

Feature importance scores in estimated parameters by aaCM

As the estimated parameters themselves have diagnostic value, the feature importance scores of the parameters were investigated (Fig. 4a,b). The scores for Yamamuro A and O-edge angle on the left side were significantly higher than those on the right side, and Yamamuro A was the top-ranked feature detected by this algorithm.

Discussion

Our findings showed that the integrated model exhibited the best diagnostic performance. hROIs were confirmed as a clinically relevant area because the estimated parameters extracted from hROIs had moderate diagnostic performance. When the inner processes were analyzed, CSoR was significantly higher in the positive cases than in the negative ones. In subgroup analyses of the positive cases, CSoR revealed an increase in CAM activities on the affected side, although the labels were only binary.
Feature importance scores were also significantly higher on the left side, while demographic data showed the affected side was commonly the left side.

The diagnostic performance of the end-to-end model was found to be enhanced without any manual preparation of clinical data by physicians, owing to an ad hoc adoption of clinical insights. Setting C might not have been established if the models had been utilized only by technicians who did not have domain knowledge about DDH. Ad hoc adoptions like our model could contribute toward performance reinforcement independently of general-purpose ML techniques. The estimated parameters had an absolute error smaller than those of the orthopedic specialists or previous reports [20, 21]. Highly accurate automatic integration of domain knowledge into ML might be helpful for the development of artificial intelligence in medicine.

CSoR was introduced in this study as a quantitative method to evaluate model reliability. In previous studies, Grad-CAM heatmaps have only been shown as rendered images [4–6], appearing compatible with the rationale for the outcomes because the weighted region matched the region of clinical concern. To the best of our knowledge, our study is the first reported in the medical field to perform statistical validation of Grad-CAM heatmaps and use this as the rationale for the outcomes. Specifically, hROIs were confirmed as a clinically relevant area because the estimated parameters extracted from hROIs had moderate diagnostic performance. Based on this observation, CSoR revealed an increase in CAM activities on the affected side in the positive cases, despite the ground truth using only binary labels (positive or negative). As these results were consistent with the clinical processes of hip evaluations, CSoR can function as a reasonable tool to investigate model reliability.
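As an illustration of how CSoR can be computed, the following minimal Python sketch follows the normalization given in Methods: CAM pixel values are divided by their sum and multiplied by the number of pixels, so the mean of the normalized CAM is 1.0, and the mean and maximum are then taken inside an ROI. It assumes the Grad-CAM map has already been computed and resized to the input image; the function names, the toy array, and the ROI coordinates are hypothetical and do not come from the authors' code.

```python
from typing import Tuple

import numpy as np


def normalize_cam(cam: np.ndarray) -> np.ndarray:
    """Scale a non-negative CAM so that its average pixel value is 1.0."""
    total = cam.sum()
    if total <= 0:
        raise ValueError("CAM must have a positive sum")
    return cam / total * cam.size


def csor(cam: np.ndarray, roi: Tuple[int, int, int, int]) -> Tuple[float, float]:
    """Return (CSoR_mean, CSoR_max) for roi = (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = roi
    patch = normalize_cam(cam)[y_min:y_max, x_min:x_max]
    if patch.size == 0:
        raise ValueError("ROI does not overlap the CAM")
    return float(patch.mean()), float(patch.max())


# Toy 4x4 CAM with activity concentrated in the top-left corner.
cam = np.array([
    [4.0, 2.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

csor_mean, csor_max = csor(cam, (0, 0, 2, 2))
print(csor_mean, csor_max)
```

Because the normalized map averages 1.0 over the whole image, a CSoR above 1.0 indicates that the ROI attracts more CAM activity than the image average, which is the threshold interpretation described in Methods.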
Clinically, unstable hips have conventionally been diagnosed by provocative maneuvers [9–12], which are widely used around the world as screening techniques. However, the reliability of this test depends on the skill and experience of the examiner, and repeated examinations can have iatrogenic effects [22]; thus, a standardized image analysis method is desired. Automated analysis of images could decrease the number of provocative maneuvers or shorten the waiting period for infants to get the correct diagnosis.

This study has some limitations. First, this study used radiographic images. While ultrasound is a representative modality used to evaluate DDH in clinics, radiographs are also accepted for four- to six-month-old infants, and some reports state that radiographs are preferred at this age [23, 24]. Second, the ground truth for unstable hips was provocative maneuvers, a conventional method. Although those maneuvers are widely used for screening, dynamic or static ultrasound might also be preferred for defining unstable hips. Third, our dataset was relatively small compared to that of a previous study [21] because we focused on six-month-old infants, a critical age group for DDH. Fourth, the effect size between Settings C and A was not large.

In conclusion, we presented an XAI model on infantile hip images integrated with clinical measurements. We demonstrated that aaCM reinforces the diagnostic performance and that CSoR potentially indicates model reliability.

Methods

Participants

We enrolled infants at risk for DDH who visited the orthopedic department of our hospital and a related hospital between 2010 and 2020 based on the Japanese Pediatric Orthopaedic Association's guidelines [17].
This multicenter, retrospective study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki and was approved by the Human Ethics Committee of Hokkaido University Hospital (approval number: 018–0397). Informed consent was obtained from all legal guardians for participation in the study and publication of information.

Model development

Datasets and ground truth

Anteroposterior X-rays were collected from infants aged four to six months. A total of 813 images were collected as Digital Imaging and Communications in Medicine (DICOM) data, with no data excluded. As a binary ground truth, three orthopedic surgeons with 22, 17, and 16 years of experience defined whether infants had an unstable hip. An unstable hip was diagnosed if provocative examinations such as the Barlow or Ortolani tests were positive. Therapeutic intervention such as brace treatment was performed on infants with unstable hips. The three surgeons also provided the ground truth for the four parameters in each image. Demographic data for the whole dataset showed the affected side (right or left) in infants with unstable hips (Table 1). Six-fold cross-validation stratified by the binary ground truth was adopted for the dataset [25].

Definition of bone ROIs and hip ROIs

As training data, the right and left sides of the acetabulum, proximal femur, and ischium were annotated and validated by the three surgeons. As those ROIs were considered bone ROIs (bROIs), each image had six bROIs. Using the ipsilateral three bROIs, a rectangle with hip ROIs (hROIs) was built as follows: height was defined as the distance between the upmost and bottommost points of the bROIs, whereas width was defined as the distance between the lateralmost and innermost points.

Framework of proposed models

The settings were prepared as follows: Setting A had pure CNNs; Setting B had aaCM followed by LightGBM; Setting C was the integrated model (Fig. 1).
aaCM part

Component 1: ROI detection

YOLOv5 was used to extract six bROIs [26], with the S, M, and L models trained based on input parameters. As augmentation methods, images and bROIs were flipped, in addition to the basic augmentations of the library [26], to correct for the right and left sides.

Component 2: Yielding clinical measurements

For each bROI, γ correction was adopted to adjust the image contrast. This process was continued until the adaptive thresholding and blob detection were completed. Binarization with adaptive thresholding was applied to each bROI to transform the bone area into a blob [27]. The local threshold was calculated at every individual point of the image with sliding-window image processing [28]. The threshold value is based on the intensity of the pixel and its neighborhood, and the blob itself was detected by connected-component labeling. Then, the contour and feature points were detected. The radiographic hip parameters were measured using those points. The output was defined as the clinical measurements: acetabular index, O-edge angle, Yamamuro A, and Yamamuro B.

Setting A: Pure CNNs

As widely used models, EfficientNet B0, B4, and B8 were investigated, with their initial parameters ported from ImageNet-pretrained models [29–32]. A sigmoid function was used for activation, and binary cross-entropy loss was used to train the network [33]. For image augmentation, flip, Gaussian noise, blur, CLAHE, and saturation transforms were applied. This model was given the tensors of a radiographic image and output one scalar value ranging from zero to one.

Setting B: aaCM followed by LightGBM

LightGBM was trained [34] using the outputs of aaCM. The maximum depth was three, the number of early-stopping rounds was 50, and the number of boosting rounds was 10,000.

Setting C: Integrated model

The features obtained from the convolutional layers of the CNNs were concatenated with the clinical measurements as the input for the fully connected layer.
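The concatenation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CNN features are stood in for by random vectors, the clinical input is taken to be eight values (four parameters on each side), and a single fully connected layer with a sigmoid output replaces whatever head the actual model uses. The function names, feature sizes, and weights are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def integrated_forward(
    image_features: np.ndarray,  # shape (batch, n_image_features)
    clinical: np.ndarray,        # shape (batch, n_clinical_measurements)
    weights: np.ndarray,         # shape (n_image + n_clinical, 1)
    bias: float,
) -> np.ndarray:
    """Concatenate image features with clinical measurements, then apply
    one fully connected layer followed by a sigmoid."""
    fused = np.concatenate([image_features, clinical], axis=1)
    return sigmoid(fused @ weights + bias)


rng = np.random.default_rng(0)
batch, n_image, n_clinical = 2, 16, 8  # hypothetical sizes

image_features = rng.normal(size=(batch, n_image))  # stand-in for CNN output
clinical = rng.normal(size=(batch, n_clinical))     # stand-in for aaCM output
weights = rng.normal(size=(n_image + n_clinical, 1)) * 0.1
probs = integrated_forward(image_features, clinical, weights, bias=0.0)
print(probs.shape)
```

The point of the sketch is only the data flow: both inputs reach the classifier head in the same forward pass, so one binary cross-entropy loss can train the combined model.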
This architecture was designed to enable training and inference in a single shot by dynamically combining the generated features with the additional inputs in an online setting, rather than caching the generated features.

Inner process analyses

CAM Score on regions of interest (CSoR)

The Grad-CAM technique was applied to the test images in each fold. CSoR, a measure of relative CAM activity in a region of interest, was created as follows: CAM images were generated, resized to match the input image's dimensions, and then normalized based on their resolution. Specifically, the pixel values of the CAM image were divided by the sum of all pixel values and then multiplied by the resolution of the CAM image (that is, its total number of pixels). This process ensured that the average pixel value on the CAM image was 1.0. The normalized CAM pixel values obtained in this manner can be used to evaluate the level of interest with a threshold of 1.0, akin to the standardized uptake value (SUV) in PET-CT [35]. The mean value of the normalized CAM on each hROI was defined as "CSoR_mean" and the maximum value was defined as "CSoR_max."

Feature importance scores

Feature selection was conducted based on the LightGBM implementation to detect the relevant features among the estimated parameters [2].

Evaluation metrics

The mean Average Precision (mAP) was used to evaluate the performance of ROI detection [36]. Mean absolute errors between the clinical measurements from aaCM and those of the two orthopedic specialists were calculated. Several evaluation metrics were analyzed to compare the settings: accuracy, AUPRC, AUROC, and F1 score for the test data in each fold [18]. These values were compared using a paired t-test with Bonferroni correction for multiple comparisons [19]. As the outcomes of the inner process analyses, several comparisons were performed for CSoR_mean and CSoR_max in both the positive and negative cases: The first was between the positive and negative cases on the bilateral sides.
The second was between the affected and contralateral sides in the positive cases. The third was between the right and left sides in the negative cases. An unpaired t-test was used to compare two independent groups with a normal distribution. For each analysis, the value distribution was tested for normality.

Declarations

Author Contributions

H.S. designed the study and collected data, all supervised by D.T. K.E. and H.K. developed the models. D.T., T.S., and S.S. defined the positive cases. K.S. and T.O. measured the clinical parameters. H.S. wrote the manuscript, and K.E. illustrated the figures. N.I. and S.T. directed the entire study.

Acknowledgments

This study received no funding.

Competing Interests

All authors declare no financial or non-financial competing interests.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Code Availability

The underlying code for this study is available from the corresponding author upon reasonable request.

References

1. Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954–961 (2019).
2. Shimizu, H. et al. Machine learning algorithms: Prediction and feature selection for clinical refracture after surgically treated fragility fracture. J. Clin. Med. 11, 2021 (2022).
3. Jahmunah, V., Ng, E. Y. K., Tan, R. S., Oh, S. L. & Acharya, U. R. Explainable detection of myocardial infarction using deep learning models with Grad-CAM technique on ECG signals. Comput. Biol. Med. 146, 105550 (2022).
4. Zheng, X. et al. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat. Commun. 11, 1236 (2020).
5. Zhou, W. et al. Ensembled deep learning model outperforms human experts in diagnosing biliary atresia from sonographic gallbladder images. Nat. Commun. 12, 1259 (2021).
6. Sangha, V. et al. Automated multilabel diagnosis on electrocardiographic images and signals. Nat. Commun. 13, 1583 (2022).
7. Zhang, Y. et al. Grad-CAM helps interpret the deep learning models trained to classify multiple sclerosis types using clinical brain magnetic resonance imaging. J. Neurosci. Methods 353, 109098 (2021).
8. Mureşan, S., Mărginean, M. O., Voidăzan, S., Vlasa, I. & Sîntean, I. Musculoskeletal ultrasound: A useful tool for diagnosis of hip developmental dysplasia: One single-center experience. Med. (Baltim.) 98, e14081 (2019).
9. Cook, K. A. et al. Pavlik Harness initiation on Barlow positive hips: Can we wait? J. Orthop. 16, 378–381 (2019).
10. Neal, D. et al. Comparison of Pavlik Harness treatment regimens for reduced but dislocatable (Barlow positive) hips in infantile DDH. J. Orthop. 16, 440–444 (2019).
11. Jackson, J. C., Runge, M. M. & Nye, N. S. Common questions about developmental dysplasia of the hip. Am. Fam. Physician 90, 843–850 (2014).
12. Williams, N. Improving early detection of developmental dysplasia of the hip through general practitioner assessment and surveillance. Aust. J. Gen. Pract. 47, 619–623 (2018).
13. Agostiniani, R. et al. Recommendations for early diagnosis of Developmental Dysplasia of the Hip (DDH): Working group intersociety consensus document. Ital. J. Pediatr. 46, 150 (2020).
14. Pollet, V. et al. Abduction treatment in stable hip dysplasia does not alter the acetabular growth: Results of a randomized clinical trial. Sci. Rep. 10, 9647 (2020).
15. Narayanan, U. et al. Reliability of a new radiographic classification for developmental dysplasia of the hip. J. Pediatr. Orthop. 35, 478–484 (2015).
16. Ohmori, T. et al. Radiographic prediction of the results of long-term treatment with the Pavlik harness for developmental dislocation of the hip. Acta Med. Okayama 63, 123–128 (2009).
17. Shimizu, T. et al. Validation of parameters recommended for secondary screening for developmental dysplasia of the hip in Japan. J. Orthop. Sci. (2023).
18. Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439 (2023).
19. Moncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S. & Geleijnse, G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci. Rep. 11, 6968 (2021).
20. Al-Bashir, A. K., Al-Abed, M., Abu Sharkh, F. M., Kordeya, M. N. & Rousan, F. M. Algorithm for automatic angles measurement and screening for Developmental Dysplasia of the Hip (DDH). Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2015, 6386–6389 (2015).
21. Liu, C. et al. Misshapen pelvis landmark detection with local-global feature learning for diagnosing developmental dysplasia of the hip. IEEE Trans. Med. Imaging 39, 3944–3954 (2020).
22. Sewell, M. D. & Eastwood, D. M. Screening and treatment in developmental dysplasia of the hip-where do we go from here? Int. Orthop. 35, 1359–1367 (2011).
23. Schaeffer, E., Lubicky, J. & Mulpuri, K. AAOS appropriate use criteria: The management of developmental dysplasia of the hip in infants up to 6 months of age: Intended for use by general pediatricians and referring physicians. J. Am. Acad. Orthop. Surg. 27, e364–e368 (2019).
24. Shaw, B. A., Segal, L. S. & SECTION ON ORTHOPAEDICS. Evaluation and referral for developmental dysplasia of the hip in infants. Pediatrics 138 (2016).
25. Jung, Y. & Hu, J. A. A K-fold averaging cross-validation procedure. J. Nonparametric Stat. 27, 167–179 (2015).
26. Zhu, X., Lyu, S., Wang, X. & Zhao, Q. in Proceedings of the IEEE/CVF International Conference on Computer Vision 2778–2788.
27. Korzynska, A. et al. Validation of various adaptive threshold methods of segmentation applied to follicular lymphoma digital images stained with 3,3'-diaminobenzidine & Haematoxylin. Diagn. Pathol. 8, 48 (2013).
28. Sezgin, M. & Sankur, B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13, 146–165 (2004).
29. Tan, M. & Le, Q. in International Conference on Machine Learning 6105–6114 (PMLR).
30. Marques, G., Ferreras, A. & de la Torre-Diez, I. An ensemble-based approach for automated medical diagnosis of malaria using EfficientNet. Multimed. Tools Appl. 81, 28061–28078 (2022).
31. Chen, X. et al. Application of EfficientNet-B0 and GRU-based deep learning on classifying the colposcopy diagnosis of precancerous cervical lesions. Cancer Med. 12, 8690–8699 (2023).
32. Sharma, N. et al. EfficientNetB0 cum FPN Based Semantic Segmentation of gastrointestinal Tract Organs in MRI Scans. Diagnostics (Basel) 13 (2023).
33. Tan, M. & Le, Q. V. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv:1905.11946. https://ui.adsabs.harvard.edu/abs/2019arXiv190511946T (2019).
34. Ke, G. et al. in Proceedings of the 31st International Conference on Neural Information Processing Systems 3149–3157 (Curran Associates Inc., Long Beach, CA, 2017).
35. Kinahan, P. E. & Fletcher, J. W. Positron emission tomography-computed tomography standardized uptake values in clinical practice and assessing response to therapy. Semin. Ultrasound CT MR 31, 496–505 (2010).
36. Lin, T.-Y. et al. in Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 740–755 (Springer, 2014).

Additional Declarations

No competing interests reported.
Version history

Version 1 posted
Editorial decision: Revision requested (30 May, 2024)
Reviews received at journal (11 May, 2024)
Reviewers agreed at journal (20 Apr, 2024)
Reviews received at journal (08 Mar, 2024)
Reviewers agreed at journal (01 Mar, 2024)
Reviewers invited by journal (06 Feb, 2024)
Editor assigned by journal (31 Jan, 2024)
Editor invited by journal (27 Dec, 2023)
Submission checks completed at journal (27 Dec, 2023)
First submitted to journal (25 Dec, 2023)
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-3805622","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":264471641,"identity":"c3ba0eee-e62b-4116-86c0-f369fd707a7d","order_by":0,"name":"Hirokazu Shimizu","email":"","orcid":"","institution":"Hokkaido University","correspondingAuthor":false,"prefix":"","firstName":"Hirokazu","middleName":"","lastName":"Shimizu","suffix":""},{"id":264471642,"identity":"3b3cb14c-616f-41b2-bc8c-11683dd97fc3","order_by":1,"name":"Ken Enda","email":"","orcid":"","institution":"Hokkaido University","correspondingAuthor":false,"prefix":"","firstName":"Ken","middleName":"","lastName":"Enda","suffix":""},{"id":264471643,"identity":"fbf606fc-f7b6-4ded-9aaf-c3159a939eb2","order_by":2,"name":"Hidenori Koyano","email":"","orcid":"","institution":"Hokkaido University","correspondingAuthor":false,"prefix":"","firstName":"Hidenori","middleName":"","lastName":"Koyano","suffix":""},{"id":264471644,"identity":"db920145-69d3-4d6f-a5c5-989a57f70f4d","order_by":3,"name":"Tomohiro Shimizu","email":"","orcid":"","institution":"Hokkaido University","correspondingAuthor":false,"prefix":"","firstName":"Tomohiro","middleName":"","lastName":"Shimizu","suffix":""},{"id":264471645,"identity":"bc1d7270-a070-4ab3-896a-d981346345af","order_by":4,"name":"Shun Shimodan","email":"","orcid":"","institution":"Kushiro City General Hospital","correspondingAuthor":false,"prefix":"","firstName":"Shun","middleName":"","lastName":"Shimodan","suffix":""},{"id":264471646,"identity":"3c4b18c3-73e7-4f3d-be6f-a7d7c388980b","order_by":5,"name":"Komei 
Sato","email":"","orcid":"","institution":"Hokkaido University","correspondingAuthor":false,"prefix":"","firstName":"Komei","middleName":"","lastName":"Sato","suffix":""},{"id":264471647,"identity":"6e9669cd-7c2a-4698-b4ab-39e2dc0ab33e","order_by":6,"name":"Takuya Ogawa","email":"","orcid":"","institution":"Hokkaido University","correspondingAuthor":false,"prefix":"","firstName":"Takuya","middleName":"","lastName":"Ogawa","suffix":""},{"id":264471648,"identity":"511e06b7-948f-4b38-b0a4-b8bbdf62cda2","order_by":7,"name":"Shinya Tanaka","email":"","orcid":"","institution":"Hokkaido University","correspondingAuthor":false,"prefix":"","firstName":"Shinya","middleName":"","lastName":"Tanaka","suffix":""},{"id":264471649,"identity":"811f44ef-e255-4a0d-90e1-fa1f0b8c4186","order_by":8,"name":"Norimasa Iwasaki","email":"","orcid":"","institution":"Hokkaido University","correspondingAuthor":false,"prefix":"","firstName":"Norimasa","middleName":"","lastName":"Iwasaki","suffix":""},{"id":264471650,"identity":"ff85e0c6-b1ee-4cf4-89bd-8116e9d3ae14","order_by":9,"name":"Daisuke Takahashi","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABJ0lEQVRIie2QQWvCMBSAXwnMS7XXiMPfEAhY/DcJQncRVhhID4P1lF72A3IY+w2ePDcE9BI8FybMUei54g/YMq3KsNPrYP0gj/cgX957AWho+JM4MToVkT0IpQA3+5pcVVJjQ0uwK4p99pgpYYNryFGpw0+0yMNIQycb6e3mddX3MSNQTlbgt2J4CM+VW8MTKo2GbhYEWM0KOpSMOXJZwPA5BSrPFQxc9Nri7WmajQegZppPM5aittBAMgbUrVG8j51iL9xvS/XyrfB4r7yv6xXMD8oYsIqtYqesusAviu0izSd0TTHAZm53SQQoudQuMTyu3cW7K3phFEBnMcrL6NH+GPI263Ki+2Sh57Tmx06kVSRVbkdyBL1k/FAOoPyi0tDQ0PBP+ALlRXHJGSGJKgAAAABJRU5ErkJggg==","orcid":"","institution":"Hokkaido University","correspondingAuthor":true,"prefix":"","firstName":"Daisuke","middleName":"","lastName":"Takahashi","suffix":""}],"badges":[],"createdAt":"2023-12-25 
22:14:08","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3805622/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3805622/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-024-68484-7","type":"published","date":"2024-08-01T15:57:46+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":49083266,"identity":"a87c44dc-9736-44a9-a5ab-32f763c6c1b4","added_by":"auto","created_at":"2024-01-02 20:22:12","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":485259,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eOverview of end-to-end models\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSetting A: Pure CNNs; Setting B: aaCM followed LightGBM on the four parameters (acetabular index, O-edge angle, Yamamuro A, Yamamuro B); Setting C: integrated model of CNN and aaCM followed by LightGBM;\u003c/p\u003e\n\u003cp\u003eaaCM: automatic algorithms of Clinical Measurements; CNNs: convolutional neural networks on images\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-3805622/v1/e398d0bd6a795ac3b409af7f.jpeg"},{"id":49082682,"identity":"469985ad-ed18-4c6e-85ed-64798037550f","added_by":"auto","created_at":"2024-01-02 20:14:12","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":540623,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eModel Performance\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ea) Distribution of the accuracy, AUPRC, AUROC, and F1 score of the model. EfficientNetB4 models were trained during six-fold cross-validation per group. The 10\u003csup\u003eth\u003c/sup\u003e, 50\u003csup\u003eth \u003c/sup\u003e(median), and 90\u003csup\u003eth\u003c/sup\u003e quantiles, as well as minimum and maximum, are shown. A paired t-test with Bonferroni correction for multiple comparisons. 
*\u003cem\u003eP\u003c/em\u003e\u0026lt;0.05, **\u003cem\u003eP\u003c/em\u003e\u0026lt;0.01, ***\u003cem\u003eP\u003c/em\u003e\u0026lt;0.001 compared with Setting C. b) Precision–recall and receiver operating characteristic curves of Settings A, B, and C. The mean of the six-fold cross-validation is shown.\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-3805622/v1/89efa9001ccc995feab8ea36.jpeg"},{"id":49083401,"identity":"ef7d06dd-0115-47c2-b957-7779e498af72","added_by":"auto","created_at":"2024-01-02 20:30:12","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1169841,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eEvaluation by CSoR\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ea) Representative images of positive and negative cases in which Grad-CAM heatmaps were integrated with hROIs. b) 3D plotting of relative CAM activities (left) and cropping by hROIs (right). c) CSoR\u003csub\u003emax \u003c/sub\u003eof the positive and negative cases (left), the affected side and contralateral side in the positive cases (middle), and the right and left sides in the negative cases (right). d) CSoR\u003csub\u003emean \u003c/sub\u003eof the positive and negative cases (left), the affected side and contralateral side in the positive cases (middle), and the right and left sides in the negative cases (right). A non-paired Student t-test was used. n.s. \u003cem\u003eP\u003c/em\u003e\u0026gt;0.05, *\u003cem\u003eP\u003c/em\u003e\u0026lt;0.05, **\u003cem\u003eP\u003c/em\u003e\u0026lt;0.01, ***\u003cem\u003eP\u003c/em\u003e\u0026lt;0.001.
Contra.: Contralateral\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-3805622/v1/dac7d96d255ae19f12ea03ca.png"},{"id":49082681,"identity":"bdfda8fd-a58c-4b7d-b873-160a89484197","added_by":"auto","created_at":"2024-01-02 20:14:12","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":131213,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFeature Importance scores on the parameters\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ea) Scheme of radiographic parameters. b) Feature importance scores of Yamamuro A, Yamamuro B, acetabular index, and O-edge angle on left and right sides. A non-paired student t-test was performed for comparison. n.s. \u003cem\u003eP\u003c/em\u003e\u0026gt;0.05, *\u003cem\u003eP\u003c/em\u003e\u0026lt;0.05, **\u003cem\u003eP\u003c/em\u003e\u0026lt;0.01, ***\u003cem\u003eP\u003c/em\u003e\u0026lt;0.001.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-3805622/v1/d86dfd0b320a51b47cf24cce.png"},{"id":61793656,"identity":"6047f61f-4d6a-45d5-90a9-8bfb2add49c0","added_by":"auto","created_at":"2024-08-05 16:14:23","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2846839,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3805622/v1/eeb5db2c-27da-4e90-8d6d-270fa0e18bfe.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Explainable AI Models on Radiographic Images Integrated with Clinical Measurements: Prediction for Unstable Hips in Infants","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe medical imaging field is currently seeing increased usage of machine learning (ML) using networks such as convolutional neural networks (CNNs). 
Although CNNs perform better than specialized physicians in classification tasks\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e, they do not provide the rationale behind their predictions. To address this issue, explainable artificial intelligence (XAI) techniques such as Grad-CAM and feature selection have been introduced\u003csup\u003e\u003cspan additionalcitationids=\"CR3\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. XAI explains the internal operation of a model by visualizing the image regions weighted for classification or by scoring the tabular features used for classification. While previous studies have reported that the weighted regions of Grad-CAM heatmaps matched areas of clinical concern in representative images\u003csup\u003e\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e, technologies to quantify Grad-CAM heatmaps remain lacking\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eDevelopmental dysplasia of the hip (DDH) is one of the most common congenital abnormalities of the musculoskeletal system affecting infants, ranging from mild dysplasia to dislocated hips\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. Provocative maneuvers have been used widely for screening unstable hips\u003csup\u003e\u003cspan additionalcitationids=\"CR10 CR11\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e.
Therapeutic interventions should be performed on the unstable hips of infants under 6 months of age\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e, whereas infants with stable hips can be observed. Notably, a randomized study found that interventions for stable dysplasia did not affect acetabular growth\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Because therapeutic strategies differ according to hip stability, identifying unstable hips is crucial. Imaging can also assist in diagnosing DDH through the following four parameters of the hip region: acetabular index, O-edge angle, Yamamuro A, and Yamamuro B\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e,\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. Currently, automatic integration based on domain knowledge using these clinical parameters is rarely performed for end-to-end models.\u003c/p\u003e \u003cp\u003eHere, we create an end-to-end model that produces CAM Scores on a region of interest (CSoR), a measure of the relative CAM activity on the hip area, as well as feature importance scores from an automatic algorithm of clinical measurements (aaCM) followed by LightGBM; aaCM estimates the four parameters on each side.
This study aims to evaluate the diagnostic performance on infantile hip images by integrating clinical measurements and quantifying the inner processes through CSoR on the images and the feature importance scores.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e \u003cb\u003eModel overview and data collection\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe settings were prepared as follows: Setting A had pure CNNs; Setting B had aaCM followed by LightGBM; Setting C was the integrated model (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eData were collected from infants who visited our and related hospitals and were at risk of DDH based on the Japanese Pediatric Orthopaedic Association\u0026rsquo;s guidelines\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e. A total of 813 radiographic images were enrolled, with no data excluded. Demographic data showed that the left hip was affected more than the right hip (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
As a binary ground truth, infants were defined as having an unstable hip if provocative examinations were positive.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDemographic data\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOur and related hospitals\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNumber of infants\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e813\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMean age, month\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4.4 (\u0026plusmn;\u0026thinsp;0.02)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFemale, %\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.3 (677/813)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBreech presentation, %\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e28.9 (235/813)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFamily 
history of first relatives, %\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e23.2 (189/813)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSkin laterality, %\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e48.3 (393/813)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLimb limitation, %\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e40.6 (330/813)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUnstable hip, %\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e28.2 (232/813)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBeing unstable in left hip, %\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e76.7 (178/232)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"2\"\u003eMean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard error\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003ePerformance of end-to-end models for unstable hips\u003c/h3\u003e\n\u003cp\u003eSix-fold cross-validation was applied to ensure a combination of models could be used to reach a majority decision, as previously described\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e,\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. 
In six-fold cross-validation with EfficientNetB4, the average accuracy of the models for predicting unstable hips was 80.8% (\u0026plusmn;\u0026thinsp;0.97%) for Setting A and 80.5% (\u0026plusmn;\u0026thinsp;1.12%) for Setting B (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea-c). The average area under the precision-recall curve (AUPRC) was 0.733 (\u0026plusmn;\u0026thinsp;0.050) for Setting A and 0.706 (\u0026plusmn;\u0026thinsp;0.048) for Setting B. The average area under the receiver operating characteristic curve (AUROC) was 0.840 (\u0026plusmn;\u0026thinsp;0.029) for Setting A and 0.789 (\u0026plusmn;\u0026thinsp;0.036) for Setting B. The average F1 score was 0.791 (\u0026plusmn;\u0026thinsp;0.027) for Setting A and 0.754 (\u0026plusmn;\u0026thinsp;0.035) for Setting B. Setting C was trained on all images simultaneously, obtaining an average accuracy of 83.2% (\u0026plusmn;\u0026thinsp;1.84%), average AUPRC of 0.804 (\u0026plusmn;\u0026thinsp;0.060), average AUROC of 0.885 (\u0026plusmn;\u0026thinsp;0.022), and average F1 score of 0.825 (\u0026plusmn;\u0026thinsp;0.017), indicating that Setting C offered significantly better performance than either Setting A or B with \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026le;\u0026thinsp;0.05 for accuracy, AUPRC, AUROC, and F1 score (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea-c).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWith EfficientNetB0 and EfficientNetB8, both the average AUPRC and the average AUROC in Setting C were significantly higher than those of Setting A with \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026le;\u0026thinsp;0.05. In EfficientNetB0, the average AUPRC was 0.722 (\u0026plusmn;\u0026thinsp;0.036) for Setting A and 0.778 (\u0026plusmn;\u0026thinsp;0.034) for Setting C, and the average AUROC was 0.826 (\u0026plusmn;\u0026thinsp;0.035) for Setting A and 0.864 (\u0026plusmn;\u0026thinsp;0.028) for Setting C.
In EfficientNetB8, the average AUPRC was 0.758 (\u0026plusmn;\u0026thinsp;0.032) for Setting A and 0.795 (\u0026plusmn;\u0026thinsp;0.029) for Setting C, and the average AUROC was 0.848 (\u0026plusmn;\u0026thinsp;0.031) for Setting A and 0.869 (\u0026plusmn;\u0026thinsp;0.029) for Setting C.\u003c/p\u003e\n\u003ch3\u003eAccuracy of the estimated parameters by aaCM\u003c/h3\u003e\n\u003cp\u003eThe mean absolute error of the estimated acetabular index was 1.76\u0026deg; on the right and 1.91\u0026deg; on the left, lower than previously reported\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Moreover, the acetabular index and O-edge angle estimated by aaCM had significantly smaller errors than those measured by the orthopedic specialists, whereas the errors for Yamamuro A and Yamamuro B did not differ significantly (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMean absolute error of acetabular index\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethod\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e
\u003cp\u003eMean absolute error of acetabular index\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRight\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLeft\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBashir\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3.78\u0026deg;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2.95\u0026deg;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLiu\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2.25\u0026deg;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2.23\u0026deg;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOur model\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1.76\u0026deg;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.91\u0026deg;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAccuracy of the 
estimated parameters produced by aaCM compared with that of orthopedic specialists\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eOur model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eOrthopedic specialists\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eP value\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAcetabular index\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eright\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e1.76 (\u0026plusmn;\u0026thinsp;0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e2.40 (\u0026plusmn;\u0026thinsp;0.12)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003eleft\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e1.91 (\u0026plusmn;\u0026thinsp;0.12)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e2.48 (\u0026plusmn;\u0026thinsp;0.16)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eO-edge angle\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eright\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e4.37 (\u0026plusmn;\u0026thinsp;0.39)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e5.84 (\u0026plusmn;\u0026thinsp;0.31)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eleft\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e4.84 (\u0026plusmn;\u0026thinsp;0.36)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e6.77 (\u0026plusmn;\u0026thinsp;0.39)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYamamuro A\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eright\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.98 (\u0026plusmn;\u0026thinsp;0.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e1.08 (\u0026plusmn;\u0026thinsp;0.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.18\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eleft\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.91 (\u0026plusmn;\u0026thinsp;0.07)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e0.97 (\u0026plusmn;\u0026thinsp;0.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.23\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYamamuro B\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eright\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.69 (\u0026plusmn;\u0026thinsp;0.04)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e0.72 (\u0026plusmn;\u0026thinsp;0.06)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.20\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eleft\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.76 (\u0026plusmn;\u0026thinsp;0.08)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" 
colname=\"c4\"\u003e \u003cp\u003e0.74 (\u0026plusmn;\u0026thinsp;0.05)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"5\"\u003eMean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard error\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003eROI detection\u003c/h3\u003e\n\u003cp\u003eThe right and left sides of the acetabulum, proximal femur, and ischium were annotated. As those ROIs were considered bone ROIs (bROIs), each image had six bROIs. The six bROIs in all images were extracted using YOLOv5 models. The mean value of mAP50 for hROIs was 0.978 (\u0026plusmn;\u0026thinsp;0.011) in YOLOv5 S, 0.987 (\u0026plusmn;\u0026thinsp;0.004) in YOLOv5 M, and 0.990 (\u0026plusmn;\u0026thinsp;0.003) in L (Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). 
Using the ipsilateral three bROIs, a rectangle with hip ROIs (hROIs) was built.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMean Average Precision (mAP)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYOLOv5 S\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eYOLOv5 M\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYOLOv5 L\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003emAP50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.978 (\u0026plusmn;\u0026thinsp;0.011)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.987 (\u0026plusmn;\u0026thinsp;0.004)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e0.990 (\u0026plusmn;\u0026thinsp;0.003)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003emAP50-95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.688 (\u0026plusmn;\u0026thinsp;0.010)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.699 (\u0026plusmn;\u0026thinsp;0.009)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e0.699 (\u0026plusmn;\u0026thinsp;0.008)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003eMean\u0026thinsp;\u0026plusmn;\u0026thinsp;95% confidence interval\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003eEvaluation by CSoR\u003c/h3\u003e\n\u003cp\u003eGrad-CAM heatmaps with hROI were conducted on all cases (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea). 3D-plotting and cropping heatmaps using hROI were performed (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb), and CSoR, a measure of relative CAM activity in an ROI, was quantified. The mean value on each hROI of the normalized CAM was defined as CSoR\u003csub\u003emean\u003c/sub\u003e and the maximum value was defined as CSoR\u003csub\u003emax\u003c/sub\u003e. Both had similar tendencies; CSoR\u003csub\u003emax\u003c/sub\u003e and CSoR\u003csub\u003emean\u003c/sub\u003e in the positive cases were significantly higher than those in the negative cases (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec,d). On subgroup analyses of the positive cases, CSoR\u003csub\u003emax\u003c/sub\u003e and CSoR\u003csub\u003emean\u003c/sub\u003e at the affected sides were significantly higher than those at the contralateral sides (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec,d). 
In contrast, no significant differences were observed in CSoR\u003csub\u003emax\u003c/sub\u003e or CSoR\u003csub\u003emean\u003c/sub\u003e between the left and right sides in the negative cases (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec,d).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eFeature importance scores in estimated parameters by aaCM\u003c/h3\u003e\n\u003cp\u003eAs the estimated parameters themselves have diagnostic value, the feature importance scores of the parameters were investigated (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea,b). The scores for Yamamuro A and the O-edge angle on the left side were significantly higher than those on the right side, and Yamamuro A was the top-ranked feature in this algorithm.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eOur findings showed that the integrated model exhibited the best diagnostic performance. hROIs were confirmed as a clinically relevant area because the estimated parameters extracted from hROIs had moderate diagnostic performance. When the inner processes were analyzed, CSoR was significantly higher in the positive cases than in the negative ones. In subgroup analyses of the positive cases, CSoR revealed an increase in CAM activities on the affected side, even though the ground truth contained only binary labels. Feature importance scores for Yamamuro A and the O-edge angle were also significantly higher on the left side, consistent with the demographic data showing that the left hip was more commonly affected.\u003c/p\u003e \u003cp\u003eThe diagnostic performance of the end-to-end model was found to be enhanced without any manual preparation of clinical data by physicians, owing to an ad hoc adoption of clinical insights. Setting C might not have been established if the models had been used only by technicians without domain knowledge of DDH.
Ad hoc adoptions like our model could contribute toward performance reinforcement independently from versatile ML techniques. The estimated parameters had an absolute error smaller than those of the orthopedic specialists or previous reports\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. Automatic integration based on domain knowledge with high accuracy into ML might be helpful for the development of artificial intelligence in medicine.\u003c/p\u003e \u003cp\u003eCSoR was introduced in this study as a quantitative method to evaluate model reliability. In previous studies, Grad-CAM heatmaps have only been shown as rendering images\u003csup\u003e\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e, appearing compatible with the rationale for the outcomes because the weighted region matched the region of clinical concern. To the best of our knowledge, our study is the first reported in the medical field to perform statistical validation of Grad-CAM heatmaps and use this as the rationale for the outcomes. Specifically, hROIs were confirmed as a clinically relevant area because the estimated parameters extracted from hROIs had moderate diagnostic performance. Based on this observation, CSoR revealed an increase in CAM activities on the affected side in the positive cases, despite the ground truth using only binary labels (positive or negative). 
As these results were consistent with the clinical processes of hip evaluations, CSoR can function as a reasonable tool to investigate model reliability.\u003c/p\u003e \u003cp\u003eClinically, unstable hips have been conventionally diagnosed by provocative maneuvers\u003csup\u003e\u003cspan additionalcitationids=\"CR10 CR11\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, which are widely used around the world as screening techniques. However, the reliability of these tests depends on the skill and experience of the examiner, with iatrogenic effects possible from repeated examinations\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e; thus, a standardized image analysis method is desired to address this issue. Automated analysis of images could decrease the number of provocative maneuvers or shorten the waiting period for infants to get the correct diagnosis.\u003c/p\u003e \u003cp\u003eThis study has some limitations. First, this study used radiographic images. While ultrasound inspection is a representative modality used to evaluate DDH in clinics, radiographs are also accepted for four- to six-month-old infants. Some reports state that radiographs are preferred for this age\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e,\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. Second, the ground truth for unstable hips was defined by provocative maneuvers, a conventional method. Although those maneuvers are widely used for screening, dynamic or static ultrasounds might also be preferred for defining unstable hips. 
Third, our dataset was relatively small compared to that of a previous study\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e because we focused on six-month-old infants, a critical age group for DDH. Fourth, the effect size between Settings C and A was not large.\u003c/p\u003e \u003cp\u003eIn conclusion, we presented an XAI model on infantile hip images integrated with clinical measurements. We demonstrated that aaCM reinforces the diagnostic performance and CSoR potentially indicates model reliability.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e \u003cb\u003eParticipants\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWe enrolled infants at risk for DDH who visited the orthopedic department of our hospital and a related hospital between 2010 and 2020 based on the Japanese Pediatric Orthopaedic Association\u0026rsquo;s guidelines\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e. This multicenter, retrospective study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki and approved by the Human Ethics Committee of Hokkaido University Hospital (approval number: 018\u0026ndash;0397). Informed consent was obtained from the legal guardians of all infants for participation in the study and publication of information.\u003c/p\u003e\n\u003ch3\u003eModel development\u003c/h3\u003e\n\u003cp\u003e \u003cb\u003eDatasets and ground truth\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAnterior-posterior X-rays were collected from infants aged four to six months. A total of 813 images were collected as Digital Imaging and Communications in Medicine (DICOM) data, with no data excluded. As a binary ground truth, three orthopedic surgeons with 22, 17, and 16 years of experience defined whether infants had an unstable hip. 
An unstable hip was diagnosed if provocative examinations such as the Barlow or Ortolani tests were positive. Therapeutic intervention such as brace treatment was performed on infants with unstable hips. The three surgeons also grounded the four parameters in each image. Demographic data in the whole dataset showed the affected side (right or left) in infants with unstable hips (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Six-fold cross-validation stratified by binary ground truth was adopted for the dataset\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003ch3\u003eDefinition of bone ROIs and hip ROIs\u003c/h3\u003e\n\u003cp\u003eAs training data, the right and left sides of the acetabulum, proximal femur, and ischium were annotated and validated by the three surgeons. As those ROIs were considered bone ROIs (bROIs), each image had six bROIs. Using the ipsilateral three bROIs, a rectangle with hip ROIs (hROIs) was built as follows: Height was defined as the distance between the upmost point of bROIs and the bottommost point, whereas width was defined as the distance between the lateralmost and innermost points.\u003c/p\u003e\n\u003ch3\u003eFramework of proposed models\u003c/h3\u003e\n\u003cp\u003eThe settings were prepared as follows: Setting A had pure CNNs; Setting B had aaCM followed by LightGBM; Setting C was the integrated model (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\n\u003ch3\u003eaaCM part\u003c/h3\u003e\n\u003cp\u003e \u003cb\u003eComponent 1: ROI detection\u003c/b\u003e \u003c/p\u003e \u003cp\u003eYOLOv5 was used to extract six bROIs\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e, with the S, M, and L models trained based on input parameters. 
As augmentation methods, images and bROIs were flipped in addition to the basics of the library\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e to correct the right and left sides.\u003c/p\u003e\n\u003ch3\u003eComponent 2: Yielding clinical measurements\u003c/h3\u003e\n\u003cp\u003eFor each bROI, γ correction was adopted to adjust the image contrast. This process was conducted until the adaptive threshold and blob detection were completed. Binarization with adaptive thresholding was applied to each bROI to transform the bone area into a blob\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. The local threshold was calculated at every individual point of the image with sliding window image processing\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. The threshold value is based on the intensity of the pixel and its neighborhood, with the blob itself detected by labeling processing. Then, the contour and featured points were detected. The radiographic hip parameters were measured using those points. The output was defined as the clinical measurements acetabular index, O-edge angle, Yamamuro A, and Yamamuro B.\u003c/p\u003e\n\u003ch3\u003eSetting A: Pure CNNs\u003c/h3\u003e\n\u003cp\u003eAs popular investigating models, EfficientNet B0, B4, and B8 models were investigated, with their initial parameters ported from ImageNet-pretrained models\u003csup\u003e\u003cspan additionalcitationids=\"CR30 CR31\" citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. A sigmoid function was used for activation, and binary cross-entropy loss was used to train the NN\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. 
For image augmentation, flipping, Gaussian noise, blur, CLAHE, and saturation adjustment were applied. The model received a radiographic image as a tensor and output one scalar value ranging from zero to one.\u003c/p\u003e \u003cp\u003e \u003cb\u003eSetting B: aaCM followed by LightGBM\u003c/b\u003e \u003c/p\u003e \u003cp\u003eLightGBM was trained\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e using the outputs of aaCM. The maximum depth was three, the early-stopping patience was 50 rounds, and the number of boosting rounds was 10,000.\u003c/p\u003e \u003cp\u003e \u003cb\u003eSetting C: Integrated model\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe features obtained from the convolutional layers of the CNNs were concatenated with the clinical measurements as the input for the fully connected layer. This architecture was originally created to enable training and inference in a single shot by dynamically combining the CNN features with additional inputs in an online setting, rather than caching the generated features.\u003c/p\u003e\n\u003ch3\u003eInner process analyses\u003c/h3\u003e\n\u003cp\u003e \u003cb\u003eCAM Score on regions of interest (CSoR)\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe Grad-CAM technique was applied to a test image in each fold. CSoR, a measure of relative CAM activity in a region of interest, was created as follows: CAM images were generated and resized to match the input image's dimensions and then normalized based on their resolution; specifically, the pixel values of the CAM image were divided by the sum of all pixel values and then multiplied by the resolution (number of pixels) of the CAM image. This process ensured that the average pixel value on the CAM image was 1.0. The normalized CAM pixel values obtained in this manner can be used to evaluate the level of interest with a threshold of 1.0, akin to SUV in PET-CT\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e. 
The mean value on each hROI of the normalized CAM was defined as \"CSoR\u003csub\u003emean\u003c/sub\u003e\" and the maximum value was defined as \"CSoR\u003csub\u003emax\u003c/sub\u003e.\"\u003c/p\u003e\n\u003ch3\u003eFeature importance scores\u003c/h3\u003e\n\u003cp\u003eFeature selection was conducted based on the LightGBM implementation to detect the relevant features among the estimated parameters\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003ch3\u003eEvaluation metrics\u003c/h3\u003e\n\u003cp\u003eThe mean Average Precision (mAP) was investigated to evaluate the performance of ROI detection\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. Mean absolute errors between the clinical measurements from aaCM and the two orthopedic specialists were calculated.\u003c/p\u003e \u003cp\u003eSeveral evaluation metrics were analyzed to compare the settings: accuracy, average AUPRC, AUROC, and F1 score for the test data in each fold\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. These values were compared using a paired t-test with Bonferroni correction for multiple comparisons\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAs the outcomes of inner process analyses, several comparisons were performed for CSoR\u003csub\u003emean\u003c/sub\u003e and CSoR\u003csub\u003emax\u003c/sub\u003e in both the positive and negative cases: The first was between the positive and negative cases on the bilateral sides. The second was between the affected and contralateral sides in the positive cases. The third was between the right and left sides in negative cases. An unpaired t-test was used to compare two individual groups with a normal distribution. 
For each analysis, the value distribution was tested for normality.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eH. S. designed the study and collected data, all supervised by D.T. K.E. and H.K. conducted models. D.T., T.S., and S.S. defined the positive cases.\u0026nbsp;K.S. and T.O. measured the clinical parameters. H. Shimizu wrote the manuscript, and\u0026nbsp;K.E.\u0026nbsp;illustrated the figures. N.I.\u0026nbsp;and S.T. directed the entire study.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study received no funding.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors declare no financial or non-financial competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe underlying code for this study is available from the corresponding author upon reasonable request.\u0026nbsp;\u003cbr\u003e\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eArdila, D. \u003cem\u003eet al.\u003c/em\u003e End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954\u0026ndash;961 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShimizu, H. \u003cem\u003eet al.\u003c/em\u003e Machine learning algorithms: Prediction and feature selection for clinical refracture after surgically treated fragility fracture. J. Clin. Med. 
11, 2021 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJahmunah, V., Ng, E. Y. K., Tan, R. S., Oh, S. L. \u0026amp; Acharya, U. R. Explainable detection of myocardial infarction using deep learning models with Grad-CAM technique on ECG signals. Comput. Biol. Med. 146, 105550 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZheng, X. \u003cem\u003eet al.\u003c/em\u003e Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat. Commun. 11, 1236 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou, W. \u003cem\u003eet al.\u003c/em\u003e Ensembled deep learning model outperforms human experts in diagnosing biliary atresia from sonographic gallbladder images. Nat. Commun. 12, 1259 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSangha, V. \u003cem\u003eet al.\u003c/em\u003e Automated multilabel diagnosis on electrocardiographic images and signals. Nat. Commun. 13, 1583 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, Y. \u003cem\u003eet al.\u003c/em\u003e Grad-CAM helps interpret the deep learning models trained to classify multiple sclerosis types using clinical brain magnetic resonance imaging. J. Neurosci. Methods 353, 109098 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMureşan, S., Mărginean, M. O., Voidăzan, S., Vlasa, I. \u0026amp; S\u0026icirc;ntean, I. Musculoskeletal ultrasound: A useful tool for diagnosis of hip developmental dysplasia: One single-center experience. Med. (Baltim.) 98, e14081 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCook, K. A. \u003cem\u003eet al.\u003c/em\u003e Pavlik Harness initiation on Barlow positive hips: Can we wait? J. Orthop. 16, 378\u0026ndash;381 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNeal, D. 
\u003cem\u003eet al.\u003c/em\u003e Comparison of Pavlik Harness treatment regimens for reduced but dislocatable (Barlow positive) hips in infantile DDH. J. Orthop. 16, 440\u0026ndash;444 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJackson, J. C., Runge, M. M. \u0026amp; Nye, N. S. Common questions about developmental dysplasia of the hip. Am. Fam. Physician. 90, 843\u0026ndash;850 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilliams, N. Improving early detection of developmental dysplasia of the hip through general practitioner assessment and surveillance. Aust. J. Gen. Pract. 47, 619\u0026ndash;623 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAgostiniani, R. \u003cem\u003eet al.\u003c/em\u003e Recommendations for early diagnosis of Developmental Dysplasia of the Hip (DDH): Working group intersociety consensus document. Ital. J. Pediatr. 46, 150 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePollet, V. \u003cem\u003eet al.\u003c/em\u003e Abduction treatment in stable hip dysplasia does not alter the acetabular growth: Results of a randomized clinical trial. Sci. Rep. 10, 9647 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNarayanan, U. \u003cem\u003eet al.\u003c/em\u003e Reliability of a new radiographic classification for developmental dysplasia of the hip. J. Pediatr. Orthop. 35, 478\u0026ndash;484 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOhmori, T. \u003cem\u003eet al.\u003c/em\u003e Radiographic prediction of the results of long-term treatment with the Pavlik harness for developmental dislocation of the hip. Acta Med. Okayama 63, 123\u0026ndash;128 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShimizu, T. \u003cem\u003eet al.\u003c/em\u003e Validation of parameters recommended for secondary screening for developmental dysplasia of the hip in Japan. J. Orthop. Sci. 
(2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFoersch, S. \u003cem\u003eet al.\u003c/em\u003e Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430\u0026ndash;439 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMoncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S. \u0026amp; Geleijnse, G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci. Rep. 11, 6968 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAl-Bashir, A. K., Al-Abed, M., Abu Sharkh, F. M., Kordeya, M. N. \u0026amp; Rousan, F. M. Algorithm for automatic angles measurement and screening for Developmental Dysplasia of the Hip (DDH). \u003cem\u003eAnnu. Int. Conf. IEEE Eng. Med. Biol. Soc.\u003c/em\u003e. Annu. Int Conf IEEE Eng. Med. Biol. Soc. 2015 2015, 6386\u0026ndash;6389 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, C. \u003cem\u003eet al.\u003c/em\u003e Misshapen pelvis landmark detection with local-global feature learning for diagnosing developmental dysplasia of the hip. IEEE Trans. Med. Imaging 39, 3944\u0026ndash;3954 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSewell, M. D. \u0026amp; Eastwood, D. M. Screening and treatment in developmental dysplasia of the hip-where do we go from here? Int. Orthop. 35, 1359\u0026ndash;1367 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchaeffer, E., Lubicky, J. \u0026amp; Mulpuri, K. AAOS appropriate use criteria: The management of developmental dysplasia of the hip in infants up to 6 months of age: Intended for use by general pediatricians and referring physicians. J. Am. Acad. Orthop. Surg. 27, e364-e368 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShaw, B. A., Segal, L. S. \u0026amp; SECTION ON ORTHOPAEDICS. 
Evaluation and referral for developmental dysplasia of the hip in infants. Pediatrics 138 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJung, Y. \u0026amp; Hu, J. A. A K-fold averaging cross-validation procedure. J. Nonparametric Stat. 27, 167\u0026ndash;179 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu, X., Lyu, S., Wang, X. \u0026amp; Zhao, Q. in Proceedings of the IEEE/CVF International Conference on Computer Vision 2778\u0026ndash;2788.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKorzynska, A. \u003cem\u003eet al.\u003c/em\u003e Validation of various adaptive threshold methods of segmentation applied to follicular lymphoma digital images stained with 3,3\u0026rsquo;-diaminobenzidine\u0026amp;Haematoxylin. Diagn. Pathol. 8, 48 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSezgin, M., Sankur, B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13, 146\u0026ndash;165 (2004).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTan, M. \u0026amp; Le, Q. in International conference on machine learning 6105\u0026ndash;6114 (PMLR).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarques, G., Ferreras, A. \u0026amp; de la Torre-Diez, I. An ensemble-based approach for automated medical diagnosis of malaria using EfficientNet. Multimed. Tools Appl. 81, 28061\u0026ndash;28078 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, X. \u003cem\u003eet al.\u003c/em\u003e Application of EfficientNet-B0 and GRU-based deep learning on classifying the colposcopy diagnosis of precancerous cervical lesions. Cancer Med. 12, 8690\u0026ndash;8699 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSharma, N. \u003cem\u003eet al.\u003c/em\u003e EfficientNetB0 cum FPN Based Semantic Segmentation of gastrointestinal Tract Organs in MRI Scans. 
Diagnostics (Basel) 13 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTan, M. \u0026amp; Le, Q. V. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv:1905.11946. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://ui.adsabs.harvard.edu/abs/2019arXiv190511946T\u003c/span\u003e\u003cspan address=\"https://ui.adsabs.harvard.edu/abs/2019arXiv190511946T\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e, (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKe, G. \u003cem\u003eet al.\u003c/em\u003e in Proceedings of the 31st International Conference on Neural Information Processing Systems 3149\u0026ndash;3157 (Curran Associates Inc., Long Beach, CA, 2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKinahan, P. E. \u0026amp; Fletcher, J. W. Positron emission tomography-computed tomography standardized uptake values in clinical practice and assessing response to therapy. Semin. Ultrasound CT MR 31, 496\u0026ndash;505 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin, T.-Y. 
\u003cem\u003eet al.\u003c/em\u003e Computer vision\u0026ndash;ECCV 2014 in Proceedings of the Part V: 13th European Conference, Zurich, Switzerland, September 6\u0026ndash;12, 2014 13 740\u0026ndash;755 (Springer, 2014).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-3805622/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3805622/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eConsidering explainability is crucial in medical artificial intelligence, technologies to quantify Grad-CAM heatmaps and perform automatic integration based on domain knowledge remain lacking. Hence, we created an end-to-end model that produced CAM scores on regions of interest (CSoR), a measure of relative CAM activity, and feature importance scores by automatic algorithms for clinical measurement (aaCM) followed by LightGBM. 
In this multicenter research project, the diagnostic performance of the model was investigated with 813 radiographic hip images in infants at risk of unstable hips, with the ground truth defined by provocative examinations. The results indicated that the accuracy of aaCM was higher than that of specialists, and the model with ad hoc adoption of aaCM outperformed the image-only-based model. Subgroup analyses in positive cases indicated significant differences in CSoR between the unstable and contralateral sides despite containing only binary labels (positive or negative). In conclusion, aaCM reinforces the performance, and CSoR potentially indicates model reliability.\u003c/p\u003e","manuscriptTitle":"Explainable AI Models on Radiographic Images Integrated with Clinical Measurements: Prediction for Unstable Hips in Infants","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-01-02 20:14:08","doi":"10.21203/rs.3.rs-3805622/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2024-05-30T10:03:14+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-05-11T19:25:44+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"020a5176-6b3d-48c4-b4ec-0fcfe8e7283f","date":"2024-04-20T19:51:23+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-03-08T05:42:16+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"affa8cec-d56e-4326-b383-c345555d140d","date":"2024-03-01T23:57:03+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-02-06T13:56:33+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-01-31T14:38:48+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2023-12-27T11:42:38+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2023-12-27T11:36:33+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2023-12-25T22:01:56+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"922e1488-5fbb-47e9-81ca-25b9d3d88b16","owner":[],"postedDate":"January 2nd, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":27874442,"name":"Health sciences/Diseases"},{"id":27874443,"name":"Health sciences/Medical research/Paediatric 
research"}],"tags":[],"updatedAt":"2024-08-05T16:04:44+00:00","versionOfRecord":{"articleIdentity":"rs-3805622","link":"https://doi.org/10.1038/s41598-024-68484-7","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2024-08-01 15:57:46","publishedOnDateReadable":"August 1st, 2024"},"versionCreatedAt":"2024-01-02 20:14:08","video":"","vorDoi":"10.1038/s41598-024-68484-7","vorDoiUrl":"https://doi.org/10.1038/s41598-024-68484-7","workflowStages":[]},"version":"v1","identity":"rs-3805622","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3805622","identity":"rs-3805622","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
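
The CSoR computation described in the Methods can be illustrated with a minimal sketch. It assumes the CAM has already been resized to the input image's dimensions and is non-negative, and that each bone ROI is given as an `(x0, y0, x1, y1)` pixel box. The function names `hip_roi`, `normalize_cam`, and `csor` are hypothetical and are not taken from the authors' code, which is available only on request.

```python
import numpy as np

def hip_roi(bone_boxes):
    """Rectangle (x0, y0, x1, y1) enclosing the three ipsilateral bone ROIs.

    Assumption: boxes are pixel coordinates with x1/y1 exclusive.
    """
    boxes = np.asarray(bone_boxes, dtype=float)
    x0, y0 = boxes[:, 0].min(), boxes[:, 1].min()
    x1, y1 = boxes[:, 2].max(), boxes[:, 3].max()
    return int(np.floor(x0)), int(np.floor(y0)), int(np.ceil(x1)), int(np.ceil(y1))

def normalize_cam(cam):
    """Divide by the sum of all pixels, then multiply by the pixel count.

    After this step the average pixel value of the CAM is 1.0.
    """
    cam = np.asarray(cam, dtype=float)
    total = cam.sum()
    if total <= 0:
        raise ValueError("CAM must contain some positive activation")
    return cam / total * cam.size

def csor(cam, roi):
    """Return (CSoR_mean, CSoR_max) of the normalized CAM inside the ROI."""
    x0, y0, x1, y1 = roi
    patch = normalize_cam(cam)[y0:y1, x0:x1]
    return float(patch.mean()), float(patch.max())

# Toy example: a 4x4 CAM whose top-left 2x2 corner is three times as active.
cam = np.ones((4, 4))
cam[0:2, 0:2] = 3.0
roi = hip_roi([(0, 0, 1, 1), (1, 1, 2, 2), (0, 1, 2, 2)])
csor_mean, csor_max = csor(cam, roi)
```

Because the normalized map averages 1.0 over the whole image, a CSoR above 1.0 means the ROI attracts more than its area-proportional share of CAM activity. This is the sense in which the text compares the threshold to SUV in PET-CT.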
