Automated assessment of cervical vertebral maturation stages on lateral cephalometric radiographs using a two-stage deep learning framework

Article | Research Square

Maryam Javaheri Mahd, Hossein Agha Aghili, Hamid Reza Dehghan, and 5 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9226756/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review (Version 1)

Abstract

Background
Accurate assessment of cervical vertebral maturation (CVM) is critical for timing orthodontic interventions. Manual assessment is often subjective and prone to inter-observer variability. This study aimed to develop and validate a fully automated, two-stage deep learning framework for CVM stage classification using lateral cephalometric radiographs (LCRs).

Methods
A dataset of 1,102 LCRs from individuals aged 7–18 years was curated, preserving native radiographic noise to simulate real-world imaging conditions. The proposed pipeline consists of two stages: (1) automated detection and instance segmentation of the C2, C3, and C4 vertebrae using a Mask R-CNN architecture with a ResNet-50-FPN backbone, and (2) skeletal maturity classification (CS1–CS6) using an EfficientNet-B3 model with a late-fusion strategy. The models were trained using transfer learning and evaluated using mean average precision (mAP), accuracy, and confusion matrices.

Results
The detection model achieved high localization precision (AP at IoU = 0.5: 0.96; mAP@50: 0.85). The classification stage demonstrated an overall accuracy of approximately 70%, with peak performance in identifying CS6 (73%). While early (CS1–CS2) and late stages showed high reliability, the transitional stage (CS3) exhibited the lowest accuracy (33%), reflecting the inherent morphological overlap during peak pubertal growth.

Conclusion
The proposed framework provides a standardized, reproducible, and fully automated tool for skeletal maturity assessment. Its robustness against real-world image noise and its high anatomical accuracy could make it a suitable auxiliary tool for orthodontic clinical diagnosis.

Subject areas: Health sciences/Anatomy; Biological sciences/Computational biology and bioinformatics; Health sciences/Health care; Health sciences/Medical research
Keywords: Cervical Vertebral Maturation (CVM); Deep Learning; Mask R-CNN; EfficientNet; Cephalometric

Introduction
Accurate assessment of skeletal maturation is a fundamental component of clinical decision-making across dental disciplines, most notably orthodontics, maxillofacial surgery, and implantology.
While chronological age is easily determined, skeletal maturation stages offer a far more reliable correlation with the adolescent growth spurt. This correlation allows clinicians to pinpoint the optimal window for treatment and more accurately predict outcomes that depend on growth [1, 2]. For pediatric and adolescent patients, timing orthodontic or orthopaedic interventions to coincide with the pubertal growth peak can significantly enhance treatment efficacy. Conversely, poorly timed treatment often leads to relapse, unnecessary biological costs, and extended treatment durations. Consequently, precise determination of skeletal maturity is not merely an adjunct but a prerequisite for high-quality, growth-oriented treatment planning [3, 4].

Historically, skeletal maturation has been evaluated through various methods, with hand–wrist radiography and cervical vertebral maturation (CVM) analysis via lateral cephalometric imaging being the most prominent [5]. Over the last two decades, CVM analysis has largely surpassed hand–wrist methods in clinical popularity because it utilises routine orthodontic records, thereby sparing the patient additional radiation exposure. Despite its utility, the method is inherently limited by its reliance on subjective visual interpretation of vertebral morphology. Clinicians must assess subtle changes in the concavity of the lower borders and in the shape of the C2, C3, and C4 vertebrae. This subjectivity often results in significant inter-examiner variability and inconsistent reproducibility, which can undermine diagnostic reliability [6, 7].

To address these human limitations, researchers have long sought to automate the CVM staging process. Early attempts utilised traditional computer vision techniques, such as Active Shape Models (ASMs) and manual feature extraction, to identify vertebral landmarks. While these were pioneering, they often struggled with the high levels of noise and varying contrast typical of lateral cephalograms.
With the recent breakthrough of deep learning, specifically Convolutional Neural Networks (CNNs), the landscape of automated diagnostics has shifted. Unlike traditional methods, CNNs can extract complex hierarchical features directly from raw imaging data, demonstrating remarkable proficiency in recognising anatomical landmarks and interpreting radiographic patterns. In the dental field, deep learning has already shown promise across a spectrum of diagnostic tasks, from caries detection and periodontal assessment to the analysis of complex craniofacial structures [8, 9].

Specific to skeletal maturation, several recent studies have explored AI-driven CVM staging. For instance, some researchers have employed single-stage classification models that analyse the entire cephalogram; however, these often suffer from "background noise" in the image that is irrelevant to the vertebrae themselves. Others have looked at multi-stage approaches, but the seamless integration of precise vertebral segmentation with accurate stage classification remains a challenge in the quest for clinical-grade accuracy. While these technological strides are encouraging, the clinical adoption of AI systems hinges on their ability to match the performance of expert clinicians under rigorous evaluation.

The current study was designed to bridge the gap between raw radiographic data and clinical diagnosis by developing a fully automated, two-stage deep learning framework. The first stage focuses on the precise detection and segmentation of the C2–C4 vertebrae to isolate the regions of interest. The second stage then classifies these isolated regions into the standard CVM stages (CS1–CS6). We evaluated the performance of this framework using standardised detection and classification metrics, benchmarked against expert-labelled reference standards, to determine its viability as a reliable clinical decision-support tool.

Results

Performance of the Cervical Vertebrae Detection Model
In the initial stage, we compared various object detection architectures, including Cascade R-CNN and Mask R-CNN. The Mask R-CNN framework, integrated with a ResNet-50-FPN backbone, demonstrated the most robust performance for the automated localisation of the C2, C3, and C4 vertebrae and was thus selected for the final pipeline. The model's efficacy was quantified using standard computer vision metrics. After a training regimen of 3,300 iterations, the model attained a mean Average Precision (mAP@50) of 0.85, indicating a high level of localisation reliability at an IoU threshold of 0.5 (Fig. 1). Detailed analysis of the IoU thresholds revealed that at a moderate spatial overlap (IoU = 0.5), the model achieved an AP of 0.96. However, under more stringent localisation constraints (IoU = 0.75), the AP dropped to 0.66, reflecting the inherent complexity of precise boundary delineation in radiographic imaging (Fig. 2). The overall recall was 0.62, signifying a stable detection rate across the heterogeneous dataset. The confusion matrix (Fig. 3) further confirms high discriminatory power, particularly for the C4 vertebra, with minimal cross-classification between adjacent vertebral levels.

To further evaluate the reliability of the detection model's output, Cohen's kappa coefficient was calculated, yielding a value of approximately κ ≈ 0.513. This result indicates moderate to good agreement between the model's predictions and the ground truth for vertebral localization, suggesting that the detection performance surpasses random chance.

Performance of Stage II: CVM Stage Classification
In the second stage, the 756 successfully segmented triplets (C2–C4) were processed through the EfficientNet-B3 classifier. The diagnostic performance across the six maturation stages (CS1–CS6) is detailed in the confusion matrix (Fig. 4) and summarised in Table 1.
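
As an aside for readers who wish to reproduce this kind of summary, the following minimal Python sketch shows how per-class precision, recall, and F1-score, together with Cohen's kappa, are conventionally derived from a confusion matrix. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `per_class_metrics` and `cohens_kappa` are hypothetical, the rows-as-reference and columns-as-prediction layout is assumed, and the toy matrix contains invented values rather than study data.

```python
from typing import List, Tuple

# Assumed convention: rows = reference label, columns = predicted label.
Matrix = List[List[int]]


def per_class_metrics(cm: Matrix) -> List[Tuple[float, float, float]]:
    """Return (precision, recall, F1) for each class of a square confusion matrix."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[r][k] for r in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def cohens_kappa(cm: Matrix) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(cm[k][k] for k in range(n)) / total
    # Expected chance agreement from the row and column marginals.
    p_e = sum(
        sum(cm[k]) * sum(cm[r][k] for r in range(n)) for k in range(n)
    ) / total ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Toy 3-class matrix (illustrative values only, not taken from the study).
toy_cm = [
    [8, 2, 0],
    [3, 5, 2],
    [0, 1, 9],
]

for label, (p, r, f1) in zip(["A", "B", "C"], per_class_metrics(toy_cm)):
    print(f"class {label}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(f"kappa={cohens_kappa(toy_cm):.3f}")
```

With these definitions, the per-class recall equals the fraction of reference samples of a stage that were assigned to that stage, which is the quantity usually meant when a per-stage "accuracy" is quoted.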

Table 1
Stage   Precision   Recall   F1-Score
CS1     61%         54%      57%
CS2     42%         53%      47%
CS3     32%         30%      31%
CS4     37%         34%      35%
CS5     45%         34%      39%
CS6     62%         73%      67%

The model demonstrated its peak performance in identifying CS6 (late maturation), achieving an accuracy of 73% (38/52 samples). The early stages, CS1 and CS2, showed moderate-to-high classification reliability, both exceeding the 50% accuracy threshold. Conversely, the intermediate stage (CS3) proved to be the most challenging for the framework, yielding a classification accuracy of 33%. Most misclassifications for CS3 were assigned to CS2, reflecting the clinical difficulty in detecting the initial onset of the concavity. This performance dip is primarily attributed to the "morphological transition" phase: the subtle changes in the concavity of the lower borders and the transition from trapezoidal to rectangular shapes in C3 and C4 often lead to overlapping features between CS2, CS3, and CS4.

Cohen's kappa coefficient, calculated to evaluate the agreement between the model's predictions and the ground truth, was approximately 0.372. This indicates that the model performs better than random chance, but that there is still significant room for improvement and that the model cannot yet reliably distinguish between all classes. This result can be attributed to several factors:

Morphological Complexity: The inherent difficulty in distinguishing between mid-maturation stages (particularly CS3), owing to subtle transitional features and overlapping anatomical characteristics, poses a significant challenge for accurate classification.
Stage I Dependency: The quality of the input data (756 C2–C4 triplets) extracted from the initial vertebrae detection stage directly impacts the classification accuracy. Potential errors in localization or segmentation from Stage I can lead to misclassifications in Stage II.
Data Distribution: The nature and distribution of samples within the 756-instance dataset, especially for stages with fewer samples, may have influenced the model's ability to generalize and achieve higher agreement.

In summary, the kappa of 0.372 reflects the challenges in accurately classifying CVM stages, influenced by morphological complexities and the quality of upstream detection data.

Discussion
The integration of artificial intelligence (AI) into orthodontic diagnostics has emerged as a transformative approach to mitigating the inherent subjectivity of skeletal maturity assessment. The present study developed and validated a fully automated, two-stage deep learning framework for CVM assessment. Our pipeline achieved two primary milestones: first, a highly precise localisation of the C2–C4 vertebrae with an AP of 0.96 (at IoU = 0.5), and second, a classification of skeletal maturity stages using an EfficientNet-B3 architecture, reaching a peak accuracy of 73% for the final maturation stage (CS6). While the model demonstrated high reliability in identifying polar stages (CS1, CS2, and CS6), it also highlighted the inherent diagnostic complexity of the transitional CS3 stage, which yielded a lower accuracy of 33%. These results underscore the potential of hierarchical deep learning models to standardise orthodontic growth analysis while maintaining high anatomical transparency.

Comparative Performance and Architectural Advantages
Methodologically, our framework addresses the limitations of prior "black-box" approaches that perform direct classification on entire radiographs. For instance, while Seo et al. and Li et al. reported varying accuracies using direct classification [10, 11], these models often lack anatomical transparency.
By implementing an explicit detection and segmentation stage, our model ensures that the feature extraction process is strictly confined to the C2–C4 vertebral bodies, effectively filtering out non-contributory craniofacial structures, a frequent source of noise in single-stage models [12, 13]. Our late-fusion architecture independently encodes the C2–C4 vertebrae to preserve local morphological nuances while capturing inter-vertebral spatial correlations. This hierarchical approach ensures a more robust and physiologically aligned representation of skeletal maturation compared to conventional single-stream models. Furthermore, while semi-automated methods like those proposed by Kavousinejad et al. [14] achieve high accuracy, they remain dependent on manual landmarking, which is time-consuming and prone to inter-observer bias. In contrast, our end-to-end framework offers superior clinical scalability, eliminating manual intervention while maintaining high fidelity in anatomical localisation.

The Challenge of Transitional Stages (CS3)
A notable finding in our study was the performance disparity between polar and intermediate maturation stages. The model achieved its highest sensitivity in CS6 (73%), whereas CS3 exhibited the lowest accuracy (33%). This phenomenon is consistent with the findings of Jiang et al. and likely reflects the "morphological continuum" of puberty. During the CS3 stage, the transition from a flat to a concave lower border in C3 and C4 is often subtle and lacks a discrete threshold, posing significant challenges even for experienced clinicians [15, 16]. The high localisation precision of our Stage I model suggests that this misclassification is rooted in the morphological overlap between classes rather than in a failure of feature detection.

Clinical Implications and Robustness
To bridge the gap between experimental models and "real-world" clinical practice, we purposefully integrated 187 noisy and low-quality images into our training pipeline.
This strategy, combined with the use of the Detectron2 framework for standardised pre-processing, enhances the model's generalizability across different radiographic hardware and exposure settings. From a clinical standpoint, this tool could serve as a reliable "second opinion" for orthodontists, standardising the timing of functional appliance therapy and orthognathic surgery.

Limitations and Future Directions
Despite the promising results, certain limitations warrant acknowledgement. The moderate recall (0.62) in the detection stage indicates that some vertebral structures were missed, potentially due to severe postural variations or suboptimal contrast in the archives. Additionally, the inherent class imbalance in pediatric populations remains a challenge for deep learning models. Future research should focus on:

Multi-branch Fusion Architectures: Leveraging the independent morphological signals of C2, C3, and C4 more effectively through advanced feature fusion.
Dataset Expansion: Utilizing multi-center datasets to further mitigate class imbalance and improve performance during the transitional CS3–CS4 phases.

Conclusion
The two-stage deep learning framework developed here provides a robust and fully automated solution for CVM assessment, eliminating the subjectivity of manual methods. By coupling high-precision vertebral segmentation with an EfficientNet-based classifier, the model achieves high diagnostic reliability, particularly in identifying polar maturation stages. Despite the challenges posed by the morphological nuances of transitional phases (CS3), the system's resilience to "real-world" image noise demonstrates its potential as a scalable auxiliary tool in clinical orthodontics. This pipeline offers a standardized approach to skeletal growth assessment, facilitating more precise timing for growth-related interventions.

Materials and Methods

Sampling Method and Sample Size Determination
In this retrospective cross-sectional study, a comprehensive dataset of lateral cephalometric radiographs (LCRs) of individuals aged 7–18 years was retrieved from the digital archives of two private orthodontic clinics in Tehran and Yazd, Iran, as well as the Department of Orthodontics, Faculty of Dentistry, Shahid Sadoughi University of Medical Sciences, Yazd, Iran, spanning January 2018 to December 2023. Utilizing routine clinical archives ensured a diverse representation of radiographic images obtained under standard diagnostic conditions. To ensure the integrity of the vertebral morphological analysis, the following exclusion criteria were applied:

Presence of systemic diseases, endocrine disorders, or developmental delays known to impair skeletal growth or bone maturation.
Congenital or acquired craniofacial anomalies or cervical spine deformities.
Suboptimal image quality, including artefacts, positioning errors, or inadequate visualization of the second (C2), third (C3), and fourth (C4) cervical vertebrae.

Following the initial screening, a total of 1,390 high-quality LCRs were included. This sample size was determined to meet the high data-dimensionality requirements of deep learning architectures, ensuring robust feature extraction and model generalizability.

Ethics approval and consent to participate
The study protocol was conducted in accordance with the Declaration of Helsinki and received formal approval from the Ethics Committee of Shahid Sadoughi University of Medical Sciences (Protocol No: IR.SSU.DENTISTRY.REC.1403.074). Given the retrospective nature of the study and the use of de-identified, anonymized data, the requirement for written informed consent was waived by the institutional review board.

Data Pre-processing and Quality Control

Image Standardisation and Curation
The initial dataset consisted of 1,390 lateral cephalometric radiographs (LCRs) originally captured in Digital Imaging and Communications in Medicine (DICOM) format. To streamline the deep learning workflow, these images were converted to Portable Network Graphics (PNG) format. The raw dataset exhibited significant heterogeneity in resolution, with widths ranging from 568 to 2,144 pixels and heights from 570 to 2,600 pixels. A rigorous quality control (QC) protocol was implemented to ensure the reliability of the training data. Following an expert review, 288 radiographs were excluded due to excessive noise, motion artifacts, or failure to meet the anatomical inclusion criteria (e.g., inadequate visualization of C2–C4). This resulted in a refined primary dataset of 1,102 images for subsequent analysis.

Class Distribution and Imbalance
Analysis of the CVM stages (CS1–CS6) revealed a significant class imbalance, with CS3 being the least represented category (Figure 5). To mitigate the risk of biased learning toward majority classes, we implemented specific strategies, including:

Cost-sensitive learning (class weighting).
Targeted data augmentation for underrepresented stages.

Exploratory Data Analysis (EDA)
Comprehensive EDA was performed to evaluate the geometric and photometric characteristics of the dataset (Figure 6). The distribution of image dimensions (width, height, and aspect ratio) displayed multimodal patterns, suggesting that dimensional standardisation was not a prerequisite for the feature extraction layers. Photometric analysis revealed that the mean brightness and pixel intensity followed a Gaussian distribution centered on mid-gray levels, indicating consistent exposure across the archives.

Feature Complexity and Noise Strategy
To quantify image detail, edge density was calculated for the entire dataset (Figure 7). The majority of the images fell within the 0.07–0.18 range.
Images at the lower tail of the distribution (edge density < 0.04) were identified as low-detail samples. Furthermore, to enhance the model's robustness against real-world clinical variations, a strategic decision was made regarding noisy data. Instead of total exclusion, 187 images with controlled levels of noise were specifically retained and integrated into the data augmentation pipeline. This approach was designed to minimize overfitting and ensure that the framework maintains high diagnostic performance even when processing suboptimal, "non-ideal" clinical radiographs (Figure 5).

Model Design

Proposed AI Framework
The proposed diagnostic pipeline follows a two-stage deep learning architecture: (1) automated detection and instance segmentation of the C2, C3, and C4 cervical vertebrae from LCRs, and (2) multi-class classification of skeletal maturity stages based on the localised vertebral regions.

Dataset Preparation and Stratification
The experimental dataset was categorised into two subsets: noisy (n = 187) and clean (n = 915) images, as detailed in Figure 8. To maintain statistical balance while ensuring the model's robustness against real-world artefacts, we adopted a differential splitting strategy. For the noisy subset, an 80%/19%/1% split was used for training, validation, and testing, respectively. The clean subset followed a standard 70%/20%/10% distribution. This partitioning ensures that the model is exposed to a high volume of suboptimal data during training, thereby improving its generalisation across diverse clinical environments (Figure 9).

Stage I: Vertebral Detection and Instance Segmentation
Ground Truth Generation: Manual annotations were performed by orthodontic specialists using a web-based tool. Polygonal contours were delineated for C2, C3, and C4 to capture precise morphological boundaries.
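
The differential splitting strategy described under Dataset Preparation and Stratification can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper `split_by_percent`, the placeholder image identifiers, the fixed seed, and the rounding rule (floor for training and validation, remainder to testing) are all assumptions, and the per-partition counts it produces are not reported in the text. Only the subset sizes (187 and 915) and the percentages come from the description above.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_by_percent(items: Sequence[str], percents: Tuple[int, int, int],
                     seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle items and split them into train/val/test by integer percentages.

    Rounding rule (an assumption): train and val sizes are floored and the
    remainder goes to the test partition, so no image is dropped.
    """
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


# Placeholder identifiers; only the subset sizes come from the text.
noisy_ids = [f"noisy_{i:04d}" for i in range(187)]
clean_ids = [f"clean_{i:04d}" for i in range(915)]

noisy_split = split_by_percent(noisy_ids, (80, 19, 1))
clean_split = split_by_percent(clean_ids, (70, 20, 10))

# Merge the two subsets partition by partition.
combined = {
    part: noisy_split[part] + clean_split[part]
    for part in ("train", "val", "test")
}
for part, ids in combined.items():
    print(part, len(ids))
```

Splitting each subset separately before merging keeps the proportion of noisy images in each partition under explicit control, which is the point of the differential strategy; a stratified split by CVM stage could be layered on top if stage labels were available at this step.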
Architecture and Transfer Learning: We implemented a Mask R-CNN architecture with a ResNet-50 backbone and a Feature Pyramid Network (FPN) to facilitate multi-scale feature extraction. To leverage prior knowledge, transfer learning was employed by initialising the network with weights pre-trained on the COCO dataset. During fine-tuning, the initial convolutional layers were frozen to retain low-level generic features, while deeper layers were optimised to learn the specific morphology of cervical vertebrae.
Inference and Refinement: During the inference phase, a confidence threshold of 0.5 and a Non-Maximum Suppression (NMS) threshold of 0.6 were applied to filter out low-probability or redundant detections. Only radiographs where all three target vertebrae (C2–C4) were detected with an Intersection-over-Union (IoU) > 0.5 were retained. This rigorous filtering resulted in a curated set of 756 radiographs for the second stage (Figures 9, 10).

Stage II: Skeletal Maturity Classification
Localised Pre-processing: The predicted masks from Stage I were used to crop the C2, C3, and C4 regions. Each vertebral crop was normalised and resized to a fixed resolution of 224 × 224 pixels. The final classification dataset (n = 756) was divided into training (70%) and testing (30%) sets using stratified sampling to preserve the distribution of CVM stages (CS1–CS6) across both cohorts (Figure 11).
Classification Network and Late-Fusion Strategy: We developed a triple-input classification network based on EfficientNet-B3. The architecture was modified to accept single-channel grayscale inputs, with weights initialised from ImageNet.
Data Augmentation: To prevent overfitting and simulate clinical variations, independent stochastic augmentations were applied to each vertebral crop:

Geometric: Random rotations (±15°), horizontal flipping, and translations (±10%).
Photometric: Isotropic scaling (0.9–1.1) and pixel intensity normalisation (μ = 128, σ = 64).
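
To make the augmentation ranges above concrete, the following Python sketch samples one independent set of parameters per vertebral crop and applies the stated intensity normalisation. It is a hedged illustration rather than the study's pipeline: the names `AugmentParams`, `sample_params`, `sample_for_triplet`, and `normalise` are hypothetical, uniform sampling and a flip probability of 0.5 are assumptions, and 8-bit grey levels are assumed because the stated mean is 128. Applying the sampled rotation, shift, and scale to actual pixels would be delegated to an image library and is not shown.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AugmentParams:
    rotation_deg: float     # drawn from [-15, 15] degrees
    flip_horizontal: bool
    shift_x_frac: float     # fraction of crop width, drawn from [-0.10, 0.10]
    shift_y_frac: float     # fraction of crop height, drawn from [-0.10, 0.10]
    scale: float            # isotropic scale factor, drawn from [0.9, 1.1]


def sample_params(rng: random.Random) -> AugmentParams:
    """Draw one set of augmentation parameters within the ranges given in the text."""
    return AugmentParams(
        rotation_deg=rng.uniform(-15.0, 15.0),
        flip_horizontal=rng.random() < 0.5,  # flip probability is an assumption
        shift_x_frac=rng.uniform(-0.10, 0.10),
        shift_y_frac=rng.uniform(-0.10, 0.10),
        scale=rng.uniform(0.9, 1.1),
    )


def sample_for_triplet(rng: random.Random) -> Dict[str, AugmentParams]:
    """Independent draws for each vertebral crop (C2, C3, C4)."""
    return {name: sample_params(rng) for name in ("C2", "C3", "C4")}


def normalise(pixels: List[List[int]], mean: float = 128.0,
              std: float = 64.0) -> List[List[float]]:
    """Map grey levels to (x - mean) / std, assuming 8-bit input."""
    return [[(value - mean) / std for value in row] for row in pixels]


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
for name, params in sample_for_triplet(rng).items():
    print(name, params)

print(normalise([[0, 128, 192]]))
```

Drawing parameters separately for C2, C3, and C4 means the three crops of one radiograph are perturbed differently, which matches the "independent stochastic augmentations" wording; whether shared or independent draws work better for a fused classifier is an empirical question the text does not settle.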
Feature Fusion and Output: Each vertebra (C2, C3, and C4) was processed through a dedicated EfficientNet-B3 branch. Following the Global Average Pooling and flattening layers, three 1,536-dimensional feature vectors were generated. We implemented a late-fusion strategy by concatenating these vectors into a comprehensive 4,608-dimensional representation. This fused vector was then passed through a dropout layer and a fully connected layer with a Softmax activation function, yielding the final class probabilities for stages CS1 through CS6. This approach allows the model to learn both the individual morphological changes of each vertebra and their collective spatial relationships.

Declarations

Data Availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request, subject to ethical considerations and institutional data privacy policies.

Author Contributions
Conceptualization and study design: MJM
Dataset preparation and radiographic annotation: MJM, HAA, AMF, YS, MEG
Development of the two-stage deep learning framework: MZE, HD, AK
Model training, testing, and performance evaluation: MZE
Statistical analysis: HD
Clinical interpretation of results: MJM, HAA, MZE
Writing – original draft: AK
Writing – review & editing: MJM, MZE
Supervision and final approval: MJM

Additional Information
Funding Declaration: This research received no specific grant or funding from any funding institutions in the public, commercial, or not-for-profit sectors.
Competing interests: The authors declare that there are no conflicts of interest regarding the publication of this manuscript.

References
1. Hägg, U. & Taranger, J. Skeletal stages of the hand and wrist as indicators of the pubertal growth spurt. Acta Odontologica Scandinavica 38, 187–200 (1980).
2. Fishman, L. S. Radiographic evaluation of skeletal maturation. The Angle Orthodontist 52, 88–112 (1982).
3. Cha, K.-S. Skeletal changes of maxillary protraction in patients exhibiting skeletal class III malocclusion: a comparison of three skeletal maturation groups. The Angle Orthodontist 73, 26–35 (2003).
4. Ngan, P. Early treatment of Class III malocclusion: is it worth the burden? American Journal of Orthodontics and Dentofacial Orthopedics 129, S82–S85 (2006).
5. Kucukkeles, N., Acar, A., Biren, S. & Arun, T. Comparisons between cervical vertebrae and hand-wrist maturation for the assessment of skeletal maturity. The Journal of Clinical Pediatric Dentistry 24, 47–52 (1999).
6. Gabriel, D. B. et al. Cervical vertebrae maturation method: poor reproducibility. American Journal of Orthodontics and Dentofacial Orthopedics 136, 478.e471–478.e477 (2009).
7. Hassel, B. & Farman, A. G. Skeletal maturation evaluation using cervical vertebrae. American Journal of Orthodontics and Dentofacial Orthopedics 107, 58–66 (1995).
8. Arık, S. Ö., Ibragimov, B. & Xing, L. Fully automated quantitative cephalometry using convolutional neural networks. Journal of Medical Imaging 4, 014501 (2017).
9. Lee, J.-H., Kim, D.-H., Jeong, S.-N. & Choi, S.-H. Detection and diagnosis of dental caries using a deep learning-based convolutional neural network algorithm. Journal of Dentistry 77, 106–111 (2018).
10. Seo, H., Hwang, J., Jeong, T. & Shin, J. Comparison of deep learning models for cervical vertebral maturation stage classification on lateral cephalometric radiographs. Journal of Clinical Medicine 10, 3591 (2021).
11. Li, H. et al. Convolutional neural network-based automatic cervical vertebral maturation classification method. Dentomaxillofacial Radiology 51, 20220070 (2022).
12. Zhou, J. et al. Development of an artificial intelligence system for the automatic evaluation of cervical vertebral maturation status. Diagnostics 11, 2200 (2021).
13. Kim, E.-G. et al. Estimating cervical vertebral maturation with a lateral cephalogram using the convolutional neural network. Journal of Clinical Medicine 10, 5400 (2021).
14. Kavousinejad, S., Ebadifar, A., Tehranchi, A., Zakermashhadi, F. & Dalaie, K. Determination of cervical vertebral maturation using machine learning in lateral cephalograms. Journal of Dental Research, Dental Clinics, Dental Prospects 18, 232 (2024).
15. Jiang, F. et al. Deep learning based quantitative cervical vertebral maturation analysis. Head & Face Medicine 21, 20 (2025).
16. Amasya, H., Yildirim, D., Aydogan, T., Kemaloglu, N. & Orhan, K. Cervical vertebral maturation assessment on lateral cephalometric radiographs using artificial intelligence: comparison of machine learning classifier models. Dentomaxillofacial Radiology 49, 20190441 (2020).

Additional Declarations
No competing interests reported.

Reviewers agreed at journal: 11 May, 2026
Reviewers agreed at journal: 04 May, 2026
Reviewers invited by journal: 30 Apr, 2026
Editor invited by journal: 08 Apr, 2026
Editor assigned by journal: 31 Mar, 2026
Submission checks completed at journal: 31 Mar, 2026
First submitted to journal: 25 Mar, 2026
mAP@0.5) for the cervical vertebrae detection model\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/9457f840ae1dd7cc793a23cd.png"},{"id":108972337,"identity":"968d51cc-aa28-49bd-ab13-3a2095c0ac83","added_by":"auto","created_at":"2026-05-11 10:36:01","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":59793,"visible":true,"origin":"","legend":"\u003cp\u003ePerformance metrics of the cervical vertebrae detection model: (a) bounding box detection results and (b) segmentation results, reported using Average Precision (AP) at different IoU thresholds and object scales\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/433f73bf051ae83f93723ad9.png"},{"id":108978189,"identity":"ff8b16c3-a249-4744-b9e6-58c2ef7009db","added_by":"auto","created_at":"2026-05-11 11:34:46","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":60307,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrix of the cervical vertebrae detection model showing the classification performance for vertebrae C2, C3, and C4.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/5f501311b5e60ed870732002.png"},{"id":108972339,"identity":"e1bfa5ce-986f-4c44-9de7-078f685092dd","added_by":"auto","created_at":"2026-05-11 10:36:01","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":66803,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrix resulting from classifier 
model\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/9bf71e22c815ad59bba19da9.png"},{"id":108978129,"identity":"a63871be-b4fe-4022-b5aa-736e1209d6fc","added_by":"auto","created_at":"2026-05-11 11:34:14","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":26267,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of cervical vertebral maturation stages (CS1–CS6)\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/4c77e4836b23605b8f38b53a.png"},{"id":108972340,"identity":"92ee419b-125e-4082-84bb-a94f1eef6af2","added_by":"auto","created_at":"2026-05-11 10:36:01","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":96445,"visible":true,"origin":"","legend":"\u003cp\u003eExploratory data analysis of geometric and photometric image characteristics.\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/da526669c4b4f7a460ea58eb.png"},{"id":108978215,"identity":"9e76aca1-6b1f-49e3-bcb4-df9eab7b51ef","added_by":"auto","created_at":"2026-05-11 11:35:01","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":87681,"visible":true,"origin":"","legend":"\u003cp\u003eEdge density distribution analysis revealing primary cluster (0.07–0.18 range) and low-detail tail (\u0026lt;0.04) used for quality flagging. 
Majority of images fall within optimal edge density range for vertebra detection\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/8b1dfebd677fff2c3b929d55.png"},{"id":108978095,"identity":"34ae3cc3-673e-4ed8-92aa-068b61646afa","added_by":"auto","created_at":"2026-05-11 11:34:03","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":39111,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of noisy data in the dataset\u003c/p\u003e","description":"","filename":"image8.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/d289213697e9307edb77fa5f.png"},{"id":108978328,"identity":"991641b0-98f6-4fe1-ab91-44593e75e6ef","added_by":"auto","created_at":"2026-05-11 11:36:21","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":1030773,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExamples of cervical vertebrae detection outcomes using Detectron2.\u003c/strong\u003e(a) Incomplete detection of cervical vertebrae, (b) complete detection of C2–C4, (c) redundant detections with overlapping masks, and (d) incorrect detection. 
Detected vertebrae are highlighted using color-coded masks, and confidence scores displayed for each prediction.\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/5e4316205144ec0dcf2ccdfe.png"},{"id":108972344,"identity":"e938565c-625a-47b8-9854-16f785790ac2","added_by":"auto","created_at":"2026-05-11 10:36:01","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":365340,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eEffect of detection optimization on cervical vertebrae segmentation results.\u003c/strong\u003e (a) Initial detection results before post-processing, showing redundant or missing vertebral predictions. (b) Optimized detection after filtering duplicate masks and retaining predictions with higher confidence scores. Detected cervical vertebrae (C2–C4) are highlighted using color-coded masks with corresponding confidence values.\u003c/p\u003e","description":"","filename":"image10.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/1f1dd75cf7f660abcc8da6ac.png"},{"id":108972345,"identity":"8c2ad313-0e92-4bfa-a1c9-7487953e221b","added_by":"auto","created_at":"2026-05-11 10:36:01","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":324709,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of samples obtained from the cervical vertebrae detection and segmentation stage (C2–C4) across different skeletal maturity classes (CS1–CS6), and their allocation into training and test sets for skeletal age classification.\u003c/p\u003e","description":"","filename":"image11.png","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/c87f50222a493ab3b8a546a2.png"},{"id":108980127,"identity":"4befcdd0-ce3c-49e9-906d-84685f40bd84","added_by":"auto","created_at":"2026-05-11 
12:03:40","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2578259,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9226756/v1/6f5ccbaa-ad9b-4e3e-a28b-a04168a09657.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Automated assessment of cervical vertebral maturation stages on lateral cephalometric radiographs using a two-stage deep learning framework","fulltext":[{"header":"Introduction","content":"\u003cp\u003eAccurate assessment of skeletal maturation is a fundamental component of clinical decision-making across dental disciplines, most notably orthodontics, maxillofacial surgery, and implantology. While chronological age is easily determined, skeletal maturation stages offer a far more reliable correlation with the adolescent growth spurt. This correlation allows clinicians to pinpoint the optimal window for treatment and more accurately predict outcomes that depend on growth \u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e,\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. For pediatric and adolescent patients, timing orthodontic or orthopaedic interventions to coincide with the pubertal growth peak can significantly enhance treatment efficacy. Conversely, poorly timed treatment often leads to relapse, unnecessary biological costs, and extended treatment durations. Consequently, precise determination of skeletal maturity is not merely an adjunct but a prerequisite for high-quality, growth-oriented treatment planning \u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e,\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. 
Historically, skeletal maturation has been evaluated through various methods, with hand\u0026ndash;wrist radiography and cervical vertebral maturation (CVM) analysis via lateral cephalometric imaging being the most prominent \u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. Over the last two decades, CVM analysis has largely surpassed hand-wrist methods in clinical popularity because it utilises routine orthodontic records, thereby sparing the patient from additional radiation exposure. Despite its utility, the method is inherently limited by its reliance on subjective visual interpretation of vertebral morphology. Clinicians must assess subtle changes in the concavity of the lower borders and the shape of the C2, C3, and C4 vertebrae. This subjectivity often results in significant inter-examiner variability and inconsistent reproducibility, which can undermine diagnostic reliability \u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e,\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e. To address these human limitations, researchers have long sought to automate the CVM staging process. Early attempts utilised traditional computer vision techniques, such as Active Shape Models (ASMs) and manual feature extraction, to identify vertebral landmarks. While these were pioneering, they often struggled with the high levels of noise and varying contrast typical of lateral cephalograms. With the recent breakthrough of deep learning, specifically Convolutional Neural Networks (CNNs), the landscape of automated diagnostics has shifted. Unlike traditional methods, CNNs possess the ability to extract complex hierarchical features directly from raw imaging data, demonstrating remarkable proficiency in recognising anatomical landmarks and interpreting radiographic patterns. 
In the dental field, deep learning has already shown promise across a spectrum of diagnostic tasks, from caries detection and periodontal assessment to the analysis of complex craniofacial structures \u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. Specific to skeletal maturation, several recent studies have explored AI-driven CVM staging. For instance, some researchers have employed single-stage classification models that analyse the entire cephalogram; however, these often suffer from \"background noise\" in the image that is irrelevant to the vertebrae themselves. Others have looked at multi-stage approaches, but the seamless integration of precise vertebral segmentation with accurate stage classification remains a challenge in the quest for clinical-grade accuracy. While these technological strides are encouraging, the clinical adoption of AI systems hinges on their ability to match the performance of expert clinicians under rigorous evaluation. The current study was designed to bridge the gap between raw radiographic data and clinical diagnosis by developing a fully automated, two-stage deep learning framework. The first stage focuses on the precise detection and segmentation of the C2-C4 vertebrae to isolate the regions of interest. The second stage then classifies these isolated regions into the standard CVM stages (CS1\u0026ndash;CS6). 
We evaluated the performance of this framework using standardised detection and classification metrics, benchmarked against expert-labelled reference standards, to determine its viability as a reliable clinical decision-support tool.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ePerformance of the Cervical Vertebrae Detection Model\u003c/h2\u003e \u003cp\u003eIn the initial stage, we compared various object detection architectures, including Cascade R-CNN and Mask R-CNN. The Mask R-CNN framework, integrated with a ResNet-50-FPN backbone, demonstrated the most robust performance for the automated localisation of C2, C3, and C4 vertebrae and was thus selected for the final pipeline.\u003c/p\u003e \u003cp\u003eThe model\u0026rsquo;s efficacy was quantified using standard computer vision metrics. After a training regimen of 3,300 iterations, the model attained a mean Average Precision (mAP@50) of 0.85, indicating a high level of localisation reliability at an IoU threshold of 0.5 (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eDetailed analysis of the IoU thresholds revealed that at a moderate spatial overlap (IoU\u0026thinsp;=\u0026thinsp;0.5), the model achieved an AP of 0.96. However, under more stringent localisation constraints (IoU\u0026thinsp;=\u0026thinsp;0.75), the AP adjusted to 0.66, reflecting the inherent complexity of precise boundary delineation in radiographic imaging (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The overall recall was 0.62, signifying a stable detection rate across the heterogeneous dataset. The confusion matrix (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003e) further confirms high discriminatory power, particularly for the C4 vertebra, with minimal cross-classification between adjacent vertebral levels. 
To further evaluate the reliability of the detection model\u0026rsquo;s output, Cohen\u0026rsquo;s Kappa coefficient was calculated, yielding a value of approximately κ\u0026thinsp;\u0026asymp;\u0026thinsp;0.513. This result indicates moderate agreement between the model\u0026rsquo;s predictions and the ground truth for vertebrae localization, suggesting that the detection performance surpasses random chance.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003ePerformance of Stage II: CVM Stage Classification\u003c/h3\u003e\n\u003cp\u003eIn the second stage, the 756 successfully segmented triplets (C2\u0026ndash;C4) were processed through the EfficientNet-B3 classifier. The diagnostic performance across the six maturation stages (CS1\u0026ndash;CS6) is detailed in the confusion matrix (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003e) and summarised in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e\u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStage\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1-Score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCS1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e61%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e54%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e57%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCS2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e42%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e53%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e47%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCS3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e32%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e30%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e31%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCS4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e37%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e35%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCS5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e45%\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e39%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCS6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e62%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e73%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e67%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe model demonstrated its peak performance in identifying CS6 (Late Maturation), achieving an accuracy of 73% (38/52 samples). The early stages, CS1 and CS2, showed moderate-to-high classification reliability, both exceeding the 50% accuracy threshold.\u003c/p\u003e \u003cp\u003eConversely, the intermediate stage (CS3) proved to be the most challenging for the framework, yielding a classification accuracy of 33%. 
Most misclassifications for CS3 were assigned to CS2, reflecting the clinical difficulty in detecting the initial onset of the concavity. This performance dip is primarily attributed to the \"morphological transition\" phase; the subtle changes in the concavity of the lower borders and the transition from trapezoidal to rectangular shapes in C3 and C4 often lead to overlapping features between CS2, CS3, and CS4.\u003c/p\u003e \u003cp\u003eCohen\u0026rsquo;s Kappa coefficient, calculated to evaluate the agreement between the model\u0026rsquo;s predictions and the ground truth, was approximately 0.372. This indicates that the model performs better than random chance but leaves significant room for improvement: it cannot yet reliably distinguish between all classes. This result can be attributed to several factors:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eMorphological Complexity: The inherent difficulty in distinguishing between mid-maturation stages (particularly CS3) due to subtle transitional features and overlapping anatomical characteristics poses a significant challenge for accurate classification.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eStage I Dependency: The quality of the input data (756 C2-C4 triplets) extracted from the initial vertebrae detection stage directly impacts the classification accuracy. 
Potential errors in localization or segmentation from Stage I can lead to misclassifications in Stage II.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eData Distribution: The nature and distribution of samples within the 756-instance dataset, especially for stages with fewer samples, may have influenced the model's ability to generalize and achieve higher agreement.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eIn summary, the Kappa of 0.37200 reflects the challenges in accurately classifying CVM stages, influenced by morphological complexities and the quality of upstream detection data.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe integration of artificial intelligence (AI) into orthodontic diagnostics has emerged as a transformative approach to mitigating the inherent subjectivity of skeletal maturity assessment. The present study successfully developed and validated a fully automated, two-stage deep learning framework for CVM assessment. Our clinical pipeline achieved two primary milestones: first, a highly precise localisation of the C2\u0026ndash;C4 vertebrae with an AP of 0.96 (at IoU\u0026thinsp;=\u0026thinsp;0.5), and second, a robust classification of skeletal maturity stages using an EfficientNet-B3 architecture, reaching a peak accuracy of 73% for the final maturation stage (CS6). While the model demonstrated high reliability in identifying polar stages (CS1, CS2, and CS6), it also highlighted the inherent diagnostic complexity of the transitional CS3 stage, which yielded a lower accuracy of 33%. 
These results underscore the potential of hierarchical deep learning models to standardise orthodontic growth analysis while maintaining high anatomical transparency.\u003c/p\u003e\n\u003ch3\u003eComparative Performance and Architectural Advantages\u003c/h3\u003e\n\u003cp\u003eMethodologically, our framework addresses the limitations of prior \"black-box\" approaches that perform direct classification on entire radiographs. For instance, while Seo et al. and Li et al. reported varying accuracies using direct classification \u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e,\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e, these models often lack anatomical transparency. By implementing an explicit detection and segmentation stage, our model ensures that the feature extraction process is strictly confined to the C2\u0026ndash;C4 vertebral bodies, effectively filtering out non-contributory craniofacial structures\u0026mdash;a frequent source of noise in single-stage models \u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e,\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. Our late-fusion architecture independently encodes C2\u0026ndash;C4 vertebrae to preserve local morphological nuances while capturing inter-vertebral spatial correlations. This hierarchical approach ensures a more robust and physiologically aligned representation of skeletal maturation compared to conventional single-stream models.\u003c/p\u003e \u003cp\u003eFurthermore, while semi-automated methods like those proposed by Kavousinejad et al. \u003csup\u003e14\u003c/sup\u003e achieve high accuracy, they remain dependent on manual landmarking, which is time-consuming and prone to inter-observer bias. 
In contrast, our end-to-end framework offers superior clinical scalability, eliminating manual intervention while maintaining high fidelity in anatomical localisation.\u003c/p\u003e\n\u003ch3\u003eThe Challenge of Transitional Stages (CS3)\u003c/h3\u003e\n\u003cp\u003eA notable finding in our study was the performance disparity between polar and intermediate maturation stages. The model achieved its highest sensitivity in CS6 (73%), whereas CS3 exhibited the lowest accuracy (33%). This phenomenon is consistent with the findings of Jiang et al. and likely reflects the \"morphological continuum\" of puberty. During the CS3 stage, the transition from a flat to a concave lower border in C3 and C4 is often subtle and lacks a discrete threshold, posing significant challenges even for experienced clinicians \u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e,\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. The high localisation precision of our Stage I model suggests that this misclassification is rooted in the morphological overlap between classes rather than a failure in feature detection.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eClinical Implications and Robustness\u003c/h2\u003e \u003cp\u003eTo bridge the gap between experimental models and \"real-world\" clinical practice, we purposefully integrated 187 noisy and low-quality images into our training pipeline. This strategy, combined with the use of the Detectron2 framework for standardised pre-processing, enhances the model\u0026rsquo;s generalizability across different radiographic hardware and exposure settings. 
From a clinical standpoint, this tool could serve as a reliable \"second opinion\" for orthodontists, standardising the timing of functional appliance therapy and orthognathic surgery.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eLimitations and Future Directions\u003c/h3\u003e\n\u003cp\u003eDespite the promising results, certain limitations warrant acknowledgement. The moderate recall (0.62) in the detection stage indicates that some vertebral structures were missed, potentially due to severe postural variations or suboptimal contrast in the archives. Additionally, the inherent class imbalance in pediatric populations remains a challenge for deep learning models.\u003c/p\u003e\n\u003ch3\u003eFuture research should focus on:\u003c/h3\u003e\n\u003cp\u003eMulti-branch Fusion Architectures: Leveraging the independent morphological signals of C2, C3, and C4 more effectively through advanced feature fusion.\u003c/p\u003e \u003cp\u003eDataset Expansion: Utilizing multi-center datasets to further mitigate class imbalance and improve performance during the transitional CS3\u0026ndash;CS4 phases.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThe two-stage deep learning framework developed provides a robust and fully automated solution for CVM assessment, eliminating the subjectivity of manual methods. By coupling high-precision vertebral segmentation with an EfficientNet-based classifier, the model achieves high diagnostic reliability, particularly in identifying polar maturation stages. Despite the challenges posed by the morphological nuances of transitional phases (CS3), the system\u0026rsquo;s resilience to \"real-world\" image noise demonstrates its potential as a scalable auxiliary tool in clinical orthodontics. 
This pipeline offers a standardized approach to skeletal growth assessment, facilitating more precise timing for growth-related interventions.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003e\u003cstrong\u003eSampling Method and Sample Size Determination\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn this retrospective cross-sectional study, a comprehensive dataset of lateral cephalometric radiographs (LCRs) of individuals aged 7\u0026ndash;18 years was retrieved from the digital archives of two private orthodontic clinics in Tehran and Yazd, Iran, as well as the Department of Orthodontics, Faculty of Dentistry, Shahid Sadoughi University of Medical Sciences, Yazd, Iran, spanning from January 2018 to December 2023. The study population comprised. Utilizing routine clinical archives ensured a diverse representation of radiographic images obtained under standard diagnostic conditions. To ensure the integrity of the vertebral morphological analysis, the following exclusion criteria were applied:\u003c/p\u003e\n\u003cul type=\"disc\"\u003e\n \u003cli\u003ePresence of systemic diseases, endocrine disorders, or developmental delays known to impair skeletal growth or bone maturation.\u003c/li\u003e\n \u003cli\u003eCongenital or acquired craniofacial anomalies or cervical spine deformities.\u003c/li\u003e\n \u003cli\u003eSuboptimal image quality, including artefacts, positioning errors, or inadequate visualization of the second (C2), third (C3), and fourth (C4) cervical vertebrae.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eFollowing the initial screening, a total of 1,390 high-quality LCRs were included. 
This sample size was determined to meet the high data-dimensionality requirements of deep learning architectures, ensuring robust feature extraction and model generalizability.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe study protocol was strictly conducted in accordance with the Declaration of Helsinki and received formal approval from the Ethics Committee of Shahid Sadoughi University of Medical Sciences (Protocol No: IR.SSU.DENTISTRY.REC.1403.074). Given the retrospective nature of the study and the use of de-identified, anonymized data, the requirement for written informed consent was waived by the institutional review board.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Pre-processing and Quality Control\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImage Standardisation and Curation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe initial dataset consisted of 1,390 lateral cephalometric radiographs (LCRs) originally captured in Digital Imaging and Communications in Medicine (DICOM) format. To streamline the deep learning workflow, these images were converted to Portable Network Graphics (PNG) format. \u003cstrong\u003eThe raw dataset exhibited significant heterogeneity in resolution, with widths ranging from 568 to 2,144 pixels and heights from 570 to 2,600 pixels.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA rigorous quality control (QC) protocol was implemented to ensure the reliability of the training data.\u003c/strong\u003e Following an expert review, 288 radiographs were excluded due to excessive noise, motion artifacts, or failure to meet the anatomical inclusion criteria (e.g., inadequate visualization of C2\u0026ndash;C4). 
This resulted in a refined primary dataset of 1,102 images for subsequent analysis.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClass Distribution and Imbalance\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAnalysis of the CVM stages (CS1\u0026ndash;CS6) revealed a significant class imbalance, with CS3 being the least represented category (Figure 5). To mitigate the risk of biased learning toward majority classes, we implemented specific strategies, including:\u003c/p\u003e\n\u003cul type=\"disc\"\u003e\n \u003cli\u003eCost-sensitive learning (class weighting).\u003c/li\u003e\n \u003cli\u003eTargeted data augmentation for underrepresented stages.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eExploratory Data Analysis (EDA)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eComprehensive EDA was performed to evaluate the geometric and photometric characteristics of the dataset (Figure 6). The distribution of image dimensions (width, height, and aspect ratio) displayed multimodal patterns, suggesting that dimensional standardisation was not a prerequisite for the feature extraction layers. Photometric analysis revealed that the mean brightness and pixel intensity followed a Gaussian distribution centered on mid-gray levels, indicating consistent exposure across the archives.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFeature Complexity and Noise Strategy\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo quantify image detail, edge density was calculated for the entire dataset (Figure 7). The majority of the images fell within the 0.07\u0026ndash;0.18 range. Images at the lower tail of the distribution (edge density \u0026lt; 0.04) were identified as low-detail samples.\u003c/p\u003e\n\u003cp\u003eFurthermore, to enhance the model\u0026apos;s robustness against real-world clinical variations, a strategic decision was made regarding noisy data. 
Instead of total exclusion, 187 images with controlled levels of noise were specifically retained and integrated into the data augmentation pipeline. This approach was designed to minimize overfitting and ensure that the framework maintains high diagnostic performance even when processing suboptimal, \u0026quot;non-ideal\u0026quot; clinical radiographs (Figure 5).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eModel Design\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eProposed AI Framework\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe proposed diagnostic pipeline follows a two-stage deep learning architecture: (1) automated detection and instance segmentation of the C2, C3, and C4 cervical vertebrae from LCRs, and (2) multi-class classification of skeletal maturity stages based on the localised vertebral regions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDataset Preparation and Stratification\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe experimental dataset was categorised into two subsets: noisy (n = 187) and clean (n = 915) images, as detailed in Figure 8. To maintain statistical balance while ensuring the model\u0026apos;s robustness against real-world artefacts, we adopted a differential splitting strategy. For the noisy subset, an 80/19/1% split was used for training, validation, and testing, respectively. The clean subset followed a standard 70/20/10% distribution. This partitioning ensures that the model is exposed to a high volume of suboptimal data during training, thereby improving its generalisation across diverse clinical environments (Figure 9).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStage I: Vertebral Detection and Instance Segmentation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGround Truth Generation:\u003c/strong\u003e Manual annotations were performed by orthodontic specialists using a web-based tool. 
Polygonal contours were delineated for C2, C3, and C4 to capture precise morphological boundaries.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eArchitecture and Transfer Learning:\u003c/strong\u003e We implemented a Mask R-CNN architecture with a ResNet-50 backbone and a Feature Pyramid Network (FPN) to facilitate multi-scale feature extraction. To leverage prior knowledge, transfer learning was employed by initialising the network with weights pre-trained on the COCO dataset. During fine-tuning, the initial convolutional layers were frozen to retain low-level generic features, while deeper layers were optimised to learn the specific morphology of cervical vertebrae.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eInference and Refinement:\u003c/strong\u003e During the inference phase, a confidence threshold of 0.5 and a Non-Maximum Suppression (NMS) threshold of 0.6 were applied to filter out low-probability or redundant detections. Only radiographs in which all three target vertebrae (C2\u0026ndash;C4) were detected with an Intersection-over-Union (IoU) \u0026gt; 0.5 were retained. This rigorous filtering resulted in a curated set of 756 radiographs for the second stage (Figures 9, 10).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStage II: Skeletal Maturity Classification\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLocalised Pre-processing:\u003c/strong\u003e The predicted masks from Stage I were used to crop the C2, C3, and C4 regions. Each vertebral crop was normalised and resized to a fixed resolution of 224 \u0026times; 224 pixels. The final classification dataset (n = 756) was divided into training (70%) and testing (30%) sets using stratified sampling to preserve the distribution of CVM stages (CS1\u0026ndash;CS6) across both cohorts (Figure 11).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClassification Network and Late-Fusion Strategy:\u003c/strong\u003e We developed a triple-input classification network based on EfficientNet-B3. 
The architecture was modified to accept single-channel grayscale inputs, with weights initialised from ImageNet.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Augmentation:\u003c/strong\u003e To prevent overfitting and simulate clinical variations, independent stochastic augmentations were applied to each vertebral crop:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGeometric:\u003c/strong\u003e Random rotations (\u0026plusmn;15\u0026deg;), horizontal flipping, and translations (\u0026plusmn;10%).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePhotometric:\u003c/strong\u003e Isotropic scaling (0.9\u0026ndash;1.1) and pixel intensity normalisation (\u0026mu; = 128, \u0026sigma; = 64).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFeature Fusion and Output:\u003c/strong\u003e Each vertebra (C2, C3, and C4) was processed through a dedicated EfficientNet-B3 branch. Following the Global Average Pooling and flattening layers, three 1,536-dimensional feature vectors were generated. We implemented a late-fusion strategy by concatenating these vectors into a comprehensive 4,608-dimensional representation. This fused vector was then passed through a dropout layer and a fully connected layer with a Softmax activation function, yielding the final class probabilities for stages CS1 through CS6. 
This approach allows the model to learn both the individual morphological changes of each vertebra and their collective spatial relationships.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets generated and analysed during the current study are available from the corresponding author on reasonable request, subject to ethical considerations and institutional data privacy policies\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConceptualization and study design:\u003c/strong\u003e MJM\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDataset preparation and radiographic annotation:\u003c/strong\u003e MJM, HAA, AMF, YS, MEG\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDevelopment of the two-stage deep learning framework:\u0026nbsp;\u003c/strong\u003eMZE, HD, AK\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eModel training, testing, and performance evaluation:\u0026nbsp;\u003c/strong\u003eMZE\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStatistical analysis:\u003c/strong\u003e HD\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClinical interpretation of results:\u0026nbsp;\u003c/strong\u003eMJM, HAA, MZE\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWriting \u0026ndash; original draft:\u0026nbsp;\u003c/strong\u003eAK\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWriting \u0026ndash; review \u0026amp; editing:\u003c/strong\u003e MJM, MZE\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupervision and final approval:\u003c/strong\u003e MJM\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAdditional Information\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding Declaration:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research received no specific grant or funding from any funding institutions in the public, commercial, or not-for-profit 
sectors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that there are no conflicts of interest regarding the publication of this manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eH\u0026auml;gg, U. \u0026amp; Taranger, J. Skeletal stages of the hand and wrist as indicators of the pubertal growth spurt. \u003cem\u003eActa Odontologica Scandinavica\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 187-200 (1980).\u003c/li\u003e\n\u003cli\u003eFishman, L. S. Radiographic evaluation of skeletal maturation. \u003cem\u003eThe Angle Orthodontist\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 88-112 (1982).\u003c/li\u003e\n\u003cli\u003eCha, K.-S. Skeletal changes of maxillary protraction in patients exhibiting skeletal class III malocclusion: a comparison of three skeletal maturation groups. \u003cem\u003eThe Angle Orthodontist\u003c/em\u003e \u003cstrong\u003e73\u003c/strong\u003e, 26-35 (2003).\u003c/li\u003e\n\u003cli\u003eNgan, P. Early treatment of Class III malocclusion: is it worth the burden? \u003cem\u003eAmerican journal of orthodontics and dentofacial orthopedics\u003c/em\u003e \u003cstrong\u003e129\u003c/strong\u003e, S82-S85 (2006).\u003c/li\u003e\n\u003cli\u003eKucukkeles, N., Acar, A., Biren, S. \u0026amp; Arun, T. Comparisons between cervical vertebrae and hand-wrist maturation for the assessment of skeletal maturity. \u003cem\u003eThe Journal of clinical pediatric dentistry\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 47-52 (1999).\u003c/li\u003e\n\u003cli\u003eGabriel, D. B.\u003cem\u003e et al.\u003c/em\u003e Cervical vertebrae maturation method: poor reproducibility. \u003cem\u003eAmerican Journal of Orthodontics and Dentofacial Orthopedics\u003c/em\u003e \u003cstrong\u003e136\u003c/strong\u003e, 478. e471-478. e477 (2009).\u003c/li\u003e\n\u003cli\u003eHassel, B. \u0026amp; Farman, A. G. 
Skeletal maturation evaluation using cervical vertebrae. \u003cem\u003eAmerican Journal of Orthodontics and Dentofacial Orthopedics\u003c/em\u003e \u003cstrong\u003e107\u003c/strong\u003e, 58-66 (1995).\u003c/li\u003e\n\u003cli\u003eArık, S. \u0026Ouml;., Ibragimov, B. \u0026amp; Xing, L. Fully automated quantitative cephalometry using convolutional neural networks. \u003cem\u003eJournal of Medical Imaging\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 014501-014501 (2017).\u003c/li\u003e\n\u003cli\u003eLee, J.-H., Kim, D.-H., Jeong, S.-N. \u0026amp; Choi, S.-H. Detection and diagnosis of dental caries using a deep learning-based convolutional neural network algorithm. \u003cem\u003eJournal of dentistry\u003c/em\u003e \u003cstrong\u003e77\u003c/strong\u003e, 106-111 (2018).\u003c/li\u003e\n\u003cli\u003eSeo, H., Hwang, J., Jeong, T. \u0026amp; Shin, J. Comparison of deep learning models for cervical vertebral maturation stage classification on lateral cephalometric radiographs. \u003cem\u003eJournal of Clinical Medicine\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 3591 (2021).\u003c/li\u003e\n\u003cli\u003eLi, H.\u003cem\u003e et al.\u003c/em\u003e Convolutional neural network-based automatic cervical vertebral maturation classification method. \u003cem\u003eDentomaxillofacial Radiology\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, 20220070 (2022).\u003c/li\u003e\n\u003cli\u003eZhou, J.\u003cem\u003e et al.\u003c/em\u003e Development of an artificial intelligence system for the automatic evaluation of cervical vertebral maturation status. \u003cem\u003eDiagnostics\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 2200 (2021).\u003c/li\u003e\n\u003cli\u003eKim, E.-G.\u003cem\u003e et al.\u003c/em\u003e Estimating cervical vertebral maturation with a lateral cephalogram using the convolutional neural network. 
\u003cem\u003eJournal of Clinical Medicine\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 5400 (2021).\u003c/li\u003e\n\u003cli\u003eKavousinejad, S., Ebadifar, A., Tehranchi, A., Zakermashhadi, F. \u0026amp; Dalaie, K. Determination of cervical vertebral maturation using machine learning in lateral cephalograms. \u003cem\u003eJournal of Dental Research, Dental Clinics, Dental Prospects\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 232 (2024).\u003c/li\u003e\n\u003cli\u003eJiang, F.\u003cem\u003e et al.\u003c/em\u003e Deep learning based quantitative cervical vertebral maturation analysis. \u003cem\u003eHead \u0026amp; Face Medicine\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 20 (2025).\u003c/li\u003e\n\u003cli\u003eAmasya, H., Yildirim, D., Aydogan, T., Kemaloglu, N. \u0026amp; Orhan, K. Cervical vertebral maturation assessment on lateral cephalometric radiographs using artificial intelligence: comparison of machine learning classifier models. \u003cem\u003eDentomaxillofacial Radiology\u003c/em\u003e \u003cstrong\u003e49\u003c/strong\u003e, 20190441 (2020).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Cervical Vertebral Maturation (CVM), Deep Learning, Mask R-CNN, EfficientNet, Cephalometric","lastPublishedDoi":"10.21203/rs.3.rs-9226756/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9226756/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cb\u003eBackground\u003c/b\u003e\u003c/p\u003e \u003cp\u003eAccurate assessment of cervical vertebral maturation (CVM) is critical for timing orthodontic interventions. Manual assessment is often subjective and prone to inter-observer variability. This study aimed to develop and validate a fully automated, two-stage deep learning framework for CVM stage classification using lateral cephalometric radiographs (LCRs).\u003c/p\u003e\u003cp\u003e\u003cb\u003eMethods\u003c/b\u003e\u003c/p\u003e \u003cp\u003eA dataset of 1102 LCRs from individuals aged 7\u0026ndash;18 years was curated, preserving native radiographic noise to accurately simulate real-world imaging conditions. The proposed pipeline consists of two stages: (1) automated detection and instance segmentation of C2, C3, and C4 vertebrae using Mask R-CNN architecture with a ResNet-50-FPN backbone, and (2) skeletal maturity classification (CS1\u0026ndash;CS6) using an EfficientNet-B3 model with a late-fusion strategy. 
The model was trained using transfer learning and evaluated using mean average precision (mAP), accuracy, and confusion matrices.\u003c/p\u003e\u003cp\u003e\u003cb\u003eResults\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThe detection model achieved high localization precision (AP@IoU\u0026thinsp;=\u0026thinsp;0.5\u0026thinsp;=\u0026thinsp;0.96; mAP@50\u0026thinsp;=\u0026thinsp;0.85). The classification stage demonstrated an overall accuracy of approximately 70%, with peak performance in identifying CS6 (73%). While early (CS1-CS2) and late stages showed high reliability, the transitional stage (CS3) exhibited the lowest accuracy (33%), reflecting the inherent morphological overlap during peak pubertal growth.\u003c/p\u003e\u003cp\u003e\u003cb\u003eConclusion\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThe proposed framework provided a standardized, reproducible, and fully automated tool for skeletal maturity assessment. Its robustness against real-world image noise and high anatomical accuracy can make it a suitable auxiliary tool for orthodontic clinical diagnoses.\u003c/p\u003e","manuscriptTitle":"Automated assessment of cervical vertebral maturation stages on lateral cephalometric radiographs using a two-stage deep learning framework","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-11 
10:35:53","doi":"10.21203/rs.3.rs-9226756/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewerAgreed","content":"137251914110142632106717102918677718402","date":"2026-05-11T13:20:42+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"52767903739119707516193263457335492054","date":"2026-05-04T11:15:07+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-30T17:21:25+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-04-08T09:43:13+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-01T00:39:17+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-04-01T00:39:12+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2026-03-25T19:40:58+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"3693c7a7-ee64-4a6e-84f4-7b5032c1ab09","owner":[],"postedDate":"May 11th, 2026","published":true,"recentEditorialEvents":[{"type":"reviewerAgreed","content":"137251914110142632106717102918677718402","date":"2026-05-11T13:20:42+00:00","index":85,"fulltext":""},{"type":"reviewerAgreed","content":"52767903739119707516193263457335492054","date":"2026-05-04T11:15:07+00:00","index":79,"fulltext":""},{"type":"reviewersInvited","content":"10","date":"2026-04-30T17:21:25+00:00","index":"","fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":67513507,"name":"Health sciences/Anatomy"},{"id":67513508,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":67513509,"name":"Health sciences/Health care"},{"id":67513510,"name":"Health sciences/Medical research"}],"tags":[],"updatedAt":"2026-05-11T10:35:53+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-11 10:35:53","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9226756","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9226756","identity":"rs-9226756","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}