Image Quality Evaluation of Panoramic Radiographs Using Vision Transformer: A Pilot Study

Research Article

Cunji Wang, Jingyuan Su, Yue Gao, Xiancong Hou, Xingxing Yu, Hao Liu, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7525359/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Background
This study assesses the performance of a Vision Transformer (ViT)-based algorithm designed for the automatic detection of image quality defects in panoramic radiographs (PRs).

Methods
A total of 1806 anonymized PRs were retrospectively collected and randomly divided into training, validation, and test sets in a 4:1:1 ratio. Six categories of image quality defects were defined: foreign objects, image coverage, symmetry, head position, chin position, and tongue position. A ViT-based model was developed, trained, and fine-tuned.
Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The model's inference speed was also measured.

Results
The model achieved AUC values of 0.96, 0.96, 0.61, 0.62, 0.88, and 0.93 for detecting foreign objects, image coverage errors, symmetry defects, head positioning errors, chin positioning errors, and tongue positioning errors, respectively. The average processing time per image was 0.03 ± 0.002 seconds, indicating efficient real-time performance.

Conclusions
The proposed ViT-based deep learning algorithm demonstrates effective performance in detecting image quality defects in PRs. Its rapid processing speed and capability for real-time feedback highlight its potential as a valuable tool for quality control and operator training in clinical settings.

Keywords: panoramic imaging, vision transformer, quality control, deep learning, artificial intelligence

Background

Panoramic radiographs (PRs) play a critical role in dental clinics, particularly for diagnoses that require comprehensive visualization of the jaws [1]. However, in routine clinical practice, the image quality of PRs is frequently compromised due to factors such as inadequate patient preparation and improper patient positioning. Moreover, image quality defects in PRs (such as the presence of foreign objects, incomplete anatomical coverage, and geometric distortions caused by insufficient patient preparation and incorrect positioning) cannot be corrected through post-processing techniques [2–4]. Consequently, suboptimal image quality may impede accurate clinical decision-making in diagnosis and treatment planning, thereby adversely affecting the overall quality of patient care [5].

The evaluation of image quality is essential for enhancing overall image standards [6].
Providing real-time feedback on image quality facilitates immediate re-capture when required. Additionally, retrospective assessment of image quality over time constitutes a critical component of image quality control protocols. This process enables the identification and analysis of underlying causes of encountered issues, thereby supporting the formulation of targeted improvement strategies. To date, quality evaluation for PRs remains a manual process, characterized by subjectivity, labor intensity, and time consumption. The implementation of an automated image quality evaluation system has the potential to mitigate or eliminate these limitations [7].

Recent advancements in artificial intelligence (AI) have enabled the development of deep learning networks that exhibit considerable potential across a range of medical imaging applications, including image recognition, classification, segmentation, diagnosis, and treatment decision-making [8, 9]. The Vision Transformer (ViT) represents an innovative deep learning architecture that adapts the Transformer model, originally designed for natural language processing, to computer vision tasks. In contrast to convolutional neural networks (CNNs), ViT processes images by partitioning them into fixed-size patches, linearly embedding these patches, and employing self-attention mechanisms to capture global contextual relationships throughout the entire image. This architectural approach allows ViT to model long-range dependencies and complex spatial patterns, thereby addressing the locality limitations inherent in CNNs [10, 11]. Within the field of medical imaging, ViT has demonstrated significant promise in applications such as disease classification, anatomical segmentation, and prognosis prediction [12, 13].
Specifically, in dentistry, ViT has been applied to various tasks, including the identification of caries, hypomineralization, periodontal bone loss, osteolytic lesions, tooth segmentation, prediction of palatal midline closure, and diagnosis of oral potentially malignant disorders [14–19]. Our review of the current literature identified only one recently published study, which used the YOLOv8 algorithm, a CNN-based model, to assess the image quality of PRs [20]. At present, the performance of various deep learning models in detecting image quality defects in PRs has not been thoroughly investigated. Therefore, the objective of this study was to implement an algorithm based on the ViT architecture to detect image quality defects in PRs and to evaluate its performance.

Methods

This retrospective study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki by the World Medical Association. Approval for the study and the use of patient data was granted by the Institutional Medical Ethics Committee (Approval No. 2024KQYX016). Due to the retrospective design of the study, the requirement for informed consent from patients was waived.

Data Collection

A total of 1,806 PRs were retrospectively retrieved from the Picture Archiving and Communication System (PACS) of Qingdao Stomatological Hospital Affiliated to Qingdao University in March 2024. All images were acquired using the same device (ORTHOPHOS XG5, SIRONA, Germany), with exposure parameters set between 73 and 80 kV and between 13 and 15 mA and an exposure time of 14.1 seconds, adjusted according to patient size. The inclusion criterion was the presence of permanent dentition in the images, while images exhibiting severe motion artifacts were excluded. Patient ages ranged from 13 to 88 years, with a median age of 34 years. The collected images were anonymized and exported in JPEG format.
Data Annotation

All images were annotated by a single oral radiologist (Su JY) with 10 years of experience. Prior to this, the annotator, in collaboration with another experienced oral radiographer (GY), who has 12 years of experience, reviewed approximately 200 images to standardize the labeling criteria. Annotations were classified into six distinct categories, as detailed in Table 1: foreign objects, image coverage, symmetry, head position, chin position, and tongue position. Each category was assigned a value of 0 (no defect) or 1 (defect present); for head position and chin position, the values 1 and 2 distinguish two opposite positioning errors. Representative images are presented in Fig. 1.

Table 1 Description of the categories

Category Index | Category Name | Value | Detailed Description
C1 | Foreign objects | 0 | No foreign objects
C1 | Foreign objects | 1 | Foreign objects, such as lead collars, earrings, necklaces, and similar items, overlapped with the anatomical structures
C2 | Image coverage | 0 | Appropriate
C2 | Image coverage | 1 | Bilateral intact maxilla or mandible were not fully displayed
C3 | Symmetry | 0 | Appropriate
C3 | Symmetry | 1 | The widths of the teeth and/or the mandibular rami on both sides were asymmetrical
C4 | Head position | 0 | Appropriate
C4 | Head position | 1 | The head was positioned excessively anteriorly in the machine, resulting in a mesiodistal narrowing of the anterior teeth
C4 | Head position | 2 | The head was positioned excessively posteriorly in the machine, resulting in a mesiodistal widening of the anterior teeth
C5 | Chin position | 0 | Appropriate
C5 | Chin position | 1 | The chin was tipped too high; the occlusal plane appears flat or inverted
C5 | Chin position | 2 | The chin was tipped too low; the occlusal plane appears excessively curved, even U-shaped
C6 | Tongue position | 0 | Appropriate
C6 | Tongue position | 1 | Low-density air cavity overlapped on the apical region of the maxillary teeth

Figure 1 Representative images illustrating the quality issues examined in this study. 1A: PRs of good quality; 1B: incomplete image coverage and tongue mispositioning; 1C: chin mispositioning with an inverted occlusal plane and tongue mispositioning; 1D: chin mispositioning with a U-shaped occlusal plane and asymmetry; 1E: foreign object (earrings) and head mispositioning with widened anterior teeth; 1F: foreign object (lead collar) and head mispositioning with narrowed anterior teeth.

Following annotation, the number of images assigned each value and the corresponding proportions across all 1,806 PRs are presented in Table 2.

Table 2 Number and proportion of the categories

Category Index | Category Name | Value | N | Proportion
C1 | Foreign objects | 0 | 1646 | 91.1%
C1 | Foreign objects | 1 | 160 | 8.9%
C2 | Image coverage | 0 | 634 | 35.1%
C2 | Image coverage | 1 | 1172 | 64.9%
C3 | Symmetry | 0 | 1027 | 56.9%
C3 | Symmetry | 1 | 779 | 43.1%
C4 | Head position | 0 | 1553 | 86.0%
C4 | Head position | 1 | 53 | 2.9%
C4 | Head position | 2 | 200 | 11.1%
C5 | Chin position | 0 | 1488 | 82.4%
C5 | Chin position | 1 | 227 | 12.6%
C5 | Chin position | 2 | 91 | 5.0%
C6 | Tongue position | 0 | 316 | 17.5%
C6 | Tongue position | 1 | 1490 | 82.5%

Data Preprocessing

Data normalization was conducted by standardizing the pixel values of each image using the widely accepted mean ([0.485, 0.456, 0.406]) and standard deviation ([0.229, 0.224, 0.225]) statistics derived from the ImageNet dataset, thereby transforming the data to approximate a distribution with zero mean and unit variance. Prior to training, the complete dataset comprising 1,806 images was partitioned into training, validation, and testing subsets in a 4:1:1 ratio, yielding 1,204 images for training, 301 for validation, and 301 for testing.
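The split and normalization described here can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_indices` and `normalize_pixel`, the fixed seed, and the use of a simple shuffled cut are assumptions made for illustration, and pixel values are assumed to be scaled to [0, 1] before standardization.

```python
import random

# ImageNet channel statistics quoted in the text (R, G, B).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def split_indices(n_images, ratios=(4, 1, 1), seed=0):
    """Shuffle indices 0..n_images-1 and cut them according to `ratios`.

    Hypothetical helper: the paper only states a random 4:1:1 split.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    total = sum(ratios)
    n_train = n_images * ratios[0] // total
    n_val = n_images * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def normalize_pixel(rgb):
    """Standardize one RGB pixel whose channels are already in [0, 1]."""
    return tuple((v - m) / s
                 for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


train, val, test = split_indices(1806)
print(len(train), len(val), len(test))   # 1204 301 301
print(normalize_pixel(IMAGENET_MEAN))    # (0.0, 0.0, 0.0)
```

A pixel equal to the ImageNet mean maps to zero in every channel, which is the sense in which the standardized data are centered.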
To enhance the model's generalization capabilities and reduce the risk of overfitting, dynamic (online) data augmentation was employed during the training process. Instead of pre-generating and storing augmented images, transformations were applied randomly and independently each time an image was loaded. These transformations included random resized cropping (yielding a randomly scaled and cropped 224×224 region), random rotation within ±15°, horizontal flipping with a 50% probability, and random color jittering with variations in brightness, contrast, and saturation up to ±0.2, and hue up to ±0.1. Such augmentations increased the diversity of the training data encountered by the model in each epoch, thereby effectively enriching the input distribution without increasing the dataset's storage requirements.

For the validation and test datasets, no data augmentation techniques were employed to maintain consistent and unbiased evaluation. Instead, a standardized preprocessing pipeline was implemented: each image was resized to 224×224 pixels, center-cropped to preserve spatial consistency, converted into a tensor, and normalized using the same ImageNet statistics applied during training. The validation dataset was employed for hyperparameter tuning, encompassing modifications to the number of layers, neurons per layer, learning rate, and regularization methods. The test dataset was kept entirely unseen throughout model development and was reserved exclusively for the final, independent assessment of model performance.

ViT Architecture

The ViT incorporates the Transformer architecture into image classification by representing images as sequences of fixed-size patches. Rather than employing convolutional operations, ViT divides each image into non-overlapping patches, embeds these patches into a fixed-dimensional space, and inputs them into a standard Transformer encoder augmented with positional encodings.
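The patch partitioning step can be sketched in a few lines. This is an illustrative example only, assuming the 224×224 input size stated above and the 16×16 patch size denoted by the "/16" in the ViT-B/16 model used in this study; the helper `patchify` is hypothetical and operates on a single-channel image stored as a list of rows, whereas a real implementation would work on tensors and follow the partition with a learned linear embedding.

```python
def patchify(image, patch_size):
    """Split an H x W image (list of rows) into non-overlapping patches.

    Returns the flattened patches in row-major order, i.e. the token
    sequence that would be passed on to the embedding layer.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


# Dummy single-channel 224 x 224 image.
image = [[(r * 224 + c) % 256 for c in range(224)] for r in range(224)]
patches = patchify(image, 16)
print(len(patches), len(patches[0]))  # 196 256
```

With three color channels each patch holds 16 × 16 × 3 = 768 values, and the 14 × 14 grid gives a sequence of 196 patch tokens per image.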
In our implementation, we employ the ViT-B/16 model, comprising 12 Transformer encoder layers. Each layer includes 12 self-attention heads and utilizes a hidden embedding dimension of 768. The fundamental element of the Transformer architecture is the Multi-Head Self-Attention (MHSA) mechanism, in which each head computes scaled dot-product attention as follows:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$$

where \(Q\), \(K\), and \(V\) are the query, key, and value matrices, respectively, and \(d_{k}\) is the dimension of the key vectors, so that \(\sqrt{d_{k}}\) serves as the scaling factor. This self-attention mechanism enables ViT to capture global dependencies between patches.

Since the models were trained solely on clinical images, an inherent data imbalance arose, as not all features, error types, and their combinations were equally represented. To address this challenge, we implemented a combined strategy that integrates data resampling with loss weighting. This approach is designed to improve performance on the minority classes while maintaining the accuracy of the majority class. The objective function for this combined strategy is expressed as follows:

$$L_{\text{total}}=-\sum_{i=1}^{N}\alpha_{c_{i}}\cdot w_{i}\cdot y_{i}\log\left(\hat{y}_{i}\right)$$

where \(N\) is the number of samples, \(\alpha_{c_{i}}\) is the weight for class \(c_{i}\), \(y_{i}\) is the ground truth label, \(w_{i}\) is the sampling weight, and \(\hat{y}_{i}\) is the model's predicted probability.

ViT Training Strategies

A batch size of 32 was utilized during training, and the Adam optimizer was applied with a learning rate of 1×10⁻⁴ to reduce overfitting. The model underwent training for 100 epochs, employing the ViT-B/16 architecture initialized with pre-trained weights from ImageNet-21k to facilitate faster convergence and improve generalization performance.
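As a rough illustration of the two formulas given in the ViT Architecture section, the sketch below evaluates scaled dot-product attention and the weighted cross-entropy loss on tiny hand-made inputs. It is a pure-Python sketch under stated assumptions, not the training code: the function names are hypothetical, targets are assumed to be one-hot (so the sum over classes reduces to the log-probability of the true class), and the inverse-frequency class weights derived from the Table 2 counts for foreign objects are one common choice that the paper does not confirm.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for small list-of-lists matrices."""
    d_k = len(K[0])
    output = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output


def weighted_cross_entropy(probs, labels, class_weights, sample_weights):
    """-sum_i alpha_{c_i} * w_i * log(p_i[c_i]), assuming one-hot targets."""
    return -sum(class_weights[c] * w * math.log(p[c])
                for p, c, w in zip(probs, labels, sample_weights))


# Identical keys give uniform attention, so the output is the mean of V.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]],
                [[1.0, 2.0], [3.0, 4.0]]))

# Assumed inverse-frequency weights from the C1 counts in Table 2.
counts = [1646, 160]
alpha = [sum(counts) / (len(counts) * n) for n in counts]
loss = weighted_cross_entropy(
    probs=[[0.9, 0.1], [0.4, 0.6]],
    labels=[0, 1],
    class_weights=alpha,
    sample_weights=[1.0, 1.0],
)
print(alpha, loss)
```

Under this weighting a misclassified minority-class image contributes roughly ten times more to the loss than a majority-class one, which is the intended effect of the class weight term.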
Each image was annotated with six independent categories, each corresponding to a distinct clinically relevant classification task. Consequently, six separate classification models were developed, all employing the same architecture and training pipeline. To mitigate class imbalance inherent in the clinical dataset, a combined approach utilizing weighted cross-entropy loss and a Weighted Random Sampler was implemented. The training, validation, and testing procedures were performed on the following hardware configuration: Nvidia GeForce RTX 3090 (24 GB), AMD Ryzen 7 5800X, and 64 GB of RAM.

Statistical Analysis

To assess intra- and inter-examiner reproducibility, 100 randomly selected images were re-annotated independently by Su JY and GY after an interval of one month. Cohen's kappa was used to quantify both intra- and inter-examiner agreement. The kappa values were interpreted according to the following scale: slight (0–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and almost perfect (0.81–1) agreement.

To assess the model's performance in classifying image quality defects, the area under the receiver operating characteristic curve (AUC) was computed. The model's performance was categorized as excellent (AUC ≥ 0.9), high (0.8 ≤ AUC < 0.9), fair (0.7 ≤ AUC < 0.8), or poor (AUC < 0.7). Additionally, accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated.

To assess the prediction speed of the model, a dataset comprising 1,000 PRs was partitioned into five subsets representing 20%, 40%, 60%, 80%, and 100% of the data, corresponding to 200, 400, 600, 800, and 1,000 images, respectively. For each subset, the prediction time was measured to examine how processing speed varies with increasing input size.
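The subset timing protocol can be sketched as below. This is a minimal illustration, not the authors' benchmark script: `benchmark` and `dummy_predict` are hypothetical names, the dummy function merely stands in for model inference, and the three repetitions per subset follow the description in this section.

```python
import statistics
import time


def benchmark(predict, images, fractions=(0.2, 0.4, 0.6, 0.8, 1.0), repeats=3):
    """Return {subset_size: mean wall-clock seconds over `repeats` runs}."""
    results = {}
    for fraction in fractions:
        subset = images[:round(len(images) * fraction)]
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            for image in subset:
                predict(image)
            runs.append(time.perf_counter() - start)
        results[len(subset)] = statistics.mean(runs)
    return results


def dummy_predict(image):
    """Hypothetical stand-in for a single-image model prediction."""
    return sum(image) % 2


# 1,000 placeholder "images"; real inputs would be preprocessed tensors.
images = [[i, i + 1, i + 2] for i in range(1000)]
timings = benchmark(dummy_predict, images)
for size, seconds in timings.items():
    print(size, "images:", seconds / size, "s per image")
```

Dividing each subset's mean time by its size gives a per-image time; if that value stays roughly constant across subsets, total prediction time scales linearly with the number of images.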
Each prediction was repeated three times to compute the average duration and enhance the reliability of the measurements. Furthermore, computation times for each of the six labels were recorded to evaluate the time costs associated with different labels.

All statistical analyses were conducted using a custom-developed script written in the Python programming language (version 3.6; http://www.python.org).

Results

The kappa values for the total categories and each individual category are presented in Table 3. For the total categories, the kappa values for intra-examiner agreement (0.872) and inter-examiner agreement (0.862) both indicated almost perfect agreement. Regarding each category individually, only the inter-examiner kappa value for symmetry demonstrated moderate agreement (0.588), whereas the other labels exhibited at least substantial agreement (ranging from 0.636 to 1.0). These findings support the robustness of the labeling process.

Table 3 Kappa values for intra- and inter-examiner reproducibility

Examiner | Foreign objects | Image coverage | Symmetry | Head position | Chin position | Tongue position | Total
Intra- | 1 | 0.954 | 0.636 | 0.702 | 0.833 | 0.880 | 0.872
Inter- | 0.884 | 0.930 | 0.588 | 0.783 | 0.771 | 0.860 | 0.862

The classification performance of the model is presented in Table 4. The proposed model demonstrated varying performance across different defect categories: according to the AUC scale, performance was excellent for foreign objects (AUC = 0.96), image coverage (AUC = 0.96), and tongue position (AUC = 0.93), high for chin position (AUC = 0.88), and poor for symmetry (AUC = 0.61) and head position (AUC = 0.62). The accuracy values were as follows: 0.95 for foreign objects, 0.94 for image coverage, 0.57 for symmetry, 0.82 for head position, 0.87 for chin position, and 0.86 for tongue position.
Sensitivity values were 0.80 for foreign objects, 0.92 for image coverage, 0.25 for symmetry, 0.09 for head position, 0.77 for chin position, and 0.97 for tongue position. Specificity values were 0.96 for foreign objects, 0.97 for image coverage, 0.78 for symmetry, 0.95 for head position, 0.89 for chin position, and 0.29 for tongue position. PPV values were 0.65 for foreign objects, 0.98 for image coverage, 0.43 for symmetry, 0.24 for head position, 0.60 for chin position, and 0.88 for tongue position. NPV values were 0.98 for foreign objects, 0.87 for image coverage, 0.61 for symmetry, 0.85 for head position, 0.95 for chin position, and 0.67 for tongue position.

Table 4 Classification accuracy of the model

Metric | Foreign objects | Image coverage | Symmetry | Head position | Chin position | Tongue position
Accuracy | 0.95 | 0.94 | 0.57 | 0.82 | 0.87 | 0.86
Sensitivity | 0.80 | 0.92 | 0.25 | 0.09 | 0.77 | 0.97
Specificity | 0.96 | 0.97 | 0.78 | 0.95 | 0.89 | 0.29
PPV | 0.65 | 0.98 | 0.43 | 0.24 | 0.60 | 0.88
NPV | 0.98 | 0.87 | 0.61 | 0.85 | 0.95 | 0.67
AUC | 0.96 | 0.96 | 0.61 | 0.62 | 0.88 | 0.93

In the evaluation of prediction speed, prediction time increased approximately linearly as the data size grew from 20% to 100% of the 1,000 images (Table 5). The increments were similar across the six tasks. The average processing time per image was approximately 0.03 ± 0.002 seconds, underscoring the model's computational efficiency for real-time or high-throughput dental imaging applications.

Table 5 Predictive Speed Performance of the Model (prediction time in seconds)

Category | 20% | 40% | 60% | 80% | 100%
Foreign objects | 9.13 | 12.23 | 19.47 | 23.10 | 30.48
Image coverage | 8.52 | 13.08 | 16.73 | 23.53 | 28.15
Symmetry | 8.79 | 12.00 | 16.86 | 24.13 | 28.69
Head position | 6.27 | 12.29 | 17.22 | 23.94 | 29.61
Chin position | 6.65 | 10.87 | 18.74 | 24.84 | 27.59
Tongue position | 9.55 | 10.79 | 16.89 | 26.74 | 27.89

Discussion

High-quality medical images are critical for accurate diagnosis.
The implementation of an image quality control program aimed at identifying deficiencies and providing targeted training to operators can systematically enhance image quality. This approach not only improves the technical proficiency of operators but also reduces unnecessary radiation exposure to patients [21, 22]. AI has demonstrated potential in increasing the efficiency of image quality assessment by reducing subjective variability among evaluators. Recent studies have reported the application of AI in assessing various medical imaging modalities, including human chest and knee radiographs, corneal slit-lamp images, gastrointestinal endoscopic images, hip ultrasound images, and canine thoracic radiographs [23–29]. Nevertheless, the use of AI for evaluating image quality in PRs remains inadequately investigated [20].

In the present study, the performance in detecting foreign objects and assessing incomplete image coverage was excellent, with both tasks achieving AUC values of 0.96. The most frequently identified foreign object was the lead collar, followed occasionally by earrings. Given the relatively fixed locations of these foreign objects and their distinct appearance compared to normal anatomical structures, the high detection performance aligns with anticipated outcomes. Prior research has investigated the application of AI for detecting multiple diseases on PRs, where AI models demonstrated high accuracy in identifying caries (91.5%), osteoporosis (89.89%), maxillary sinusitis (87.5%), periodontal bone loss (93.09%), and tooth identification and numbering (93.67%) [30]. Compared to disease identification, the detection of foreign objects is inherently less complex for AI systems. Regarding image coverage recognition, the chin region was most frequently affected by incomplete visualization in this study. These findings are consistent with previous reports indicating that AI can accurately detect incomplete image boundaries.
For instance, Nousiainen et al. trained a CNN using 2,589 posteroanterior chest radiographs to evaluate the visibility of the left, right, cranial, and caudal lung edges, achieving AUC values exceeding 0.92 for all four edges [23]. Similarly, Banzato et al. introduced the label "Cut" to denote incomplete visualization of the thorax in canine radiographs, with the CNN attaining AUC values above 0.84 [29]. Furthermore, the recent study by Ameli et al. demonstrated that YOLOv8 achieved classification accuracies of 0.872 and 0.741 for foreign artifacts and image coverage on PRs, respectively [20].

The detection performance for symmetry and head position was suboptimal, with AUC values below 0.65, accompanied by notably low sensitivity (0.25 for symmetry and 0.09 for head position) and PPV (0.43 for symmetry and 0.24 for head position). Two potential factors may account for these findings. First, the evaluation of these two types of image defects is inherently subjective, as evidenced by the relatively low intra-examiner kappa coefficients for symmetry (0.636) and head position (0.702). Second, the assessment of symmetry primarily involves comparing the widths of the posterior teeth on the left and right sides, whereas head position identification focuses on significant variations in the width of the anterior teeth, which also entails comparing anterior and posterior tooth widths. Consequently, accurate subjective evaluation of these image quality defects requires precise identification of individual teeth and comparison of their dimensions. The ViT-based algorithm employed may be limited in effectively processing such detailed information. Performance might be enhanced by integrating multiple deep learning approaches, such as initially segmenting target teeth via image segmentation techniques, followed by quantitative measurement of tooth dimensions.
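A short worked calculation helps explain how head position can show an accuracy of 0.82 alongside a sensitivity of only 0.09. The sketch below is an illustration under explicit assumptions: it treats the task as binary (normal versus any head positioning error), and it assumes that the defect prevalence in the test set equals the overall prevalence in Table 2, (53 + 200) / 1806; neither assumption is stated in the study, and the function name is hypothetical.

```python
def expected_accuracy(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity, specificity, and defect prevalence."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Assumed prevalence of head positioning errors, from the Table 2 counts.
prevalence = (53 + 200) / 1806

# Reported sensitivity and specificity for head position (Table 4).
model_accuracy = expected_accuracy(0.09, 0.95, prevalence)

# Reference point: a classifier that always predicts "normal".
always_normal = expected_accuracy(0.0, 1.0, prevalence)

print(round(model_accuracy, 2), round(always_normal, 2))
```

Under these assumptions the implied accuracy is about 0.83, close to the reported 0.82, while a classifier that always predicts "normal" would reach about 0.86. Accuracy for this category is therefore driven mainly by the majority class, which is why sensitivity, PPV, and AUC are the more informative measures here.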
Notably, the proposed model exhibited promising results in detecting chin position on PRs, achieving an overall accuracy of 0.87 and an AUC of 0.88. In a study by Ameli et al., multiple features (including image symmetry and occlusal plane deformation attributable to chin position) were combined into a single label, with the YOLOv8 model attaining a recognition accuracy of 0.773 [20].

The proposed model attained an accuracy of 0.86 and an AUC of 0.93 in the classification of tongue positions (Table 4). However, the specificity of 0.292 reveals a considerable limitation in accurately identifying normal tongue positions, leading to a relatively high rate of false-positive results. This limitation is further reflected in the PPV of 0.88 and the NPV of 0.67, indicating that although the model is reliable in confirming abnormal cases, it is less effective at recognizing normal ones.

The criteria for determining image quality defects in the present study, while widely accepted, are relatively stringent for clinical applications. Consequently, the proportion of optimal images within our datasets is small compared to that of suboptimal images based on these criteria. To enhance operator training and better align with clinical needs, it would be beneficial to introduce an additional category representing clinically acceptable images in future work. This category would denote cases in which re-exposure is unnecessary, despite the image having room for improvement.

This study has several limitations. First, the data source is limited, and the sample size is relatively small. Previous research indicates that the ViT model performs optimally with large datasets [31, 32]. Second, teeth in the mixed dentition stage present greater complexity in panoramic images, and children are more prone to severe motion artifacts during imaging. Therefore, this preliminary study exclusively included images of patients with permanent dentition to ensure higher image quality and consistency.
Third, the image quality labels used in this study do not account for overexposure and underexposure, because standardized objective evaluation criteria are lacking and image contrast can be adjusted after acquisition in digital panoramic radiography.

In future research, we plan to incorporate a larger number of images exhibiting greater variability in quality from diverse sources to enhance the robustness and generalizability of the model. Furthermore, integrating multiple algorithms, such as image segmentation and landmark recognition, may further improve the model's performance.

Conclusion

This study presents a ViT-based algorithm developed to detect image quality defects in PRs. The algorithm exhibited excellent performance in identifying foreign objects, image coverage, and tongue position, and high performance in detecting chin position. However, it was comparatively less effective in assessing symmetry and head position. The average processing time per image was approximately 0.03 ± 0.002 seconds, underscoring the model's computational efficiency for real-time or high-throughput dental imaging applications. This algorithm holds potential for scaling to real-time quality control, enabling comprehensive statistical analysis of all images acquired within an imaging center. Furthermore, it may function as a training tool by providing immediate and anonymous feedback to operators.

Abbreviations

ViT: Vision Transformer
PRs: Panoramic Radiographs
AUC: Area Under the Receiver Operating Characteristic Curve
PPV: Positive Predictive Value
NPV: Negative Predictive Value
AI: Artificial Intelligence
CNN: Convolutional Neural Networks

Declarations

Author contributions
G.ZP and L.H designed the study, analyzed the data, and critically revised the manuscript; W.CJ conducted the primary model training and contributed to writing the manuscript. S.JY performed image annotation and drafted the manuscript.
G.Y collected the image data and assisted with image annotation. H.XC and Y.XX provided support in model training. All authors reviewed and approved the final manuscript.

Funding
This work was supported by Key R&D Program of Shandong Province (2024TSGC0226); Qingdao Key Health Discipline Development Fund (2025-2027); Qingdao Clinical Research Center for Oral Diseases (22-3-7-lczx-7-nsh); Shandong Provincial Key Medical and Health Discipline of Oral Medicine (Qingdao Stomatological Hospital Affiliated to Qingdao University) (2025-2027).

Data availability
The datasets generated and/or analyzed during the present study are available from the corresponding author upon reasonable request.

Ethics approval and consent to participate
This retrospective study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki by the World Medical Association. Approval for the study and the use of patient data was granted by the Institutional Medical Ethics Committee (Approval No. 2024KQYX016). Due to the retrospective design of the study, the requirement for informed consent from patients was waived.

Consent for publication
Not Applicable.

Competing interests
The authors declare no competing interests.

References

1. Pritchard B, Akbarian Tefaghi F, Makdissi J. Anatomy in panoramic image interpretation. Br Dent J. 2020;228(4):229. doi: 10.1038/s41415-020-1324-1.
2. MacDonald D, Telyakova V. An Overview of Cone-Beam Computed Tomography and Dental Panoramic Radiography in Dentistry in the Community. Tomography. 2024;10(8):1222-1237. doi: 10.3390/tomography10080092.
3. Ekströmer K, Hjalmarsson L. Positioning errors in panoramic images in general dentistry in Sörmland County, Sweden. Swed Dent J. 2014;38(1):31-8.
4. Lingam AS, Koppolu P, Abdulsalam R, Reddy RL, Anwarullah A, Koppolu D. Assessment of common errors and subjective quality of digital panoramic radiographs in dental institution, Riyadh. Ann Afr Med. 2023;22(1):49-54. doi: 10.4103/aam.aam_213_21.
5. Khator AM, Motwani MB, Choudhary AB. A study for determination of various positioning errors in digital panoramic radiography for evaluation of diagnostic image quality. Indian J Dent Res. 2017;28(6):666-670. doi: 10.4103/ijdr.IJDR_781_16.
6. Ohashi K, Nagatani Y, Yoshigoe M, Iwai K, Tsuchiya K, Hino A, Kida Y, Yamazaki A, Ishida T. Applicability Evaluation of Full-Reference Image Quality Assessment Methods for Computed Tomography Images. J Digit Imaging. 2023;36(6):2623-2634. doi: 10.1007/s10278-023-00875-0.
7. Tan Y, Peng Y, Guo L, Liu D, Luo Y. Cost-effectiveness analysis of AI-based image quality control for perinatal ultrasound screening. BMC Med Educ. 2024;24(1):1437. doi: 10.1186/s12909-024-06477-w.
8. Patil S, Bhandi S, Awan KH, Licari F. AI-assisted dental care. Br Dent J. 2023;234(8):555-556. doi: 10.1038/s41415-023-5813-x.
9. Schwendicke F, Samek W, Krois J. Artificial Intelligence in Dentistry: Chances and Challenges. J Dent Res. 2020;99(7):769-774. doi: 10.1177/0022034520915714.
10. Han K, Wang Y, Chen H, Chen X, Guo J, Liu Z, Tang Y, Xiao A, Xu C, Xu Y, Yang Z, Zhang Y, Tao D. A Survey on Vision Transformer. IEEE Trans Pattern Anal Mach Intell. 2023;45(1):87-110. doi: 10.1109/TPAMI.2022.3152247.
11. Zhang Y, Wang J, Gorriz JM, Wang S. Deep Learning and Vision Transformer for Medical Image Analysis. J Imaging. 2023;9(7):147. doi: 10.3390/jimaging9070147.
12. Khan S, Ali H, Shah Z. Identifying the role of vision transformer for skin cancer-A scoping review. Front Artif Intell. 2023;6:1202990. doi: 10.3389/frai.2023.1202990.
13. Goceri E. Polyp Segmentation Using a Hybrid Vision Transformer and a Hybrid Loss Function. J Imaging Inform Med. 2024;37(2):851-863. doi: 10.1007/s10278-023-00954-2.
14. Felsch M, Meyer O, Schlickenrieder A, Engels P, Schönewolf J, Zöllner F, Heinrich-Weltzien R, Hesenius M, Hickel R, Gruhn V, Kühnisch J. Detection and localization of caries and hypomineralization on dental photographs with a vision transformer model. NPJ Digit Med. 2023;6(1):198. doi: 10.1038/s41746-023-00944-2.
15. Dujic H, Meyer O, Hoss P, Wölfle UC, Wülk A, Meusburger T, Meier L, Gruhn V, Hesenius M, Hickel R, Kühnisch J. Automatized Detection of Periodontal Bone Loss on Periapical Radiographs by Vision Transformer Networks. Diagnostics (Basel). 2023;13(23):3562. doi: 10.3390/diagnostics13233562.
16. van Nistelrooij N, Ghanad I, Bigdeli AK, Thiem DGE, von See C, Rendenbach C, Maistreli I, Xi T, Bergé S, Heiland M, Vinayahalingam S, Gaudin R. Automated detection and classification of osteolytic lesions in panoramic radiographs using CNNs and vision transformers. BMC Oral Health. 2025;25(1):950. doi: 10.1186/s12903-025-06209-6.
17. Schneider L, Krasowski A, Pitchika V, Bombeck L, Schwendicke F, Büttner M. Assessment of CNNs, transformers, and hybrid architectures in dental image segmentation. J Dent. 2025;156:105668. doi: 10.1016/j.jdent.2025.105668.
18. Tang H, Liu S, Tan W, Fu L, Yan M, Feng H. Prediction of midpalatal suture maturation stage based on transfer learning and enhanced vision transformer. BMC Med Inform Decis Mak. 2024;24(1):232. doi: 10.1186/s12911-024-02598-w.
19. Vinayahalingam S, van Nistelrooij N, Rothweiler R, Tel A, Verhoeven T, Tröltzsch D, Kesting M, Bergé S, Xi T, Heiland M, Flügge T. Advancements in diagnosing oral potentially malignant disorders: leveraging Vision transformers for multi-class detection. Clin Oral Investig. 2024;28(7):364. doi: 10.1007/s00784-024-05762-8.
20. Ameli N, Miri Moghaddam M, Lai H, Pacheco-Pereira C. Automated quality evaluation of dental panoramic radiographs using deep learning. Imaging Sci Dent. 2025;55(2):175-188. doi: 10.5624/isd.20240232.
21. Jenkins NW, Parrish JM, Sheha ED, Singh K. Intraoperative risks of radiation exposure for the surgeon and patient. Ann Transl Med. 2021;9(1):84. doi: 10.21037/atm-20-1052.
22. Najjar R. Radiology's Ionising Radiation Paradox: Weighing the Indispensable Against the Detrimental in Medical Imaging. Cureus. 2023;15(7):e41623. doi: 10.7759/cureus.41623.
23. Nousiainen K, Mäkelä T, Piilonen A, Peltonen JI. Automating chest radiograph imaging quality control. Phys Med. 2021;83:138-145. doi: 10.1016/j.ejmp.2021.03.014.
24. Meng Y, Ruan J, Yang B, Gao Y, Jin J, Dong F, Ji H, He L, Cheng G, Gong X. Automated quality assessment of chest radiographs based on deep learning and linear regression cascade algorithms. Eur Radiol. 2022;32(11):7680-7690. doi: 10.1007/s00330-022-08771-x.
25. Sun H, Wang W, He F, Wang D, Liu X, Xu S, Zhao B, Li Q, Wang X, Jiang Q, Zhang R, Liu S, Xiao Y. An AI-Based Image Quality Control Framework for Knee Radiographs. J Digit Imaging. 2023;36(5):2278-2289. doi: 10.1007/s10278-023-00853-6.
26. Li Z, Jiang J, Chen K, Zheng Q, Liu X, Weng H, Wu S, Chen W. Development of a deep learning-based image quality control system to detect and filter out ineligible slit-lamp images: A multicenter study. Comput Methods Programs Biomed. 2021;203:106048. doi: 10.1016/j.cmpb.2021.106048.
27. Yuan P, Bai R, Yan Y, Li S, Wang J, Cao C, Wu Q. Subjective and objective quality assessment of gastrointestinal endoscopy images: From manual operation to artificial intelligence. Front Neurosci. 2023;16:1118087. doi: 10.3389/fnins.2022.1118087.
28. Hareendranathan AR, Chahal BS, Zonoobi D, Sukhdeep D, Jaremko JL. Artificial Intelligence to Automatically Assess Scan Quality in Hip Ultrasound. Indian J Orthop. 2021;55(6):1535-1542. doi: 10.1007/s43465-021-00455-w.
29. Banzato T, Wodzinski M, Burti S, Vettore E, Muller H, Zotti A. An AI-based algorithm for the automatic evaluation of image quality in canine thoracic radiographs. Sci Rep. 2023;13(1):17024. doi: 10.1038/s41598-023-44089-4.
30. Turosz N, Chęcińska K, Chęciński M, Brzozowska A, Nowak Z, Sikora M. Applications of artificial intelligence in the analysis of dental panoramic radiographs: an overview of systematic reviews. Dentomaxillofac Radiol. 2023;52(7):20230284. doi: 10.1259/dmfr.20230284.
31. Usman M, Zia T, Tariq A.
Analyzing Transfer Learning of Vision Transformers for Interpreting Chest Radiography. J Digit Imaging. 2022;35(6):1445-1462. doi: 10.1007/s10278-022-00666-z. Ma D, Taher MRH, Pang J, Islam NU, Haghighi F, Gotway MB, Liang J. Benchmarking and Boosting Transformers for Medical Image Classification. Domain Adapt Represent Transf (2022). 2022;13542:12-22. doi: 10.1007/978-3-031-16852-9_2. Additional Declarations No competing interests reported. Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. 
{"props":{"pageProps":{"initialData":{"identity":"rs-7525359","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":534208007,"identity":"51c2d878-40e6-4cfc-9c38-1d977633894f","order_by":0,"name":"Cunji Wang","email":"","orcid":"","institution":"Ocean University of China","correspondingAuthor":false,"prefix":"","firstName":"Cunji","middleName":"","lastName":"Wang","suffix":""},{"id":534208008,"identity":"419aba09-74f7-44ee-9337-c66c90528cef","order_by":1,"name":"Jingyuan Su","email":"","orcid":"","institution":"Qingdao Stomatological Hospital Affiliated to Qingdao University","correspondingAuthor":false,"prefix":"","firstName":"Jingyuan","middleName":"","lastName":"Su","suffix":""},{"id":534208009,"identity":"c57792cb-c555-477d-9179-908ef53de09a","order_by":2,"name":"Yue Gao","email":"","orcid":"","institution":"Qingdao Stomatological Hospital Affiliated to Qingdao University","correspondingAuthor":false,"prefix":"","firstName":"Yue","middleName":"","lastName":"Gao","suffix":""},{"id":534208010,"identity":"771d1bec-03df-4e40-8c27-da9a198b745c","order_by":3,"name":"Xiancong Hou","email":"","orcid":"","institution":"Ocean University of China","correspondingAuthor":false,"prefix":"","firstName":"Xiancong","middleName":"","lastName":"Hou","suffix":""},{"id":534208011,"identity":"f3865ed4-8f41-4c79-871b-41bc6d94bdb7","order_by":4,"name":"Xingxing Yu","email":"","orcid":"","institution":"Ocean University of 
China","correspondingAuthor":false,"prefix":"","firstName":"Xingxing","middleName":"","lastName":"Yu","suffix":""},{"id":534208012,"identity":"5eec858f-1590-4c9f-b68c-7beb8ffeef81","order_by":5,"name":"Hao Liu","email":"","orcid":"","institution":"Ocean University of China","correspondingAuthor":false,"prefix":"","firstName":"Hao","middleName":"","lastName":"Liu","suffix":""},{"id":534208013,"identity":"1c8b04f3-d911-4571-9a72-369d4e8a3bff","order_by":6,"name":"Zhipu Ge","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA1ElEQVRIiWNgGAWjYDCCA3AW84EDHypI08KWeHDGGdK08Bgf5m0hQgff7cMHP92ouGc3f3bPhwO8DQzy/GIH8GuRPJeWLJ1zpji5cc7ZDQckdzAYzpydgF+LwRkeA+nctoRkZoncDQcMzzAkGNwmqIX/8+/cfwnJbBI5Dw4kthGlhYdNOrchwY5HIofhwEFitEieYTOzzjmWkCAhc8zgYMMZCcJ+4TvD/Ph2Tk2Cvfzs5sef/1TYyPNLE9ACA4kNEmBagjjlIGBPiuJRMApGwSgYYQAAOqBKIMxN3JIAAAAASUVORK5CYII=","orcid":"","institution":"Qingdao Stomatological Hospital Affiliated to Qingdao University","correspondingAuthor":true,"prefix":"","firstName":"Zhipu","middleName":"","lastName":"Ge","suffix":""}],"badges":[],"createdAt":"2025-09-03 09:38:32","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7525359/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7525359/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":94622383,"identity":"cf56cdd1-5c5d-4543-941c-54a2447d66ce","added_by":"auto","created_at":"2025-10-29 04:18:18","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":298356,"visible":true,"origin":"","legend":"","description":"","filename":"Manuscript.docx","url":"https://assets-eu.researchsquare.com/files/rs-7525359/v1/5ef6df8d209ec830f7796258.docx"},{"id":94622157,"identity":"80e31539-fa20-4bc1-859f-a191ae1fff69","added_by":"auto","created_at":"2025-10-29 
04:18:04","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":8107,"visible":true,"origin":"","legend":"","description":"","filename":"fd900ca499ba40c88eba0040c5110bd5.json","url":"https://assets-eu.researchsquare.com/files/rs-7525359/v1/e3c6edce38c05f096e0463b2.json"},{"id":94622171,"identity":"375da011-2811-442e-8426-c35ecdbe52d6","added_by":"auto","created_at":"2025-10-29 04:18:06","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":111941,"visible":true,"origin":"","legend":"","description":"","filename":"fd900ca499ba40c88eba0040c5110bd51enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7525359/v1/0b408ef362ce12ecf33a2934.xml"},{"id":94622193,"identity":"9974dc08-8366-470d-8a4d-382460b1c600","added_by":"auto","created_at":"2025-10-29 04:18:09","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":142075,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7525359/v1/2ad30bb16220652301ca756e.png"},{"id":94622190,"identity":"d85559ce-5b7a-47eb-aacf-852234206f68","added_by":"auto","created_at":"2025-10-29 04:18:09","extension":"xml","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":111152,"visible":true,"origin":"","legend":"","description":"","filename":"fd900ca499ba40c88eba0040c5110bd51structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7525359/v1/7bbec640ba0906662889bca6.xml"},{"id":94622116,"identity":"2a95744f-8289-4c41-be2f-64b7fd43e8be","added_by":"auto","created_at":"2025-10-29 
04:17:53","extension":"html","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":119975,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7525359/v1/f93841a0e8ff2801e180667c.html"},{"id":94622227,"identity":"e4ab3cbe-3428-4ad8-948a-768128307a0e","added_by":"auto","created_at":"2025-10-29 04:18:12","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":241283,"visible":true,"origin":"","legend":"\u003cp\u003eRepresentative images illustrating the quality issues examined in this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1A:\u003c/strong\u003e PRs of good quality; 1\u003cstrong\u003eB:\u003c/strong\u003e incomplete image coverage and tongue mispositioning; 1\u003cstrong\u003eC: \u003c/strong\u003echin mispositioning with an inverted occlusal plane and tongue mispositioning; 1\u003cstrong\u003eD:\u003c/strong\u003e chin mispositioning with a U-shaped occlusal plane and asymmetry; 1\u003cstrong\u003eE: \u003c/strong\u003eforeign object (earrings) and head mispositioning with widened anterior teeth; 1\u003cstrong\u003eF:\u003c/strong\u003eforeign object (lead collar) and head mispositioning with narrowed anterior teeth.\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7525359/v1/5e1088a75e87b2a2bd876eb4.jpeg"},{"id":96913044,"identity":"74b87eb4-b426-4dca-9d2c-b8ef5f48b78c","added_by":"auto","created_at":"2025-11-27 13:51:14","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1020100,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7525359/v1/c7af1545-913c-4fc9-9eb5-5779dfb12eca.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Image Quality Evaluation of 
Panoramic Radiographs Using Vision Transformer: A Pilot Study","fulltext":[{"header":"Background","content":"\u003cp\u003ePanoramic radiographs (PRs) play a critical role in dental clinics, particularly for diagnoses that require comprehensive visualization of the jaws [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. However, in routine clinical practice, the image quality of PRs is frequently compromised due to factors such as inadequate patient preparation and improper patient positioning. Moreover, image quality defects in PRs\u0026mdash;such as the presence of foreign objects, incomplete anatomical coverage, and geometric distortions caused by insufficient patient preparation and incorrect positioning\u0026mdash;cannot be corrected through post-processing techniques [\u003cspan additionalcitationids=\"CR3\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Consequently, suboptimal image quality may impede accurate clinical decision-making in diagnosis and treatment planning, thereby adversely affecting the overall quality of patient care [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eThe evaluation of image quality is essential for enhancing overall image standards [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Providing real-time feedback on image quality facilitates immediate re-capture when required. Additionally, retrospective assessment of image quality over time constitutes a critical component of image quality control protocols. This process enables the identification and analysis of underlying causes of encountered issues, thereby supporting the formulation of targeted improvement strategies. To date, quality evaluation for PRs remains a manual process, characterized by subjectivity, labor intensity, and time consumption. 
The implementation of an automated image quality evaluation system has the potential to mitigate or eliminate these limitations [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eRecent advancements in artificial intelligence (AI) have enabled the development of deep learning networks that exhibit considerable potential across a range of medical imaging applications, including image recognition, classification, segmentation, diagnosis, and treatment decision-making [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. The Vision Transformer (ViT) represents an innovative deep learning architecture that adapts the Transformer model\u0026mdash;originally designed for natural language processing\u0026mdash;to computer vision tasks. In contrast to convolutional neural networks (CNN), ViT processes images by partitioning them into fixed-size patches, linearly embedding these patches, and employing self-attention mechanisms to capture global contextual relationships throughout the entire image. This architectural approach allows ViT to model long-range dependencies and complex spatial patterns, thereby addressing the locality limitations inherent in CNNs [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Within the field of medical imaging, ViT has demonstrated significant promise in applications such as disease classification, anatomical segmentation, and prognosis prediction [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. 
Specifically, in dentistry, ViT has been applied to various tasks, including the detection of caries, hypomineralization, periodontal bone loss, and osteolytic lesions, as well as tooth segmentation, prediction of midpalatal suture maturation, and diagnosis of oral potentially malignant disorders [\u003cspan additionalcitationids=\"CR15 CR16 CR17 CR18\" citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eOur review of the current literature identified only one recently published study that used YOLOv8, a CNN-based algorithm, to assess the image quality of PRs [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. At present, the performance of various deep learning models in detecting image quality defects in PRs has not been thoroughly investigated. Therefore, the objective of this study was to implement an algorithm based on the ViT architecture to detect image quality defects in PRs and to evaluate its performance.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eThis retrospective study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki by the World Medical Association. Approval for the study and the use of patient data was granted by the Institutional Medical Ethics Committee (Approval No. 2024KQYX016). Due to the retrospective design of the study, the requirement for informed consent from patients was waived.\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eData Collection\u003c/h2\u003e\u003cp\u003eA total of 1,806 PRs were retrospectively retrieved from the Picture Archiving and Communication System (PACS) of Qingdao Stomatological Hospital Affiliated to Qingdao University in March 2024. 
All images were acquired using the same device (ORTHOPHOS XG5, SIRONA, Germany), with exposure parameters of 73-80 kV and 13-15 mA and an exposure time of 14.1 seconds, adjusted according to patient size. The inclusion criterion was the presence of permanent dentition in the images, while images exhibiting severe motion artifacts were excluded. Patient ages ranged from 13 to 88 years, with a median age of 34 years. The collected images were anonymized and exported in JPEG format.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eData Annotation\u003c/h3\u003e\n\u003cp\u003eAll images were annotated by a single experienced oral radiologist (Su JY) with 10 years of experience. Before annotation, the annotator and a second experienced oral radiographer (GY), who has 12 years of experience, jointly reviewed approximately 200 images to standardize the labeling criteria.\u003c/p\u003e\u003cp\u003eAnnotations were classified into six distinct categories, as detailed in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e: foreign objects, image coverage, symmetry, chin position, head position, and tongue position. Each category was assigned a value of 0 when no defect was present and 1 or 2 when a defect was present; the value 2 was used only for head position and chin position, which distinguish two directions of mispositioning. 
Representative images are presented in \u003cb\u003eFig.\u0026nbsp;1.\u003c/b\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eDescription of the categories\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCategory Index\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCategory Name\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eValue\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eDetailed Description\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eC1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eforeign objects\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNo foreign objects\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eForeign objects, such as lead 
collars, earrings, necklaces, and similar items, overlapping the anatomical structures\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eC2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eimage coverage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eAppropriate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eThe complete bilateral maxilla or mandible was not fully displayed\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eC3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eSymmetry\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eAppropriate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eThe widths of the teeth and/or the mandibular rami on both sides were asymmetrical\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eC4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eHead position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eAppropriate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eThe head was positioned excessively anteriorly in the machine, resulting in a mesiodistal narrowing of the anterior teeth.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eThe head was positioned excessively posteriorly in the machine, resulting in a mesiodistal widening of the anterior teeth.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eC5\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eChin position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eAppropriate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eThe chin was tipped too high, so that the occlusal plane appears flat or inverted\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eThe chin was tipped too low, so that the occlusal plane appears excessively curved, even U-shaped\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" 
rowspan=\"2\"\u003e\u003cp\u003eC6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eTongue position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eAppropriate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eLow-density air cavity overlapped on the apical region of the maxillary teeth\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003e1A\u003c/b\u003e: PRs of good quality; 1\u003cb\u003eB\u003c/b\u003e: incomplete image coverage and tongue mispositioning; 1\u003cb\u003eC\u003c/b\u003e: chin mispositioning with an inverted occlusal plane and tongue mispositioning; 1\u003cb\u003eD\u003c/b\u003e: chin mispositioning with a U-shaped occlusal plane and asymmetry; 1\u003cb\u003eE\u003c/b\u003e: foreign object (earrings) and head mispositioning with widened anterior teeth; 1\u003cb\u003eF\u003c/b\u003e: foreign object (lead collar) and head mispositioning with narrowed anterior teeth.\u003c/p\u003e\u003cp\u003e\u003cb\u003eFigure\u0026nbsp;1\u003c/b\u003e Representative images illustrating the quality issues examined in this study.\u003c/p\u003e\u003cp\u003eFollowing annotation, the number of categories and their respective proportions across all 1806 PRs are presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv 
class=\"CaptionContent\"\u003e\u003cp\u003eNumber and proportion of the categories\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCategory Index\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCategory Name\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eValue\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eN\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eProportion\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eC1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eforeign objects\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1646\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e91.1%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e\u003cp\u003e160\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e8.9%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eC2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eimage coverage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e634\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e35.1%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1172\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e64.9%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eC3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eSymmetry\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1027\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e56.9%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e779\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e43.1%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eC4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eHead position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1553\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e86.0%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e53\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e2.9%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e200\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e11.1%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eC5\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e\u003cp\u003eChin position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1488\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e82.4%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e227\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e12.6%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e91\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e5.0%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eC6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eTongue position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e316\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e17.5%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1490\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e82.5%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\n\u003ch3\u003eData Preprocessing\u003c/h3\u003e\n\u003cp\u003eData normalization was conducted by standardizing the pixel values of each image using the widely accepted mean ([0.485, 0.456, 0.406]) and standard deviation ([0.229, 0.224, 0.225]) statistics derived from the ImageNet dataset, thereby transforming the data to approximate a distribution with zero mean and unit 
variance.\u003c/p\u003e\u003cp\u003ePrior to training, the complete dataset comprising 1,806 images was partitioned into training, validation, and testing subsets in a 4:1:1 ratio, yielding 1,204 images for training, 301 for validation, and 301 for testing.\u003c/p\u003e\u003cp\u003eTo enhance the model's generalization capabilities and reduce the risk of overfitting, dynamic (online) data augmentation was employed during the training process. Instead of pre-generating and storing augmented images, transformations were applied randomly and independently each time an image was loaded. These transformations included random resized cropping (yielding a randomly scaled and cropped 224\u0026times;224 region), random rotation within \u0026plusmn;\u0026thinsp;15\u0026deg;, horizontal flipping with a 50% probability, and random color jittering with variations in brightness, contrast, and saturation up to \u0026plusmn;\u0026thinsp;0.2, and hue up to \u0026plusmn;\u0026thinsp;0.1. Such augmentations increased the diversity of the training data encountered by the model in each epoch, thereby effectively enriching the input distribution without increasing the dataset\u0026rsquo;s storage requirements.\u003c/p\u003e\u003cp\u003eFor the validation and test datasets, no data augmentation techniques were employed to maintain consistent and unbiased evaluation. Instead, a standardized preprocessing pipeline was implemented: each image was resized to 224\u0026times;224 pixels, center-cropped to preserve spatial consistency, converted into a tensor, and normalized using the same ImageNet statistics applied during training.\u003c/p\u003e\u003cp\u003eThe validation dataset was employed for hyperparameter tuning, encompassing modifications to the number of layers, neurons per layer, learning rate, and regularization methods. 
The test dataset was kept entirely unseen throughout model development and was reserved exclusively for the final, independent assessment of model performance.\u003c/p\u003e\n\u003ch3\u003eViT Architecture\u003c/h3\u003e\n\u003cp\u003eThe ViT applies the Transformer architecture to image classification by representing images as sequences of fixed-size patches. Rather than employing convolutional operations, ViT divides each image into non-overlapping patches, embeds these patches into a fixed-dimensional space, and inputs them into a standard Transformer encoder augmented with positional encodings.\u003c/p\u003e\u003cp\u003eIn our implementation, we employ the ViT-B/16 model, comprising 12 Transformer encoder layers. Each layer includes 12 self-attention heads and utilizes a hidden embedding dimension of 768. The fundamental element of the Transformer architecture is the Multi-Head Self-Attention (MHSA) mechanism, in which each head computes scaled dot-product attention as follows:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\mathrm{Attention}\\left(Q,K,V\\right)=\\mathrm{softmax}\\left(\\frac{Q{K}^{T}}{\\sqrt{{d}_{k}}}\\right)V$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(Q,\\,K,\\,V\\)\u003c/span\u003e\u003c/span\u003e are the query, key, and value matrices, respectively, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({d}_{k}\\)\u003c/span\u003e\u003c/span\u003e is the dimensionality of the key vectors, whose square root serves as the scaling factor. This self-attention mechanism enables ViT to capture global dependencies between patches.\u003c/p\u003e\u003cp\u003eSince the models were trained solely on clinical images, an inherent data imbalance arose, as not all features, error types, and their combinations were equally represented. 
To address this challenge, we implemented a combined strategy that integrates data resampling with loss weighting. This approach is designed to improve performance on the minority classes while maintaining the accuracy of the majority class. The objective function for this combined strategy is expressed as follows:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$${L}_{\\text{total}}=-\\sum_{i=1}^{N}{\\alpha}_{{c}_{i}}\\cdot{w}_{i}\\cdot{y}_{i}\\log\\left({\\widehat{y}}_{i}\\right)$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(N\\)\u003c/span\u003e\u003c/span\u003e is the number of samples, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\alpha}_{{c}_{i}}\\)\u003c/span\u003e\u003c/span\u003e is the weight for class \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({c}_{i}\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({y}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the ground truth label, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({w}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the sampling weight, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\widehat{y}}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the model\u0026rsquo;s predicted probability.\u003c/p\u003e\n\u003ch3\u003eViT Training Strategies\u003c/h3\u003e\n\u003cp\u003eA batch size of 32 was utilized during training, and the Adam optimizer was applied with a learning rate of 1\u0026times;10\u003csup\u003e\u0026minus;4\u003c/sup\u003e to reduce overfitting. 
The model was trained for 100 epochs using the ViT-B/16 architecture initialized with weights pre-trained on ImageNet-21k to facilitate faster convergence and improve generalization performance.\u003c/p\u003e\u003cp\u003eEach image was annotated for six independent categories, each corresponding to a distinct clinically relevant classification task. Consequently, six separate classification models were developed, all employing the same architecture and training pipeline. To mitigate the class imbalance inherent in the clinical dataset, a combined approach utilizing weighted cross-entropy loss and a Weighted Random Sampler was implemented. Training, validation, and testing were performed on the following hardware configuration: Nvidia GeForce RTX 3090 (24 GB), AMD Ryzen 7 5800X, and 64 GB of RAM.\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eStatistical Analysis\u003c/h2\u003e\u003cp\u003eTo assess the intra- and inter-examiner reproducibility of the annotations, 100 randomly selected images were re-annotated independently by Su JY and GY after an interval of one month. Cohen\u0026rsquo;s Kappa coefficient was used to quantify both intra- and inter-examiner agreement. The Kappa values were interpreted according to the following scale: slight (0\u0026ndash;0.20), fair (0.21\u0026ndash;0.40), moderate (0.41\u0026ndash;0.60), substantial (0.61\u0026ndash;0.80), and almost perfect (0.81\u0026ndash;1) agreement.\u003c/p\u003e\u003cp\u003eTo assess the model\u0026rsquo;s performance in classifying image quality defects, the area under the receiver operating characteristic curve (AUC) was computed. 
The model\u0026rsquo;s performance was categorized as excellent (AUC\u0026thinsp;\u0026ge;\u0026thinsp;0.9), high (0.8\u0026thinsp;\u0026le;\u0026thinsp;AUC\u0026thinsp;\u0026lt;\u0026thinsp;0.9), fair (0.7\u0026thinsp;\u0026le;\u0026thinsp;AUC\u0026thinsp;\u0026lt;\u0026thinsp;0.8), or poor (AUC\u0026thinsp;\u0026lt;\u0026thinsp;0.7). Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were also calculated.\u003c/p\u003e\u003cp\u003eTo assess the prediction speed of the model, five subsets of increasing size were drawn from a dataset of 1,000 PRs, representing 20%, 40%, 60%, 80%, and 100% of the data and corresponding to 200, 400, 600, 800, and 1,000 images, respectively. For each subset, the prediction time was measured to examine how processing time varies with increasing input size. Each measurement was repeated three times, and the average duration was taken to enhance the reliability of the measurements. Furthermore, computation times were recorded for each of the six labels to evaluate the time costs associated with different labels.\u003c/p\u003e\u003cp\u003eAll statistical analyses were conducted using a custom-developed script written in the Python programming language (version 3.6; \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.python.org\u003c/span\u003e\u003cspan address=\"http://www.python.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eThe Kappa values for all categories combined and for each individual category are presented in\u003c/span\u003e Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. 
\u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eFor all categories combined, the Kappa values for intra-examiner agreement (0.872) and inter-examiner agreement (0.862) both indicated almost perfect agreement. For the individual categories, only the inter-examiner Kappa value for symmetry indicated moderate agreement (0.588), whereas all other values indicated at least substantial agreement (ranging from 0.636 to 1.0). These findings support the reliability of the labeling process.\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eKappa values for intra- and inter-examiner reproducibility\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"8\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eForeign objects\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003eImage coverage\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eSymmetry\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eHead position\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eChin position\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eTongue position\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e\u003cp\u003eTotal\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eIntra-\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.954\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.636\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.702\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.833\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.880\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e0.872\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eInter-\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.884\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.930\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.588\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.783\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.771\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.860\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e0.862\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThe classification performance of the model is presented in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. The proposed model demonstrated varying performance across the defect categories: it achieved excellent performance for foreign objects (AUC\u0026thinsp;=\u0026thinsp;0.96), image coverage (AUC\u0026thinsp;=\u0026thinsp;0.96), and tongue position (AUC\u0026thinsp;=\u0026thinsp;0.93). Additionally, it showed high performance for chin position (AUC\u0026thinsp;=\u0026thinsp;0.88) but poor performance for symmetry (AUC\u0026thinsp;=\u0026thinsp;0.61) and head position (AUC\u0026thinsp;=\u0026thinsp;0.62).\u003c/p\u003e\u003cp\u003eThe accuracy values were as follows: 0.95 for foreign objects, 0.94 for image coverage, 0.57 for symmetry, 0.82 for head position, 0.87 for chin position, and 0.86 for tongue position. Sensitivity values were 0.80 for foreign objects, 0.92 for image coverage, 0.25 for symmetry, 0.09 for head position, 0.77 for chin position, and 0.97 for tongue position. Specificity values were 0.96 for foreign objects, 0.97 for image coverage, 0.78 for symmetry, 0.95 for head position, 0.89 for chin position, and 0.29 for tongue position. PPV were 0.65 for foreign objects, 0.98 for image coverage, 0.43 for symmetry, 0.24 for head position, 0.60 for chin position, and 0.88 for tongue position. 
NPV were 0.98 for foreign objects, 0.87 for image coverage, 0.61 for symmetry, 0.85 for head position, 0.95 for chin position, and 0.67 for tongue position.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eClassification accuracy of the model\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eForeign objects\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eImage coverage\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eSymmetry\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eHead position\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eChin position\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eTongue 
position\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAccuracy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.95\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.94\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.57\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.82\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.86\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSensitivity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.80\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.25\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.09\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.77\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSpecificity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.96\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.78\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e0.95\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.89\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.29\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePPV\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.65\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.43\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.24\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.60\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.88\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNPV\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.61\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.85\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.95\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.67\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAUC\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.96\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e\u003cp\u003e0.96\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.61\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.62\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.88\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.93\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eIn the evaluation of prediction speed, prediction time increased approximately linearly as the data size grew from 20% to 100% (Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). This pattern was consistent across all six tasks. 
The average processing time per image was approximately 0.03\u0026thinsp;\u0026plusmn;\u0026thinsp;0.002 seconds, suggesting that the model is computationally efficient enough for real-time or high-throughput dental imaging applications.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePredictive Speed Performance of the Model\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCategory\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e20%\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e40%\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003e60%\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003e80%\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003e100%\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eForeign objects\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c2\"\u003e\u003cp\u003e9.13\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e12.23\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e19.47\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e23.10\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e30.48\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eImage coverage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e8.52\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e13.08\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e16.73\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e23.53\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e28.15\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSymmetry\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e8.79\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e12.00\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e16.86\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e24.13\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e28.69\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHead position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e6.27\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e\u003cp\u003e12.29\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e17.22\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e23.94\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e29.61\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eChin position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e6.65\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e10.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e18.74\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e24.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e27.59\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTongue position\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e9.55\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e10.79\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e16.89\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e26.74\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e27.89\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eHigh-quality medical images are critical for accurate diagnosis. 
The implementation of an image quality control program aimed at identifying deficiencies and providing targeted training to operators can systematically enhance image quality. This approach not only improves the technical proficiency of operators but also reduces unnecessary radiation exposure to patients [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. AI has demonstrated potential in increasing the efficiency of image quality assessment by reducing subjective variability among evaluators. Recent studies have reported the application of AI in assessing various medical imaging modalities, including human chest and knee radiographs, corneal slit-lamp images, gastrointestinal endoscopic images, hip ultrasound images, and canine thoracic radiographs [\u003cspan additionalcitationids=\"CR24 CR25 CR26 CR27 CR28\" citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. Nevertheless, the use of AI for evaluating image quality in PRs remains inadequately investigated [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn the present study, the performance in detecting foreign objects and assessing incomplete image coverage was exemplary, with both tasks achieving AUC values of 0.96. The most frequently identified foreign object was the lead collar, followed occasionally by earrings. Given the relatively fixed locations of these foreign objects and their distinct appearance compared to normal anatomical structures, the high detection performance aligns with anticipated outcomes. 
Prior research has investigated the application of AI for detecting multiple diseases on PRs, where AI models demonstrated high accuracy in identifying caries (91.5%), osteoporosis (89.89%), maxillary sinusitis (87.5%), periodontal bone loss (93.09%), and tooth identification and numbering (93.67%) [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. Compared to disease identification, the detection of foreign objects is inherently less complex for AI systems. Regarding image coverage recognition, the chin region was most frequently affected by incomplete visualization in this study. These findings are consistent with previous reports indicating that AI can accurately detect incomplete image boundaries. For instance, Nousiainen et al. trained a CNN using 2,589 posteroanterior chest radiographs to evaluate the visibility of the left, right, cranial, and caudal lung edges, achieving AUC values exceeding 0.92 for all four edges [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. Similarly, Banzato et al. introduced the label \"Cut\" to denote incomplete visualization of the thorax in canine radiographs, with the CNN attaining AUC values above 0.84 [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. Furthermore, Ameli\u0026rsquo;s recent study demonstrated that YOLOv8 achieved classification accuracies of 0.872 and 0.741 for foreign artifacts and image coverage on PRs, respectively [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eThe detection performance for symmetry and head position was suboptimal, with AUC values below 0.65, accompanied by notably low sensitivity (0.25 for symmetry and 0.09 for head position) and PPV (0.43 for symmetry and 0.24 for head position). Two potential factors may account for these findings. 
First, the evaluation of these two types of image defects is inherently subjective, as evidenced by the relatively low intra-examiner Kappa coefficients for symmetry (0.636) and head position (0.702). Second, the assessment of symmetry primarily involves comparing the widths of the posterior teeth on the left and right sides, whereas head position identification focuses on significant variations in the width of the anterior teeth, which also entails comparing anterior and posterior tooth widths. Consequently, accurate subjective evaluation of these image quality defects requires precise identification of individual teeth and comparison of their dimensions. The ViT-based algorithm employed may be limited in effectively processing such detailed information. Performance might be enhanced by integrating multiple deep learning approaches, such as initially segmenting target teeth via image segmentation techniques, followed by quantitative measurement of tooth dimensions. Notably, the proposed model exhibited promising results in detecting chin position on PRs, achieving an overall accuracy of 0.87 and an AUC of 0.88. In a study by Ameli et al., multiple features\u0026mdash;including image symmetry and occlusal plane deformation attributable to chin position\u0026mdash;were combined into a single label, with the YOLOv8 model attaining a recognition accuracy of 0.773 [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eThe proposed model attained an accuracy of 0.964 and an AUC of 0.93 in the classification of tongue positions. However, the specificity of 0.292 reveals a considerable limitation in accurately identifying normal tongue positions, leading to a relatively high rate of false-positive results. 
This limitation is further corroborated by the PPV of 0.88 and the NPV of 0.67, indicating that although the model is reliable in confirming abnormal cases, it is less effective in excluding normal cases.\u003c/p\u003e\u003cp\u003eThe criteria for determining image quality defects in the present study, while widely accepted, are relatively stringent for clinical applications. Consequently, the proportion of optimal images within our datasets is small compared to that of suboptimal images based on these criteria. To enhance operator training and better align with clinical needs, it would be beneficial to introduce an additional category representing clinically acceptable images in future work. This category would denote cases in which re-exposure is unnecessary, despite the image having room for improvement.\u003c/p\u003e\u003cp\u003eThis study has several limitations. First, the data source is limited, and the sample size is relatively small. Previous research indicates that the ViT model performs optimally with large datasets [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. Second, teeth in the mixed dentition stage present greater complexity in panoramic images, and children are more prone to severe motion artifacts during imaging. Therefore, this preliminary study exclusively included images of patients with permanent dentition to ensure higher image quality and consistency. Third, the image quality labels used in this study do not account for overexposure and underexposure due to the lack of standardized objective evaluation criteria and the capacity to adjust image contrast post-acquisition in digital panoramic radiography. In future research, we plan to incorporate a larger number of images exhibiting greater variability in quality from diverse sources to enhance the robustness and generalizability of the model. 
Furthermore, integrating multiple algorithms, such as image segmentation and landmark recognition, may further improve the model\u0026rsquo;s performance.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study presents a ViT-based algorithm developed to detect image quality defects in PRs. The algorithm exhibited excellent accuracy in identifying foreign objects, image coverage, and tongue position, while achieving high accuracy in detecting chin position. However, it was comparatively less effective in assessing symmetry and head position. The average processing time per image was approximately 0.03\u0026thinsp;\u0026plusmn;\u0026thinsp;0.002 seconds, underscoring the model's computational efficiency for real-time or high-throughput dental imaging applications. This algorithm holds potential for scaling to real-time quality control, enabling comprehensive statistical analysis of all images acquired within an imaging center. Furthermore, it may function as a training tool by providing immediate and anonymous feedback to operators.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eViT: Vision Transformer\u003c/p\u003e\n\u003cp\u003ePRs: Panoramic Radiographs\u003c/p\u003e\n\u003cp\u003eAUC: Area Under the Receiver Operating Characteristic Curve\u003c/p\u003e\n\u003cp\u003ePPV: Positive Predictive Value\u003c/p\u003e\n\u003cp\u003eNPV: Negative Predictive Value\u003c/p\u003e\n\u003cp\u003eAI: Artificial Intelligence\u003c/p\u003e\n\u003cp\u003eCNN: Convolutional Neural 
Network\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eG.ZP and L.H designed the study, analyzed the data, and critically revised the manuscript. W.CJ conducted the primary model training and contributed to writing the manuscript. S.JY performed image annotation and drafted the manuscript. G.Y collected the image data and assisted with image annotation. H.XC and Y.XX provided support in model training. All authors reviewed and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the Key R\u0026amp;D Program of Shandong Province (2024TSGC0226); the Qingdao Key Health Discipline Development Fund (2025-2027); the Qingdao Clinical Research Center for Oral Diseases (22-3-7-lczx-7-nsh); and the Shandong Provincial Key Medical and Health Discipline of Oral Medicine (Qingdao Stomatological Hospital Affiliated to Qingdao University) (2025-2027).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets generated and/or analyzed during the present study are available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis retrospective study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki by the World Medical Association. Approval for the study and the use of patient data was granted by the Institutional Medical Ethics Committee (Approval No. 2024KQYX016). 
Due to the retrospective design of the study, the requirement for informed consent from patients was waived.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot Applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003ePritchard B, Akbarian Tefaghi F, Makdissi J. Anatomy in panoramic image interpretation. Br Dent J. 2020;228(4):229. doi: 10.1038/s41415-020-1324-1.\u003c/li\u003e\n\u003cli\u003eMacDonald D, Telyakova V. An Overview of Cone-Beam Computed Tomography and Dental Panoramic Radiography in Dentistry in the Community. Tomography. 2024;10(8):1222-1237. doi: 10.3390/tomography10080092.\u003c/li\u003e\n\u003cli\u003eEkstr\u0026ouml;mer K, Hjalmarsson L. Positioning errors in panoramic images in general dentistry in S\u0026ouml;rmland County, Sweden. Swed Dent J. 2014;38(1):31-8. \u003c/li\u003e\n\u003cli\u003eLingam AS, Koppolu P, Abdulsalam R, Reddy RL, Anwarullah A, Koppolu D. Assessment of common errors and subjective quality of digital panoramic radiographs in dental institution, Riyadh. Ann Afr Med. 2023;22(1):49-54. doi: 10.4103/aam.aam_213_21. \u003c/li\u003e\n\u003cli\u003eKhator AM, Motwani MB, Choudhary AB. A study for determination of various positioning errors in digital panoramic radiography for evaluation of diagnostic image quality. Indian J Dent Res. 2017;28(6):666-670. doi: 10.4103/ijdr.IJDR_781_16. \u003c/li\u003e\n\u003cli\u003eOhashi K, Nagatani Y, Yoshigoe M, Iwai K, Tsuchiya K, Hino A, Kida Y, Yamazaki A, Ishida T. Applicability Evaluation of Full-Reference Image Quality Assessment Methods for Computed Tomography Images. J Digit Imaging. 2023;36(6):2623-2634. doi: 10.1007/s10278-023-00875-0. \u003c/li\u003e\n\u003cli\u003eTan Y, Peng Y, Guo L, Liu D, Luo Y. 
Cost-effectiveness analysis of AI-based image quality control for perinatal ultrasound screening. BMC Med Educ. 2024;24(1):1437. doi: 10.1186/s12909-024-06477-w.\u003c/li\u003e\n\u003cli\u003ePatil S, Bhandi S, Awan KH, Licari F. AI-assisted dental care. Br Dent J. 2023;234(8):555-556. doi: 10.1038/s41415-023-5813-x. \u003c/li\u003e\n\u003cli\u003eSchwendicke F, Samek W, Krois J. Artificial Intelligence in Dentistry: Chances and Challenges. J Dent Res. 2020;99(7):769-774. doi: 10.1177/0022034520915714. \u003c/li\u003e\n\u003cli\u003eHan K, Wang Y, Chen H, Chen X, Guo J, Liu Z, Tang Y, Xiao A, Xu C, Xu Y, Yang Z, Zhang Y, Tao D. A Survey on Vision Transformer. IEEE Trans Pattern Anal Mach Intell. 2023;45(1):87-110. doi: 10.1109/TPAMI.2022.3152247. \u003c/li\u003e\n\u003cli\u003eZhang Y, Wang J, Gorriz JM, Wang S. Deep Learning and Vision Transformer for Medical Image Analysis. J Imaging. 2023;9(7):147. doi: 10.3390/jimaging9070147. \u003c/li\u003e\n\u003cli\u003eKhan S, Ali H, Shah Z. Identifying the role of vision transformer for skin cancer-A scoping review. Front Artif Intell. 2023;6:1202990. doi: 10.3389/frai.2023.1202990.\u003c/li\u003e\n\u003cli\u003eGoceri E. Polyp Segmentation Using a Hybrid Vision Transformer and a Hybrid Loss Function. J Imaging Inform Med. 2024;37(2):851-863. doi: 10.1007/s10278-023-00954-2.\u003c/li\u003e\n\u003cli\u003eFelsch M, Meyer O, Schlickenrieder A, Engels P, Sch\u0026ouml;newolf J, Z\u0026ouml;llner F, Heinrich-Weltzien R, Hesenius M, Hickel R, Gruhn V, K\u0026uuml;hnisch J. Detection and localization of caries and hypomineralization on dental photographs with a vision transformer model. NPJ Digit Med. 2023;6(1):198. doi: 10.1038/s41746-023-00944-2.\u003c/li\u003e\n\u003cli\u003eDujic H, Meyer O, Hoss P, W\u0026ouml;lfle UC, W\u0026uuml;lk A, Meusburger T, Meier L, Gruhn V, Hesenius M, Hickel R, K\u0026uuml;hnisch J. Automatized Detection of Periodontal Bone Loss on Periapical Radiographs by Vision Transformer Networks. 
Diagnostics (Basel). 2023;13(23):3562. doi: 10.3390/diagnostics13233562. \u003c/li\u003e\n\u003cli\u003evan Nistelrooij N, Ghanad I, Bigdeli AK, Thiem DGE, von See C, Rendenbach C, Maistreli I, Xi T, Berg\u0026eacute; S, Heiland M, Vinayahalingam S, Gaudin R. Automated detection and classification of osteolytic lesions in panoramic radiographs using CNNs and vision transformers. BMC Oral Health. 2025;25(1):950. doi: 10.1186/s12903-025-06209-6.\u003c/li\u003e\n\u003cli\u003eSchneider L, Krasowski A, Pitchika V, Bombeck L, Schwendicke F, B\u0026uuml;ttner M. Assessment of CNNs, transformers, and hybrid architectures in dental image segmentation. J Dent. 2025;156:105668. doi: 10.1016/j.jdent.2025.105668. \u003c/li\u003e\n\u003cli\u003eTang H, Liu S, Tan W, Fu L, Yan M, Feng H. Prediction of midpalatal suture maturation stage based on transfer learning and enhanced vision transformer. BMC Med Inform Decis Mak. 2024; 24(1):232. doi: 10.1186/s12911-024-02598-w.\u003c/li\u003e\n\u003cli\u003eVinayahalingam S, van Nistelrooij N, Rothweiler R, Tel A, Verhoeven T, Tr\u0026ouml;ltzsch D, Kesting M, Berg\u0026eacute; S, Xi T, Heiland M, Fl\u0026uuml;gge T. Advancements in diagnosing oral potentially malignant disorders: leveraging Vision transformers for multi-class detection. Clin Oral Investig. 2024;28(7):364. doi: 10.1007/s00784-024-05762-8.\u003c/li\u003e\n\u003cli\u003eAmeli N, Miri Moghaddam M, Lai H, Pacheco-Pereira C. Automated quality evaluation of dental panoramic radiographs using deep learning. Imaging Sci Dent. 2025;55(2):175-188. doi: 10.5624/isd.20240232.\u003c/li\u003e\n\u003cli\u003eJenkins NW, Parrish JM, Sheha ED, Singh K. Intraoperative risks of radiation exposure for the surgeon and patient. Ann Transl Med. 2021;9(1):84. doi: 10.21037/atm-20-1052. \u003c/li\u003e\n\u003cli\u003eNajjar R. Radiology\u0026apos;s Ionising Radiation Paradox: Weighing the Indispensable Against the Detrimental in Medical Imaging. Cureus. 2023;15(7):e41623. 
doi: 10.7759/cureus.41623. \u003c/li\u003e\n\u003cli\u003eNousiainen K, M\u0026auml;kel\u0026auml; T, Piilonen A, Peltonen JI. Automating chest radiograph imaging quality control. Phys Med. 2021;83:138-145. doi: 10.1016/j.ejmp.2021.03.014. \u003c/li\u003e\n\u003cli\u003eMeng Y, Ruan J, Yang B, Gao Y, Jin J, Dong F, Ji H, He L, Cheng G, Gong X. Automated quality assessment of chest radiographs based on deep learning and linear regression cascade algorithms. Eur Radiol. 2022;32(11):7680-7690. doi: 10.1007/s00330-022-08771-x.\u003c/li\u003e\n\u003cli\u003eSun H, Wang W, He F, Wang D, Liu X, Xu S, Zhao B, Li Q, Wang X, Jiang Q, Zhang R, Liu S, Xiao Y. An AI-Based Image Quality Control Framework for Knee Radiographs. J Digit Imaging. 2023;36(5):2278-2289. doi: 10.1007/s10278-023-00853-6. \u003c/li\u003e\n\u003cli\u003eLi Z, Jiang J, Chen K, Zheng Q, Liu X, Weng H, Wu S, Chen W. Development of a deep learning-based image quality control system to detect and filter out ineligible slit-lamp images: A multicenter study. Comput Methods Programs Biomed. 2021;203:106048. doi: 10.1016/j.cmpb.2021.106048. \u003c/li\u003e\n\u003cli\u003eYuan P, Bai R, Yan Y, Li S, Wang J, Cao C, Wu Q. Subjective and objective quality assessment of gastrointestinal endoscopy images: From manual operation to artificial intelligence. Front Neurosci. 2023;16:1118087. doi: 10.3389/fnins.2022.1118087. \u003c/li\u003e\n\u003cli\u003eHareendranathan AR, Chahal BS, Zonoobi D, Sukhdeep D, Jaremko JL. Artificial Intelligence to Automatically Assess Scan Quality in Hip Ultrasound. Indian J Orthop. 2021;55(6):1535-1542. doi: 10.1007/s43465-021-00455-w.\u003c/li\u003e\n\u003cli\u003eBanzato T, Wodzinski M, Burti S, Vettore E, Muller H, Zotti A. An AI-based algorithm for the automatic evaluation of image quality in canine thoracic radiographs. Sci Rep. 2023;13(1):17024. doi: 10.1038/s41598-023-44089-4.\u003c/li\u003e\n\u003cli\u003eTurosz N, Chęcińska K, Chęciński M, Brzozowska A, Nowak Z, Sikora M. 
Applications of artificial intelligence in the analysis of dental panoramic radiographs: an overview of systematic reviews. Dentomaxillofac Radiol. 2023;52(7):20230284. doi: 10.1259/dmfr.20230284.\u003c/li\u003e\n\u003cli\u003eUsman M, Zia T, Tariq A. Analyzing Transfer Learning of Vision Transformers for Interpreting Chest Radiography. J Digit Imaging. 2022;35(6):1445-1462. doi: 10.1007/s10278-022-00666-z. \u003c/li\u003e\n\u003cli\u003eMa D, Taher MRH, Pang J, Islam NU, Haghighi F, Gotway MB, Liang J. Benchmarking and Boosting Transformers for Medical Image Classification. Domain Adapt Represent Transf (2022). 2022;13542:12-22. doi: 10.1007/978-3-031-16852-9_2.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"panoramic imaging, vision transformer, quality control, deep learning, artificial intelligence","lastPublishedDoi":"10.21203/rs.3.rs-7525359/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7525359/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e\u003cp\u003eThis study assesses the performance of a Vision 
Transformer (ViT)-based algorithm designed for the automatic detection of image quality defects in panoramic radiographs (PRs).\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e\u003cp\u003eA total of 1806 anonymized PRs were retrospectively collected and randomly divided into training, validation, and test sets in a 4:1:1 ratio. Six categories of image quality defects were defined: foreign objects, image coverage, symmetry, head position, chin position, and tongue position. A ViT based model was developed, trained, and fine-tuned. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The model\u0026rsquo;s inference speed was also measured.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e\u003cp\u003eThe model achieved AUC values of 0.96, 0.96, 0.61, 0.62, 0.88, and 0.93 for detecting foreign objects, image coverage errors, symmetry defects, head positioning errors, chin positioning errors, and tongue positioning errors, respectively. The average processing time per image was 0.03\u0026thinsp;\u0026plusmn;\u0026thinsp;0.002 seconds, indicating efficient real-time performance.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e\u003cp\u003eThe proposed ViT-based deep learning algorithm demonstrates effective performance in detecting image quality defects in PRs. 
Its rapid processing speed and capability for real-time feedback highlight its potential as a valuable tool for quality control and operator training in clinical settings.\u003c/p\u003e","manuscriptTitle":"Image Quality Evaluation of Panoramic Radiographs Using Vision Transformer: A Pilot Study","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-29 03:59:24","doi":"10.21203/rs.3.rs-7525359/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"f4968ab9-efcf-4d7e-98e7-e935df7d3592","owner":[],"postedDate":"October 29th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-11-25T10:53:54+00:00","versionOfRecord":[],"versionCreatedAt":"2025-10-29 03:59:24","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7525359","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7525359","identity":"rs-7525359","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}


License: CC-BY-4.0