Deep Learning-Based Multileaf Collimator Error Classification and Quantification in Patient-Specific Intensity Modulated Radiation Therapy Quality Assurance
Research Square
Research Article
Chirasak Khamfongkhruea, Sawitri Jitsuk, Kampheang Nimjaroeng, and 3 more
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6231733/v1 This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1 posted.
Abstract Purpose This study presents a deep learning–based patient-specific quality assurance (PSQA) framework for rectal cancer intensity-modulated radiation therapy (IMRT) designed to classify and quantify multileaf collimator (MLC) position errors. 
Materials and Methods Thirty rectal IMRT treatment plans were analyzed, and both systematic and random MLC errors were deliberately introduced by modifying the Digital Imaging and Communications in Medicine (DICOM) radiation therapy (RT) plan files. The framework utilizes convolutional neural networks (CNNs) trained on subtraction images generated from electronic portal imaging device–acquired and portal dose image prediction–predicted dose distributions. One CNN was developed to categorize plans based on the associated errors into three groups: error-free, systematic errors, and random errors. In parallel, regression-based CNN models were created to estimate the magnitude of the detected errors. Results The classification network achieved an overall accuracy of 96.67%, with high sensitivity and specificity across all categories. For systematic error estimation, the regression model produced a mean absolute error of 1.082 and an R-squared of 0.804, indicating good quantification capability. In contrast, the random error model reached a leaf position identification accuracy of 89.00% but had a lower R-squared of 0.294 for magnitude estimation, highlighting an area for future improvement. Conclusion These findings suggest that deep learning models can offer more detailed and quantitative insights into treatment errors compared to traditional gamma analysis, ultimately enhancing PSQA processes and contributing to improved treatment verification and patient safety.
Keywords: patient-specific quality assurance, error detection, MLC positional errors, deep learning, electronic portal imaging device
1. Introduction Advanced radiation therapy techniques, particularly intensity-modulated radiation therapy (IMRT), play a crucial role in rectal cancer treatment by delivering radiation in a highly precise manner, minimizing damage to surrounding organs and significantly reducing acute bowel side effects ( 1 – 4 ). 
This precision is achieved through the dynamic modulation of multileaf collimators (MLCs), sophisticated mechanical components that shape radiation beams to match tumor volumes ( 5 , 6 ). Accurate delivery of these complex treatment plans is critical to ensuring that the planned dose distribution is faithfully reproduced in the patient. Patient-specific quality assurance (PSQA) is essential for maintaining treatment accuracy and safety ( 7 , 8 ). PSQA typically involves measurement-based verification ( 9 ), with the gamma passing rate (GPR) serving as a commonly used evaluation metric. The GPR compares predicted and measured dose distributions based on the dose difference (DD) and distance-to-agreement (DTA) criteria ( 10 – 12 ). Task Group 218 recommends a 95% GPR threshold using 3%/2 mm criteria ( 8 ). However, while GPR effectively identifies large deviations, it has notable limitations in detecting subtle but clinically significant errors, particularly those related to MLC positioning ( 13 – 17 ). Such errors can alter dose delivery in critical regions, potentially compromising treatment outcomes. Improving the detection of MLC-related errors has thus become a priority for enhancing the reliability and effectiveness of PSQA. Recent advancements in AI have shown significant potential to address this challenge ( 18 – 23 ). AI models provide a more detailed, quantitative analysis of treatment errors, complementing or surpassing traditional gamma analysis. For example, Wootton et al. ( 24 ) investigated the application of logistic regression models to detect systematic and random MLC positioning errors by analyzing radiomic features extracted from gamma maps generated using an electronic portal imaging device (EPID). 
Their model classified plans into three categories—error-free, systematic MLC errors (uniform 2 mm shift), and random MLC errors (0–2 mm variability)—achieving classification accuracies of 0.72 and 0.76 for systematic and random errors, respectively, surpassing conventional gamma analysis with an area under the curve (AUC) of 0.74. Building on this, Nyflot et al. ( 20 ) explored multiple machine learning models, including support vector machines (SVMs), artificial neural networks (ANNs), decision trees, and k-nearest neighbors (KNNs). By integrating radiomic features from gamma maps with convolutional neural network (CNN)–derived features, the authors aimed to detect patient-specific errors in IMRT QA measurements. Among the models tested, the SVM demonstrated the highest classification accuracy, achieving a success rate of 0.64 across the three error categories. Further advancing this field, Kimura et al. ( 18 ) employed a deep learning-based approach, specifically a CNN, to predict MLC positional errors using features extracted from dose difference maps and PSQA measurements obtained from 3D detectors. Their deep learning model demonstrated an overall accuracy of 94.40%, effectively classifying treatment plans into error-free, systematic error, and random error categories. These studies highlight the potential of deep learning, particularly CNN-based models ( 25 – 27 ), to advance PSQA beyond traditional gamma analysis. While existing research demonstrates the effectiveness of CNNs in error detection, a critical gap remains: Most studies do not quantify error magnitudes ( 22 , 23 , 28 – 33 ), which are essential for clinical decision-making. Both minor and significant MLC deviations can critically impact dose distribution, especially in high-gradient regions. By estimating error type and magnitude, deep learning models can provide actionable insights, improving the precision and reliability of IMRT QA and enhancing treatment safety and effectiveness. 
Therefore, this study aims to build upon these advancements by developing a deep learning-based tool to classify and quantify MLC errors in PSQA for rectal cancer IMRT. We hypothesize that this approach will provide more accurate and clinically meaningful error detection than traditional methods, thereby improving treatment verification and patient safety. Unlike previous studies, which primarily focus on classification, this study also quantifies error magnitudes, providing actionable clinical insights for real-time IMRT QA. 2. Materials and Methods This study employed a deep learning-based approach to classify and quantify MLC positioning errors in IMRT plans. The methodology consisted of multiple stages, including plan selection, plan generation, data acquisition, data preparation, and model generation and evaluation, as shown in Fig. 1 . 2.1 Plan selection In this study, we selected 30 intensity-modulated radiation therapy (IMRT) plans comprising 357 fields for rectal cancer patients treated at Chulabhorn Hospital between January and November 2022. Each treatment plan was designed to deliver a total dose of 50 Gy in 25 fractions (2 Gy per fraction) using the Ethos Treatment Planning System (TPS) (Varian Medical Systems, Palo Alto, CA, USA) with the Acuros XB algorithm. The IMRT technique, utilizing 12 beams with 6 MV flattening filter-free (6FFF) photon energy, was employed. The advanced dual-layer MLC of the Ethos system played a crucial role in precisely shaping the radiation beams, with the added benefit of reduced interleaf leakage. For plan selection, PSQA was performed on all plans, confirming a gamma passing rate (3%/2 mm) exceeding 95%. This rigorous validation ensured that the clinical plans were accurate and free from significant errors, establishing a reliable baseline for further investigation into the impact of MLC positioning errors on treatment delivery. 
This retrospective study complied with the Declaration of Helsinki and received approval from the Chulabhorn Royal Academy Institutional Review Board (IRB) under approval number EC039/2566. Given the retrospective nature of the study and the use of de-identified patient treatment plan data, the IRB waived the requirement for informed consent, as no direct patient interaction or intervention was involved. Clinical trial number: not applicable. 2.2 Investigated scenarios with and without errors To investigate error detection, three conditions were evaluated: (a) error-free, (b) systematic MLC position errors, and (c) random MLC position errors, as illustrated in Fig. 2 . The original rectal IMRT treatment plans, which exhibited a gamma passing rate of more than 95% (3%/2 mm criteria), served as the error-free group. Systematic and random MLC position errors were introduced by directly altering the treatment plan’s DICOM RT Plan files (specifically targeting MLC positions) using MATLAB 2023a (MathWorks, Inc., Natick, MA, USA). Uniform shifts were applied to all MLC leaves across every control point to induce systematic MLC position errors. Six error magnitudes were introduced: ±1.0 mm, ± 2.0 mm, and ± 5.0 mm. MATLAB functions were developed to randomize MLC leaf positions (distal and proximal banks) and error magnitudes for random errors. Error magnitudes were categorized as relevant (exceeding 2 mm, up to 5 mm) or nonrelevant (within 2 mm). Each plan incorporated 10 random position errors—five per bank—with two nonrelevant and three relevant errors per bank. A total of 2,856 raw data fields were generated from the initial 357 fields by introducing systematic errors (six magnitudes), random errors, and error-free conditions, enabling a comprehensive investigation into error detection and its impact on treatment accuracy. 2.3 Data acquisition PSQA was conducted using an EPID-based dosimetry method for all plans. 
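The error scenarios of Section 2.2 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' MATLAB code: leaf positions are modeled as plain lists of millimetre values per control point, the leaf count of 28 is a hypothetical placeholder, and the sampling choices (uniform magnitudes, random sign, the same per-leaf offset at every control point) are assumptions, since the text does not specify them.

```python
import random

# The six systematic shift magnitudes named in Section 2.2 (mm).
SYSTEMATIC_SHIFTS_MM = (-5.0, -2.0, -1.0, 1.0, 2.0, 5.0)


def apply_systematic_shift(control_points, shift_mm):
    """Shift every leaf at every control point by the same amount."""
    return [[pos + shift_mm for pos in leaves] for leaves in control_points]


def draw_random_errors(n_leaves, rng, n_nonrelevant=2, n_relevant=3):
    """Pick distinct leaves in one bank and draw one signed error per leaf.

    Assumptions (not stated in the source): magnitudes are uniform within
    each category and the sign is chosen at random.
    """
    chosen = rng.sample(range(n_leaves), n_nonrelevant + n_relevant)
    errors = {}
    for k, leaf in enumerate(chosen):
        if k < n_nonrelevant:
            magnitude = rng.uniform(0.0, 2.0)  # nonrelevant: within 2 mm
        else:
            magnitude = rng.uniform(2.0, 5.0)  # relevant: above 2 mm, up to 5 mm
        errors[leaf] = rng.choice((-1.0, 1.0)) * magnitude
    return errors


def apply_random_errors(control_points, errors):
    """Add the per-leaf offsets in `errors` at every control point (assumption)."""
    return [
        [pos + errors.get(i, 0.0) for i, pos in enumerate(leaves)]
        for leaves in control_points
    ]


# Hypothetical bank with 28 leaves and 3 control points, all leaves at 0 mm.
rng = random.Random(0)
bank = [[0.0] * 28 for _ in range(3)]
shifted_bank = apply_systematic_shift(bank, SYSTEMATIC_SHIFTS_MM[-1])
bank_errors = draw_random_errors(n_leaves=28, rng=rng)
perturbed_bank = apply_random_errors(bank, bank_errors)
```

Calling `draw_random_errors` once per bank yields the ten errors per plan described above (five per bank, two nonrelevant and three relevant).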
These plans were delivered on an Ethos linear accelerator (Halcyon version 3.1, Varian Medical Systems, Palo Alto, CA), a state-of-the-art system designed for adaptive radiotherapy. During beam delivery, integrated EPID images were acquired for each treatment field using the aSi-1200 EPID, a high-resolution amorphous silicon detector with an active area of 40 × 30 cm², a resolution of 1024 × 768 pixels, and a pixel size of 0.39 mm. A two-dimensional (2D) dose distribution at a source-to-imager distance (SID) of 154 cm was calculated using the portal dose image prediction (PDIP) algorithm integrated within the Eclipse treatment planning system (Version 13.6, Varian Medical Systems, Palo Alto, CA). The PDIP algorithm utilizes a sophisticated model to predict the expected dose distribution at the EPID plane, accounting for beam energy, field size, and patient-specific anatomy. 2.4 Data preparation The data preparation process consisted of sequential steps to ensure consistency and accuracy in the PSQA analysis. The first step involved resetting the collimator angle to 0°. Because treatment plans often use various collimator angles to optimize beam delivery, these rotations can cause misalignment of the fluence map on EPID images. To address this issue, the images were rotated back to their original orientations, and missing regions were filled by extending pixel values from the edges of the image to maintain uniformity. Next, image resizing was performed to correct differences in resolution between the EPID (1,180 × 1,180 pixels) and PDIP (512 × 512 pixels) images. Both image types were resized to a standardized dimension of 256 × 256 pixels, ensuring a consistent pixel spacing of 1 mm × 1 mm, which is necessary for reliable spatial comparison of dose distributions. After resizing, image registration was applied to minimize positional discrepancies between the EPID and PDIP images. 
Rigid registration, beginning with edge detection, was used to identify key structural boundaries. The PDIP image was treated as the fixed reference, while the EPID image was adjusted to match its position and orientation, ensuring accurate alignment for subsequent analysis. Once registered, image subtraction was performed to highlight deviations between the planned and delivered dose distributions. This process entailed a pixel-by-pixel subtraction of the EPID image from the PDIP image, resulting in a difference image that highlighted regions of inconsistency. Figure 3 illustrates examples of these subtraction images under various scenarios, including error-free conditions, systematic MLC errors, and random MLC errors. The final data preparation step involved augmenting the data to enlarge the training dataset. Data augmentation improved model training efficiency, reduced overfitting, and enhanced prediction performance. Three augmentation techniques were applied: brightness adjustment, contrast adjustment, and Gaussian noise addition. Brightness adjustment was used to create variations in image brightness by randomly modifying brightness levels within a predefined range. Contrast adjustment introduced sharpness variations by altering each image’s contrast intensity. Gaussian noise was added to simulate random noise, improving the robustness and diversity of the training dataset. After applying data augmentation, the total dataset size increased to 11,424 images, significantly enhancing the volume and diversity of the data for robust model development. 2.5 Model generation This study divided the error detection step into three stages: (1) error type classification, (2) systematic error identification, and (3) random error identification. All training processes utilized an NVIDIA GeForce RTX 3070 GPU, with progress monitored through real-time training progress plots. 2.5.1. 
Error type classification The dataset was labeled into three categories: error-free, systematic MLC position errors, and random MLC position errors. Subsequently, the dataset was divided, with 90% (3,864 images) placed in a training dataset; these images were equally distributed among the three types (1,288 images each). The remaining 10% were placed in a testing dataset (420 images). A CNN was developed using MATLAB’s deep learning toolbox. The network started with a 256 × 256 × 1 grayscale image, which was processed by a convolutional layer with 16 filters of size 3 × 3, followed by batch normalization, rectified linear unit (ReLU) activation, and max pooling (2 × 2, stride 2 × 2). A similar block with 32 filters was applied next, followed by another with 64 filters. The extracted features were fed into a fully connected layer with six neurons, followed by SoftMax activation and a final classification layer for three classes. The training process utilized the Adam optimizer with an initial learning rate of 0.001, 20 epochs, and a mini-batch size of 32. The data were shuffled at each epoch, and validation was conducted every five epochs using augmented data. 2.5.2. Systematic error identification For training, systematic error subtraction images were created by subtracting the predicted image of the original treatment plan from the measurement image generated under the error plan. Six systematic MLC position error magnitudes (+1 mm, +2 mm, +5 mm, −1 mm, −2 mm, and −5 mm) were introduced, forming a dataset classified into six error categories. Following augmentation, 8,568 images were used for systematic error identification. The dataset was subsequently split, with 90% (7,711 images) allocated for training and 10% (857 images) reserved for testing. A CNN was developed using MATLAB’s deep learning toolbox. 
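The image counts quoted so far can be cross-checked with simple arithmetic. The sketch below assumes that augmentation yields four images per raw image (the original plus one brightness, one contrast, and one noise variant); the source does not state this factor explicitly, but it is the factor that reproduces the reported totals.

```python
FIELDS = 357                               # treatment fields in the 30 plans
SYSTEMATIC_SHIFTS_MM = (-5, -2, -1, 1, 2, 5)
COPIES_PER_IMAGE = 4                       # assumption: original + 3 augmented variants

# Six systematic scenarios + one random scenario + one error-free scenario.
scenarios_per_field = len(SYSTEMATIC_SHIFTS_MM) + 2
raw_fields = FIELDS * scenarios_per_field                   # 2,856 raw fields
all_images = raw_fields * COPIES_PER_IMAGE                  # 11,424 images

# Systematic error regression dataset and its 90/10 split.
systematic_images = FIELDS * len(SYSTEMATIC_SHIFTS_MM) * COPIES_PER_IMAGE  # 8,568
systematic_train = int(0.9 * systematic_images)             # 7,711
systematic_test = systematic_images - systematic_train      # 857

# Classification dataset: training and testing counts as reported.
classification_images = 3864 + 420                          # 4,284
images_per_class = classification_images // 3               # 1,428
```

Under this assumption, each class in the classification dataset holds 357 × 4 = 1,428 images, which suggests that only a subset of the systematic error images entered the classification task; which subset was used is not stated.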
The model began with a 256 × 256 × 1 grayscale image input that was processed through eight convolutional layers, each with 64 filters of size 3 × 3 and 'same' padding, followed by ReLU activation to introduce nonlinearity. Finally, the extracted features passed through a fully connected layer with a single neuron before reaching the regression layer for the final output. Hyperparameter tuning was performed to enhance model performance; training used the Adam optimizer with an initial learning rate of 0.001, 20 epochs, and a mini-batch size of 32. The data were shuffled at each epoch, and validation was conducted every five epochs using the augmented images. 2.5.3. Random error identification The process was further divided into MLC position identification and magnitude identification. In the position identification step, the subtraction images were normalized by rescaling the pixel values within a range from 0 to 1 and adjusting the background values to approximately 0.5. Cluster labels were generated based on cutoffs defined by adding and subtracting 1.5 from the minimum and maximum pixel values. Cropped 10 × 10-pixel clusters were mapped to MLC leaf positions on the Ethos machine. Cluster centroids determined which leaf contained the error. In the magnitude identification step, a dataset of 795 cropped images was labeled with error magnitudes and split into 90% for training and 10% for testing. The model took a 10 × 10 × 1 grayscale image as input and processed it through five convolutional layers, each with 64 filters of size 2 × 2 and 'same' padding, followed by ReLU activations to introduce nonlinearity. After extracting the features, the model passed them through a fully connected layer with a single neuron, followed by a regression layer for the final output. 
The training was performed using the Adam optimizer with an initial learning rate of 0.001, 20 epochs, and a mini-batch size of 16. 2.6 Model evaluation The model’s performance was evaluated using classification and regression metrics. 2.6.1. Error type classification This study assessed the performance of the classification model using several key metrics defined by Equations (1) through (5). Accuracy represents the proportion of correctly identified scenarios (both error and non-error) out of all investigated cases. Sensitivity measures the model’s ability to detect error scenarios, while specificity indicates how effectively the model identifies non-error scenarios. Precision calculates the proportion of correctly predicted positive instances among all instances labeled positive, and the F1 score is the harmonic mean of precision and sensitivity, providing a balanced measure of the model’s performance. These metrics rely on four fundamental quantities: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). 
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} \tag{1}$$
$$\text{Sensitivity}=\frac{TP}{TP+FN} \tag{2}$$
$$\text{Specificity}=\frac{TN}{TN+FP} \tag{3}$$
$$\text{Precision}=\frac{TP}{TP+FP} \tag{4}$$
$$\text{F1 score}=\frac{2\cdot\text{Precision}\cdot\text{Sensitivity}}{\text{Precision}+\text{Sensitivity}} \tag{5}$$
2.6.2. Systematic error identification The model’s performance in predicting continuous values was assessed using the mean absolute error (MAE), the mean squared error (MSE), the root mean squared error (RMSE), and R-squared. The MAE measures the average magnitude of prediction errors, providing a straightforward understanding of accuracy. By squaring the errors, the MSE emphasizes more significant deviations and is sensitive to outliers. The RMSE, as the square root of the MSE, offers an interpretable measure of the error spread in the same units as the target variable. R-squared quantifies the proportion of variance explained by the model, ranging from 0 (no fit) to 1 (perfect fit). Together, these metrics comprehensively evaluate the model’s prediction accuracy and reliability. 
The formulas are as follows:
$$\text{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right| \tag{6}$$
$$\text{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2} \tag{7}$$
$$\text{RMSE}=\sqrt{\text{MSE}}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}} \tag{8}$$
$$\text{R-squared}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}} \tag{9}$$
where \(n\) is the number of observations, \(y_{i}\) is the actual value for the \(i\)th observation, \(\hat{y}_{i}\) is the predicted value for the \(i\)th observation, and \(\bar{y}\) is the mean of the actual values.
2.6.3. Random error identification The performance of the position identification model was evaluated using accuracy, calculated by comparing the predicted MLC leaf positions with ground truth data. Accuracy was determined by dividing the number of correctly identified leaf positions by the total number of predictions across the four possible positions. The performance of the MLC magnitude identification model was assessed using regression metrics, including the MAE, MSE, RMSE, and R-squared, as described in the systematic error identification section. 3. Results 3.1. Error type classification In this study, a classification model for MLC position errors was developed and optimized through hyperparameter tuning. The model achieved optimal performance after 20 training epochs, completing 1,920 iterations (96 iterations per epoch) in 5 minutes and 1 second. Validation was conducted every five epochs, achieving a validation accuracy of 96.0%. The model successfully categorized errors into three types: error-free, systematic MLC position errors, and random MLC position errors. The results from the independent dataset indicate the model’s strong reliability in distinguishing MLC position errors, as shown in Fig. 4 . 
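Equations (1) through (5) are defined for a binary outcome, so in a three-class setting they are typically applied per class in a one-vs-rest fashion. The sketch below shows that computation on a confusion matrix. The matrix itself is a reconstruction and an assumption: it is not given in the text, and it was chosen as a matrix that reproduces the per-class percentages reported in Table 1 if the 420 test images are split evenly (140 per class).

```python
def one_vs_rest_metrics(confusion, labels):
    """Per-class metrics from a square confusion matrix.

    confusion[i][j] is the number of samples with true class i that were
    predicted as class j. Returns (overall_accuracy, {label: metrics}).
    """
    total = sum(sum(row) for row in confusion)
    per_class = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        precision = tp / (tp + fp)
        per_class[label] = {
            "accuracy": (tp + tn) / total,
            "sensitivity": sensitivity,
            "specificity": tn / (tn + fp),
            "precision": precision,
            "f1": 2 * precision * sensitivity / (precision + sensitivity),
        }
    overall = sum(confusion[k][k] for k in range(len(labels))) / total
    return overall, per_class


LABELS = ["error-free", "systematic", "random"]

# Reconstructed (hypothetical) matrix: rows are true classes, columns are
# predicted classes, in the order of LABELS. Not taken from the source.
CONFUSION = [
    [139, 1, 0],
    [2, 138, 0],
    [9, 2, 129],
]

overall_accuracy, class_metrics = one_vs_rest_metrics(CONFUSION, LABELS)
for name, values in class_metrics.items():
    print(name, {key: round(100 * value, 2) for key, value in values.items()})
print("overall accuracy", round(100 * overall_accuracy, 2))
```

Note that the per-class accuracy counts true negatives, so it can exceed the overall accuracy, which counts only the diagonal of the matrix.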
To assess the model’s performance, key metrics, including accuracy, sensitivity, specificity, precision, and F1 score, were evaluated and are summarized in Table 1 . The model achieved an overall accuracy of 96.67%, with high classification performance across all error types. For the error-free category, the model exhibited 97.14% accuracy, 99.29% sensitivity, 96.07% specificity, and 92.67% precision and had an F1 score of 95.86%. The systematic MLC position error classification showed an accuracy of 98.81%, a sensitivity of 98.57%, a specificity of 98.93%, a precision of 97.87%, and an F1 score of 98.22%. Similarly, the random MLC position error category demonstrated 97.38% accuracy, 92.14% sensitivity, 100.00% specificity, 100.00% precision, and an F1 score of 95.91%.
Table 1 Error type classification performance (all values in %; overall accuracy across the three error types: 96.67)
Error type | Accuracy (individual) | Sensitivity | Specificity | Precision | F1 score
Error-free | 97.14 | 99.29 | 96.07 | 92.67 | 95.86
Systematic MLC position errors | 98.81 | 98.57 | 98.93 | 97.87 | 98.22
Random MLC position errors | 97.38 | 92.14 | 100.00 | 100.00 | 95.91
3.2. Systematic error identification This study optimized the systematic error identification model by adjusting the hyperparameters based on the training RMSE values. The model was trained over 20 epochs, completing 1,920 iterations (96 iterations per epoch) in 216 minutes and 39 seconds. Validation was conducted every five epochs, yielding a validation RMSE of 0.352. The model predicted the magnitude of systematic errors, with performance evaluated using the MAE, MSE, RMSE, and R-squared. The results were 1.082, 1.954, 1.398, and 0.804, respectively. 3.3. Random error identification The MLC position identification model identified MLC positions at which random errors occurred, evaluating performance based on accuracy. Of the 795 samples, 708 predictions matched the ground truth, resulting in an accuracy of 0.89. 
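The regression metrics of Equations (6) through (9) can be computed directly, as in the minimal sketch below. The `predicted` values are hypothetical and serve only to exercise the formulas; they are not model outputs from this study. The last lines check two internal consistencies of the reported results: the systematic model's RMSE against the square root of its MSE, and the position identification accuracy against 708 of 795.

```python
import math


def regression_metrics(actual, predicted):
    """MAE, MSE, RMSE, and R-squared as defined in Equations (6) to (9)."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot,
    }


# The six systematic shift magnitudes (mm) and hypothetical predictions.
actual = [-5.0, -2.0, -1.0, 1.0, 2.0, 5.0]
predicted = [-4.0, -2.0, -1.0, 1.0, 2.0, 4.0]
toy_metrics = regression_metrics(actual, predicted)

# Consistency checks on values reported in Sections 3.2 and 3.3.
reported_rmse_from_mse = round(math.sqrt(1.954), 3)
position_accuracy = round(708 / 795, 2)
```

Because R-squared is normalized by the variance of the actual values, a model can have a modest absolute error and still a low R-squared when the targets span a narrow range.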
The MLC magnitude identification model was optimized by tuning the hyperparameters based on the training RMSE values. The model was trained over 40 epochs, completing 1,400 iterations (35 per epoch) in 21 minutes and 3 seconds. Validation was conducted every five epochs, yielding a validation RMSE of 0.995. The model was designed to predict error magnitudes, with performance metrics assessed for regression tasks. The MAE, MSE, RMSE, and R-squared results were 0.818, 1.010, 1.005, and 0.294, respectively; the low R-squared indicates that the model explains only a limited share of the variance in random error magnitudes. 4. Discussion This study demonstrates the effectiveness of a deep learning-based PSQA framework for classifying and quantifying systematic and random MLC position errors in IMRT. By leveraging CNNs trained on subtraction image–based data, our model achieved a classification accuracy of 96.67%, going beyond conventional gamma analysis, which provides only a pass/fail assessment and lacks specificity in identifying the root causes of treatment errors. Furthermore, our model quantified systematic error magnitudes with a strong correlation (R-squared = 0.804) between the predicted and actual values, while random error leaf position identification reached 89% accuracy. However, the lower R-squared value (0.294) for random error magnitude estimation suggests that further refinements are necessary to enhance predictive performance for clinical applications. Traditional gamma analysis remains the standard in PSQA, but its reliance on a binary pass/fail criterion limits its ability to distinguish between spatial and dosimetric discrepancies. By blending dose difference (DD) and distance-to-agreement (DTA) into a single score, gamma maps with 2%/2 mm criteria fail to detect clinically significant deviations, particularly systematic MLC positioning errors up to 2 mm ( 15 , 34 ). 
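The masking effect of the DTA term can be illustrated with a toy one-dimensional example. The sketch below is a simplified global gamma calculation on an idealized, noise-free profile at 1 mm spacing; the profile shape, the whole-pixel shift, and the helper names are assumptions made for illustration, and it is not the gamma implementation used in clinical software or in this study.

```python
import math


def gamma_1d(reference, measured, spacing_mm, dd_frac=0.02, dta_mm=2.0):
    """Global 1D gamma index for each measured point against the reference."""
    dose_tolerance = dd_frac * max(reference)
    gammas = []
    for i, dose_m in enumerate(measured):
        best = math.inf
        for j, dose_r in enumerate(reference):
            distance_term = (i - j) * spacing_mm / dta_mm
            dose_term = (dose_m - dose_r) / dose_tolerance
            best = min(best, math.hypot(distance_term, dose_term))
        gammas.append(best)
    return gammas


def shift_right(profile, n_pixels):
    """Shift a profile by whole pixels, repeating the first value at the edge."""
    return [profile[0]] * n_pixels + profile[: len(profile) - n_pixels]


# Idealized field profile: background, one penumbra pixel, plateau.
reference = [0.0] * 10 + [0.5] + [1.0] * 20 + [0.5] + [0.0] * 10
measured = shift_right(reference, 1)  # 1 mm shift at 1 mm pixel spacing

gammas = gamma_1d(reference, measured, spacing_mm=1.0)
pass_rate = sum(g <= 1.0 for g in gammas) / len(gammas)
max_abs_diff = max(abs(m - r) for m, r in zip(measured, reference))

print(f"gamma pass rate (2%/2 mm): {pass_rate:.0%}")
print(f"largest pixel-wise difference: {max_abs_diff:.2f} of the maximum dose")
```

In this toy case every point passes the 2%/2 mm gamma test because a matching dose value always lies within 1 mm, while the pixel-wise difference at the field edges reaches half of the maximum dose. This is the kind of signal that a subtraction image retains and a gamma map suppresses.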
Our approach, which employs subtraction-based images instead of gamma maps, preserves pixel-wise dose differences, improving model sensitivity to spatial and dosimetric discrepancies. This enhances classification accuracy by providing continuous-valued pixel intensities, enabling CNN models to learn complex spatial patterns and effectively differentiate systematic from random errors. The performance of our model aligns with previous studies demonstrating the superiority of deep learning–based methods in PSQA. For instance, Kimura et al. ( 30 ) reported 98.6% accuracy for systematic errors and 84.7% for random errors, while our model achieved higher specificity (100%) for random errors, confirming its improved classification capability. Similarly, while Sakai et al. ( 23 ) reported 100% sensitivity and 81.80% specificity for MLC errors using radiomics-based learning (MPC position error vs. error-free with the MLC position error model), our model attained 98.57% sensitivity and 98.93% specificity for systematic errors, demonstrating greater classification reliability. Nyflot et al. ( 20 ) trained deep learning models on gamma maps, achieving 77.3% accuracy for binary classification and only 64.3% for multi-class classification—significantly lower than our 96.67% accuracy for three-class classification. This disparity highlights the advantage of subtraction image–based CNN models, which retain spatial information more effectively than gamma map–based training. Additionally, Potter et al. ( 28 ) developed a dual neural network for IMRT QA, achieving 95.3% accuracy for spatial errors. While our classification performance is comparable, our model’s ability to quantify both systematic and random MLC errors provides an additional advantage in treatment verification and real-time clinical decision-making. Finally, findings by Wolfs et al. 
(2020) demonstrate that CNNs trained on EPID dosimetry data can detect and quantify treatment errors—a capability that aligns closely with our model’s strengths in systematic error quantification. Together, these comparisons illustrate that deep learning models, especially those based on CNNs, provide significant advantages in addressing spatially complex errors beyond the reach of conventional gamma analysis. The classification model effectively distinguished between error-free, systematic, and random MLC errors, demonstrating high sensitivity, specificity, and precision across all categories. Our model attained 99.29% sensitivity for error-free treatment plans, ensuring the accurate detection of plans without MLC errors. For systematic MLC position errors, the model achieved 98.57% sensitivity and 98.93% specificity, confirming its strong classification reliability. Meanwhile, random MLC position errors reached 92.14% sensitivity and 100.00% specificity, demonstrating precise error isolation. These results underscore the robustness of CNN-based PSQA, providing a significant improvement over gamma analysis, which lacks the specificity to differentiate between systematic and random deviations. Furthermore, the systematic error identification model achieved an R-squared value of 0.804, confirming a strong correlation between the predicted and actual error magnitudes, making it a clinically useful tool for refining treatment plan accuracy. Although random error detection reached 89% accuracy, the lower R-squared value (0.294) for estimating the magnitudes of random errors suggests greater variability in random error patterns, making precise quantification more challenging. This finding aligns with Sakai et al. ( 23 ), who observed that radiomics-based learning models struggled with multiple concurrent errors, emphasizing the need for further optimization in spatial error modeling. 
Despite these challenges, our model demonstrates strong performance in systematic and random error classification, confirming its potential to improve PSQA automation and treatment safety. In the context of random MLC position errors, our method leverages a mapping strategy that capitalizes on the double-layer configuration of the MLC, which inherently provides four potential leaf positions. This design, while introducing complexity, is addressed by our approach, which initially maps all four MLC positions but ultimately refines the process to identify and isolate the specific single leaf exhibiting an error. By focusing on a one-leaf identification strategy, the method reduces ambiguity and enhances the reliability of error detection, helping to capture even subtle random deviations. This targeted approach not only simplifies the interpretation of positional discrepancies but also bolsters the overall sensitivity of the CNN model in detecting clinically significant errors, paving the way for more precise and effective quality assurance in IMRT treatments. Future research should focus on clinical validation using real patient data to ensure generalizability across different treatment centers and equipment configurations. Additionally, integrating additional error sources, such as gantry misalignment, measurement setup errors, and dose calculation inaccuracies, will expand the model’s robustness. Enhancing combination error detection, in which multiple deviations interact simultaneously, will further improve clinical decision-making and treatment plan verification. Furthermore, hybrid AI approaches incorporating radiomics-based feature extraction and 4D dose distribution modeling could enhance real-time error detection sensitivity, enabling automated PSQA workflows. Given its computational efficiency, our method holds strong potential for clinical integration, reducing manual verification time while improving treatment accuracy and patient safety. 
This study confirmed that deep learning-based PSQA significantly enhances IMRT error classification and quantification, surpassing gamma analysis in sensitivity, specificity, and clinical interpretability. Compared to previous methods, our CNN model achieved higher classification accuracy, improved specificity for systematic and random errors, and precise error magnitude estimation. These findings align with prior research (20, 23, 28–30), reinforcing the growing need for AI-driven automation in radiotherapy QA.

5. Conclusion

In conclusion, this study demonstrated the potential of a deep learning approach to enhance EPID-based PSQA by accurately classifying and quantifying both systematic and random MLC errors. This capability could significantly improve the efficiency, precision, and reliability of PSQA processes, leading to better treatment planning and execution. By improving detection sensitivity and providing insight into the causes of plan failure, deep learning models could complement or potentially replace gamma-based QA methods, ultimately enhancing the safety and efficacy of radiation therapy. Further research should focus on increasing model adaptability across different treatment protocols, improving the detection of subtle combination errors, and integrating hybrid approaches that combine radiomics, fluence map analysis, and dose distribution data.

Declarations

Conflicts of interest
There is no conflict of interest with regard to this manuscript.

Author Contributions Statement
C.K. and T.F. conceptualized and designed the study. C.K., N.C., and T.F. provided consultation. S.J. and T.C. conducted data collection. S.J. performed model generation and data analysis. C.K. wrote the main manuscript text. K.N. and S.J. prepared figures and tables. All authors reviewed and approved the final manuscript.

Funding
This research was funded by the Thailand Institute of Nuclear Technology (Public Organization).

Acknowledgement
We thank Ioana Gianina Buda, PhD from Scribendi (www.scribendi.com) for editing a draft of this manuscript.

References

1. Kouklidis G, Nikolopoulos M, Ahmed O, Eskander B, Masters B. A Retrospective Comparison of Toxicity, Response and Survival of Intensity-Modulated Radiotherapy Versus Three-Dimensional Conformal Radiation Therapy in the Treatment of Rectal Carcinoma. Cureus [Internet]. 2023 Nov 2 [cited 2025 Feb 7];15(11). Available from: https://pubmed.ncbi.nlm.nih.gov/37929269/
2. Ng SY, Colborn KL, Cambridge L, Hajj C, Yang TJ, Wu AJ, et al. Acute toxicity with intensity modulated radiotherapy versus 3-dimensional conformal radiotherapy during preoperative chemoradiation for locally advanced rectal cancer. Radiother Oncol [Internet]. 2016 Nov 1 [cited 2025 Feb 7];121(2):252–7. Available from: https://pubmed.ncbi.nlm.nih.gov/27751605/
3. Wee CW, Kang HC, Wu HG, Chie EK, Choi N, Park JM, et al. Intensity-modulated radiotherapy versus three-dimensional conformal radiotherapy in rectal cancer treated with neoadjuvant concurrent chemoradiation: a meta-analysis and pooled-analysis of acute toxicity. Jpn J Clin Oncol [Internet]. 2018 May 1 [cited 2025 Feb 7];48(5):458–66. Available from: https://pubmed.ncbi.nlm.nih.gov/29554287/
4. Jabbour SK, Patel S, Herman JM, Wild A, Nagda SN, Altoos T, et al. Intensity-Modulated Radiation Therapy for Rectal Carcinoma Can Reduce Treatment Breaks and Emergency Department Visits. Int J Surg Oncol [Internet]. 2012 [cited 2025 Feb 7];2012:891067. Available from: https://pmc.ncbi.nlm.nih.gov/articles/PMC3425793/
5. Cho B. Intensity-modulated radiation therapy: a review with a physics perspective. Radiat Oncol J [Internet]. 2018 Mar 1 [cited 2025 Feb 7];36(1):1–10. Available from: http://www.e-roj.org/journal/view.php?doi=10.3857/roj.2018.00122
6. Webb S. Intensity-modulated radiation therapy. Intensity-Modulated Radiation Therapy [Internet]. 2015 Jan 1 [cited 2025 Feb 7];1–422. Available from: https://www.taylorfrancis.com/books/mono/10.1201/9781420034110/intensity-modulated-radiation-therapy-webb
7. Ezzell GA, Burmeister JW, Dogan N, Losasso TJ, Mechalakos JG, Mihailidis D, et al. IMRT commissioning: multiple institution planning and dosimetry comparisons, a report from AAPM Task Group 119. Med Phys [Internet]. 2009 [cited 2025 Feb 7];36(11):5359–73. Available from: https://pubmed.ncbi.nlm.nih.gov/19994544/
8. Miften M, Olch A, Mihailidis D, Moran J, Pawlicki T, Molineu A, et al. Tolerance limits and methodologies for IMRT measurement-based verification QA: Recommendations of AAPM Task Group No. 218. Med Phys [Internet]. 2018 Apr 1 [cited 2025 Feb 7];45(4):e53–83. Available from: https://onlinelibrary.wiley.com/doi/full/10.1002/mp.12810
9. Dogan N, Mijnheer BJ, Padgett K, Nalichowski A, Wu C, Nyflot MJ, et al. AAPM Task Group Report 307: Use of EPIDs for Patient-Specific IMRT and VMAT QA. Med Phys [Internet]. 2023 Aug 1 [cited 2025 Feb 7];50(8):e865–903. Available from: https://pubmed.ncbi.nlm.nih.gov/37384416/
10. Low DA, Harms WB, Mutic S, Purdy JA. A technique for the quantitative evaluation of dose distributions. Med Phys. 1998;25(5):656–61.
11. Yu L, Tang TLS, Cassim N, Livingstone A, Cassidy D, Kairn T, et al. Analysis of dose comparison techniques for patient-specific quality assurance in radiation therapy. J Appl Clin Med Phys [Internet]. 2019 Nov 1 [cited 2025 Feb 7];20(11):189–98. Available from: https://pubmed.ncbi.nlm.nih.gov/31613053/
12. Das S, Kharade V, Pandey VP, Kv A, Pasricha RK, Gupta M. Gamma Index Analysis as a Patient-Specific Quality Assurance Tool for High-Precision Radiotherapy: A Clinical Perspective of Single Institute Experience. Cureus [Internet]. 2022 Oct 31 [cited 2025 Feb 7];14(10):e30885. Available from: https://pmc.ncbi.nlm.nih.gov/articles/PMC9626372/
13. Nelms BE, Zhen H, Tomé WA. Per-beam, planar IMRT QA passing rates do not predict clinically relevant patient dose errors. Med Phys [Internet]. 2011 Feb 1 [cited 2025 Feb 7];38(2):1037–44. Available from: https://onlinelibrary.wiley.com/doi/full/10.1118/1.3544657
14. Stasi M, Bresciani S, Miranti A, Maggio A, Sapino V, Gabriele P. Pretreatment patient-specific IMRT quality assurance: A correlation study between gamma index and patient clinical dose volume histogram. Med Phys [Internet]. 2012 Dec 1 [cited 2025 Feb 7];39(12):7626–34. Available from: https://onlinelibrary.wiley.com/doi/full/10.1118/1.4767763
15. Yan G, Liu C, Simon TA, Peng LC, Fox C, Li JG. On the sensitivity of patient-specific IMRT QA to MLC positioning errors. J Appl Clin Med Phys [Internet]. 2009 [cited 2025 Feb 7];10(1):120. Available from: https://pmc.ncbi.nlm.nih.gov/articles/PMC5720508/
16. Nakamura S, Sakai M, Ishizaka N, Mayumi K, Kinoshita T, Akamatsu S, et al. Deep learning-based detection and classification of multi-leaf collimator modeling errors in volumetric modulated radiation therapy. J Appl Clin Med Phys [Internet]. 2023 Dec 1 [cited 2025 Feb 7];24(12). Available from: https://pubmed.ncbi.nlm.nih.gov/37633834/
17. Ma C, Wang R, Zhou S, Wang M, Yue H, Zhang Y, et al. The structural similarity index for IMRT quality assurance: radiomics-based error classification. Med Phys. 2021;48(1):80–93.
18. Kimura Y, Kadoya N, Tomori S, Oku Y, Jingu K. Error detection using a convolutional neural network with dose difference maps in patient-specific quality assurance for volumetric modulated arc therapy. Physica Med. 2020;73:57–64.
19. Li G, Duan L, Xie L, Hu T, Wei W, Bai L, et al. Deep learning for patient-specific quality assurance of volumetric modulated arc therapy: Prediction accuracy and cost-sensitive classification performance. Physica Med. 2024;125.
20. Nyflot MJ, Thammasorn P, Wootton LS, Ford EC, Chaovalitwongse WA. Deep learning for patient-specific quality assurance: Identifying errors in radiotherapy delivery by radiomic analysis of gamma images with convolutional neural networks. Med Phys. 2019;46(2):456–64.
21. Ono T, Iramina H, Hirashima H, Adachi T, Nakamura M, Mizowaki T. Applications of artificial intelligence for machine- and patient-specific quality assurance in radiation therapy: Current status and future directions. J Radiat Res. 2024;65(4):421–32.
22. Osman AFI, Maalej NM, Jayesh K. Prediction of the individual multileaf collimator positional deviations during dynamic IMRT delivery priori with artificial neural network. Med Phys. 2020;47(4):1421–30.
23. Sakai M, Nakano H, Kawahara D, Tanabe S, Takizawa T, Narita A, et al. Detecting MLC modeling errors using radiomics-based machine learning in patient-specific QA with an EPID for intensity-modulated radiation therapy. Med Phys. 2021;48(3):991–1002.
24. Wootton LS, Nyflot MJ, Chaovalitwongse WA, Ford E. Error Detection in Intensity-Modulated Radiation Therapy Quality Assurance Using Radiomic Analysis of Gamma Distributions. Int J Radiat Oncol Biol Phys. 2018;102(1):219–28.
25. Li Z, Liu F, Yang W, Peng S, Zhou J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans Neural Netw Learn Syst. 2022;33(12):6999–7019.
26. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021;8(1).
27. Shen C, Nguyen D, Zhou Z, Jiang SB, Dong B, Jia X. An introduction to deep learning in medical physics: Advantages, potential, and challenges. Phys Med Biol. 2020;65(5).
28. Potter NJ, Mund K, Andreozzi JM, Li JG, Liu C, Yan G. Error detection and classification in patient-specific IMRT QA with dual neural networks. Med Phys. 2020;47(10):4711–20.
29. Wolfs CJA, Canters RAM, Verhaegen F. Identification of treatment error types for lung cancer patients using convolutional neural networks and EPID dosimetry. Radiother Oncol. 2020;153:243–9.
30. Kimura Y, Kadoya N, Oku Y, Jingu K. Development of a deep learning-based error detection system without error dose maps in the patient-specific quality assurance of volumetric modulated arc therapy. J Radiat Res. 2023;64(4):728–37.
31. Kimura Y, Kadoya N, Oku Y, Kajikawa T, Tomori S, Jingu K. Error detection model developed using a multi-task convolutional neural network in patient-specific quality assurance for volumetric-modulated arc therapy. Med Phys. 2021;48(9):4769–83.
32. Sheen H, Shin HB, Kim H, Kim C, Kim J, Kim JS, et al. Application of error classification model using indices based on dose distribution for characteristics evaluation of multileaf collimator position errors. Sci Rep. 2023;13(1).
33. Wang L, Li J, Zhang S, Zhang X, Zhang Q, Chan MF, et al. Multi-task autoencoder based classification-regression model for patient-specific VMAT QA. Phys Med Biol. 2020;65(23).
34. Stedem AK, Tutty M, Chofor N, Langhans M, Kleefeld C, Schönfeld AA. Systematic evaluation of spatial resolution and gamma criteria for quality assurance with detector arrays in stereotactic radiosurgery. J Appl Clin Med Phys [Internet]. 2024 Feb 1 [cited 2025 Feb 21];25(2):e14274. Available from: https://onlinelibrary.wiley.com/doi/full/10.1002/acm2.14274

Additional Declarations
No competing interests reported.
Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6231733","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":431925250,"identity":"835372e5-d7b8-450c-8f4f-532f5a4f3c0c","order_by":0,"name":"Chirasak Khamfongkhruea","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA7UlEQVRIiWNgGAWjYDACCcYGEGXAwMPDwPABJgoRJEIL4wyY6EG8WiAUWAszDzFa+Gc3t3348cfOmIHn7MHPNjX35BnYDz9g/rgDjyV3DjbP7G1LNmPg7UuWzjlWbNjAk2bAcPAMbi0GEonNDLwNzDYM/DwG0rkNCUCv5QAd1oZfC+OfP/UgLca/LRsS7Bv43xDWwszDdhjosB4zacaGhMQGCQK2SNwAapFtO27MxnPGzLLnWEJym8QzgwNn8Wjhn5H+mPHNn2rDfp4c4xs/ahJs+/mTHz6oxKMFDtiQGQeI0DAKRsEoGAWjAA8AAKhTSx8adwuZAAAAAElFTkSuQmCC","orcid":"","institution":"Princess Srisavangavadhana Faculty of Medicine, Chulabhorn Royal Academy","correspondingAuthor":true,"prefix":"","firstName":"Chirasak","middleName":"","lastName":"Khamfongkhruea","suffix":""},{"id":431925251,"identity":"7059a5bf-7cc9-43fd-9541-c25366977c40","order_by":1,"name":"Sawitri Jitsuk","email":"","orcid":"","institution":"Princess Srisavangavadhana Faculty of Medicine, Chulabhorn Royal Academy","correspondingAuthor":false,"prefix":"","firstName":"Sawitri","middleName":"","lastName":"Jitsuk","suffix":""},{"id":431925253,"identity":"ad3a8699-0cd3-47af-9bd9-91f2fe2062bb","order_by":2,"name":"Kampheang Nimjaroeng","email":"","orcid":"","institution":"Chulabhorn Hospital, Chulabhorn Royal 
Academy","correspondingAuthor":false,"prefix":"","firstName":"Kampheang","middleName":"","lastName":"Nimjaroeng","suffix":""},{"id":431925255,"identity":"32785e01-dade-47b2-ae41-54fe0c85e1f2","order_by":3,"name":"Thananya Chanpanya","email":"","orcid":"","institution":"Chulabhorn Hospital, Chulabhorn Royal Academy","correspondingAuthor":false,"prefix":"","firstName":"Thananya","middleName":"","lastName":"Chanpanya","suffix":""},{"id":431925257,"identity":"bbd80f23-b34f-47a2-8941-8af3e765f59e","order_by":4,"name":"Todsaporn Fuangrod","email":"","orcid":"","institution":"Princess Srisavangavadhana Faculty of Medicine, Chulabhorn Royal Academy","correspondingAuthor":false,"prefix":"","firstName":"Todsaporn","middleName":"","lastName":"Fuangrod","suffix":""},{"id":431925258,"identity":"a8c8665a-ba9c-427a-a6c3-89863928f768","order_by":5,"name":"Nantanat Chailanggar","email":"","orcid":"","institution":"Thailand Institute of Nuclear Technology (Public Organization)","correspondingAuthor":false,"prefix":"","firstName":"Nantanat","middleName":"","lastName":"Chailanggar","suffix":""}],"badges":[],"createdAt":"2025-03-15 09:08:06","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6231733/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6231733/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":79641219,"identity":"e660e1bc-3fed-454f-9deb-214989bfcf93","added_by":"auto","created_at":"2025-04-01 06:07:00","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":135157,"visible":true,"origin":"","legend":"\u003cp\u003eWorkflow for classifying and quantifying MLC positioning errors in IMRT plans. 
The process includes plan selection, error-based plan generation, EPID-based data acquisition, image preprocessing, deep learning model development, and performance evaluation using classification and regression metrics.\u003c/p\u003e","description":"","filename":"1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6231733/v1/9d7e54d8d87dba9abc554976.jpg"},{"id":79639388,"identity":"0da6f731-4c58-41c9-ad9c-a23682438f43","added_by":"auto","created_at":"2025-04-01 05:43:00","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":60457,"visible":true,"origin":"","legend":"\u003cp\u003eOverview of the investigated scenarios: (a) error-free, (b) systematic MLC position errors, and (c) random MLC position errors.\u003c/p\u003e","description":"","filename":"2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6231733/v1/7e546bc2a92bfdb84fcfe2bb.jpg"},{"id":79639389,"identity":"b7c5e6a0-1160-42f4-87b5-7ce44da491fa","added_by":"auto","created_at":"2025-04-01 05:43:00","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":60808,"visible":true,"origin":"","legend":"\u003cp\u003eSubtraction images comparing EPID-acquired and PDIP-predicted images: (a) systematic MLC errors, (b) random MLC errors, and (c) error-free conditions.\u003c/p\u003e","description":"","filename":"3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6231733/v1/c5be239e671432af12f193df.jpg"},{"id":79639394,"identity":"ac489c9d-17e9-467d-99c7-3a527f812b14","added_by":"auto","created_at":"2025-04-01 05:43:00","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":42517,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrix illustrating the comparison between deep learning–based error classification (predicted labels) and true labels in MLC position error classification, presented as relative values 
(percentages).\u003c/p\u003e","description":"","filename":"4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6231733/v1/3c1488db41ebbe234ec41282.jpg"},{"id":81300718,"identity":"6db50c23-86f9-48a8-bbb7-75bf699dcdc6","added_by":"auto","created_at":"2025-04-24 13:53:23","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1046720,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6231733/v1/3526c2b2-195e-49e8-b3f7-a4f224344786.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Deep Learning-Based Multileaf Collimator Error Classification and Quantification in Patient- Specific Intensity Modulated Radiation Therapy Quality Assurance","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eAdvanced radiation therapy techniques, particularly intensity-modulated radiation therapy (IMRT), play a crucial role in rectal cancer treatment by delivering radiation in a highly precise manner, minimizing damage to surrounding organs and significantly reducing acute bowel side effects (\u003cspan additionalcitationids=\"CR2 CR3\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). This precision is achieved through the dynamic modulation of multileaf collimators (MLCs), sophisticated mechanical components that shape radiation beams to match tumor volumes (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e). 
Accurate delivery of these complex treatment plans is critical to ensuring that the planned dose distribution is faithfully reproduced in the patient.\u003c/p\u003e \u003cp\u003ePatient-specific quality assurance (PSQA) is essential for maintaining treatment accuracy and safety (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). PSQA typically involves measurement-based verification (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e), with the gamma passing rate (GPR) serving as a commonly used evaluation metric. The GPR compares predicted and measured dose distributions based on the dose difference (DD) and distance-to-agreement (DTA) criteria (\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e). Task Group 218 recommends a 95% GPR threshold using 3%/2 mm criteria (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). However, while GPR effectively identifies large deviations, it has notable limitations in detecting subtle but clinically significant errors, particularly those related to MLC positioning (\u003cspan additionalcitationids=\"CR14 CR15 CR16\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e). Such errors can alter dose delivery in critical regions, potentially compromising treatment outcomes.\u003c/p\u003e \u003cp\u003eImproving the detection of MLC-related errors has thus become a priority for enhancing the reliability and effectiveness of PSQA. 
Recent advancements in AI have shown significant potential to address this challenge (\u003cspan additionalcitationids=\"CR19 CR20 CR21 CR22\" citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e). AI models provide a more detailed, quantitative analysis of treatment errors, complementing or surpassing traditional gamma analysis. For example, Wootton et al. (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e) investigated the application of logistic regression models to detect systematic and random MLC positioning errors by analyzing radiomic features extracted from gamma maps generated using an electronic portal imaging device (EPID). Their model classified plans into three categories\u0026mdash;error-free, systematic MLC errors (uniform 2 mm shift), and random MLC errors (0\u0026ndash;2 mm variability)\u0026mdash;achieving classification accuracies of 0.72 and 0.76 for systematic and random errors, respectively, surpassing conventional gamma analysis with an area under the curve (AUC) of 0.74.\u003c/p\u003e \u003cp\u003eBuilding on this, Nyflot et al. (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e) explored multiple machine learning models, including support vector machines (SVMs), artificial neural networks (ANNs), decision trees, and k-nearest neighbors (KNNs). By integrating radiomic features from gamma maps with convolutional neural network (CNN)\u0026ndash;derived features, the authors aimed to detect patient-specific errors in IMRT QA measurements. Among the models tested, the SVM demonstrated the highest classification accuracy, achieving a success rate of 0.64 across the three error categories.\u003c/p\u003e \u003cp\u003eFurther advancing this field, Kimura et al. 
(\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e) employed a deep learning-based approach, specifically a CNN, to predict MLC positional errors using features extracted from dose difference maps and PSQA measurements obtained from 3D detectors. Their deep learning model demonstrated an overall accuracy of 94.40%, effectively classifying treatment plans into error-free, systematic error, and random error categories.\u003c/p\u003e \u003cp\u003eThese studies highlight the potential of deep learning, particularly CNN-based models (\u003cspan additionalcitationids=\"CR26\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e), to advance PSQA beyond traditional gamma analysis. While existing research demonstrates the effectiveness of CNNs in error detection, a critical gap remains: Most studies do not quantify error magnitudes (\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan additionalcitationids=\"CR29 CR30 CR31 CR32\" citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e), which are essential for clinical decision-making. Both minor and significant MLC deviations can critically impact dose distribution, especially in high-gradient regions. By estimating error type and magnitude, deep learning models can provide actionable insights, improving the precision and reliability of IMRT QA and enhancing treatment safety and effectiveness.\u003c/p\u003e \u003cp\u003eTherefore, this study aims to build upon these advancements by developing a deep learning-based tool to classify and quantify MLC errors in PSQA for rectal cancer IMRT. 
We hypothesize that this approach will provide more accurate and clinically meaningful error detection than traditional methods, thereby improving treatment verification and patient safety. Unlike previous studies, which primarily focus on classification, this study also quantifies error magnitudes, providing actionable clinical insights for real-time IMRT QA.\u003c/p\u003e"},{"header":"2. Materials and Methods","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis study employed a deep learning-based approach to classify and quantify MLC positioning errors in IMRT plans. The methodology consisted of multiple stages, including plan selection, plan generation, data acquisition, data preparation, and model generation and evaluation, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Plan selection\u003c/h2\u003e \u003cp\u003eIn this study, we selected 30 intensity-modulated radiation therapy (IMRT) plans comprising 357 fields for rectal cancer patients treated at Chulabhorn Hospital between January and November 2022. Each treatment plan was designed to deliver a total dose of 50 Gy in 25 fractions (2 Gy per fraction) using the Ethos Treatment Planning System (TPS) (Varian Medical Systems, Palo Alto, CA, USA) with the Acuros XB algorithm.\u003c/p\u003e \u003cp\u003eThe IMRT technique, utilizing 12 beams with 6 MV flattening filter-free (6FFF) photon energy, was employed. The advanced dual-layer MLC of the Ethos system played a crucial role in precisely shaping the radiation beams, with the added benefit of reduced interleaf leakage. For plan selection, PSQA was performed on all plans, confirming a gamma passing rate (3%/2 mm) exceeding 95%. 
This rigorous validation ensured that the clinical plans were accurate and free from significant errors, establishing a reliable baseline for further investigation into the impact of MLC positioning errors on treatment delivery. This retrospective study complied with the Declaration of Helsinki and received approval from the Chulabhorn Royal Academy Institutional Review Board (IRB) under approval number EC039/2566. Given the retrospective nature of the study and the use of de-identified patient treatment plan data, the IRB waived the requirement for informed consent, as no direct patient interaction or intervention was involved. Clinical trial number: not applicable.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Investigated scenarios with and without errors\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTo investigate error detection, three conditions were evaluated: (a) error-free, (b) systematic MLC position errors, and (c) random MLC position errors, as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The original rectal IMRT treatment plans, which exhibited a gamma passing rate of more than 95% (3%/2 mm criteria), served as the error-free group. Systematic and random MLC position errors were introduced by directly altering the treatment plan\u0026rsquo;s DICOM RT Plan files (specifically targeting MLC positions) using MATLAB 2023a (MathWorks, Inc., Natick, MA, USA). Uniform shifts were applied to all MLC leaves across every control point to induce systematic MLC position errors. Six error magnitudes were introduced: \u0026plusmn;1.0 mm, \u0026plusmn;\u0026thinsp;2.0 mm, and \u0026plusmn;\u0026thinsp;5.0 mm.\u003c/p\u003e \u003cp\u003eMATLAB functions were developed to randomize MLC leaf positions (distal and proximal banks) and error magnitudes for random errors. 
Error magnitudes were categorized as relevant (exceeding 2 mm, up to 5 mm) or nonrelevant (within 2 mm). Each plan incorporated 10 random position errors\u0026mdash;five per bank\u0026mdash;with two nonrelevant and three relevant errors per bank. A total of 2,856 raw data fields were generated from the initial 357 fields by introducing systematic errors (six magnitudes), random errors, and error-free conditions, enabling a comprehensive investigation into error detection and its impact on treatment accuracy.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 \u003cem\u003eData acquisition\u003c/em\u003e\u003c/h2\u003e \u003cp\u003ePSQA was conducted using an EPID-based dosimetry method for all plans. These plans were delivered on an Ethos linear accelerator (Halcyon version 3.1, Varian Medical Systems, Palo Alto, CA), a state-of-the-art system designed for adaptive radiotherapy. During beam delivery, integrated EPID images were acquired for each treatment field using the aSi-1200 EPID, a high-resolution amorphous silicon detector with an active area of 40 \u0026times; 30 cm\u0026sup2;, a resolution of 1024 \u0026times; 768 pixels, and a pixel size of 0.39 mm.\u003c/p\u003e \u003cp\u003eA two-dimensional (2D) dose distribution at a source-to-imager distance (SID) of 154 cm was calculated using the portal dose image prediction (PDIP) algorithm integrated within the Eclipse treatment planning system (Version 13.6, Varian Medical Systems, Palo Alto, CA). 
The PDIP algorithm utilizes a sophisticated model to predict the expected dose distribution at the EPID plane, accounting for beam energy, field size, and patient-specific anatomy.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Data preparation\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe data preparation process consisted of sequential steps to ensure consistency and accuracy in the PSQA analysis. The first step involved resetting the collimator angle to 0\u0026deg;. Because treatment plans often use various collimator angles to optimize beam delivery, these rotations can cause misalignment of the fluence map on EPID images. To address this issue, the images were rotated back to their original orientations, and missing regions were filled by extending pixel values from the edges of the image to maintain uniformity. Next, image resizing was performed to correct differences in resolution between the EPID (1,180 \u0026times; 1,180 pixels) and PDIP (512 \u0026times; 512 pixels) images. Both image types were resized to a standardized dimension of 256 \u0026times; 256 pixels, ensuring a consistent pixel spacing of 1 mm \u0026times; 1 mm, which is necessary for reliable spatial comparison of dose distributions.\u003c/p\u003e \u003cp\u003eAfter resizing, image registration was applied to minimize positional discrepancies between the EPID and PDIP images. Rigid registration, beginning with edge detection, was used to identify key structural boundaries. The PDIP image was treated as the fixed reference, while the EPID image was adjusted to match its position and orientation, ensuring accurate alignment for subsequent analysis.\u003c/p\u003e \u003cp\u003eOnce registered, image subtraction was performed to highlight deviations between the planned and delivered dose distributions. 
This process entailed a pixel-by-pixel subtraction of the EPID image from the PDIP image, resulting in a difference image that highlighted regions of inconsistency. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e illustrates examples of these subtraction images under various scenarios, including cases with error-free conditions, systematic errors, and random MLC errors.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe final data preparation step involved augmenting the data to enlarge the training dataset. Data augmentation was used to improve model training efficiency, reduce overfitting, and enhance prediction performance. Three augmentation techniques were applied: brightness adjustment, contrast adjustment, and Gaussian noise addition. Brightness adjustment created variations in image brightness by randomly modifying brightness levels within a predefined range. Contrast adjustment introduced contrast variations by altering each image\u0026rsquo;s contrast intensity. Gaussian noise was added to simulate random noise, improving the robustness and diversity of the training dataset. 
After applying data augmentation, the total dataset size increased to 11,424 images (four times the 2,856 raw fields), increasing the volume and diversity of the data available for model development.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5 Model generation\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis study divided the error detection step into three stages: (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e) error type classification, (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e) systematic error identification, and (\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e) random error identification. All training processes utilized an NVIDIA GeForce RTX 3070 GPU, with progress monitored through real-time training progress plots.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e \u003ch2\u003e2.5.1. Error type classification\u003c/h2\u003e \u003cp\u003e\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003eThe dataset was labeled into three categories: error-free, systematic MLC position errors, and random MLC position errors. Subsequently, the dataset was divided, with 90% (3,864 images) placed in a training dataset; these images were equally distributed among the three types (1,288 images each). The remaining 10% were placed in a testing dataset (420 images). A CNN was developed using MATLAB\u0026rsquo;s deep learning toolbox. The network started with a 256 \u0026times; 256 \u0026times; 1 grayscale image, which was processed by a convolutional layer with 16 filters of size 3 \u0026times; 3, followed by batch normalization, rectified linear unit (ReLU) activation, and max pooling (2 \u0026times; 2, stride 2 \u0026times; 2). A similar block with 32 filters was applied next, followed by another with 64 filters. 
The extracted features were fed into a fully connected layer with six neurons, followed by SoftMax activation and a final classification layer for three classes. The training process utilized the Adam optimizer with an initial learning rate of 0.001, 20 epochs, and a mini-batch size of 32. The data were shuffled at each epoch, and validation was conducted every five epochs using augmented data.\u003c/p\u003e\u003c/div\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section3\"\u003e \u003ch2\u003e2.5.2. Systematic error identification\u003c/h2\u003e \u003cp\u003e\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003eFor training, systematic error subtraction images were created by subtracting the predicted image of the original treatment plan from the measurement image generated under the error plan. Six systematic MLC position error magnitudes (+\u0026thinsp;1 mm, +\u0026thinsp;2 mm, +\u0026thinsp;5 mm, \u0026minus;\u0026thinsp;1 mm, \u0026minus;\u0026thinsp;2 mm, and \u0026minus;\u0026thinsp;5 mm) were introduced, forming a dataset classified into six error categories. Following augmentation, 8,568 images were used for systematic error identification. The dataset was subsequently split, with 90% (7,711 images) allocated for training and 10% (857 images) reserved for testing.\u003c/p\u003e\u003cp\u003eA CNN was developed using MATLAB\u0026rsquo;s deep learning toolbox. The model began with a 256 \u0026times; 256 \u0026times; 1 grayscale image input that was processed through eight convolutional layers\u0026mdash;each using a 3 \u0026times; 3 filter with 64 filters and the same padding\u0026mdash;followed by ReLU activation, which helped introduce nonlinearity. Finally, the extracted features passed through a fully connected layer with a single neuron before reaching the regression layer for the final output. 
Hyperparameter tuning was performed to enhance model performance using the Adam optimizer with an initial learning rate of 0.001, 20 epochs, and a mini-batch size of 32. The data were shuffled at each epoch, and validation was conducted every five epochs using the augmented images.\u003c/p\u003e\u003c/div\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003e2.5.3. Random error identification\u003c/h2\u003e \u003cp\u003e\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003eThe process was further divided into MLC position identification and magnitude identification. In the position identification step, the subtraction images were normalized by rescaling the pixel values to the range from 0 to 1 and adjusting the background values to approximately 0.5. Cluster labels were generated based on cutoffs defined by adding and subtracting 1.5 from the minimum and maximum pixel values. Cropped 10 \u0026times; 10-pixel clusters were mapped to MLC leaf positions on the Ethos machine. Cluster centroids determined which leaf contained the error. In the magnitude identification step, a dataset of 795 cropped images was labeled with error magnitudes and split into 90% for training and 10% for testing. The model took a 10 \u0026times; 10 \u0026times; 1 grayscale image as input and processed it through five convolutional layers, each with 64 filters of size 2 \u0026times; 2 and the same padding, and each immediately followed by ReLU activation to introduce nonlinearity.\u003c/p\u003e\u003cp\u003eAfter extracting the features, the model passed them through a fully connected layer with a single neuron, followed by a regression layer for the final output. Training was performed using the Adam optimizer with an initial learning rate of 0.001, 20 epochs, and a mini-batch size of 16.\u003c/p\u003e\u003c/div\u003e\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e2.6 Model evaluation\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe models\u0026rsquo; performance was evaluated using classification and regression metrics.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec12\" class=\"Section3\"\u003e \u003ch2\u003e2.6.1. Error type classification\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis study assessed the performance of the classification model using several key metrics defined by Equations (\u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) through (\u003cspan refid=\"Equ5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). Accuracy represents the proportion of correctly identified scenarios (both error and non-error) out of all investigated cases. Sensitivity measures the model\u0026rsquo;s ability to detect error scenarios, while specificity indicates how effectively the model identifies non-error scenarios. Precision calculates the proportion of correctly predicted positive instances among all instances labeled positive, and the F\u003csub\u003e1\u003c/sub\u003e score is the harmonic mean of precision and sensitivity, providing a balanced measure of the model\u0026rsquo;s performance. 
These metrics rely on four fundamental quantities: true positive (TP), true negative (TN), false positive (FP), and false negative (FN).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Equ1\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\text{Accuracy}=\\frac{TP+TN}{TP+TN+FP+FN}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equ2\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\text{Sensitivity}=\\frac{TP}{TP+FN}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equ3\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\text{Specificity}=\\frac{TN}{TN+FP}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equ4\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\text{Precision}=\\frac{TP}{TP+FP}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e \u003cdiv id=\"Equ5\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$F_{1}\\text{ score}=\\frac{2\\cdot\\text{Precision}\\cdot\\text{Sensitivity}}{\\text{Precision}+\\text{Sensitivity}}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section3\"\u003e 
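As an illustration of Equations (1) through (5) in the three-class setting, the following Python sketch derives TP, TN, FP, and FN for each class in a one-vs-rest manner from a confusion matrix. It is a minimal sketch, not the study's MATLAB code: the function name `one_vs_rest_metrics` and the matrix values are hypothetical. The matrix assumes 140 test images per class (420 in total) and was chosen only so that the printed percentages are consistent with those reported in Table 1.

```python
def one_vs_rest_metrics(confusion):
    """Per-class metrics from a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    Returns (overall_accuracy, list of per-class metric dicts).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    per_class = []
    for k in range(n):
        tp = confusion[k][k]                                  # class k predicted as k
        fn = sum(confusion[k]) - tp                           # class k predicted as another class
        fp = sum(confusion[r][k] for r in range(n)) - tp      # other classes predicted as k
        tn = total - tp - fn - fp                             # everything else
        sensitivity = tp / (tp + fn)                          # Eq. (2)
        specificity = tn / (tn + fp)                          # Eq. (3)
        precision = tp / (tp + fp)                            # Eq. (4)
        per_class.append({
            "accuracy": (tp + tn) / total,                    # Eq. (1), one-vs-rest
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "f1": 2 * precision * sensitivity / (precision + sensitivity),  # Eq. (5)
        })
    return correct / total, per_class


# Hypothetical confusion matrix (an assumption for illustration, not data from the study).
CLASSES = ["error-free", "systematic", "random"]
EXAMPLE_CM = [
    [139, 1, 0],    # true error-free
    [2, 138, 0],    # true systematic
    [9, 2, 129],    # true random
]

overall, per_class = one_vs_rest_metrics(EXAMPLE_CM)
print(f"overall accuracy: {100 * overall:.2f}%")
for name, m in zip(CLASSES, per_class):
    print(name, {key: round(100 * value, 2) for key, value in m.items()})
```

The sketch omits guards for empty classes or classes that are never predicted, where the denominators in Equations (2) through (5) would be zero.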
\u003ch2\u003e2.6.2. Systematic error identification\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe model\u0026rsquo;s performance in predicting continuous values was assessed using the mean absolute error (MAE), the mean squared error (MSE), the root mean squared error (RMSE), and R-squared. The MAE measures the average magnitude of prediction errors, providing a straightforward understanding of accuracy. By squaring the errors, the MSE emphasizes more significant deviations and is sensitive to outliers. The RMSE, as the square root of the MSE, offers an interpretable measure of the error spread in the same units as the target variable. R-squared quantifies the proportion of variance explained by the model, ranging from 0 (no fit) to 1 (perfect fit). Together, these metrics comprehensively evaluate the model\u0026rsquo;s prediction accuracy and reliability.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe formulas are as follows:\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$MAE=\\frac{1}{n}\\sum_{i=1}^{n}\\left|y_{i}-\\widehat{y}_{i}\\right|$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$$MSE=\\frac{1}{n}\\sum_{i=1}^{n}\\left(y_{i}-\\widehat{y}_{i}\\right)^{2}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ8\" name=\"EquationSource\"\u003e\n$$RMSE=\\sqrt{MSE}=\\sqrt{\\frac{1}{n}\\sum_{i=1}^{n}\\left(y_{i}-\\widehat{y}_{i}\\right)^{2}}$$\u003c/div\u003e\u003cdiv 
class=\"EquationNumber\"\u003e8\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ9\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ9\" name=\"EquationSource\"\u003e\n$$\\text{R-squared}=1-\\frac{\\sum_{i=1}^{n}\\left(y_{i}-\\widehat{y}_{i}\\right)^{2}}{\\sum_{i=1}^{n}\\left(y_{i}-\\bar{y}\\right)^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e9\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(n\\)\u003c/span\u003e \u003c/span\u003e = The number of observations\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(y_{i}\\)\u003c/span\u003e \u003c/span\u003e = The actual value for the \u0026#119894;th observation\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\widehat{y}_{i}\\)\u003c/span\u003e \u003c/span\u003e = The predicted value for the \u0026#119894;th observation\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\bar{y}\\)\u003c/span\u003e \u003c/span\u003e = The mean of actual values\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003e2.6.3. Random error identification\u003c/h2\u003e \u003cp\u003eThe performance of the position identification model was evaluated using accuracy, calculated by comparing the predicted MLC leaf positions with ground truth data. Accuracy was determined by dividing the number of correctly identified leaf positions by the total number of predictions across the four possible positions. 
The performance of the MLC magnitude identification model was assessed using regression metrics, including the MAE, MSE, RMSE, and R-squared, as described in the systematic error identification section.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Error type classification\u003c/h2\u003e \u003cp\u003eIn this study, a classification model for MLC position errors was developed and optimized through hyperparameter tuning. The model achieved optimal performance after 20 training epochs, completing 1,920 iterations (96 iterations per epoch) in 5 minutes and 1 second. Validation was conducted every five epochs, achieving a validation accuracy of 96.0%. The model successfully categorized errors into three types: error-free, systematic MLC position errors, and random MLC position errors. The results from the independent dataset indicate the model\u0026rsquo;s strong reliability in distinguishing MLC position errors, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo assess the model\u0026rsquo;s performance, key metrics\u0026mdash;including accuracy, sensitivity, specificity, precision, and F\u003csub\u003e1\u003c/sub\u003e score\u0026mdash;were evaluated and are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The model achieved an overall accuracy of 96.67%, with high classification performance across all error types. For the error-free category, the model exhibited 97.14% accuracy, 99.29% sensitivity, 96.07% specificity, and 92.67% precision and had an F\u003csub\u003e1\u003c/sub\u003e score of 95.86%. 
The systematic MLC position error classification showed an accuracy of 98.81%, a sensitivity of 98.57%, a specificity of 98.93%, a precision of 97.87%, and an F\u003csub\u003e1\u003c/sub\u003e score of 98.22%. Similarly, the random MLC position error category demonstrated 97.38% accuracy, 92.14% sensitivity, 100.00% specificity, 100.00% precision, and an F\u003csub\u003e1\u003c/sub\u003e score of 95.91%.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eError type classification performance\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eError type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colspan=\"5\" nameend=\"c7\" namest=\"c3\"\u003e \u003cp\u003ePerformance metrics (%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy (overall)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003eAccuracy (individual)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSensitivity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eF\u003csub\u003e1\u003c/sub\u003e Score\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eError-free\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e96.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e97.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e99.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e96.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e92.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e95.86\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSystematic MLC position errors\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e98.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e98.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e98.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e97.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e98.22\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom MLC position errors\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e97.38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e92.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e100.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e100.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e95.91\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Systematic error identification\u003c/h2\u003e \u003cp\u003eThis study optimized the systematic error identification model by adjusting the hyperparameters based on the training RMSE values. The model was trained over 20 epochs, completing 1,920 iterations (96 iterations per epoch) in 216 minutes and 39 seconds. Validation was conducted every five epochs, yielding a validation RMSE of 0.352. The model predicted the magnitude of systematic errors, with performance evaluated using the MAE, MSE, RMSE, and R-squared. The results were 1.082, 1.954, 1.398, and 0.804, respectively.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Random error identification\u003c/h2\u003e \u003cp\u003eThe MLC position identification model identified MLC positions at which random errors occurred, evaluating performance based on accuracy. Of the 795 samples, 708 predictions matched the ground truth, resulting in an accuracy of 0.89.\u003c/p\u003e \u003cp\u003eThe MLC magnitude identification model was optimized by tuning the hyperparameters based on the training RMSE values. The model was trained over 40 epochs, completing 1,400 iterations (35 per epoch) in 21 minutes and 3 seconds. Validation was conducted every five epochs, yielding a validation RMSE of 0.995. 
The model was designed to predict error magnitudes, with performance metrics assessed for regression tasks. The MAE, MSE, RMSE, and R-squared results were 0.818, 1.010, 1.005, and 0.294, respectively, indicating that the model explained only a limited share of the variance in random error magnitudes.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eThis study demonstrates the effectiveness of a deep learning-based PSQA framework for classifying and quantifying systematic and random MLC position errors in IMRT. By leveraging CNNs trained on subtraction image\u0026ndash;based data, our model achieved a classification accuracy of 96.67% and, unlike conventional gamma analysis, which provides only a pass/fail assessment, indicates the likely type of error underlying a discrepancy. Furthermore, our model successfully quantified systematic error magnitudes, with a strong correlation (R-squared\u0026thinsp;=\u0026thinsp;0.804) between the predicted and actual values, while random error detection reached 89% accuracy. However, the lower R-squared value (0.294) for random error magnitude estimation suggests that further refinements are necessary to enhance predictive performance for clinical applications.\u003c/p\u003e \u003cp\u003eTraditional gamma analysis remains the standard in PSQA, but its reliance on a binary pass/fail criterion limits its ability to distinguish between spatial and dosimetric discrepancies. Because dose difference (DD) and distance-to-agreement (DTA) are blended into a single score, gamma maps with 2%/2 mm criteria can fail to detect clinically significant deviations, particularly systematic MLC positioning errors up to 2 mm (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e). 
Our approach, which employs subtraction-based images instead of gamma maps, preserves pixel-wise dose differences, improving model sensitivity to spatial and dosimetric discrepancies. This enhances classification accuracy by providing continuous-valued pixel intensities, enabling CNN models to learn complex spatial patterns and effectively differentiate systematic from random errors.\u003c/p\u003e \u003cp\u003eThe performance of our model aligns with previous studies demonstrating the value of deep learning\u0026ndash;based methods in PSQA. For instance, Kimura et al. (\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e) reported 98.6% accuracy for systematic errors and 84.7% for random errors, while our model achieved 92.14% sensitivity and 100% specificity for random errors. Similarly, while Sakai et al. (\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e) reported 100% sensitivity and 81.80% specificity for MLC errors using radiomics-based learning (MLC position error vs. error-free with the MLC position error model), our model attained 98.57% sensitivity and 98.93% specificity for systematic errors, suggesting greater classification reliability.\u003c/p\u003e \u003cp\u003eNyflot et al. (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e) trained deep learning models on gamma maps, achieving 77.3% accuracy for binary classification and only 64.3% for multi-class classification, both considerably lower than our 96.67% accuracy for three-class classification. This disparity suggests an advantage of subtraction image\u0026ndash;based CNN models, which retain spatial information more effectively than gamma map\u0026ndash;based training. Additionally, Potter et al. (\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e) developed a dual neural network for IMRT QA, achieving 95.3% accuracy for spatial errors. 
While our classification performance is comparable, our model\u0026rsquo;s ability to quantify both systematic and random MLC errors provides an additional advantage in treatment verification and real-time clinical decision-making. Finally, findings by Wolfs et al. (2020) demonstrate that CNNs trained on EPID dosimetry data can detect and quantify treatment errors\u0026mdash;a capability that aligns closely with our model\u0026rsquo;s strengths in systematic error quantification. Together, these comparisons illustrate that deep learning models, especially those based on CNNs, provide significant advantages in addressing spatially complex errors beyond the reach of conventional gamma analysis.\u003c/p\u003e \u003cp\u003eThe classification model effectively distinguished between error-free, systematic, and random MLC errors, demonstrating high sensitivity, specificity, and precision across all categories. Our model attained 99.29% sensitivity for error-free treatment plans, ensuring the accurate detection of plans without MLC errors. For systematic MLC position errors, the model achieved 98.57% sensitivity and 98.93% specificity, confirming its strong classification reliability. Meanwhile, random MLC position errors reached 92.14% sensitivity and 100.00% specificity, demonstrating precise error isolation. These results underscore the robustness of CNN-based PSQA, providing a significant improvement over gamma analysis, which lacks the specificity to differentiate between systematic and random deviations. 
Furthermore, the systematic error identification model achieved an R-squared value of 0.804, indicating a strong correlation between the predicted and actual error magnitudes and making it a potentially useful clinical tool for refining treatment plan accuracy.\u003c/p\u003e \u003cp\u003eAlthough random error detection reached 89% accuracy, the lower R-squared value (0.294) for estimating the magnitudes of random errors suggests greater variability in random error patterns, making precise quantification more challenging. This finding aligns with Sakai et al. (\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e), who observed that radiomics-based learning models struggled with multiple concurrent errors, emphasizing the need for further optimization in spatial error modeling. Despite these challenges, our model demonstrates strong performance in systematic and random error classification, supporting its potential to improve PSQA automation and treatment safety. In the context of random MLC position errors, our method leverages a mapping strategy that capitalizes on the double-layer configuration of the MLC, which inherently provides four possible leaf positions. Our approach handles this added complexity by initially mapping all four MLC positions and then refining the result to identify and isolate the specific single leaf exhibiting an error. By focusing on a one-leaf identification strategy, the method reduces ambiguity and enhances the reliability of error detection, helping to capture even subtle random deviations. 
This targeted approach not only simplifies the interpretation of positional discrepancies but also bolsters the overall sensitivity of the CNN model in detecting clinically significant errors, paving the way for more precise and effective quality assurance in IMRT treatments.\u003c/p\u003e \u003cp\u003eFuture research should focus on clinical validation using real patient data to ensure generalizability across different treatment centers and equipment configurations. Additionally, integrating additional error sources, such as gantry misalignment, measurement setup errors, and dose calculation inaccuracies, will expand the model\u0026rsquo;s robustness. Enhancing combination error detection, in which multiple deviations interact simultaneously, will further improve clinical decision-making and treatment plan verification. Furthermore, hybrid AI approaches incorporating radiomics-based feature extraction and 4D dose distribution modeling could enhance real-time error detection sensitivity, enabling automated PSQA workflows. Given its computational efficiency, our method holds strong potential for clinical integration, reducing manual verification time while improving treatment accuracy and patient safety.\u003c/p\u003e \u003cp\u003eThis study confirmed that deep learning-based PSQA significantly enhances IMRT error classification and quantification, surpassing gamma analysis in sensitivity, specificity, and clinical interpretability. Compared to previous methods, our CNN model achieved higher classification accuracy, improved specificity for systematic and random errors, and precise error magnitude estimation. 
These findings align with prior research (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan additionalcitationids=\"CR29\" citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e), reinforcing the growing need for AI-driven automation in radiotherapy QA.\u003c/p\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eIn conclusion, this study demonstrated the potential of a deep learning approach to enhance EPID-based PSQA by accurately classifying and quantifying both systematic and random MLC errors. This capability could significantly improve the efficiency, precision, and reliability of PSQA processes, leading to better treatment planning and execution. By improving detection sensitivity and providing insight into the causes of plan failure, deep learning models could complement or potentially replace gamma-based QA methods, ultimately enhancing the safety and efficacy of radiation therapy. Further research should focus on increasing model adaptability across different treatment protocols, improving the detection of subtle combination errors, and integrating hybrid approaches that combine radiomics, fluence map analysis, and dose distribution data.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eConflicts of interest\u003c/h2\u003e \u003cp\u003eThere is no conflict of interest with regard to this manuscript.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eAuthor Contributions Statement\u003c/h2\u003e \u003cp\u003eC.K. and T.F. conceptualized and designed the study. C.K., N.C., and T.F. provided consultation. S.J. and T.C. conducted data collection. S.J. performed model generation and data analysis. C.K. 
wrote the main manuscript text. K.N. and S.J. prepared figures and tables. All authors reviewed and approved the final manuscript.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis research was funded by the Thailand Institute of Nuclear Technology (Public Organization).\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eWe thank Ioana Gianina Buda, PhD from Scribendi (www.scribendi.com) for editing a draft of this manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eKouklidis G, Nikolopoulos M, Ahmed O, Eskander B, Masters B. A Retrospective Comparison of Toxicity, Response and Survival of Intensity-Modulated Radiotherapy Versus Three-Dimensional Conformal Radiation Therapy in the Treatment of Rectal Carcinoma. Cureus [Internet]. 2023 Nov 2 [cited 2025 Feb 7];15(11). Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/37929269/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/37929269/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNg SY, Colborn KL, Cambridge L, Hajj C, Yang TJ, Wu AJ et al. Acute toxicity with intensity modulated radiotherapy versus 3-dimensional conformal radiotherapy during preoperative chemoradiation for locally advanced rectal cancer. Radiother Oncol [Internet]. 2016 Nov 1 [cited 2025 Feb 7];121(2):252\u0026ndash;7. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/27751605/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/27751605/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWee CW, Kang HC, Wu HG, Chie EK, Choi N, Park JM et al. Intensity-modulated radiotherapy versus three-dimensional conformal radiotherapy in rectal cancer treated with neoadjuvant concurrent chemoradiation: a meta-analysis and pooled-analysis of acute toxicity. Jpn J Clin Oncol [Internet]. 2018 May 1 [cited 2025 Feb 7];48(5):458\u0026ndash;66. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/29554287/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/29554287/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJabbour SK, Patel S, Herman JM, Wild A, Nagda SN, Altoos T et al. Intensity-Modulated Radiation Therapy for Rectal Carcinoma Can Reduce Treatment Breaks and Emergency Department Visits. Int J Surg Oncol [Internet]. 2012 [cited 2025 Feb 7];2012:891067. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pmc.ncbi.nlm.nih.gov/articles/PMC3425793/\u003c/span\u003e\u003cspan address=\"https://pmc.ncbi.nlm.nih.gov/articles/PMC3425793/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCho B. Intensity-modulated radiation therapy: a review with a physics perspective. Radiat Oncol J [Internet]. 2018 Mar 1 [cited 2025 Feb 7];36(1):1\u0026ndash;10. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.e-roj.org/journal/view.php?doi=10.3857/roj.2018.00122\u003c/span\u003e\u003cspan address=\"http://www.e-roj.org/journal/view.php?doi=10.3857/roj.2018.00122\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWebb S. Intensity-modulated radiation therapy. Intensity-Modulated Radiation Therapy [Internet]. 2015 Jan 1 [cited 2025 Feb 7];1\u0026ndash;422. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.taylorfrancis.com/books/mono/10.1201/9781420034110/intensity-modulated-radiation-therapy-webb\u003c/span\u003e\u003cspan address=\"https://www.taylorfrancis.com/books/mono/10.1201/9781420034110/intensity-modulated-radiation-therapy-webb\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEzzell GA, Burmeister JW, Dogan N, Losasso TJ, Mechalakos JG, Mihailidis D et al. IMRT commissioning: multiple institution planning and dosimetry comparisons, a report from AAPM Task Group 119. Med Phys [Internet]. 2009 [cited 2025 Feb 7];36(11):5359\u0026ndash;73. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/19994544/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/19994544/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiften M, Olch A, Mihailidis D, Moran J, Pawlicki T, Molineu A et al. Tolerance limits and methodologies for IMRT measurement-based verification QA: Recommendations of AAPM Task Group No. 218. Med Phys [Internet]. 2018 Apr 1 [cited 2025 Feb 7];45(4):e53\u0026ndash;83. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://onlinelibrary.wiley.com/doi/full/\u003c/span\u003e\u003cspan address=\"https://onlinelibrary.wiley.com/doi/full/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/mp.12810\u003c/span\u003e\u003cspan address=\"10.1002/mp.12810\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDogan N, Mijnheer BJ, Padgett K, Nalichowski A, Wu C, Nyflot MJ et al. AAPM Task Group Report 307: Use of EPIDs for Patient-Specific IMRT and VMAT QA. Med Phys [Internet]. 2023 Aug 1 [cited 2025 Feb 7];50(8):e865\u0026ndash;903. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/37384416/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/37384416/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLow DA, Harms WB, Mutic S, Purdy JA. A technique for the quantitative evaluation of dose distributions. Med Phys. 1998;25(5):656\u0026ndash;61.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu L, Tang TLS, Cassim N, Livingstone A, Cassidy D, Kairn T et al. Analysis of dose comparison techniques for patient-specific quality assurance in radiation therapy. J Appl Clin Med Phys [Internet]. 2019 Nov 1 [cited 2025 Feb 7];20(11):189\u0026ndash;98. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/31613053/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/31613053/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDas S, Kharade V, Pandey VP, Kv A, Pasricha RK, Gupta M. Gamma Index Analysis as a Patient-Specific Quality Assurance Tool for High-Precision Radiotherapy: A Clinical Perspective of Single Institute Experience. Cureus [Internet]. 2022 Oct 31 [cited 2025 Feb 7];14(10):e30885. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pmc.ncbi.nlm.nih.gov/articles/PMC9626372/\u003c/span\u003e\u003cspan address=\"https://pmc.ncbi.nlm.nih.gov/articles/PMC9626372/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNelms BE, Zhen H, Tomé WA. Per-beam, planar IMRT QA passing rates do not predict clinically relevant patient dose errors. Med Phys [Internet]. 2011 Feb 1 [cited 2025 Feb 7];38(2):1037\u0026ndash;44. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://onlinelibrary.wiley.com/doi/full/10.1118/1.3544657\u003c/span\u003e\u003cspan address=\"https://onlinelibrary.wiley.com/doi/full/10.1118/1.3544657\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStasi M, Bresciani S, Miranti A, Maggio A, Sapino V, Gabriele P. Pretreatment patient-specific IMRT quality assurance: A correlation study between gamma index and patient clinical dose volume histogram. Med Phys [Internet]. 2012 Dec 1 [cited 2025 Feb 7];39(12):7626\u0026ndash;34. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://onlinelibrary.wiley.com/doi/full/\u003c/span\u003e\u003cspan address=\"https://onlinelibrary.wiley.com/doi/full/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1118/1.4767763\u003c/span\u003e\u003cspan address=\"10.1118/1.4767763\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan G, Liu C, Simon TA, Peng LC, Fox C, Li JG. On the sensitivity of patient-specific IMRT QA to MLC positioning errors. J Appl Clin Med Phys [Internet]. 2009 [cited 2025 Feb 7];10(1):120. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pmc.ncbi.nlm.nih.gov/articles/PMC5720508/\u003c/span\u003e\u003cspan address=\"https://pmc.ncbi.nlm.nih.gov/articles/PMC5720508/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNakamura S, Sakai M, Ishizaka N, Mayumi K, Kinoshita T, Akamatsu S et al. Deep learning-based detection and classification of multi-leaf collimator modeling errors in volumetric modulated radiation therapy. J Appl Clin Med Phys [Internet]. 2023 Dec 1 [cited 2025 Feb 7];24(12). Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/37633834/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/37633834/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMa C, Wang R, Zhou S, Wang M, Yue H, Zhang Y, et al. The structural similarity index for IMRT quality assurance: radiomics-based error classification. Med Phys. 
2021;48(1):80\u0026ndash;93.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKimura Y, Kadoya N, Tomori S, Oku Y, Jingu K. Error detection using a convolutional neural network with dose difference maps in patient-specific quality assurance for volumetric modulated arc therapy. Physica Med. 2020;73:57\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi G, Duan L, Xie L, Hu T, Wei W, Bai L et al. Deep learning for patient-specific quality assurance of volumetric modulated arc therapy: Prediction accuracy and cost-sensitive classification performance. Physica Med. 2024;125.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNyflot MJ, Thammasorn P, Wootton LS, Ford EC, Chaovalitwongse WA. Deep learning for patient-specific quality assurance: Identifying errors in radiotherapy delivery by radiomic analysis of gamma images with convolutional neural networks. Med Phys. 2019;46(2):456\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOno T, Iramina H, Hirashima H, Adachi T, Nakamura M, Mizowaki T. Applications of artificial intelligence for machine- and patient-specific quality assurance in radiation therapy: Current status and future directions. J Radiat Res. 2024;65(4):421\u0026ndash;32.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOsman AFI, Maalej NM, Jayesh K. Prediction of the individual multileaf collimator positional deviations during dynamic IMRT delivery priori with artificial neural network. Med Phys. 2020;47(4):1421\u0026ndash;30.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSakai M, Nakano H, Kawahara D, Tanabe S, Takizawa T, Narita A, et al. Detecting MLC modeling errors using radiomics-based machine learning in patient-specific QA with an EPID for intensity-modulated radiation therapy. Med Phys. 2021;48(3):991\u0026ndash;1002.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWootton LS, Nyflot MJ, Chaovalitwongse WA, Ford E. 
Error Detection in Intensity-Modulated Radiation Therapy Quality Assurance Using Radiomic Analysis of Gamma Distributions. Int J Radiat Oncol Biol Phys. 2018;102(1):219\u0026ndash;28.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi Z, Liu F, Yang W, Peng S, Zhou J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans Neural Netw Learn Syst. 2022;33(12):6999\u0026ndash;7019.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021;8(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShen C, Nguyen D, Zhou Z, Jiang SB, Dong B, Jia X. An introduction to deep learning in medical physics: Advantages, potential, and challenges. Phys Med Biol. 2020;65(5).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePotter NJ, Mund K, Andreozzi JM, Li JG, Liu C, Yan G. Error detection and classification in patient-specific IMRT QA with dual neural networks. Med Phys. 2020;47(10):4711\u0026ndash;20.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWolfs CJA, Canters RAM, Verhaegen F. Identification of treatment error types for lung cancer patients using convolutional neural networks and EPID dosimetry. Radiother Oncol. 2020;153:243\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKimura Y, Kadoya N, Oku Y, Jingu K. Development of a deep learning-based error detection system without error dose maps in the patient-specific quality assurance of volumetric modulated arc therapy. J Radiat Res. 2023;64(4):728\u0026ndash;37.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKimura Y, Kadoya N, Oku Y, Kajikawa T, Tomori S, Jingu K. 
Error detection model developed using a multi-task convolutional neural network in patient-specific quality assurance for volumetric-modulated arc therapy. Med Phys. 2021;48(9):4769\u0026ndash;83.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSheen H, Shin HB, Kim H, Kim C, Kim J, Kim JS et al. Application of error classification model using indices based on dose distribution for characteristics evaluation of multileaf collimator position errors. Sci Rep. 2023;13(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang L, Li J, Zhang S, Zhang X, Zhang Q, Chan MF et al. Multi-task autoencoder based classification-regression model for patient-specific VMAT QA. Phys Med Biol. 2020;65(23).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStedem AK, Tutty M, Chofor N, Langhans M, Kleefeld C, Sch\u0026ouml;nfeld AA. Systematic evaluation of spatial resolution and gamma criteria for quality assurance with detector arrays in stereotactic radiosurgery. J Appl Clin Med Phys [Internet]. 2024 Feb 1 [cited 2025 Feb 21];25(2):e14274. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://onlinelibrary.wiley.com/doi/full/\u003c/span\u003e\u003cspan address=\"https://onlinelibrary.wiley.com/doi/full/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/acm2.14274\u003c/span\u003e\u003cspan address=\"10.1002/acm2.14274\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Patient-specific quality assurance, error detection, MLC positional errors, deep learning, electronic portal imaging device","lastPublishedDoi":"10.21203/rs.3.rs-6231733/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6231733/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003ePurpose\u003c/h2\u003e \u003cp\u003eThis study presents a deep learning\u0026ndash;based patient-specific quality assurance (PSQA) framework for rectal cancer intensity-modulated radiation therapy (IMRT) designed to classify and quantify multileaf collimator (MLC) position errors.\u003c/p\u003e\u003ch2\u003eMaterials and Methods\u003c/h2\u003e \u003cp\u003eThirty rectal IMRT treatment plans were analyzed, and both systematic and random MLC errors were deliberately introduced by modifying the digital imaging and communications in medicine - radiation therapy plan files. The framework utilizes convolutional neural networks (CNNs) trained on subtraction images generated from electronic portal imaging device\u0026ndash;acquired and portal dose image prediction\u0026ndash;predicted dose distributions. One CNN was developed to categorize plans based on the associated errors into three groups: error-free, systematic errors, and random errors. 
In parallel, regression-based CNN models were created to estimate the magnitude of the detected errors.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThe classification network achieved an overall accuracy of 96.67%, with excellent sensitivity and specificity across all categories. For systematic error estimation, the regression model produced a mean absolute error of 1.082 and a strong R-squared of 0.804, indicating precise quantification capability. In contrast, the random error model reached an accuracy of 89.00% but had a lower R-squared of 0.294, highlighting an area for future improvement.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eThese findings suggest that deep learning models can offer more detailed and quantitative insights into treatment errors compared to traditional gamma analysis, ultimately enhancing PSQA processes and contributing to improved treatment verification and patient safety.\u003c/p\u003e","manuscriptTitle":"Deep Learning-Based Multileaf Collimator Error Classification and Quantification in Patient- Specific Intensity Modulated Radiation Therapy Quality Assurance","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-01 05:42:56","doi":"10.21203/rs.3.rs-6231733/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e8b97812-55d3-448a-90e0-9f100ae52248","owner":[],"postedDate":"April 1st, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-04-24T13:53:13+00:00","versionOfRecord":[],"versionCreatedAt":"2025-04-01 05:42:56","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6231733","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6231733","identity":"rs-6231733","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}