Research on lung nodule detection in X-ray plain films based on improved YOLOv12 algorithm

Minghui Mao, Chengkun Hong, Yuhang Zhang, Hao Huang, Jianfeng Chu, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8670280/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 07 Apr, 2026; the published version is in Scientific Reports.
Version 1 posted 13
You are reading this latest preprint version.

Abstract

To investigate the feasibility of automatic lung nodule detection using chest X-rays, this study proposes an improved YOLOv12 algorithm based on space-to-depth convolution (SPDConv), a dynamic upsampling module (DySample), and a one-shot aggregation cross stage partial network with ghost convolution (VoVGSCSP).
The original YOLOv12 algorithm was optimized by replacing specific convolutional layers in the Backbone and Neck with SPDConv, substituting the Upsample modules in the Neck with upgraded DySample modules, and replacing the C3k2 and A2C2f modules in the Neck with VoVGSCSP to construct the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm. The optimized algorithm was trained and validated using a public chest X-ray lung nodule dataset available on the Roboflow platform, and its performance was compared with that of the original YOLOv12 algorithm. Results indicate that the improved algorithm achieved a mean average precision at an intersection over union threshold of 0.5 (mAP50) of 0.735 and a mAP50-95 of 0.426 in detecting lung nodules on chest X-rays. These results outperformed the original YOLOv12 algorithm, which achieved a mAP50 of 0.704 and a mAP50-95 of 0.411. In conclusion, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm demonstrates superior overall performance in detecting lung nodules on chest X-rays, significantly surpassing the original YOLOv12 algorithm.

Subject areas: Biological sciences/Computational biology and bioinformatics; Physical sciences/Engineering; Physical sciences/Mathematics and computing

Keywords: YOLOv12, SPDConv, DySample, VoVGSCSP, Chest X-ray, Lung nodules

Introduction

Lung cancer is a prevalent malignancy worldwide, characterized by a high mortality rate and an incidence that increases progressively with age 1 , 2 . As a critical indicator of early-stage lung cancer, the early diagnosis of lung nodules can significantly improve patient survival rates 3 . Traditionally, lung nodule detection relies on radiologists visually inspecting chest X-ray images 4 – 6 . However, with the increasing demand for medical imaging examinations, the workload on radiologists has escalated correspondingly 7 – 9 .
Furthermore, lung nodules in medical images typically present with small volumes and highly diverse morphologies. These factors complicate detection, thereby increasing the risk of misdiagnosis or missed diagnosis 10 , 11 . In recent years, deep learning methods have developed rapidly and have been widely applied in the medical field 12 – 17 . The application of computer algorithms for the automatic detection of lung nodules in chest X-rays is of great significance. Numerous deep learning algorithms have achieved remarkable progress in the field of lung nodule detection using chest X-rays 18 . For instance, in 2023, Horry et al. 19 proposed a lung nodule detection algorithm based on deep convolutional neural networks (DCNN). Taking chest X-rays as input, this algorithm optimized nodule localization through an end-to-end network architecture, achieving state-of-the-art performance. It significantly improved accuracy and reliability, demonstrating an external generalization accuracy of 89% and validating its efficacy in medical image analysis for lung disease detection. In the same year, Lim et al. 20 introduced a deep-learning-based automatic detection algorithm (DLAD) for lung nodule detection and mass volume estimation. By utilizing convolutional neural networks (CNN) to generate localization maps and derived parameters (such as regional area and mean probability), they established a volume prediction algorithm combined with univariate or multivariate regression analysis. On a dataset of 147 X-rays and 208 nodules, they achieved precise detection and quantitative volume estimation. The multivariate regression model achieved a root mean square error (RMSE) of 7975.6 mm³, verifying its precision in analyzing lung nodules on chest X-rays. 
In the field of object detection, two-stage algorithms based on region proposals, such as the faster region-based convolutional neural network (Faster R-CNN), and one-stage end-to-end algorithms, such as the you only look once (YOLO) series, constitute the current mainstream algorithmic frameworks. The core difference between these two categories lies in the design of the detection process. Two-stage algorithms first generate potential target regions via a region proposal network (RPN) and then refine classification and localization on candidate boxes. Conversely, one-stage algorithms complete joint regression for target localization and classification directly on feature maps, achieving a highly integrated detection process. This architectural distinction grants the former an advantage in detection accuracy, while the latter excels in real-time performance. Together, they support object detection requirements across various research fields. Although two-stage algorithms used in current research demonstrate high detection accuracy, they suffer from drawbacks such as complex algorithmic structures, high training costs, and slow detection speeds. Additionally, they often require large amounts of data 21 – 23 . In this study, the one-stage algorithm employed offers fast detection speeds but is slightly inferior in accuracy. Nevertheless, by optimizing the network structure and training strategies of the YOLO algorithm, it is possible to achieve high detection accuracy while meeting various application requirements. In 2022, Chiu et al. 24 proposed a lung nodule detection algorithm based on you only look once version 4 (YOLOv4). Through multiple preprocessing methods and ensemble model optimization, they achieved advanced performance in lung nodule detection on chest X-ray datasets, significantly improving sensitivity and reliability. The algorithm achieved a sensitivity of 84.4% on the external JSRT dataset. 
However, the relatively older YOLOv4 version used may lack optimization for complex scenarios and show poor adaptability to new data types. The YOLOv12 algorithm utilized in this study, a newer iteration in the YOLO series, demonstrates superior performance in terms of both precision and efficiency in object detection. Nonetheless, in the detection of lung nodules on chest X-rays, several factors may adversely affect the performance of YOLOv12. These include the high morphological diversity of nodules, low contrast with surrounding normal lung tissue, blood vessels, and ribs, the masking of small nodules by noise and artifacts, and image quality variations caused by different patient positioning and exposure parameters. To address these issues, this study constructs the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm by replacing specific convolutional layers in the Backbone and Neck with SPDConv, substituting Upsample modules in the Neck with upgraded DySample modules, and replacing C3k2 and A2C2f in the Neck with VoVGSCSP. The improved algorithm enhances the performance of lung nodule detection on chest X-rays to a certain extent. It can more acutely identify tiny and complex-shaped nodules, reducing the occurrence of misdiagnosis and missed diagnosis, thereby playing a significant role in promoting early clinical screening and diagnosis of lung nodules.

Original YOLOv12 Algorithm

As a recent iteration of the YOLO series, the YOLOv12 algorithm retains the end-to-end direct prediction paradigm. It achieves simultaneous object classification and bounding box localization for all targets within an image through a single forward pass 25 – 27 . As illustrated in Fig. 1 , the overall architecture of the algorithm adheres to the classic three-stage framework, comprising a Backbone, a Neck, and a Head 28 . Specifically, the Backbone is utilized for feature extraction 29 .
By stacking standard convolutions and pooling layers for progressive downsampling, it generates feature maps rich in semantic information. The Neck serves as a hub for feature fusion 30 , producing composite representations with enhanced discriminative power. The Head is responsible for precisely outputting target class probabilities and bounding box coordinates. The YOLOv12 algorithm inherits the C3k2 module from YOLOv11 31 , which facilitates dynamic structural switching via a Boolean parameter, C3k. When C3k is set to False, the C3k2 module functions as a C2f structure (a standard bottleneck layer). Conversely, when C3k is set to True, a more complex C3 module structure is activated to enhance feature fusion capabilities. However, YOLOv12 primarily relies on standard convolutions for feature extraction. This architectural choice limits the receptive field, resulting in insufficient capture of long-range contextual information and restricted detection performance for small targets. In addition, the original upsampling operations in the Neck of YOLOv12 rely on fixed interpolation rules. These methods struggle to dynamically optimize based on actual image content, often leading to blurred edge information or the loss of fine-grained features in feature maps. Furthermore, while the C3k2 and A2C2f modules in the Neck improve accuracy, they introduce high computational and memory overheads, thereby raising the hardware requirements for deployment. Consequently, there remains room for improvement in the comprehensive performance of the YOLOv12 algorithm.

Algorithm Improvements

SPDConv Module

The core technology of the SPDConv module is space-to-depth reorganization 32 . It is an innovative module that rearranges the spatial and channel dimension structures of feature maps to achieve downsampling while avoiding information loss 33 .
First, it rearranges and compresses the input feature map in the spatial dimension, efficiently aggregating and transferring local spatial feature information, originally distributed across height and width dimensions, to the channel dimension. This method not only avoids the spatial information compression and detail loss caused by standard convolution downsampling operations but also further filters redundant background noise and enhances the saliency of target-related features through the fusion and interaction of multi-dimensional features between channels. Ultimately, by fully preserving the rich semantic and detailed information of the original feature map, the module significantly improves the algorithm's ability to recognize and perceive small-scale targets, weak boundary features, and blurry visual patterns. Thus, it effectively addresses the limitations of the YOLOv12 algorithm in small object detection tasks. As shown in Fig. 2, the SPDConv module implements a novel feature reconstruction mechanism by combining spatial dimension compression with channel dimension expansion. First, the input feature map X 1 is divided into multiple regions, and sub-features are subsequently concatenated along the channel axis to generate X 2 . This space-to-channel transformation mechanism not only reduces the resolution of the feature map but also significantly increases the number of channels. Consequently, it effectively maintains and strengthens the integrity of key spatial positions and structural information in the original data while reducing the computational burden. In addition, through dimensional restructuring, this mechanism adapts to suppress background noise irrelevant to the target, thereby improving the signal-to-noise ratio for subsequent feature learning. Subsequently, to enhance feature expression performance, the SPDConv module employs a non-strided convolution (with a stride of 1) to perform feature extraction on the reorganized features. 
This approach prevents the loss of spatial information often caused by strided convolutions. Using \(C_{2}\) filters, the deep features following spatial reorganization can be further refined. Ultimately, a resultant feature map of size \(\left(\frac{L}{2},\frac{W}{2},C_{2}\right)\) is generated, which effectively resolves the difficulty in distinguishing between small targets and background features. Compared with standard convolutions, the SPDConv module reduces the number of redundant channels and optimizes the feature expression structure while avoiding spatial information loss associated with strided convolutions. Furthermore, the convolutional layers within the module possess highly learnable parameter sets capable of automatically adjusting and optimizing feature representations based on training data. This enables more precise capture of fine structures, edge information, and local patterns within the image. These characteristics allow SPDConv to maintain high detection accuracy and robustness even when facing low-resolution inputs, dense small targets, or heavily occluded scenes, thereby significantly improving the overall performance of the algorithm in small object detection tasks.

To ensure stable convergence of the SPDConv module during training and to synergistically optimize classification precision and localization accuracy in the detection task, the module utilizes a multi-objective loss mechanism that integrates classification loss and bounding box regression loss. The joint loss function is defined as follows:

$$L = L_{cls} + \lambda L_{bbox} \tag{1}$$

where \(L_{cls}\) represents the classification loss; \(L_{bbox}\) represents the bounding box regression loss; and \(\lambda\) denotes the balancing coefficient.
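The space-to-depth reorganization and the subsequent non-strided convolution described for SPDConv can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `space_to_depth` and `pointwise_conv`, the channel-first layout, the tensor sizes, and the use of a 1×1 kernel for the stride-1 convolution are all assumptions made for illustration.

```python
import numpy as np


def space_to_depth(x, scale=2):
    """Rearrange (C, H, W) into (C * scale**2, H // scale, W // scale).

    Every input value is kept; spatial detail moves into the channel axis.
    """
    _, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("height and width must be divisible by scale")
    # One sub-map per (row offset, column offset) pair, concatenated on channels.
    subs = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(subs, axis=0)


def pointwise_conv(x, weight):
    """Stride-1 1x1 convolution; weight has shape (C2, C_in), spatial size is unchanged."""
    return np.einsum("oc,chw->ohw", weight, x)


# Hypothetical sizes: C1 = 3 input channels, an 8 x 8 map, C2 = 16 filters.
rng = np.random.default_rng(0)
x1 = rng.standard_normal((3, 8, 8))
x2 = space_to_depth(x1)  # channels x4, height and width halved
y = pointwise_conv(x2, rng.standard_normal((16, 12)))
print(x2.shape, y.shape)  # (12, 4, 4) (16, 4, 4)
```

In this sketch the downsampling step discards nothing: the 3 × 8 × 8 input and the 12 × 4 × 4 intermediate map hold the same 192 values, and only the stride-1 convolution changes the channel count to \(C_{2}\).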
In summary, the design of the SPDConv module focuses on addressing critical issues common in lung nodule detection on chest X-rays, such as small target lesions, background contrast interference, and vague feature expression. By optimizing the organization of spatial information, enhancing the preservation of feature integrity, and precisely capturing the detailed features of tiny targets, this module achieves a significant improvement in feature representation capability, thereby effectively increasing the accuracy of the lung nodule detection task on chest X-rays.

DySample Module

DySample is a highly efficient dynamic upsampling method whose core innovation lies in the adoption of a dynamic point sampling mechanism to perform the upsampling process 34 – 36 . Due to its simple structure and clear logic, this method facilitates easy deployment within the PyTorch framework. As shown in Fig. 3 , this method operates based on the principle of sampling-based dynamic upsampling, intuitively demonstrating its working mechanism. DySample generates sampling points using both static and dynamic scope factors. The feature map X is processed through a linear layer to generate a feature map of the corresponding size. Subsequently, the pixel shuffle technique is combined with the scope factor to generate the offsets O. Finally, these offsets are added to the original grid positions G to obtain the sampling set S, as illustrated in Fig. 4 . Traditional upsampling processes lack content-adaptive capabilities 37 . Consequently, when processing low-contrast features in chest X-rays, they are prone to causing boundary blurring and the loss of detailed information. To address this issue, DySample employs a point resampling strategy. In the detection of morphologically diverse lung nodule features on chest X-rays, this module can adaptively upscale the input feature maps while preserving sufficient detailed information.
Simultaneously, DySample reduces computational burden and latency by avoiding time-consuming dynamic convolutions and the need for additional sub-networks to generate dynamic kernels. By incorporating DySample, the algorithm is expected to capture the detailed boundary features of lung nodules more precisely.

VoVGSCSP Module

Unlike the traditional C3k2 and A2C2f modules, the VoVGSCSP module integrates the gradient optimization capability of the cross stage partial (CSP) network, the one-shot aggregation (OSA) feature fusion mechanism of the VoVNet backbone, and the lightweight characteristics of Ghost Convolution. It constructs a three-level feature enhancement architecture that balances performance and efficiency, with a particular focus on lightweight design 38 – 40 . First, this module utilizes 1×1 convolutions to preserve the spatial details of the original feature map. Simultaneously, by stacking GS Bottlenecks, it enhances the fusion capability of multi-scale features. This effectively alleviates the issue of information attenuation regarding tiny lung nodules in chest X-rays during propagation through deep networks (as shown in Fig. 5 ). To further enhance the lightweight nature of the algorithm, the module introduces Ghost Convolution to replace standard convolutions. It generates redundant features with lower parameter counts and computational overheads, thereby significantly compressing the model size and inference costs while maintaining representational capability.

Improved YOLOv12-SPDConv-Dysample-VoVGSCSP Algorithm

As shown in Fig. 6, the improved algorithm is an efficient optimization scheme proposed to address the shortcomings of the YOLOv12 algorithm, such as weak small object detection capability, the tendency to lose edge information and fine-grained features in feature maps, as well as high computational memory consumption and high hardware deployment thresholds.
It combines the comprehensive advantages of the SPDConv, DySample, and VoVGSCSP modules. First, replacing specific standard convolutions (Conv) in the Backbone and Neck with SPDConv is of significant value for enhancing the small object detection capability of the YOLOv12 algorithm. Notably, since the weights of the first standard Conv in the Backbone have a critical impact on algorithm convergence, and its uniform weight distribution is more conducive to the stable propagation of gradients, this study retained the first standard Conv of the Backbone during the improvement process. Second, all Upsample modules in the Neck were replaced with upgraded DySample modules. This implements a dynamic, content-aware upsampling process capable of generating clearer feature maps rich in semantic information, which is particularly beneficial for edge and detail recovery. Through content-aware sampling points, multi-scale features are fused more intelligently, thereby enhancing the feature extraction capability of the YOLOv12 algorithm for lung nodules in chest X-rays. Finally, the C3k2 and A2C2f modules in the Neck were replaced with lightweight VoVGSCSP modules. This optimized the channel compression ratio of feature maps and the lightweight design of activation functions. It avoided the issues of feature redundancy and surging memory usage caused by the complex stacking of traditional C3k2 and A2C2f modules. This approach not only maintained the capture precision of fine-grained lung nodule features but also made the overall algorithm more lightweight, effectively reducing video memory requirements and computational barriers during hardware deployment, thereby further balancing detection performance and operational efficiency. 
The comprehensive improvements of the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm not only functionally strengthened the specific weaknesses of YOLOv12 but also constructed a jointly optimized feature extraction system through the complementary design between modules. This provides a more reliable algorithmic foundation for high-precision object detection in complex scenarios, making it more suitable for the study of lung nodule detection in chest X-rays compared to the original YOLOv12 algorithm.

Figure 6. Structure of the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm. Note: Input represents the data input stage; Conv refers to the convolutional module; SPDConv denotes the Space-to-Depth Convolution module; C3k2 is an improved module designed for feature extraction; SPDConv, DySample, and VoVGSCSP correspond to the improved modules utilized in this study as described above; C3k is a module for optimizing feature extraction; A2 represents the area attention mechanism; A2C2f is a C2f module enhanced by the area attention mechanism; Concat denotes the concatenation operation; Upsample refers to the upsampling operation; Split denotes the splitting operation; Scaling refers to the adjustment of channel dimensions; ABlock is a composite module consisting of Conv and A2; DWConv stands for the depthwise separable convolution module; Conv2d represents the 2D convolution module; BboxLoss denotes the bounding box loss; CLSLoss represents the classification loss; and Detect refers to the object detection process.

Results and Analysis

Computing Environment

All algorithm experiments in this study were conducted on the Ubuntu 20.04 operating system. The experimental environment was established using Python 3.10. The YOLOv12 algorithm architecture was implemented using the PyTorch 2.2.2 deep learning framework, with CUDA 11.8 employed to enable GPU acceleration for model training. During the training process, stochastic gradient descent (SGD) was selected as the optimizer.
The batch size was set to 16, the initial learning rate was set to 0.01, and the training spanned a total of 100 epochs. For the object detection task, an intersection over union (IoU) threshold of 0.5 was utilized to determine the matching between predicted boxes and ground truth boxes.

Datasets

In this study, chest X-ray images containing lung nodules were sourced from two public detection datasets available on the Roboflow platform. The first dataset, designated as Dataset 1, contains a total of 3,600 chest X-ray images of lung nodules. It was utilized for algorithm training and performance evaluation using an internal test set. The second dataset, named Dataset 2, comprises 1,000 chest X-ray images of lung nodules. This dataset was employed for external testing to evaluate the performance of both the original and improved algorithms. All image data underwent meticulous annotation by two senior radiologists using the professional annotation software LabelImg. Subsequently, the annotations were reviewed and verified by medical experts to ensure the precision and reliability of lung nodule labeling on the chest X-rays. Furthermore, to examine the robustness and generalization capability of the algorithm, a five-fold cross-validation method was adopted for training. Specifically, 3,000 images were randomly selected from Dataset 1 and divided into training and validation sets at a ratio of 8:2 across five groups with different data distributions. The remaining 600 images served as the internal test set to assess algorithm performance. Subsequently, Dataset 2 was utilized for external validation. Throughout the data partitioning process, special attention was paid to maintaining the distribution ratio of images across different categories in the training and validation sets consistent with that of the original dataset.
Meanwhile, a manual screening step was established to strictly select chest X-ray images of lung nodules that met the study's established image quality standards, ensuring they could effectively support the subsequent algorithm training and performance evaluation tasks.

Evaluation Metrics

To comprehensively evaluate the overall performance of different object detection algorithms, this study established a multi-angle evaluation metric system. This system incorporates key metrics such as precision (Pre), recall (Rec), intersection over union (IoU), mean average precision (mAP), F1 score, parameters (Params), floating point operations (FLOPs), and frames per second (FPS). Regarding the evaluation of mAP, this study adopted two different IoU judgment standards. The first employs a fixed IoU threshold of 0.5, denoted as mAP50. The second introduces a dynamic IoU threshold mechanism, where the threshold is incrementally adjusted from 0.5 to 0.95 with a step size of 0.05, denoted as mAP50-95. This allows for a more comprehensive assessment of algorithm performance under various levels of matching strictness.

Regarding the matching criteria between predicted boxes and ground truth boxes, this study followed the classic matching strategy based on IoU thresholds. Specifically, the system calculates the IoU value between each predicted box and the corresponding ground truth box and compares this value with a preset IoU threshold. If the IoU value reaches or exceeds the set threshold, the detection is classified as correct; otherwise, it is considered a false detection. The aforementioned evaluation system not only enables accurate and reliable quantitative analysis of the algorithm's basic detection performance but also facilitates a deep exploration of the algorithm's robustness and adaptability under different matching conditions through the use of multi-level IoU thresholds.
Furthermore, Params reflects the total number of trainable parameters in the network, serving as a crucial metric for measuring algorithm complexity and video memory requirements. FLOPs is used to assess the computational overhead of the algorithm during inference, while FPS intuitively reflects the real-time inference speed of the algorithm in practical applications.

The calculation formulas for precision and recall are as follows:

$$\text{Pre} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Rec} = \frac{TP}{TP + FN} \tag{3}$$

where TP (true positives) represents the number of actual positive instances that are correctly predicted as positive by the algorithm; FP (false positives) denotes the number of actual negative instances that are incorrectly predicted as positive by the algorithm; and FN (false negatives) represents the number of actual positive instances that are incorrectly predicted as negative by the algorithm.

The calculation formula for IoU is as follows:

$$IoU = \frac{S_{\text{GT}} \cap S_{\text{pred}}}{S_{\text{GT}} \cup S_{\text{pred}}} \tag{4}$$

where \(S_{\text{GT}}\) represents the area occupied by the ground truth bounding box of the target object; and \(S_{\text{pred}}\) denotes the area covered by the predicted bounding box obtained by the algorithm.

The calculation formula for mAP is as follows:

$$\text{mAP} = \frac{\sum_{c} S_{PR}}{\left|C\right|} \tag{5}$$

where \(C\) represents the set of all target categories; \(\left|C\right|\) denotes the total number of categories in set \(C\), representing the cardinality of the set; \(\sum_{c}\) represents the summation over all target categories \(c\); and \(S_{PR}\) denotes the area under the precision-recall (PR) curve.
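As a worked illustration of Eqs. (2) to (4) and of the threshold sweep behind mAP50-95, the following plain-Python sketch may help. It is an illustrative assumption rather than the evaluation code used in the study: the helper names `box_iou` and `precision_recall`, the corner-coordinate box format, and all boxes and counts are made up, and the area under the PR curve in Eq. (5) is not computed here.

```python
def box_iou(gt, pred):
    """IoU of two axis-aligned boxes in (x_min, y_min, x_max, y_max) form, as in Eq. (4)."""
    inter_w = min(gt[2], pred[2]) - max(gt[0], pred[0])
    inter_h = min(gt[3], pred[3]) - max(gt[1], pred[1])
    inter = max(0.0, inter_w) * max(0.0, inter_h)
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    union = area_gt + area_pred - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts, as in Eqs. (2) and (3)."""
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return pre, rec


# The ten IoU thresholds behind mAP50-95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]

# Made-up boxes, not taken from the datasets used in the study.
gt_box = (10, 10, 50, 50)
pred_box = (12, 12, 52, 52)
iou = box_iou(gt_box, pred_box)
passed = [t for t in THRESHOLDS if iou >= t]
print(f"IoU = {iou:.3f}, correct at {len(passed)} of {len(THRESHOLDS)} thresholds")

# Made-up counts at the 0.5 threshold.
pre, rec = precision_recall(tp=8, fp=2, fn=4)
print(f"Pre = {pre:.3f}, Rec = {rec:.3f}")
```

With these made-up boxes the same prediction counts as correct at the looser thresholds and as a false detection at the stricter ones, which is why mAP50-95 is always the more demanding of the two figures.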
The calculation formula for the F1 score is as follows:

$$F_{1} = 2 \times \frac{\text{Pre} \times \text{Rec}}{\text{Pre} + \text{Rec}} \tag{6}$$

where Pre denotes precision; and Rec denotes recall.

Ablation Studies

To accurately evaluate the specific contributions of each improvement method to the overall performance gain of the algorithm, this study designed and implemented a set of ablation experiments. Analysis of the results in Table 1 reveals that the original YOLOv12 algorithm did not reach optimal levels across various evaluation metrics. Specifically: (1) after introducing the SPDConv module alone, the precision of the algorithm increased from 0.689 to 0.706, recall increased from 0.635 to 0.652, and mAP50, mAP50-95, and F1 score increased from 0.704, 0.411, and 0.661 to 0.724, 0.413, and 0.678, respectively; (2) after introducing VoVGSCSP alone, the precision improved to 0.724, recall was 0.644, and mAP50, mAP50-95, and F1 score improved to 0.730, 0.421, and 0.682, respectively; and (3) after introducing DySample alone, the precision increased to 0.691, recall increased to 0.656, and mAP50, mAP50-95, and F1 score improved to 0.726, 0.421, and 0.673, respectively. These results indicate that all three improvement modules can enhance the performance of the YOLOv12 algorithm in detecting lung nodules on chest X-rays to varying degrees across multiple dimensions.

When the VoVGSCSP module was added on top of the SPDConv module, the precision increased to 0.709, recall increased to 0.653, and mAP50, mAP50-95, and F1 score increased to 0.727, 0.417, and 0.680, respectively; the recall was marginally higher than that obtained by introducing SPDConv (0.652) or VoVGSCSP (0.644) alone. When the DySample module was added on top of the SPDConv module, the precision was the highest among all improvement schemes, reaching 0.729; however, the recall was the lowest among the improved schemes, at 0.642.
The mAP50, mAP50-95, and F1 score increased to 0.726, 0.417, and 0.683, respectively. When the DySample module was added on top of the VoVGSCSP module, the precision increased to 0.717, which represents a moderate level among the pairwise combinations; the recall was 0.649, showing no obvious improvement; whereas the mAP50, mAP50-95, and F1 score increased to 0.734, 0.424, and 0.681, respectively.

After simultaneously introducing the SPDConv, DySample, and VoVGSCSP modules, the algorithm performed best overall. The precision increased from 0.689 to 0.718; the recall increased from 0.635 to 0.655, second only to the standalone DySample scheme; and the mAP50, mAP50-95, and F1 score increased to 0.735, 0.426, and 0.685, respectively. A comparison of the parameter counts and computational efficiency results in Table 1 shows that, alongside these accuracy gains, Params decreased from 2.52 to 2.21, FLOPs decreased from 6.0 to 5.2, and FPS rose from 97.6 to 107.8. In terms of overall performance, the algorithm incorporating all three modules performs best.

Notably, Fig. 7 illustrates the training curves of the improved algorithm. From the loss function curves, various losses on both the training and validation sets decreased significantly with the increase in training epochs, indicating that the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm gradually fitted the data during the learning process. Furthermore, the evaluation metric curves show that precision, recall, and various mAP metrics continuously improved and tended to stabilize, demonstrating that the algorithm's performance was constantly enhancing and possessed good generalization capability.

Table 1. Results of ablation experiments.
| SPDConv | VoVGSCSP | DySample | P | R | mAP50 | mAP50-95 | F1 | Params | FLOPs | FPS |
|---|---|---|---|---|---|---|---|---|---|---|
| - | - | - | 0.689 | 0.635 | 0.704 | 0.411 | 0.661 | 2.52 | 6.0 | 97.6 |
| √ | - | - | 0.706 | 0.652 | 0.724 | 0.413 | 0.678 | 2.06 | 5.2 | 116.8 |
| - | √ | - | 0.724 | 0.644 | 0.730 | 0.421 | 0.682 | 2.64 | 5.9 | 90.8 |
| - | - | √ | 0.691 | 0.656 | 0.726 | 0.421 | 0.673 | 2.53 | 6.0 | 95.2 |
| √ | √ | - | 0.709 | 0.653 | 0.727 | 0.417 | 0.680 | 2.20 | 5.2 | 109.5 |
| √ | - | √ | 0.729 | 0.642 | 0.726 | 0.417 | 0.683 | 2.07 | 5.2 | 114.6 |
| - | √ | √ | 0.717 | 0.649 | 0.734 | 0.424 | 0.681 | 2.65 | 6.0 | 89.9 |
| √ | √ | √ | 0.718 | 0.655 | 0.735 | 0.426 | 0.685 | 2.21 | 5.2 | 107.8 |

Figure 7. Training curves of the improved algorithm.

Comparative Experiments of Algorithms

Based on the experimental results in Table 2, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm shows advantages across multiple key metrics. In terms of detection accuracy, its mAP50 reached 0.735, higher than that of YOLOv5 (0.654), YOLOv8 (0.667), YOLOv10 (0.691), YOLOv11 (0.726), and YOLOv12 (0.704). This implies that the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm localizes lung nodule lesions on chest X-rays more accurately. The improved algorithm also achieved an mAP50-95 of 0.426, the best among all algorithms, which indicates that its detection capability across different IoU thresholds is stronger than that of the compared algorithms. Regarding recall, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm achieved 0.655. Although its margin over some algorithms is small (YOLOv11 reached 0.651), this is the highest recall in the comparison, enabling the detection of the majority of targets and reducing missed detections. In terms of computational resource consumption, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm has the fewest Params and FLOPs and the highest FPS of the compared algorithms.
This indicates that, while maintaining high detection precision, it consumes fewer computational resources and runs more efficiently. Considering all key metrics together, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm is superior to the other compared algorithms.

Table 2. Experimental results of the algorithms.

| Algorithm | P | R | mAP50 | mAP50-95 | F1 | Params | FLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| YOLOv5 | 0.623 | 0.616 | 0.654 | 0.405 | 0.618 | 2.55 | 7.3 | 87.9 |
| YOLOv8 | 0.651 | 0.595 | 0.667 | 0.407 | 0.642 | 2.98 | 8.2 | 75.3 |
| YOLOv10 | 0.636 | 0.628 | 0.691 | 0.392 | 0.633 | 3.18 | 8.1 | 81.4 |
| YOLOv11 | 0.679 | 0.651 | 0.726 | 0.415 | 0.640 | 3.04 | 7.1 | 90.4 |
| YOLOv12 | 0.689 | 0.635 | 0.704 | 0.411 | 0.661 | 2.52 | 6.0 | 97.6 |
| YOLOv12-SPDConv-Dysample-VoVGSCSP | 0.718 | 0.655 | 0.735 | 0.426 | 0.685 | 2.21 | 5.2 | 107.8 |

Based on the external test results in Table 3, all metrics of the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm surpass those of the original YOLOv12 algorithm. In external testing, the improved algorithm therefore raised detection precision and overall performance while also maintaining a higher recall. The visual comparison of detection results in the internal and external tests (Figs. 8 and 9) likewise shows that the original YOLOv12 algorithm performed suboptimally in detecting lung nodules on chest X-rays, whereas the improved YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm produced better detection results. Taken together, these results show that the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm outperforms the original YOLOv12 algorithm and may therefore facilitate the clinical detection of lung nodules in chest X-rays.

Table 3. External test results.
| Model | P | R | mAP50 | mAP50-95 | F1 |
|---|---|---|---|---|---|
| YOLOv12 | 0.722 | 0.642 | 0.735 | 0.427 | 0.679 |
| YOLOv12-SPDConv-Dysample-VoVGSCSP | 0.737 | 0.658 | 0.749 | 0.437 | 0.695 |

Discussion and Conclusion

To address the task of lung nodule detection on chest X-rays, this study designed and implemented an improved YOLOv12 object detection algorithm. By replacing some of the Conv layers in the Backbone and Neck with SPDConv, replacing the Upsample modules in the Neck with upgraded DySample modules, and replacing C3k2 and A2C2f in the Neck with VoVGSCSP, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm was constructed. The SPDConv, DySample, and VoVGSCSP modules exhibit a synergistic effect, which is reflected in a progressive and complementary feature processing workflow.

First, SPDConv plays the foundational role of "information fidelity." Applied in the early stages of the Backbone, it preserves as much as possible of the fine, pixel-level information of small targets in the raw image. This provides detail-rich, undiluted information for all subsequent high-level feature extraction and is the first step toward resolving the loss of small-target features.

Subsequently, DySample plays a key role in "intelligent enhancement" during mid-level feature fusion. It receives multi-scale feature maps from the Backbone (after preliminary enhancement by SPDConv) and performs content-aware dynamic upsampling. Rather than simply magnifying the feature map, this process identifies and strengthens information in key detail areas, such as edges and textures. Consequently, nodule contours and internal structures that might otherwise be blurred or ignored are reproduced more clearly on higher-level feature maps. The detailed information preserved by SPDConv is thus not lost during transmission and fusion in the feature pyramid but is instead highlighted and enhanced.
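The "information fidelity" role attributed to SPDConv above rests on its space-to-depth step, which moves each 2 × 2 spatial block into the channel dimension so that no pixel values are dropped during downsampling. The dependency-free Python sketch below illustrates only that rearrangement and is not the authors' implementation. The function name `space_to_depth`, the nested-list tensor layout (height × width × channels), and the ordering of the four sub-maps are assumptions made for illustration; the stride-1 convolution that follows in SPDConv is omitted.

```python
def space_to_depth(x, scale=2):
    """Rearrange an H x W x C nested list into (H/scale) x (W/scale) x (scale**2 * C).

    Illustrative only: real implementations operate on framework tensors.
    """
    height, width = len(x), len(x[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    out = []
    for r in range(0, height, scale):
        out_row = []
        for c in range(0, width, scale):
            channels = []
            # Each (i, j) offset inside the block corresponds to one sub-map;
            # the sub-maps are concatenated along the channel axis.
            for j in range(scale):
                for i in range(scale):
                    channels.extend(x[r + i][c + j])
            out_row.append(channels)
        out.append(out_row)
    return out


# A 4 x 4 single-channel map whose value at (row, col) is 4 * row + col.
feature_map = [[[4 * r + c] for c in range(4)] for r in range(4)]
packed = space_to_depth(feature_map)

print(len(packed), len(packed[0]), len(packed[0][0]))  # prints: 2 2 4
print(packed[0][0])  # prints: [0, 4, 1, 5]
```

The element count is unchanged (4 × 4 × 1 = 2 × 2 × 4): spatial resolution is halved while the channel count grows fourfold, which is the sense in which this downsampling keeps every input value available to later layers.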
Finally, VoVGSCSP achieves the ultimate goal of "efficient distillation" within the high-level network architecture. Building upon the high-quality features fully prepared by SPDConv and Dysample, the VoVGSCSP module utilizes its lightweight cross-stage partial connection structure and optimized channel interaction mechanism to more efficiently distill the most discriminative fine-grained features from these detail-rich and semantic-rich inputs. Simultaneously, its lightweight design ensures that the YOLOv12 algorithm does not incur huge memory overhead due to complex calculations during object detection tasks, allowing the performance gains achieved by the preceding two modules to be efficiently solidified without sacrificing deployment feasibility. The synergy of these three components forms a powerful positive cycle, locking in and outputting superior final performance in a lightweight manner. This full-chain optimization—from information fidelity and intelligent enhancement to efficient distillation—is unachievable by any single module and constitutes the core pillar of the improved algorithm's success. Through the aforementioned synergistic optimization, the algorithm's capability to detect low-contrast lung nodules on chest X-rays is significantly enhanced. Experimental results indicate that this improved scheme demonstrates excellent comprehensive performance in detection tasks, surpassing the original YOLOv12 algorithm across multiple evaluation metrics. Nevertheless, the current study still has certain limitations, and practical applications continue to face technical challenges that need to be further addressed. 
First, although the chest X-ray dataset used in this study was professionally annotated and screened, the sample size (especially the external test set containing only 1000 images) may be insufficient to cover the extreme diversity of lung nodule morphologies (e.g., extremely small nodules, irregular shapes, or nodules highly similar to surrounding tissues). Furthermore, it lacks diverse data involving different devices, acquisition parameters, or pathological types (e.g., benign vs. malignant), which may affect the algorithm's generalization capability in complex real-world clinical scenarios. Second, there is a lack of multi-modal information fusion. The current study relies solely on single-modal chest X-ray data without integrating auxiliary data such as CT scans or clinical information, which may limit the ability to further discriminate nodule characteristics (e.g., benignity or malignancy). Finally, the adaptability for actual deployment has not been further investigated. Although the improved algorithm reduced Params and FLOPs, its real-time performance on mobile devices or low-computational-power hardware in primary healthcare institutions has not been verified, and the feasibility of actual clinical deployment requires further confirmation. In the future, it is necessary to expand and diversify the dataset by collecting chest X-ray data from more sources (different hospitals, devices, and acquisition parameters). This should cover a broader range of lung nodule morphologies (e.g., tiny nodules, spiculated nodules), pathological types, and complex backgrounds (e.g., co-existing inflammation, emphysema). Additionally, data augmentation techniques (e.g., simulating different exposure conditions, adding noise) should be employed to enhance algorithm robustness. Future work also needs to promote multi-modal information fusion. 
By integrating multi-modal imaging data such as CT and ultrasound with clinical information (e.g., patient age, smoking history), multi-modal detection models can be constructed to improve the accuracy of nodule characterization and assist in clinical decision-making. Lastly, real-world deployment verification needs to be strengthened. It is essential to test the algorithm's FPS and stability on low-computational-power hardware (e.g., GPUs/CPUs in primary hospitals) or mobile devices. Model compression techniques (e.g., pruning, quantization) or lightweight versions of the model should be pursued to ensure usability in real clinical environments.

Declarations

Competing interests
The authors declare no competing interests.

Funding
No funding.

Author Contribution
Minghui Mao: Conceptualization, Methodology, Software, Writing - Original Draft, Investigation. Chengkun Hong: Data curation, Validation, Formal analysis, Investigation, Writing - Original Draft. Yuhang Zhang: Visualization, Software, Validation, Data curation, Writing - Review & Editing. Hao Huang: Resources, Writing - Review & Editing, Supervision. Jianfeng Chu: Methodology, Formal analysis, Supervision, Writing - Review & Editing. Liyuan Fu: Conceptualization, Project administration, Funding acquisition, Supervision. Minghui Mao, Chengkun Hong and Yuhang Zhang contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Data Availability
The datasets used and analyzed during the current study are available in the Roboflow repository: Dataset 1 is available at [https://universe.roboflow.com/school-wo8fx/lung-anqdx], and Dataset 2 is available at [https://universe.roboflow.com/sad-unjbn/lung-sample].

References

1. Dairi, M. S. & Bahakeem, B. Public Attitudes Towards Lung Cancer Screening in Saudi Arabia: A Cross-Sectional Study. J. Multidisciplinary Healthc. 16, 2279–2289 (2023).
2. Bychkov, I. et al. Musashi-2 (MSI2) regulation of DNA damage response in lung cancer. Research Square rs.3.rs-4021568 (2024).
3. Liu, H. et al. Multi-model Ensemble Learning Architecture Based on 3D CNN for Lung Nodule Malignancy Suspiciousness Classification. J. Digit. Imaging 33(5), 1242–1256 (2020).
4. Mikhail Lette, M. N. et al. Toward Improved Outcomes for Patients With Lung Cancer Globally: The Essential Role of Radiology and Nuclear Medicine. JCO Global Oncol. 8, e2100100 (2022).
5. Cheng, Q. et al. Pneumocystis jirovecii diagnosed by next-generation sequencing of bronchoscopic alveolar lavage fluid: A case report and review of literature. World J. Clin. Cases 11(4), 866–873 (2023).
6. Pyka, V. et al. High-dose chemotherapy and autologous hematopoietic stem cell transplantation for progressive systemic sclerosis: a retrospective study of outcome and prognostic factors. J. Cancer Res. Clin. Oncol. 150(6), 301 (2024).
7. Gupta, D., Loane, R., Gayen, S. & Demner-Fushman, D. Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features. Knowl. Based Syst. 278, 110907 (2023).
8. Mahmoudi, S. et al. Imaging biomarkers to stratify lymph node metastases in abdominal CT - Is radiomics superior to dual-energy material decomposition? Eur. J. Radiol. Open 10, 100459 (2022).
9. Park, J. S. et al. Accuracy of Large Language Models in Detecting Cases Requiring Immediate Reporting in Pediatric Radiology: A Feasibility Study Using Publicly Available Clinical Vignettes. Korean J. Radiol. 26(9), 855–866 (2025).
10. Prasada Rao, R. H. & Goswami, A. D. Cnidaria herd optimized fuzzy C-means clustering enabled deep learning model for lung nodule detection. Front. Physiol. 16, 1511716 (2025).
11. Wang, G., Duan, Q., Shen, T. & Zhang, S. SenseCare: a research platform for medical image informatics and interactive 3D visualization. Front. Radiol. 4, 1460889 (2024).
12. Xue, S. et al. CTS-Net: A Segmentation Network for Glaucoma Optical Coherence Tomography Retinal Layer Images. Bioeng. (Basel Switzerland) 10(2), 230 (2023).
13. Xue, M., Liu, Y. & Cai, X. Automated Detection Model Based on Deep Learning for Knee Joint Motion Injury due to Martial Arts. Computational and Mathematical Methods in Medicine 3647152 (2022).
14. Wang, F., Mao, R., Yan, L., Ling, S. & Cai, Z. A deep learning-based approach for rectus abdominis segmentation and distance measurement in ultrasonography. Front. Physiol. 14, 1246994 (2023).
15. Ramamoorthy, P., Ramakantha Reddy, B. R., Askar, S. S. & Abouhawwash, M. Histopathology-based breast cancer prediction using deep learning methods for healthcare applications. Front. Oncol. 14, 1300997 (2024).
16. Lee, H. S. et al. Automated analysis of knee joint alignment using detailed angular values in long leg radiographs based on deep learning. Sci. Rep. 14(1), 7226 (2024).
17. Zhang, Z. et al. Accurate segmentation algorithm of acoustic neuroma in the cerebellopontine angle based on ACP-TransUNet. Front. Neurosci. 17, 1207149 (2023).
18. Mustafa, Z. & Nsour, H. Using Computer Vision Techniques to Automatically Detect Abnormalities in Chest X-rays. Diagnostics (Basel Switzerland) 13(18), 2979 (2023).
19. Horry, M. J. et al. Development of Debiasing Technique for Lung Nodule Chest X-ray Datasets to Generalize Deep Learning Models. Sens. (Basel Switzerland) 23(14), 6585 (2023).
20. Lim, C. Y. et al. Estimating the Volume of Nodules and Masses on Serial Chest Radiography Using a Deep-Learning-Based Automatic Detection Algorithm: A Preliminary Study. Diagnostics (Basel Switzerland) 13(12), 2060 (2023).
21. Khalili, B. & Smyth, A. W. SOD-YOLOv8-Enhancing YOLOv8 for Small Object Detection in Aerial Imagery and Traffic Scenes. Sens. (Basel Switzerland) 24(19), 6209 (2024).
22. Wang, Y. et al. A method for detecting the rate of tobacco leaf loosening in tobacco leaf sorting scenarios. Front. Plant Sci. 16, 1578317 (2025).
23. Lewis, J. E. et al. An Automated Pipeline for Differential Cell Counts on Whole-Slide Bone Marrow Aspirate Smears. Mod. Pathol. 36(2), 100003 (2023).
24. Chiu, H. Y. et al. Artificial Intelligence for Early Detection of Chest Nodules in X-ray Images. Biomedicines 10(11), 2839 (2022).
25. Obayya, M., Al-Wesabi, F. N., Bedewi, W. & Alshammeri, M. An intelligent framework for visually impaired people through indoor object Detection-Based assistive system using YOLO with recurrent neural networks. Sci. Rep. 15(1), 43720 (2025).
26. Zou, H., Weng, Z., Zhao, M. & Jiang, X. Multi-Strategy Improved Cantaloupe Pest Detection Algorithm. Insects 16(12), 1201 (2025).
27. Xu, B., Ma, Z., Su, X., He, X. & Wu, X. A lightweight intelligent grading method for lychee anthracnose based on improved YOLOv12. Front. Plant Sci. 16, 1688675 (2025).
28. Xu, H., Li, H. & Zhao, J. A lightweight tri-modal few-shot detection framework for fruit diversity recognition toward digital orchard archiving. Front. Plant Sci. 16, 1696622 (2025).
29. Gupta, C. et al. An optimized YOLO NAS based framework for realtime object detection. Sci. Rep. 15(1), 32903 (2025).
30. Tian, J. H. et al. An improved YOLOv5n algorithm for detecting surface defects in industrial components. Sci. Rep. 15(1), 9756 (2025).
31. Liu, D., Yang, Z., Bao, C. & Meng, Q. Artificial intelligence-based method for detecting wrist fractures in children. Sci. Rep. 15(1), 38555 (2025).
32. Zhou, C., Zhang, Y., Fu, W., Yao, L. & Yin, C. MDE-DETR: multi-domain enhanced feature fusion algorithm for bayberry detection and counting in complex orchards. Front. Plant Sci. 16, 1711545 (2025).
33. Yang, F. et al. A lightweight infrared target detection network suitable for land and water surfaces. Sci. Rep. 15(1), 37794 (2025).
34. Li, Z. et al. SGSNet: a lightweight deep learning model for strawberry growth stage detection. Front. Plant Sci. 15, 1491706 (2024).
35. Chen, C., Lee, H. & Chen, M. Steel surface defect detection method based on improved YOLOv9. Sci. Rep. 15(1), 25098 (2025).
36. Hu, L. et al. STAIR-DETR: A Synergistic Transformer Integrating Statistical Attention and Multi-Scale Dynamics for UAV Small Object Detection. Sens. (Basel Switzerland) 25(24), 7681 (2025).
37. Du, L. et al. A SCG-YOLOv8n potato counting framework with efficient mobile deployment. Sci. Rep. 15(1), 34909 (2025).
38. Wu, Z. et al. Algorithm for detecting surface defects in wind turbines based on a lightweight YOLO model. Sci. Rep. 14(1), 24558 (2024).
39. Xie, B. J., Li, H., Luan, Z., Li, X. X. & Lei, Z. A lightweight coal mine pedestrian detector for video surveillance systems with multi-level feature fusion and channel pruning. Sci. Rep. 15(1), 5757 (2025).
40. Liu, J. et al. A real-time end-to-end detector for detecting surface defects on oversized rings. PLoS One 20(8), e0330031 (2025).
Together, they support object detection requirements across various research fields.\u003c/p\u003e \u003cp\u003eAlthough two-stage algorithms used in current research demonstrate high detection accuracy, they suffer from drawbacks such as complex algorithmic structures, high training costs, and slow detection speeds. Additionally, they often require large amounts of data\u003csup\u003e\u003cspan additionalcitationids=\"CR22\" citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. In this study, the one-stage algorithm employed offers fast detection speeds but is slightly inferior in accuracy. Nevertheless, by optimizing the network structure and training strategies of the YOLO algorithm, it is possible to achieve high detection accuracy while meeting various application requirements. In 2022, Chiu et al.\u003csup\u003e24\u003c/sup\u003e proposed a lung nodule detection algorithm based on you only look once version 4 (YOLOv4). Through multiple preprocessing methods and ensemble model optimization, they achieved advanced performance in lung nodule detection on chest X-ray datasets, significantly improving sensitivity and reliability. The algorithm achieved a sensitivity of 84.4% on the external JSRT dataset. However, the relatively older YOLOv4 version used may lack optimization for complex scenarios and show poor adaptability to new data types.\u003c/p\u003e \u003cp\u003eThe YOLOv12 algorithm utilized in this study, a newer iteration in the YOLO series, demonstrates superior performance in terms of both precision and efficiency in object detection. Nonetheless, in the detection of lung nodules on chest X-rays, several factors may adversely affect the performance of YOLOv12. 
These include the high morphological diversity of nodules, low contrast with surrounding normal lung tissue, blood vessels, and ribs, the masking of small nodules by noise and artifacts, and image quality variations caused by different patient positioning and exposure parameters. To address these issues, this study constructs the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm by replacing specific convolutional layers in the Backbone and Neck with SPDConv, substituting Upsample modules in the Neck with upgraded DySample modules, and replacing C3k2 and A2C2f in the Neck with VoVGSCSP. The improved algorithm enhances the performance of lung nodule detection on chest X-rays to a certain extent. It can more acutely identify tiny and complex-shaped nodules, reducing the occurrence of misdiagnosis and missed diagnosis, thereby playing a significant role in promoting early clinical screening and diagnosis of lung nodules.\u003c/p\u003e"},{"header":"Original YOLOv12 Algorithm","content":"\u003cp\u003eAs a recent iteration of the YOLO series, the YOLOv12 algorithm retains the end-to-end direct prediction paradigm. It achieves simultaneous object classification and bounding box localization for all targets within an image through a single forward pass\u003csup\u003e\u003cspan additionalcitationids=\"CR26\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e–\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. As illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, the overall architecture of the algorithm adheres to the classic three-stage framework, comprising a Backbone, a Neck, and a Head\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. Specifically, the Backbone is utilized for feature extraction\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. 
By stacking standard convolutions and pooling layers for progressive downsampling, it generates feature maps rich in semantic information. The Neck serves as a hub for feature fusion\u003csup\u003e\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e, producing composite representations with enhanced discriminative power. The Head is responsible for precisely outputting target class probabilities and bounding box coordinates.\u003c/p\u003e \u003cp\u003eThe YOLOv12 algorithm inherits the C3k2 module from YOLOv11\u003csup\u003e31\u003c/sup\u003e, which facilitates dynamic structural switching via a Boolean parameter, C3k. When C3k is set to False, the C3k2 module functions as a C2f structure (a standard bottleneck layer). Conversely, when C3k is set to True, a more complex C3 module structure is activated to enhance feature fusion capabilities.\u003c/p\u003e \u003cp\u003eHowever, YOLOv12 primarily relies on standard convolutions for feature extraction. This architectural choice limits the receptive field, resulting in insufficient capture of long-range contextual information and restricted detection performance for small targets. In addition, the original upsampling operations in the Neck of YOLOv12 rely on fixed interpolation rules. These methods struggle to dynamically optimize based on actual image content, often leading to blurred edge information or the loss of fine-grained features in feature maps. Furthermore, while the C3k2 and A2C2f modules in the Neck improve accuracy, they introduce high computational and memory overheads, thereby raising the hardware requirements for deployment. 
Consequently, there remains room for improvement in the comprehensive performance of the YOLOv12 algorithm.\u003c/p\u003e\n\u003ch3\u003eAlgorithm Improvements\u003c/h3\u003e\n\u003ch2\u003eSPDConv Module\u003c/h2\u003e\u003cp\u003eThe core technology of the SPDConv module is space-to-depth reorganization\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. It is an innovative module that rearranges the spatial and channel dimension structures of feature maps to achieve downsampling while avoiding information loss\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. First, it rearranges and compresses the input feature map in the spatial dimension, efficiently aggregating and transferring local spatial feature information, originally distributed across height and width dimensions, to the channel dimension. This method not only avoids the spatial information compression and detail loss caused by standard convolution downsampling operations but also further filters redundant background noise and enhances the saliency of target-related features through the fusion and interaction of multi-dimensional features between channels. Ultimately, by fully preserving the rich semantic and detailed information of the original feature map, the module significantly improves the algorithm's ability to recognize and perceive small-scale targets, weak boundary features, and blurry visual patterns. Thus, it effectively addresses the limitations of the YOLOv12 algorithm in small object detection tasks.\u003c/p\u003e\u003cp\u003eAs shown in Fig.\u0026nbsp;2, the SPDConv module implements a novel feature reconstruction mechanism by combining spatial dimension compression with channel dimension expansion. 
First, the input feature map X\u003csub\u003e1\u003c/sub\u003e is divided into multiple regions, and sub-features are subsequently concatenated along the channel axis to generate X\u003csub\u003e2\u003c/sub\u003e. This space-to-channel transformation mechanism not only reduces the resolution of the feature map but also significantly increases the number of channels. Consequently, it effectively maintains and strengthens the integrity of key spatial positions and structural information in the original data while reducing the computational burden. In addition, through dimensional restructuring, this mechanism adapts to suppress background noise irrelevant to the target, thereby improving the signal-to-noise ratio for subsequent feature learning.\u003c/p\u003e\u003cp\u003eSubsequently, to enhance feature expression performance, the SPDConv module employs a non-strided convolution (with a stride of 1) to perform feature extraction on the reorganized features. This approach prevents the loss of spatial information often caused by strided convolutions. Using C₂ filters, the deep features following spatial reorganization can be further refined. Ultimately, a resultant feature map of size \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:(\\frac{L}{2},\\frac{W}{2},{\\text{C}}_{2})\\)\u003c/span\u003e\u003c/span\u003e is generated, which effectively resolves the difficulty in distinguishing between small targets and background features. Compared with standard convolutions, the SPDConv module reduces the number of redundant channels and optimizes the feature expression structure while avoiding spatial information loss associated with strided convolutions. Furthermore, the convolutional layers within the module possess highly learnable parameter sets capable of automatically adjusting and optimizing feature representations based on training data. 
This enables more precise capture of fine structures, edge information, and local patterns within the image. These characteristics allow SPDConv to maintain high detection accuracy and robustness even when facing low-resolution inputs, dense small targets, or heavily occluded scenes, thereby significantly improving the overall performance of the algorithm in small object detection tasks.\u003c/p\u003e\u003cp\u003eTo ensure stable convergence of the SPDConv module during training and to synergistically optimize classification precision and localization accuracy in the detection task, the module utilizes a multi-objective loss mechanism that integrates classification loss and bounding box regression loss. The joint loss function is defined as follows:\u003c/p\u003e\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}L={L}_{cls}+\\lambda\\:{L}_{bbox}\\#(1)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{L}_{cls}\\)\u003c/span\u003e\u003c/span\u003e represents the classification loss; \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{L}_{bbox}\\)\u003c/span\u003e\u003c/span\u003e represents the bounding box regression loss; and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\lambda\\:\\)\u003c/span\u003e\u003c/span\u003e denotes the balancing coefficient.\u003c/p\u003e\u003cp\u003eIn summary, the design of the SPDConv module focuses on addressing critical issues common in lung nodule detection on chest X-rays, such as small target lesions, background contrast interference, and vague feature expression. 
By optimizing the organization of spatial information, enhancing the preservation of feature integrity, and precisely capturing the detailed features of tiny targets, this module achieves a significant improvement in feature representation capability, thereby effectively increasing the accuracy of the lung nodule detection task on chest X-rays.\u003c/p\u003e\u003ch3\u003eDySample Module\u003c/h3\u003e\u003cp\u003eDySample is a highly efficient dynamic upsampling method whose core innovation lies in the adoption of a dynamic point sampling mechanism to perform the upsampling process\u003csup\u003e\u003cspan additionalcitationids=\"CR35\" citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e–\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. Due to its simple structure and clear logic, this method facilitates easy deployment within the PyTorch framework. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003e, this method operates based on the principle of sampling-based dynamic upsampling, intuitively demonstrating its working mechanism.\u003c/p\u003e\u003cp\u003eDySample generates sampling points using both static and dynamic scope factors. The feature map X is processed through a linear layer to generate a feature map of the corresponding size. Subsequently, the pixel shuffle technique is combined with the scope factor to generate the offsets O. Finally, these offsets are added to the original grid positions G to obtain the sampling set S, as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e\u003cp\u003eTraditional upsampling processes lack content-adaptive capabilities\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e. 
Consequently, when processing low-contrast features in chest X-rays, they are prone to causing boundary blurring and the loss of detailed information. To address this issue, DySample employs a point resampling strategy. In the detection of morphologically diverse lung nodule features on chest X-rays, this module can adaptively upscale the input feature maps while preserving sufficient detailed information. Simultaneously, DySample reduces computational burden and latency by avoiding time-consuming dynamic convolutions and the need for additional sub-networks to generate dynamic kernels. By incorporating DySample, the algorithm is expected to capture the detailed boundary features of lung nodules more precisely.\u003c/p\u003e\u003ch3\u003eVoVGSCSP Module\u003c/h3\u003e\u003cp\u003eUnlike the traditional C3k2 and A2C2f modules, the VoVGSCSP module integrates the gradient optimization capability of the cross stage partial (CSP) network, the one-shot aggregation (OSA) feature fusion mechanism of the VoVNet backbone, and the lightweight characteristics of Ghost Convolution. It constructs a three-level feature enhancement architecture that balances performance and efficiency, with a particular focus on lightweight design\u003csup\u003e\u003cspan additionalcitationids=\"CR39\" citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e–\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eFirst, this module utilizes 1×1 convolutions to preserve the spatial details of the original feature map. Simultaneously, by stacking GS Bottlenecks, it enhances the fusion capability of multi-scale features. 
This effectively alleviates the issue of information attenuation regarding tiny lung nodules in chest X-rays during propagation through deep networks (as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eTo further enhance the lightweight nature of the algorithm, the module introduces Ghost Convolution to replace standard convolutions. It generates redundant features with lower parameter counts and computational overheads, thereby significantly compressing the model size and inference costs while maintaining representational capability.\u003c/p\u003e\u003ch3\u003eImproved YOLOv12-SPDConv-Dysample-VoVGSCSP Algorithm\u003c/h3\u003e\u003cp\u003eAs shown in Fig.\u0026nbsp;6, the improved algorithm is an efficient optimization scheme proposed to address the shortcomings of the YOLOv12 algorithm, such as weak small object detection capability, the tendency to lose edge information and fine-grained features in feature maps, as well as high computational memory consumption and high hardware deployment thresholds. It combines the comprehensive advantages of the SPDConv, DySample, and VoVGSCSP modules.\u003c/p\u003e\u003cp\u003eFirst, replacing specific standard convolutions (Conv) in the Backbone and Neck with SPDConv is of significant value for enhancing the small object detection capability of the YOLOv12 algorithm. Notably, since the weights of the first standard Conv in the Backbone have a critical impact on algorithm convergence, and its uniform weight distribution is more conducive to the stable propagation of gradients, this study retained the first standard Conv of the Backbone during the improvement process.\u003c/p\u003e\u003cp\u003eSecond, all Upsample modules in the Neck were replaced with upgraded DySample modules. 
This implements a dynamic, content-aware upsampling process capable of generating clearer feature maps rich in semantic information, which is particularly beneficial for edge and detail recovery. Through content-aware sampling points, multi-scale features are fused more intelligently, thereby enhancing the feature extraction capability of the YOLOv12 algorithm for lung nodules in chest X-rays.\u003c/p\u003e\u003cp\u003eFinally, the C3k2 and A2C2f modules in the Neck were replaced with lightweight VoVGSCSP modules. This optimized the channel compression ratio of feature maps and the lightweight design of activation functions. It avoided the issues of feature redundancy and surging memory usage caused by the complex stacking of traditional C3k2 and A2C2f modules. This approach not only maintained the capture precision of fine-grained lung nodule features but also made the overall algorithm more lightweight, effectively reducing video memory requirements and computational barriers during hardware deployment, thereby further balancing detection performance and operational efficiency.\u003c/p\u003e\u003cp\u003e The comprehensive improvements of the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm not only functionally strengthened the specific weaknesses of YOLOv12 but also constructed a jointly optimized feature extraction system through the complementary design between modules. This provides a more reliable algorithmic foundation for high-precision object detection in complex scenarios, making it more suitable for the study of lung nodule detection in chest X-rays compared to the original YOLOv12 algorithm.\u003c/p\u003e\u003cp\u003e \u003cb\u003eFigure 6.\u003c/b\u003e Structure of the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm. 
Note: Input represents the data input stage; Conv refers to the convolutional module; SPDConv denotes the Space-to-Depth Convolution module; C3k2 is an improved module designed for feature extraction; SPDConv, DySample, and VoVGSCSP correspond to the improved modules utilized in this study as described above; C3k is a module for optimizing feature extraction; A2 represents the area attention mechanism; A2C2f is a C2f module enhanced by the area attention mechanism; Concat denotes the concatenation operation; Upsample refers to the upsampling operation; Split denotes the splitting operation; Scaling refers to the adjustment of channel dimensions; ABlock is a composite module consisting of Conv and A2; DWConv stands for the depthwise separable convolution module; Conv2d represents the 2D convolution module; BboxLoss denotes the bounding box loss; CLSLoss represents the classification loss; and Detect refers to the object detection process.\u003c/p\u003e"},{"header":"Results and Analysis","content":"\u003ch2\u003eComputing Environment\u003c/h2\u003e\u003cp\u003eAll algorithm experiments in this study were conducted on the Ubuntu 20.04 operating system. The experimental environment was established using Python 3.10. The YOLOv12 algorithm architecture was implemented using the PyTorch 2.2.2 deep learning framework, with CUDA 11.8 employed to enable GPU acceleration for model training.\u003c/p\u003e\u003cp\u003eDuring the training process, stochastic gradient descent (SGD) was selected as the optimizer. The batch size was set to 16, the initial learning rate was set to 0.01, and the training spanned a total of 100 epochs. 
For the object detection task, an intersection over union (IoU) threshold of 0.5 was utilized to determine the matching between predicted boxes and ground truth boxes.\u003c/p\u003e\u003ch3\u003eDatasets\u003c/h3\u003e\u003cp\u003eIn this study, chest X-ray images containing lung nodules were sourced from two public detection datasets available on the Roboflow platform. The first dataset, designated as Dataset 1, contains a total of 3,600 chest X-ray images of lung nodules. It was utilized for algorithm training and performance evaluation using an internal test set. The second dataset, named Dataset 2, comprises 1,000 chest X-ray images of lung nodules. This dataset was employed for external testing to evaluate the performance of both the original and improved algorithms.\u003c/p\u003e\u003cp\u003eAll image data underwent meticulous annotation by two senior radiologists using the professional annotation software LabelImg. Subsequently, the annotations were reviewed and verified by medical experts to ensure the precision and reliability of lung nodule labeling on the chest X-rays. Furthermore, to examine the robustness and generalization capability of the algorithm, a five-fold cross-validation method was adopted for training. Specifically, 3,000 images were randomly selected from Dataset 1 and divided into training and validation sets at a ratio of 8:2 across five groups with different data distributions. The remaining 600 images served as the internal test set to assess algorithm performance. Subsequently, Dataset 2 was utilized for external validation. Throughout the data partitioning process, special attention was paid to maintaining the distribution ratio of images across different categories in the training and validation sets consistent with that of the original dataset. 
Meanwhile, a manual screening step retained only those chest X-ray images of lung nodules that met the study's image quality standards, ensuring they could effectively support the subsequent algorithm training and performance evaluation tasks.\u003c/p\u003e\u003ch2\u003eEvaluation Metrics\u003c/h2\u003e\u003cp\u003eTo comprehensively evaluate the overall performance of different object detection algorithms, this study established a multi-angle evaluation metric system. This system incorporates key metrics such as precision (Pre), recall (Rec), intersection over union (IoU), mean average precision (mAP), F1 score, parameters (Params), floating point operations (FLOPs), and frames per second (FPS).\u003c/p\u003e\u003cp\u003eRegarding the evaluation of mAP, this study adopted two different IoU judgment standards. The first employs a fixed IoU threshold of 0.5, denoted as mAP50. The second averages the results over IoU thresholds ranging from 0.5 to 0.95 with a step size of 0.05, denoted as mAP50-95. This allows for a more comprehensive assessment of algorithm performance under various levels of matching strictness.\u003c/p\u003e\u003cp\u003eRegarding the matching criteria between predicted boxes and ground truth boxes, this study followed the classic matching strategy based on IoU thresholds. Specifically, the system calculates the IoU value between each predicted box and the corresponding ground truth box and compares this value with a preset IoU threshold. 
If the IoU value reaches or exceeds the set threshold, the detection is classified as correct; otherwise, it is considered a false detection.\u003c/p\u003e\u003cp\u003eThe aforementioned evaluation system not only enables accurate and reliable quantitative analysis of the algorithm's basic detection performance but also facilitates a deep exploration of the algorithm's robustness and adaptability under different matching conditions through the use of multi-level IoU thresholds. Furthermore, Params reflects the total number of trainable parameters in the network, serving as a crucial metric for measuring algorithm complexity and video memory requirements. FLOPs is used to assess the computational overhead of the algorithm during inference, while FPS intuitively reflects the real-time inference speed of the algorithm in practical applications.\u003c/p\u003e\u003cp\u003eThe calculation formulas for precision and recall are as follows:\u003c/p\u003e\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\begin{array}{c}\\text{Pre}=\\frac{TP}{TP+FP}\\#(2)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\begin{array}{c}\\text{Rec}=\\frac{TP}{TP+FN}\\#(3)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere \u003cem\u003eTP\u003c/em\u003e (true positives) represents the number of actual positive instances that are correctly predicted as positive by the algorithm; \u003cem\u003eFP\u003c/em\u003e (false positives) denotes the number of actual negative instances that are incorrectly predicted as positive by the algorithm; and \u003cem\u003eFN\u003c/em\u003e (false negatives) represents the number of actual positive instances that are 
incorrectly predicted as negative by the algorithm.\u003c/p\u003e\u003cp\u003eThe calculation formula for \u003cem\u003eIoU\u003c/em\u003e is as follows:\u003c/p\u003e\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" name=\"EquationSource\"\u003e\n$$\\begin{array}{c}IoU=\\frac{{S}_{\\text{GT}}\\cap{S}_{\\text{pred}}}{{S}_{\\text{GT}}\\cup{S}_{\\text{pred}}}\\#(4)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere \u003cem\u003eS\u003c/em\u003e\u003csub\u003e\u003cem\u003eGT\u003c/em\u003e\u003c/sub\u003e represents the area occupied by the ground truth bounding box of the target object; and \u003cem\u003eS\u003c/em\u003e\u003csub\u003e\u003cem\u003epred\u003c/em\u003e\u003c/sub\u003e denotes the area covered by the predicted bounding box obtained by the algorithm.\u003c/p\u003e\u003cp\u003eThe calculation formula for \u003cem\u003emAP\u003c/em\u003e is as follows:\u003c/p\u003e\u003cdiv id=\"Eque\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Eque\" name=\"EquationSource\"\u003e\n$$\\begin{array}{c}\\text{mAP}=\\frac{\\sum_{c}{S}_{PR}}{\\left|C\\right|}\\#(5)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere \u003cem\u003eC\u003c/em\u003e represents the set of all target categories; |\u003cem\u003eC\u003c/em\u003e| denotes the total number of categories in set \u003cem\u003eC\u003c/em\u003e, representing the cardinality of the set; ∑\u003cem\u003ec\u003c/em\u003e represents the summation over all target categories \u003cem\u003ec\u003c/em\u003e; and \u003cem\u003eS\u003c/em\u003e\u003csub\u003e\u003cem\u003ePR\u003c/em\u003e\u003c/sub\u003e denotes the area under the precision-recall (PR) curve.\u003c/p\u003e\u003cp\u003eThe calculation formula for the \u003cem\u003eF1\u003c/em\u003e score is as 
follows:\u003c/p\u003e\u003cdiv id=\"Equf\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equf\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}{F}_{1}=2\\times\\:\\frac{Pre\\:\\text{x}\\:Rec}{Pre+Rec}\\#\\left(6\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eWhere \u003cem\u003ePre\u003c/em\u003e denotes precision; and \u003cem\u003eRec\u003c/em\u003e denotes recall.\u003c/p\u003e\u003ch2\u003eAblation Studies\u003c/h2\u003e\u003cp\u003eTo accurately evaluate the specific contributions of each improvement method to the overall performance gain of the algorithm, this study designed and implemented a set of ablation experiments.\u003c/p\u003e\u003cp\u003eAnalysis of the results in Table\u0026nbsp;1 reveals that the original YOLOv12 algorithm did not reach optimal levels across various evaluation metrics. Specifically: (1) after introducing the SPDConv module alone, the precision of the algorithm increased from 0.689 to 0.706, recall increased from 0.635 to 0.652, and mAP50, mAP50-95, and F1 score increased from 0.704, 0.411, and 0.661 to 0.724, 0.413, and 0.678, respectively; (2) after introducing VoVGSCSP alone, the precision improved to 0.724, recall was 0.644, and mAP50, mAP50-95, and F1 score improved to 0.730, 0.421, and 0.682, respectively; and (3) after introducing DySample alone, the precision increased to 0.691, recall increased to 0.656, and mAP50, mAP50-95, and F1 score improved to 0.726, 0.421, and 0.673, respectively.\u003c/p\u003e\u003cp\u003eThese results indicate that all three improvement modules can enhance the performance of the YOLOv12 algorithm in detecting lung nodules on chest X-rays to varying degrees across multiple dimensions.\u003c/p\u003e\u003cp\u003eWhen the VoVGSCSP module was added on top of the SPDConv module, the precision increased to 0.709, recall increased to 0.653, and mAP50, mAP50-95, and F1 score increased to 0.727, 0.417, and 0.680, respectively, achieving a further 
gain in recall, albeit a marginal one, over introducing SPDConv or VoVGSCSP alone (0.652 and 0.644, respectively).\u003c/p\u003e\u003cp\u003eWhen the DySample module was added on top of the SPDConv module, precision reached 0.729, the highest among all configurations; however, recall was 0.642, the smallest improvement among them. The mAP50, mAP50-95, and F1 score increased to 0.726, 0.417, and 0.683, respectively.\u003c/p\u003e\u003cp\u003eWhen the DySample module was added on top of the VoVGSCSP module, the precision increased to 0.717, the middle value among the pairwise combinations, and the recall was 0.649, a modest gain. The mAP50 and mAP50-95 rose to 0.734 and 0.424, the highest among the pairwise combinations, and the F1 score was 0.681.\u003c/p\u003e\u003cp\u003eWith the SPDConv, DySample, and VoVGSCSP modules introduced simultaneously, the algorithm achieved the best mAP50, mAP50-95, and F1 score. The precision increased from 0.689 to 0.718; the recall increased from 0.635 to 0.655, second only to the standalone DySample scheme; and the mAP50, mAP50-95, and F1 score increased to 0.735, 0.426, and 0.685, respectively.\u003c/p\u003e\u003cp\u003eA comparison of the parameter counts and computational efficiency results in the table shows that, relative to the original YOLOv12 algorithm, the algorithm with all three modules also reduced Params (2.52 to 2.21) and FLOPs (6.0 to 5.2) and raised FPS (97.6 to 107.8). Overall, the algorithm incorporating all three modules performs best.\u003c/p\u003e\u003cp\u003eFig.\u0026nbsp;7 shows the training curves of the improved algorithm. The losses on both the training and validation sets decreased markedly as the number of training epochs increased, indicating that the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm gradually fitted the data during the learning process. 
Furthermore, the evaluation metric curves show that precision, recall, and the mAP metrics improved steadily and then stabilized, indicating that the algorithm's performance improved throughout training and suggesting good generalization capability.\u003c/p\u003e\u003cp\u003e \u003cb\u003eTable 1.\u003c/b\u003e Results of ablation experiments.\u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e\u003ccolgroup cols=\"11\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSPDConv\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVoVGSCSP\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDySample\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003eP\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003emAP50\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003emAP50-95\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eParams\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eFLOPs\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eFPS\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.689\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.635\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.704\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.411\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.661\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e2.52\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.0\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.6\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.706\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.652\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.724\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.413\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.678\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e2.06\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e5.2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e116.8\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.724\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.644\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.730\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.421\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.682\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e2.64\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e5.9\u003c/p\u003e \u003c/td\u003e\u003ctd 
align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e90.8\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.691\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.656\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.726\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.421\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.673\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e2.53\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.0\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e95.2\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.709\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.653\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.727\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.417\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e 
\u003cp\u003e0.680\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e2.20\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e5.2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e109.5\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.729\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.642\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.726\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.417\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.683\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e2.07\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e5.2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e114.6\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.717\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.649\u003c/p\u003e \u003c/td\u003e\u003ctd 
align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.734\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.424\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.681\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e2.65\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.0\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e89.9\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e√\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.718\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.655\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.735\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.426\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.685\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e2.21\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e5.2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e107.8\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e\u003cp\u003e \u003cb\u003eFigure 7.\u003c/b\u003e Training curves of the improved algorithm.\u003c/p\u003e\u003ch2\u003eComparative Experiments of Algorithms\u003c/h2\u003e\u003cp\u003eBased 
on the experimental results in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm shows advantages across multiple key metrics. In terms of detection accuracy, the algorithm achieved the highest mAP. Specifically, mAP50 reached 0.735, higher than that of YOLOv5 (0.654), YOLOv8 (0.667), YOLOv10 (0.691), YOLOv11 (0.726), and YOLOv12 (0.704). This suggests that the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm localizes lung nodule lesions on chest X-rays more accurately.\u003c/p\u003e\u003cp\u003eThe improved algorithm achieved an mAP50-95 of 0.426, the best among all algorithms compared, which indicates that its detection capability across IoU thresholds from 0.5 to 0.95 is stronger than that of the original algorithm and the other baselines.\u003c/p\u003e\u003cp\u003eRegarding recall, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm achieved a recall of 0.655. Although its margin over some algorithms is small (e.g., 0.651 for YOLOv11), it is the highest among the compared algorithms, which helps reduce missed detections.\u003c/p\u003e\u003cp\u003eIn terms of computational resource consumption, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm has fewer Params (2.21) and FLOPs (5.2) and a higher FPS (107.8) than all the other algorithms compared. 
This indicates that while maintaining high detection precision, it consumes fewer computational resources and operates with higher efficiency.\u003c/p\u003e\u003cp\u003eComprehensively considering all key metrics, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm is superior to other comparative algorithms.\u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eExperimental results of the algorithms.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"9\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlgorithm\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003emAP50\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003emAP50-95\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eParams\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eFLOPs\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eFPS\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLOv5\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.623\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.616\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.654\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.405\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.618\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2.55\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e7.3\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e87.9\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLOv8\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.651\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.595\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.667\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e0.407\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.642\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2.98\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e8.2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e75.3\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLOv10\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.636\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.628\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.691\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.392\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.633\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e3.18\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e8.1\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e81.4\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLOv11\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.679\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.651\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.726\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.415\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e \u003cp\u003e0.640\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e3.04\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e7.1\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e90.4\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLOv12\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.689\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.635\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.704\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.411\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.661\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2.52\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e6.0\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.6\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLOv12-SPDConv-Dysample-VoVGSCSP\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.718\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.655\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.735\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.426\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.685\u003c/p\u003e 
\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2.21\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e5.2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e107.8\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e\u003cp\u003eIn the external test results in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e3\u003c/span\u003e, every metric of the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm exceeds that of the original YOLOv12 algorithm: precision 0.737 vs. 0.722, recall 0.658 vs. 0.642, mAP50 0.749 vs. 0.735, mAP50-95 0.437 vs. 0.427, and F1 score 0.695 vs. 0.679. This indicates that the improvement in precision and overall performance carries over to external data without a loss in recall.\u003c/p\u003e\u003cp\u003eThe visual comparison of object detection results in the internal and external tests shown in Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e8\u003c/span\u003e and \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e9\u003c/span\u003e likewise shows that the improved YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm detected lung nodules on chest X-rays better than the original YOLOv12 algorithm.\u003c/p\u003e\u003cp\u003eIn summary, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm outperforms the original YOLOv12 algorithm and may thereby facilitate the clinical detection of lung nodules on chest X-rays.\u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv 
align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eExternal test results.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003emAP50\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003emAP50-95\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLOv12\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.722\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.642\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.735\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.427\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.679\u003c/p\u003e 
\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYOLOv12-SPDConv-Dysample-VoVGSCSP\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.737\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.658\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.749\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.437\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.695\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e"},{"header":"Discussion and Conclusion","content":"\u003cp\u003eTo address lung nodule detection on chest X-rays, this study designed and implemented an improved YOLOv12 object detection algorithm. The YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm was constructed by replacing some of the Conv layers in the Backbone and Neck with SPDConv, replacing the Upsample modules in the Neck with upgraded DySample modules, and replacing C3k2 and A2C2f in the Neck with VoVGSCSP.\u003c/p\u003e\u003cp\u003eThe SPDConv, DySample, and VoVGSCSP modules act synergistically, forming a progressive and complementary feature-processing workflow.\u003c/p\u003e\u003cp\u003eFirst, SPDConv plays the foundational role of \"information fidelity.\" Acting in the early stages of the Backbone, it preserves as much as possible of the fine pixel-level information of small targets in the raw image. 
This provides detail-rich information for all subsequent high-level feature extraction and is the first step toward resolving the loss of small-target features.\u003c/p\u003e\u003cp\u003eSubsequently, DySample plays the key role of \"intelligent enhancement\" during mid-level feature fusion. It receives multi-scale feature maps from the Backbone (after preliminary enhancement by SPDConv) and performs content-aware dynamic upsampling. Rather than simply magnifying the feature maps, this process identifies and strengthens information in key detail areas, such as edges and textures. Consequently, nodule contours and internal structures that might otherwise be blurred or ignored are reproduced more clearly on higher-level feature maps. This ensures that the detailed information preserved by SPDConv is not lost during transmission and fusion in the feature pyramid but is instead highlighted.\u003c/p\u003e\u003cp\u003eFinally, VoVGSCSP achieves the goal of \"efficient distillation\" within the high-level network architecture. Building on the features prepared by SPDConv and DySample, the VoVGSCSP module uses its lightweight cross-stage partial connection structure and optimized channel interaction mechanism to distill the most discriminative fine-grained features from these detail-rich and semantically rich inputs. At the same time, its lightweight design keeps the YOLOv12 algorithm from incurring large memory overhead during object detection, so the performance gains from the preceding two modules are retained without sacrificing deployment feasibility.\u003c/p\u003e\u003cp\u003eThe synergy of these three components forms a positive cycle that delivers better final performance in a lightweight manner. 
This full-chain optimization, from information fidelity through intelligent enhancement to efficient distillation, was not matched by any single module in the ablation experiments in terms of mAP50, mAP50-95, or F1 score, and it is the core reason for the improved algorithm's performance. Through this synergistic optimization, the algorithm's capability to detect low-contrast lung nodules on chest X-rays is improved. Experimental results indicate that the improved scheme performs well overall in detection tasks, surpassing the original YOLOv12 algorithm across multiple evaluation metrics.\u003c/p\u003e\u003cp\u003eNevertheless, the current study has limitations, and practical applications still face technical challenges that need to be addressed.\u003c/p\u003e\u003cp\u003eFirst, although the chest X-ray dataset used in this study was professionally annotated and screened, the sample size (especially the external test set, which contains only 1000 images) may be insufficient to cover the wide diversity of lung nodule morphologies (e.g., extremely small nodules, irregular shapes, or nodules highly similar to surrounding tissues). Furthermore, the dataset lacks diversity in devices, acquisition parameters, and pathological types (e.g., benign vs. malignant), which may affect the algorithm's generalization capability in complex real-world clinical scenarios.\u003c/p\u003e\u003cp\u003eSecond, there is a lack of multi-modal information fusion. The current study relies solely on single-modal chest X-ray data without integrating auxiliary data such as CT scans or clinical information, which may limit the ability to further discriminate nodule characteristics (e.g., benignity or malignancy).\u003c/p\u003e\u003cp\u003eFinally, the adaptability for actual deployment has not been further investigated. 
Although the improved algorithm reduced the parameter count (Params) and floating-point operations (FLOPs), its real-time performance on mobile devices or low-computational-power hardware in primary healthcare institutions has not been verified, and the feasibility of actual clinical deployment requires further confirmation.\u003c/p\u003e\u003cp\u003eIn future work, the dataset should be expanded and diversified by collecting chest X-ray data from more sources (different hospitals, devices, and acquisition parameters). It should cover a broader range of lung nodule morphologies (e.g., tiny nodules, spiculated nodules), pathological types, and complex backgrounds (e.g., co-existing inflammation, emphysema). Additionally, data augmentation techniques (e.g., simulating different exposure conditions, adding noise) should be employed to enhance algorithm robustness.\u003c/p\u003e\u003cp\u003eFuture work should also promote multi-modal information fusion. By integrating multi-modal imaging data such as CT and ultrasound with clinical information (e.g., patient age, smoking history), multi-modal detection models can be constructed to improve the accuracy of nodule characterization and assist clinical decision-making.\u003c/p\u003e\u003cp\u003eLastly, real-world deployment verification needs to be strengthened. It is essential to test the algorithm's frames per second (FPS) and stability on low-computational-power hardware (e.g., GPUs/CPUs in primary hospitals) or mobile devices. 
Optimization through model compression techniques (e.g., pruning, quantization) or the design of lightweight versions should be pursued to ensure usability in authentic clinical environments.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003cstrong\u003eCompeting interests\u003c/strong\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eNo Funding.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eMinghui Mao: Conceptualization, Methodology, Software, Writing - Original Draft, Investigation. Chengkun Hong: Data curation, Validation, Formal analysis, Investigation, Writing - Original Draft. Yuhang Zhang: Visualization, Software, Validation, Data curation, Writing - Review \u0026amp; Editing. Hao Huang: Resources, Writing - Review \u0026amp; Editing, Supervision. Jianfeng Chu: Methodology, Formal analysis, Supervision, Writing - Review \u0026amp; Editing. Liyuan Fu: Conceptualization, Project administration, Funding acquisition, Supervision. Minghui Mao, Chengkun Hong and Yuhang Zhang contributed equally to this work. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets used and analyzed during the current study are available in the Roboflow repository: Dataset 1 is available at [https://universe.roboflow.com/school-wo8fx/lung-anqdx], and Dataset 2 is available at [https://universe.roboflow.com/sad-unjbn/lung-sample].\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eDairi, M. S. \u0026amp; Bahakeem, B. Public Attitudes Towards Lung Cancer Screening in Saudi Arabia: A Cross-Sectional Study. \u003cem\u003eJ. multidisciplinary Healthc.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 2279\u0026ndash;2289 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBychkov, I. 
et al. Musashi-2 (MSI2) regulation of DNA damage response in lung cancer. \u003cem\u003eResearch Square\u003c/em\u003e. rs.3.rs-4021568 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, H. et al. Multi-model Ensemble Learning Architecture Based on 3D CNN for Lung Nodule Malignancy Suspiciousness Classification. \u003cem\u003eJ. Digit. Imaging\u003c/em\u003e. \u003cb\u003e33\u003c/b\u003e (5), 1242\u0026ndash;1256 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMikhail Lette, M. N. et al. Toward Improved Outcomes for Patients With Lung Cancer Globally: The Essential Role of Radiology and Nuclear Medicine. \u003cem\u003eJCO Glob. Oncol.\u003c/em\u003e \u003cb\u003e8\u003c/b\u003e, e2100100 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCheng, Q. et al. Pneumocystis jirovecii diagnosed by next-generation sequencing of bronchoscopic alveolar lavage fluid: A case report and review of literature. \u003cem\u003eWorld J. Clin. Cases\u003c/em\u003e. \u003cb\u003e11\u003c/b\u003e (4), 866\u0026ndash;873 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePyka, V. et al. High-dose chemotherapy and autologous hematopoietic stem cell transplantation for progressive systemic sclerosis: a retrospective study of outcome and prognostic factors. \u003cem\u003eJ. Cancer Res. Clin. Oncol.\u003c/em\u003e \u003cb\u003e150\u003c/b\u003e (6), 301 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGupta, D., Loane, R., Gayen, S. \u0026amp; Demner-Fushman, D. Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features. \u003cem\u003eKnowl. Based Syst.\u003c/em\u003e \u003cb\u003e278\u003c/b\u003e, 110907 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMahmoudi, S. et al. Imaging biomarkers to stratify lymph node metastases in abdominal CT - Is radiomics superior to dual-energy material decomposition? \u003cem\u003eEur. J. Radiol. 
Open.\u003c/em\u003e \u003cb\u003e10\u003c/b\u003e, 100459 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePark, J. S. et al. Accuracy of Large Language Models in Detecting Cases Requiring Immediate Reporting in Pediatric Radiology: A Feasibility Study Using Publicly Available Clinical Vignettes. \u003cem\u003eKorean J. Radiol.\u003c/em\u003e \u003cb\u003e26\u003c/b\u003e (9), 855\u0026ndash;866 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePrasada Rao, R. H. \u0026amp; Goswami, A. D. Cnidaria herd optimized fuzzy C-means clustering enabled deep learning model for lung nodule detection. \u003cem\u003eFront. Physiol.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 1511716 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, G., Duan, Q., Shen, T. \u0026amp; Zhang, S. SenseCare: a research platform for medical image informatics and interactive 3D visualization. \u003cem\u003eFront. Radiol.\u003c/em\u003e \u003cb\u003e4\u003c/b\u003e, 1460889 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXue, S. et al. CTS-Net: A Segmentation Network for Glaucoma Optical Coherence Tomography Retinal Layer Images. \u003cem\u003eBioeng. (Basel Switzerland)\u003c/em\u003e. \u003cb\u003e10\u003c/b\u003e (2), 230 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXue, M., Liu, Y. \u0026amp; Cai, X. Automated Detection Model Based on Deep Learning for Knee Joint Motion Injury due to Martial Arts. \u003cem\u003eComput. Math. Methods Med.\u003c/em\u003e 3647152 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, F., Mao, R., Yan, L., Ling, S. \u0026amp; Cai, Z. A deep learning-based approach for rectus abdominis segmentation and distance measurement in ultrasonography. \u003cem\u003eFront. 
Physiol.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 1246994 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRamamoorthy, P., Ramakantha Reddy, B. R., Askar, S. S. \u0026amp; Abouhawwash, M. Histopathology-based breast cancer prediction using deep learning methods for healthcare applications. \u003cem\u003eFront. Oncol.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 1300997 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee, H. S. et al. Automated analysis of knee joint alignment using detailed angular values in long leg radiographs based on deep learning. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e (1), 7226 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, Z. et al. Accurate segmentation algorithm of acoustic neuroma in the cerebellopontine angle based on ACP-TransUNet. \u003cem\u003eFrontiers neuroscience\u003c/em\u003e \u003cb\u003e17\u003c/b\u003e, 1207149 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMustafa, Z. \u0026amp; Nsour, H. Using Computer Vision Techniques to Automatically Detect Abnormalities in Chest X-rays. \u003cem\u003eDiagnostics (Basel Switzerland)\u003c/em\u003e. \u003cb\u003e13\u003c/b\u003e (18), 2979 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHorry, M. J. et al. Development of Debiasing Technique for Lung Nodule Chest X-ray Datasets to Generalize Deep Learning Models. \u003cem\u003eSens. (Basel Switzerland)\u003c/em\u003e. \u003cb\u003e23\u003c/b\u003e (14), 6585 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLim, C. Y. et al. Estimating the Volume of Nodules and Masses on Serial Chest Radiography Using a Deep-Learning-Based Automatic Detection Algorithm: A Preliminary Study. \u003cem\u003eDiagnostics (Basel Switzerland)\u003c/em\u003e. \u003cb\u003e13\u003c/b\u003e (12), 2060 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhalili, B. 
\u0026amp; Smyth, A. W. SOD-YOLOv8-Enhancing YOLOv8 for Small Object Detection in Aerial Imagery and Traffic Scenes. \u003cem\u003eSens. (Basel Switzerland)\u003c/em\u003e. \u003cb\u003e24\u003c/b\u003e (19), 6209 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, Y. et al. A method for detecting the rate of tobacco leaf loosening in tobacco leaf sorting scenarios. \u003cem\u003eFront. Plant Sci.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 1578317 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLewis, J. E. et al. An Automated Pipeline for Differential Cell Counts on Whole-Slide Bone Marrow Aspirate Smears. \u003cem\u003eMod. pathology: official J. United States Can. Acad. Pathol. Inc\u003c/em\u003e. \u003cb\u003e36\u003c/b\u003e (2), 100003 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChiu, H. Y. et al. Artificial Intelligence for Early Detection of Chest Nodules in X-ray Images. \u003cem\u003eBiomedicines\u003c/em\u003e \u003cb\u003e10\u003c/b\u003e (11), 2839 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eObayya, M., Al-Wesabi, F. N., Bedewi, W. \u0026amp; Alshammeri, M. An intelligent framework for visually impaired people through indoor object Detection-Based assistive system using YOLO with recurrent neural networks. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 43720 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZou, H., Weng, Z., Zhao, M. \u0026amp; Jiang, X. Multi-Strategy Improved Cantaloupe Pest Detection Algorithm. \u003cem\u003eInsects\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e (12), 1201 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu, B., Ma, Z., Su, X., He, X. \u0026amp; Wu, X. A lightweight intelligent grading method for lychee anthracnose based on improved YOLOv12. \u003cem\u003eFront. 
Plant Sci.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 1688675 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu, H., Li, H. \u0026amp; Zhao, J. A lightweight tri-modal few-shot detection framework for fruit diversity recognition toward digital orchard archiving. \u003cem\u003eFront. Plant Sci.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 1696622 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGupta, C. et al. An optimized YOLO NAS based framework for realtime object detection. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 32903 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTian, J. H. et al. An improved YOLOv5n algorithm for detecting surface defects in industrial components. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 9756 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, D., Yang, Z., Bao, C. \u0026amp; Meng, Q. Artificial intelligence-based method for detecting wrist fractures in children. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 38555 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou, C., Zhang, Y., Fu, W., Yao, L. \u0026amp; Yin, C. MDE-DETR: multi-domain enhanced feature fusion algorithm for bayberry detection and counting in complex orchards. \u003cem\u003eFront. Plant Sci.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 1711545 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang, F. et al. A lightweight infrared target detection network suitable for land and water surfaces. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 37794 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, Z. et al. SGSNet: a lightweight deep learning model for strawberry growth stage detection. \u003cem\u003eFront. 
Plant Sci.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e, 1491706 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, C., Lee, H. \u0026amp; Chen, M. Steel surface defect detection method based on improved YOLOv9. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 25098 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu, L. et al. STAIR-DETR: A Synergistic Transformer Integrating Statistical Attention and Multi-Scale Dynamics for UAV Small Object Detection. \u003cem\u003eSens. (Basel Switzerland)\u003c/em\u003e. \u003cb\u003e25\u003c/b\u003e (24), 7681 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDu, L. et al. A SCG-YOLOv8n potato counting framework with efficient mobile deployment. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 34909 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu, Z. et al. Algorithm for detecting surface defects in wind turbines based on a lightweight YOLO model. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e (1), 24558 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie, B. J., Li, H., Luan, Z., Li, X. X. \u0026amp; Lei, Z. A lightweight coal mine pedestrian detector for video surveillance systems with multi-level feature fusion and channel pruning. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 5757 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, J. et al. A real-time end-to-end detector for detecting surface defects on oversized rings. \u003cem\u003ePLoS One\u003c/em\u003e. 
\u003cb\u003e20\u003c/b\u003e (8), e0330031 (2025).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"YOLOv12, SPDConv, DySample, VoVGSCSP, Chest X-ray, Lung nodules","lastPublishedDoi":"10.21203/rs.3.rs-8670280/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8670280/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eTo investigate the feasibility of automatic lung nodule detection using chest X-rays, this study proposes an improved YOLOv12 algorithm based on space-to-depth convolution (SPDConv), a dynamic upsampling module (DySample), and a one-shot aggregation cross stage partial network with ghost convolution (VoVGSCSP). The original YOLOv12 algorithm was optimized by replacing specific convolutional layers in the Backbone and Neck with SPDConv, substituting the Upsample modules in the Neck with upgraded DySample modules, and replacing the C3k2 and A2C2f modules in the Neck with VoVGSCSP to construct the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm. The optimized algorithm was trained and validated using a public chest X-ray lung nodule dataset available on the Roboflow platform, and its performance was compared with that of the original YOLOv12 algorithm. Results indicate that the improved algorithm achieved a mean average precision at an intersection over union threshold of 0.5 (mAP50) of 0.735 and a mAP50-95 of 0.426 in detecting lung nodules on chest X-rays. These results outperformed the original YOLOv12 algorithm, which achieved a mAP50 of 0.704 and a mAP50-95 of 0.411. 
In conclusion, the YOLOv12-SPDConv-Dysample-VoVGSCSP algorithm demonstrates superior overall performance in detecting lung nodules on chest X-rays, significantly surpassing the original YOLOv12 algorithm.\u003c/p\u003e","manuscriptTitle":"Research on lung nodule detection in X-ray plain films based on improved YOLOv12 algorithm","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-01-29 16:14:21","doi":"10.21203/rs.3.rs-8670280/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-03-05T13:08:22+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-03-04T13:59:00+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"281409908834524655695579283245512238502","date":"2026-03-02T14:58:21+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"98074702140882448052107713531938308624","date":"2026-02-26T08:33:39+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-25T12:19:21+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"152479714288801219952681630473074286014","date":"2026-02-24T15:10:32+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-16T20:41:27+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"53051961666439125554752929759769875464","date":"2026-01-27T14:05:43+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-01-27T13:35:58+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-01-27T08:46:56+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-01-24T01:37:03+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-01-24T01:35:46+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific 
Reports","date":"2026-01-22T12:57:40+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"cdc3dca3-9c7c-4f39-8846-f102cc9ef563","owner":[],"postedDate":"January 29th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":61856484,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":61856485,"name":"Physical sciences/Engineering"},{"id":61856486,"name":"Physical sciences/Mathematics and computing"}],"tags":[],"updatedAt":"2026-04-13T16:00:32+00:00","versionOfRecord":{"articleIdentity":"rs-8670280","link":"https://doi.org/10.1038/s41598-026-47670-9","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2026-04-07 15:57:33","publishedOnDateReadable":"April 7th, 2026"},"versionCreatedAt":"2026-01-29 16:14:21","video":"","vorDoi":"10.1038/s41598-026-47670-9","vorDoiUrl":"https://doi.org/10.1038/s41598-026-47670-9","workflowStages":[]},"version":"v1","identity":"rs-8670280","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8670280","identity":"rs-8670280","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}