Enhanced YOLOv8 with Lightweight and Efficient Detection Head for Detecting Rice Leaf Diseases

Bo Gan, Guolin Pu, Weiyin Xing, Lianfang Wang, Shu Liang

This is a preprint posted on Research Square; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5336865/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 01 Jul, 2025 in Scientific Reports.

Abstract

Detecting rice leaf diseases is essential for agricultural stability and crop health. However, the diversity of these diseases, their uneven distribution, and complex field environments create challenges for precise, multi-scale detection. While YOLO object detection algorithms show strong performance in automated detection, further optimization is needed.
This paper presents G-YOLO, a novel architecture that combines a Lightweight and Efficient Detection Head (LEDH) with Multi-scale Spatial Pyramid Pooling Fast (MSPPF). The LEDH enhances detection speed by simplifying the network structure while maintaining accuracy, reducing computational demands. The MSPPF improves the model's ability to capture intricate details of rice leaf diseases at various scales by fusing multi-level feature maps. On the RiceDisease dataset, G-YOLO surpasses YOLOv8n by 4.4 percentage points in mAP@0.5 and 3.9 percentage points in mAP@0.75, with a 13.1% increase in FPS, making it well-suited for resource-constrained devices.

Subject areas: Physical sciences/Mathematics and computing/Computer science; Physical sciences/Engineering/Electrical and electronic engineering; Biological sciences/Computational biology and bioinformatics

Keywords: Rice leaf disease detection, Agricultural stability, Crop Health, Lightweight

1. Introduction

Detecting rice leaf diseases is essential for modern agriculture and crop protection. Fast and accurate field monitoring of these diseases is critical for optimizing management and control strategies. However, current detection technologies often struggle to balance accuracy and real-time performance in complex and dynamic field conditions. Two main factors cause this. First, the variability in the rice growth cycle, changing weather, and the wide variety of diseases make precise identification challenging. Specifically, when multiple diseases with varying severities coexist, models often detect only the most obvious lesions and miss subtle early symptoms, which leads to missed detections and false positives. Second, the high computational complexity and large size of many detection models hinder their real-time capabilities.

Deep learning methods for detecting rice leaf diseases can be grouped into two main categories.
The first category includes two-stage detection algorithms based on region proposals, such as R-CNN [1], Fast R-CNN [2], and Faster R-CNN [3]. Although these methods provide high accuracy, their slower speeds make them less effective for real-time use. The second category comprises one-stage detection methods, like the YOLO series [4–10], which offer faster processing by predicting directly from the image while maintaining accuracy close to two-stage approaches.

YOLOv8 [10] is an advanced object detection algorithm recognized for its efficient balance of speed and accuracy. It consists of three primary components: the backbone, neck, and head, each designed to optimize performance while minimizing computational demands. The backbone extracts image features through convolutional layers, the C2f module, and the SPPF module. The C2f module enhances gradient flow and strengthens feature representation, whereas the SPPF module improves multi-scale feature perception, thereby increasing detection accuracy for objects of varying sizes. The neck, which integrates a Feature Pyramid Network (FPN) and Path Aggregation Network (PAN), utilizes bidirectional feature fusion to ensure precise detection of objects from small to large. The head, situated on top of the PAN structure, includes detection heads customized for different object sizes, enabling YOLOv8 to manage a wide range of targets in complex environments. Furthermore, YOLOv8 incorporates Distribution Focal Loss (DFL) [11], which refines bounding box predictions by modeling box locations as probability distributions, decreasing uncertainty and improving localization precision.

Recent developments in YOLOv8 have concentrated on improving multi-scale feature extraction, enhancing attention mechanisms, and refining loss functions to elevate both accuracy and speed in complex environments.
For instance, BGF-YOLO [12] features Bi-level Routing Attention (BRA) [13] and Generalized Feature Pyramid Networks (GFPN) [14], along with an additional detection head, which greatly enhances the representation of multi-scale features. UAV-YOLOv8 [15] incorporates BiFormer [13] and Wise-IoU (WIoU) [16], optimizing both feature extraction and localization for improved stability in complex settings. YOLO-SE [17] utilizes an Efficient Multi-scale Attention Module with Cross-Spatial Learning (EMA) [18] to tackle the challenges of multi-scale detection, significantly boosting accuracy for small objects. Meanwhile, MHSA-YOLOv8 [19] integrates Multi-Head Self-Attention (MHSA) [20], refining the feature extraction process in demanding environments. These advancements have significantly enhanced YOLOv8's performance across a variety of application areas.

In this paper, we introduce G-YOLO, a novel object detection framework designed to enhance both the real-time performance and accuracy of YOLOv8n in detecting rice leaf diseases by incorporating the LEDH and MSPPF modules. Compared to the traditional YOLOv8n model, G-YOLO achieves significant improvements in rice leaf disease detection accuracy while maintaining high detection speed. The main contributions of this work are as follows:

1. The introduction of the LEDH module reduces the model's parameter count and computational complexity, significantly improving inference speed for real-time rice leaf disease detection tasks.
2. The LEDH module, with its specially designed detection head, enhances detection accuracy for rice leaf diseases, improving the ability to detect these diseases in complex environments.
3. The MSPPF module strengthens multi-scale detection by integrating features from various scales, increasing detection accuracy across diverse backgrounds.
4. Experiments show that G-YOLO outperforms YOLOv8n in both accuracy and inference speed, making it well-suited for resource-constrained devices.
2. Methods

G-YOLO refines the YOLOv8n [10] architecture by incorporating the LEDH and MSPPF modules, achieving a more effective balance between inference speed and detection accuracy, particularly in rice leaf disease detection tasks. While retaining the core backbone of YOLOv8n, G-YOLO introduces key innovations in the detection head and feature fusion components. This section provides an in-depth analysis of the G-YOLO architecture, detailing how the LEDH and MSPPF modules contribute to enhanced model performance.

2.1 LEDH Module

The detection head in YOLO object detection models plays a pivotal role after the Feature Pyramid Network (FPN): it analyzes multi-scale features to predict object bounding boxes and class labels. Traditionally, the detection head consists of two main components: a location predictor for estimating bounding box positions and a class classifier for identifying object categories. An optimized detection head significantly impacts both accuracy and inference speed, ensuring efficient real-time performance by effectively processing multi-scale features.

YOLOv8 adopts a Decoupled Head structure, where the tasks of classification and bounding box regression are handled by separate sub-networks. This architecture includes three detection heads, each corresponding to a different scale of feature maps, designed to manage small, medium, and large objects. This decoupled design allows for task-specific specialization, enhancing both detection accuracy and efficiency. Additionally, it enables independent optimization of classification and localization tasks through distinct loss functions and training strategies. However, while this approach reduces task interference and improves model stability, it also increases the overall parameter count and computational complexity, which can be a limiting factor for real-time performance, especially in resource-constrained environments.
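To make the parameter cost of a decoupled, per-scale head concrete, the following minimal sketch counts convolution weights for a hypothetical head with separate classification and regression branches at three scales. The channel widths, the branch layout (two 3×3 convolutions followed by one 1×1 output convolution), and the 64 regression outputs are illustrative assumptions rather than the exact YOLOv8 configuration, and normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weights of a k x k convolution, plus one bias per output channel if requested."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def decoupled_head_params(in_channels, hidden, num_classes, reg_outputs):
    """Count conv parameters of a hypothetical decoupled head.

    Every feature scale gets its own head, and every head has two
    independent branches (classification and box regression).
    """
    total = 0
    for c_in in in_channels:                      # one head per feature scale
        for n_out in (num_classes, reg_outputs):  # separate cls / box branches
            total += conv_params(c_in, hidden, 3)             # first 3x3 conv
            total += conv_params(hidden, hidden, 3)           # second 3x3 conv
            total += conv_params(hidden, n_out, 1, bias=True)  # 1x1 output conv
    return total


# Assumed widths for the three scales; 3 classes matches the three diseases
# in the RiceDisease dataset, everything else is illustrative.
print(decoupled_head_params((64, 128, 256), hidden=64, num_classes=3, reg_outputs=64))
```

Because every scale and every task owns its own convolutions in this layout, the count grows linearly with both, which is the overhead that sharing weights across heads is meant to reduce.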
To address these limitations, we introduce the LEDH module, designed to reduce computational overhead and parameter count while maintaining high detection accuracy, particularly for rice leaf disease detection. The LEDH module includes four key optimizations. First, shared convolutions streamline the model architecture, substantially reducing parameters and computational demands. Second, inspired by the Efficient Layer Aggregation Network (ELAN) structure [7], the module incorporates an efficient multi-layer feature aggregation mechanism that strengthens the network's feature learning capability and detection accuracy. Third, multiple Conv_GN modules, which use Group Normalization (GN), further enhance localization and classification precision in object detection tasks [21]. Fourth, the module employs two independent convolutional branches for classification and bounding box regression, with a scaling factor layer applied to the shared convolution in the regression branch to accommodate multi-scale detection requirements. Collectively, these optimizations reduce computational burden and parameter count while improving real-time performance and detection accuracy.

The structure of the LEDH module is shown in Fig. 1 (a). G-YOLO is designed with three detection heads for detecting large, medium, and small objects, respectively. Each detection head includes an independent Conv_GN module that employs a 1×1 convolution to merge and adjust feature channels, generating its respective F1. The structure of the Conv_GN module is shown in Fig. 1 (b). Each F1 is then evenly split along the channel dimension into two new feature maps, F2 and F3. F3 is passed through two 3×3 Conv_GN modules in sequence: the first produces F4, and the second takes F4 and produces F5.
These Conv_GN modules share weights across all detection heads. Next, F2, F3, F4, and F5 are concatenated along the channel dimension to form F6. F6 is then processed by two separate convolution branches, each with a 1×1 convolution, responsible for classification and bounding box regression, respectively. These branches also share weights across all detection heads. To accommodate objects of varying sizes, each of the three detection heads independently learns a distinct scaling factor during training, denoted as Scale[i] (where i = 0, 1, 2). Each scaling factor is uniquely associated with one detection head and rescales that head's bounding box predictions, thereby improving the accuracy of multi-scale object detection.

2.2 MSPPF Module

In the architecture of YOLOv8, the Spatial Pyramid Pooling Fast (SPPF) [10] module processes feature maps using serial max pooling operations with a fixed-size pooling kernel (e.g., 5×5). Compared to the traditional Spatial Pyramid Pooling (SPP) [30] module, which employs multiple parallel pooling paths, the SPPF module is more efficient. SPPF concatenates the feature maps produced by each pooling operation with the original feature map along the channel dimension. This approach preserves the model's ability to capture targets across varying scales while reducing computational resource consumption, because it removes the overhead associated with parallel pooling paths. The SPPF design balances the need for multi-scale information extraction with real-time processing requirements, allowing YOLOv8 to deliver strong performance in complex detection scenarios while maintaining high-speed operation. Nevertheless, there remains potential for further optimization in the area of multi-scale feature fusion.

In this paper, to address the limitations of multi-scale feature fusion in the SPPF module, we extend and optimize it by proposing the MSPPF module.
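The serial pooling in SPPF described above can be illustrated with a small, dependency-free sketch. It assumes stride-1 pooling with a same-size output on a single-channel map, and it omits the convolutions that surround the pooling in the real module; the function names are hypothetical. The sketch also checks a well-known property that explains the efficiency gain: two serial 5×5 poolings cover the same window as one 9×9 pooling, and three cover the same window as one 13×13 pooling.

```python
def max_pool_same(grid, k):
    """Stride-1 k x k max pooling with same-size output (windows clipped at borders)."""
    h, w, r = len(grid), len(grid[0]), k // 2
    return [
        [
            max(
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def sppf_branches(grid, k=5):
    """Return the four maps that SPPF concatenates: the input plus three serial pools."""
    p1 = max_pool_same(grid, k)
    p2 = max_pool_same(p1, k)
    p3 = max_pool_same(p2, k)
    return [grid, p1, p2, p3]


# Deterministic toy feature map (values are arbitrary).
grid = [[(7 * i + 13 * j) % 31 for j in range(16)] for i in range(16)]
branches = sppf_branches(grid)

# Serial small kernels reproduce the larger parallel kernels of an SPP-style layout.
print(branches[2] == max_pool_same(grid, 9))
print(branches[3] == max_pool_same(grid, 13))
```

Reusing each pooled map as the input of the next pooling step is what lets the serial design reach large receptive windows without evaluating large kernels directly.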
Specifically, we designed the multi-scale feature concatenation (MSFC) module and integrated it with the SPPF module to form the MSPPF module. The MSFC module improves upon the existing multi-scale selective fusion (MSF) module [22], primarily by replacing the add operation in the feature fusion process with a concat operation. This enhancement significantly boosts the expressive power of multi-scale feature fusion and reduces information loss. By combining the foundational characteristics of the SPPF module with the strengths of the MSFC module, the MSPPF module demonstrates improved detection accuracy and performance in object detection tasks.

The structure of the MSPPF module is shown in Fig. 2 (a). First, the feature map F is processed by the CBS module, generating F1. The structure of the CBS module is shown in Fig. 2 (b); it enhances the network's feature extraction capability through a combination of convolution, batch normalization (BN), and the SiLU activation function. Then, F1 passes through three serial max pooling operations: the first 5×5 pooling kernel generates F2 from F1, the second 5×5 pooling kernel generates F3 from F2, and the third 5×5 pooling kernel generates F4 from F3. The formulas are shown in Equations (1)-(4).

$$\text{F1}=\text{CBS}(\text{F}) \tag{1}$$

$$\text{F2}=\text{MaxPooling}(\text{F1}) \tag{2}$$

$$\text{F3}=\text{MaxPooling}(\text{F2}) \tag{3}$$

$$\text{F4}=\text{MaxPooling}(\text{F3}) \tag{4}$$

Next, we input the feature maps F1, F2, F3, and F4 at different scales into the MSFC module for feature fusion, as shown in Fig. 2 (c).
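The fusion in Fig. 2 (c), which the next paragraph walks through (global average pooling, a 1×1 convolution, Sigmoid, Softmax, re-weighting, and channel-wise concatenation), can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1×1 convolution is replaced by a hypothetical channel-mixing matrix `mix` (identity by default), the Softmax is assumed to run across the four scales separately for each channel, and feature maps are nested lists of shape [C][H][W].

```python
import math


def gap(fmap):
    """Global average pooling: one mean per channel of a [C][H][W] nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def msfc(feature_maps, mix=None):
    """Weight each same-shaped map per channel, then concatenate along channels."""
    channels = len(feature_maps[0])
    if mix is None:  # hypothetical stand-in for the learned 1x1 convolution
        mix = [[1.0 if i == j else 0.0 for j in range(channels)] for i in range(channels)]
    # GAP -> channel mixing -> sigmoid: one score per (scale, channel).
    scores = []
    for fmap in feature_maps:
        pooled = gap(fmap)
        mixed = [sum(w * p for w, p in zip(row, pooled)) for row in mix]
        scores.append([sigmoid(v) for v in mixed])
    # Softmax across scales for each channel (assumed normalization axis).
    weights = [[0.0] * channels for _ in feature_maps]
    for c in range(channels):
        column = softmax([scores[s][c] for s in range(len(feature_maps))])
        for s, w in enumerate(column):
            weights[s][c] = w
    # Element-wise re-weighting, then concatenation along the channel axis.
    fused = []
    for s, fmap in enumerate(feature_maps):
        for c, ch in enumerate(fmap):
            fused.append([[weights[s][c] * v for v in row] for row in ch])
    return fused, weights


# Four toy maps standing in for F1..F4, each with 2 channels of size 2 x 2.
maps = [[[[float(s + c)] * 2 for _ in range(2)] for c in range(2)] for s in range(4)]
fused, weights = msfc(maps)
print(len(fused))  # number of channels after concatenation
```

Concatenating the re-weighted maps keeps every scale's channels separate in the output, whereas an additive fusion would merge them into a single set of channels.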
Initially, global average pooling (GAP) is conducted on each feature map to derive the average of each channel. Subsequently, inter-channel relationships are modeled through a 1×1 dilated convolution, and the resulting weights are mapped into the 0 to 1 range via the Sigmoid activation function. Afterward, the multi-scale feature channel weights are normalized through the Softmax function, ensuring a balanced distribution. Finally, these normalized weights are multiplied element-wise with the original feature maps, and the re-weighted feature maps are concatenated along the channel axis to create a new multi-scale feature representation. The MSFC module thus integrates convolution with an attention mechanism, blending fine image details with broader contextual information.

3. Experiments and Results

3.1 Dataset

In this study, we utilized the publicly available RiceDisease dataset from Kaggle [23], which contains 850 images of rice leaf diseases. The dataset encompasses instances of rice leaf diseases with varying sizes and shapes, captured under diverse conditions, including low lighting and varying visibility scales. This diversity in image data simulates real-world scenarios, which helps the model's robustness and generalization across different conditions. The dataset covers three common rice leaf diseases: Bacterial Leaf Blight, Blast, and Brown Spot, with 434, 869, and 1878 instances, respectively. It is split into training, validation, and test sets, containing 583, 160, and 107 images, respectively. Fig. 3 (a) presents the histogram of instance quantities for each category in the RiceDisease training set; Fig. 3 (b) displays the distribution of bounding box sizes and their corresponding counts; Fig. 3 (c) illustrates the spatial distribution of bounding boxes within the images; and Fig.
3 (d) depicts the distribution of bounding box aspect ratios relative to the overall image.

3.2 Experimental Environment

Following the evaluation practice of MobileNet [24–26] and ShuffleNet [27, 28], we used high-performance hardware for training and for measuring the accuracy and complexity of our model. We also conducted real-time testing on a resource-constrained device so that the model's performance could be evaluated under different resource conditions.

All training and all accuracy and complexity measurements were conducted on a high-performance server. These measurements include Mean Average Precision (mAP), the number of parameters (Params), model size, and giga floating-point operations (GFLOPS). The PyTorch framework and CUDA acceleration libraries were used to exploit the computational power of the GPU, speeding up model training and inference. Table 1 provides the configuration details of the server, while Table 2 shows the hyperparameter configuration details.

Table 1 High-Performance Server Configuration

| Device | Configuration |
| --- | --- |
| GPU | NVIDIA GeForce RTX 3090 |
| VRAM | 24GB |
| Operating System | Ubuntu 18.04 |
| Framework | PyTorch 1.11 |
| CUDA | 11.3 |
| Python | 3.8 |

Table 2 Hyperparameter Settings

| Parameter | Value |
| --- | --- |
| Epoch | 500 |
| Patience | 50 |
| Batch Size | 16 |
| Image Size | 640 |
| Pretrained | False |
| Optimizer | AdamW |
| Initial Learning Rate | 0.001 |

The resource-constrained device used an NVIDIA GeForce RTX 3050 Laptop GPU with 4GB of VRAM, primarily for evaluating real-time performance metrics such as inference time and Frames Per Second (FPS). This lower-power GPU was chosen to assess the model's inference performance in environments with limited computational resources, especially when handling lightweight inference tasks. Table 3 provides the configuration details of this device.
Table 3 Resource-Constrained Device Configuration

| Device | Configuration |
| --- | --- |
| GPU | NVIDIA GeForce RTX 3050 Laptop |
| VRAM | 4GB |
| Operating System | Windows 11 |
| Framework | PyTorch 1.11 |
| CUDA | 11.3 |
| Python | 3.8 |

3.3 Evaluation Metrics

Model evaluation is crucial for determining a model's performance and its compatibility with the research objectives. In the rice leaf disease detection task, we employed the following evaluation metrics to comprehensively measure the performance of lightweight models: mAP, Params, model size, GFLOPS, inference time, and FPS. All experimental results in this study were obtained on the test set to ensure transparency and comparability of the results. The formulas for these evaluation metrics are shown in Equations (5)-(9).

1. mAP: mAP is a crucial metric for evaluating object detection models. It is derived by calculating the Average Precision (AP) for each class and then averaging the AP values across all classes to assess the model's overall performance. AP for each class is determined from the precision-recall curve, allowing mAP to capture precision (P) at various recall (R) levels and provide a comprehensive evaluation of detection capability. mAP@0.5 refers to the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5, measuring how well the model predicts bounding boxes whose IoU with the ground truth is at least 0.5. mAP@0.75 uses an IoU threshold of 0.75, setting a higher bar for localization accuracy by requiring more overlap between predicted and actual boxes. In these calculations, TP denotes true positives, FP denotes false positives, FN denotes false negatives, P(r) represents precision at recall r, AP_i indicates the average precision for class i, and C represents the total number of classes.
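As a minimal sketch of the quantities defined for mAP and formalized in Equations (5)-(8), the following dependency-free code computes IoU, precision, recall, AP, and mAP. It assumes detections are already sorted by descending confidence and already marked as true or false positives at the chosen IoU threshold, and it integrates the precision-recall curve with an all-point precision envelope. That integration scheme is one common convention and is an assumption here; the evaluation code used in the paper may interpolate differently. All function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def precision(tp, fp):
    """Equation (5): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Equation (6): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(matches, num_gt):
    """Equation (7): area under P(r) for one class.

    matches: booleans for detections sorted by descending confidence,
             True where the detection is a true positive.
    num_gt:  number of ground-truth boxes of this class.
    """
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for is_tp in matches:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make P(r) non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum of rectangles approximating the integral of P(r) dr.
    return sum(
        (recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls))
    )


def mean_average_precision(ap_per_class):
    """Equation (8): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Toy example: three detections of one class, two ground-truth boxes.
print(average_precision([True, False, True], num_gt=2))
```

For mAP@0.5 a detection would be marked as a true positive when its `iou` with an unmatched ground-truth box is at least 0.5; for mAP@0.75 the same procedure is repeated with a threshold of 0.75.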
$$\text{P}=\frac{\text{TP}}{\text{TP}+\text{FP}} \tag{5}$$

$$\text{R}=\frac{\text{TP}}{\text{TP}+\text{FN}} \tag{6}$$

$$\text{AP}=\int_{0}^{1}\text{P}(\text{r})\,\text{d}\text{r} \tag{7}$$

$$\text{mAP}=\frac{\sum_{\text{i}=1}^{\text{C}}\text{AP}_{\text{i}}}{\text{C}} \tag{8}$$

2. Params: The parameter count indicates a model's complexity and computational requirements. Fewer parameters usually suggest a more lightweight model that runs more efficiently on devices with limited resources, improving both training and inference speeds. For rice leaf disease detection, models with fewer parameters suit resource-limited settings because they cut computational and storage demands, which improves system responsiveness and overall processing performance.

3. Model Size: The model size influences storage requirements, loading time, and computational cost. Smaller models save storage space, speed up loading, and lower computational expenses. In rice leaf disease detection, model size directly affects deployment and operational efficiency, particularly in resource-limited environments.

4. GFLOPS: GFLOPS here quantifies a model's computational cost as the number of floating-point operations, in billions, required for one forward pass; it is a property of the model rather than a measure of hardware throughput. Lightweight models generally have lower GFLOPS values, reflecting reduced computational complexity. A lower GFLOPS value indicates lower resource requirements and power consumption, which improves model efficiency in resource-constrained environments while still providing reasonable performance.

5. Inference Time: Inference time is the duration needed for a model to analyze a single image, measured in milliseconds (ms). Reduced inference times enhance the model's responsiveness in real-time applications, improving user experience.
In rice leaf disease detection, shorter inference times enable quick processing of input data and provide immediate feedback, which is vital for applications requiring rapid detection and response.

6. FPS: FPS indicates the number of image frames the model processes each second and is essential for evaluating its real-time processing capability. A higher FPS results in a smoother experience, which is vital for real-time rice leaf disease detection.

$$\text{FPS}=\frac{1000}{\text{Inference Time}} \tag{9}$$

where Inference Time is the inference time measured in milliseconds.

3.4 Model Training Results

In this study, we compared the accuracy of YOLOv8n and G-YOLO, using YOLOv8n as the baseline model. To avoid overfitting and improve model generalization, we employed Early Stopping during training: the model's performance on the validation set is monitored, and training is halted when that performance no longer shows significant improvement. We set the patience period to 50 epochs. The training results of G-YOLO are shown in Fig. 4. Figs. 5 (a) and 5 (b) display the PR curves for the YOLOv8n and G-YOLO algorithms on the RiceDisease test set. The PR curve for G-YOLO surpasses that of YOLOv8n, demonstrating that the G-YOLO model outperforms YOLOv8n. Moreover, G-YOLO improves mAP@0.5 by 4.4 percentage points compared to YOLOv8n.

3.5 Ablation Study

We conducted a series of ablation experiments to assess the contribution of each module in the G-YOLO model and its impact on detection performance. We used the original YOLOv8n without any enhancement modules as the baseline and evaluated the effect of each improvement module on the RiceDisease dataset.
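The figures quoted for this ablation mix two conventions: the mAP gains are differences in percentage points, while the FPS gains are relative changes. The following short sketch rechecks a few of them from the YOLOv8n and G-YOLO rows of Table 4, using Equation (9) for FPS. The helper names and the dictionary layout are hypothetical; the numbers are taken from Table 4.

```python
def fps_from_latency(latency_ms):
    """Equation (9): frames per second from per-image inference time in ms."""
    return 1000.0 / latency_ms


def point_gain(new, old):
    """Absolute gain in percentage points for metrics stored as fractions."""
    return (new - old) * 100.0


def relative_gain_percent(new, old):
    """Relative gain in percent."""
    return (new - old) / old * 100.0


# Values copied from Table 4 (baseline and full model).
table4 = {
    "YOLOv8n": {"map50": 0.684, "map75": 0.145, "latency_ms": 11.05, "fps": 90.50},
    "G-YOLO": {"map50": 0.728, "map75": 0.184, "latency_ms": 9.77, "fps": 102.35},
}
base, ours = table4["YOLOv8n"], table4["G-YOLO"]

# FPS column reproduced from the inference-time column.
print(round(fps_from_latency(base["latency_ms"]), 2))
print(round(fps_from_latency(ours["latency_ms"]), 2))

# mAP gains in percentage points, FPS gain as a relative change.
print(round(point_gain(ours["map50"], base["map50"]), 1))
print(round(point_gain(ours["map75"], base["map75"]), 1))
print(round(relative_gain_percent(ours["fps"], base["fps"]), 1))
```

Read this way, a reported mAP@0.5 gain of 4.4 corresponds to the difference 0.728 − 0.684, not to a 4.4% relative change of the baseline value.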
In the experiments, '√' marks an enabled module, while '×' marks a disabled one. From Table 4, it can be observed that integrating the MSPPF and LEDH modules into the original YOLOv8n network improves the results on the RiceDisease dataset to varying degrees. Specifically, adding the MSPPF module increased mAP@0.5 by 0.8 percentage points and mAP@0.75 by 1.0 percentage point, indicating that this module better captures and integrates multi-scale contextual information. Adding the LEDH module reduced the model size by 1.2MB, increased FPS by 17.4%, and improved mAP@0.5 and mAP@0.75 by 0.7 percentage points each. This shows that the module reduces the model size and improves real-time inference speed while also improving target localization accuracy and classification precision for tasks with fewer classes. The proposed G-YOLO algorithm achieves the best performance in both mAP@0.5 and mAP@0.75, with improvements of 4.4 and 3.9 percentage points, respectively. Additionally, the model size is reduced by 0.5MB, and FPS increases by 13.1%.

Table 4 Ablation experiment results on the RiceDisease test set.

| Model | MSPPF | LEDH | mAP@0.5 | mAP@0.75 | Params (M) | Model Size (MB) | GFLOPS | Inference Time (ms) | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n | × | × | 0.684 | 0.145 | 3.01 | 6.2 | 8.2 | 11.05 | 90.50 |
|  | √ | × | 0.692 | 0.155 | 3.34 | 6.9 | 8.3 | 11.24 | 88.97 |
|  | × | √ | 0.691 | 0.152 | 2.40 | 5.0 | 6.9 | 9.41 | 106.27 |
| G-YOLO | √ | √ | 0.728 | 0.184 | 2.73 | 5.7 | 7.0 | 9.77 | 102.35 |

3.6 Comparative Experiments

In this study, since G-YOLO is a lightweight model, we compared it with several other lightweight YOLO models, including YOLOv3-tiny [29], YOLOv5n [30], YOLOv6n [31], YOLOv8n [10], YOLOv9t [32], and YOLOv10n [33]. The experimental results are shown in Table 5.
G-YOLO achieved the best performance in terms of mAP@0.5, mAP@0.75, GFLOPS, inference time, and FPS. Additionally, compared to the baseline model YOLOv8n, G-YOLO reduced the model size by 0.5MB, demonstrating its effectiveness in reducing the model size while maintaining high performance.

Table 5 Comparative experiment results on the RiceDisease test set.

| Model | mAP@0.5 | mAP@0.75 | Params (M) | Model Size (MB) | GFLOPS | Inference Time (ms) | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3-tiny | 0.625 | 0.134 | 12.13 | 24.4 | 19.0 | 10.12 | 98.81 |
| YOLOv5n | 0.717 | 0.157 | 2.50 | 5.2 | 7.2 | 10.95 | 91.32 |
| YOLOv6n | 0.679 | 0.153 | 4.23 | 8.7 | 11.9 | 11.65 | 85.84 |
| YOLOv8n | 0.684 | 0.145 | 3.01 | 6.2 | 8.2 | 11.05 | 90.50 |
| YOLOv9t | 0.678 | 0.148 | 2.00 | 4.6 | 7.9 | 19.97 | 50.08 |
| YOLOv10n | 0.649 | 0.145 | 2.70 | 5.7 | 8.4 | 13.49 | 74.13 |
| G-YOLO (ours) | 0.728 | 0.184 | 2.73 | 5.7 | 7.0 | 9.77 | 102.35 |

To evaluate the detection performance of the G-YOLO model, we selected representative sample images from the RiceDisease test set and presented the detection results of various models, including YOLOv3-tiny, YOLOv5n, YOLOv6n, YOLOv8n, YOLOv9t, YOLOv10n, and G-YOLO. The detection results of all models were compared using the same confidence threshold (conf = 0.3) and Intersection over Union (IoU) threshold (IoU = 0.5). Fig. 6 illustrates the comparison of detection results. On the same test images, the proposed G-YOLO model produces visibly better detection results than the other YOLO models. Specifically, in Fig. 6 (a), G-YOLO more accurately identifies diseases on rice leaves, reducing the occurrence of missed detections. In Fig. 6 (b), G-YOLO precisely locates the positions of rice leaf diseases, enhancing detection accuracy. In Fig. 6 (c), G-YOLO successfully detects a small, unique disease on the rice leaf and assigns a high confidence score of 0.81, indicating strong confidence in the detection results.
In contrast, YOLOv10n fails to detect this small target, resulting in a missed detection, whereas the other models successfully identify the target.

3.7 Conclusion

This study proposes an improved object detection algorithm, G-YOLO, based on YOLOv8n, which enhances the performance of rice leaf disease detection on the RiceDisease dataset by integrating the LEDH and MSPPF modules. Specifically, the introduction of the LEDH module reduces the model size by 1.2MB while increasing the FPS by 17.4%. In terms of detection accuracy, the LEDH module improves mAP@0.5 and mAP@0.75 by 0.7 percentage points each, indicating an enhancement in target localization precision and classification accuracy for detecting rice leaf diseases. Additionally, the integration of the MSPPF module improves the model's ability to capture multi-scale contextual information, increasing mAP@0.5 by 0.8 percentage points and mAP@0.75 by 1.0 percentage point. This demonstrates that the MSPPF module better integrates multi-scale features, thereby improving the model's detection performance. Together, these optimization strategies give G-YOLO a clear advantage in rice leaf disease detection: it achieves the best performance in mAP@0.5 and mAP@0.75, with improvements of 4.4 and 3.9 percentage points, respectively. Additionally, the model size is reduced by 0.5MB, and the FPS is increased by 13.1%. Our research demonstrates that careful design and module optimization can significantly enhance the overall performance of object detection models in complex rice leaf disease scenarios, providing valuable insights for future research on rice leaf disease detection.

Declarations

Conflicts of Interest: The authors declare that they have no conflict of interest regarding the publication of this paper.

Author Contribution: Methodology, B.G., G.P. and S.L.; Dataset preparation, B.G. and S.L.; Experiments, B.G., S.L. and G.P.; Original draft, B.G.
and S.L.; Review and editing, S.L., B.G. and L.W.; Visualization, G.P. and W.X.; Supervision, S.L. All authors have read and agreed to the published version of the manuscript.

Acknowledgement: The authors would like to thank the anonymous reviewers for their critical comments and suggestions for improving the manuscript.

Data Availability: All the data mentioned in the paper are available through the corresponding author.

References

1. Girshick, R., Donahue, J., Darrell, T. & Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR, 580–587 (2014). https://doi.org/10.1109/CVPR.2014.81
2. Girshick, R. Fast R-CNN. ICCV, 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
3. Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. TPAMI 39, 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031
4. Redmon, J. & Farhadi, A. YOLOv3: An incremental improvement. Preprint at https://arxiv.org/abs/1804.02767 (2018).
5. Li, C. et al. YOLOv6: A single-stage object detection framework for industrial applications. Preprint at https://arxiv.org/abs/2209.02976 (2022).
6. Li, C. et al. YOLOv6 v3.0: A full-scale reloading. Preprint at https://doi.org/10.48550/arXiv.2301.05586 (2023).
7. Wang, C. Y., Bochkovskiy, A. & Liao, H. Y. M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. CVPR, 7464–7475 (2023). https://doi.org/10.1109/CVPR52729.2023.00721
8. Wang, C. Y., Yeh, I. H. & Liao, H. Y. M. YOLOv9: Learning what you want to learn using programmable gradient information. Preprint at https://arxiv.org/abs/2402.13616 (2024).
9. Wang, A. et al. YOLOv10: Real-time end-to-end object detection. Preprint at https://arxiv.org/abs/2405.14458 (2024).
10. Ultralytics. YOLOv8. (2024). https://github.com/ultralytics/ultralytics/tree/v8.1.47
11. Li, X. et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. NeurIPS 33, 21002–21012 (2020).
12. Kang, M., Ting, C. M., Ting, F. F. & Phan, R. C. W. BGF-YOLO: Enhanced YOLOv8 with multiscale attentional feature fusion for brain tumor detection. MICCAI 15008, 35–45 (2024). https://doi.org/10.1007/978-3-031-72111-3_4
13. Zhu, L. et al. Vision transformer with bi-level routing attention. CVPR, 10323–10333 (2023).
14. Jiang, Y. et al. GiraffeDet: A heavy-neck paradigm for object detection. ICLR (2022).
15. Wang, G. et al. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 23, 7190 (2023). https://doi.org/10.3390/s23167190
16. Tong, Z., Chen, Y., Xu, Z. & Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. Preprint at https://arxiv.org/abs/2301.10051 (2023).
17. Wu, T. & Dong, Y. YOLO-SE: Improved YOLOv8 for remote sensing object detection and recognition. Appl. Sci. 13, 12977 (2023). https://doi.org/10.3390/app132412977
18. Ouyang, D. et al. Efficient multi-scale attention module with cross-spatial learning. ICASSP, 1–5 (2023). https://doi.org/10.1109/ICASSP49357.2023.10096516
19. Li, P. et al. Tomato maturity detection and counting model based on MHSA-YOLOv8. Sensors 23, 6701 (2023). https://doi.org/10.3390/s23156701
20. Vaswani, A. et al. Attention is all you need. Preprint at https://doi.org/10.48550/arXiv.1706.03762 (2017).
21. Tian, Z., Shen, C., Chen, H. & He, T. FCOS: A simple and strong anchor-free object detector. TPAMI 44, 1922–1933 (2022). https://doi.org/10.1109/TPAMI.2020.3032166
22. Xie, L. et al. SHISRCNet: Super-resolution and classification network for low-resolution breast cancer histopathology image. MICCAI 14224, 15–25 (2023). https://doi.org/10.1007/978-3-031-43904-9_3
23. Shrestha, N. L. Rice disease dataset. (2021). https://www.kaggle.com/datasets/nischallal/rice-disease-dataset
24. Howard, A. G. et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. Preprint at https://arxiv.org/abs/1704.04861 (2017).
25. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L. C. MobileNetV2: Inverted residuals and linear bottlenecks. CVPR, 4510–4520 (2018).
26. Howard, A. et al. Searching for MobileNetV3. ICCV, 1314–1324 (2019).
27. Zhang, X., Zhou, X., Lin, M. & Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. CVPR, 6848–6856 (2018).
28. Ma, N., Zhang, X., Zheng, H. T. & Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. ECCV, 116–131 (2018).
29. Ultralytics. YOLOv3-tiny. (2024). https://github.com/ultralytics/ultralytics/tree/v8.1.47
30. Ultralytics. YOLOv5n. (2024). https://github.com/ultralytics/ultralytics/tree/v8.1.47
31. Ultralytics. YOLOv6n. (2024). https://github.com/ultralytics/ultralytics/tree/v8.1.47
32. Ultralytics. YOLOv9t. (2024). https://github.com/ultralytics/ultralytics/tree/v8.2.69
33. Ultralytics. YOLOv10n. (2024). https://github.com/ultralytics/ultralytics/tree/v8.2.69

Additional Declarations

No competing interests reported.

Version 1 posted
Editorial decision: Revision requested, 15 Nov, 2024
Editor assigned by journal, 15 Nov, 2024
Editor invited by journal, 11 Nov, 2024
Submission checks completed at journal, 11 Nov, 2024
First submitted to journal, 26 Oct, 2024
Published version: https://doi.org/10.1038/s41598-025-06843-8
Authors: Bo Gan (corresponding author), Guolin Pu, Weiyin Xing, Lianfang Wang, Shu Liang; Dazhou Vocational and Technical College

Figure 1. (a) The structure of the LEDH module. (b) The structure of the Conv_GN module.
Figure 2. (a) The structure of the MSPPF module. (b) The structure of the CBS module. (c) The structure of the MSFC module.
Figure 3. (a) The distribution histogram of instance quantities. (b) The distribution of bounding box sizes and their corresponding counts. (c) The spatial distribution of bounding boxes. (d) The distribution of aspect ratios.
Figure 4. Model training results.
Figure 5. (a) The PR curve of YOLOv8n. (b) The PR curve of G-YOLO.
Figure 6. The comparison of detection results.
Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e provides the configuration details of this device.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003ctable id=\"Tab3\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003eResource-Constrained Device Configuration\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eDevice\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eConfiguration\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eGPU\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eNVIDIA GeForce RTX 3050 Laptop\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eVRAM\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e4GB\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eOperating System\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eWindows 11\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eFramework\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePyTorch 1.11\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCUDA\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e11.3\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003ePython\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e3.8\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n\u003ch2\u003e3.3 Evaluation Metrics\u003c/h2\u003e\n\u003cp\u003eModel evaluation is crucial for determining a model's performance and its compatibility with the research objectives. In the rice leaf disease detection task, we employed the following evaluation metrics to comprehensively measure the performance of lightweight models: mAP, Params, model size, GFLOPS, inference time, and FPS. All experimental results in this study were obtained on the test set to ensure transparency and comparability of the results. The formulas for these evaluation metrics are shown in Equations (\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e)-(\u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1.mAP\u003c/strong\u003e: mAP is a crucial metric for evaluating object detection models. It is derived by calculating the Average Precision (AP) for each class and then averaging the AP values across all classes to assess the model's overall performance. AP for each class is determined from the precision-recall curve, allowing mAP to capture both precision (P) and recall (R) at various recall levels, providing a comprehensive evaluation of detection capability.
mAP@0.5 is the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5: a predicted box counts as a true positive only if its IoU with a ground-truth box is at least 0.5.
mAP@0.75 uses an IoU threshold of 0.75, setting a higher bar for localization accuracy by requiring more overlap between predicted and actual boxes. In these calculations, TP denotes true positives, FP denotes false positives, FN denotes false negatives, P(r) represents precision at recall r, AP\u003csub\u003ei\u003c/sub\u003e indicates the average precision for class i, and C represents the total number of classes.\u003c/p\u003e\n\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equ5\" class=\"mathdisplay\"\u003e$$\\text{P}=\\frac{\\text{TP}}{\\text{TP}+\\text{FP}}$$\u003c/div\u003e\n\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equ6\" class=\"mathdisplay\"\u003e$$\\text{R}=\\frac{\\text{TP}}{\\text{TP}+\\text{FN}}$$\u003c/div\u003e\n\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equ7\" class=\"mathdisplay\"\u003e$$\\text{AP}=\\int_{0}^{1}\\text{P}(r)\\,\\text{d}r$$\u003c/div\u003e\n\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equ8\" class=\"mathdisplay\"\u003e$$\\text{mAP}=\\frac{\\sum_{i=1}^{\\text{C}}\\text{AP}_{i}}{\\text{C}}$$\u003c/div\u003e\n\u003cdiv class=\"EquationNumber\"\u003e8\u003c/div\u003e\n\u003c/div\u003e\n\u003cstrong\u003e2.Params\u003c/strong\u003e: The parameter count indicates a model's complexity and computational requirements. Fewer parameters usually suggest a more lightweight model, enabling it to function more efficiently on devices with limited resources, improving both training and inference speeds.
For rice leaf disease detection, models with fewer parameters are better suited to resource-limited settings because they reduce computational and storage demands, which improves system responsiveness.\u003cbr /\u003e\n\u003cp\u003e\u003cstrong\u003e3.Model Size\u003c/strong\u003e: The model size, reported in MB, determines storage requirements and loading time. Smaller models save storage space and load faster. In rice leaf disease detection, model size directly affects deployment and operational efficiency, particularly in resource-limited environments.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.GFLOPS\u003c/strong\u003e: GFLOPS measures a model's computational cost as the number of floating-point operations, in billions, required for one forward pass; it is not a per-second throughput figure. Lightweight models generally have lower GFLOPS values, reflecting reduced computational complexity. A lower GFLOPS value means lower compute and power requirements, which improves efficiency in resource-constrained environments while still providing reasonable performance.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.Inference Time\u003c/strong\u003e: Inference time is the duration needed for a model to analyze a single image, measured in milliseconds (ms). Shorter inference times make the model more responsive in real-time applications. In rice leaf disease detection, they enable quick processing of input data and immediate feedback, which is vital for applications requiring rapid detection and response.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e6.FPS\u003c/strong\u003e: FPS indicates the number of image frames the model processes each second and is essential for evaluating its real-time processing capability.
A higher FPS means more images are processed per second, which is vital for real-time rice leaf disease detection.\u003c/p\u003e\n\u003cdiv id=\"Equ9\" class=\"Equation\"\u003e\n\u003cdiv id=\"FileID_Equ9\" class=\"mathdisplay\"\u003e$$\\text{FPS}=\\frac{1000}{\\text{Inference Time}}$$\u003c/div\u003e\n\u003cdiv class=\"EquationNumber\"\u003e9\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003ewhere Inference Time is the per-image inference time in milliseconds.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n\u003ch2\u003e3.4 Model Training Results\u003c/h2\u003e\n\u003cp\u003eWe compared the accuracy of YOLOv8n and G-YOLO, using YOLOv8n as the baseline model. To limit overfitting and improve generalization, we used early stopping: performance on the validation set is monitored during training, and training halts once it has stopped improving for a set number of epochs (the patience), which we set to 50. The training results of G-YOLO are shown in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e\n\u003cp\u003eFigures \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e(a) and 5(b) display the PR curves for the YOLOv8n and G-YOLO algorithms on the RiceDisease test set. The PR curve for G-YOLO surpasses that of YOLOv8n, indicating that G-YOLO outperforms YOLOv8n. Moreover, G-YOLO improves the
mAP@0.5 metric by 4.4 percentage points compared to YOLOv8n.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n\u003ch2\u003e3.5 Ablation Study\u003c/h2\u003e\n\u003cp\u003eWe conducted a series of ablation experiments to assess the contribution of each module in G-YOLO to detection performance. The original YOLOv8n without enhancement modules was used as the baseline, and the effect of each improvement module was evaluated on the RiceDisease dataset. In Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e, '\u0026radic;' marks an enabled module, while '\u0026times;' marks a disabled one.\u003c/p\u003e\n\u003cp\u003eTable\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e shows that integrating the MSPPF and LEDH modules into the original YOLOv8n network improves the results on the RiceDisease dataset to varying degrees. Specifically, adding the MSPPF module increased mAP@0.5 by 0.8 percentage points (0.684 to 0.692) and mAP@0.75 by 1.0 percentage point (0.145 to 0.155), indicating that this module better captures and integrates multi-scale contextual information. Adding the LEDH module reduced the model size by 1.2MB (6.2MB to 5.0MB), increased FPS by 17.4% (90.50 to 106.27), and improved mAP@0.5 and mAP@0.75 by 0.7 percentage points each. This shows that the module reduces the model size and improves real-time inference speed while also slightly improving target localization accuracy and classification precision for tasks with few classes. The proposed G-YOLO algorithm achieves the best performance in both mAP@0.5 and
mAP@0.75, with improvements of 4.4 and 3.9 percentage points, respectively. Additionally, the model size is reduced by 0.5MB, and FPS increases by 13.1%.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003cdiv class=\"colspec\" align=\"char\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003ctable id=\"Tab4\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003eAblation experiment results on the RiceDisease test set.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eModel\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eMSPPF\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eLEDH\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003emAP@0.5\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003emAP@0.75\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eParams\u003c/p\u003e\n\u003cp\u003e(M)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eModel\u003c/p\u003e\n\u003cp\u003eSize(MB)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eGFLOPS\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eInference Time(ms)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eFPS\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eYOLOv8n\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u0026times;\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u0026times;\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.684\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.145\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e3.01\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e6.2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e8.2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e11.05\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e90.50\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u0026radic;\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u0026times;\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.692\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.155\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd
align=\"char\" char=\".\"\u003e\n\u003cp\u003e3.34\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e6.9\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e8.3\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e11.24\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e88.97\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u0026times;\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u0026radic;\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.691\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.152\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e2.40\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e5.0\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e6.9\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e9.41\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e106.27\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eG-YOLO\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u0026radic;\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e\u0026radic;\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e0.728\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" 
char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e0.184\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e2.73\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e5.7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e7.0\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e9.77\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e102.35\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n\u003ch2\u003e3.6 Comparative Experiments\u003c/h2\u003e\n\u003cp\u003eIn this study, since G-YOLO is a lightweight model, we compared it with several other lightweight YOLO models, including YOLOv3-tiny[\u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e], YOLOv5n[\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e], YOLOv6n[\u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e], YOLOv8n[\u003cspan class=\"CitationRef\"\u003e10\u003c/span\u003e], YOLOv9t[\u003cspan class=\"CitationRef\"\u003e32\u003c/span\u003e], and YOLOv10n[\u003cspan class=\"CitationRef\"\u003e33\u003c/span\u003e]. The experimental results are shown in Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e. G-YOLO achieved the best performance in terms of
mAP@0.5, mAP@0.75, GFLOPS, inference time, and FPS. Additionally, compared to the baseline model YOLOv8n, G-YOLO reduced the model size by 0.5MB, demonstrating its effectiveness in reducing the model size while maintaining high performance.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003cdiv class=\"colspec\" align=\"char\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003ctable id=\"Tab5\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003eComparative experiment results on the RiceDisease test set.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eModel\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003emAP@0.5\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003emAP@0.75\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eParams(M)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eModel Size(MB)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eGFLOPS\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eInference Time(ms)\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eFPS\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eYOLOv3-tiny\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.625\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.134\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e12.13\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e24.4\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e19.0\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e10.12\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e98.81\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eYOLOv5n\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.717\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.157\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e2.50\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e5.2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e7.2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e10.95\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\"
char=\".\"\u003e\n\u003cp\u003e91.32\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eYOLOv6n\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.679\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.153\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e4.23\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e8.7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e11.9\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e11.65\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e85.84\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eYOLOv8n\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.684\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.145\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e3.01\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e6.2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e8.2\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e11.05\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e90.50\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eYOLOv9t\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.678\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.148\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" 
char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e2.00\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e4.6\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e7.9\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e19.97\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e50.08\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eYOLOv10n\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.649\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e0.145\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e2.70\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e5.7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e8.4\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e13.49\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e74.13\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eG-YOLO(ours)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e0.728\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e0.184\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e2.73\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e5.7\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e7.0\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" 
char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e9.77\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"char\" char=\".\"\u003e\n\u003cp\u003e\u003cstrong\u003e102.35\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eTo evaluate the detection performance of the G-YOLO model, we selected representative sample images from the RiceDisease test set and presented the detection results of various models, including YOLOv3-tiny, YOLOv5n, YOLOv6n, YOLOv8n, YOLOv9t, YOLOv10n, and G-YOLO. The detection results of all models were compared using the same confidence threshold (conf\u0026thinsp;=\u0026thinsp;0.3) and Intersection over Union (IoU) threshold (IoU\u0026thinsp;=\u0026thinsp;0.5). Figure\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e illustrates the comparison of detection results.\u003c/p\u003e\n\u003cp\u003eFrom the figures, it is evident that our proposed G-YOLO model significantly improves detection results on the same test images compared to other YOLO models. Specifically, in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e(a), G-YOLO more accurately identifies diseases on rice leaves, reducing the occurrence of missed detections. In Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e(b), G-YOLO precisely locates the positions of rice leaf diseases, enhancing detection accuracy. In Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e(c), G-YOLO successfully detects a small, unique disease on the rice leaf and assigns a high confidence score of 0.81, indicating strong confidence in the detection results. 
In contrast, YOLOv10n fails to detect this small target, resulting in a missed detection, whereas other models successfully identify the target.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n\u003ch2\u003e3.7 Conclusion\u003c/h2\u003e\n\u003cp\u003eThis study proposes an improved object detection algorithm, G-YOLO, based on YOLOv8n, which significantly enhances the performance of rice leaf disease detection on the RiceDisease dataset by integrating the LEDH and MSPPF modules. Specifically, the introduction of the LEDH module effectively reduces the model size by 1.2MB while increasing the FPS by 17.4%. In terms of detection accuracy, the LEDH module improves
mAP@0.5 and mAP@0.75 by 0.7 percentage points each, indicating a gain in target localization precision and classification accuracy for detecting rice leaf diseases. Additionally, the integration of the MSPPF module improves the model's ability to capture multi-scale contextual information, increasing mAP@0.5 by 0.8 percentage points and mAP@0.75 by 1.0 percentage point. This demonstrates that the MSPPF module better integrates multi-scale features, thereby improving the model's detection performance.\u003c/p\u003e\n\u003cp\u003eTogether, these optimization strategies give G-YOLO a clear advantage in the task of rice leaf disease detection, achieving the best performance in mAP@0.5 and
mAP@0.75, with improvements of 4.4 and 3.9 percentage points, respectively. Additionally, the model size is reduced by 0.5MB, and FPS is increased by 13.1%. Our research demonstrates that careful design and module optimization can enhance the overall performance of object detection models in complex rice leaf disease scenarios, providing useful insights for future research on rice leaf disease detection.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eConflicts of Interest:\u003c/h2\u003e \u003cp\u003eThe authors declare that they have no conflict of interest regarding the publication of this paper.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eMethodology, B.G., G.P. and S.L.; Dataset preparation, B.G. and S.L.; Experiments, B.G., S.L. and G.P.; Original draft, B.G. and S.L.; Review and editing, S.L., B.G. and L.W.; Visualization, G.P. and W.X.; Supervision, S.L. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors would like to thank the anonymous reviewers for their critical comments and suggestions for improving the manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eAll the data mentioned in the paper are available through the corresponding author.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eGirshick, R., Donahue, J., Darrell, T. \u0026amp; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. \u003cem\u003eCVPR\u003c/em\u003e, 580\u0026ndash;587. (2014).
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/CVPR.2014.81\u003c/span\u003e\u003cspan address=\"10.1109/CVPR.2014.81\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGirshick, R. Fast R-CNN. \u003cem\u003eICCV\u003c/em\u003e, 1440\u0026ndash;1448. (2015). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICCV.2015.169\u003c/span\u003e\u003cspan address=\"10.1109/ICCV.2015.169\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRen, S., He, K., Girshick, R. \u0026amp; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. \u003cem\u003eTPAMI\u003c/em\u003e. \u003cb\u003e39\u003c/b\u003e, 1137\u0026ndash;1149. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TPAMI.2016.2577031\u003c/span\u003e\u003cspan address=\"10.1109/TPAMI.2016.2577031\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRedmon, J. \u0026amp; Farhadi, A. YOLOv3: An incremental improvement. Preprint at (2018). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1804.02767\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1804.02767\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, C. et al. YOLOv6: A single-stage object detection framework for industrial applications. Preprint at (2022). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2209.02976\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2209.02976\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, C. et al. YOLOv6 v3.0: A full-scale reloading. \u003cem\u003ePreprint at.\u003c/em\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2301.05586\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2301.05586\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, C. Y., Bochkovskiy, A. \u0026amp; Liao, H. Y. M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. \u003cem\u003eCVPR\u003c/em\u003e, 7464\u0026ndash;7475. (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/CVPR52729.2023.00721\u003c/span\u003e\u003cspan address=\"10.1109/CVPR52729.2023.00721\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, C. Y., Yeh, I. H. \u0026amp; Liao, H. Y. M. YOLOv9: Learning what you want to learn using programmable gradient information. Preprint at (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2402.13616\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2402.13616\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, A. et al. YOLOv10: Real-time end-to-end object detection. Preprint at (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2405.14458\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2405.14458\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUltralytics YOLOv8. (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ultralytics/ultralytics/tree/v8.1.47\u003c/span\u003e\u003cspan address=\"https://github.com/ultralytics/ultralytics/tree/v8.1.47\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, X. et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. \u003cem\u003eNeurIPS\u003c/em\u003e. \u003cb\u003e33\u003c/b\u003e, 21002\u0026ndash;21012 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKang, M., Ting, C. M., Ting, F. F. \u0026amp; Phan, R. C. W. BGF-YOLO: Enhanced YOLOv8 with multiscale attentional feature fusion for brain tumor detection. \u003cem\u003eMICCAI\u003c/em\u003e. \u003cb\u003e15008\u003c/b\u003e, 35\u0026ndash;45. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-031-72111-3_4\u003c/span\u003e\u003cspan address=\"10.1007/978-3-031-72111-3_4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu, L. et al. Vision transformer with bi-level routing attention. \u003cem\u003eCVPR\u003c/em\u003e, 10323\u0026ndash;10333 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang, Y. et al. GiraffeDet: A heavy-neck paradigm for object detection. \u003cem\u003eICLR\u003c/em\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, G. et al. 
UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. \u003cem\u003eSensors\u003c/em\u003e. \u003cb\u003e23\u003c/b\u003e, 7190. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/s23167190\u003c/span\u003e\u003cspan address=\"10.3390/s23167190\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTong, Z., Chen, Y., Xu, Z. \u0026amp; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. Preprint at (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2301.10051\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2301.10051\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu, T. \u0026amp; Dong, Y. YOLO-SE: Improved YOLOv8 for remote sensing object detection and recognition. \u003cem\u003eAppl. Sci.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 12977. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/app132412977\u003c/span\u003e\u003cspan address=\"10.3390/app132412977\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOuyang, D. et al. Efficient multi-scale attention module with cross-spatial learning. \u003cem\u003eICASSP\u003c/em\u003e, 1\u0026ndash;5. (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICASSP49357.2023.10096516\u003c/span\u003e\u003cspan address=\"10.1109/ICASSP49357.2023.10096516\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, P. et al. 
Tomato maturity detection and counting model based on MHSA-YOLOv8. \u003cem\u003eSensors\u003c/em\u003e 23, 6701. (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/s23156701\u003c/span\u003e\u003cspan address=\"10.3390/s23156701\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVaswani, A. et al. Attention is all you need. \u003cem\u003ePreprint at.\u003c/em\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.1706.03762\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.1706.03762\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTian, Z., Shen, C., Chen, H. \u0026amp; He, T. FCOS: A simple and strong anchor-free object detector. \u003cem\u003eTPAMI\u003c/em\u003e \u003cb\u003e44\u003c/b\u003e, 1922\u0026ndash;1933. (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TPAMI.2020.3032166\u003c/span\u003e\u003cspan address=\"10.1109/TPAMI.2020.3032166\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie, L. et al. SHISRCNet: Super-resolution and classification network for low-resolution breast cancer histopathology image. \u003cem\u003eMICCAI\u003c/em\u003e. \u003cb\u003e14224\u003c/b\u003e, 15\u0026ndash;25. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-031-43904-9_3\u003c/span\u003e\u003cspan address=\"10.1007/978-3-031-43904-9_3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShrestha, N. L. Rice disease dataset. (2021). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.kaggle.com/datasets/nischallal/rice-disease-dataset\u003c/span\u003e\u003cspan address=\"https://www.kaggle.com/datasets/nischallal/rice-disease-dataset\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoward, A. G. et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. Preprint at (2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1704.04861\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1704.04861\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSandler, M., Howard, A., Zhu, M., Zhmoginov, A. \u0026amp; Chen, L. C. MobileNetV2: Inverted residuals and linear bottlenecks. \u003cem\u003eCVPR\u003c/em\u003e, 4510\u0026ndash;4520 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoward, A. et al. Searching for MobileNetV3. \u003cem\u003eICCV\u003c/em\u003e, 1314\u0026ndash;1324 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, X., Zhou, X., Lin, M. \u0026amp; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. \u003cem\u003eCVPR\u003c/em\u003e, 6848\u0026ndash;6856 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMa, N., Zhang, X., Zheng, H. T. \u0026amp; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. \u003cem\u003eECCV\u003c/em\u003e, 116\u0026ndash;131 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUltralytics. YOLOv3-tiny. (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ultralytics/ultralytics/tree/v8.1.47\u003c/span\u003e\u003cspan address=\"https://github.com/ultralytics/ultralytics/tree/v8.1.47\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUltralytics. YOLOv5n. (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ultralytics/ultralytics/tree/v8.1.47\u003c/span\u003e\u003cspan address=\"https://github.com/ultralytics/ultralytics/tree/v8.1.47\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUltralytics. YOLOv6n. (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ultralytics/ultralytics/tree/v8.1.47\u003c/span\u003e\u003cspan address=\"https://github.com/ultralytics/ultralytics/tree/v8.1.47\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUltralytics. YOLOv9t. (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ultralytics/ultralytics/tree/v8.2.69\u003c/span\u003e\u003cspan address=\"https://github.com/ultralytics/ultralytics/tree/v8.2.69\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUltralytics. YOLOv10n. (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ultralytics/ultralytics/tree/v8.2.69\u003c/span\u003e\u003cspan address=\"https://github.com/ultralytics/ultralytics/tree/v8.2.69\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Rice leaf disease detection, Agricultural stability, Crop Health, Lightweight","lastPublishedDoi":"10.21203/rs.3.rs-5336865/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5336865/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eDetecting rice leaf diseases is essential for agricultural stability and crop health. However, the diversity of these diseases, their uneven distribution, and complex field environments create challenges for precise, multi-scale detection. While YOLO object detection algorithms show strong performance in automated detection, further optimization is needed. This paper presents G-YOLO, a novel architecture that combines a Lightweight and Efficient Detection Head (LEDH) with Multi-scale Spatial Pyramid Pooling Fast (MSPPF). The LEDH enhances detection speed by simplifying the network structure while maintaining accuracy, reducing computational demands. The MSPPF improves the model\u0026rsquo;s ability to capture intricate details of rice leaf diseases at various scales by fusing multi-level feature maps. On the RiceDisease dataset, G-YOLO surpasses YOLOv8n with 4.4% higher
[email protected], 3.9% higher
[email protected], and a 13.1% increase in FPS, making it well-suited for resource-constrained devices due to its efficient design.\u003c/p\u003e","manuscriptTitle":"Enhanced YOLOv8 with Lightweight and Efficient Detection Head for for Detecting Rice Leaf Diseases","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-11-27 06:41:08","doi":"10.21203/rs.3.rs-5336865/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2024-11-15T06:46:27+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-11-15T05:14:14+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2024-11-11T15:30:27+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-11-11T07:18:16+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2024-10-26T09:57:39+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"d6358ae0-6427-4cef-8788-11e6af8715eb","owner":[],"postedDate":"November 27th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":40299502,"name":"Physical sciences/Mathematics and computing/Computer science"},{"id":40299503,"name":"Physical sciences/Engineering/Electrical and electronic engineering"},{"id":40299504,"name":"Biological sciences/Computational biology and bioinformatics"}],"tags":[],"updatedAt":"2025-07-07T16:19:42+00:00","versionOfRecord":{"articleIdentity":"rs-5336865","link":"https://doi.org/10.1038/s41598-025-06843-8","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2025-07-01 15:58:03","publishedOnDateReadable":"July 1st, 2025"},"versionCreatedAt":"2024-11-27 06:41:08","video":"","vorDoi":"10.1038/s41598-025-06843-8","vorDoiUrl":"https://doi.org/10.1038/s41598-025-06843-8","workflowStages":[]},"version":"v1","identity":"rs-5336865","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5336865","identity":"rs-5336865","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}