LKCAFormer: A Lightweight Transformer with Large-Kernel Cooperative Attention for the Segmentation of Field Maize Leaf Diseases

doi:10.21203/rs.3.rs-6543171/v1

2025 · doi:10.21203/rs.3.rs-6543171/v1

preprint OA: closed

Research Article

Jian Hu, Xinhua Jiang, Julin Gao, Xiaofang Yu, Chengjun Zhai

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6543171/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 28 Feb, 2026; the published version is in BMC Plant Biology.
Version 1 posted 16

Abstract

In smart agriculture, segmentation models are essential for the early and accurate detection of diseases. However, the complex backgrounds and diverse diseases on maize leaves present significant challenges.
Although current models have improved, these advancements often lead to larger model sizes and higher computational demands, making them difficult to deploy on hardware with limited resources. To overcome these issues, we propose a new lightweight segmentation network called LKCAFormer. This network is specifically designed for accurate maize leaf disease segmentation and is built upon a coordinated attention mechanism and cross-scale large-kernel convolutions. Our approach introduces the Large-Kernel Convolution Cooperative Attention (LK-COA) module, which uses large-kernel convolutions to extract global features and a cooperative attention mechanism to capture fine details of small spots. This combination enhances the segmentation of small spots and reduces errors caused by spot adhesion. Additionally, the CSDecoder effectively fuses shallow features, rich in edge and detail information, with deeper semantic features to produce precise segmentation results. Experimental results on three maize leaf disease datasets demonstrate that our method outperforms existing segmentation techniques, confirming its effectiveness in the pathological analysis of maize leaf diseases.

Keywords: maize leaf disease, lightweight, large-kernel cooperative attention, semantic segmentation

Introduction

Maize is one of the most important food crops in our country and also a key raw material for animal husbandry and light industry. However, due to climate change and environmental factors, the frequency of maize leaf diseases has been increasing every year. Diseases on the leaves not only impair photosynthesis and affect plant growth but also threaten the quality and yield of maize, causing severe economic losses for farmers. Timely and accurate detection and diagnosis of plant diseases are crucial for effective disease management.
Traditional methods for diagnosing maize leaf diseases rely on manually observing symptoms and spots, combined with expert experience. However, in large-scale farms, manual diagnosis is inefficient, less accurate, labor-intensive, and difficult to perform. Therefore, using computer vision technology to automatically analyze maize leaf diseases can improve diagnostic efficiency. Currently, existing methods show poor segmentation accuracy for images with complex backgrounds, small disease areas with rich textures, and similar disease symptoms. It is urgent to solve these problems to help farmers implement more precise disease control, thereby significantly improving maize yield and quality. Convolutional Neural Networks (CNNs), as one of the core architectures in deep learning, have undergone significant evolution and have become a common framework in agriculture. Subsequently, segmentation networks based on CNNs began to emerge, such as U-Net[ 1 ], PSPNet[ 2 ], SegNet[ 3 ], and various versions of DeepLab [ 4 – 6 ]. In recent years, some improved networks have combined the advantages of DeepLab and U-Net. These models can handle multi-scale contextual information while achieving precise detail recovery and accurate boundaries, all while maintaining efficient computation and low resource consumption. For example, Divyanth, Ahmad [ 7 ] collected 1,050 maize disease spot leaves from the Purdue University Agricultural Research and Education Center and evaluated the strengths and weaknesses of SegNet, U-Net, and DeepLab v3+. They ultimately chose U-Net for segmenting maize leaves and DeepLab v3 + for segmenting disease spots. Similarly, Yang, Shan [ 8 ] proposed an improved DeepLab v3 + model that incorporates the advantages of U-Net by extracting multi-scale semantic information during encoding and obtaining richer spatial information during decoding, resulting in higher segmentation accuracy. 
However, CNNs mainly rely on stacking network layers to capture global features, and this approach has limitations when dealing with long-range dependencies and spatial transformations. To capture global features more effectively while still extracting local features, researchers have started replacing traditional convolutions with self-attention mechanisms to model global information. This led to the development of segmentation models based on Transformers. ViT [ 9 ] was the first model in computer vision to adopt a pure attention mechanism, completely abandoning convolutional layers, and it laid the foundation for later Transformer-based models. Subsequently, models such as SegFormer [ 10 ] and PoolFormer [ 11 ] extended this idea to image segmentation, showing superior performance compared to CNN-based models on several tasks. Transformers can directly model the global information in high-resolution natural images with complex backgrounds and integrate information from different parts of an image through self-attention. In maize leaf disease segmentation, however, the global feature maps contain a wealth of information yet lack an accurate representation of details such as the edges of the disease spots. These details are crucial for segmenting small disease regions that are hard to distinguish.

Despite the significant advantages of both CNNs and Transformers in feature extraction and modeling, they share a common drawback: high computational resource consumption. Whether using deep convolutional networks or Transformers based on self-attention mechanisms, both require a large number of parameters and extensive computation. As a result, their efficiency is low when deployed in real-world environments, especially on resource-constrained devices, making it challenging to meet the demands of real-time applications.
Based on the analysis above, we propose a new efficient network called LKCAFormer for in-field maize leaf disease segmentation, addressing the issues of high computational resource consumption and the need for improved segmentation accuracy in current leaf spot segmentation models. Compared with traditional Transformer- and CNN-based models, LKCAFormer uses large-kernel convolution attention to emphasize the capture of boundary information and the fusion of global features for disease spot segmentation. This new method effectively addresses the challenges in maize leaf disease segmentation and offers a reliable solution for similar future applications. In summary, the main contributions of this study are as follows:

(1) A lightweight model called LKCAFormer, which uses an encoder-decoder architecture to fuse feature information, is proposed for the effective segmentation of maize leaf diseases.
(2) A Large-Kernel Convolution Cooperative Attention module (LK-COA) is designed to capture local edge and detailed features while enhancing the extraction of small spots, thereby alleviating the issue of spot adhesion.
(3) A Cross-Scale Attention Decoder called CSDecoder is designed, which integrates attention operations to effectively recover details, edges, and positional information retained through downsampling, and outputs accurate segmentation results by combining semantic information.
(4) LKCAFormer is evaluated on two representative datasets, namely Single-CD&S and CD&S. The experimental results show that our network achieves a significant improvement in performance compared with mainstream methods.

Related work

Plant Disease Segmentation

In recent years, CNN and Transformer-based models have been widely applied to plant disease segmentation. Zhang and Zhang [ 16 ] introduced residual blocks and residual paths to develop an enhanced U-Net for plant leaf disease image segmentation. On a single maize leaf disease augmentation dataset, a segmentation accuracy of 94.07% was achieved.
Zhu, Ma [ 17 ] proposed a two-stage DeepLabV3+ algorithm with adaptive loss for segmenting apple leaf diseases, which effectively addressed the challenges of leaf and lesion extraction in complex environments. This model achieved segmentation accuracy of 98.70% for leaves and 86.56% for lesions. Zhang, Li [ 18 ] focused on apple leaf spot disease and brown spot disease, proposing a lesion segmentation model based on DFL-UNet + CBAM. This model used a hybrid loss function combining Dice Loss and Focal Loss to refine the weight relationships between features, enhancing the channel features of lesions while suppressing those of healthy leaf regions. The overall disease segmentation accuracy reached 95.16%. Chang, Wang [ 19 ] applied the Vision Transformer for plant disease recognition and introduced an EFG module to enhance local feature extraction. Experimental results demonstrated that this method outperformed ViT, PVT, and Swin. Thai, Le [ 20 ] proposed an improved Transformer-based model, FormerLeaf, for cassava leaf disease datasets, employing an attention pruning algorithm to select the most important attention heads at each layer, reducing complexity and improving segmentation accuracy. While these models have demonstrated superior performance in agriculture, they involve extensive matrix computations, leading to high computational overhead.

CNNs with Large Kernels

Traditional CNN architectures have limitations in kernel design, mainly manifested in a rapid increase in computational complexity as the network depth grows. To bridge the performance gap between Transformers and CNNs, ConvNeXt [ 21 ] incorporates Transformer design ideas into the ResNet architecture. By adjusting the training process of Swin Transformers, altering the computation ratios at different stages, reducing the number of activation and normalization layers, and using larger kernel sizes, ConvNeXt is able to improve segmentation performance. Recently, Ding et al. 
[ 22 ] reexamined the importance of large kernel designs in CNNs. Their proposed RepLKNet employs 31×31 large kernels, achieving a 0.3% accuracy improvement over Swin Transformers on the ImageNet classification task and outperforming ResNet-101 by 4.4% on the MS-COCO object detection task. However, because large kernels significantly increase the number of parameters and computational load, their application in segmentation tasks is limited. For instance, Peng, Zhang [ 23 ] pointed out that as the size of convolution kernels increases, the number of model parameters rises, which can lead to overfitting and ultimately harm segmentation performance. To address this issue, they proposed a Global Convolutional Network (GCN) that uses large 1×k and k×1 convolution kernels to enhance semantic segmentation results. Furthermore, experiments with a more recent model, SLaK [ 24 ], have shown that the performance of RepLKNet tends to stabilize when kernel sizes are increased beyond 31, to 51 or 61. To tackle the optimization challenges associated with extra-large kernels during training, SLaK decomposes the large kernel into two rectangular convolution kernels (for example, 51×5 and 5×51) and employs dynamic sparsity techniques to reduce the number of trainable parameters, thereby achieving more efficient feature extraction.

Lightweight Segmentation Models

In recent years, an increasing number of researchers have begun focusing on lightweight segmentation models. For example, El-Assiouti, El-Saadawy [ 25 ] proposed Lite-UNet, which outperforms the standard U-Net, achieving slight increases of 0.06% in Dice coefficient and 0.12% in IoU. At the same time, Lite-UNet reduced model parameters, FLOPs (floating-point operations), and inference time by factors of 15.9, 25, and 6.6, respectively.
Zhang, Li [ 26 ] customized a lightweight U-shaped sharpening perception Transformer architecture (UPFormer) specifically for segmenting grape leaf diseases, achieving a better balance between performance and efficiency. Additionally, Zhang and Lv [ 27 ] designed the TinySegformer model for agricultural pest detection tasks, significantly enhancing semantic segmentation capabilities while reducing parameters. Furthermore, Shi, Lin [ 28 ] developed a lightweight context-aware network (LCNet) that accelerates semantic segmentation processing while ensuring inference speed and segmentation accuracy, thus achieving a good balance between computational efficiency and prediction performance in mobile application scenarios. Although the aforementioned lightweight models can maintain high accuracy while reducing parameters and improving efficiency, their segmentation performance remains insufficient on maize leaf disease datasets with complex backgrounds and high disease similarity.

Attention Mechanisms

The attention mechanism is a key technique for extracting locally important information. Accordingly, researchers have proposed several segmentation models that combine attention mechanisms to capture key information and suppress irrelevant features. For example, Sheng, Kang [ 29 ] designed new attention modules, GLM and BAM, that integrate and refine high-level semantic and spatial information. They developed an edge-guided segmentation method for complex environments, called EdgeSegNet, which achieved average MIoU scores of 0.909 and 0.942 on apple and peach datasets, respectively. Lu, Lu [ 30 ] proposed a precise segmentation method for fruit leaf disease images based on EAIS-Former, which uses a custom extra-large kernel attention module to capture more tiny spots, resulting in more refined segmentation of leaves and disease spots.
Chen, Zhou [ 31 ] introduced an enhanced selective large-kernel attention module that adaptively recalibrates the weights of feature map regions along the channel and spatial dimensions, allowing the network to focus more on high-contribution areas and reduce interference from less informative regions. Meanwhile, Yan, Shao [ 32 ] developed a broadcast self-attention block to capture key fine-grained features globally while avoiding the heavy computational cost of complex matrix multiplications and multi-dimensional exponentiation. Although these attention-based segmentation models have achieved good results in their respective applications, they still fall short when dealing with maize leaf diseases. This is because the diseased regions and the complex background pixels are highly similar, which hinders accurate segmentation. Therefore, efficiently distinguishing between similar disease areas and healthy leaf regions in complex backgrounds remains a challenging problem that needs to be addressed.

Materials and methods

Dataset

We evaluate the proposed model on two datasets: Single-CD&S and CD&S [ 12 ]. The CD&S dataset is a publicly available and fair maize disease recognition dataset, which includes three common maize leaf diseases: Northern Leaf Blight (Nlb), Northern Leaf Spot (Nls) and Gray Leaf Spot (Gls). The dataset contains images captured in natural environments, where, in addition to the foreground diseased leaves, the background also includes other diseased leaves, resulting in a complex and cluttered background with potential interference. To enable accurate segmentation of the leaves and lesions, we extracted a subset from the CD&S dataset, annotating only individual leaves and their corresponding diseased regions to create the Single-CD&S dataset. In contrast, the original CD&S dataset includes annotations for multiple leaves and their associated diseased regions.
Data Augmentation

Our research involves two sequential stages: (1) the first stage extracts the target leaf from the complex background, and (2) the second stage segments the lesions from the leaf images extracted in the first stage. Therefore, each original image requires labels for the leaf, the disease, and the background. The sample data were annotated using the Labelme tool [ 13 ] ( https://github.com/wkentaro/labelme ), and the annotated visual results are shown in Fig. 1 . To mitigate overfitting and enhance the model's robustness and generalization, we applied the Augmentor module [ 14 ] to perform geometric transformations such as random horizontal or vertical flips, random cropping, random sampling, and color/brightness adjustment. Additionally, we incorporated powerful data augmentation techniques from the MMsegmentation library [ 15 ] for semantic segmentation. The three maize leaf disease datasets were split into training and test sets with a ratio of 8:2. Moreover, during the training phase, each dataset was further divided into training and validation sets in a 9:1 ratio for cross-validation. Table 1 provides detailed information on the training set, validation set, and data augmentation process.

Table 1 Details of the maize leaf disease datasets used.

Dataset       Classes   Acquisition   Augment   Train data   Val data
CD&S          3         1540          12437     9949         2488
Single-CD&S   3         1274          9758      7806         1952

Model design

Overview

The overall architecture of the proposed LKCAFormer is depicted in Fig. 2 . LKCAFormer comprises two main components: (1) LK-COAT Encoder, which leverages powerful feature extraction capabilities through multi-level large-kernel convolution CNNs and attention modules. The encoder consists of three layers of the LK-COA module. Each layer employs large convolution kernels to extract global features, which are then passed through skip connections to a newly designed coordinated attention block.
This block optimises fine details and aggregates features, facilitating both coarse and fine feature representations at the same scale. The encoder outputs shallow features that retain rich local details and edge information, as well as deeper features that capture global semantic information. This dual-level feature extraction ensures accurate identification of leaf and lesion characteristics during downsampling. (2) Cross-Scale Attention Decoder (CSDecoder). Three decoders are designed, each receiving low-resolution feature maps containing high-level semantic information from the encoder. These decoders calculate similarity weights for fine-grained, high-frequency global information and then perform attention operations with coarse-grained, low-frequency feature maps from the upper layers. This results in a fusion of fine-grained and coarse-grained high-frequency features. The fused feature map is subsequently upsampled and concatenated, then processed through an MLP for nonlinear adjustments. This compensates for the reduced sensitivity of convolution operations in capturing detailed information. Finally, the shallow and deep features are fused, and the aggregated feature map is passed to the lightweight segmentation head for final processing.

Specifically, the input leaf images to the network are of size $512 \times 512 \times 3$. In the encoder, the input image is first processed through a feature extraction head composed of two stacked $3 \times 3$ depthwise convolutions, effectively extracting features and generating a feature map at 1/4 of the original image size, denoted as $F_{0}$. This helps reduce the initial parameter count in the encoder. Then, through three encoder layers, the feature maps at {1/8, 1/16, 1/32} of the original image size are obtained, denoted as $F_{1}$, $F_{2}$, and $F_{3}$.
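As a quick arithmetic check of the scales described above, the following sketch computes the spatial size of each encoder feature map for a 512×512 input. It takes the three encoder scales to be 1/8, 1/16, and 1/32, and the name `feature_map_sizes` is illustrative rather than part of the authors' code.

```python
# Spatial sizes of the encoder feature maps for a 512 x 512 input,
# using the scales quoted in the text (1/4 for F0, then 1/8, 1/16, 1/32).
INPUT_SIZE = 512
STRIDES = {"F0": 4, "F1": 8, "F2": 16, "F3": 32}  # assumed downsampling factors


def feature_map_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    """Return {name: (height, width)} for a square input image."""
    return {name: (input_size // s, input_size // s) for name, s in strides.items()}


sizes = feature_map_sizes()
for name, (h, w) in sizes.items():
    print(f"{name}: {h} x {w}")
```

Under these assumptions the decoder has to upsample $F_{1}$, $F_{2}$, and $F_{3}$ by factors of 2, 4, and 8 to match $F_{0}$.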
Each layer employs large convolution kernels with sizes $k = \{(7, 9, 11), (11, 13, 15), (15, 17, 19)\}$ and channel dimensions of {32, 64, 128, 160}. In the decoding phase, the feature maps $F_{1}$, $F_{2}$, and $F_{3}$ from the encoder are passed to the decoder for feature fusion and upsampling to match the size of $F_{0}$. These maps are then concatenated with $F_{0}$ and processed through a simple segmentation head, producing a $512 \times 512 \times N_{cls}$ segmentation output, where $N_{cls}$ is the predefined number of classes (in this case, $N_{cls} = 3$).

Large-Kernel Convolutional Cooperative Attention

Transformer models have high computational complexity and require substantial computational resources, so they are not well suited to real-time applications in agricultural production environments. However, Transformers capture long-range dependencies and global feature information particularly well, a capability that convolutional operations generally lack. In recent years, some researchers have enhanced the global information-capturing ability of convolutional networks by using large convolution kernels, and have incorporated attention mechanisms to improve the perception of channel and spatial features as well as long-range dependencies. This approach reduces network depth, increases sensitivity to local details, optimises parameters, and ultimately improves segmentation performance. Building on this analysis, this paper proposes an LK-COAT encoder, which models global information through large-kernel convolutions and captures local features using a cooperative attention mechanism.
This allows the model to learn the interaction between local and global features, leading to richer feature representations and enhanced extraction of edge and texture features, thus enabling fine-grained segmentation of disease regions on maize leaves. The LK-COAT encoder is illustrated in Figure (b). Given a feature map $F \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of input channels and $H$ and $W$ are the height and width of the feature map, the high computational cost of large-kernel depthwise convolutions is mitigated by decomposing the large convolution into smaller depthwise convolutions, followed by dilated depthwise convolutions with relatively large kernels. The output of the LK-COA module can be computed using Eqs. ( 1 ) – ( 5 ):

$$Z^{C} = \sum_{H,W} W^{C}_{(2d-1) \times (2d-1)} * F^{C} \tag{1}$$

$$\bar{Z}^{C}_{g} = \sum_{H,W} W^{C}_{\lfloor k/d \rfloor \times \lfloor k/d \rfloor} * Z^{C} \tag{2}$$

$$A^{C}_{g} = \mathrm{SoftMax}\left(\mathrm{Avg}\left(\bar{Z}^{C}_{g}\right)\right) \odot F^{C} \tag{3}$$

$$A^{C}_{L} = \mathrm{SoftMax}\left(W_{3 \times 3} * F^{C}\right) \odot \bar{Z}^{C}_{g} \tag{4}$$

$$\bar{F}^{C} = A^{C}_{g} \oplus A^{C}_{L} \tag{5}$$

The symbol $*$ represents the convolution operation, and $\odot$ denotes the Hadamard product. In Eq. ( 1 ), $Z^{C}$ refers to the output feature map obtained by applying a depthwise convolution with a kernel size of $(2d-1) \times (2d-1)$ (where $d$ is the dilation rate) to the input feature map $F$. The use of dilated convolutions helps capture detailed information of the maize leaf disease while mitigating the grid effect caused by depthwise separable convolutions, as described in Eq. ( 2 ). In Eq. ( 2 ), the notation $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer.
The output, which eliminates background noise and retains global spatial information, is denoted by $\bar{Z}^{C}_{g}$. We keep the kernel size $k$ at or below 23, which allows the model to effectively capture both global and local features. When the kernel size exceeds 23, as demonstrated in the study (Lau, K. W. et al., 2024), it leads to high computational complexity and increased memory usage. In Eq. ( 3 ), the global output feature map undergoes average pooling and activation to compute attention weights for the leaf region. These weights are then applied to the input feature map via a Hadamard product, resulting in the global attention feature map. In Eq. ( 4 ), the input feature map is passed through a depthwise separable convolution $W$ with a $3 \times 3$ kernel, followed by an activation function. This generates attention weights for the diseased regions, which are then combined with the global output feature map using a Hadamard product, resulting in the local attention feature map. Finally, Eq. ( 5 ) combines these features through addition, producing the final attention feature map. This map eliminates background noise and retains both high-frequency edge features of the leaf and high-frequency features of the disease lesions. This encoding structure effectively extracts global features of the leaf and lesions across spatial, position, and channel dimensions, enhancing the model's ability to represent global features while reducing computational overhead. The LK-COAT encoder we designed consists of three LK-COA modules, with convolution kernels of sizes $k = 7, 11, 23$ and dilation rates of $d = 1, 2, 3$.

Decoder

As previously described, we have constructed a network based on an encoder-decoder architecture. After obtaining the features $\{F_{i}\}_{i=1}^{3}$ from the encoder, we deploy three CSDecoder blocks to progressively integrate high-level semantic features with low-level spatial details, as illustrated in Fig. 1 .
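As a rough check on the kernel decomposition in Eqs. ( 1 ) and ( 2 ), the sketch below computes the two kernel sizes, the combined receptive field, and the depthwise weights per channel for the LK-COA settings quoted above. It assumes the kernel sizes and dilation rates are paired in order (k = 7 with d = 1, k = 11 with d = 2, k = 23 with d = 3), which the text does not state explicitly, and it counts depthwise weights only (no biases or pointwise layers). The function names are illustrative.

```python
def decompose(k, d):
    """Split a k x k depthwise kernel as in Eqs. (1)-(2).

    Returns (local kernel size, dilated kernel size, receptive field).
    """
    local = 2 * d - 1   # (2d-1) x (2d-1) depthwise convolution, Eq. (1)
    dilated = k // d    # floor(k/d) x floor(k/d) kernel with dilation d, Eq. (2)
    # Receptive field of the two convolutions applied one after the other.
    receptive_field = local + (dilated - 1) * d
    return local, dilated, receptive_field


def weights_per_channel(k, d):
    """Depthwise weights per channel: (direct k x k kernel, decomposed pair)."""
    local, dilated, _ = decompose(k, d)
    return k * k, local * local + dilated * dilated


CONFIGS = [(7, 1), (11, 2), (23, 3)]  # assumed (k, d) pairing

for k, d in CONFIGS:
    local, dilated, rf = decompose(k, d)
    direct, split = weights_per_channel(k, d)
    print(f"k={k:2d} d={d}: {local}x{local} + {dilated}x{dilated} (dilation {d}), "
          f"receptive field {rf}, weights/channel {direct} -> {split}")
```

Under these assumptions the stacked pair covers the full k×k receptive field in all three modules, and the saving in weights appears only for d > 1; for d = 1 the decomposition is marginally larger than a direct 7×7 kernel.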
For the $i$-th decoder block, the input consists of the encoder features $F_{i}$ at the same level, along with the decoder features $F_{i+1}^{D}$ from the previous block. The decoding process can be defined as follows:

$$F_{i}^{D} = f_{D_{i}}^{F_{i+1}}\left(f^{AM}\left(F_{i}, F_{i+1}^{D}\right)\right) \tag{6}$$

$$F^{D} = \mathrm{Mlp}\left(\mathrm{cat}\left(\sum_{i=1}^{3} up_{f}\left(F_{i}^{D}\right)\right)\right) \tag{7}$$

$$F_{cls} = up\left(f_{seg}\left(\mathrm{Cat}\left(F^{S} + F_{0}\right)\right)\right) \tag{8}$$

Equation ( 6 ) represents the features from the $i$-th decoder, where $f^{AM}$ denotes the attention module AM. Subsequently, the feature maps undergo the operation described in Eq. ( 7 ), where upsampling is performed using bilinear interpolation to match the size of $F_{0}$. The upsampled feature maps are then concatenated, and a nonlinear feedforward network is applied to generate the final output features $F^{D}$.

Cross-Scale Attention. In natural environments, leaves experience varying levels of light exposure, which can lead to shadows that reduce the accuracy of leaf segmentation. Additionally, the similarity in color between lesion edges, leaf color, and parts of the background makes it difficult to accurately extract the true contours of the lesions. Furthermore, in most maize leaf images, the proportion of diseased pixels is relatively small compared to the total image area, which complicates the extraction of small disease features. During the encoding phase, the LK-COAT module provides both global perception and local feature extraction capabilities, but there is a risk of losing edge details in scattered or dense lesion areas. To address this, this section leverages the AM (Attention Mechanism) to optimize the segmentation of leaf and lesion edges, helping to capture more fine-grained lesion details. The AM module is illustrated in Figure (c). Within the AM module, the similarity score matrix is computed using Eq. ( 9 ).
Specifically, given the input tensor $X \in \mathbb{R}^{H \times W \times C}$, a depthwise convolution with a kernel size of $k \times k$ and the Hadamard product are used to compute the output $Z$, as follows:

$$S = A \odot V \tag{9}$$

$$A = L_{1} F_{i} \tag{10}$$

$$V = L_{2} F_{i+1} \tag{11}$$

$$Z = W_{3 \times 3}(S) + F_{i+1} \tag{12}$$

In Eqs. ( 10 ) and ( 11 ), $L_{1}$ and $L_{2}$ are the weight matrices of two linear layers, corresponding to the depthwise convolution with a kernel size of $k \times k$. This operation enables each spatial location $(h, w)$ to interact with all pixels within a $k \times k$ square region centered at $(h, w)$. Inter-channel information exchange is facilitated through the linear layers. The output for each spatial position is the weighted sum of all pixels within the square region. Because our approach uses convolutions rather than self-attention to establish these relationships, it is more memory-efficient, especially on high-resolution images.

Loss Function

In this paper, a combination of Cross-Entropy (CE) and Dice loss is employed. The specific formulas are as follows:

$$Loss_{CE} = -\sum_{c=1}^{C} y_{c} \log\left(\hat{y}_{c}\right) \tag{13}$$

$$Loss_{Dice} = 1 - \frac{2 \sum_{i} x_{i} y_{i}}{\sum_{i} x_{i} + \sum_{i} y_{i}} \tag{14}$$

$$Loss_{total} = 0.5 \cdot Loss_{CE} + Loss_{Dice} \tag{15}$$

The loss function shown in Eq. ( 15 ) combines Cross-Entropy (CE) loss and Dice loss. In the CE loss formula ( 13 ), $y_{c}$ represents the true label of the sample for class $c$, and $\hat{y}_{c}$ is the predicted probability for class $c$.
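A minimal numerical sketch of Eqs. ( 13 ) – ( 15 ) follows, written in plain Python for a single pixel (CE) and a single class mask (Dice). How the terms are averaged over pixels, classes, and batches is not specified in the text, so that reduction is left out; the small `eps` terms and the function names are assumptions added for numerical safety and illustration.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Eq. (13): -sum_c y_c * log(y_hat_c) for one pixel (one-hot y_true)."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_true, y_prob))


def dice_loss(x, y, eps=1e-12):
    """Eq. (14): 1 - 2*sum(x_i*y_i) / (sum(x_i) + sum(y_i)) for one class."""
    intersection = sum(xi * yi for xi, yi in zip(x, y))
    return 1.0 - 2.0 * intersection / (sum(x) + sum(y) + eps)


def total_loss(ce, dice):
    """Eq. (15): 0.5 * CE + Dice."""
    return 0.5 * ce + dice


# Toy values (hypothetical): one lesion pixel and a four-pixel lesion mask.
ce = cross_entropy([0, 0, 1], [0.1, 0.2, 0.7])
dice = dice_loss([0.7, 0.2, 0.9, 0.1], [1, 0, 1, 0])
print(f"CE={ce:.4f}  Dice={dice:.4f}  total={total_loss(ce, dice):.4f}")
```

Because the Dice term is a ratio of sums over the mask, it does not shrink when the lesion occupies only a few pixels, whereas the CE term is computed per pixel.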
In the Dice loss formula ( 14 ), $x_{i}$ is the predicted probability that a given element in the prediction map belongs to a specific foreground class, while $y_{i}$ is the true value of that element in the ground truth map. Unlike CE loss, Dice loss is not affected by the size of the foreground. However, CE loss provides important guidance for the model when optimising the Dice loss. Therefore, combining both losses for network training is a more effective and rational approach.

Experiment settings

The experiments were conducted using the public MMsegmentation codebase [ 15 ] and PyTorch [ 33 ]. The model was trained on two NVIDIA RTX 4090 GPUs. During training, images were randomly cropped to a size of 512×512. The AdamW optimizer [ 34 ] was used with a cosine learning rate decay strategy. The hyperparameters for training were as follows: momentum of 0.9, weight decay of $1 \times 10^{-2}$, a batch size of 16, 500 epochs, an initial learning rate of $1 \times 10^{-4}$, and a minimum learning rate of $1 \times 10^{-7}$. To prevent overfitting, the learning rate was reduced by a factor of 0.1 at specified intervals.

Evaluation Metrics

The quantitative metrics used to evaluate the model's performance include Precision [ 26 ], IoU [ 10 ], Dice coefficient [ 35 ], and Recall [ 26 ]. Higher IoU and Dice values typically indicate greater overlap between the predicted and ground truth results, which corresponds to more accurate segmentation.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{16}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{17}$$

$$\mathrm{Dice} = \frac{2TP}{2TP + FP + FN} \tag{18}$$

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{19}$$

Here, TP is the number of positive pixels correctly classified as positive.
TN denotes the number of negative pixels that are correctly classified (true negatives). FP refers to pixels that are classified as leaf but actually belong to the background. FN refers to pixels that are classified as background but actually belong to the leaf.

Results and Discussion

Comparative experiments

In this section, the proposed method is compared with several popular deep learning-based semantic segmentation approaches, including CNN-based methods such as U-Net and DeepLab v3+, Transformer-based methods like SegFormer and PVT2, as well as lightweight models such as TopFormer, AFFormer, and SwiftFormer. This comparison is conducted to further validate the feasibility and effectiveness of LKCAFormer. Specifically, U-Net features a simple skip connection structure that repeatedly fuses shallow and semantic features. DeepLab v3+ adopts an ASPP (Atrous Spatial Pyramid Pooling) structure, using dilated convolutions to expand the receptive field. SegFormer utilizes a hierarchical Transformer block, while the decoder applies a lightweight MLP (Multi-Layer Perceptron) structure. TopFormer refines features layer by layer, effectively capturing global context while preventing detail loss. AFFormer employs a parallel architecture and uses prototype representations as learnable local descriptors to replace the decoder, preserving rich image semantics in high-resolution features. SwiftFormer introduces an efficient additive attention mechanism that learns consistent global context across multiple scales.

Each method was trained and tested on three maize leaf disease datasets. The performance of each method was evaluated using seven metrics: Dice, Recall, IoU, Precision, FPS (Frames per Second), total parameters (M), and FLOPs (G). The results of the segmentation comparisons on the Single-CD&S dataset for the three types of maize diseases are recorded in Tables 2, 3, and 4. As shown in Table 2, the proposed method achieves the best segmentation performance on the Gls test set.
It outperforms the CNN-based models, U-Net and DeepLab v3+, in terms of segmentation accuracy. Compared to U-Net, the proposed method improves the IoU for background, leaf, and lesion segmentation by 1.14%, 0.6%, and 3.15%, respectively. DeepLab v3+ exhibits lower IoU values than the proposed method, with reductions of 0.92% for leaf segmentation, 2.54% for lesion segmentation, and 1.77% for background segmentation. Compared to the proposed method, SegFormer shows a decrease in IoU of 0.45% for leaf segmentation, 2.83% for lesion segmentation, and 0.96% for background segmentation. The proposed method also outperforms PVT2, with improvements of 0.58% in leaf segmentation IoU, 2.83% in lesion segmentation IoU, and 0.57% in background segmentation IoU. Additionally, the proposed method consistently outperforms the lightweight models TopFormer and AFFormer. While SwiftFormer exhibits segmentation accuracy similar to or slightly better than the other methods, it still falls short of the proposed method. Specifically, the proposed method improves the IoU for leaf segmentation by 0.35%, for lesion segmentation by 0.47%, and for background segmentation by 0.78% compared to SwiftFormer.
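The per-class scores compared above are instances of the Precision, Recall, Dice, and IoU definitions in Eqs. (16)-(19). As a hedged illustration, they can be computed from flattened label maps as in the sketch below. The function names, the class indices (0 for background, 1 for leaf, 2 for lesion), and the choice to return 0.0 when a denominator is empty are assumptions made for this example and are not taken from the paper.

```python
def confusion_counts(pred, target, cls):
    """Count TP, FP, FN, TN for one class over two flattened label maps."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p == cls and t == cls:
            tp += 1          # predicted cls, truly cls
        elif p == cls:
            fp += 1          # predicted cls, truly something else
        elif t == cls:
            fn += 1          # missed a pixel that is truly cls
        else:
            tn += 1          # neither predicted nor truly cls
    return tp, fp, fn, tn


def segmentation_metrics(tp, fp, fn):
    """Precision, Recall, Dice, and IoU from confusion counts (Eqs. 16-19)."""
    def ratio(num, den):
        # Assumption: report 0.0 when the class is absent from both maps.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }
```

Calling `confusion_counts(pred, target, cls=2)` followed by `segmentation_metrics(tp, fp, fn)` would give the lesion scores for one image under these assumptions. Because TN does not appear in any of the four formulas, a large background area does not inflate the reported values.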
Table 2 Quantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Gls Test Set of the Single-CD&S Dataset (all values in %)

| Methods | Gls Dice | Gls IoU | Gls Recall | Gls Precision | Leaf Dice | Leaf IoU | Leaf Recall | Leaf Precision | Background Dice | Background IoU | Background Recall | Background Precision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| U-Net (2015) | 86.85 | 75.33 | 87.22 | 86.52 | 97.07 | 95.96 | 97.35 | 96.77 | 97.74 | 95.83 | 97.03 | 97.79 |
| DeepLab v3+ (2018) | 86.70 | 75.94 | 86.94 | 86.71 | 97.23 | 95.64 | 97.39 | 98.29 | 97.34 | 95.20 | 96.84 | 98.12 |
| SegFormer (2021) | 87.14 | 75.65 | 88.81 | 86.95 | 97.94 | 96.11 | 97.71 | 98.06 | 98.03 | 96.01 | 98.59 | 98.13 |
| PVT2 (2022) | 88.27 | 76.69 | 87.48 | 88.81 | 97.63 | 95.98 | 98.17 | 97.22 | 98.31 | 96.40 | 98.15 | 98.22 |
| TopFormer (2022) | 88.20 | 77.92 | 88.38 | 87.79 | 97.07 | 90.96 | 97.35 | 97.77 | 95.23 | 92.22 | 95.14 | 94.93 |
| AFFormer (2023) | 87.23 | 77.48 | 88.32 | 87.76 | 97.79 | 96.15 | 98.24 | 98.14 | 98.18 | 96.17 | 98.05 | 98.77 |
| SwiftFormer (2023) | 88.21 | 78.01 | 88.19 | 88.15 | 98.13 | 96.21 | 98.53 | 97.67 | 98.20 | 96.19 | 98.26 | 97.93 |
| LKCAFormer (ours) | 88.86 | 78.48 | 88.01 | 88.32 | 98.19 | 96.56 | 98.03 | 98.13 | 98.42 | 96.97 | 98.48 | 98.26 |

According to Table 3, the proposed method demonstrates the best segmentation performance on the maize leaf Nls disease test set. Compared to U-Net, the proposed method improves the IoU for leaf and lesion segmentation by 1.08% and 3.51%, respectively, while achieving nearly the same IoU for background segmentation, with an increase of 0.03%. DeepLab v3+ shows lower segmentation accuracy than the proposed method for all categories: the IoU for background segmentation is 1.24% lower, leaf segmentation is 1.74% lower, and lesion segmentation is 2.12% lower. Additionally, the proposed method outperforms SegFormer by 0.72% in lesion segmentation IoU. The segmentation performance of PVT2 is weaker than that of the proposed method, with IoU values for background, leaf, and lesion segmentation being 0.21%, 1.01%, and 1.08% lower, respectively. SwiftFormer exhibits better segmentation performance than lightweight methods like TopFormer and AFFormer, but the proposed method still outperforms SwiftFormer.
Specifically, the proposed method improves the IoU for background segmentation by 0.56%, for leaf segmentation by 0.62%, and for lesion segmentation by 0.95%. In summary, while SwiftFormer shows higher segmentation accuracy than other methods, it does not outperform the method presented in this study.

Table 3 Quantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Nls Test Set of the Single-CD&S Dataset (all values in %)

| Methods | Nls Dice | Nls IoU | Nls Recall | Nls Precision | Leaf Dice | Leaf IoU | Leaf Recall | Leaf Precision | Background Dice | Background IoU | Background Recall | Background Precision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| U-Net (2015) | 83.21 | 72.70 | 82.10 | 81.86 | 98.19 | 95.21 | 98.23 | 98.67 | 98.20 | 97.78 | 98.43 | 97.49 |
| DeepLab v3+ (2018) | 85.38 | 74.09 | 86.25 | 85.77 | 97.43 | 94.55 | 98.39 | 98.29 | 97.14 | 96.57 | 97.84 | 98.12 |
| SegFormer (2021) | 85.03 | 75.49 | 86.59 | 85.70 | 98.14 | 96.11 | 98.51 | 98.26 | 97.99 | 97.01 | 97.59 | 96.3 |
| PVT2 (2022) | 85.01 | 75.13 | 85.34 | 85.96 | 98.30 | 95.28 | 98.37 | 97.82 | 98.01 | 97.60 | 97.95 | 98.02 |
| TopFormer (2022) | 83.09 | 72.11 | 85.61 | 85.25 | 97.59 | 95.35 | 97.64 | 98.14 | 98.38 | 97.37 | 98.25 | 97.77 |
| AFFormer (2023) | 80.31 | 70.82 | 79.76 | 81.25 | 98.07 | 91.46 | 98.35 | 97.87 | 98.23 | 95.12 | 98.84 | 97.93 |
| SwiftFormer (2023) | 85.92 | 75.26 | 84.45 | 86.34 | 98.26 | 95.67 | 97.93 | 98.01 | 98.24 | 97.25 | 97.89 | 98.04 |
| LKCAFormer (ours) | 86.20 | 76.21 | 87.67 | 86.99 | 98.49 | 96.29 | 97.63 | 98.33 | 98.72 | 97.81 | 98.36 | 98.16 |

Table 4 presents the experimental results on the maize leaf Nlb disease test set. The proposed method again shows the best segmentation performance. Notably, TopFormer performs poorly in leaf and lesion segmentation, with IoU values 4.63% and 7.56% lower than those of the proposed method. AFFormer slightly outperforms the proposed method in background segmentation IoU, but its IoU for leaf and lesion segmentation is 1.23% and 0.54% lower, respectively. The CNN-based U-Net and DeepLab v3+ perform similarly to each other, but both perform worse than the proposed method.
The Transformer-based SegFormer performs similarly to the proposed method in lesion segmentation, with a difference of only 0.13%, but its performance in leaf and background segmentation is 1.88% and 1.11% lower, respectively. PVT2 shows lower IoU values in background, leaf, and lesion segmentation by 0.61%, 1.61%, and 0.78%, respectively. SwiftFormer, with segmentation accuracy comparable to U-Net and DeepLab v3+, still falls short of the proposed method. In conclusion, the proposed method demonstrates significant improvements in segmentation accuracy compared to other methods across all tested datasets.

To better validate the performance of the proposed method in real-world scenarios, each method was also trained and tested on the CD&S dataset, which contains complex backgrounds and multiple leaf diseases. Tables 5, 6, and 7 compare the segmentation performance of the proposed method and the other methods on the three disease test sets. Comparing Table 5 with Table 2 shows that all methods perform worse when segmenting multiple leaves and disease areas than when segmenting single leaves and disease regions. Despite this, the proposed method still achieves the best segmentation performance, with IoUs of 99.02%, 97.39%, and 70.52% for background, leaf, and lesion segmentation, respectively. These results exceed those of PVT2, which performs worst in lesion segmentation, by 2.55%, 0.65%, and 6.83%, respectively.
Table 4 Quantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Nlb Test Set of the Single-CD&S Dataset (all values in %)

| Methods | Nlb Dice | Nlb IoU | Nlb Recall | Nlb Precision | Leaf Dice | Leaf IoU | Leaf Recall | Leaf Precision | Background Dice | Background IoU | Background Recall | Background Precision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| U-Net (2015) | 83.29 | 72.46 | 83.49 | 84.61 | 97.79 | 95.21 | 97.53 | 96.67 | 98.50 | 97.78 | 98.43 | 96.79 |
| DeepLab v3+ (2018) | 82.25 | 72.28 | 83.10 | 82.58 | 97.43 | 96.35 | 97.99 | 97.29 | 97.84 | 98.20 | 97.84 | 98.32 |
| SegFormer (2021) | 84.10 | 73.86 | 84.17 | 85.81 | 98.14 | 95.31 | 97.11 | 96.96 | 96.73 | 97.01 | 97.59 | 97.3 |
| PVT2 (2022) | 85.34 | 73.21 | 86.26 | 84.97 | 97.30 | 95.58 | 98.37 | 97.62 | 97.01 | 97.60 | 97.55 | 98.62 |
| TopFormer (2022) | 78.45 | 66.43 | 78.37 | 79.12 | 98.17 | 92.56 | 96.99 | 96.17 | 97.23 | 98.12 | 97.84 | 97.93 |
| AFFormer (2023) | 84.91 | 73.45 | 84.71 | 85.30 | 97.97 | 95.96 | 97.85 | 95.77 | 98.18 | 98.37 | 97.15 | 97.37 |
| SwiftFormer (2023) | 85.12 | 72.67 | 85.39 | 84.98 | 98.01 | 96.23 | 97.34 | 97.99 | 98.01 | 98.07 | 97.99 | 98.04 |
| LKCAFormer (ours) | 85.73 | 73.99 | 85.91 | 85.12 | 98.49 | 97.19 | 97.63 | 98.13 | 98.76 | 98.21 | 98.38 | 97.66 |

Table 5 Quantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Gls Test Set of the CD&S Dataset (all values in %)

| Methods | Gls Dice | Gls IoU | Gls Recall | Gls Precision | Leaf Dice | Leaf IoU | Leaf Recall | Leaf Precision | Background Dice | Background IoU | Background Recall | Background Precision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| U-Net (2015) | 75.85 | 64.33 | 77.22 | 75.52 | 98.34 | 97.18 | 98.84 | 98.27 | 98.23 | 97.78 | 98.42 | 98.19 |
| DeepLab v3+ (2018) | 76.70 | 65.98 | 76.94 | 75.71 | 98.13 | 97.35 | 98.09 | 97.29 | 98.04 | 97.21 | 98.14 | 97.92 |
| SegFormer (2021) | 77.14 | 66.65 | 78.81 | 76.95 | 97.14 | 95.31 | 98.11 | 96.96 | 99.03 | 98.01 | 99.19 | 98.82 |
| PVT2 (2022) | 74.27 | 63.69 | 73.48 | 73.81 | 97.64 | 96.74 | 97.97 | 97.62 | 97.91 | 96.47 | 98.15 | 97.24 |
| TopFormer (2022) | 75.20 | 65.92 | 75.38 | 75.79 | 97.74 | 96.56 | 98.07 | 97.89 | 97.45 | 96.87 | 97.84 | 97.84 |
| AFFormer (2023) | 79.23 | 68.48 | 80.32 | 79.76 | 98.01 | 97.26 | 98.13 | 98.77 | 99.18 | 98.77 | 99.35 | 99.37 |
| SwiftFormer (2023) | 78.21 | 68.01 | 78.19 | 78.15 | 97.91 | 97.13 | 98.34 | 97.99 | 99.21 | 98.97 | 99.01 | 98.89 |
| LKCAFormer (ours) | 80.86 | 70.52 | 81.01 | 79.32 | 98.49 | 97.39 | 98.62 | 99.13 | 99.16 | 99.02 | 99.34 | 99.66 |

Table 6 Quantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Nls Test Set of the CD&S Dataset (all values in %)

| Methods | Nls Dice | Nls IoU | Nls Recall | Nls Precision | Leaf Dice | Leaf IoU | Leaf Recall | Leaf Precision | Background Dice | Background IoU | Background Recall | Background Precision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| U-Net (2015) | 73.01 | 62.43 | 73.49 | 73.26 | 95.64 | 93.18 | 96.84 | 95.73 | 96.50 | 94.78 | 96.43 | 96.19 |
| DeepLab v3+ (2018) | 72.82 | 61.91 | 72.85 | 72.32 | 96.43 | 94.35 | 97.08 | 96.25 | 98.75 | 97.18 | 98.84 | 98.27 |
| SegFormer (2021) | 76.31 | 65.63 | 76.29 | 75.84 | 94.32 | 92.35 | 94.67 | 94.39 | 98.41 | 97.10 | 98.05 | 98.36 |
| PVT2 (2022) | 74.31 | 64.82 | 74.76 | 75.25 | 93.30 | 91.74 | 94.37 | 93.73 | 98.01 | 96.60 | 98.55 | 97.71 |
| TopFormer (2022) | 71.20 | 61.08 | 71.42 | 70.96 | 90.17 | 83.16 | 91.07 | 91.42 | 94.28 | 89.12 | 93.78 | 93.97 |
| AFFormer (2023) | 70.09 | 59.11 | 69.61 | 70.25 | 91.97 | 87.96 | 92.01 | 91.77 | 93.39 | 91.37 | 93.19 | 93.32 |
| SwiftFormer (2023) | 72.46 | 62.56 | 72.95 | 72.74 | 94.01 | 93.23 | 93.34 | 93.99 | 98.41 | 97.16 | 98.21 | 98.67 |
| LKCAFormer (ours) | 76.63 | 67.28 | 77.49 | 76.99 | 95.49 | 95.19 | 95.34 | 95.21 | 99.06 | 98.02 | 98.34 | 98.66 |

Among the lightweight methods in Table 5, TopFormer shows lower IoUs for background, leaf, and lesion segmentation than the proposed method, with reductions of 2.15%, 0.83%, and 4.60%, respectively. AFFormer and SwiftFormer show similar segmentation performance, but both fall short of the proposed method, with average reductions of 0.18%, 0.46%, and 1.82%, respectively. Compared to DeepLab v3+, the proposed method improves the IoU for background and lesion segmentation by 1.91% and 4.54%, respectively, while showing a similar performance for leaf segmentation, with a 0.04% increase. U-Net, on the other hand, shows lower segmentation accuracy than the proposed method across all categories: its IoU is 1.24% lower for background segmentation, 0.21% lower for leaf segmentation, and 6.19% lower for lesion segmentation.

Table 6 presents the segmentation performance comparison of the proposed method with other methods on the Nls test set. Overall, the segmentation results are relatively poor, but the proposed method still achieves the highest accuracy, with IoUs of 98.02%, 95.19%, and 67.28% for background, leaf, and lesion segmentation, respectively.
The worst performance is observed with the lightweight AFFormer, which achieves IoUs of 91.37%, 87.96%, and 59.11% for background, leaf, and lesion segmentation, respectively. SegFormer performs worse than the proposed method across all categories: its IoU is 1.01% lower for background segmentation, 2.88% lower for leaf segmentation, and 1.58% lower for lesion segmentation. Furthermore, the proposed method outperforms U-Net, with an improvement of 4.78% in lesion segmentation IoU, 2.01% in leaf segmentation IoU, and 3.24% in background segmentation IoU. DeepLab v3+ shows a decrease of 5.3% in lesion segmentation IoU, 0.84% in leaf segmentation IoU, and 0.82% in background segmentation IoU compared to the proposed method.

Table 7 Quantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Nlb Test Set of the CD&S Dataset (all values in %)

| Methods | Nlb Dice | Nlb IoU | Nlb Recall | Nlb Precision | Leaf Dice | Leaf IoU | Leaf Recall | Leaf Precision | Background Dice | Background IoU | Background Recall | Background Precision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| U-Net (2015) | 74.24 | 64.23 | 74.09 | 74.89 | 98.32 | 97.21 | 98.61 | 98.07 | 98.30 | 97.13 | 98.53 | 97.99 |
| DeepLab v3+ (2018) | 74.36 | 65.28 | 74.83 | 74.58 | 98.43 | 97.65 | 98.57 | 98.19 | 98.51 | 97.20 | 98.32 | 98.03 |
| SegFormer (2021) | 73.15 | 62.86 | 74.58 | 74.99 | 97.14 | 95.86 | 96.75 | 96.36 | 97.93 | 97.03 | 98.03 | 97.99 |
| PVT2 (2022) | 70.78 | 60.21 | 71.28 | 70.97 | 97.02 | 95.47 | 97.36 | 96.81 | 97.98 | 96.47 | 97.77 | 97.73 |
| TopFormer (2022) | 78.45 | 66.86 | 78.66 | 79.09 | 96.97 | 96.38 | 97.02 | 96.45 | 97.41 | 96.16 | 98.03 | 97.59 |
| AFFormer (2023) | 70.79 | 59.55 | 69.93 | 70.33 | 96.38 | 95.06 | 96.73 | 96.31 | 97.06 | 95.68 | 97.63 | 98.01 |
| SwiftFormer (2023) | 74.12 | 63.81 | 74.39 | 73.98 | 97.82 | 96.83 | 97.34 | 98.19 | 98.29 | 97.16 | 98.51 | 99.04 |
| LKCAFormer (ours) | 79.11 | 69.48 | 79.31 | 80.01 | 99.22 | 98.19 | 99.00 | 99.10 | 99.31 | 99.01 | 99.11 | 99.24 |

Table 7 presents the experimental results on the maize leaf Nlb disease test set, where the proposed method demonstrates the best segmentation performance.
Specifically, the Transformer-based SegFormer achieves a 6.62% lower IoU for lesion segmentation than the proposed method, and its IoU for leaf and background segmentation is 2.33% and 2.18% lower, respectively. The proposed method outperforms PVT2 by 2.74%, 1.44%, and 9.27% in IoU for background, leaf, and lesion segmentation, respectively. U-Net and DeepLab v3+ exhibit similar segmentation performance, with DeepLab v3+ performing slightly better overall. However, DeepLab v3+ still lags behind the proposed method by 2.01%, 0.54%, and 4.2% in IoU for background, leaf, and lesion segmentation, respectively. The proposed method shows better segmentation accuracy than the lightweight TopFormer, with improvements of 3.05%, 1.81%, and 2.62% in background, leaf, and lesion segmentation IoU. AFFormer, on the other hand, shows relatively poor performance, with IoUs of only 95.68%, 95.06%, and 59.45% for background, leaf, and lesion segmentation, respectively. In summary, the proposed method demonstrates a significant improvement in segmentation accuracy compared to other methods.

Table 8 presents a comparison of the methods based on the remaining evaluation metrics. The speed comparisons that follow are stated in milliseconds, so the values in the FPS column of Table 8 are read here as per-image time, where lower is better. On this measure, the proposed method is 7.36 ms faster than U-Net. Additionally, the total number of parameters and the FLOPs of the proposed method are only 12.7% and 1.47% of those of U-Net, respectively (3.68M vs. 29.06M parameters and 1.13G vs. 76.78G FLOPs). PVT2 is the fastest of all methods, but its parameter count is more than twice that of the proposed method. AFFormer has the fewest parameters and FLOPs, but it is 0.94 ms slower than the proposed method. Compared to TopFormer, the proposed method is 6.4 ms faster. Furthermore, in terms of total parameters and FLOPs, the proposed method is more efficient, requiring 1.46M fewer parameters and 1.05G fewer FLOPs than TopFormer.
In summary, based on a comprehensive comparison of all metrics, the proposed method strikes the best balance between segmentation performance and computational efficiency, offering superior segmentation accuracy with lower computational overhead.

Table 8 The results of different methods on the remaining evaluation indicators.

| Methods | FPS | Total parameters/M | FLOPs/G |
| --- | --- | --- | --- |
| U-Net (2015) | 22.33 | 29.06 | 76.78 |
| DeepLab v3+ (2018) | 125.52 | 5.81 | 18.49 |
| SegFormer (2021) | 94.80 | 3.72 | 6.77 |
| PVT2 (2022) | 13.7 | 7.53 | 4.53 |
| TopFormer (2022) | 21.37 | 5.14 | 2.18 |
| AFFormer (2023) | 15.91 | 3.05 | 0.86 |
| SwiftFormer (2023) | 18.34 | 3.29 | 17.47 |
| LKCAFormer (ours) | 14.97 | 3.68 | 1.13 |

Figures 3, 4, and 5 display the segmentation results for each method on the single-leaf test sets of Gls, Nls, and Nlb diseases in the Single-CD&S dataset. As shown in Fig. 3, the white dashed boxes highlight specific disease areas where lesion colors are similar to the leaf color due to lighting conditions. These regions are crucial for evaluating the accuracy of lesion segmentation. Comparing Figs. 3(a) and 3(c), U-Net correctly segments most of the lesions, but segmentation is poor in certain areas, with significant loss of detail. However, comparing Figs. 3(c) and 3(d), it is clear that DeepLab v3+ performs worse than U-Net. In Fig. 3(e), SegFormer shows better segmentation performance than the previous two methods, but its segmentation of some edge regions is poor: while it has strong global modeling capabilities, some local details are lost. Comparing Figs. 3(f), 3(h), and 3(i), AFFormer effectively reduces noise from lighting and other factors and focuses on lesion segmentation, but it performs poorly in segmenting lesion edges. In Fig. 3(j), the proposed method, LKCAFormer, compensates effectively for the loss of fine-grained details caused by aggregating different resolutions, providing accurate segmentation in key regions. It also performs well in segmenting leaf-edge lesions.

Figure 4 shows the Nls single-leaf disease segmentation results.
Since Nls lesions are relatively concentrated and the diseased areas are scattered, the segmentation differences between methods are smaller. In Fig. 4(h), the AFFormer method performs poorly in lesion area segmentation, only capturing a few prominent lesions. Figures 4(c), 4(d), 4(f), and 4(g) show that these methods can segment dense lesions, but they struggle with lesion edges, resulting in incorrect segmentation. The method in Fig. 4(e) performs poorly in segmenting independent lesions at the edges. In Figs. 4(i) and 4(j), differences are visible at the leaf’s striped areas, with SwiftFormer mistakenly classifying the stripe color as lesions, while the proposed method segments these areas more accurately, with better performance on leaf-edge lesion segmentation.

Figure 5 presents the Nlb single-leaf disease segmentation results, with the white dashed boxes indicating the dense lesion areas. Comparing Figs. 5(a) and 5(c), U-Net shows poor segmentation performance in specific regions. The methods in Figs. 5(d), 5(e), 5(f), and 5(i) perform poorly in lesion segmentation in these regions, and they mistakenly classify the leaf’s main veins, which are similar in color to the lesions, as lesions, resulting in incorrect segmentation. Comparing Figs. 5(h) and 5(j), the proposed method segments more dense lesions in specific regions and performs better in segmenting the lesion edges around larger regions near the main veins. The results from these experiments demonstrate that LKCAFormer not only achieves clearer segmentation of the leaf-edge regions but also segments lesion edges more accurately, providing superior segmentation performance overall.

Figures 6, 7, and 8 present the segmentation results for each method on the multi-leaf test sets of Gls, Nls, and Nlb diseases from the CD&S dataset. As shown in Fig. 6, Gls lesions are densely distributed, and lighting effects cause parts of the leaves to reflect light, making the color of the reflections similar to that of the lesions, which can lead to missegmentation. Comparing Figs. 6(a) and 6(c), the dense lesion areas are segmented fairly well, but some leaf areas are misclassified as lesions. Additionally, shadowed regions show poor segmentation performance. Figure 6(h) provides the best segmentation for the shadowed regions, but a comparison with Fig. 6(b) reveals that the annotated lesions were not segmented correctly. Figures 6(f) and 6(g) show poor performance in segmenting the edges of dense lesions, leading to incorrect segmentations. In Fig. 6(j), the proposed method achieves the best overall segmentation, correctly segmenting most of the annotated lesions from Fig. 6(b) with minimal missegmentation.

Figure 7 presents the segmentation results for Nls disease, where the background and leaf colors are similar. Due to the presence of grass and maize plants in the background, which share similar colors with the leaves, the methods in Figs. 7(c), 7(d), 7(g), 7(h), and 7(i) all show varying degrees of incorrect segmentation of the leaves. Other methods, such as those in Figs. 7(e) and 7(f), fail to effectively segment the leaf edges where lighting conditions affect them. In contrast, the proposed method in Fig. 7(j) accurately segments the leaves, in agreement with Fig. 7(b). Regarding lesion segmentation, since Nls lesions are small and scattered, all methods perform similarly on larger lesions. However, the proposed method (Fig. 7(j)) performs better on smaller lesions. Other methods, such as those in Figs. 7(c) and 7(d), incorrectly segment non-lesion areas as lesions, while the methods in Figs. 7(g) and 7(i) mistakenly classify the background as lesions.

Figure 8 displays the segmentation results for Nlb disease on multi-leaf images.
Lighting effects make some background areas similar in color to the leaves, leading to missegmentation, as shown in Figs. 8(e), 8(f), and 8(h). Comparing Fig. 8(j) with Figs. 8(c) and 8(d), the performance for lesion segmentation is comparable across these methods, with most annotated lesions being correctly segmented. However, the method in Fig. 8(j) performs better in segmenting both lesion edges and leaf boundaries. In summary, LKCAFormer demonstrates superior segmentation performance on the complex-background, multi-leaf CD&S test sets. It effectively minimizes background noise, focuses on the leaf regions, and accurately segments lesions, whether they are dispersed or dense.

Ablation studies

In this section, five ablation experiments are designed to verify the adaptability of the LK-COA module across different model architectures and its effectiveness in optimizing global feature modeling and detail feature extraction. Specifically, in Test 1, LKCAFormer-TR is a model where the LK-COA module is removed and replaced with three Transformer blocks. In Test 2, the traditional Transformer blocks are replaced with one LK-COA module. Test 3 uses two LK-COA modules, while Test 4 uses three LK-COA modules, which corresponds to the model proposed in this paper. Test 5 includes four LK-COA modules. The ablation experiments in this section are conducted using the three disease datasets from the Single-CD&S dataset. The evaluation results of the ablation study are recorded in Table 9.

Table 9 The results of ablation studies on three maize leaf datasets (IoU, %).

| Test No. | Model | Gls Leaf | Gls Disease | Gls Background | Nls Leaf | Nls Disease | Nls Background | Nlb Leaf | Nlb Disease | Nlb Background |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | LKCAFormer-TR | 95.39 | 75.56 | 96.01 | 95.28 | 74.13 | 96.56 | 95.85 | 72.32 | 97.74 |
| 2 | LKCAFormer-LC1 | 83.41 | 58.89 | 90.61 | 80.31 | 56.01 | 84.39 | 77.47 | 54.93 | 82.06 |
| 3 | LKCAFormer-LC2 | 88.53 | 70.39 | 93.01 | 89.27 | 69.95 | 93.89 | 86.68 | 65.10 | 91.43 |
| 4 | LKCAFormer-LC3 | 96.56 | 78.48 | 96.97 | 96.29 | 76.21 | 97.81 | 96.19 | 73.99 | 98.21 |
| 5 | LKCAFormer-LC4 | 97.11 | 76.15 | 97.86 | 97.91 | 75.10 | 98.38 | 97.92 | 73.02 | 98.93 |

As shown in Table 9, a comparison between Test 1 and Test 2 clearly indicates a significant decrease in segmentation accuracy when the traditional Transformer blocks are replaced with a single LK-COA module. However, with three LK-COA modules (Test 4), the IoU for lesion segmentation across the three maize disease test sets improves by 3.11%, 1.26%, and 1.89%, respectively, compared to Test 1. Interestingly, when four LK-COA modules are used in Test 5, the IoU for lesion segmentation decreases, but the accuracy for background and leaf segmentation improves, which can be attributed to the large-kernel convolution's ability to better capture global features. Considering the segmentation results for lesions, leaves, and background together, the encoder with three stacked LK-COA modules provides a more balanced improvement in segmentation accuracy. It not only enhances the perception of global features but also improves the model's ability to extract finer details and edge features, thereby effectively boosting segmentation performance for the background, leaves, and lesions.

Figure 9 illustrates the segmentation results for leaf and lesion areas in each test. Comparing the two rows in Fig. 9 for Test 1 and Test 4, it is evident that replacing the traditional Transformer blocks with the LK-COA module enables the model to capture more fine-grained lesions, resulting in more precise segmentation of leaf and lesion boundaries.
Comparing the row for Test 4 with those of the other tests, it is clear that the proposed method not only segments the contours of the leaves and lesions more clearly but also captures more small lesion areas, accurately defining the lesion boundaries and mitigating the issue of small lesions merging in dense lesion regions. The experimental results demonstrate that the proposed method improves segmentation performance for various types of maize leaf lesions.

Conclusion

This paper presents a novel lightweight algorithm, LKCAFormer, for precise segmentation of maize disease lesions. The network incorporates the large-kernel convolution cooperative attention (LK-COA) module, which uses large-kernel convolutions to model global features and capture global context. The COA attention mechanism is then applied to extract finer lesion details, focusing on lesion segmentation. This module strengthens the capture of local edge and detail features, improving the extraction of small lesions and alleviating the problem of lesion merging. The CSDecoder is designed to fuse shallow features, rich in detail and edge information, with deeper features containing stronger semantic information, allowing for precise recovery and yielding finely segmented results. Experimental results demonstrate that, compared to other segmentation methods, LKCAFormer achieves the best segmentation performance across the three disease datasets tested. These findings indicate that LKCAFormer can provide valuable technical support for pathological image analysis of various maize leaf diseases. It is worth noting that although LKCAFormer offers computational advantages, experiments were only conducted on three maize disease leaf datasets. Future research will focus on further improving the model's accuracy and enhancing its computational performance for deployment on edge computing devices.
Additionally, further experiments will be conducted on a broader range of crop disease datasets. Declarations Acknowledgements Not applicable. Authors’ contributions Conceptualization, J.L.G. and X.H.J.; methodology, J.H; validation, J.H and C.J.Z.; writing—original draft preparation,J.H ; writing—review and editing, X.F.Y, and J.L.G; visualization, J.H; project administration. X.H.J. Funding This research was supported by the National Natural Science Foundation of China(62061037、31960494)，Natural Science Foundation of Inner Mongolia(2023LHMS06017、2023QN06006、NJZZ21068), Science and Technology R&D Program of Inner Mongolia(0200GG0169). Data availability The datasets used during the current study are available from the correspond ing author on reasonable request. Data Availability Statement: The datasets used during the current study are available from the corresponding author on reasonable request. Ethics approval and consent to participate Not applicable. Institutional review board statement Not applicable. Informed consent statement Not applicable. Competing interest The authors declare no conflict of interest. References Ronneberger, O., P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation. 2015: p. 234-241. Zhao, H., et al., Pyramid scene parsing network. 2017: p. 2881-2890. Sun, K., et al., High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514, 2019. Chen, L.-C., Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062, 2014. Chen, L.-C., Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017. Chen, L.-C., et al., Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 2017. 40 (4): p. 834-848 %@ 0162-8828. Divyanth, L.G., A. Ahmad, and D. 
Saraswat, A two-stage deep-learning based segmentation model for crop disease quantification based on corn field imagery. Smart Agricultural Technology, 2023. 3 : p. 100108 %@ 2772-3755. Yang, Y., H. Shan, and F. Qu, Maize disease segmentation method based on improved image segmentation network model. 2023. 12610 : p. 481-485. Dosovitskiy, A., An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020. Xie, E., et al., SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems, 2021. 34 : p. 12077-12090. Yu, W., et al., Metaformer is actually what you need for vision. 2022: p. 10819-10829. Ahmad, A., et al., CD&S dataset: Handheld imagery dataset acquired under field conditions for corn disease identification and severity estimation. arXiv preprint arXiv:2110.12084, 2021. Russell, B.C., et al., LabelMe: a database and web-based tool for image annotation. International journal of computer vision, 2008. 77 : p. 157-173 %@ 0920-5691. Bloice, M.D., P.M. Roth, and A. Holzinger, Biomedical image augmentation using Augmentor. Bioinformatics, 2019. 35 (21): p. 4522-4524 %@ 1367-4803. Contributors, M., MMSegmentation: Openmmlab semantic segmentation toolbox and benchmark. 2020. Zhang, S. and C. Zhang, Modified U-Net for plant diseased leaf image segmentation. Computers and Electronics in Agriculture, 2023. 204 : p. 107511 %@ 0168-1699. Zhu, S., et al., A novel approach for apple leaf disease image segmentation in complex scenes based on two-stage DeepLabv3+ with adaptive loss. Computers and Electronics in Agriculture, 2023. 204 : p. 107539 %@ 0168-1699. Zhang, X., et al., Research of segmentation recognition of small disease spots on apple leaves based on hybrid loss function and cbam. Frontiers in Plant Science, 2023. 14 : p. 1175027 %@ 1664-462X. 
Chang, B., et al., A general-purpose edge-feature guidance module to enhance vision transformers for plant disease identification. Expert Systems with Applications, 2024. 237 : p. 121638 %@ 0957-4174. Thai, H.-T., K.-H. Le, and N.L.-T. Nguyen, FormerLeaf: An efficient vision transformer for Cassava Leaf Disease detection. Computers and Electronics in Agriculture, 2023. 204 : p. 107518 %@ 0168-1699. Liu, Z., et al., A convnet for the 2020s. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022: p. 11976-11986. Ding, X., et al., Scaling up your kernels to 31x31: Revisiting large kernel design in cnns. 2022: p. 11963-11975. Peng, C., et al., Large kernel matters--improve semantic segmentation by global convolutional network. 2017: p. 4353-4361. Liu, S., et al., More convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity. arXiv preprint arXiv:2207.03620, 2022. El-Assiouti, H.S., et al., Lite-SRGAN and Lite-UNet: toward fast and accurate image super-resolution, segmentation, and localization for plant leaf diseases. IEEE Access, 2023. 11 : p. 67498-67517 %@ 2169-3536. Zhang, X., et al., UPFormer: U-sharped perception lightweight transformer for segmentation of field grape leaf diseases. Expert Systems with Applications, 2024. 249 : p. 123546 %@ 0957-4174. Zhang, Y. and C. Lv, TinySegformer: A lightweight visual segmentation model for real-time agricultural pest detection. Computers and Electronics in Agriculture, 2024. 218 : p. 108740 %@ 0168-1699. Shi, M., et al., Lightweight context-aware network using partial-channel transformation for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems %@ 1524-9050, 2024. Sheng, X., et al., An edge-guided method to fruit segmentation in complex environments. Computers and Electronics in Agriculture, 2023. 208 : p. 107788 %@ 0168-1699. Lu, J., et al., EAIS-Former: An efficient and accurate image segmentation method for fruit leaf diseases. 
Computers and Electronics in Agriculture, 2024. 218: p. 108739. Chen, G., et al., ESKNet: An enhanced adaptive selection kernel convolution for ultrasound breast tumors segmentation. Expert Systems with Applications, 2024. 246: p. 123265. Yan, S., et al., LiConvFormer: A lightweight fault diagnosis framework using separable multiscale convolution and broadcast self-attention. Expert Systems with Applications, 2024. 237: p. 121338. Paszke, A., et al., PyTorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 2019. 32. Loshchilov, I. and F. Hutter, Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101, 2017. Garcia-Garcia, A., et al., A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv:1704.06857, 2017.
Additional Declarations: No competing interests reported.
Editorial decision: Revision requested 12 Aug, 2025; Reviews received at journal 06 Aug, 2025; Reviews received at journal 05 Aug, 2025; Reviews received at journal 03 Aug, 2025; Reviewers agreed at journal 25 Jul, 2025; Reviewers agreed at journal 25 Jul, 2025; Reviewers agreed at journal 24 Jul, 2025; Reviewers agreed at journal 23 Jul, 2025; Reviews received at journal 30 May, 2025; Reviewers agreed at journal 18 May, 2025; Reviewers agreed at journal 15 May, 2025; Reviewers invited by journal 13 May, 2025; Editor invited by journal 08 May, 2025; Editor assigned by journal 08 May, 2025; Submission checks completed at journal 08 May, 2025; First submitted to journal 27 Apr, 2025.
{"props":{"pageProps":{"initialData":{"identity":"rs-6543171","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":455803737,"identity":"9d721811-9985-407f-9b69-af093b231d40","order_by":0,"name":"Jian Hu","email":"","orcid":"","institution":"Inner Mongolia Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Jian","middleName":"","lastName":"Hu","suffix":""},{"id":455803738,"identity":"f3779556-3abe-4bb9-b117-6883a7c40004","order_by":1,"name":"Xinhua Jiang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA1UlEQVRIiWNgGAWjYBACxmYwdUCODcJnJl6LMfFaoOBAYgPRWpjbmZ89/Np2J71P7PAzCYYK68QG9rMHCDiMzdxY5syz3DbpNDMJhjPpiQ08eQkEtDCYSUtUHAZqSTCTYGw7nNggwWNAQAv7N2kJg8PpbNLp3yQY/xGlhcdM8kPF4QQ26RygLQ3EaSmTZjhz2LBNOqfYIuFYunEbTw5+LYb9x7dJ/mw7LC8/O33jjQ811rL97GcIaGkABjQPjJcAxGx41QOBPMhxPwipGgWjYBSMgpENAMEkP9nbCX/8AAAAAElFTkSuQmCC","orcid":"","institution":"Inner Mongolia Agricultural University","correspondingAuthor":true,"prefix":"","firstName":"Xinhua","middleName":"","lastName":"Jiang","suffix":""},{"id":455803739,"identity":"b65560c0-37c5-4b0f-861c-b9164d3320d3","order_by":2,"name":"Julin 
Gao","email":"","orcid":"","institution":"Inner Mongolia Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Julin","middleName":"","lastName":"Gao","suffix":""},{"id":455803740,"identity":"aae56ddb-0959-4fac-a192-3b958ed23a94","order_by":3,"name":"Xiaofang Yu","email":"","orcid":"","institution":"Inner Mongolia Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Xiaofang","middleName":"","lastName":"Yu","suffix":""},{"id":455803741,"identity":"2832db4f-eea7-4659-ad26-c19819dc6124","order_by":4,"name":"Chengjun Zhai","email":"","orcid":"","institution":"Inner Mongolia Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Chengjun","middleName":"","lastName":"Zhai","suffix":""}],"badges":[],"createdAt":"2025-04-28 02:38:11","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6543171/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6543171/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12870-026-08409-w","type":"published","date":"2026-02-28T15:57:52+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":82799464,"identity":"594b7bcc-71f8-4c91-937e-e93eb93578d9","added_by":"auto","created_at":"2025-05-15 10:58:22","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":603657,"visible":true,"origin":"","legend":"\u003cp\u003eExample annotated samples. 
Background, leaf, and disease parts in a single leaf are annotated, respectively.\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/9f52002dda537a2225f19df8.jpeg"},{"id":82799461,"identity":"eb29aa47-486e-44a3-b4c4-3157e263b087","added_by":"auto","created_at":"2025-05-15 10:58:22","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":62664,"visible":true,"origin":"","legend":"\u003cp\u003e(a) The overall structure of LKCAFormer.(b) The internal structure of LK-COAT, which consists of three LK-COA modules and joins the jump connection.(c) The cross-scale integration of the internal structure of CSDecoder through the attention mechanism.\u003c/p\u003e","description":"","filename":"2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/ebb6061ad8b8fc2be534da8f.jpg"},{"id":82799462,"identity":"a0bb5296-759f-4f0b-98da-d605a5bb1c34","added_by":"auto","created_at":"2025-05-15 10:58:22","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":747645,"visible":true,"origin":"","legend":"\u003cp\u003eThe results of different methods on the Gls image.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/98308e7bfa3e5b043c4126eb.png"},{"id":82799463,"identity":"51b570e5-1102-447d-ae79-2177428eacf0","added_by":"auto","created_at":"2025-05-15 10:58:22","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":98453,"visible":true,"origin":"","legend":"\u003cp\u003eThe results of different methods on the Nls image.\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/0d78e6f65412fdcf73da24d6.jpeg"},{"id":82799467,"identity":"74a37be3-1f1d-48c7-a1c9-c3d6721180ff","added_by":"auto","created_at":"2025-05-15 
10:58:22","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":861364,"visible":true,"origin":"","legend":"\u003cp\u003eThe results of different methods on the Nlb image.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/7ccacb06828270e03977ee0e.png"},{"id":82800698,"identity":"05f061e8-0703-4aac-a018-07099a68254e","added_by":"auto","created_at":"2025-05-15 11:14:22","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":839237,"visible":true,"origin":"","legend":"\u003cp\u003eThe results of different methods on the Gls image at CD\u0026amp;S.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/30d221546af1ce0bac7579c3.png"},{"id":82799466,"identity":"85b2f32c-d83a-4777-9fe6-060b53d3a9f4","added_by":"auto","created_at":"2025-05-15 10:58:22","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":822188,"visible":true,"origin":"","legend":"\u003cp\u003eThe results of different methods on the Nls image at CD\u0026amp;S.\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/0a21b2abbd00f9df2a1046e3.png"},{"id":82799469,"identity":"0ca890eb-6978-4c8f-b5f5-9542d6f0fc85","added_by":"auto","created_at":"2025-05-15 10:58:23","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":749698,"visible":true,"origin":"","legend":"\u003cp\u003eThe results of different methods on the Nlb image at CD\u0026amp;S.\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/5544a4b5c1cd736dcfad4b5f.png"},{"id":82801293,"identity":"8e239dd1-2c59-409d-b1ee-46c2b4963aac","added_by":"auto","created_at":"2025-05-15 
11:22:23","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":926732,"visible":true,"origin":"","legend":"\u003cp\u003eThe segmentation results of ablation studies.\u003c/p\u003e","description":"","filename":"floatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/f4a877d08f95c4999880584a.png"},{"id":103766088,"identity":"4da4c92c-1a5b-4939-a715-f9776d4f1500","added_by":"auto","created_at":"2026-03-02 16:12:07","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":8846563,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6543171/v1/8d909912-3961-47c7-8930-faa72375f9db.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"LKCAFormer: A Lightweight Transformer with Large-Kernel Cooperative Attention for the Segmentation of Field Maize Leaf Diseases","fulltext":[{"header":"Introduction","content":"\u003cp\u003eMaize is one of the most important food crops in our country and also a key raw material for animal husbandry and light industry. However, due to climate change and environmental factors, the frequency of maize leaf diseases has been increasing every year. Diseases on the leaves not only impair photosynthesis and affect plant growth but also threaten the quality and yield of maize, causing severe economic losses for farmers. Timely and accurate detection and diagnosis of plant diseases are crucial for effective disease management. Traditional methods for diagnosing maize leaf diseases rely on manually observing symptoms and spots, combined with expert experience. However, in large-scale farms, manual diagnosis is inefficient, less accurate, labor-intensive, and difficult to perform. Therefore, using computer vision technology to automatically analyze maize leaf diseases can improve diagnostic efficiency. 
However, existing methods still show poor segmentation accuracy for images with complex backgrounds, small disease areas with rich textures, and similar disease symptoms. Solving these problems is urgent, as it would help farmers implement more precise disease control and thereby significantly improve maize yield and quality.\u003c/p\u003e \u003cp\u003eConvolutional Neural Networks (CNNs), one of the core architectures in deep learning, have evolved significantly and have become a common framework in agriculture. CNN-based segmentation networks subsequently emerged, such as U-Net [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e], PSPNet [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], SegNet [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e], and various versions of DeepLab [\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. In recent years, some improved networks have combined the advantages of DeepLab and U-Net. These models can handle multi-scale contextual information while achieving precise detail recovery and accurate boundaries, all while maintaining efficient computation and low resource consumption. For example, Divyanth et al. [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] collected 1,050 maize leaves with disease spots from the Purdue University Agricultural Research and Education Center and evaluated the strengths and weaknesses of SegNet, U-Net, and DeepLab v3+. They ultimately chose U-Net for segmenting maize leaves and DeepLab v3+ for segmenting disease spots. 
Similarly, Yang et al. [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] proposed an improved DeepLab v3+ model that incorporates the advantages of U-Net by extracting multi-scale semantic information during encoding and obtaining richer spatial information during decoding, resulting in higher segmentation accuracy. However, CNNs mainly rely on stacking network layers to capture global features, and this approach has limitations when dealing with long-range dependencies and spatial transformations.\u003c/p\u003e \u003cp\u003eTo capture global features more effectively while still extracting local features, researchers have started replacing traditional convolutions with self-attention mechanisms to model global information. This led to the development of Transformer-based segmentation models. ViT [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] was the first model in computer vision to adopt a pure attention mechanism, completely abandoning convolutional layers, and laid the foundation for later Transformer-based models. Subsequently, models such as SegFormer [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] and PoolFormer [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] further extended this idea to the field of image segmentation, showing superior performance compared to CNN-based models on several tasks. Transformers can directly model the global information in high-resolution natural images with complex backgrounds and integrate information from different parts of an image through self-attention. In maize leaf disease segmentation, however, the resulting global feature maps, although rich in information, lack an accurate representation of details such as the edges of the disease spots. 
These details are crucial for segmenting small disease regions that are hard to distinguish. Despite the significant advantages of both CNNs and Transformers in feature extraction and modeling, they share a common drawback: high computational resource consumption. This means that whether using deep convolutional networks or Transformers based on self-attention mechanisms, both require a large number of parameters and extensive computation. As a result, their efficiency is low when deployed in real-world environments\u0026mdash;especially on resource-constrained devices\u0026mdash;making it challenging to meet the demands of real-time applications.\u003c/p\u003e \u003cp\u003eBased on the analysis above, we propose a new efficient network called LKCAFormer for in-field maize leaf disease segmentation, addressing the issues of high computational resource consumption and the need for improved segmentation accuracy in current leaf spot segmentation models. Compared with traditional Transformer- and CNN-based models, LKCAFormer uses large-kernel convolution attention to emphasize the capture of boundary information and the fusion of global features for disease spot segmentation. 
This new method effectively addresses the challenges in maize leaf disease segmentation and offers a reliable solution for similar future applications.\u003c/p\u003e \u003cp\u003eIn summary, the main contributions of this study are as follows:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eA lightweight model called LKCAFormer, which uses an encoder-decoder architecture to fuse feature information, is proposed for the effective segmentation of maize leaf diseases.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eA Large-Kernel Convolution Cooperative Attention module (LK-COA) is designed to capture local edge and detailed features while enhancing the extraction of small spots, thereby alleviating the issue of spot adhesion.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eA Cross-Scale Attention Decoder (CSDecoder) is designed. It integrates attention operations to effectively recover details, edges, and positional information retained through downsampling, and outputs accurate segmentation results by combining semantic information.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eLKCAFormer is evaluated on two representative datasets, namely Single-CD\u0026amp;S and CD\u0026amp;S. The experimental results show that our network significantly improves performance compared with mainstream methods.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e\n\u003ch3\u003eRelated work\u003c/h3\u003e\n\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ePlant Disease Segmentation\u003c/h2\u003e \u003cp\u003eIn recent years, CNN- and Transformer-based models have been widely applied to plant disease segmentation. Zhang and Zhang [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] introduced residual blocks and residual paths to develop an enhanced U-Net for plant leaf disease image segmentation. 
On a single maize leaf disease augmentation dataset, the model achieved a segmentation accuracy of 94.07%. Zhu et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] proposed a two-stage DeepLabV3+ algorithm with adaptive loss for segmenting apple leaf diseases, which effectively addressed the challenges of leaf and lesion extraction in complex environments. This model achieved segmentation accuracy of 98.70% for leaves and 86.56% for lesions. Zhang et al. [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] focused on apple leaf spot disease and brown spot disease, proposing a lesion segmentation model based on DFL-UNet+CBAM. This model used a hybrid loss function combining Dice Loss and Focal Loss to refine the weight relationships between features, enhancing the channel features of lesions while suppressing those of healthy leaf regions. The overall disease segmentation accuracy reached 95.16%. Chang et al. [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] applied the Vision Transformer to plant disease recognition and introduced an edge-feature guidance (EFG) module to enhance local feature extraction. Experimental results demonstrated that this method outperformed ViT, PVT, and Swin. Thai et al. [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] proposed an improved Transformer-based model, FormerLeaf, for cassava leaf disease datasets, employing an attention pruning algorithm to select the most important attention heads at each layer, reducing complexity and improving segmentation accuracy. 
While these models have demonstrated superior performance in agriculture, they involve extensive matrix computations, leading to high computational overhead.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eCNNs with Large Kernels\u003c/h3\u003e\n\u003cp\u003eTraditional CNN architectures have limitations in kernel design, mainly manifested in a rapid increase in computational complexity as the network depth grows. To bridge the performance gap between Transformers and CNNs, ConvNeXt [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] incorporates Transformer design ideas into the ResNet architecture. By adopting the training procedure of Swin Transformers, altering the computation ratios at different stages, reducing the number of activation and normalization layers, and using larger kernel sizes, ConvNeXt is able to improve segmentation performance. Recently, Ding et al. [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] reexamined the importance of large kernel designs in CNNs. Their proposed RepLKNet employs 31\u0026times;31 large kernels, achieving a 0.3% accuracy improvement over Swin Transformers on the ImageNet classification task and outperforming ResNet-101 by 4.4% on the MS-COCO object detection task. However, because large kernels significantly increase the number of parameters and computational load, their application in segmentation tasks is limited. For instance, Peng et al. [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] pointed out that as the size of convolution kernels increases, the number of model parameters rises, which can lead to overfitting and ultimately harm segmentation performance. 
To address this issue, they proposed a Global Convolutional Network (GCN) that uses large 1\u0026times;k and k\u0026times;1 convolution kernels to enhance semantic segmentation results. Furthermore, experiments with a more recent model, SLaK [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], have shown that the performance of RepLKNet tends to plateau when kernel sizes are scaled beyond 31 to 51 or 61. To tackle the optimization challenges associated with extra-large kernels during training, SLaK decomposes the large kernel into two rectangular convolution kernels (for example, 51\u0026times;5 and 5\u0026times;51) and employs dynamic sparsity techniques to reduce the number of trainable parameters, thereby achieving more efficient feature extraction.\u003c/p\u003e\n\u003ch3\u003eLightweight Segmentation Models\u003c/h3\u003e\n\u003cp\u003eIn recent years, an increasing number of researchers have begun focusing on lightweight segmentation models. For example, El-Assiouti et al. [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] proposed Lite-UNet, which outperforms the standard U-Net, achieving slight increases of 0.06% in Dice coefficient and 0.12% in IoU. At the same time, Lite-UNet reduced model parameters, FLOPs (floating-point operations), and inference time by factors of 15.9, 25, and 6.6, respectively. Zhang et al. [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e] customized a lightweight U-shaped sharpening perception Transformer architecture (UPFormer) specifically for segmenting grape leaf diseases, achieving a better balance between performance and efficiency. Additionally, Zhang and Lv [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] designed the TinySegformer model for agricultural pest detection tasks, significantly enhancing semantic segmentation capabilities while reducing parameters. 
Furthermore, Shi et al. [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] developed a lightweight context-aware network (LCNet) that accelerates semantic segmentation while preserving segmentation accuracy, thus achieving a good balance between computational efficiency and prediction performance in mobile application scenarios. Although the aforementioned lightweight models can maintain high accuracy while reducing parameters and improving efficiency, their segmentation performance remains insufficient on maize leaf disease datasets with complex backgrounds and high disease similarity.\u003c/p\u003e\n\u003ch3\u003eAttention Mechanisms\u003c/h3\u003e\n\u003cp\u003eThe attention mechanism is a key technique for extracting important local information. Accordingly, researchers have proposed several segmentation models that combine attention mechanisms to capture key information and suppress irrelevant features. For example, Sheng et al. [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] designed new attention modules, GLM and BAM, that integrate and refine high-level semantic and spatial information. They developed an edge-guided segmentation method for complex environments, called EdgeSegNet, which achieved average MIoU scores of 0.909 and 0.942 on apple and peach datasets, respectively. Lu et al. [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] proposed a precise segmentation method for fruit leaf disease images based on EAIS-Former, which uses a custom extra-large kernel attention module to capture more tiny spots, resulting in more refined segmentation of leaves and disease spots. 
Chen et al. [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] introduced an enhanced selective large-kernel attention module that adaptively recalibrates the weights of feature map regions along the channel and spatial dimensions, allowing the network to focus more on high-contribution areas and reduce interference from less informative regions. Meanwhile, Yan et al. [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] developed a broadcast self-attention block to capture key fine-grained features globally while avoiding the heavy computational cost of complex matrix multiplications and multi-dimensional exponentiation. Although these attention-based segmentation models have achieved good results in their respective applications, they still fall short when dealing with maize leaf diseases. This is because the diseased regions and the complex background pixels are highly similar, which hinders accurate segmentation. Therefore, efficiently distinguishing between similar disease areas and healthy leaf regions in complex backgrounds remains a challenging problem that needs to be addressed.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eDataset\u003c/h2\u003e \u003cp\u003eWe evaluate the proposed model on two datasets: Single-CD\u0026amp;S and CD\u0026amp;S [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. The CD\u0026amp;S dataset is a publicly available and fair maize disease recognition dataset, which includes three common maize leaf diseases: Northern Leaf Blight (Nlb), Northern Leaf Spot (Nls), and Gray Leaf Spot (Gls). The dataset contains images captured in natural environments, where, in addition to the foreground diseased leaves, the background also includes other diseased leaves, resulting in a complex and cluttered background with potential interference. 
To enable accurate segmentation of the leaves and lesions, we extracted a subset from the CD\u0026amp;S dataset, annotating only individual leaves and their corresponding diseased regions to create the Single-CD\u0026amp;S dataset. In contrast, the original CD\u0026amp;S dataset includes annotations for multiple leaves and their associated diseased regions.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eData Augmentation\u003c/h3\u003e\n\u003cp\u003eOur research involves two sequential stages: (1) the first stage extracts the target leaf from the complex background, and (2) the second stage segments the lesions from the leaf images extracted in the first stage. Therefore, each original image requires labels for the leaf, the disease, and the background. The sample data were annotated using the Labelme tool [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/wkentaro/labelme\u003c/span\u003e\u003cspan address=\"https://github.com/wkentaro/labelme\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), and the annotated visual results are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eTo mitigate overfitting and enhance the model's robustness and generalization, we applied the Augmentor module [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] to perform geometric and photometric transformations such as random horizontal or vertical flips, random cropping, random sampling, and color/brightness adjustment. Additionally, we incorporated powerful data augmentation techniques from the MMSegmentation library [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] for semantic segmentation. The three maize leaf disease datasets were split into training and test sets at a ratio of 8:2. 
Moreover, during the training phase, each dataset was further divided into training and validation sets in a 9:1 ratio for cross-validation. Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e provides detailed information on the training set, validation set, and data augmentation process.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDetails of the maize leaf disease datasets used.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMaize disease\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClasses\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAcquisition\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAugment\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTrain data\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eVal data\u003c/p\u003e \u003c/th\u003e 
\u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCD\u0026amp;S\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1540\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e12437\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e9949\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2488\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSingle-CD\u0026amp;S\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1274\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e9758\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e7806\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1952\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003eModel design\u003c/h3\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eOverview\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe overall architecture of the proposed LKCAFormer is depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. LKCAFormer comprises two main components: (1) LK-COAT Encoder, which leverages powerful feature extraction capabilities through multi-level large-kernel convolution CNNs and attention modules. 
The encoder consists of three layers of the LK-COA module. Each layer employs large convolution kernels to extract global features, which are then passed through skip connections to a newly designed coordinated attention block. This block optimises fine details and aggregates features, facilitating both coarse and fine feature representations at the same scale. The encoder outputs shallow features that retain rich local details and edge information, as well as deeper features that capture global semantic information. This dual-level feature extraction ensures accurate identification of leaf and lesion characteristics during downsampling. (2) Cross-Scale Attention Decoder (CSDecoder). Three decoders are designed, each receiving low-resolution feature maps containing high-level semantic information from the encoder. These decoders calculate similarity weights for fine-grained, high-frequency global information and then perform attention operations with coarse-grained, low-frequency feature maps from the upper layers. This results in a fusion of fine-grained and coarse-grained high-frequency features. The fused feature map is subsequently upsampled and concatenated, then processed through an MLP for nonlinear adjustments. This compensates for the reduced sensitivity of convolution operations in capturing detailed information. Finally, the shallow and deep features are fused, and the aggregated feature map is passed to the lightweight segmentation head for final processing.\u003c/p\u003e \u003cp\u003eSpecifically, the network takes input leaf images of size \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:512\\times\\:512\\times\\:3\$\u003c/span\u003e\u003c/span\u003e. 
In the encoder, the input image is first processed through a feature extraction head composed of two stacked \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:3\\times\\:3\\:\$\u003c/span\u003e\u003c/span\u003e depthwise convolutions, effectively extracting features and generating a feature map at 1/4 of the original image size, denoted as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{0}\$\u003c/span\u003e\u003c/span\u003e. This helps reduce the initial parameter count in the encoder. Then, through three encoder layers, the feature maps at {1/8, 1/16, 1/32} of the original image size are obtained, denoted as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{1}\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{2}\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{3}\$\u003c/span\u003e\u003c/span\u003e. 
Each layer employs large convolution kernels with sizes \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:k\\:=\\:\\{(7,9,11),\\:(11,13,15),\\:(15,17,19)\\}\$\u003c/span\u003e\u003c/span\u003e and channel dimensions of {32, 64, 128, 160}. In the decoding phase, the feature maps \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{1}\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{2}\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{3}\$\u003c/span\u003e\u003c/span\u003e from the encoder are passed to the decoder for feature fusion and upsampling to match the size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{0}\$\u003c/span\u003e\u003c/span\u003e. These maps are then concatenated with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{0}\$\u003c/span\u003e\u003c/span\u003e and processed through a simple segmentation head, producing a 512\u0026times;512\u0026times;\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{N}_{cls}\$\u003c/span\u003e\u003c/span\u003e segmentation output, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{N}_{cls}\$\u003c/span\u003e\u003c/span\u003e represents the predefined number of classes (in this case, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{N}_{cls}\$\u003c/span\u003e\u003c/span\u003e = 3).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eLarge-Kernel Convolutional Cooperative Attention\u003c/h2\u003e \u003cp\u003eDue to the high computational complexity of Transformer models, which require substantial computational resources, they 
are not well-suited for real-time applications in agricultural production environments. However, long-range dependencies and global feature information are particularly well-captured by Transformers, a capability that convolutional operations generally lack. In recent years, efforts have been made by some researchers to enhance the global information-capturing ability of convolutional networks through the use of large convolution kernels, while attention mechanisms have been incorporated to improve the perception of both channel and spatial features, as well as long-range dependencies. This approach leads to a reduction in network depth, an increase in sensitivity to local details, parameter optimisation, and ultimately an improvement in segmentation performance. Building on this analysis, this paper proposes the design of an LK-COAT encoder, which models global information through large-kernel convolutions and captures local features using a cooperative attention mechanism. This allows the model to learn the interaction between local and global features, leading to richer feature representations and enhanced extraction of edge and texture features, thus enabling fine-grained segmentation of disease regions on maize leaves. The LK-COAT encoder is illustrated in Figure (b). Given a feature map \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:F\\in\\:{\\mathbb{R}}^{C\\times\\:H\\times\\:W}\$\u003c/span\u003e\u003c/span\u003e, where \u003cem\u003eC\u003c/em\u003e is the number of input channels, and \u003cem\u003eH\u003c/em\u003e and \u003cem\u003eW\u003c/em\u003e represent the height and width of the feature map, respectively, the high computational cost associated with large-kernel depthwise convolutions is mitigated by decomposing the large convolution into smaller depthwise convolutions, followed by extended depthwise convolutions with relatively large kernels. 
The output of the LK-COA module can be computed using equations \u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan refid=\"Equ5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$${Z}^{C}\\:=\\:\\sum_{H,W}{W}_{(2d-1)\\times(2d-1)}^{C}\\:*\\:{F}^{C}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\bar{Z}_{g}^{C}\\:=\\:\\sum_{H,W}{W}_{\\lfloor\\frac{k}{d}\\rfloor\\times\\lfloor\\frac{k}{d}\\rfloor}^{C}\\:*\\:{Z}^{C}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$${A}_{g}^{C}\\:=\\:\\text{SoftMax}\\left(\\text{Avg}\\left(\\bar{Z}_{g}^{C}\\right)\\right)\\:\\odot\\:{F}^{C}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$${A}_{L}^{C}\\:=\\:\\text{SoftMax}\\left({W}_{3\\times 3}*{F}^{C}\\right)\\:\\odot\\:\\bar{Z}^{C}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\bar{F}^{C}\\:=\\:{A}_{g}^{C}\\:\\oplus\\:{A}_{L}^{C}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e.\u003c/p\u003e \u003cp\u003eThe symbol \u003cspan 
class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{*}\$\u003c/span\u003e\u003c/span\u003e represents the convolution operation, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\odot\\:\\:\$\u003c/span\u003e\u003c/span\u003e denotes the Hadamard product. In Eq.\u0026nbsp;(\u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{Z}^{C}\$\u003c/span\u003e\u003c/span\u003e refers to the output feature map obtained by applying a depthwise convolution with a kernel size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\left(2d-1\\right)\\times\\:\\left(2d-1\\right)\$\u003c/span\u003e\u003c/span\u003e (where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:d\$\u003c/span\u003e\u003c/span\u003e is the dilation rate) to the input feature map \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:F\$\u003c/span\u003e\u003c/span\u003e. The use of dilated convolutions helps capture detailed information of the maize leaf disease while mitigating the grid effect caused by depthwise separable convolutions, as described in Eq.\u0026nbsp;(\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The notation \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\lfloor\\cdot\\rfloor\$\u003c/span\u003e\u003c/span\u003e in Eq.\u0026nbsp;(\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) denotes rounding down (the floor operation). The output, which eliminates background noise and retains global spatial information, is denoted by Output. We keep the kernel size \u003cem\u003ek\u003c/em\u003e no larger than 23, which allows the model to effectively capture both global and local features. When the kernel size exceeds 23, as demonstrated by Lau, K. W. et al. (2024), it leads to high computational complexity and increased memory usage. 
In Eq.\u0026nbsp;(\u003cspan refid=\"Equ3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), the global output feature map undergoes average pooling and activation to compute attention weights for the leaf region. These weights are then applied to the input feature map via a Hadamard product, resulting in the global attention feature map. In Eq.\u0026nbsp;(\u003cspan refid=\"Equ4\" class=\"InternalRef\"\u003e4\u003c/span\u003e), the input feature map is passed through a depthwise separable convolution \u003cem\u003eW\u003c/em\u003e with a \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:3\\times\\:3\$\u003c/span\u003e\u003c/span\u003e kernel, followed by an activation function. This generates attention weights for the diseased regions, which are then combined with the global output feature map using a Hadamard product, resulting in the local attention feature map. Finally, Eq.\u0026nbsp;(\u003cspan refid=\"Equ5\" class=\"InternalRef\"\u003e5\u003c/span\u003e) combines these features through addition, producing the final attention feature map. This map eliminates background noise and retains both high-frequency edge features of the leaf and high-frequency features of the disease lesions. This encoding structure effectively extracts global features of the leaf and lesions across spatial, position, and channel dimensions, enhancing the model\u0026rsquo;s ability to represent global features while reducing computational overhead. 
The LK-COAT encoder we designed consists of three LK-COA modules, with convolution kernels of sizes \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:k=7,11,23\$\u003c/span\u003e\u003c/span\u003e and dilation rates of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:d=1,2,3\$\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eDecoder\u003c/h2\u003e \u003cp\u003eAs previously described, we have constructed a network based on an encoder-decoder architecture. After obtaining the features \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\left\\{{F}_{i}\\right\\}}_{i=1}^{3}\$\u003c/span\u003e\u003c/span\u003e from the encoder, we deploy three CSDecoder blocks to progressively integrate high-level semantic features with low-level spatial details, as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. For the \u003cem\u003ei-th\u003c/em\u003e decoder block, the input consists of the encoder features \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{i}\$\u003c/span\u003e\u003c/span\u003e at the same level, along with the decoder features \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{i+1}^{D}\$\u003c/span\u003e\u003c/span\u003e from the previous block. 
The decoding process can be defined as follows:\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$${F}_{i}^{D}\\:=\\:{f}_{{D}_{i}}^{{F}_{i+1}}\\left({f}^{AM}\\left({F}_{i},{F}_{i+1}^{D}\\right)\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$${F}^{D}\\:=\\:\\text{Mlp}\\left(\\text{cat}\\left(\\sum_{i=1}^{3}{up}_{f}\\left({F}_{i}^{D}\\right)\\right)\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ8\" name=\"EquationSource\"\u003e\n$${F}_{cls}\\:=\\:up\\left({f}_{seg}\\left(\\text{Cat}\\left({F}^{S}+{F}_{0}\\right)\\right)\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e8\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eEquation (\u003cspan refid=\"Equ6\" class=\"InternalRef\"\u003e6\u003c/span\u003e) represents the features from the \u003cem\u003ei\u003c/em\u003e-th decoder, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{f}^{AM}\$\u003c/span\u003e\u003c/span\u003e denotes the attention module \u003cem\u003eAM\u003c/em\u003e. Subsequently, the feature maps undergo the operation described in Eq.\u0026nbsp;(\u003cspan refid=\"Equ7\" class=\"InternalRef\"\u003e7\u003c/span\u003e), where upsampling is performed using bilinear interpolation to match the size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}_{0}\$\u003c/span\u003e\u003c/span\u003e. 
The upsampled feature maps are then concatenated, and a nonlinear feedforward network is applied to generate the final output features \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{F}^{D}\$\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003eCross-Scale Attention.\u003c/b\u003e In natural environments, leaves experience varying levels of light exposure, which can lead to shadows that reduce the accuracy of leaf segmentation. Additionally, the similarity in color between lesion edges, leaf color, and parts of the background makes it difficult to accurately extract the true contours of the lesions. Furthermore, in most maize leaf images, the proportion of diseased pixels is relatively small compared to the total image area, which complicates the extraction of small disease features.\u003c/p\u003e \u003cp\u003eDuring the encoding phase, the LK-COAT module provides both global perception and local feature extraction capabilities, but there is a risk of losing edge details in scattered or dense lesion areas. To address this, this section leverages the AM (Attention Mechanism) to optimize the segmentation of leaf and lesion edges, helping to capture more fine-grained lesion details. The AM module is illustrated in Figure (c). Within the AM module, the similarity score matrix is computed using Eq.\u0026nbsp;(\u003cspan refid=\"Equ9\" class=\"InternalRef\"\u003e9\u003c/span\u003e). 
Specifically, given the input tensor \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{X}\\in\\:{\\mathbb{R}}^{\\text{H}\\times\\:\\text{W}\\times\\:\\text{C}}\$\u003c/span\u003e\u003c/span\u003e, a depthwise convolution with a kernel size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:k\\times\\:k\$\u003c/span\u003e\u003c/span\u003e and the Hadamard product are used to compute the output \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Z\$\u003c/span\u003e\u003c/span\u003e, as follows:\u003cdiv id=\"Equ9\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ9\" name=\"EquationSource\"\u003e\n$$\\text{S}\\:=\\:\\text{A}\\:\\odot\\:\\text{V}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e9\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ10\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ10\" name=\"EquationSource\"\u003e\n$$\\text{A}\\:=\\:{\\text{L}}_{1}{\\text{F}}_{\\text{i}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e10\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ11\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ11\" name=\"EquationSource\"\u003e\n$$\\text{V}\\:=\\:{\\text{L}}_{2}{\\text{F}}_{\\text{i}+1}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e11\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ12\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ12\" name=\"EquationSource\"\u003e\n$$\\text{Z}\\:=\\:{\\text{W}}_{3\\times 3}\\left(\\text{S}\\right)\\:+\\:{\\text{F}}_{\\text{i}+1}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e12\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eIn Eqs.\u0026nbsp;(\u003cspan refid=\"Equ10\" class=\"InternalRef\"\u003e10\u003c/span\u003e) and (\u003cspan refid=\"Equ11\" 
class=\"InternalRef\"\u003e11\u003c/span\u003e), \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{L}_{1}\\:\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{L}_{2}\$\u003c/span\u003e\u003c/span\u003e are the weight matrices of two linear layers, corresponding to the depthwise convolution with a kernel size of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:k\\times\\:k\$\u003c/span\u003e\u003c/span\u003e. This operation enables each spatial location \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:(h,w)\$\u003c/span\u003e\u003c/span\u003e to interact with all pixels within a \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{k}\\times\\:\\text{k}\$\u003c/span\u003e\u003c/span\u003e square region centered at \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\:(h,w)\$\u003c/span\u003e\u003c/span\u003e. Inter-channel information exchange is facilitated through the linear layers. The output for each spatial position is the weighted sum of all pixels within the square region. Compared to self-attention, our approach uses convolutions to establish these relationships, which is more memory-efficient, especially for high-resolution images.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eLoss Function\u003c/h2\u003e \u003cp\u003eIn this paper, a combination of Cross-Entropy (CE) and Dice loss is employed. 
The specific formulas are as follows:\u003cdiv id=\"Equ13\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ13\" name=\"EquationSource\"\u003e\n$$\\:{Loss}_{CE}\\:=\\:-\\:\\sum\\:_{\\text{c}=1}^{\\text{C}}{\\text{y}}_{\\text{c}}\\text{log}\\left(\\widehat{{\\text{y}}_{\\text{c}}}\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e13\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ14\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ14\" name=\"EquationSource\"\u003e\n$$\\:{Loss}_{Dice}\\:=\\:1-\\:\\frac{2\\sum\\:_{i}{x}_{i}{y}_{i}}{\\sum\\:_{i}{x}_{i}+\\sum\\:_{i}{y}_{i}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e14\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ15\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ15\" name=\"EquationSource\"\u003e\n$$\\:{Loss}_{total}\\:=\\:0.5\\:*\\:{Loss}_{CE}\\:+\\:{Loss}_{Dice}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e15\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe loss function shown in Eq.\u0026nbsp;(\u003cspan refid=\"Equ15\" class=\"InternalRef\"\u003e15\u003c/span\u003e) combines Cross-Entropy (CE) loss and Dice loss. In the CE loss formula (13), \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{y}_{c}\$\u003c/span\u003e\u003c/span\u003e represents the true label of the sample for class \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:c\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\widehat{y}}_{c}\$\u003c/span\u003e\u003c/span\u003e is the predicted probability for class \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\:c\$\u003c/span\u003e\u003c/span\u003e. 
In the Dice loss formula (14), \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{x}_{i}\$\u003c/span\u003e\u003c/span\u003e is the predicted probability that a given element in the prediction map belongs to a specific foreground class, while \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{y}_{i}\$\u003c/span\u003e\u003c/span\u003e is the true value of that element in the ground truth map. Unlike CE loss, Dice loss is not affected by the size of the foreground. However, CE loss provides important guidance for the model in learning the Dice loss. Therefore, combining both losses for network training is a more effective and rational approach.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eExperiment settings\u003c/h2\u003e \u003cp\u003eThe experiments were conducted using the public MMsegmentation codebase [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] and PyTorch [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. The model was trained on two NVIDIA RTX 4090 GPUs. During training, images were randomly cropped to a size of 512\u0026times;512. The AdamW optimizer [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e] was used with a cosine learning rate decay strategy. The hyperparameters for training were as follows: momentum of 0.9, weight decay of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:1\\times\\:{10}^{-2}\$\u003c/span\u003e\u003c/span\u003e, a batch size of 16, 500 epochs, an initial learning rate of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:1\\times\\:{10}^{-4}\$\u003c/span\u003e\u003c/span\u003e, and a minimum learning rate of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:1\\times\\:{10}^{-7}\$\u003c/span\u003e\u003c/span\u003e. 
To prevent overfitting, the learning rate was reduced by a factor of 0.1 at specified intervals.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation Metrics\u003c/h2\u003e \u003cp\u003eThe quantitative metrics used to evaluate the model's performance include Precision [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e], IoU [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], Dice coefficient [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e], and Recall [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Higher IoU and Dice values typically indicate greater overlap between the predicted and ground truth results, which corresponds to more accurate segmentation.\u003cdiv id=\"Equ16\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ16\" name=\"EquationSource\"\u003e\n$$\\mathbf{Precision}\\:=\\:\\frac{TP}{TP+FP}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e16\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ17\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ17\" name=\"EquationSource\"\u003e\n$$\\mathbf{Recall}\\:=\\:\\frac{TP}{TP+FN}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e17\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ18\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ18\" name=\"EquationSource\"\u003e\n$$\\mathbf{Dice}\\:=\\:\\frac{2TP}{2TP+FP+FN}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e18\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ19\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ19\" 
name=\"EquationSource\"\u003e\n$$\\mathbf{IoU}\\:=\\:\\frac{TP}{TP+FP+FN}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e19\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eHere, TP is the number of positive pixels correctly classified as positive, and TN is the number of negative pixels correctly classified as negative. FP is the number of pixels classified as leaf that actually belong to the background, and FN is the number of pixels classified as background that actually belong to the leaf.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results and Discussion","content":"\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eComparative experiments\u003c/h2\u003e \u003cp\u003eIn this section, the proposed method is compared with several popular deep learning-based semantic segmentation approaches, including CNN-based methods such as U-Net and DeepLab v3+, Transformer-based methods like SegFormer and PVT2, as well as lightweight models such as TopFormer, AFFormer, and SwiftFormer. This comparison is conducted to further validate the feasibility and effectiveness of LKCAFormer.\u003c/p\u003e \u003cp\u003eSpecifically, U-Net features a simple skip connection structure that repeatedly fuses shallow and semantic features. DeepLab v3+ adopts an ASPP (Atrous Spatial Pyramid Pooling) structure, using dilated convolutions to expand the receptive field. SegFormer utilizes a hierarchical Transformer block, while the decoder applies a lightweight MLP (Multi-Layer Perceptron) structure. TopFormer refines features layer by layer, effectively capturing global context while preventing detail loss. AFFormer employs a parallel architecture and uses prototype representations as learnable local descriptors to replace the decoder, preserving rich image semantics in high-resolution features. 
SwiftFormer introduces an efficient additive attention mechanism that learns consistent global context across multiple scales. Each method was trained and tested on three maize leaf disease datasets. The performance of each method was evaluated using seven metrics: Dice, Recall, IoU, Precision, FPS (Frames per Second), total parameters, and FLOPs/G. The results of the segmentation comparisons on the Single-CD\u0026amp;S dataset for three types of maize diseases are recorded in Tables\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, \u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, and \u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eAs shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the proposed method achieves the best segmentation performance on the Gls test set. It outperforms the CNN-based models, U-Net and DeepLab v3+, in terms of segmentation accuracy. Compared to U-Net, the proposed method improves the IoU for background, leaf, and lesion segmentation by 1.14%, 0.6%, and 3.15%, respectively. DeepLab v3+ exhibits lower IoU values than the proposed method, with reductions of 0.92% for leaf segmentation, 2.54% for lesion segmentation, and 1.77% for background segmentation. Relative to the proposed method, SegFormer's IoU is lower by 0.45% for leaf segmentation, 2.83% for lesion segmentation, and 0.96% for background segmentation. Likewise, the proposed method outperforms PVT2, with improvements of 0.58% in leaf segmentation IoU, 2.83% in lesion segmentation IoU, and 0.57% in background segmentation IoU. Additionally, the proposed method consistently outperforms the lightweight models TopFormer and AFFormer. While SwiftFormer exhibits segmentation accuracy similar to or slightly better than the other methods, it still falls short of the proposed method. 
Specifically, the proposed method improves the IoU for leaf segmentation by 0.35%, for lesion segmentation by 0.47%, and for background segmentation by 0.78% compared to SwiftFormer.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eQuantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Gls Test Set of the Single-CD\u0026amp;S Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"13\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c13\" 
colnum=\"13\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethods\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003eGls\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c9\" namest=\"c6\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c13\" namest=\"c10\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c13\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUNet(2015)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e86.85\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e75.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e87.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e86.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e96.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e95.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.79\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeeplabV3+(2018)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e86.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e75.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e86.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e86.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e 
\u003cp\u003e97.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e98.29\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e95.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e96.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.12\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSegformer(2021)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e87.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e75.65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e88.81\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e86.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cb\u003e98.59\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c13\"\u003e \u003cp\u003e98.13\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePVT3(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e88.27\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e76.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e87.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e88.81\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.31\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e96.40\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.22\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTopformer(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e88.20\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e77.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e88.38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e87.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e90.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e95.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e92.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e95.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e94.93\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAFFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e87.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e77.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e88.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e87.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e 
\u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.24\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e98.77\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSwiftFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e88.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e78.01\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e88.19\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e88.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.13\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e96.21\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e98.53\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c9\"\u003e \u003cp\u003e97.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.93\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLKCAFormer(ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e88.86\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e78.48\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e88.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e88.32\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e98.19\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e96.56\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.13\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e98.42\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e 
\u003cp\u003e\u003cb\u003e96.97\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.48\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.26\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAccording to Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, the proposed method achieves the highest Dice and IoU for all three classes on the maize leaf Nls disease test set. Compared to U-Net, the proposed method improves the IoU for leaf and lesion segmentation by 1.08% and 3.51%, respectively, while achieving nearly the same IoU for background segmentation, with an increase of 0.03%. DeepLab v3+ shows lower segmentation accuracy than the proposed method for all categories: its IoU is 1.24% lower for background segmentation, 1.74% lower for leaf segmentation, and 2.12% lower for lesion segmentation. Additionally, the proposed method outperforms SegFormer by 0.72% in lesion segmentation IoU. PVT2 is also weaker than the proposed method, with IoU values for background, leaf, and lesion segmentation that are 0.21%, 1.01%, and 1.08% lower, respectively. SwiftFormer exhibits better segmentation performance than lightweight methods like TopFormer and AFFormer, but the proposed method still outperforms SwiftFormer. Specifically, relative to SwiftFormer, the proposed method improves the IoU for background segmentation by 0.56%, for leaf segmentation by 0.62%, and for lesion segmentation by 0.95%. 
In summary, while SwiftFormer shows higher segmentation accuracy than other methods, it does not outperform the method presented in this study.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eQuantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Nls Test Set of the Single-CD\u0026amp;S Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"13\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c13\" colnum=\"13\"\u003e\u003c/div\u003e \u003cthead\u003e 
\u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethods\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003eNls\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c9\" namest=\"c6\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c13\" namest=\"c10\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c13\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUNet(2015)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e83.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e72.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e82.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e81.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e98.67\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e97.78\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.43\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.49\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeeplabV3+(2018)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e74.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e86.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e85.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.43\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e98.39\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.12\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSegformer(2021)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e75.49\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e86.59\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e85.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e96.11\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.51\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e96.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePVT3(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e75.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e85.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e85.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.30\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.37\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.95\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.02\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTopformer(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e83.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e72.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e85.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e85.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.38\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.77\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAFFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e80.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e70.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e79.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e81.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e91.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e95.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cb\u003e98.84\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.93\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSwiftFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e85.92\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e75.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e84.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e86.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.93\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.04\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLKCAFormer(ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e86.20\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e76.21\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e87.67\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e86.99\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e98.49\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e96.29\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.33\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e98.72\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c11\"\u003e \u003cp\u003e\u003cb\u003e97.81\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e98.16\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e presents the experimental results on the maize leaf Nlb disease test set. The proposed method again shows the best overall segmentation performance. Notably, TopFormer performs poorly in leaf and lesion segmentation, with IoU values 4.63% and 7.56% lower than those of the proposed method. AFFormer slightly outperforms the proposed method in background segmentation IoU, but its IoU for leaf and lesion segmentation is 1.23% and 0.54% lower, respectively. The CNN-based U-Net and DeepLab v3+ perform comparably to the methods mentioned above, but both perform worse than the proposed method. The Transformer-based SegFormer comes close to the proposed method in lesion segmentation, with an IoU difference of only 0.13%, but its IoU for leaf and background segmentation is 1.88% and 1.11% lower, respectively. PVT2 shows lower IoU values than the proposed method in background, leaf, and lesion segmentation, by 0.61%, 1.61%, and 0.78%, respectively. SwiftFormer, with segmentation accuracy comparable to U-Net and DeepLab v3+, still falls short of the proposed method. 
In summary, the proposed method achieves higher overall segmentation accuracy than the other methods across all tested datasets.\u003c/p\u003e \u003cp\u003eTo better validate the performance of the proposed method in real-world scenarios, every method was also trained and tested on the CD\u0026amp;S dataset, which contains complex backgrounds and multiple leaf diseases. Tables\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, \u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, and \u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e compare the segmentation performance of the proposed method with that of the other methods on the three disease test sets. A comparison of Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e with Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows that all methods perform worse when segmenting multiple leaves and disease areas than when segmenting single leaves and disease regions. Even so, the proposed method still performs best, achieving IoUs of 99.02%, 97.39%, and 70.52% for background, leaf, and lesion segmentation, respectively. 
These values exceed those of PVT2, the weakest method in lesion segmentation, by 2.55, 0.65, and 6.83 percentage points, respectively.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eQuantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Nlb Test Set of the Single-CD\u0026amp;S Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"13\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c13\" colnum=\"13\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth 
align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethods\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003eNlb\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c9\" namest=\"c6\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c13\" namest=\"c10\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c13\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUNet(2015)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e83.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e72.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e83.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e84.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e96.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.50\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cb\u003e98.43\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e96.79\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeeplabV3+(2018)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e82.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e72.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e83.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e82.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.35\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e97.99\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.20\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.32\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSegformer(2021)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e84.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e73.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e84.17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e85.81\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e96.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e96.73\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.30\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePVT2(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e73.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e86.26\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e84.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e98.37\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e98.62\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTopformer(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e78.45\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e66.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e78.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e79.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.17\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e92.56\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e96.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e96.17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e98.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.93\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAFFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e84.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e73.45\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e84.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e85.30\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e95.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e98.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.37\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSwiftFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e72.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e85.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e84.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c9\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e97.99\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e98.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.04\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLKCAFormer(ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e85.73\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e73.99\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e85.91\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e85.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e98.49\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e97.19\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e98.13\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e98.76\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e 
\u003cp\u003e\u003cb\u003e98.21\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.38\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.66\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eQuantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Gls Test Set of the CD\u0026amp;S Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"13\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv 
align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c13\" colnum=\"13\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethods\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003eGls\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c9\" namest=\"c6\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c13\" namest=\"c10\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c13\"\u003e \u003cp\u003ePrecision\u003c/p\u003e 
\u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUNet(2015)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e75.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e64.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e77.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e75.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.34\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e97.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e98.84\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.19\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeeplabV3+(2018)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e76.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e65.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e76.94\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e75.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e97.35\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.92\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSegformer(2021)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e77.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e66.65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e78.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e76.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e 
\u003cp\u003e96.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e99.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e98.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e99.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.82\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePVT2(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e74.27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e63.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e73.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e73.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.24\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTopformer(2022)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e75.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e65.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e75.38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e75.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.84\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAFFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e79.23\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e68.48\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e80.32\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e79.76\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e97.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.77\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e99.18\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e98.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cb\u003e99.35\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e99.37\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSwiftFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e78.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e68.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e78.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e78.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e97.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c8\"\u003e \u003cp\u003e98.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e97.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e99.21\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.97\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e99.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.89\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLKCAFormer(ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e80.86\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e70.52\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e81.01\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e79.32\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e98.49\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e97.39\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.62\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e99.13\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e99.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cb\u003e99.02\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e99.34\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e99.66\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eQuantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Nls Test Set of the CD\u0026amp;S Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"13\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv 
align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c13\" colnum=\"13\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethods\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003eNls\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c9\" namest=\"c6\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c13\" namest=\"c10\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e 
\u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c13\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUNet(2015)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e73.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e62.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e73.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e73.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e95.64\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e93.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e96.84\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e95.73\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e96.50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e94.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e96.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e 
\u003cp\u003e96.19\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeeplabV3+(2018)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e72.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e61.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e72.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e72.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e96.43\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e94.35\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e97.08\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e96.25\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.75\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e97.18\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cb\u003e98.84\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.27\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSegformer(2021)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c2\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e76.31\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e65.63\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e76.29\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e75.84\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e94.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e92.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e94.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e94.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.36\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePVT3(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e74.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e64.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e74.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e75.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e93.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e91.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e94.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e93.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTopformer(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e71.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e61.08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e71.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e70.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e90.17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e83.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e91.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e91.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c10\"\u003e \u003cp\u003e94.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e89.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e93.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e93.97\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAFFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e70.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e59.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e69.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e70.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e91.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e87.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e92.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e91.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e93.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e91.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e93.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e93.32\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSwiftFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e72.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e62.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e72.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e72.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e94.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e93.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e93.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e93.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e98.67\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLKCAFormer(ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e76.63\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e67.28\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e77.49\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e76.99\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e95.49\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e95.19\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e95.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e95.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e99.06\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cb\u003e98.02\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.34\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.66\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAmong the lightweight methods, TopFormer shows lower IoUs than the proposed method for background, leaf, and lesion segmentation, with reductions of 2.15%, 0.83%, and 4.60%, respectively. AFFormer and SwiftFormer perform similarly to each other, but both fall short of the proposed method, with average reductions of 0.18%, 0.46%, and 1.82%, respectively. Compared to DeepLab v3+, the proposed method improves the IoU for background and lesion segmentation by 1.91% and 4.54%, respectively, while leaf segmentation is nearly unchanged, with a 0.04% increase. U-Net, on the other hand, shows lower segmentation accuracy than the proposed method across all categories. 
Its IoU is 1.24% lower for background segmentation, 0.21% lower for leaf segmentation, and 6.19% lower for lesion segmentation.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e compares the segmentation performance of the proposed method with that of the other methods on the Nls test set. Overall, the segmentation results are relatively poor, but the proposed method still achieves the highest IoU in every category, with 98.02%, 95.19%, and 67.28% for background, leaf, and lesion segmentation, respectively. The lowest lesion IoU is observed with the lightweight AFFormer, which achieves IoUs of 91.37%, 87.96%, and 59.11% for background, leaf, and lesion segmentation, respectively. SegFormer performs worse than the proposed method across all categories. Specifically, its IoU is 1.01% lower for background segmentation, 2.88% lower for leaf segmentation, and 1.58% lower for lesion segmentation. Furthermore, the proposed method outperforms U-Net by 4.78% in lesion segmentation IoU, 2.01% in leaf segmentation IoU, and 3.24% in background segmentation IoU. 
DeepLab v3+ trails the proposed method by 5.3% in lesion segmentation IoU, 0.84% in leaf segmentation IoU, and 0.82% in background segmentation IoU.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eQuantitative Comparison of CNN-based and Transformer-based SOTA Methods on the Nlb Test Set of the CD\u0026amp;S Dataset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"13\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c13\" 
colnum=\"13\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eMethods\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c5\" namest=\"c2\"\u003e \u003cp\u003eNlb\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c9\" namest=\"c6\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"4\" nameend=\"c13\" namest=\"c10\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eDice\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c13\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUNet(2015)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e74.24\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e64.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e74.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e74.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e97.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e98.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.53\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.99\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeeplabV3+(2018)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e74.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e65.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e74.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e74.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.43\u003c/span\u003e\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e97.65\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.57\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.19\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.51\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e97.20\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.03\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSegformer(2021)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e73.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e62.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e74.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e74.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c8\"\u003e \u003cp\u003e96.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e96.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.99\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePVT3(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e70.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e60.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e71.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e70.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e96.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.73\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTopformer(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e78.45\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e66.86\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e78.66\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e79.09\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e97.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e96.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e96.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e97.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAFFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e70.79\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e59.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e69.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e70.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96.38\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e96.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e96.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e97.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e95.68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e97.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e98.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSwiftFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e74.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e63.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e74.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e73.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e97.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e 
\u003cp\u003e97.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e98.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e98.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e98.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e99.04\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLKCAFormer(ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e79.11\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e69.48\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e79.31\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e80.01\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e99.22\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e98.19\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e99.00\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e99.10\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e99.31\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cb\u003e99.01\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e\u003cb\u003e99.11\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c13\"\u003e \u003cp\u003e\u003cb\u003e99.24\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e presents the experimental results on the maize leaf Nlb disease test set, where the proposed method demonstrates the best segmentation performance. Specifically, the Transformer-based SegFormer method achieves 6.62% lower IoU for lesion segmentation compared to the proposed method, and its IoU for leaf and background segmentation is 2.33% and 2.18% lower, respectively. The proposed method outperforms PVT2 by 2.74%, 1.44%, and 9.27% in IoU for background, leaf, and lesion segmentation, respectively. U-Net and DeepLab v3+ exhibit similar segmentation performance, with DeepLab v3+ performing slightly better overall. However, DeepLab v3+ still lags behind the proposed method by 2.01%, 0.54%, and 4.2% in IoU for background, leaf, and lesion segmentation, respectively. The proposed method shows better segmentation accuracy than the lightweight TopFormer, with improvements of 2.85%, 1.81%, and 2.62% in background, leaf, and lesion segmentation IoU, respectively. AFFormer, on the other hand, shows relatively poor performance, with IoUs of only 95.68%, 95.06%, and 59.55% for background, leaf, and lesion segmentation, respectively. 
In summary, the proposed method demonstrates a significant improvement in segmentation accuracy compared to other methods.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e presents a comparison of the methods based on the remaining evaluation metrics. As shown in Table\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e, the proposed method is 7.36 ms faster than U-Net on the speed metric (the FPS column, reported in ms, so lower is better). Additionally, the total number of parameters and FLOPs of the proposed method are only 12.7% and 1.47% of those of U-Net, respectively. PVT2 is the fastest of all methods, but its parameter count is more than twice that of the proposed method. AFFormer has the fewest parameters and FLOPs, but it is 0.94 ms slower than the proposed method. Compared to TopFormer, the proposed method is 6.4 ms faster. Furthermore, in terms of total parameters and FLOPs, the proposed method is more efficient, requiring 1.46M fewer parameters and 1.05G fewer FLOPs than TopFormer. 
In summary, based on a comprehensive comparison of all parameters, the proposed method strikes the best balance between segmentation performance and computational efficiency, offering superior segmentation accuracy with lower computational overhead.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab8\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 8\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe results of different methods on the remaining evaluation indicators.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFPS\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTotal parameters/M\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFLOPs/G\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUNet(2015)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e22.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e29.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e76.78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeeplabV3+(2018)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e125.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18.49\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSegFormer(2021)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e94.80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6.77\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePVT2(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e13.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e7.53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.53\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTopformer(2022)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e21.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.18\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAFFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e15.91\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSwiftFormer(2023)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e18.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3.29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e17.47\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLKCAFormer(ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e14.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3.68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.13\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, and \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e display the segmentation results for each method on the single-leaf test sets of Gls, Nls, and Nlb diseases in the Single-CD\u0026amp;S dataset. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, the white dashed boxes highlight specific disease areas where lesion colors are similar to the leaf color due to lighting conditions. These regions are crucial for evaluating the accuracy of lesion segmentation. 
Comparing Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e(a) and 3(c), U-Net correctly segments most of the lesions, but segmentation is poor in certain areas, with significant loss of detail. However, comparing Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e(c) and 3(d), it is clear that DeepLab v3\u0026thinsp;+\u0026thinsp;performs worse than U-Net. In Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e(e), SegFormer shows better segmentation performance than the previous two methods, but its segmentation of some edge regions is poor. While it has strong global modeling capabilities, some local details are lost. Comparing Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e(f), 3(h), and 3(i), AFFormer effectively reduces noise from lighting and other factors, focusing on lesion segmentation, but it performs poorly in segmenting lesion edges. In Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e(j), the proposed method, LKCAFormer, compensates effectively for the loss of fine-grained details caused by aggregating different resolutions, providing accurate segmentation in key regions. It also performs well in segmenting leaf-edge lesions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the Nls single-leaf disease segmentation results. Since Nls lesions are relatively concentrated and the diseased areas are scattered, the segmentation differences between methods are smaller. In Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e(h), the AFFormer method performs poorly in lesion area segmentation, only capturing a few prominent lesions. 
Figures\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e(c), 4(d), 4(f), and 4(g) show that the corresponding methods can segment dense lesions but struggle with lesion edges, resulting in incorrect segmentation. The method in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e(e) performs poorly in segmenting isolated lesions at the edges. In Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e(i) and 4(j), differences are visible at the leaf\u0026rsquo;s striped areas: SwiftFormer mistakenly classifies the stripe color as lesions, while the proposed method segments these areas more accurately and performs better on leaf-edge lesion segmentation.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e presents the Nlb single-leaf disease segmentation results, with the white dashed boxes indicating the dense lesion areas. Comparing Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(a) and 5(c), U-Net shows poor segmentation performance in specific regions. The methods in Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(d), 5(e), 5(f), and 5(i) perform poorly in lesion segmentation in these regions and mistakenly classify the leaf\u0026rsquo;s main veins, which are similar in color to the lesions, as lesions. 
Comparing Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e(h) and 5(j), the proposed method segments more dense lesions in specific regions and performs better in segmenting the lesion edges around larger regions near the main veins.\u003c/p\u003e \u003cp\u003eThe results from these experiments demonstrate that LKCAFormer not only achieves clearer segmentation of the leaf-edge regions but also segments lesion edges more accurately, providing superior segmentation performance overall.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, \u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e, and \u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e present the segmentation results for each method on the multi-leaf test sets of Gls, Nls, and Nlb diseases from the CD\u0026amp;S dataset. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, Gls lesions are densely distributed, and lighting effects cause parts of the leaves to reflect light, making the color of the reflections similar to that of the lesions, which can lead to missegmentation. Comparing Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(a) and 6(c), the dense lesion areas are segmented fairly well, but some leaf areas are misclassified as lesions. Additionally, shadowed regions show poor segmentation performance. Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(h) provides the best segmentation for the shadowed regions, but a comparison with Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(b) reveals that the annotated lesions were not segmented correctly. 
Figures\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(f) and 6(g) show poor performance in segmenting the edges of dense lesions, leading to incorrect segmentations. In Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(j), the proposed method achieves the best overall segmentation, correctly segmenting most of the annotated lesions from Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e(b) with minimal missegmentation.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e presents the segmentation results for Nls disease, where the background and leaf colors are similar. Due to the presence of grass and maize plants in the background, which share similar colors with the leaves, methods in Figs.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(c), 7(d), 7(g), 7(h), and 7(i) all show varying degrees of incorrect segmentation of the leaves. Other methods, such as those in Figs.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(e) and 7(f), fail to effectively segment the leaf edges where lighting conditions affect them. In contrast, the proposed method in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(j) accurately segments the leaves, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(b). Regarding lesion segmentation, since Nls lesions are small and scattered, all methods perform similarly on larger lesions. However, the proposed method (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(j)) performs better on smaller lesions. 
Other methods, such as those in Figs.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(c) and 7(d), incorrectly segment non-lesion areas as lesions, while those in Figs.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e(g) and 7(i) mistakenly classify the background as lesions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e displays the segmentation results for Nlb disease on multi-leaf images. Lighting effects make some background areas similar in color to the leaves, leading to missegmentation, as shown in Figs.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e(e), 8(f), and 8(h). Comparing Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e(j) with Figs.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e(c) and 8(d), lesion segmentation performance is comparable across these methods, with most annotated lesions being correctly segmented. However, the proposed method in Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e(j) performs better in segmenting both lesion edges and leaf boundaries.\u003c/p\u003e \u003cp\u003eIn summary, the LKCAFormer method demonstrates superior segmentation performance on the multi-leaf CD\u0026amp;S test sets with complex backgrounds. 
It effectively minimizes background noise, focuses on the leaf regions, and accurately segments lesions, whether they are dispersed or dense, showing overall superior segmentation accuracy.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eAblation studies\u003c/h2\u003e \u003cp\u003eIn this section, five ablation experiments are designed to verify the adaptability of the LK-COA module across different model architectures and its effectiveness in optimizing global feature modeling and detail feature extraction. Specifically, in Test 1, LKCAFormer-TR is a model in which the LK-COA modules are removed and replaced with three Transformer blocks. In Test 2, the traditional Transformer blocks are replaced with one LK-COA module. Test 3 uses two LK-COA modules, and Test 4 uses three, which corresponds to the model proposed in this paper. Test 5 uses four LK-COA modules. All ablation experiments in this section are conducted on the three disease datasets from the Single-CD\u0026amp;S dataset. 
The evaluation results of the ablation study are recorded in Table\u0026nbsp;\u003cspan refid=\"Tab9\" class=\"InternalRef\"\u003e9\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab9\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 9\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe results of ablation studies on three maize leaf datasets.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"11\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eTest No.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eModel\u003c/p\u003e 
\u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c5\" namest=\"c3\"\u003e \u003cp\u003eGls\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c8\" namest=\"c6\"\u003e \u003cp\u003eNls\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c11\" namest=\"c9\"\u003e \u003cp\u003eNlb\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c5\" namest=\"c3\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c8\" namest=\"c6\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c11\" namest=\"c9\"\u003e \u003cp\u003eIoU\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDisease\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eDisease\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eDisease\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eBackground\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLKCAFormer-TR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e95.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e75.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e96.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e95.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e74.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e96.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e95.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e72.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e97.74\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLKCAFormer-LC1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e83.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e58.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e90.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e80.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e56.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e84.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e77.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e54.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c11\"\u003e \u003cp\u003e82.06\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLKCAFormer-LC2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e88.53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e70.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e93.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e89.27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e69.95\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e93.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e86.68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e65.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e91.43\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLKCAFormer-LC3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e96.56\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e78.48\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e96.97\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e96.29\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e76.21\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e97.81\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e96.19\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cb\u003e73.99\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e98.21\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLKCAFormer-LC4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e97.11\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e76.15\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e97.86\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e97.91\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" 
name=\"Emphasis\"\u003e75.10\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e98.38\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e97.92\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e73.02\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e\u003cb\u003e98.93\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAs shown in Table\u0026nbsp;\u003cspan refid=\"Tab9\" class=\"InternalRef\"\u003e9\u003c/span\u003e, a comparison between Test 1 and Test 2 clearly indicates a significant decrease in segmentation accuracy when the three traditional Transformer blocks are replaced with a single LK-COA module. However, with three LK-COA modules (Test 4), the IoU for lesion segmentation across the three maize disease test sets improves by 3.11%, 1.26%, and 1.89%, respectively, compared to Test 1. Interestingly, when four LK-COA modules are used in Test 5, the IoU for lesion segmentation decreases, but the accuracy for background and leaf segmentation improves, which can be attributed to the large-kernel convolution's ability to better capture global features. Considering the segmentation results for lesions, leaves, and background together, the encoder with three stacked LK-COA modules provides a more balanced improvement in segmentation accuracy. 
It not only enhances the perception of global features but also improves the model's ability to extract finer details and edge features, thereby effectively boosting segmentation performance for the background, leaves, and lesions. Figure\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e illustrates the segmentation results for leaf and lesion areas in each test. Comparing the two rows in Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e for Test 1 and Test 4, it is evident that replacing the traditional Transformer blocks with the LK-COA module enables the model to capture more fine-grained lesions, resulting in more precise segmentation of leaf and lesion boundaries. From the comparison between the row in Experiment 4 and the other experiments, it is clear that the proposed method not only segments the contours of the leaves and lesions more clearly but also captures more small lesion areas, accurately defining the lesion boundaries and mitigating the issue of small lesions merging in dense lesion regions. The experimental results demonstrate that the proposed method improves segmentation performance for various types of maize leaf lesions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis paper presents a novel lightweight algorithm, LKCAFormer, for precise segmentation of maize disease lesions. The network incorporates the large-kernel convolution cooperative attention (LK-COA) module, which uses large-kernel convolutions to model global features and capture global context. The COA attention mechanism is then applied to extract finer lesion details, focusing on lesion segmentation. This module aggregates and enhances the ability to capture local edge and detail features, improving the extraction of small lesions and alleviating the problem of lesion merging. 
The CSDecoder is designed to fuse shallow features, rich in detail and edge information, with deeper features containing stronger semantic information, allowing for precise recovery and yielding finely segmented results. Experimental results demonstrate that, compared with other segmentation methods, LKCAFormer achieves the best segmentation performance across the three disease datasets tested. These findings indicate that LKCAFormer can provide valuable technical support for pathological image analysis of various maize leaf diseases. Although LKCAFormer offers computational advantages, experiments were conducted on only three maize leaf disease datasets. Future research will focus on further improving the model's accuracy and enhancing its computational performance for deployment on edge computing devices. Additionally, further experiments will be conducted on a broader range of crop disease datasets.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization, J.L.G. and X.H.J.; methodology, J.H.; validation, J.H. and C.J.Z.; writing\u0026mdash;original draft preparation, J.H.; writing\u0026mdash;review and editing, X.F.Y. and J.L.G.; visualization, J.H.; project administration, 
X.H.J.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research was supported by the National Natural Science Foundation of China (62061037, 31960494), the Natural Science Foundation of Inner Mongolia (2023LHMS06017, 2023QN06006, NJZZ21068), and the Science and Technology R\u0026amp;D Program of Inner Mongolia (0200GG0169).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets used during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eInstitutional review board statement\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eInformed consent statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflict of interest.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eRonneberger, O., P. Fischer, and T. Brox, \u003cem\u003eU-net: Convolutional networks for biomedical image segmentation.\u003c/em\u003e 2015: p. 234-241.\u003c/li\u003e\n\u003cli\u003eZhao, H., et al., \u003cem\u003ePyramid scene parsing network.\u003c/em\u003e 2017: p. 
2881-2890.\u003c/li\u003e\n\u003cli\u003eSun, K., et al., \u003cem\u003eHigh-resolution representations for labeling pixels and regions.\u003c/em\u003e arXiv preprint arXiv:1904.04514, 2019.\u003c/li\u003e\n\u003cli\u003eChen, L.-C., \u003cem\u003eSemantic image segmentation with deep convolutional nets and fully connected CRFs.\u003c/em\u003e arXiv preprint arXiv:1412.7062, 2014.\u003c/li\u003e\n\u003cli\u003eChen, L.-C., \u003cem\u003eRethinking atrous convolution for semantic image segmentation.\u003c/em\u003e arXiv preprint arXiv:1706.05587, 2017.\u003c/li\u003e\n\u003cli\u003eChen, L.-C., et al., \u003cem\u003eDeeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs.\u003c/em\u003e IEEE transactions on pattern analysis and machine intelligence, 2017. \u003cstrong\u003e40\u003c/strong\u003e(4): p. 834-848.\u003c/li\u003e\n\u003cli\u003eDivyanth, L.G., A. Ahmad, and D. Saraswat, \u003cem\u003eA two-stage deep-learning based segmentation model for crop disease quantification based on corn field imagery.\u003c/em\u003e Smart Agricultural Technology, 2023. \u003cstrong\u003e3\u003c/strong\u003e: p. 100108.\u003c/li\u003e\n\u003cli\u003eYang, Y., H. Shan, and F. Qu, \u003cem\u003eMaize disease segmentation method based on improved image segmentation network model.\u003c/em\u003e 2023. \u003cstrong\u003e12610\u003c/strong\u003e: p. 481-485.\u003c/li\u003e\n\u003cli\u003eDosovitskiy, A., \u003cem\u003eAn image is worth 16x16 words: Transformers for image recognition at scale.\u003c/em\u003e arXiv preprint arXiv:2010.11929, 2020.\u003c/li\u003e\n\u003cli\u003eXie, E., et al., \u003cem\u003eSegFormer: Simple and efficient design for semantic segmentation with transformers.\u003c/em\u003e Advances in neural information processing systems, 2021. \u003cstrong\u003e34\u003c/strong\u003e: p. 
12077-12090.\u003c/li\u003e\n\u003cli\u003eYu, W., et al., \u003cem\u003eMetaformer is actually what you need for vision.\u003c/em\u003e 2022: p. 10819-10829.\u003c/li\u003e\n\u003cli\u003eAhmad, A., et al., \u003cem\u003eCD\u0026amp;S dataset: Handheld imagery dataset acquired under field conditions for corn disease identification and severity estimation.\u003c/em\u003e arXiv preprint arXiv:2110.12084, 2021.\u003c/li\u003e\n\u003cli\u003eRussell, B.C., et al., \u003cem\u003eLabelMe: a database and web-based tool for image annotation.\u003c/em\u003e International journal of computer vision, 2008. \u003cstrong\u003e77\u003c/strong\u003e: p. 157-173.\u003c/li\u003e\n\u003cli\u003eBloice, M.D., P.M. Roth, and A. Holzinger, \u003cem\u003eBiomedical image augmentation using Augmentor.\u003c/em\u003e Bioinformatics, 2019. \u003cstrong\u003e35\u003c/strong\u003e(21): p. 4522-4524.\u003c/li\u003e\n\u003cli\u003eContributors, M., \u003cem\u003eMMSegmentation: Openmmlab semantic segmentation toolbox and benchmark.\u003c/em\u003e 2020.\u003c/li\u003e\n\u003cli\u003eZhang, S. and C. Zhang, \u003cem\u003eModified U-Net for plant diseased leaf image segmentation.\u003c/em\u003e Computers and Electronics in Agriculture, 2023. \u003cstrong\u003e204\u003c/strong\u003e: p. 107511.\u003c/li\u003e\n\u003cli\u003eZhu, S., et al., \u003cem\u003eA novel approach for apple leaf disease image segmentation in complex scenes based on two-stage DeepLabv3+ with adaptive loss.\u003c/em\u003e Computers and Electronics in Agriculture, 2023. \u003cstrong\u003e204\u003c/strong\u003e: p. 107539.\u003c/li\u003e\n\u003cli\u003eZhang, X., et al., \u003cem\u003eResearch of segmentation recognition of small disease spots on apple leaves based on hybrid loss function and cbam.\u003c/em\u003e Frontiers in Plant Science, 2023. \u003cstrong\u003e14\u003c/strong\u003e: p. 
1175027.\u003c/li\u003e\n\u003cli\u003eChang, B., et al., \u003cem\u003eA general-purpose edge-feature guidance module to enhance vision transformers for plant disease identification.\u003c/em\u003e Expert Systems with Applications, 2024. \u003cstrong\u003e237\u003c/strong\u003e: p. 121638.\u003c/li\u003e\n\u003cli\u003eThai, H.-T., K.-H. Le, and N.L.-T. Nguyen, \u003cem\u003eFormerLeaf: An efficient vision transformer for Cassava Leaf Disease detection.\u003c/em\u003e Computers and Electronics in Agriculture, 2023. \u003cstrong\u003e204\u003c/strong\u003e: p. 107518.\u003c/li\u003e\n\u003cli\u003eLiu, Z., et al., \u003cem\u003eA convnet for the 2020s.\u003c/em\u003e Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022: p. 11976-11986.\u003c/li\u003e\n\u003cli\u003eDing, X., et al., \u003cem\u003eScaling up your kernels to 31x31: Revisiting large kernel design in cnns.\u003c/em\u003e 2022: p. 11963-11975.\u003c/li\u003e\n\u003cli\u003ePeng, C., et al., \u003cem\u003eLarge kernel matters--improve semantic segmentation by global convolutional network.\u003c/em\u003e 2017: p. 4353-4361.\u003c/li\u003e\n\u003cli\u003eLiu, S., et al., \u003cem\u003eMore convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity.\u003c/em\u003e arXiv preprint arXiv:2207.03620, 2022.\u003c/li\u003e\n\u003cli\u003eEl-Assiouti, H.S., et al., \u003cem\u003eLite-SRGAN and Lite-UNet: toward fast and accurate image super-resolution, segmentation, and localization for plant leaf diseases.\u003c/em\u003e IEEE Access, 2023. \u003cstrong\u003e11\u003c/strong\u003e: p. 67498-67517.\u003c/li\u003e\n\u003cli\u003eZhang, X., et al., \u003cem\u003eUPFormer: U-sharped perception lightweight transformer for segmentation of field grape leaf diseases.\u003c/em\u003e Expert Systems with Applications, 2024. \u003cstrong\u003e249\u003c/strong\u003e: p. 123546.\u003c/li\u003e\n\u003cli\u003eZhang, Y. 
and C. Lv, \u003cem\u003eTinySegformer: A lightweight visual segmentation model for real-time agricultural pest detection.\u003c/em\u003e Computers and Electronics in Agriculture, 2024. \u003cstrong\u003e218\u003c/strong\u003e: p. 108740.\u003c/li\u003e\n\u003cli\u003eShi, M., et al., \u003cem\u003eLightweight context-aware network using partial-channel transformation for real-time semantic segmentation.\u003c/em\u003e IEEE Transactions on Intelligent Transportation Systems, 2024.\u003c/li\u003e\n\u003cli\u003eSheng, X., et al., \u003cem\u003eAn edge-guided method to fruit segmentation in complex environments.\u003c/em\u003e Computers and Electronics in Agriculture, 2023. \u003cstrong\u003e208\u003c/strong\u003e: p. 107788.\u003c/li\u003e\n\u003cli\u003eLu, J., et al., \u003cem\u003eEAIS-Former: An efficient and accurate image segmentation method for fruit leaf diseases.\u003c/em\u003e Computers and Electronics in Agriculture, 2024. \u003cstrong\u003e218\u003c/strong\u003e: p. 108739.\u003c/li\u003e\n\u003cli\u003eChen, G., et al., \u003cem\u003eESKNet: An enhanced adaptive selection kernel convolution for ultrasound breast tumors segmentation.\u003c/em\u003e Expert Systems with Applications, 2024. \u003cstrong\u003e246\u003c/strong\u003e: p. 123265.\u003c/li\u003e\n\u003cli\u003eYan, S., et al., \u003cem\u003eLiConvFormer: A lightweight fault diagnosis framework using separable multiscale convolution and broadcast self-attention.\u003c/em\u003e Expert Systems with Applications, 2024. \u003cstrong\u003e237\u003c/strong\u003e: p. 121338.\u003c/li\u003e\n\u003cli\u003ePaszke, A., et al., \u003cem\u003ePytorch: An imperative style, high-performance deep learning library.\u003c/em\u003e Advances in neural information processing systems, 2019. \u003cstrong\u003e32\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eLoshchilov, I. and F. 
Hutter, \u003cem\u003eFixing weight decay regularization in adam.\u003c/em\u003e arXiv preprint arXiv:1711.05101, 2017. \u003cstrong\u003e5\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eGarcia-Garcia, A., et al., \u003cem\u003eA review on deep learning techniques applied to semantic segmentation.\u003c/em\u003e arXiv preprint arXiv:1704.06857, 2017.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-plant-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pbio","sideBox":"Learn more about [BMC Plant Biology](http://bmcplantbiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pbio/default.aspx","title":"BMC Plant Biology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"maize leaf disease, Lightweight, Large-kernel, cooperative attention, Semantic Segmentation","lastPublishedDoi":"10.21203/rs.3.rs-6543171/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6543171/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIn smart agriculture, segmentation models are essential for the early and accurate detection of diseases. However, the complex backgrounds and diverse diseases on maize leaves present significant challenges. 
Although current models have improved, these advancements often lead to larger model sizes and higher computational demands, making them difficult to deploy on hardware with limited resources. To overcome these issues, we propose a new lightweight segmentation network called LKCAFormer. This network is specifically designed for accurate maize leaf disease segmentation and is built upon a coordinated attention mechanism and cross-scale large-kernel convolutions. Our approach introduces the Large-Kernel Convolution Cooperative Attention (LK-COA) module, which uses large-kernel convolutions to extract global features and a cooperative attention mechanism to capture fine details of small spots. This combination enhances the segmentation of small spots and reduces errors caused by spot adhesion. Additionally, the CSDecoder effectively fuses shallow features, rich in edge and detail information, with deeper semantic features to produce precise segmentation results. Experimental results on three maize leaf disease datasets demonstrate that our method outperforms existing segmentation techniques, confirming its effectiveness in the pathological analysis of maize leaf diseases.\u003c/p\u003e","manuscriptTitle":"LKCAFormer: A Lightweight Transformer with Large-Kernel Cooperative Attention for the Segmentation of Field Maize Leaf Diseases","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-15 10:58:18","doi":"10.21203/rs.3.rs-6543171/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-08-12T12:58:09+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-08-06T19:05:49+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-08-05T13:45:49+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-08-03T15:20:04+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"174228501540563203366399365165316344462","date":"2025-07-25T12:18:20+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"255263264052299705212532878155628390542","date":"2025-07-25T10:42:41+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"146515033573357788683085517117119580124","date":"2025-07-24T15:12:23+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"59213795091472590715452272706462641909","date":"2025-07-23T13:34:58+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-05-30T06:17:39+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"258138946411621581635991294078895702986","date":"2025-05-18T08:46:22+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"71986297693083650087665429500789389005","date":"2025-05-15T07:09:58+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-05-13T07:58:46+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-05-08T14:56:12+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-05-08T04:28:09+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-05-08T04:23:48+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Plant Biology","date":"2025-04-28T02:36:03+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email 
protected]","identity":"bmc-plant-biology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pbio","sideBox":"Learn more about [BMC Plant Biology](http://bmcplantbiol.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pbio/default.aspx","title":"BMC Plant Biology","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"73d29c08-8a0b-4720-b00e-ab1c9b90f2a5","owner":[],"postedDate":"May 15th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-03-02T16:06:40+00:00","versionOfRecord":{"articleIdentity":"rs-6543171","link":"https://doi.org/10.1186/s12870-026-08409-w","journal":{"identity":"bmc-plant-biology","isVorOnly":false,"title":"BMC Plant Biology"},"publishedOn":"2026-02-28 15:57:52","publishedOnDateReadable":"February 28th, 2026"},"versionCreatedAt":"2025-05-15 10:58:18","video":"","vorDoi":"10.1186/s12870-026-08409-w","vorDoiUrl":"https://doi.org/10.1186/s12870-026-08409-w","workflowStages":[]},"version":"v1","identity":"rs-6543171","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6543171","identity":"rs-6543171","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
