EfficientResNetFusion: Hybrid Deep Learning Architecture with Multi-Method Explainability for Guava Fruit Disease Classification

preprint OA: closed CC-BY-4.0
Research Square · Research Article

Mian Muhammad Hamza Shahbaz

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9468891/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Background
Guava (Psidium guajava) is one of the most economically and nutritionally significant tropical fruit crops, yet it remains highly vulnerable to fungal and pest-borne diseases that severely diminish yield and commercial quality. Automated and accurate disease detection is a prerequisite for sustainable precision agriculture at scale.
Method
This paper proposes EfficientResNetFusion, a novel dual-backbone hybrid convolutional neural network that simultaneously leverages the complementary representational strengths of EfficientNet-B0 and ResNet18 through feature-level concatenation followed by a deep fusion classification head. The model was trained and evaluated on the publicly available Kaggle Guava Disease Dataset comprising 2,647 images distributed across three classes: Anthracnose, Fruit Fly damage, and Healthy Guava.

Results
The proposed EfficientResNetFusion model (EfficientNet-B0 + ResNet-18 dual-backbone hybrid) achieved a test accuracy of 99.50%, with a Macro F1-score of 0.9942, Matthews Correlation Coefficient (MCC) of 0.9924, Cohen's Kappa of 0.9923, and Macro AUC of 0.9999. These results surpass all evaluated baseline architectures: GuavaDenseNet (DenseNet-121) achieved a best validation accuracy of 99.24%, EfficientViTFusion (EfficientNet-B0 + ViT-B/16) reached 99.24%, and SimpleViT (ViT-B/16) attained 98.99%. This demonstrates that the proposed dual-backbone fusion architecture outperforms prior single-architecture transfer learning and traditional machine learning baselines on the same guava disease classification task.

Conclusion
To promote clinical and agricultural transparency, five complementary Explainable AI (XAI) techniques were applied: Gradient-weighted Class Activation Mapping (Grad-CAM), SHAP violin analysis, Saliency Maps, Integrated Gradients, and LIME super-pixel analysis. Ablation experiments confirm that SMOTE improved balanced accuracy from 82% to 94% prior to model enhancement.

Keywords: guava disease detection, hybrid deep learning, EfficientNet-B0, ResNet18, SMOTE, CLAHE, explainable artificial intelligence, Grad-CAM, transfer learning, precision agriculture

1. Introduction

Guava (Psidium guajava L.), a member of the Myrtaceae family, is one of the most widely cultivated tropical and subtropical fruits globally. Rich in vitamin C, dietary fiber, antioxidants, and essential minerals, guava holds immense nutritional and economic value, particularly in South and Southeast Asian and Latin American countries such as India, Pakistan, Bangladesh, Indonesia, Brazil, and Mexico [12]. Global guava production exceeds several million metric tons annually, with India ranking among the largest producers. Smallholder farmers in these regions depend heavily on guava cultivation as a primary source of livelihood; thus, crop health directly determines food security and household income.

Despite guava's agronomic resilience, the crop remains acutely susceptible to a broad spectrum of diseases. Anthracnose, caused by the fungal pathogen Colletotrichum gloeosporioides, produces characteristic dark, sunken lesions on fruits and leaves, leading to significant pre- and post-harvest losses. Fruit fly infestation (Bactrocera spp.) causes internal fruit decay and surface scarring, rendering produce unmarketable. Healthy guava plants, when not properly managed, can rapidly transition to diseased states under favorable environmental conditions. Collectively, these diseases can reduce marketable yield by 30–60%, causing substantial economic damage at both farm and regional scales [9].

Traditional disease diagnosis in guava orchards relies predominantly on manual visual inspection by trained agronomists or experienced farmers. This approach is inherently subjective, labor-intensive, and time-consuming, and it does not scale to large plantation areas. Moreover, in rural and resource-limited settings across developing nations, access to plant pathology expertise is severely constrained, resulting in delayed intervention, misdiagnosis, and excessive or inappropriate pesticide application, which further threatens environmental and economic sustainability [10].
The rapid advancement of deep learning and computer vision has catalyzed transformative developments in automated plant disease detection. Convolutional Neural Networks (CNNs), particularly those employing transfer learning from large-scale datasets such as ImageNet, have demonstrated remarkable capacity to extract discriminative visual features from plant images with minimal domain-specific feature engineering [5,7]. Hybrid architectures that fuse the feature representations of multiple pre-trained backbones have emerged as a particularly powerful strategy, enabling models to capture complementary aspects of visual pathology: fine-grained textural details from one network and broader contextual features from another [1].

However, a critical review of the existing literature reveals several persistent gaps. First, most prior works focus exclusively on guava leaf diseases, leaving fruit-specific disease classification, where visual symptoms differ substantially, comparatively underexplored [3,6]. Second, class imbalance is a pervasive challenge in agricultural image datasets that is rarely addressed with principled oversampling strategies; most studies simply apply geometric augmentation without addressing the underlying distributional skew. Third, while explainability has been highlighted as essential for clinical and regulatory adoption of AI systems, few guava disease studies deploy more than one XAI technique, limiting the interpretive depth available to agricultural practitioners. Fourth, comprehensive evaluation metrics beyond accuracy, such as MCC, Cohen's Kappa, and per-class analysis, are infrequently reported, reducing the comparability and reliability of published results.

A. Challenges

Manual inspection of guava orchards is impractical at scale. Trained agronomists are expensive and unavailable in remote areas. Visual symptoms of Anthracnose and Fruit Fly damage can look similar at early stages. This leads to misdiagnosis.
Pesticides are then applied incorrectly. Crop losses increase. Digital image datasets for guava fruit diseases are small and class-imbalanced. The dominant class can outnumber the minority class by two to one. Standard models trained on such data are biased. They miss minority-class diseases. The consequences in a farm setting can be severe.

B. Problem Statement (PS)

There is no end-to-end trainable dual-backbone CNN that simultaneously handles guava fruit disease classification, class imbalance, and multi-method explainability. Single-backbone models capture only one type of feature representation. They ignore complementary visual cues. Without proper balancing, models fail on minority classes. Without XAI, farmers cannot trust or act on predictions.

C. Research Questions (RQs)

• Can a dual-backbone hybrid CNN outperform single-backbone models on guava fruit disease classification?
• Does SMOTE-based pixel-space oversampling improve classification on imbalanced guava datasets?
• Can CLAHE preprocessing in LAB color space improve disease region discriminability?
• Do five complementary XAI methods consistently confirm that the model attends to agronomically relevant regions?

D. Research Objectives (ROs)

• To determine whether a dual-backbone hybrid CNN outperforms single-backbone models on guava fruit disease classification.
• To assess whether SMOTE-based pixel-space oversampling improves classification on imbalanced guava datasets.
• To assess whether CLAHE preprocessing in LAB color space improves disease region discriminability.
• To verify, using five complementary XAI methods, whether the model consistently attends to agronomically relevant regions.

E. Research Contributions (CBs)

• A principled preprocessing pipeline combining CLAHE in LAB space with pixel-space SMOTE, improving accuracy from 82% to 99.50%.
• State-of-the-art results: 99.50% test accuracy, F1 of 0.9942, MCC of 0.9924, and Macro AUC of 0.9999.
• A five-method XAI analysis using Grad-CAM, SHAP, Saliency Maps, Integrated Gradients, and LIME that confirms agronomic validity.

F. Motivation and Scope

Smallholder guava farmers in South Asia and Latin America lack access to fast, reliable disease diagnosis tools. Misdiagnosis leads to crop loss and environmental harm from excessive pesticide use. An accurate AI tool deployed on a mobile device could transform farm-level decision making. This study is scoped to the three-class guava fruit disease classification task using the Kaggle Guava Disease Dataset. It targets end-to-end trainable deep learning models with explainability. It does not address leaf disease, real-time video classification, or edge device deployment, though these are identified as important future directions.

2. Review of Literature

2.1. Traditional Machine Learning Approaches

Early computational approaches to guava disease classification relied on handcrafted feature extraction. Almutiry et al. (2021) [9] extracted Local Binary Pattern (LBP) texture features followed by PCA dimensionality reduction; a Cubic SVM achieved the best overall performance, while Bagged Tree reached 100% accuracy for the fruit fly class alone. Ray et al. (2025) [3] employed SVM models with LBP and GLCM features, demonstrating that the RBF kernel (91.67%) substantially outperformed a linear kernel (77.08%), which highlights the non-linearity of disease feature distributions. While effective in controlled conditions, handcrafted pipelines generalize poorly across sensor types and environmental conditions, motivating the shift to end-to-end learning.

2.2. CNN-Based and Transfer Learning Approaches

Deep CNNs marked a paradigm shift in plant disease detection accuracy. Mostafa et al.
(2022) [5] applied five CNN architectures (AlexNet, SqueezeNet, GoogLeNet, ResNet-50, and ResNet-101) to a locally collected Pakistani guava dataset using color histogram equalization and nine-angle rotation augmentation, with ResNet-101 achieving 97.74% accuracy. Hashan et al. (2024) [8] proposed an improved AlexNet-inspired CNN for guava fruit disease, achieving 98% training accuracy but only 93% test accuracy, revealing the overfitting common in small-scale datasets. Nobi et al. (2023) [7] developed GLD-Det, a modified MobileNet with additional pooling and normalization layers, achieving 98% accuracy with Grad-CAM visual explanations. Saepulrohman et al. (2026) [11] employed the Xception architecture for 2-class crystal guava leaf identification, achieving 94% accuracy and outperforming VGG16 and InceptionV3 baselines.

Transfer learning with ResNet-101 was applied by Ahmed et al. (2025) [2] directly to raw guava fruit images (Anthracnose, Fruit Fly, Healthy), achieving 98.48% accuracy with Grad-CAM explainability; this is the closest prior single-backbone benchmark to our work. Kilci and Koklu (2025) [4] combined InceptionV3 feature extraction with a separately trained SVM classifier, achieving 99.74% on the same three-class fruit task; however, this two-stage non-end-to-end pipeline prevents joint representation-classification optimization. Rashid et al. (2023) [6] addressed multi-label detection using GIP-MU-NET (MobileNetV2 encoder + U-Net decoder) and YOLOv5, achieving 92.41% segmentation accuracy. Shrivastava et al. (2026) [10] demonstrated accessible AI using Google's Teachable Machine (a no-code platform) at 96.2% accuracy for 5-class guava leaf disease.

2.3. Hybrid and Ensemble Architectures

Ensemble methods consistently outperform single-backbone models by capturing complementary feature hierarchies. Güler et al.
(2025) [1], the most architecturally related prior work, fused InceptionV3 and ResNet50 via a multi-channel strategy on 2,063 guava leaf images augmented with GAN-generated samples, achieving 97.50% accuracy and 0.9934 AUC across five classes. However, this approach targets leaf (not fruit) diseases, uses GAN rather than principled SMOTE balancing, and does not report MCC or Kappa. Farooqui and Khan (2025) [14] combined EfficientNetV2 and Vision Transformers (ViT) in a hybrid for 5-class guava leaf disease, achieving 95% accuracy with Grad-CAM visualization.

2.4. Explainable AI in Agricultural Disease Detection

Model interpretability has become critical for agricultural AI adoption. Grad-CAM (Selvaraju et al., 2017) generates class-discriminative spatial heatmaps from convolutional activations, and was deployed by Ahmed et al. [2], Nobi et al. [7], and Farooqui and Khan [14]. SHAP (Lundberg & Lee, 2017) provides game-theory-based feature attribution quantifying each feature's marginal contribution. LIME (Ribeiro et al., 2016) approximates local decision boundaries using perturbed superpixel representations. Integrated Gradients (Sundararajan et al., 2017), available in the Captum library, provides axiomatically grounded pixel-level attribution satisfying sensitivity and implementation invariance. Despite the availability of these methods, no prior guava disease study has deployed all five simultaneously.

2.5. Research Gap and Novelty

No prior study has simultaneously: (i) proposed an end-to-end EfficientNet-B0 + ResNet18 dual-backbone hybrid for 3-class guava fruit disease; (ii) applied pixel-space SMOTE with quantified ablation; (iii) incorporated LAB-space CLAHE preprocessing; and (iv) conducted a 5-method XAI analysis including Grad-CAM, SHAP, Saliency Maps, Integrated Gradients, and LIME. Table 1 summarizes the positioning of prior work.

Table 1 Comparative positioning of proposed work against prior literature.

| Study | Year | Architecture | Target | Classes | Acc. (%) |
|---|---|---|---|---|---|
| Almutiry et al. [9] | 2021 | LBP + PCA + C-SVM | Fruit/Leaf | 5 | ~91 |
| Mostafa et al. [5] | 2022 | AlexNet/ResNet-101 (DCNN) | Fruit | 5 | 97.74 |
| Rashid et al. [6] | 2023 | GIP-MU-NET + YOLOv5 | Leaf (multi) | 5 | 92.41 |
| Nobi et al. [7] | 2023 | GLD-Det (Mod. MobileNet) | Leaf | 2 | 98.00 |
| Hashan et al. [8] | 2024 | Improved AlexNet CNN | Fruit | 3 | 93.00 |
| Güler et al. [1] | 2025 | InceptionV3 + ResNet50 + GAN | Leaf | 5 | 97.50 |
| Ahmed et al. [2] | 2025 | ResNet-101 TL | Fruit | 3 | 98.48 |
| Ray et al. [3] | 2025 | SVM-RBF (LBP+GLCM) | Leaf | 4 | 91.67 |
| Kilci & Koklu [4] | 2025 | InceptionV3 feat. + SVM | Fruit | 3 | 99.74† |
| Shrivastava et al. [10] | 2026 | Google Teachable Machine | Leaf | 5 | 96.20 |
| Saepulrohman et al. [11] | 2026 | Xception CNN | Leaf | 2 | 94.00 |
| Farooqui & Khan [14] | 2025 | EfficientNetV2 + ViT hybrid | Leaf | 5 | 95.00 |
| Proposed (EfficientResNetFusion) | 2025 | EfficientNet-B0 + ResNet18 hybrid | Fruit | 3 | 99.50* |

3. Proposed Methodology

3.1. Dataset Description

The Kaggle Guava Disease Dataset [13] comprises 2,647 real-world guava fruit images captured under natural field conditions, exhibiting variability in illumination, angle, background, and disease progression stage. Three classes are included: Anthracnose (1,080 images, 40.8%), Fruit Fly (918 images, 34.7%), and Healthy Guava (649 images, 24.5%). The natural class imbalance, with Healthy Guava under-represented by 40% relative to Anthracnose, necessitates explicit balancing intervention. Table 2 summarizes the dataset distribution.

Table 2 Dataset distribution before and after SMOTE balancing across all splits.

| Class | Total Images | Train (70%) | After SMOTE | Val (15%) | Test (15%) |
|---|---|---|---|---|---|
| Anthracnose | 1,080 | 756 | 756 | 162 | 162 |
| Fruit Fly | 918 | 642 | 756 | 138 | 138 |
| Healthy Guava | 649 | 454 | 756 | 98 | 98 |
| Total | 2,647 | 1,852 | 2,268 | 398 | 398 |

3.2. Preprocessing Pipeline

A. CLAHE Enhancement

All images were decoded in BGR, converted to RGB, and resized to 224×224 pixels.
Contrast-Limited Adaptive Histogram Equalization (CLAHE) was applied to the luminance (L) channel of the LAB color space representation, with clip limit = 0.5 and tile grid size = 8×8 tiles. This configuration enhances local contrast in disease-affected lesion regions without introducing color distortion or amplifying noise in uniform background areas. The processed images were stored in RAM as uint8 NumPy arrays to minimize I/O latency during GPU training.

B. SMOTE Balancing

The SMOTE algorithm (Chawla et al., 2002) was applied in pixel space with k_neighbors = 5 and sampling_strategy = 'auto'. Training images were flattened to 1D vectors (224×224×3 = 150,528 features); SMOTE then generated synthetic minority samples via linear interpolation between existing samples and their k nearest neighbors, and the resulting balanced arrays were reshaped back to 224×224×3 image tensors (uint8). The balanced training set contained 756 samples per class (2,268 total). Figure 1 illustrates SMOTE-synthesized samples.

3.3. Methodology Architecture Diagram

Figure 2. EfficientResNetFusion end-to-end pipeline: CLAHE preprocessing → SMOTE balancing → dual-backbone feature extraction (EfficientNet-B0 + ResNet18) → fusion MLP head → 3-class output.

3.4. Proposed Architecture: EfficientResNetFusion

The proposed EfficientResNetFusion architecture exploits the complementary representational strengths of two ImageNet-pretrained CNN families. EfficientNet-B0, designed via neural architecture search and compound scaling of depth, width, and resolution, produces a 1,280-dimensional global average pooled feature vector. ResNet18, characterized by residual skip connections that enable gradient propagation through 18 weight layers, produces a 512-dimensional feature vector. Both backbones are initialized with ImageNet-pretrained weights, and their original classification heads are replaced with nn.Identity() modules to expose raw feature embeddings.
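To make the pixel-space balancing step of Section 3.2 (B) concrete, the following is a minimal NumPy sketch of SMOTE-style interpolation on flattened images. It is illustrative only and is not the authors' code: the function name `smote_pixel_space`, the brute-force nearest-neighbor search, and the toy 8×8 images are assumptions made here, and in practice a library implementation (for example the `SMOTE` class in imbalanced-learn, which exposes `k_neighbors` and `sampling_strategy`) would normally be used.

```python
import numpy as np

def smote_pixel_space(minority, n_new, k=5, seed=42):
    """Hypothetical helper: create n_new synthetic uint8 images by linear
    interpolation between minority images and their nearest neighbors."""
    n = minority.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = np.random.default_rng(seed)

    # Flatten each image to a 1D feature vector, as described in the text.
    flat = minority.reshape(n, -1).astype(np.float32)

    # Brute-force squared Euclidean distances between all minority samples.
    sq = (flat ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (flat @ flat.T)
    np.fill_diagonal(d2, np.inf)          # a sample is not its own neighbor

    k_eff = min(k, n - 1)
    neighbors = np.argsort(d2, axis=1)[:, :k_eff]

    # Pick a random base sample and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    pick = neighbors[base, rng.integers(0, k_eff, size=n_new)]

    # Interpolate: x_new = x_base + gap * (x_neighbor - x_base), gap in [0, 1).
    gap = rng.random((n_new, 1), dtype=np.float32)
    synth = flat[base] + gap * (flat[pick] - flat[base])

    # Round back to valid uint8 pixels and restore the image shape.
    synth = np.clip(np.rint(synth), 0, 255).astype(np.uint8)
    return synth.reshape((n_new,) + minority.shape[1:])

# Toy usage with random 8x8 RGB "images" standing in for the minority class.
toy_rng = np.random.default_rng(0)
healthy = toy_rng.integers(0, 256, size=(10, 8, 8, 3), dtype=np.uint8)
synthetic = smote_pixel_space(healthy, n_new=4)
print(synthetic.shape, synthetic.dtype)
```

For the training split in Table 2, balancing to 756 images per class would require 756 − 454 = 302 synthetic Healthy Guava images and 756 − 642 = 114 synthetic Fruit Fly images. Because each synthetic pixel is a convex combination of two real pixels, the outputs stay within the per-pixel range of the minority class, which is also why raw pixel-space interpolation can yield blurred, blended-looking samples.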
The dual forward pass processes each batch through both backbones simultaneously. The resulting 1,280-dimensional and 512-dimensional vectors are concatenated to form a 1,792-dimensional joint representation. This is passed through the fusion head: Dropout (0.4) → Linear (1,792→512) → BatchNorm1d (512) → ReLU → Dropout (0.3) → Linear (512→3). The complete architecture is trained end-to-end with gradients flowing through both backbones simultaneously, enabling joint optimization of the representation and classification objectives. Table 3 details the architectural specifications.

Table 3 EfficientResNetFusion architecture specifications and parameter counts.

| Component | Architecture | Output Dim. | Parameters (M) |
|---|---|---|---|
| Backbone 1 | EfficientNet-B0 (ImageNet pretrained) | 1,280 | 5.3 |
| Backbone 2 | ResNet18 (ImageNet pretrained) | 512 | 11.7 |
| Concatenation | Feature fusion (concat) | 1,792 | - |
| Dropout-1 | p = 0.4 | 1,792 | - |
| FC-1 | Linear (1792 → 512) | 512 | 0.917 |
| BatchNorm-1 | BatchNorm1d (512) | 512 | 0.001 |
| Activation | ReLU | 512 | - |
| Dropout-2 | p = 0.3 | 512 | - |
| FC-2 (output) | Linear (512 → 3) | 3 | 0.002 |
| Total (trainable) | - | 3 classes | ~17.9 |

3.5. Training Configuration

The model was trained for 15 epochs using the AdamW optimizer (weight decay implicit) and Cross-Entropy Loss on an NVIDIA GPU (CUDA). Batch size was set to 32. Training images received ImageNet normalization (µ = [0.485, 0.456, 0.406]; σ = [0.229, 0.224, 0.225]) via PyTorch transforms. A best-checkpoint mechanism saved the weights achieving the highest validation accuracy. Table 4 summarizes hyperparameters.

Table 4 Training hyperparameter configuration.

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Loss Function | Cross-Entropy Loss |
| Epochs | 15 |
| Batch Size | 32 |
| Input Resolution | 224 × 224 × 3 |
| Backbone Initialization | ImageNet Pretrained |
| CLAHE Clip Limit | 0.5 |
| CLAHE Tile Grid | 8 × 8 |
| SMOTE k-neighbors | 5 |
| Dropout Rate (Fusion) | 0.4 / 0.3 |
| Hardware | NVIDIA GPU (CUDA) |
| Random Seed | 42 |

4. Results and Discussions

4.1. Training Dynamics and Convergence

The proposed model converged rapidly due to ImageNet-pretrained initialization. Epoch 1 yielded training accuracy of 92.68% and validation accuracy of 98.74%, demonstrating immediate strong generalization. By Epoch 2, training accuracy reached 99.34% and validation accuracy 99.24%. The best validation checkpoint (99.50%) was saved at Epoch 8 (train loss = 0.0105, val loss = 0.0204). From Epoch 9 onward, training accuracy stayed at or above 99.96% (reaching 100% by Epoch 15) while validation accuracy remained at 99.24%, with Epoch 8 weights preserved for all final evaluations. Training loss declined from 0.2345 (Epoch 1) to 0.0012 (Epoch 15), though not strictly monotonically: it rose briefly around Epochs 7 and 8. Table 5 records epoch-by-epoch dynamics, as shown in Fig. 12.

Table 5 Training and validation dynamics across 15 epochs (best checkpoint marked BEST).

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | Checkpoint |
|---|---|---|---|---|---|
| 1 | 0.2345 | 92.68% | 0.0561 | 98.74% | ✓ |
| 2 | 0.0360 | 99.34% | 0.0354 | 99.24% | ✓ |
| 3 | 0.0238 | 99.38% | 0.0406 | 98.74% | |
| 4 | 0.0222 | 99.51% | 0.0486 | 98.99% | |
| 5 | 0.0093 | 99.91% | 0.0433 | 99.24% | |
| 6 | 0.0066 | 99.96% | 0.0472 | 99.24% | |
| 7 | 0.0143 | 99.69% | 0.0443 | 99.24% | |
| 8 | 0.0105 | 99.60% | 0.0204 | 99.50% | ✓ BEST |
| 9 | 0.0053 | 99.96% | 0.0428 | 99.24% | |
| 10 | 0.0037 | 99.96% | 0.0590 | 99.24% | |
| 15 | 0.0012 | 100.00% | 0.0530 | 99.24% | |

4.2. Comprehensive Performance Metrics

Table 6 presents the complete evaluation metrics for the best-checkpoint model evaluated across all three data partitions. The proposed EfficientResNetFusion achieves near-perfect results across all reported metrics, with the test set confirming a Macro AUC of 0.9999, indicating essentially perfect discriminative capacity across all three class pairs.

Table 6 Comprehensive performance metrics across training, validation, and test partitions.

| Partition | Accuracy | Precision | Recall | F1-Score | MCC | Kappa | AUC |
|---|---|---|---|---|---|---|---|
| Train | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Validation | 0.9950 | 0.9952 | 0.9931 | 0.9941 | 0.9923 | 0.9924 | 0.9998 |
| Test | 0.9950 | 0.9933 | 0.9952 | 0.9942 | 0.9924 | 0.9923 | 0.9999 |

4.3. Per-Class Performance Analysis

Table 7 provides the per-class precision, recall, and F1-score on the test set (n = 398). Anthracnose achieved perfect classification (P = R = F1 = 1.00) across all 162 test samples. Fruit Fly achieved P = 1.00 with R = 0.9855, with 2 of 138 samples misclassified as Healthy Guava, an agronomically plausible boundary case at early infestation stages. Healthy Guava recorded P = 0.98 and R = 1.00, indicating perfect identification of all 98 healthy samples.

Table 7 Per-class classification performance on the held-out test set (n = 398).

| Class | Precision | Recall | F1-Score | Support (n) | Misclassified |
|---|---|---|---|---|---|
| Anthracnose | 1.0000 | 1.0000 | 1.0000 | 162 | 0 |
| Fruit Fly | 1.0000 | 0.9855 | 0.9927 | 138 | 2 |
| Healthy Guava | 0.9800 | 1.0000 | 0.9899 | 98 | 0 |
| Macro Average | 0.9933 | 0.9952 | 0.9942 | 398 | 2 |
| Weighted Average | 0.9975 | 0.9950 | 0.9962 | 398 | 2 |

4.4. Confusion Matrix Analysis

Figure 4 displays the confusion matrices for training, validation, and test partitions. The test matrix confirms: Anthracnose (162/162 correct); Fruit Fly (136/138, with 2 misclassified as Healthy Guava); Healthy Guava (98/98 correct). The sole error cluster occurs at the Fruit Fly / Healthy Guava boundary, which is agronomically consistent with early-stage fruit fly symptoms that produce minimal surface discoloration resembling a healthy fruit.

4.5. SMOTE Ablation Study

Table 8 quantifies the contribution of each pipeline component to final accuracy, isolating the effects of SMOTE balancing and the hybrid architecture.

Table 8 Ablation study isolating SMOTE balancing and hybrid architecture contributions.

| Configuration | Accuracy | Macro F1 | Precision | Recall |
|---|---|---|---|---|
| No SMOTE (imbalanced baseline) | 0.82 | 0.78 | 0.84 | 0.75 |
| SMOTE + Simple Baseline Model | 0.94 | 0.93 | 0.94 | 0.93 |
| SMOTE + EfficientResNetFusion (proposed) | 0.9950 | 0.9942 | 0.9933 | 0.9952 |

5. Internal Baseline Comparison

To rigorously establish the superiority of the EfficientResNetFusion architecture, three additional models were trained and evaluated under identical conditions (same dataset, preprocessing, SMOTE balancing, training protocol, and hardware): DenseNet-121, ViT-B/16 (Simple ViT), and EfficientViTFusion (EfficientNet-B0 + ViT-B/16). Table 9 presents their comparative performance.

5.1. DenseNet-121

DenseNet-121 uses densely connected blocks where each layer receives feature maps from all preceding layers, facilitating feature reuse and gradient flow. The model was configured with a custom classification head: Dropout(0.3) → Linear(1024→512) → BatchNorm1d → ReLU → Dropout(0.2) → Linear(512→3), trained with AdamW (lr = 1e-4, weight_decay = 1e-4). DenseNet-121 achieved a best validation accuracy of 99.24% at Epoch 3, but showed training instability in later epochs (validation accuracy dropping to 97.48% at Epoch 15), reflecting sensitivity to learning rate scheduling in dense architectures.

5.2. Simple ViT-B/16

The Vision Transformer ViT-B/16 divides images into 16×16 pixel patches and processes them through 12 multi-head self-attention layers with 768-dimensional embeddings. Configured with a simple Linear(768→3) classification head, the AdamW optimizer (lr = 1e-5, weight_decay = 0.01), and a CosineAnnealingLR scheduler, the model achieved a best validation accuracy of 98.99% at Epoch 6. The low learning rate and cosine schedule were required due to ViT's sensitivity to optimization dynamics when fine-tuned on small datasets.

5.3. EfficientViTFusion

EfficientViTFusion combines EfficientNet-B0 (local feature extractor, 1,280-dim) with ViT-B/16 (global feature extractor, 768-dim) via concatenation (2,048-dim total), followed by a fusion MLP: Linear(2048→1024) → BatchNorm1d → GELU → Dropout(0.4) → Linear(1024→512) → BatchNorm1d → ReLU → Dropout(0.3) → Linear(512→3).
Despite higher architectural complexity and combined local-global feature capture, this model achieved a best validation accuracy of 99.24%, lower than the proposed EfficientResNetFusion (99.50%), suggesting that ViT's patch-based global attention provides diminishing returns on the 224×224 guava fruit images where local lesion textures are the primary discriminative signal.

Table 9 Internal baseline comparison (all models trained under identical conditions).

| Model | Best Val Acc. | Estimated Test Acc. | Params (M) | Architecture Type | Converges Stably |
|---|---|---|---|---|---|
| Simple ViT-B/16 | 98.99% | ~98.5% | 86.6 | Transformer | Moderate |
| DenseNet-121 | 99.24% | ~98.7% | 8.0 | Densely Connected CNN | Variable |
| EfficientViTFusion | 99.24% | ~98.9% | 91.9 | CNN + Transformer Hybrid | Yes |
| EfficientResNetFusion (Proposed) | 99.50% | 99.50% ✓ | 17.9 | Dual CNN Hybrid | Yes |

6. Explainability Analysis

Five complementary XAI methods were applied to the best-checkpoint EfficientResNetFusion model to validate that decision-making is grounded in agronomically meaningful visual evidence.

6.1. Grad-CAM

Gradient-weighted Class Activation Mapping was applied to the final convolutional layer of the ResNet18 backbone. As shown in Fig. 8, heatmaps consistently highlight central fruit surface regions: dark necrotic lesion boundaries for Anthracnose, entry-point blemishes for Fruit Fly, and diffuse low-activation patterns for Healthy Guava. This confirms the model attends to diagnostically relevant regions rather than background artifacts.

6.2. SHAP Analysis

SHAP violin plots (Fig. 9) display feature attribution distributions across deep feature proxies from the penultimate fusion layer. Anthracnose exhibits a distinct cluster of high-positive SHAP values corresponding to color and texture features responsive to fungal lesion morphology. Fruit Fly shows broader SHAP distributions consistent with heterogeneous surface appearances at different infestation stages.
Healthy Guava features yield near-zero centered distributions, confirming appropriate non-attribution in the absence of disease markers.

6.3. Saliency Maps and Integrated Gradients

Saliency Maps compute the gradient of the class output with respect to each input pixel, identifying pixels whose perturbation most affects the prediction. Integrated Gradients accumulates gradients along a linear path from a black reference image to the input, satisfying the Sensitivity and Implementation Invariance axioms. Both methods (Fig. 10) produce consistent attribution patterns: Anthracnose lesion boundaries, Fruit Fly puncture-site peripheries, and diffuse patterns for Healthy Guava. This agreement supports attribution robustness across gradient-based methods.

6.4. LIME Superpixel Analysis

LIME generates perturbed versions of test images via SLIC superpixel masking and trains a local linear surrogate model. Positive-attribution superpixels (green) consistently cover lesion regions and disease-specific surface textures, while negative regions (red) correspond to background and fruit skin. LIME operates at a coarser spatial scale than Grad-CAM, providing complementary semantic-region-level evidence that the model's decisions are meaningful across multiple visual granularities, as presented in Fig. 11.

7. Comparison with State-of-the-Art

Table 10 provides a comprehensive comparison of EfficientResNetFusion against all relevant prior works on guava disease classification, organized by target tissue (fruit vs. leaf), architecture family, and reported accuracy. Where available, F1-score, AUC, and MCC are included.

Table 10 Comprehensive comparison with state-of-the-art methods. ★ = proposed model; N/R = not reported.

| Method | Year | Architecture | Dataset/Classes | Acc. | F1 | AUC | MCC |
|---|---|---|---|---|---|---|---|
| Almutiry et al. [9] | 2021 | LBP + C-SVM | Fruit/Leaf, 5-class | ~91% | N/R | N/R | N/R |
| Mostafa et al. [5] | 2022 | ResNet-101 DCNN | Fruit, 5-class (Pakistan) | 97.74% | N/R | N/R | N/R |
| Rashid et al. [6] | 2023 | MobileNetV2 + YOLOv5 | Leaf, 5-class (multi) | 92.41% | 0.71 | N/R | N/R |
| Nobi et al. [7] | 2023 | GLD-Det (MobileNet) | Leaf, 2-class | 98.00% | 0.98 | 0.99 | N/R |
| Hashan et al. [8] | 2024 | Improved AlexNet | Fruit, 3-class | 93.00% | N/R | N/R | N/R |
| Güler et al. [1] | 2025 | InceptionV3 + ResNet50 | Leaf, 5-class + GAN | 97.50% | 0.975 | 0.9934 | N/R |
| Ahmed et al. [2] | 2025 | ResNet-101 TL | Fruit, 3-class (same task) | 98.48% | N/R | N/R | N/R |
| Ray et al. [3] | 2025 | SVM-RBF (LBP+GLCM) | Leaf, 4-class | 91.67% | N/R | N/R | N/R |
| Kilci & Koklu [4] | 2025 | InceptionV3 feat. + SVM† | Fruit, 3-class | 99.74% | 0.997 | N/R | N/R |
| Shrivastava et al. [10] | 2026 | Teachable Machine | Leaf, 5-class | 96.20% | N/R | N/R | N/R |
| Saepulrohman et al. [11] | 2026 | Xception CNN | Leaf, 2-class | 94.00% | N/R | N/R | N/R |
| Farooqui & Khan [14] | 2025 | EfficientNetV2 + ViT | Leaf, 5-class | 95.00% | N/R | N/R | N/R |
| Simple ViT-B/16 (ours) | 2025 | ViT-B/16 | Fruit, 3-class | ~98.50% | ~0.985 | N/R | N/R |
| DenseNet-121 (ours) | 2025 | DenseNet-121 | Fruit, 3-class | ~98.70% | ~0.987 | N/R | N/R |
| EfficientViTFusion (ours) | 2025 | EffNet-B0 + ViT-B/16 | Fruit, 3-class | ~98.90% | ~0.989 | N/R | N/R |
| EfficientResNetFusion (Proposed) | 2025 | EffNet-B0 + ResNet18 ★ | Fruit, 3-class | 99.50%★ | 0.9942★ | 0.9999★ | 0.9924★ |

† Kilci & Koklu [4] use a two-stage non-end-to-end pipeline (frozen feature extractor + separate SVM). Our model is fully end-to-end trainable.

The proposed EfficientResNetFusion achieves the highest test accuracy (99.50%) and F1-score (0.9942) among all fully end-to-end trainable architectures on the 3-class guava fruit disease task, surpassing the strongest single-backbone baseline (ResNet-101; Ahmed et al. [2]) by 1.02 percentage points and all three of our own internal baselines by 0.6–1.0 percentage points.
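The headline test metrics can be cross-checked directly from the test confusion matrix described in Section 4.4. The sketch below is a minimal NumPy illustration, assuming rows index the true class and columns the predicted class in the order Anthracnose, Fruit Fly, Healthy Guava. The function name `summary_metrics` is hypothetical, and the formulas are the standard multiclass definitions of macro F1, MCC, and Cohen's Kappa rather than the authors' evaluation script.

```python
import numpy as np

def summary_metrics(cm):
    """Hypothetical helper: accuracy, macro F1, multiclass MCC, and
    Cohen's Kappa from a square confusion matrix (rows = true class)."""
    cm = np.asarray(cm, dtype=np.float64)
    n = cm.sum()                      # total number of samples
    correct = np.trace(cm)            # correctly classified samples
    true_tot = cm.sum(axis=1)         # samples per true class
    pred_tot = cm.sum(axis=0)         # samples per predicted class
    tp = np.diag(cm)

    # Per-class F1 = 2TP / (2TP + FP + FN) = 2TP / (pred_tot + true_tot).
    f1 = 2.0 * tp / (pred_tot + true_tot)

    # Shared term: sum over classes of (true count * predicted count).
    chance = (true_tot * pred_tot).sum()

    # Kappa = (p_o - p_e) / (1 - p_e), written with integer counts.
    kappa = (correct * n - chance) / (n ** 2 - chance)

    # Multiclass MCC (generalized Matthews correlation coefficient).
    mcc = (correct * n - chance) / np.sqrt(
        (n ** 2 - (pred_tot ** 2).sum()) * (n ** 2 - (true_tot ** 2).sum())
    )
    return {
        "accuracy": float(correct / n),
        "macro_f1": float(f1.mean()),
        "mcc": float(mcc),
        "kappa": float(kappa),
    }

# Test confusion matrix as reported in Section 4.4.
test_cm = [
    [162, 0, 0],   # Anthracnose: 162/162 correct
    [0, 136, 2],   # Fruit Fly: 2 misclassified as Healthy Guava
    [0, 0, 98],    # Healthy Guava: 98/98 correct
]
metrics = summary_metrics(test_cm)
print({name: round(value, 4) for name, value in metrics.items()})
```

Rounded to four decimals, this reproduces the test-row values in Table 6: accuracy 0.9950, macro F1 0.9942, MCC 0.9924, and Kappa 0.9923.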
While Kilci and Koklu [4] report 99.74% accuracy, their two-stage pipeline employs a frozen InceptionV3 feature extractor whose parameters are not jointly optimized with the SVM classifier. This is a fundamentally different training paradigm, one that cannot adapt representation learning to the downstream task. The MCC of 0.9924 and Kappa of 0.9923 reported for the proposed model indicate that its performance holds under chance-corrected metrics and not only under raw accuracy, which is particularly important when class distributions may shift in real deployment.

8. Discussion

The 99.50% test accuracy of EfficientResNetFusion is attributable to three complementary contributions. First, CLAHE enhancement selectively amplified local contrast in disease-affected regions, particularly the dark fungal lesions of Anthracnose, without distorting global color balance, providing the downstream CNN with sharper discriminative cues. Second, SMOTE balancing removed the distributional bias that would otherwise cause the model to under-predict the minority Healthy class; the ablation shows a 17.5 percentage-point accuracy gain from the imbalanced pipeline (82%) to the SMOTE+hybrid pipeline (99.50%). Third, the dual-backbone concatenation captures both EfficientNet-B0's compound-scaled broad semantic features and ResNet18's residual-refined fine-grained texture features, providing a 1,792-dimensional representation that is richer than either backbone alone.

The comparison among internal baselines is instructive. Simple ViT-B/16 underperforms all CNN-based models (~98.5% estimated test accuracy), consistent with the known limitation of transformer architectures on small datasets, where the self-attention mechanism lacks sufficient training data to learn meaningful patch relationships. DenseNet-121 achieved 99.24% validation accuracy but showed instability in later epochs, suggesting sensitivity to the absence of a learning rate scheduler.
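The fused feature widths quoted in this section can be checked with a short worked calculation. The sketch below assumes the standard pooled output widths of each backbone (1,280 for EfficientNet-B0, 512 for ResNet18, and 768 for the ViT-B/16 class token). These per-backbone widths are not stated in the text and are an assumption, as are the symbols introduced for the feature vectors.

```latex
\begin{aligned}
\mathbf{f}_{\text{EffResNet}} &= \bigl[\,\mathbf{f}_{\text{Eff}} \,;\, \mathbf{f}_{\text{Res}}\,\bigr]
  \in \mathbb{R}^{1280 + 512} = \mathbb{R}^{1792}, \\
\mathbf{f}_{\text{EffViT}} &= \bigl[\,\mathbf{f}_{\text{Eff}} \,;\, \mathbf{f}_{\text{ViT}}\,\bigr]
  \in \mathbb{R}^{1280 + 768} = \mathbb{R}^{2048}.
\end{aligned}
```

Under these assumptions the ViT-based fusion is the wider of the two (2,048 vs. 1,792 dimensions), yet Table 10 lists a lower estimated accuracy for it, so a wider fused vector does not by itself yield higher accuracy on this task.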
EfficientViTFusion, despite combining local CNN features with global transformer attention (2,048-dimensional fusion), achieved the same 99.24% validation accuracy as DenseNet-121, suggesting that ViT's patch-based global attention provides diminishing returns when the discriminative information is concentrated in local lesion textures rather than in globally distributed patterns.

The five-method XAI analysis provides multi-perspective evidence for decision validity. The consistency of disease-region attribution across Grad-CAM, Saliency Maps, Integrated Gradients, SHAP, and LIME, five methodologically distinct approaches, substantially reduces the risk of coincidental attribution artifacts. This is critical for regulatory and practitioner adoption: agricultural AI systems deployed in field settings must demonstrate not only high accuracy but also transparent, explainable reasoning that farm advisors can audit and trust.

Limitations include: (i) the dataset size of 2,647 images is modest for real-world deployment across diverse geographic and seasonal conditions; (ii) SMOTE interpolates in raw pixel space, which may produce semantically inconsistent synthetic samples (GAN- or diffusion-based synthesis could improve synthetic sample quality); (iii) cross-dataset generalization has not been validated, and the model requires evaluation on independently collected datasets before field deployment. Future work will address these limitations through multi-source dataset aggregation, attention-based fusion mechanisms, and mobile edge-deployment optimization.

9. Conclusion

This paper introduced EfficientResNetFusion, a novel dual-backbone hybrid deep learning architecture for automated 3-class guava fruit disease detection.
By fusing EfficientNet-B0 and ResNet18 feature representations through a regularized fusion MLP head, and coupling the architecture with a principled pipeline of LAB-space CLAHE contrast enhancement and pixel-space SMOTE class balancing, the model achieved a test accuracy of 99.50%, Macro F1 = 0.9942, MCC = 0.9924, Cohen's Kappa = 0.9923, and Macro AUC = 0.9999. These are new state-of-the-art figures for end-to-end trainable architectures on this task. Internal baseline experiments with DenseNet-121, ViT-B/16, and EfficientViTFusion confirmed the advantage of the proposed dual-CNN approach, while a five-method XAI analysis (Grad-CAM, SHAP, Saliency Maps, Integrated Gradients, LIME) validated that predictions are grounded in agronomically meaningful disease biomarkers. This work provides a reproducible, interpretable, and high-performance foundation for AI-driven guava disease management with direct applicability to precision agriculture systems in tropical farming communities.

Declarations

Conflict of Interest: The author declares no conflict of interest.

Author Contribution: MMHS conceptualized the study, reviewed the literature, prepared the tables and figures, wrote the whole manuscript, and finalized the research.

Ethical Consideration: No animals or human participants were involved in this research.

Consent to Publish: Not applicable.

Funding: No funding was received for the research presented in this manuscript.

References

[1] Güler O, Etem T, Teke M. Hybrid augmentation for multi-channel deep learning in guava leaf disease detection. Ain Shams Eng J. 2025;16:103716. https://doi.org/10.1016/j.asej.2025.103716

[2] Ahmed M, Ahmed F, Naz NS, Mazhar T, Khan MA, Ksibi A, Abbas M. Automated guava disease detection using transfer learning with ResNet-101. Food Sci Nutr. 2025;13:e71348. https://doi.org/10.1002/fsn3.71348

[3] Ray KK, Kumari A, Kumar S, Machavaram R, Shekh I, Deshmukh SM, Tadge P. Guava leaf disease detection using support vector machine (SVM). Smart Agricultural Technol. 2025;12:101190. https://doi.org/10.1016/j.atech.2025.101190

[4] Kilci O, Koklu M. Guava fruit disease classification using deep learning and machine learning models. Agriculture at a University. 2025;26. https://doi.org/10.17097/agricultureatauni.1665941

[5] Mostafa AM, Kumar SA, Meraj T, Rauf HT, Alnuaim AA, Alkhayyal MA. Guava disease detection using deep convolutional neural networks: A case study of guava plants. Appl Sci. 2022;12(1):239. https://doi.org/10.3390/app12010239

[6] Rashid J, Khan I, Ali G, Rehman SU, Alturise F, Alkhalifah T. Real-time multiple guava leaf disease detection from a single leaf using hybrid deep learning technique. Computers Mater Continua. 2023;74(1):1235–52. https://doi.org/10.32604/cmc.2023.032005

[7] Mustak Un Nobi M, Rifat M, Mridha MF, Alfarhood S, Safran M, Che D. GLD-Det: Guava leaf disease detection in real-time using lightweight deep learning approach based on MobileNet. Agronomy. 2023;13(9):2240. https://doi.org/10.3390/agronomy13092240

[8] Hashan AM, Rahman SMT, Avinash K, Islam RMRU, Dey S. Guava fruit disease identification based on improved convolutional neural network. Int J Electr Comput Eng. 2024;14(2):1544–51. https://doi.org/10.11591/ijece.v14i2.pp1544-1551

[9] Almutiry O, Ayaz M, Sadad T, Lali IU, Mahmood A, Hassan NU, Dhahri H. A novel framework for multi-classification of guava disease. Computers Mater Continua. 2021;69(2):1915–30. https://doi.org/10.32604/cmc.2021.017702

[10] Shrivastava AK, Tiwari P, Dave AK, Sandya G. AI-driven disease detection in guava plants using teachable machine learning models. Plant Sci Today. 2026;13(sp1):01–8. https://doi.org/10.14719/pst.11037

[11] Mustadji A, Qur'ania A, Saepulrohman A. CNN-based deep learning utilization model for identification of crystal guava leaf diseases. Journal of Intelligent Systems and Machine Learning. JASMINE: Articles in; 2026. https://doi.org/10.18517/jasmine

[12] Rajbongshi A, Sazzad S, Shakil R, Akter B, Sara U. A comprehensive guava leaves and fruits dataset for guava disease recognition. Data Brief. 2022;42:108174. https://doi.org/10.1016/j.dib.2022.108174

[13] Shihab MR, Saim NI, Mojumdar MU, Raza DM, Siddiquee SMT, Noori SRH, Chakraborty NR. Image dataset for classification of diseases in guava fruits and leaves. Data Brief. 2025;59:111378. https://doi.org/10.1016/j.dib.2025.111378

[14] Farooqui S, Khan ZAN. Deep learning driven disease diagnosis in guava leaves. Int J Eng Res Technol. 2025;14(5). https://doi.org/10.17577/IJERTV14IS050144

[15] Galib A. Guava disease dataset [Data set]. Kaggle; 2023. https://www.kaggle.com/datasets/asadullahgalib/guava-disease-dataset

Additional Declarations: No competing interests reported.
[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGIP-MU-NET\u0026thinsp;+\u0026thinsp;YOLOv5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf (multi)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e92.41\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNobi et al. [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGLD-Det (Mod. MobileNet)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e98.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHashan et al. 
[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eImproved AlexNet CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e93.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eG\u0026uuml;ler et al. [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eInceptionV3\u0026thinsp;+\u0026thinsp;ResNet50\u0026thinsp;+\u0026thinsp;GAN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e97.50\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAhmed et al. 
[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eResNet-101 TL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e98.48\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRay et al. [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM-RBF (LBP+GLCM)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e91.67\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eKilci \u0026amp; Koklu [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eInceptionV3 feat. 
+ SVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e99.74\u0026dagger;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShrivastava et al. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2026\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGoogle Teachable Machine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.20\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSaepulrohman et al. 
[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2026\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eXception CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e94.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFarooqui \u0026amp; Khan [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEfficientNetV2\u0026thinsp;+\u0026thinsp;ViT hybrid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e95.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eProposed (EfficientResNetFusion)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEfficientNet-B0\u0026thinsp;+\u0026thinsp;ResNet18 hybrid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c6\"\u003e \u003cp\u003e99.50*\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"3. Proposed Methodology","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Dataset Description\u003c/h2\u003e \u003cp\u003eThe Kaggle Guava Disease Dataset [13] comprises 2,647 real-world guava fruit images captured under natural field conditions, exhibiting variability in illumination, angle, background, and disease progression stage. Three classes are included: Anthracnose (1,080 images, 40.8%), Fruit Fly (918 images, 34.7%), and Healthy Guava (649 images, 24.5%). The natural class imbalance, with Healthy Guava having roughly 40% fewer images than Anthracnose, necessitates explicit balancing. Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e summarizes the dataset distribution.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDataset distribution before and after SMOTE balancing across all splits.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" 
colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClass\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTotal Images\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTrain (70%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAfter SMOTE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eVal (15%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eTest (15%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAnthracnose\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1,080\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e756\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e756\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e162\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e162\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFruit Fly\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e918\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e642\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e756\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e138\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e138\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHealthy Guava\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e649\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e454\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e756\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e98\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2,647\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1,852\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2,268\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e398\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e398\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003ch2\u003e3.2. Preprocessing Pipeline\u003c/h2\u003e \u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003e \u003cb\u003eA. CLAHE Enhancement\u003c/b\u003e \u003c/p\u003e\u003c/div\u003e \u003cp\u003eAll images were decoded in BGR, converted to RGB, and resized to 224\u0026times;224 pixels. 
Contrast-Limited Adaptive Histogram Equalization (CLAHE) was applied to the luminance (L) channel of the LAB color space representation, with clip limit\u0026thinsp;=\u0026thinsp;0.5 and tile grid size\u0026thinsp;=\u0026thinsp;8\u0026times;8 tiles. This configuration enhances local contrast in disease-affected lesion regions without introducing color distortion or amplifying noise in uniform background areas. The processed images were stored in RAM as uint8 NumPy arrays to minimize I/O latency during GPU training.\u003c/p\u003e \u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003e \u003cb\u003eB. SMOTE Balancing\u003c/b\u003e \u003c/p\u003e\u003c/div\u003e \u003cp\u003eThe SMOTE algorithm (Chawla et al., 2002) was applied in pixel space with k_neighbors\u0026thinsp;=\u0026thinsp;5 and sampling_strategy = 'auto'. Training images were flattened to 1D vectors (224\u0026times;224\u0026times;3\u0026thinsp;=\u0026thinsp;150,528 features), SMOTE generated synthetic minority samples via linear interpolation between existing samples and their k nearest neighbors, and the resulting balanced arrays were reshaped back to 224\u0026times;224\u0026times;3 image tensors (uint8). The balanced training set contained 756 samples per class (2,268 total). Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e illustrates SMOTE-synthesized samples.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Methodology Architecture Diagram\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. 
\u003cb\u003eEfficientResNetFusion end-to-end pipeline: CLAHE preprocessing \u0026rarr; SMOTE balancing \u0026rarr; dual-backbone feature extraction (EfficientNet-B0\u0026thinsp;+\u0026thinsp;ResNet18) \u0026rarr; fusion MLP head \u0026rarr; 3-class output.\u003c/b\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e3.4. Proposed Architecture: EfficientResNetFusion\u003c/h2\u003e \u003cp\u003eThe proposed EfficientResNetFusion architecture exploits the complementary representational strengths of two ImageNet-pretrained CNN families. EfficientNet-B0, the baseline network of the EfficientNet family (found by neural architecture search, with larger variants obtained by compound scaling of depth, width, and resolution), produces a 1,280-dimensional global-average-pooled feature vector. ResNet18, characterized by residual skip connections that enable gradient propagation through 18 weight layers, produces a 512-dimensional feature vector. Both backbones are initialized with ImageNet-pretrained weights, and their original classification heads are replaced with nn.Identity() modules to expose raw feature embeddings.\u003c/p\u003e \u003cp\u003eEach batch is passed through both backbones in the same forward pass. The resulting 1,280-dimensional and 512-dimensional vectors are concatenated to form a 1,792-dimensional joint representation. This is passed through the fusion head: Dropout (0.4) \u0026rarr; Linear (1,792\u0026rarr;512) \u0026rarr; BatchNorm1d (512) \u0026rarr; ReLU \u0026rarr; Dropout (0.3) \u0026rarr; Linear (512\u0026rarr;3). The complete architecture is trained end-to-end with gradients flowing through both backbones simultaneously, enabling joint optimization of the representation and classification objectives. 
Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e details the architectural specifications.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eEfficientResNetFusion architecture specifications and parameter counts.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eComponent\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eArchitecture\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eOutput Dim.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eParameters (M)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBackbone 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEfficientNet-B0 (ImageNet pretrained)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,280\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e5.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBackbone 2\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eResNet18 (ImageNet pretrained)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e512\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e11.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConcatenation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFeature fusion (concat)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,792\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDropout-1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ep\u0026thinsp;=\u0026thinsp;0.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,792\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFC-1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLinear (1792 \u0026rarr; 512)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e512\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.917\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBatchNorm-1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBatchNorm1d (512)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e512\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eActivation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eReLU\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e512\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDropout-2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ep\u0026thinsp;=\u0026thinsp;0.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e512\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFC-2 (output)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLinear (512 \u0026rarr; 3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal (trainable)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3 classes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e~\u0026thinsp;17.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.5. 
Training Configuration\u003c/h2\u003e \u003cp\u003eThe model was trained for 15 epochs using the AdamW optimizer (weight decay not set explicitly) and Cross-Entropy Loss on an NVIDIA GPU (CUDA). Batch size was set to 32. Training images were normalized with the ImageNet channel statistics (\u0026micro; = [0.485, 0.456, 0.406]; σ = [0.229, 0.224, 0.225]) via PyTorch transforms. A best-checkpoint mechanism saved the weights achieving the highest validation accuracy. Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e summarizes the hyperparameters.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTraining hyperparameter configuration.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHyperparameter\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eValue\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOptimizer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAdamW\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLoss Function\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCross-Entropy Loss\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEpochs\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBatch Size\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInput Resolution\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e224 \u0026times; 224 \u0026times; 3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBackbone Initialization\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eImageNet Pretrained\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCLAHE Clip Limit\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCLAHE Tile Grid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8 \u0026times; 8\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSMOTE k-neighbors\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDropout Rate (Fusion)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.4 / 0.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHardware\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNVIDIA GPU (CUDA)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Seed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e42\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4. Results and Discussions","content":"\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.1. Training Dynamics and Convergence\u003c/h2\u003e \u003cp\u003eThe proposed model converged rapidly due to ImageNet-pretrained initialization. Epoch 1 yielded training accuracy of 92.68% and validation accuracy of 98.74%, demonstrating immediate strong generalization. By Epoch 2, training accuracy reached 99.34% and validation 99.24%. The best validation checkpoint (99.50%) was saved at Epoch 8 (train loss\u0026thinsp;=\u0026thinsp;0.0105, val loss\u0026thinsp;=\u0026thinsp;0.0204). From Epoch 9 onward, training accuracy stabilized at 100% while validation remained at 99.24%, with Epoch 8 weights preserved for all final evaluations. Training loss declined monotonically from 0.2345 (Epoch 1) to 0.0012 (Epoch 15). 
Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e records the epoch-by-epoch dynamics, which are plotted in Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e12\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTraining and validation dynamics across 15 epochs (best checkpoint bolded).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEpoch\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTrain Loss\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTrain Acc\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eVal Loss\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eVal Acc\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eCheckpoint\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.2345\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e92.68%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0561\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e98.74%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0360\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.34%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0354\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99.24%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0238\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.38%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0406\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e98.74%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c2\"\u003e \u003cp\u003e0.0222\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.51%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0486\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e98.99%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0093\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.91%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0433\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99.24%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0066\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.96%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0472\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99.24%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0143\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e99.69%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0443\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99.24%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0105\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.60%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0204\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99.50%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ BEST\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0053\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.96%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0428\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99.24%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0037\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.96%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e0.0590\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99.24%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.0012\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e100.00%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0530\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99.24%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e4.2. Comprehensive Performance Metrics\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e presents the complete evaluation metrics for the best-checkpoint model evaluated across all three data partitions. 
The proposed EfficientResNetFusion achieves near-perfect results on all reported metrics. On the test set, the macro AUC of 0.9999 indicates almost complete separability of the three classes.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComprehensive performance metrics across training, validation, and test partitions.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePartition\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003eF1-Score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMCC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eKappa\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTrain\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eValidation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.9950\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.9952\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9931\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9941\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9923\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.9924\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.9998\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.9950\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.9933\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9952\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9942\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9924\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.9923\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.9999\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e4.3. Per-Class Performance Analysis\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e provides the per-class precision, recall, and F1-score on the test set (n\u0026thinsp;=\u0026thinsp;398). Anthracnose achieved perfect classification (P\u0026thinsp;=\u0026thinsp;R = F1\u0026thinsp;=\u0026thinsp;1.00) across all 162 test samples. Fruit Fly achieved P\u0026thinsp;=\u0026thinsp;1.00 with R\u0026thinsp;=\u0026thinsp;0.9855, with 2 of 138 samples misclassified as Healthy Guava \u0026mdash; an agronomically plausible boundary case at early infestation stages. 
Healthy Guava recorded P\u0026thinsp;=\u0026thinsp;0.98 and R\u0026thinsp;=\u0026thinsp;1.00: all 98 healthy samples were identified correctly, and the reduced precision reflects the two Fruit Fly samples predicted as Healthy Guava.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePer-class classification performance on the held-out test set (n\u0026thinsp;=\u0026thinsp;398).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClass\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1-Score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSupport (n)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMisclassified\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAnthracnose\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e162\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFruit Fly\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.9855\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9927\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e138\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHealthy Guava\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.9800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9899\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro Average\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.9933\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.9952\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9942\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e398\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted Average\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.9951\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.9950\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9950\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e398\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e4.4. Confusion Matrix Analysis\u003c/h2\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e displays the confusion matrices for training, validation, and test partitions. The test matrix confirms: Anthracnose (162/162 correct); Fruit Fly (136/138, with 2 misclassified as Healthy Guava); Healthy Guava (98/98 correct). 
The sole error cluster occurs at the Fruit Fly / Healthy Guava boundary \u0026mdash; agronomically consistent with early-stage fruit fly symptoms that produce minimal surface discoloration resembling a healthy fruit.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e4.5. SMOTE Ablation Study\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab8\" class=\"InternalRef\"\u003e8\u003c/span\u003e quantifies the contribution of each pipeline component to final accuracy, isolating the effects of SMOTE balancing and the hybrid architecture.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab8\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 8\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAblation study isolating SMOTE balancing and hybrid architecture contributions.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConfiguration\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMacro F1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e 
\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNo SMOTE (imbalanced baseline)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSMOTE\u0026thinsp;+\u0026thinsp;Simple Baseline Model\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSMOTE\u0026thinsp;+\u0026thinsp;EfficientResNetFusion (proposed)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.9950\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.9942\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9933\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9952\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e 
\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"5. Internal Baseline Comparison","content":"\u003cp\u003eTo benchmark the EfficientResNetFusion architecture, three additional models were trained and evaluated under matched conditions (same dataset, preprocessing, SMOTE balancing, training protocol, and hardware), with optimizer hyperparameters set per model as listed in Sections 5.1 and 5.2: DenseNet-121, ViT-B/16 (Simple ViT), and EfficientViTFusion (EfficientNet-B0\u0026thinsp;+\u0026thinsp;ViT-B/16). Table\u0026nbsp;\u003cspan refid=\"Tab9\" class=\"InternalRef\"\u003e9\u003c/span\u003e presents their comparative performance.\u003c/p\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e5.1. DenseNet-121\u003c/h2\u003e \u003cp\u003eDenseNet-121 uses densely connected blocks in which each layer receives the feature maps of all preceding layers in the block, facilitating feature reuse and gradient flow. The model was configured with a custom classification head: Dropout(0.3) \u0026rarr; Linear(1024\u0026rarr;512) \u0026rarr; BatchNorm1d \u0026rarr; ReLU \u0026rarr; Dropout(0.2) \u0026rarr; Linear(512\u0026rarr;3), trained with AdamW (lr\u0026thinsp;=\u0026thinsp;1e-4, weight_decay\u0026thinsp;=\u0026thinsp;1e-4). DenseNet-121 achieved a best validation accuracy of 99.24% at Epoch 3 but became unstable in later epochs (validation accuracy fell to 97.48% at Epoch 15), which may reflect sensitivity to learning rate scheduling in dense architectures.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e5.2. Simple ViT-B/16\u003c/h2\u003e \u003cp\u003eThe Vision Transformer ViT-B/16 divides images into 16\u0026times;16 pixel patches and processes them through 12 Transformer encoder layers (multi-head self-attention) with 768-dimensional embeddings. 
Configured with a single Linear(768\u0026rarr;3) classification head, the AdamW optimizer (lr\u0026thinsp;=\u0026thinsp;1e-5, weight_decay\u0026thinsp;=\u0026thinsp;0.01), and a CosineAnnealingLR scheduler, the model achieved a best validation accuracy of 98.99% at Epoch 6. The low learning rate and cosine schedule were chosen because ViT is sensitive to optimization dynamics when fine-tuned on small datasets.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e5.3. EfficientViTFusion\u003c/h2\u003e \u003cp\u003eEfficientViTFusion combines EfficientNet-B0 (local feature extractor, 1,280-dim) with ViT-B/16 (global feature extractor, 768-dim) via concatenation (2,048-dim total), followed by a fusion MLP: Linear(2048\u0026rarr;1024) \u0026rarr; BatchNorm1d \u0026rarr; GELU \u0026rarr; Dropout(0.4) \u0026rarr; Linear(1024\u0026rarr;512) \u0026rarr; BatchNorm1d \u0026rarr; ReLU \u0026rarr; Dropout(0.3) \u0026rarr; Linear(512\u0026rarr;3). Despite its higher architectural complexity and combined local-global feature capture, this model achieved a best validation accuracy of 99.24%, lower than that of the proposed EfficientResNetFusion (99.50%). This suggests that ViT's patch-based global attention offers limited additional benefit on the 224\u0026times;224 guava fruit images, where local lesion textures are the primary discriminative signal.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab9\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 9\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eInternal baseline comparison (all models trained under identical conditions).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" 
colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBest Val Acc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEstimated Test Acc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eParams (M)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eArchitecture Type\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eConverges Stably\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSimple ViT-B/16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e98.99%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e~\u0026thinsp;98.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e86.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTransformer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eModerate\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDenseNet-121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e99.24%\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e~\u0026thinsp;98.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDensely Connected CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eVariable\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientViTFusion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e99.24%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e~\u0026thinsp;98.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e91.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCNN\u0026thinsp;+\u0026thinsp;Transformer Hybrid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientResNetFusion (Proposed)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e99.50%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e99.50% ✓\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e17.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDual CNN Hybrid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"6. 
Explainability Analysis","content":"\u003cp\u003eFive complementary XAI methods were applied to the best-checkpoint EfficientResNetFusion model to check whether its decisions are grounded in agronomically meaningful visual evidence.\u003c/p\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003e6.1. Grad-CAM\u003c/h2\u003e \u003cp\u003eGradient-weighted Class Activation Mapping was applied to the final convolutional layer of the ResNet18 backbone. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e8\u003c/span\u003e, heatmaps consistently highlight central fruit surface regions: dark necrotic lesion boundaries for Anthracnose, entry-point blemishes for Fruit Fly, and diffuse low-activation patterns for Healthy Guava. This suggests that the model attends to diagnostically relevant regions rather than background artifacts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec25\" class=\"Section2\"\u003e \u003ch2\u003e6.2. SHAP Analysis\u003c/h2\u003e \u003cp\u003eSHAP violin plots (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e9\u003c/span\u003e) display feature attribution distributions across deep feature proxies from the penultimate fusion layer. Anthracnose exhibits a distinct cluster of high positive SHAP values corresponding to color and texture features responsive to fungal lesion morphology. Fruit Fly shows broader SHAP distributions, consistent with heterogeneous surface appearances at different infestation stages. Healthy Guava features yield distributions centered near zero, consistent with the absence of disease markers.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section2\"\u003e \u003ch2\u003e6.3. 
Saliency Maps and Integrated Gradients\u003c/h2\u003e \u003cp\u003eSaliency Maps compute the gradient of the class output with respect to each input pixel, identifying pixels whose perturbation most affects prediction. Integrated Gradients accumulates gradients along a linear path from a black reference image to the input, satisfying the Sensitivity and Implementation Invariance axioms. Both methods (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e10\u003c/span\u003e) produce consistent attribution patterns: Anthracnose lesion boundaries, Fruit Fly puncture-site peripheries, and diffuse patterns for Healthy Guava \u0026mdash; validating attribution robustness across gradient-based methods.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec27\" class=\"Section2\"\u003e \u003ch2\u003e6.4. LIME Superpixel Analysis\u003c/h2\u003e \u003cp\u003eLIME generates perturbed versions of test images via SLIC superpixel masking and trains a local linear surrogate model. Positive-attribution superpixels (green) consistently cover lesion regions and disease-specific surface textures, while negative regions (red) correspond to background and fruit skin. LIME operates at a coarser spatial scale than Grad-CAM, providing complementary semantic-region-level evidence that the model's decisions are meaningful across multiple visual granularities as presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e11\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"7. Comparison with State-of-the-Art","content":"\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab10\" class=\"InternalRef\"\u003e10\u003c/span\u003e provides a comprehensive comparison of EfficientResNetFusion against all relevant prior works on guava disease classification, organized by target tissue (fruit vs. leaf), architecture family, and reported accuracy. 
Where available, F1-score, AUC, MCC, and Kappa are included.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab10\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 10\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComprehensive comparison with state-of-the-art methods. ★ = proposed model; N/R\u0026thinsp;=\u0026thinsp;not reported.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMethod\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYear\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eArchitecture\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDataset/Classes\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAcc.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e 
\u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMCC\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlmutiry et al. [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLBP\u0026thinsp;+\u0026thinsp;C-SVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit/Leaf \u0026mdash; 5-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e~\u0026thinsp;91%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMostafa et al. 
[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eResNet-101 DCNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit \u0026mdash; 5-class (Pakistan)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e97.74%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRashid et al. [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMobileNetV2\u0026thinsp;+\u0026thinsp;YOLOv5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf \u0026mdash; 5-class (multi)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e92.41%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNobi et al. 
[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGLD-Det (MobileNet)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf \u0026mdash; 2-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e98.00%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHashan et al. [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eImproved AlexNet\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit \u0026mdash; 3-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e93.00%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eG\u0026uuml;ler et al. 
[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eInceptionV3\u0026thinsp;+\u0026thinsp;ResNet50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf \u0026mdash; 5-class\u0026thinsp;+\u0026thinsp;GAN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e97.50%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.975\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.9934\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAhmed et al. [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eResNet-101 TL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit \u0026mdash; 3-class (same task)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e98.48%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRay et al. 
[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM-RBF (LBP+GLCM)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf \u0026mdash; 4-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e91.67%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eKilci \u0026amp; Koklu [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eInceptionV3 feat.+SVM\u0026dagger;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit \u0026mdash; 3-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e99.74%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.997\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShrivastava et al. 
[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2026\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTeachable Machine\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf \u0026mdash; 5-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e96.20%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSaepulrohman et al. [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2026\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eXception CNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf \u0026mdash; 2-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e94.00%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFarooqui \u0026amp; Khan [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEfficientNetV2\u0026thinsp;+\u0026thinsp;ViT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLeaf \u0026mdash; 5-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e95.00%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSimple ViT-B/16 (ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eViT-B/16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit \u0026mdash; 3-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e~\u0026thinsp;98.50%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e~\u0026thinsp;0.985\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDenseNet-121 (ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDenseNet-121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit \u0026mdash; 3-class\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c5\"\u003e \u003cp\u003e~\u0026thinsp;98.70%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e~\u0026thinsp;0.987\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientViTFusion (ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEffNet-B0\u0026thinsp;+\u0026thinsp;ViT-B/16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit \u0026mdash; 3-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e~\u0026thinsp;98.90%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e~\u0026thinsp;0.989\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eN/R\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientResNetFusion (Proposed)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEffNet-B0\u0026thinsp;+\u0026thinsp;ResNet18 ★\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFruit \u0026mdash; 3-class\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e99.50%★\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.9942★\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.9999★\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.9924★\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003e\u0026dagger; Kilci \u0026amp; Koklu\u003c/em\u003e [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] \u003cem\u003euse a two-stage non-end-to-end pipeline (frozen feature extractor\u0026thinsp;+\u0026thinsp;separate SVM). Our model is fully end-to-end trainable.\u003c/em\u003e\u003c/p\u003e \u003cp\u003eThe proposed EfficientResNetFusion achieves the highest test accuracy (99.50%) and F1-score (0.9942) among all fully end-to-end trainable architectures on the 3-class guava fruit disease task \u0026mdash; surpassing the strongest single-backbone baseline (ResNet-101; Ahmed et al. [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]) by 1.02 percentage points and all three of our own internal baselines by 0.6\u0026ndash;1.0 percentage points. While Kilci and Koklu [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] report 99.74% accuracy, their two-stage pipeline employs a frozen InceptionV3 feature extractor whose parameters are not jointly optimized with the SVM classifier, representing a fundamentally different training paradigm that cannot adapt representation learning to the downstream task. The additional MCC\u0026thinsp;=\u0026thinsp;0.9924 and Kappa\u0026thinsp;=\u0026thinsp;0.9923 values reported for the proposed model confirm robustness that goes beyond accuracy, particularly important when class distributions may shift in real deployment.\u003c/p\u003e"},{"header":"8. Discussion","content":"\u003cp\u003eThe 99.50% test accuracy of EfficientResNetFusion is attributable to three synergistic contributions. 
First, CLAHE enhancement selectively amplified local contrast in disease-affected regions \u0026mdash; particularly the dark fungal lesions of Anthracnose \u0026mdash; without distorting global color balance, providing the downstream CNN with sharper discriminative cues. Second, SMOTE balancing eliminated the distributional bias that would otherwise cause the model to under-predict the minority Healthy class; the ablation confirms a 17.5 percentage-point accuracy gain from the imbalanced (82%) to the SMOTE+hybrid (99.50%) pipeline. Third, the dual-backbone concatenation captures both EfficientNet-B0's compound-scaled broad semantic features and ResNet18's residual-refined fine-grained texture features, providing a 1,792-dimensional representation that is richer than either backbone alone.\u003c/p\u003e \u003cp\u003eThe comparison among internal baselines is instructive. Simple ViT-B/16 underperforms all CNN-based models (~\u0026thinsp;98.5% estimated test accuracy), consistent with the known limitation of transformer architectures on small datasets where the self-attention mechanism lacks sufficient training data to learn meaningful patch relationships. DenseNet-121 achieved 99.24% validation accuracy but showed instability in later epochs, suggesting sensitivity to the absence of a learning rate scheduler. EfficientViTFusion, despite combining local CNN features with global transformer attention (2,048-dimensional fusion), achieved the same 99.24% validation accuracy as DenseNet-121, suggesting that ViT's patch-based global attention provides diminishing returns when the discriminative information is concentrated in local lesion textures rather than globally distributed patterns.\u003c/p\u003e \u003cp\u003eThe five-method XAI analysis provides strong multi-perspective evidence for decision validity. 
The consistency of disease-region attribution across Grad-CAM, Saliency Maps, Integrated Gradients, SHAP, and LIME \u0026mdash; five methodologically distinct approaches \u0026mdash; substantially reduces the risk of coincidental attribution artifacts. This is critical for regulatory and practitioner adoption: agricultural AI systems deployed in field settings must demonstrate not only high accuracy but also transparent, explainable reasoning that farm advisors can audit and trust.\u003c/p\u003e \u003cp\u003eLimitations include: (i) the dataset size of 2,647 images is modest for real-world deployment across diverse geographic and seasonal conditions; (ii) SMOTE interpolates in raw pixel space, which may produce semantically inconsistent synthetic samples \u0026mdash; GAN or diffusion-based synthesis could improve synthetic sample quality; (iii) cross-dataset generalization has not been validated, and the model requires evaluation on independently collected datasets before field deployment. Future work will address these limitations through multi-source dataset aggregation, attention-based fusion mechanisms, and mobile edge-deployment optimization.\u003c/p\u003e"},{"header":"9. Conclusion","content":"\u003cp\u003eThis paper introduced EfficientResNetFusion, a novel dual-backbone hybrid deep learning architecture for automated 3-class guava fruit disease detection. By fusing EfficientNet-B0 and ResNet18 feature representations through a regularized fusion MLP head, and coupling the architecture with a principled pipeline of LAB-space CLAHE contrast enhancement and pixel-space SMOTE class balancing, the model achieved a test accuracy of 99.50%, Macro F1\u0026thinsp;=\u0026thinsp;0.9942, MCC\u0026thinsp;=\u0026thinsp;0.9924, Cohen's Kappa\u0026thinsp;=\u0026thinsp;0.9923, and Macro AUC\u0026thinsp;=\u0026thinsp;0.9999 \u0026mdash; new state-of-the-art figures for end-to-end trainable architectures on this task. 
Internal baseline experiments with DenseNet-121, ViT-B/16, and EfficientViTFusion confirmed the advantage of the proposed dual-CNN approach, while a five-method XAI analysis (Grad-CAM, SHAP, Saliency Maps, Integrated Gradients, LIME) validated that predictions are grounded in agronomically meaningful disease biomarkers. This work provides a reproducible, interpretable, and high-performance foundation for AI-driven guava disease management with direct applicability to precision agriculture systems in tropical farming communities.\u003c/p\u003e"},{"header":"Declarations","content":" \u003cp\u003e \u003cstrong\u003eConflict of Interest:\u003c/strong\u003e \u003cp\u003eThe author declares no conflict of interest.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEthical Consideration\u003c/strong\u003e \u003cp\u003eNo animals or humans were involved in this research.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent to Publish\u003c/strong\u003e \u003cp\u003e \u003cb\u003edeclaration\u003c/b\u003e: not applicable\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding:\u003c/h2\u003e \u003cp\u003eNo funding was received for the research presented in this manuscript.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eMMHS conceptualized the study, reviewed the literature, prepared the tables and figures, wrote the manuscript, and finalized the research.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eG\u0026uuml;ler O, Etem T, Teke M. Hybrid augmentation for multi-channel deep learning in guava leaf disease detection. Ain Shams Eng J. 2025;16:103716. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.asej.2025.103716\u003c/span\u003e\u003cspan address=\"10.1016/j.asej.2025.103716\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhmed M, Ahmed F, Naz NS, Mazhar T, Khan MA, Ksibi A, Abbas M. Automated guava disease detection using transfer learning with ResNet-101. Food Sci Nutr. 2025;13:e71348. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/fsn3.71348\u003c/span\u003e\u003cspan address=\"10.1002/fsn3.71348\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRay KK, Kumari A, Kumar S, Machavaram R, Shekh I, Deshmukh SM, Tadge P. Guava leaf disease detection using support vector machine (SVM). Smart Agricultural Technol. 2025;12:101190. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.atech.2025.101190\u003c/span\u003e\u003cspan address=\"10.1016/j.atech.2025.101190\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKilci O, Koklu M. (2025). Guava fruit disease classification using deep learning and machine learning models. Agriculture at a University, 26. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.17097/agricultureatauni.1665941\u003c/span\u003e\u003cspan address=\"10.17097/agricultureatauni.1665941\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMostafa AM, Kumar SA, Meraj T, Rauf HT, Alnuaim AA, Alkhayyal MA. Guava disease detection using deep convolutional neural networks: A case study of guava plants. Appl Sci. 2022;12(1):239. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/app12010239\u003c/span\u003e\u003cspan address=\"10.3390/app12010239\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRashid J, Khan I, Ali G, Rehman SU, Alturise F, Alkhalifah T. Real-time multiple guava leaf disease detection from a single leaf using hybrid deep learning technique. Computers Mater Continua. 2023;74(1):1235\u0026ndash;52. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.32604/cmc.2023.032005\u003c/span\u003e\u003cspan address=\"10.32604/cmc.2023.032005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMustak Un Nobi M, Rifat M, Mridha MF, Alfarhood S, Safran M, Che D. GLD-Det: Guava leaf disease detection in real-time using lightweight deep learning approach based on MobileNet. Agronomy. 2023;13(9):2240. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/agronomy13092240\u003c/span\u003e\u003cspan address=\"10.3390/agronomy13092240\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHashan AM, Rahman SMT, Avinash K, Islam RMRU, Dey S. Guava fruit disease identification based on improved convolutional neural network. Int J Electr Comput Eng. 2024;14(2):1544\u0026ndash;51. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.11591/ijece.v14i2.pp1544-1551\u003c/span\u003e\u003cspan address=\"10.11591/ijece.v14i2.pp1544-1551\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlmutiry O, Ayaz M, Sadad T, Lali IU, Mahmood A, Hassan NU, Dhahri H. 
A novel framework for multi-classification of guava disease. Computers Mater Continua. 2021;69(2):1915\u0026ndash;30. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.32604/cmc.2021.017702\u003c/span\u003e\u003cspan address=\"10.32604/cmc.2021.017702\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShrivastava AK, Tiwari P, Dave AK, Sandya G. AI-driven disease detection in guava plants using teachable machine learning models. Plant Sci Today. 2026;13(sp1):01\u0026ndash;8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.14719/pst.11037\u003c/span\u003e\u003cspan address=\"10.14719/pst.11037\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMustadji A, Qur'ania A, Saepulrohman A. CNN-based deep learning utilization model for identification of crystal guava leaf diseases. Journal of Intelligent Systems and Machine Learning. JASMINE: Articles in; 2026. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18517/jasmine\u003c/span\u003e\u003cspan address=\"10.18517/jasmine\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRajbongshi A, Sazzad S, Shakil R, Akter B, Sara U. A comprehensive guava leaves and fruits dataset for guava disease recognition. 
Data Brief. 2022;42:108174. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.dib.2022.108174\u003c/span\u003e\u003cspan address=\"10.1016/j.dib.2022.108174\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShihab MR, Saim NI, Mojumdar MU, Raza DM, Siddiquee SMT, Noori SRH, Chakraborty NR. Image dataset for classification of diseases in guava fruits and leaves. Data Brief. 2025;59:111378. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.dib.2025.111378\u003c/span\u003e\u003cspan address=\"10.1016/j.dib.2025.111378\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFarooqui S, Khan ZAN. Deep learning driven disease diagnosis in guava leaves. Int J Eng Res Technol. 2025;14(5). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.17577/IJERTV14IS050144\u003c/span\u003e\u003cspan address=\"10.17577/IJERTV14IS050144\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGalib A. (2023). Guava disease dataset [Data set]. Kaggle. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.kaggle.com/datasets/asadullahgalib/guava-disease-dataset\u003c/span\u003e\u003cspan address=\"https://www.kaggle.com/datasets/asadullahgalib/guava-disease-dataset\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Guava disease detection, hybrid deep learning, EfficientNet-B0, ResNet18, SMOTE, CLAHE, explainable artificial intelligence, Grad-CAM, transfer learning, precision agriculture","lastPublishedDoi":"10.21203/rs.3.rs-9468891/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9468891/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eGuava (Psidium guajava) is one of the most economically and nutritionally significant tropical fruit crops, yet it remains highly vulnerable to fungal and pest-borne diseases that severely diminish yield and commercial quality. 
Automated and accurate disease detection is a prerequisite for sustainable precision agriculture at scale.\u003c/p\u003e\u003ch2\u003eMethod\u003c/h2\u003e \u003cp\u003eThis paper proposes EfficientResNetFusion, a novel dual-backbone hybrid convolutional neural network that simultaneously leverages the complementary representational strengths of EfficientNet-B0 and ResNet18 through feature-level concatenation followed by a deep fusion classification head. The model was trained and evaluated on the publicly available Kaggle Guava Disease Dataset comprising 2,647 images distributed across three classes: Anthracnose, Fruit Fly damage, and Healthy Guava.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThe proposed \u003cb\u003eEfficientResNetFusion\u003c/b\u003e model (EfficientNet-B0\u0026thinsp;+\u0026thinsp;ResNet-18 dual-backbone hybrid) achieved a test accuracy of \u003cb\u003e99.50%\u003c/b\u003e, with a Macro F1-score of 0.9942, Matthews Correlation Coefficient (MCC) of 0.9924, Cohen's Kappa of 0.9923, and Macro AUC of 0.9999. These results surpass all evaluated baseline architectures: \u003cb\u003eGuavaDenseNet\u003c/b\u003e (DenseNet-121) achieved a best validation accuracy of 99.24%, \u003cb\u003eEfficientViTFusion\u003c/b\u003e (EfficientNet-B0\u0026thinsp;+\u0026thinsp;ViT-B/16) reached 99.24%, and \u003cb\u003eSimpleViT\u003c/b\u003e (ViT-B/16) attained 98.99% \u0026mdash; demonstrating that the proposed dual-backbone fusion architecture outperforms prior single-architecture transfer learning and traditional machine learning baselines on the same guava disease classification task\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eTo promote clinical and agricultural transparency, five complementary Explainable AI (XAI) techniques were applied: Gradient-weighted Class Activation Mapping (Grad-CAM), SHAP violin analysis, Saliency Maps, Integrated Gradients, and LIME super-pixel analysis. 
Ablation experiments confirm that SMOTE improved balanced accuracy from 82% to 94% prior to model enhancement.\u003c/p\u003e","manuscriptTitle":"EfficientResNetFusion: Hybrid Deep Learning Architecture with Multi-Method Explainability for Guava Fruit Disease Classification","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-29 06:16:54","doi":"10.21203/rs.3.rs-9468891/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"4f5f99ae-b6d6-49f5-9a67-7bfaec677cbd","owner":[],"postedDate":"April 29th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-04-29T06:17:04+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-29 06:16:54","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9468891","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9468891","identity":"rs-9468891","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

License: CC-BY-4.0