Vision Transformer-based diabetic foot ulcer classification for mobile deployment: development, validation, and implementation of an iOS clinical decision support tool

Research Article (Research Square preprint, Version 1)

Phap Tran Ngoc Hoang, Thien Vu, Research Dawadi, Bao Tran Ngoc Hoang, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8313326/v1
This work is licensed under a CC BY 4.0 License.

Abstract

Background
Diabetic foot ulcers (DFUs) affect 15–25% of diabetic patients and frequently lead to amputation. Accurate classification into infection, ischaemia, or combined pathology guides treatment decisions, but visual assessment remains challenging. Mobile-based clinical decision support could improve triage in primary care and resource-limited settings.
This study aimed to develop and validate a Vision Transformer-based DFU classification system suitable for mobile point-of-care deployment.

Methods
We trained a Vision Transformer (ViT-Small, 22M parameters) on the publicly available DFUC2021 dataset (n = 5,955 images) with four classes: Normal, Infection, Ischaemia, and Both. Data were split 70/15/15 into training, validation, and held-out test sets using stratified sampling. We addressed class imbalance using weighted cross-entropy loss. Performance was evaluated using macro-F1, macro-AUC, Cohen's kappa, and Brier scores, with bootstrap confidence intervals. External validation was performed on the independent DFU_Kaggle dataset (n = 1,055). Ablation studies quantified the contribution of each training component. The model was deployed as a CoreML package in an iOS application.

Results
ViT-Small achieved a macro-F1 of 0.9015 (95% CI: 0.871–0.926) and a macro-AUC of 0.9834 on the held-out test set. Cohen's kappa was 0.836, indicating strong agreement. All four classes achieved F1 scores above 88%, including the minority Ischaemia class (3.8% of data). External validation yielded a ROC-AUC of 0.951. Ablation studies showed that the weighted loss contributed +2.9 percentage points of macro-F1 and the ViT architecture +4.4 percentage points over EfficientNet-B2. The same model was deployed as a 41 MB iOS application with 50–80 ms inference time, preserving its full accuracy for clinical use.

Conclusions
We demonstrate a complete pipeline from model development through external validation to mobile deployment for DFU classification. The system achieves strong classification performance, including on minority classes critical for clinical decision-making. By deploying the full-accuracy model without compression, the mobile application maintains research-grade performance for point-of-care use.
Keywords: Medical Informatics; Diabetic foot ulcer; Vision Transformer; Deep learning; Mobile health; Clinical decision support; CoreML; External validation; Class imbalance

Background

Diabetic foot ulcers (DFUs) are among the most serious complications of diabetes mellitus. They affect 15–25% of diabetic patients during their lifetime, with substantial impact on quality of life and healthcare costs (1–3). The clinical consequences are severe: approximately 50–60% of DFUs become infected, and 15–20% of moderate to severe infections result in lower extremity amputation (4–6). These amputations carry devastating consequences: 5-year mortality after amputation exceeds 50%, comparable to that of many aggressive malignancies (1).

The clinical classification of DFUs requires distinguishing between several pathological states: infection (characterized by inflammatory signs, purulent discharge, and tissue destruction), ischaemia (resulting from peripheral arterial disease, with manifestations including pallor, delayed capillary refill, and tissue necrosis), and combined pathology in which both conditions coexist (7, 8). Visual differentiation between early infection and normal healing tissue is difficult even for experienced clinicians, because inflammatory responses in healing wounds can mimic early infection (9, 10). Similarly, subtle ischaemic changes, often manifesting as mild pallor, temperature differences, or delayed blanching, may be overlooked in routine clinical assessment (8).

Mobile-based clinical decision support could help address this challenge, particularly in resource-limited settings where specialist access is constrained (11).
A smartphone application capable of real-time DFU classification could assist primary care providers and community health workers in triage: identifying patients who need urgent specialist referral and those who can be managed with routine wound care.

Vision Transformers (ViTs) have shown promise for medical imaging tasks (12, 13). Unlike convolutional neural networks, which rely on local receptive fields, ViTs use self-attention to capture global dependencies across image regions (14). We hypothesized that this architecture might benefit DFU classification, where relationships between the wound bed, surrounding tissue, and anatomical context inform clinical assessment.

In this study, we aimed to develop and validate a DFU classification system suitable for mobile deployment. Our objectives were to: (1) train a Vision Transformer model on the DFUC2021 dataset with rigorous held-out evaluation; (2) validate generalizability on an independent external dataset; (3) identify which training components contribute most through ablation studies; (4) assess probability calibration for clinical decision support; and (5) deploy the full-accuracy model in an iOS application without sacrificing performance for reduced model size.

Methods

Datasets

Training and evaluation dataset (DFUC2021): The Diabetic Foot Ulcer Challenge 2021 dataset served as the primary development and evaluation resource. It was curated by Cassidy et al. for the MICCAI 2021 Grand Challenge on diabetic foot ulcer classification (15, 16).
The dataset contains 5,955 labeled DFU images in four mutually exclusive categories (Figure 1): Normal (n=2,552, 42.9%), ulcers without clinical signs of infection or ischaemia; Infection (n=2,555, 42.9%), ulcers with clinical signs of infection including erythema, edema, warmth, purulent discharge, or malodor; Ischaemia (n=227, 3.8%), ulcers with signs of compromised perfusion including pallor, delayed capillary refill, or necrotic tissue; and Both (n=621, 10.4%), ulcers exhibiting combined infection and ischaemia.

External validation dataset (DFU_Kaggle): For external validation, we used the publicly available DFU_Kaggle dataset, which contains 1,055 images with binary labels (543 Normal, 512 Ulcer). This dataset differs from DFUC2021 in imaging protocols, patient demographics, and annotation criteria. Notably, the "Ulcer" category in DFU_Kaggle covers ulcers of varying pathology (including ischaemic ulcers) and stage, without fine-grained infection/ischaemia labels. This heterogeneity provides a stringent test of model generalizability across annotation schemas.

Data partitioning and preprocessing

The DFUC2021 dataset was partitioned into training, validation, and test sets using stratified random sampling to maintain class proportions across splits. We used a 70/15/15 split: training set (n=4,169), validation set (n=893), and test set (n=893). Stratification ensured that each split contained representative samples of all four classes, which is critical given the severe class imbalance. No patient-level overlap or duplicate images existed across partitions; each unique image appeared in exactly one split. The test set (subsequently adjusted to n=894 after final sampling) was held out completely throughout model development and hyperparameter tuning. All architecture selection, hyperparameter optimization, and ablation decisions were made exclusively using the training and validation sets.
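The stratified 70/15/15 partition described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `stratified_split`, the fixed seed, and the per-class rounding rule (the test split takes whatever remains) are hypothetical choices. The sketch splits at image level, so on its own it would not prevent patient-level overlap if several images came from the same patient.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Hypothetical helper: split image indices into train/val/test per class.

    Each class is shuffled and cut separately, so class proportions are
    approximately preserved in every split. The seed and the rounding rule
    are assumptions; fractions[2] is implied by the remainder.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):            # deterministic class order
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n = len(idxs)
        n_train = round(fractions[0] * n)
        n_val = round(fractions[1] * n)
        # The test split takes the remainder, so every image lands in exactly one split.
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])
    return train, val, test


if __name__ == "__main__":
    # Class counts reported for DFUC2021 in the dataset description.
    counts = {"Normal": 2552, "Infection": 2555, "Ischaemia": 227, "Both": 621}
    labels = [name for name, n in counts.items() for _ in range(n)]
    train, val, test = stratified_split(labels)
    print(len(train), len(val), len(test))
```

Because each class is rounded separately, split sizes can differ by an image or so from an exact 70/15/15 cut of the whole dataset.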
The test set was evaluated only once, with the final selected model configuration, to prevent overfitting to test data through repeated evaluation.

Image Preprocessing: All images were resized to 224×224 pixels using bilinear interpolation to match the Vision Transformer input size. Pixel values were scaled to the [0, 1] range and standardized using ImageNet statistics (mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]) for compatibility with the pretrained weights. No dataset-specific normalization was applied, to preserve transferability.

Data Augmentation: Training data underwent on-the-fly augmentation to broaden coverage of the training distribution and reduce overfitting. The augmentation pipeline included random horizontal flips (p=0.5), random rotations up to ±15 degrees, random scale variations (0.9–1.1×), color jitter (brightness ±0.2, contrast ±0.2, saturation ±0.2, hue ±0.1), and random cropping with padding. Augmentations were applied only during training; validation and test images received only the deterministic preprocessing, so that evaluation was reproducible.

Model architecture

Vision Transformer (ViT-Small)
The primary model used the Vision Transformer architecture introduced by Dosovitskiy et al. (12). We used the ViT-Small variant, which contains 22 million parameters and balances model capacity and computational efficiency for both research evaluation and mobile deployment.
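As a sanity check on the 22-million-parameter figure, the count can be derived from the ViT-Small configuration given in the text (16×16 patches, 384-dimensional embeddings, 12 encoder layers, a learnable [CLS] token and positional embeddings, 4 output classes). The function below is a hypothetical back-of-the-envelope calculator, not the authors' code; it additionally assumes a timm-style layout with an MLP hidden size of 4 × 384, biases on every linear layer, and LayerNorms with weight and bias. At 2 bytes per weight (FP16, as used for the CoreML export in the Mobile deployment subsection), the total also approximates the size of the stored weights.

```python
def vit_param_count(img_size=224, patch=16, channels=3, dim=384,
                    depth=12, mlp_ratio=4, num_classes=4):
    """Count parameters of a plain ViT encoder with a linear classification head.

    Assumes biases on every linear layer, LayerNorm with weight and bias,
    a learnable [CLS] token, and learnable positional embeddings.
    The number of attention heads does not change the count, so it is omitted.
    """
    n_patches = (img_size // patch) ** 2                 # 14 x 14 = 196 tokens
    patch_embed = patch * patch * channels * dim + dim   # patch projection + bias
    cls_token = dim
    pos_embed = (n_patches + 1) * dim                    # +1 position for [CLS]

    hidden = dim * mlp_ratio
    per_block = (
        2 * dim                        # LayerNorm before attention
        + dim * 3 * dim + 3 * dim      # fused query/key/value projection
        + dim * dim + dim              # attention output projection
        + 2 * dim                      # LayerNorm before the MLP
        + dim * hidden + hidden        # MLP expansion (GELU follows)
        + hidden * dim + dim           # MLP contraction
    )
    final_norm = 2 * dim
    head = dim * num_classes + num_classes

    total = patch_embed + cls_token + pos_embed + depth * per_block + final_norm + head
    return n_patches, total


if __name__ == "__main__":
    n_patches, total = vit_param_count()
    fp16_mib = total * 2 / 2**20        # 2 bytes per FP16 weight, in MiB
    print(n_patches, total, round(fp16_mib, 1))
```

Under these assumptions the model has 21,667,204 parameters (about 22M), and the FP16 weights occupy about 41.3 MiB, in line with the 41 MB CoreML package reported for the iOS deployment; package metadata would add a little on top.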
The ViT-Small architecture processes input images by: (1) dividing the 224×224 image into non-overlapping 16×16 patches, yielding 196 patch tokens; (2) linearly projecting each patch into a 384-dimensional embedding; (3) prepending a learnable [CLS] classification token; (4) adding learnable positional embeddings to preserve spatial information; (5) processing the sequence through 12 transformer encoder layers, each containing multi-head self-attention (6 heads) and a feed-forward network with GELU activation; and (6) passing the final [CLS] token representation through a linear projection to the 4 output classes. The model was initialized with weights pretrained on ImageNet-21k (14 million images, 21,843 classes), which provide robust low-level visual features and mid-level semantic representations. The classification head was randomly initialized and trained from scratch for the 4-class DFU task. All parameters were fine-tuned during training (full fine-tuning rather than a frozen backbone).

EfficientNet-B2
For architectural comparison, we trained EfficientNet-B2 under identical conditions (17). EfficientNet-B2 contains 9.2 million parameters and is a widely used CNN architecture that scales depth, width, and resolution jointly (compound scaling). It was initialized with ImageNet-1k pretrained weights and fine-tuned identically to ViT-Small.

Training procedure
Models were trained using the AdamW optimizer with decoupled weight decay regularization (18). The initial learning rate was 1×10⁻⁴, with a weight decay coefficient of 0.01. The learning rate followed cosine annealing with warm restarts: within each cycle it decays along a cosine curve and is then reset to its initial value, a schedule intended to encourage convergence to flat minima associated with improved generalization. Training ran for a maximum of 25 epochs, with early stopping based on validation macro-F1.
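The cosine-annealing-with-warm-restarts schedule can be written per epoch as η = η_min + ½(η_max − η_min)(1 + cos(π·T_cur/T_i)), where T_cur counts epochs since the last restart and T_i is the current cycle length. Only η_max = 1×10⁻⁴ comes from the text; the restart period T_0, the cycle multiplier T_mult, and the floor η_min are not reported, so the defaults below (T_0 = 10 epochs, T_mult = 1, η_min = 0) are placeholders. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` implements this rule; the sketch only reproduces the arithmetic.

```python
import math


def sgdr_lr(epoch, base_lr=1e-4, t_0=10, t_mult=1, eta_min=0.0):
    """Learning rate at a (possibly fractional) epoch under cosine annealing
    with warm restarts. t_0, t_mult and eta_min are placeholder assumptions."""
    t_i, t_cur = t_0, epoch
    # Walk forward through completed cycles to find the position in the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


if __name__ == "__main__":
    for e in range(25):   # 25-epoch budget from the training procedure
        print(e, f"{sgdr_lr(e):.2e}")
```

With t_mult = 1 the learning rate jumps back to 1×10⁻⁴ at epochs 10 and 20; with t_mult = 2 the second cycle would last 20 epochs instead.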
The checkpoint with the highest validation macro-F1 was retained; this occurred at approximately epoch 20. Batch size was set to 12 because of GPU memory constraints. Gradient clipping with a maximum norm of 1.0 prevented gradient explosion, and dropout with probability 0.1 was applied in the transformer attention layers for regularization.

To address severe class imbalance, we used weighted cross-entropy loss with class weights inversely proportional to class frequency. Specifically, weights were computed as w_c = N / (C × n_c), where N is the total number of samples, C is the number of classes, and n_c is the number of samples in class c (19, 20). Because w_c × n_c = N / C for every class, each class contributes the same total weight to the loss regardless of prevalence, preventing the model from optimizing primarily for the majority classes (Normal, Infection) at the expense of the minority classes (Ischaemia, Both).

Evaluation metrics
The primary metric was macro-F1 (the unweighted mean of per-class F1 scores), which weights all classes equally regardless of prevalence (21). Secondary metrics were macro-AUC (one-vs-rest), Cohen's kappa for chance-corrected agreement (22), and per-class Brier scores for probability calibration (23, 24).

Statistical Analysis
We computed 95% confidence intervals for macro-F1 and macro-AUC using non-parametric bootstrap resampling with 1,000 iterations. Each bootstrap sample was drawn with replacement from the test set, maintaining the original sample size, and the interval bounds were taken as the 2.5th and 97.5th percentiles of the bootstrap distribution (percentile method) (25). Pairwise comparison between ViT-Small and EfficientNet-B2 used McNemar's test for paired binary outcomes (26), which evaluates whether discordant predictions (cases where one model is correct and the other incorrect) are asymmetrically distributed. The test statistic χ² = (b − c)² / (b + c), where b is the number of test images classified correctly by ViT-Small but incorrectly by EfficientNet-B2 and c is the number with the reverse outcome, approximately follows a chi-squared distribution with 1 degree of freedom under the null hypothesis of equal accuracy.
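The percentile bootstrap for macro-F1 described in the statistical analysis can be sketched in dependency-free Python. This is an illustrative implementation, not the authors' code: the helper names, the seed, and the convention that a class absent from both labels and predictions in a resample scores F1 = 0 are assumptions. The same resampling loop would apply to macro-AUC with a different metric function.

```python
import random


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class (one-vs-rest) F1 scores."""
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but the true class differs
            fn[t] += 1   # true class t was missed
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); scored 0 if the class never occurs (assumption).
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, classes, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for macro-F1 over test images."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample test images with replacement, keeping the original size.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx],
                              [y_pred[i] for i in idx], classes))
    stats.sort()
    tail = (1.0 - level) / 2.0
    lo = stats[int(tail * (n_boot - 1))]
    hi = stats[int(round((1.0 - tail) * (n_boot - 1)))]
    return lo, hi


if __name__ == "__main__":
    # Toy example: 100 binary labels with two errors (not the study's data).
    y = [0] * 50 + [1] * 50
    p = y[:]
    p[0], p[60] = 1, 0
    print(macro_f1(y, p, [0, 1]), bootstrap_ci(y, p, [0, 1]))
```

Resampling whole images (rather than per-class counts) keeps the natural class imbalance of the test set in every bootstrap replicate, which is why rare classes such as Ischaemia widen the interval.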
Statistical significance was assessed at α = 0.05, with p < 0.001 considered highly significant.

Model interpretability
Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize the image regions that drive each prediction (27). Grad-CAM computes importance weights by global average pooling the gradients of the target class score with respect to feature map activations, then forms a localization map from the ReLU of the importance-weighted sum of those activations, highlighting discriminative regions. For the Vision Transformer, we applied Grad-CAM to the final attention layer to visualize which image patches contribute most to each classification decision.

Mobile deployment
The trained ViT-Small model was converted to CoreML format (.mlpackage) for iOS deployment using coremltools. The conversion pipeline comprised: (1) tracing the PyTorch model with a representative input; (2) converting to CoreML using the neural network converter with FP16 precision to reduce size; and (3) validating numerical consistency between PyTorch and CoreML predictions. The final CoreML package size was 41 MB.

The iOS application was developed in Swift, using SwiftUI for the user interface and the Vision framework for camera integration. On-device inference uses the Apple Neural Engine when available, falling back to the GPU on devices without dedicated neural hardware. The application follows offline-first principles, performing all inference locally without requiring network connectivity.

Results

Training Dynamics and Model Convergence
The ViT-Small model was trained for up to 25 epochs with early stopping based on validation macro-F1. Figure 2 shows the evolution of training loss, validation loss, and validation macro-F1 across epochs. Convergence was stable, with training loss decreasing monotonically from approximately 1.2 to below 0.1.
Validation loss decreased initially and stabilized around epoch 15, without a sustained rise that would indicate substantial overfitting. Validation macro-F1 increased progressively and peaked at epoch 20; the best checkpoint, selected by maximum validation macro-F1, was retained for all subsequent evaluations. The gap between training and validation metrics remained modest throughout training, suggesting that the regularization strategies (weight decay, dropout, and data augmentation) effectively controlled overfitting despite the relatively small dataset.

Classification performance
Table 1 summarizes performance on the held-out test set (n=894). ViT-Small achieved a macro-F1 of 0.9015 (95% CI: 0.871–0.926) and a macro-AUC of 0.9834 (95% CI: 0.977–0.989). Cohen's kappa of 0.8356 indicates strong agreement between predictions and ground truth. The overall Brier score of 0.1519 suggests reasonable probability calibration.

Table 1. Classification performance on the held-out test set (n=894). 95% CIs from 1,000 bootstrap resamples.

Metric         Value     95% CI
Macro-F1       0.9015    0.871–0.926
Macro-AUC      0.9834    0.977–0.989
Cohen's κ      0.8356    n/a
Brier score    0.1519    n/a

Architecture comparison
To test whether ViT-Small outperforms EfficientNet-B2 trained under identical conditions, we applied McNemar's test to their paired test-set predictions. The test yielded χ² = 13.26, p = 2.71×10⁻⁴, indicating a statistically significant advantage for the Vision Transformer architecture. ViT-Small outperformed EfficientNet-B2 by 4.4 percentage points in macro-F1 (0.9015 vs. 0.8574).
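The McNemar computation can be reproduced from paired predictions with a minimal standard-library sketch. It follows the uncorrected statistic given in the statistical analysis (no continuity correction). For one degree of freedom the chi-squared variable is the square of a standard normal, so the upper-tail p-value is erfc(√(χ²/2)). The helper names are hypothetical, not taken from the authors' code.

```python
import math


def discordant_counts(y_true, pred_a, pred_b):
    """b: model A correct and model B wrong; c: model A wrong and model B correct."""
    b = sum(1 for t, pa, pb in zip(y_true, pred_a, pred_b) if pa == t and pb != t)
    c = sum(1 for t, pa, pb in zip(y_true, pred_a, pred_b) if pa != t and pb == t)
    return b, c


def mcnemar_chi2(b, c):
    """Uncorrected McNemar statistic (b - c)^2 / (b + c)."""
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    return (b - c) ** 2 / (b + c)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    With 1 df, chi2 = Z^2 for a standard normal Z, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    print(chi2_sf_1df(13.26))   # statistic reported in the text
```

Plugging in the reported statistic, `chi2_sf_1df(13.26)` evaluates to approximately 2.7 × 10⁻⁴, matching the reported p-value. Only discordant pairs enter the statistic; images that both models classify correctly (or both misclassify) carry no information about which model is better.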
This statistical significance, combined with the non-overlapping bootstrap confidence intervals, provides strong evidence that the performance improvement reflects true architectural advantages rather than random variation.

Per-class performance
Table 2 presents per-class metrics, which are consistently high across all four categories despite severe class imbalance. For the Normal class (n=382 test samples, 42.7%), the model achieved a precision of 88.4% and a recall of 91.1%, yielding an F1 score of 89.7%. The high recall indicates that the model rarely misses normal cases, which is important for avoiding unnecessary treatment escalation. For the Infection class (n=384, 43.0%), a precision of 90.4% and a recall of 88.0% produced an F1 score of 89.2%. The slightly lower recall than for Normal reflects the inherent difficulty of distinguishing early infection from inflammatory healing responses.

The model also performed strongly on the minority classes despite their limited representation. For Ischaemia (n=34, 3.8%), precision and recall both reached 88.2%, yielding an F1 of 88.2% and a class-specific AUC of 0.993, second only to the Both class. This strong discriminative ability for ischaemia is clinically significant, as ischaemic ulcers require urgent vascular intervention and delayed identification substantially reduces limb salvage success rates. For the Both category (combined infection and ischaemia, n=94, 10.5%), the model achieved the highest per-class metrics: precision 94.5%, recall 92.5%, F1 93.5%, and AUC 0.994. These combined-pathology cases, which represent the most severe clinical presentations, were classified the most accurately of the four categories.

Table 2. Per-class classification metrics on the held-out test set.
Class        Precision   Recall   F1      AUC
Normal       88.4%       91.1%    89.7%   0.974
Infection    90.4%       88.0%    89.2%   0.972
Ischaemia    88.2%       88.2%    88.2%   0.993
Both         94.5%       92.5%    93.5%   0.994

Confusion Matrix and Error analysis
Figure 6 presents the normalized confusion matrix, showing how classification errors are distributed across categories. Of 894 test samples, the model correctly classified 803 (89.8%) and misclassified 91 (10.2%). Normal↔Infection confusion accounts for approximately 80% of all errors (73/91 cases), reflecting the visual similarity between early infection and normal healing tissue, a challenge that human clinicians also face. Specifically, 31 Normal cases were misclassified as Infection and 42 Infection cases as Normal; this bidirectional pattern suggests visual overlap between inflammatory healing responses and early infectious changes.

Table 3 details the error distribution. Ischaemia errors were few despite severe class imbalance: only 4 of 34 ischaemic cases (11.8%) were misclassified, 3 as Normal and 1 as Infection. Because ischaemia detection is clinically critical for triggering vascular intervention, this low error count for a minority class is particularly encouraging. The Both category showed the lowest error rate, 7.5% (7/94), with most errors (5 cases) misclassified as Infection, so that at least one pathology component was still identified.

Table 3. Error distribution (91 total errors, 10.2% overall).

True class           Errors (rate)   Primary confusion
Normal (n=382)       34 (8.9%)       31 → Infection
Infection (n=384)    46 (12.0%)      42 → Normal
Ischaemia (n=34)     4 (11.8%)       3 → Normal
Both (n=94)          7 (7.5%)        5 → Infection

Figure 7 presents exemplar misclassifications, showing the three highest-confidence errors for each class.
Visual inspection shows that misclassified cases often exhibit ambiguous features that could reasonably support more than one interpretation; for example, cases at the boundary between normal healing and early infection display subtle inflammatory changes that challenge even expert annotation. These edge cases appear to reflect inherent diagnostic uncertainty rather than systematic model failure.

Calibration analysis
We assessed model calibration using Brier scores and reliability diagrams. The Brier score is the mean squared difference between predicted probabilities and observed outcomes; lower scores are better, with 0 for perfect predictions and 0.25 for a constant prediction of 0.5 in the binary case. Table 4 presents per-class (one-vs-rest) Brier scores. The overall Brier score of 0.1519 equals, up to rounding, the sum of the four per-class scores (0.0656 + 0.0674 + 0.0081 + 0.0106 = 0.1517), consistent with the multiclass definition that sums squared errors over classes (range 0 to 2); it indicates reasonable calibration for clinical decision support. The minority classes had the lowest scores (Ischaemia 0.0081, Both 0.0106), indicating reliable probability estimates for these critical categories, although one-vs-rest Brier scores are also naturally lower for rare classes, where most samples are true negatives with predicted probabilities near zero. The majority classes (Normal 0.0656, Infection 0.0674) had moderately higher Brier scores, reflecting the greater uncertainty in distinguishing these visually similar categories.

Table 4. Brier scores by class (lower is better).

Class        Brier score
Normal       0.0656
Infection    0.0674
Ischaemia    0.0081
Both         0.0106

For deployment scenarios in which a probability threshold (e.g., 0.5) triggers urgent specialist referral, these results suggest that the predicted probabilities can support threshold-based decisions: high-confidence predictions of infection or ischaemia can be treated as flags warranting prompt clinical attention. Temperature scaling or Platt calibration may further refine probability estimates for institution-specific threshold optimization.
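The per-class Brier scores and the temperature scaling mentioned above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: probabilities come from a softmax over model logits, each per-class score is the one-vs-rest mean squared error, and the temperature is chosen by a simple grid search that minimizes negative log-likelihood on held-out validation logits (a gradient-based fit is more common in practice). Dividing all logits by a single temperature does not change the arg-max, so temperature scaling leaves accuracy and macro-F1 unchanged and alters only the probabilities.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens, T < 1 sharpens)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def per_class_brier(probs, labels, n_classes):
    """One-vs-rest Brier score per class: mean of (p_ic - y_ic)^2 over samples."""
    n = len(labels)
    return [
        sum((p[c] - (1.0 if y == c else 0.0)) ** 2 for p, y in zip(probs, labels)) / n
        for c in range(n_classes)
    ]


def multiclass_brier(probs, labels, n_classes):
    """Multiclass Brier score (range 0 to 2): the sum of the per-class scores."""
    return sum(per_class_brier(probs, labels, n_classes))


def mean_nll(logits_list, labels, temperature):
    """Average negative log-likelihood of the true class at a given temperature."""
    return -sum(math.log(softmax(z, temperature)[y])
                for z, y in zip(logits_list, labels)) / len(labels)


def fit_temperature(val_logits, val_labels, grid=None):
    """Pick the temperature minimizing validation NLL over a simple grid (assumption)."""
    if grid is None:
        grid = [0.5 + 0.05 * k for k in range(71)]   # 0.5 to 4.0
    return min(grid, key=lambda t: mean_nll(val_logits, val_labels, t))


if __name__ == "__main__":
    # Toy validation logits (not study data): confident, with one confident error.
    val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
    val_labels = [0, 0, 1, 1]
    t = fit_temperature(val_logits, val_labels)
    probs = [softmax(z, t) for z in val_logits]
    print(t, per_class_brier(probs, val_labels, 2), multiclass_brier(probs, val_labels, 2))
```

For the toy logits, the fitted temperature is greater than 1 (the exact optimum is 4 / ln 3 ≈ 3.64), which softens the overconfident probabilities. The temperature should be fitted on validation data, never on the test set.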
Model interpretability
To provide clinical transparency and support model validation, we generated Gradient-weighted Class Activation Mapping (Grad-CAM) attention maps visualizing the regions the model prioritizes for each classification decision. Figure 8 presents representative attention maps for correctly classified cases in all four categories. The attention patterns are clinically interpretable: for Infection cases, attention concentrates on regions showing erythema, swelling, and purulent discharge; for Ischaemia cases, it focuses on pallid tissue, eschar, and areas suggesting compromised perfusion; for Normal cases, it is distributed more broadly across healthy granulation tissue; and for Both cases, it spans regions exhibiting both inflammatory and ischaemic features. While these patterns align qualitatively with clinical diagnostic criteria, quantitative evaluation of their reliability and consistency across similar cases remains an area for future investigation.

External validation
To assess cross-dataset generalizability, a critical requirement for clinical translation, we evaluated the trained model on the independent DFU_Kaggle dataset (n=1,055 images). This external dataset differs from DFUC2021 in image acquisition protocols, patient demographics, and annotation procedures, providing a stringent test of robustness to distribution shift. Because DFU_Kaggle provides only binary labels (Ulcer vs. Normal) without fine-grained infection/ischaemia annotations, we performed binary classification by mapping the DFUC2021 classes as follows: Normal → Normal; Infection, Ischaemia, Both → Ulcer. Table 5 presents the external validation results.

Table 5. External validation on DFU_Kaggle (n=1,055). ROC-AUC: 0.951.
Class     Precision   Recall   F1     n
Normal    71%         99%      83%    543
Ulcer     99%         57%      73%    512

The external validation achieved a ROC-AUC of 0.951, indicating that discriminative ability largely transfers across datasets despite distribution differences. The asymmetric precision-recall pattern (Normal: 71% precision / 99% recall; Ulcer: 99% precision / 57% recall) shows that the model is conservative in ulcer detection on this external dataset, with very high specificity (99%) but moderate sensitivity (57%). This pattern likely reflects, in part, annotation differences: DFU_Kaggle's "Ulcer" category includes ischaemic and staging variability that may not align with DFUC2021's multi-class definitions. The high ROC-AUC despite this label heterogeneity suggests robust learned representations; because ROC-AUC is threshold-independent, a lower decision threshold for Ulcer could recover sensitivity at some cost in specificity. Domain adaptation techniques may further improve cross-institutional performance by explicitly accounting for annotation protocol differences.

Ablation studies
To quantify the contribution of each training component, we conducted ablation experiments that removed one component at a time while keeping all others constant. Table 6 presents the results.

Table 6. Ablation study results showing component contributions (Δ F1 in percentage points relative to the full model).

Configuration            Macro-F1   Macro-AUC   Δ F1 (pp)
ViT-Small (full)         0.9015     0.9834      n/a
Without weighted loss    0.8723     0.9712      −2.92
Without augmentation     0.8856     0.9789      −1.59
EfficientNet-B2          0.8574     0.9644      −4.41

Weighted Cross-Entropy Loss (+2.9 percentage points macro-F1): Removing class-balanced loss weighting reduced macro-F1 from 0.9015 to 0.8723. This degradation primarily manifests in minority-class performance: without weighting, the model optimizes predominantly for the majority classes (Normal, Infection), sacrificing Ischaemia and Both detection. The weighted loss function, which assigns inverse class frequency weights, balances all classes in the optimization objective.
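The inverse-frequency weighting w_c = N / (C × n_c) from the training procedure, and its use in a weighted cross-entropy, can be sketched as follows. The counts are the full DFUC2021 class counts quoted in the dataset description; the text does not say whether weights were computed on the full dataset or only on the training split (the latter avoids any dependence on test labels), so treat the exact values as illustrative. Dividing by the summed weights of the targets mirrors the "mean" reduction of PyTorch's `nn.CrossEntropyLoss` when a `weight` tensor is supplied; the helper names here are hypothetical.

```python
import math


def inverse_frequency_weights(counts):
    """w_c = N / (C * n_c): N samples in total, C classes, n_c samples of class c."""
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {c: n_total / (n_classes * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood of the true classes.

    probs: one dict per sample mapping class name -> predicted probability.
    The weighted sum is divided by the total weight of the targets.
    """
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


if __name__ == "__main__":
    counts = {"Normal": 2552, "Infection": 2555, "Ischaemia": 227, "Both": 621}
    for name, w in inverse_frequency_weights(counts).items():
        # w_c * n_c equals N / C for every class, so each class carries equal total weight.
        print(f"{name:10s} w={w:.3f}  total={w * counts[name]:.2f}")
```

With these counts, Ischaemia receives a weight of about 6.56 and Normal about 0.58, roughly an 11-fold ratio, while each class's weights sum to N / C = 1,488.75.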
Data Augmentation (+1.6 percentage points macro-F1): Removing augmentation (random rotations, flips, color jitter, scale variations) reduced macro-F1 from 0.9015 to 0.8856. This 1.6 percentage point gain indicates that augmentation broadens coverage of the training distribution, reducing overfitting and improving generalization to test-time variations in image orientation, lighting, and scale.

Vision Transformer Architecture (+4.4 percentage points over CNN): Comparing ViT-Small with EfficientNet-B2 trained under identical conditions (same loss function, augmentation, optimizer, and training duration) reveals a 4.4 percentage point advantage for the transformer architecture (0.9015 vs. 0.8574 macro-F1). This advantage likely reflects ViT's global attention mechanism, which captures long-range spatial dependencies and contextual relationships between wound regions that local convolution operations may miss.

Mobile application
We deployed the same ViT-Small model in an iOS application without further compression or distillation, with weights stored at the FP16 precision described in Methods (Figure 9). We deliberately chose to preserve full accuracy (90.15% macro-F1) rather than reduce model size. Although EfficientNet-B0 would be smaller (7.8 MB vs. 41 MB), its accuracy is substantially lower (83.46% macro-F1). For clinical applications, where classification errors have real consequences, we believe accuracy takes priority over storage (11).

Table 7. Mobile application specifications (tested on iPhone 12 Pro).

Specification    Value         Notes
Model            ViT-Small     22M parameters
Model size       41 MB         CoreML .mlpackage
Inference time   50–80 ms      Apple Neural Engine
Frame rate       12–20 FPS     Every 10th frame
Memory           200–300 MB    Peak usage
Macro-F1         90.15%        Identical to research model

The application includes clinical workflow features: structured assessment forms aligned with IWGDF guidelines (9, 10), patient management, referral generation with AI-derived urgency prioritization, and an offline-first architecture for connectivity-limited settings.
Discussion

This study demonstrates a complete pipeline from model development through external validation to mobile deployment for DFU classification. Using the publicly available DFUC2021 dataset, we trained a Vision Transformer that achieved 90.15% macro-F1 on held-out evaluation, validated its generalizability on an external dataset (ROC-AUC 0.951), and deployed the full-accuracy model in an iOS application.

Our results compare favorably with prior work on automated DFU analysis, although reported performance varies with task and dataset. Goyal et al. achieved 80% accuracy using CNNs for DFU detection (28), while Alzubaidi et al. reported F1 scores ranging from 0.72 to 0.85 for wound segmentation and classification tasks (29). In the DFUC2021 challenge, which used the same dataset as our study, submissions achieved macro-F1 scores of up to 0.63 using ensemble approaches (16). Our single ViT-Small model's macro-F1 of 0.90 suggests that transformer architectures may offer advantages for this task, although direct comparison is limited by differences in evaluation protocols; in particular, our test set is a held-out split of the 5,955 labeled images rather than the challenge's official test set. Notably, few prior studies have addressed the complete pipeline from model development through external validation to mobile deployment, which we consider essential for clinical translation.

Several findings have practical implications for DFU classifier development. First, addressing class imbalance is essential. The DFUC2021 dataset reflects real-world epidemiology, with ischaemia representing only 3.8% of cases (7). Removing the weighted loss reduced macro-F1 by 2.9 percentage points, a loss concentrated in the minority classes. Since ischaemia detection triggers vascular intervention that may prevent amputation, any practical DFU classifier must handle imbalance explicitly (19, 20).

Second, the Vision Transformer architecture appears well-suited to this task.
Under identical training conditions, ViT-Small outperformed EfficientNet-B2 by 4.4 percentage points. We hypothesize that this reflects the global attention mechanism's ability to integrate information across the entire wound image (12–14). Clinical DFU assessment involves relating the wound bed, surrounding tissue, and anatomical context, relationships that self-attention can model directly.

Third, the dominant error pattern, Normal↔Infection confusion, reflects a genuine clinical challenge. Early infection and normal healing both involve inflammatory responses, and distinguishing them visually is difficult even for specialists (9, 10). This suggests that single-image visual classification may be approaching its performance ceiling; further improvement might require additional modalities such as thermal imaging, patient history, laboratory markers, or longitudinal imaging.

External validation showed strong ranking ability (AUC 0.951) despite distribution shift, suggesting that the learned representations generalize across datasets (30, 31). However, the conservative behavior (high specificity, moderate sensitivity) indicates that threshold calibration would be needed for deployment in different clinical settings (24).

For mobile deployment, we prioritized accuracy over model size. Smaller models would reduce storage requirements, but the accuracy gap (83.46% macro-F1 for EfficientNet-B0 vs. 90.15% for ViT-Small) is substantial for medical applications (11). Modern smartphones handle 41 MB models without difficulty, and 50–80 ms inference is sufficient for real-time use.

This study has several limitations. First, the training data come from a single publicly available dataset with specific imaging protocols and likely limited geographic diversity, which may limit generalizability to other clinical settings and patient populations. Second, external validation used a dataset with only binary labels, preventing direct assessment of multi-class generalization to the infection and ischaemia categories.
Third, we did not perform temporal or longitudinal validation to assess model stability across time periods or wound progression stages. Fourth, we did not compare model performance against clinician diagnostic accuracy. Such a comparison would be valuable for understanding the clinical utility of AI-assisted diagnosis. Fifth, the interpretability analysis is qualitative, and formal validation of attention patterns against expert annotations remains future work. Finally, and most importantly, prospective clinical validation with patient outcome tracking is needed before deployment. Only such validation can establish whether improved classification accuracy translates into better clinical decisions and patient outcomes.

Future Work
The logical next step is prospective clinical validation comparing AI-assisted diagnosis against clinician performance. We plan to conduct studies in which primary care providers assess DFU images both with and without AI support, measuring diagnostic accuracy, confidence, and time-to-decision against specialist ground truth. Such a head-to-head comparison would establish whether the system genuinely augments clinical judgment or merely provides redundant information. Randomized controlled trials could then evaluate downstream patient outcomes to determine whether improved classification accuracy translates into meaningful clinical benefit. Relevant outcomes include time to appropriate treatment, ulcer healing rates, amputation incidence, and healthcare utilization.

Beyond diagnostic support, we envision an integrated referral system connecting different levels of care. In this model, community health workers or primary care providers would capture wound images using the mobile application and receive AI-generated classifications and urgency scores. They would then transmit structured referrals to specialists when indicated.
The system would facilitate triage by prioritizing patients with detected ischaemia or combined pathology for urgent vascular consultation, while routing uncomplicated infected ulcers to appropriate antimicrobial management. Such a tiered referral pathway could be particularly valuable in resource-limited settings where specialist access is constrained. It would enable earlier intervention for high-risk cases while reducing unnecessary referrals for routine wounds.

Reliable deployment across diverse clinical environments will require further technical development. Multi-center validation across geographic regions, patient populations, and imaging equipment would identify domain shifts that require adaptation. Standardized annotation protocols developed collaboratively across institutions would reduce inter-rater variability and enable consistent model updates. Post-hoc calibration methods such as temperature scaling (24) could make predicted probabilities more reliable. Decision thresholds could then be tuned to institution-specific referral criteria. Ultimately, the goal is a robust clinical decision support system that assists rather than replaces clinician judgment, providing reliable second opinions that improve care quality, particularly where specialist expertise is scarce.

Conclusions
We developed and validated a Vision Transformer-based DFU classification system that achieved 90.15% macro-F1, with strong performance on the minority classes critical for clinical decision-making. External validation on an independent, binary-labeled dataset (ROC-AUC 0.951) supported cross-dataset generalizability. By deploying the identical full-accuracy model in an iOS application without compression, the system maintains research-grade performance for point-of-care clinical use. This work demonstrates a pathway from deep learning development through rigorous validation to practical mobile deployment for medical imaging applications.
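The Discussion reports that, without weighted loss, the model essentially ignored the minority Ischaemia class. The weighting formula itself is not given in this section. The sketch below is a minimal illustration under stated assumptions, not the authors' training code:

- It assumes one common scheme, inverse-frequency weights w_c = N / (K · n_c).
- It uses hypothetical class counts, chosen to sum to 5,955 and to put Ischaemia near 3.8%.
- It normalizes the loss by the summed weights of the batch. This is the convention PyTorch's `CrossEntropyLoss(weight=...)` follows with mean reduction.

Pure Python is used so the arithmetic stays visible.

```python
import math

CLASSES = ["Normal", "Infection", "Ischaemia", "Both"]

def inverse_frequency_weights(counts):
    """Assumed scheme: w_c = N / (K * n_c), so every class has the same total weight."""
    total = sum(counts)
    k = len(counts)
    return [total / (k * n) for n in counts]

def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def weighted_cross_entropy(batch_logits, targets, weights):
    """Sum of -w_y * log p_y over the batch, divided by the sum of w_y
    (mean reduction with class weights, as in PyTorch's CrossEntropyLoss)."""
    numerator = 0.0
    denominator = 0.0
    for logits, y in zip(batch_logits, targets):
        numerator += -weights[y] * log_softmax(logits)[y]
        denominator += weights[y]
    return numerator / denominator

# Hypothetical counts for illustration only: they sum to 5,955 and Ischaemia is about 3.8%.
counts = [2552, 2555, 227, 621]
weights = inverse_frequency_weights(counts)
print({c: round(w, 3) for c, w in zip(CLASSES, weights)})
# {'Normal': 0.583, 'Infection': 0.583, 'Ischaemia': 6.558, 'Both': 2.397}

# A toy batch in which the model predicts "Normal" for both a Normal and an Ischaemia image.
batch = [[2.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
targets = [0, 2]
plain = weighted_cross_entropy(batch, targets, [1.0] * 4)
weighted = weighted_cross_entropy(batch, targets, weights)
print(f"unweighted={plain:.3f} weighted={weighted:.3f}")
```

In the toy batch, the misclassified Ischaemia image dominates the weighted loss. Its gradient therefore pushes the model toward the minority class much harder than the unweighted loss would. Focal loss (19) is an alternative that reweights by example difficulty rather than class frequency.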
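External validation combined a high ROC-AUC with high specificity but only moderate sensitivity. The sketch below shows how these can coexist and how a site might re-pick its operating point. It rests on three assumptions:

- The binary "abnormal" score is taken to be something like 1 − P(Normal). The mapping actually used for DFU_Kaggle is not stated in this section.
- The scores are made up for illustration.
- The threshold is chosen by maximizing Youden's J = sensitivity + specificity − 1. This is only one option; a referral program might instead require a minimum sensitivity.

Any such threshold should be chosen on local validation data, not on the test set used to report results.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: chance a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold):
    """Predict 'abnormal' when score >= threshold; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg

def youden_threshold(scores, labels):
    """Try each observed score as a cut-off; keep the first one maximising J = sens + spec - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical "abnormal" scores (e.g. 1 - P(Normal)); label 1 = abnormal, 0 = normal.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(roc_auc(scores, labels))                       # 1.0
print(sensitivity_specificity(scores, labels, 0.5))  # (0.25, 1.0)
print(youden_threshold(scores, labels))              # (0.35, 1.0)
```

With these toy scores the ranking is perfect (AUC 1.0). Even so, the default 0.5 cut-off catches only one of four positives. Lowering the threshold to 0.35 recovers full sensitivity without losing specificity. This is the kind of site-specific recalibration the Discussion calls for.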
Abbreviations
DFU: Diabetic foot ulcer
ViT: Vision Transformer
CNN: Convolutional neural network
AUC: Area under the receiver operating characteristic curve
CI: Confidence interval
IWGDF: International Working Group on the Diabetic Foot
FPS: Frames per second

Declarations
Acknowledgements
The authors thank the creators of the DFUC2021 and DFU_Kaggle datasets for making their data publicly available for research.
Authors' contributions
P.T.N.H. conceived and designed the study, developed and trained the deep learning models, performed all experiments and statistical analyses, and wrote the manuscript. P.T.N.H. and B.T.N.H. developed the iOS mobile application. N.Y. supervised the study. P.T.N.H., T.V., R.D., B.T.N.H., and N.Y. reviewed and edited the manuscript. All authors critically revised and approved the final version of the manuscript.
Funding
This research received no specific funding from any agency in the public, commercial, or not-for-profit sectors.
Data availability
The DFUC2021 dataset is publicly available from https://dfu-challenge.github.io/. The DFU_Kaggle dataset is available at https://www.Kaggle.com/datasets/laithjj/diabetic-foot-ulcer-dfu. Training code, model weights, and iOS application source code are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
This study used publicly available, anonymized datasets (DFUC2021 and DFU_Kaggle). No human subjects were directly involved, and no new data were collected. Because this was a retrospective analysis of publicly available, anonymized data, institutional review board approval was not required.
Consent for publication
Not applicable. This study used only publicly available anonymized datasets.
Relevant guidelines and regulations
Not applicable.
Competing interests
The authors declare no competing interests.
Author details
1 Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Osaka, Settsu, 566-0002, Japan.
2 Institute of Medical and Pharmaceutical Education, Thu Dau Mot University, 06 Tran Van On, Phu Hoa Ward, Thu Dau Mot, Binh Duong, Vietnam.
3 Department of Diagnostic Imaging, Children’s Hospital 2, 14 Ly Tu Trong, Ben Nghe Ward, District 1, Ho Chi Minh, Viet Nam

References
1. Armstrong DG, Boulton AJM, Bus SA. Diabetic Foot Ulcers and Their Recurrence. New England Journal of Medicine. 2017;376(24):2367–75. https://doi.org/10.1056/NEJMra1615439
2. McDermott K, Fang M, Boulton AJM, Selvin E, Hicks CW. Etiology, Epidemiology, and Disparities in the Burden of Diabetic Foot Ulcers. Diabetes Care. 2022;46(1):209–21. https://doi.org/10.2337/dci22-0043
3. Singh N, Armstrong DG, Lipsky BA. Preventing Foot Ulcers in Patients With Diabetes. JAMA. 2005;293(2):217–28. https://doi.org/10.1001/jama.293.2.217
4. Lipsky BA, Senneville É, Abbas ZG, Aragón-Sánchez J, Diggle M, Embil JM, et al. Guidelines on the diagnosis and treatment of foot infection in persons with diabetes (IWGDF 2019 update). Diabetes/Metabolism Research and Reviews. 2020;36(S1):e3280. https://doi.org/10.1002/dmrr.3280
5. Edmonds M, Manu C, Vas P. The current burden of diabetic foot disease. Journal of Clinical Orthopaedics and Trauma. 2021;17:88–93. https://doi.org/10.1016/j.jcot.2021.01.017
6. Armstrong DG, Tan TW, Boulton AJM, Bus SA. Diabetic Foot Ulcers: A Review. JAMA. 2023;330(1):62–75. https://doi.org/10.1001/jama.2023.10578
7. Mills JL, Conte MS, Armstrong DG, Pomposelli FB, Schanzer A, Sidawy AN, et al. The Society for Vascular Surgery Lower Extremity Threatened Limb Classification System: Risk stratification based on Wound, Ischemia, and foot Infection (WIfI). Journal of Vascular Surgery. 2014;59(1):220–34.e2. https://doi.org/10.1016/j.jvs.2013.08.003
8. Conte MS, Bradbury AW, Kolh P, White JV, Dick F, Fitridge R, et al. Global vascular guidelines on the management of chronic limb-threatening ischemia. Journal of Vascular Surgery. 2019;69(6, Supplement):3S–125S.e40. https://doi.org/10.1016/j.jvs.2019.02.016
9. Bus SA, Lavery LA, Monteiro-Soares M, Rasmussen A, Raspovic A, Sacco ICN, et al. Guidelines on the prevention of foot ulcers in persons with diabetes (IWGDF 2019 update). Diabetes/Metabolism Research and Reviews. 2020;36(S1):e3269. https://doi.org/10.1002/dmrr.3269
10. Schaper NC, van Netten JJ, Apelqvist J, Bus SA, Hinchliffe RJ, Lipsky BA, et al. Practical Guidelines on the prevention and management of diabetic foot disease (IWGDF 2019 update). Diabetes/Metabolism Research and Reviews. 2020;36(S1):e3266. https://doi.org/10.1002/dmrr.3266
11. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Medicine. 2019;17(1):195. https://doi.org/10.1186/s12916-019-1426-2
12. Dosovitskiy A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. 2020.
13. Shamshad F, Khan S, Zamir SW, Khan MH, Hayat M, Khan FS, et al. Transformers in medical imaging: A survey. Medical Image Analysis. 2023;88:102802. https://doi.org/10.1016/j.media.2023.102802
14. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.
15. Yap MH, Cassidy B, Pappachan JM, O'Shea C, Gillespie D, Reeves ND. Analysis Towards Classification of Infection and Ischaemia of Diabetic Foot Ulcers. 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). 2021:1–4.
16. Cassidy B, Kendrick C, Reeves ND, Pappachan JM, O’Shea C, Armstrong DG, et al. Diabetic Foot Ulcer Grand Challenge 2021: Evaluation and Summary. Cham: Springer International Publishing; 2022.
17. Tan M, Le Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: Chaudhuri K, Salakhutdinov R, editors. Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research. PMLR; 2019. p. 6105–14.
18. Loshchilov I, Hutter F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101. 2017.
19. Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal Loss for Dense Object Detection. 2017 IEEE International Conference on Computer Vision (ICCV); 22–29 Oct 2017.
20. Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. Journal of Big Data. 2019;6(1):27. https://doi.org/10.1186/s40537-019-0192-5
21. Japkowicz N, Shah M. Evaluating Learning Algorithms: A Classification Perspective. Cambridge: Cambridge University Press; 2011.
22. Cohen J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement. 1960;20(1):37–46. https://doi.org/10.1177/001316446002000104
23. Brier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Review. 1950;78(1):1–3. https://doi.org/10.1175/1520-0493(1950)0782.0.CO;2
24. Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. Proceedings of the 34th International Conference on Machine Learning - Volume 70; Sydney, NSW, Australia: JMLR.org; 2017. p. 1321–30.
25. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman and Hall/CRC; 1994.
26. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12(2):153–7. https://doi.org/10.1007/BF02295996
27. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision (ICCV); 22–29 Oct 2017.
28. Goyal M, Reeves ND, Rajbhandari S, Ahmad N, Wang C, Yap MH. Recognition of ischaemia and infection in diabetic foot ulcers: Dataset and techniques. Computers in Biology and Medicine. 2020;117:103616. https://doi.org/10.1016/j.compbiomed.2020.103616
29. Alzubaidi L, Fadhel MA, Oleiwi SR, Al-Shamma O, Zhang J. DFU_QUTNet: diabetic foot ulcer classification using novel deep convolutional neural network. Multimedia Tools and Applications. 2020;79(21):15655–77. https://doi.org/10.1007/s11042-019-07820-w
30. Quiñonero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND. Dataset Shift in Machine Learning. The MIT Press; 2009.
31. Subbaswamy A, Saria S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics. 2019;21(2):345–52. https://doi.org/10.1093/biostatistics/kxz041
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8313326","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":557373003,"identity":"003964c4-b097-454f-85d0-b30f02065979","order_by":0,"name":"Phap Tran Ngoc Hoang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA40lEQVRIiWNgGAWjYPCCAwwM7A0Mh6G8BLxqeeBaeA6QrEUigYGZKBfZSx9+9oBxxx15c8m3Bw8X7mCw62dgePYAry18aeYGjGeeGe6cnZdweOYZhuSZDQzpBni18DCYSf9tO8y44XaOwWHeNoZkgwMMaRL4tbB/k2BsO2y/4eYZorXwmIG0JG64wQPWYkdYyxmeMgnGM4eTd/YAHTazTSJBspmAX9h72LdJMO44bLud/Yzx58I2G3t+9p60B/i0gAFjAwMD1GCJxAZmnjSCOpC1MNgDbT5GWMsoGAWjYBSMJAAAUT1Hxjaln/QAAAAASUVORK5CYII=","orcid":"","institution":"Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Osaka, Settsu, 566-0002, Japan","correspondingAuthor":true,"prefix":"","firstName":"Phap","middleName":"Tran Ngoc","lastName":"Hoang","suffix":""},{"id":557373112,"identity":"5070500b-cccc-472c-8e57-ed7477382497","order_by":1,"name":"Thien Vu","email":"","orcid":"","institution":"Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Osaka, Settsu, 566-0002, Japan","correspondingAuthor":false,"prefix":"","firstName":"Thien","middleName":"","lastName":"Vu","suffix":""},{"id":557373113,"identity":"da7df857-d80b-4415-bba8-eae13403d556","order_by":2,"name":"Research Dawadi","email":"","orcid":"","institution":"Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, 
Health and Nutrition, 3-17 Senrioka-shinmachi, Osaka, Settsu, 566-0002, Japan","correspondingAuthor":false,"prefix":"","firstName":"Research","middleName":"","lastName":"Dawadi","suffix":""},{"id":557373114,"identity":"bdee4312-55b6-4479-8667-05b5137dc760","order_by":3,"name":"Bao Tran Ngoc Hoang","email":"","orcid":"","institution":"3Department of Diagnostic Imaging, Children’s Hospital 2, 14 Ly Tu Trong, Ben Nghe Ward, District 1, Ho Chi Minh, Viet Nam","correspondingAuthor":false,"prefix":"","firstName":"Bao","middleName":"Tran Ngoc","lastName":"Hoang","suffix":""},{"id":557373115,"identity":"c05803ba-b156-453e-b07d-912c070c0b69","order_by":4,"name":"Natusme-Kitatani Yayoi","email":"","orcid":"","institution":"Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Osaka, Settsu, 566-0002, Japan","correspondingAuthor":false,"prefix":"","firstName":"Natusme-Kitatani","middleName":"","lastName":"Yayoi","suffix":""}],"badges":[],"createdAt":"2025-12-09 05:30:45","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":true,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":true},"doi":"10.21203/rs.3.rs-8313326/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8313326/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":97873575,"identity":"f919cf32-4caf-44fa-ac3f-626883191731","added_by":"auto","created_at":"2025-12-10 
10:44:40","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":6505466,"visible":true,"origin":"","legend":"","description":"","filename":"DFUPaperBMCFormatNoTrackReview.docx","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/e32e8795fe93744e9f913de4.docx"},{"id":97873562,"identity":"d32e774e-2ce8-4ddd-b6c3-c3ca4ece1eaf","added_by":"auto","created_at":"2025-12-10 10:44:40","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":342,"visible":true,"origin":"","legend":"","description":"","filename":"rs8313326.json","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/9dbfae8071aabe4c31fb2fb6.json"},{"id":97873566,"identity":"ab50d47c-8f4e-4728-b324-a5ca945f7959","added_by":"auto","created_at":"2025-12-10 10:44:40","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":105788,"visible":true,"origin":"","legend":"","description":"","filename":"rs83133260enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/b6f447b10f225053a42c84f8.xml"},{"id":97900101,"identity":"8db31042-bc88-4fe2-acfb-38364b9e81c5","added_by":"auto","created_at":"2025-12-10 15:45:13","extension":"png","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":190081,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/7a2c85905932c99ffbc09681.png"},{"id":97900940,"identity":"3abdb1dc-ab15-41a0-a80a-2b77db0928c4","added_by":"auto","created_at":"2025-12-10 
15:46:09","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":279317,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/9c8edf9c1c55ee8205313575.png"},{"id":97898460,"identity":"d601a097-6cbb-4b57-99c1-32749583de34","added_by":"auto","created_at":"2025-12-10 15:39:11","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":154894,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/6fa1423cfb13ff95165c8639.png"},{"id":97873580,"identity":"7728c552-49c2-4121-815c-0af42122cca9","added_by":"auto","created_at":"2025-12-10 10:44:40","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":100390,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/5a481db97813a174ca8ce5a2.png"},{"id":97900921,"identity":"0ef23808-10c2-412f-8a8c-0a3bb51fa1da","added_by":"auto","created_at":"2025-12-10 15:46:05","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":180633,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/0562f09b6d1ae80805579c92.png"},{"id":97900216,"identity":"365a344c-cfd6-4079-8e24-d46834cbeba9","added_by":"auto","created_at":"2025-12-10 
15:45:18","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":335356,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/ed8c3c5a7fffeea48ec20a8b.png"},{"id":97873587,"identity":"cf1c7881-c737-4c1f-86a4-f7bf45c92e97","added_by":"auto","created_at":"2025-12-10 10:44:41","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2296469,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/4142a6363951e00984c53393.png"},{"id":97900164,"identity":"04c62c18-a71a-476f-a3c5-29d1cf467f4d","added_by":"auto","created_at":"2025-12-10 15:45:16","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2618324,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/1b61c8c3aa0721d28afa38c4.png"},{"id":97900712,"identity":"c650792d-4e46-4bdb-bc69-0cfecbbedb61","added_by":"auto","created_at":"2025-12-10 15:45:46","extension":"jpeg","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":281858,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage9.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/745315684c9fec7d39806327.jpeg"},{"id":97900938,"identity":"adbe1a69-121e-4b1a-a848-286c709dc15b","added_by":"auto","created_at":"2025-12-10 
15:46:09","extension":"png","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":40730,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/0ad0a7b227e60a87cec4bbe0.png"},{"id":97898511,"identity":"d4b3b117-4b9b-442d-a885-72cf75d2518b","added_by":"auto","created_at":"2025-12-10 15:39:14","extension":"png","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":37600,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/b74de650a2d740b76b6ca717.png"},{"id":97899779,"identity":"dd987bd1-d924-43a9-9c53-f553ade92665","added_by":"auto","created_at":"2025-12-10 15:44:52","extension":"png","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":33942,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/351bea31c61289a2cd795655.png"},{"id":97900136,"identity":"bcb1bcea-27d8-464a-b64a-6b260ad54109","added_by":"auto","created_at":"2025-12-10 15:45:16","extension":"png","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":21115,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/b5d6e93e91772a81f942c246.png"},{"id":97900701,"identity":"253463e3-fa18-4f8d-a85b-42380e47f8ae","added_by":"auto","created_at":"2025-12-10 
15:45:45","extension":"png","order_by":16,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":24361,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/93a1c14451791b5be56f45a6.png"},{"id":97873591,"identity":"3b91748f-fd29-42f8-8294-3f1567fda9f5","added_by":"auto","created_at":"2025-12-10 10:44:41","extension":"png","order_by":17,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":74303,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/5c83fb14bd7d78281490a3dc.png"},{"id":97900727,"identity":"868af2f0-500b-492c-8344-727668841fd6","added_by":"auto","created_at":"2025-12-10 15:45:47","extension":"png","order_by":18,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":255336,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/17752d24bf6fd3957841f9df.png"},{"id":97900607,"identity":"43230c26-58ff-4e4f-a758-c6292004d275","added_by":"auto","created_at":"2025-12-10 15:45:40","extension":"png","order_by":19,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":371904,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/5425098455b7a0e43419c7ff.png"},{"id":97900503,"identity":"d4e4c9d4-1f1e-483f-9f6a-15656c82c904","added_by":"auto","created_at":"2025-12-10 
15:45:35","extension":"png","order_by":20,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":110197,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/72f29795ce56c91a31f70802.png"},{"id":97873590,"identity":"8b20fc51-07e1-4dc5-9fcc-005b63bba58f","added_by":"auto","created_at":"2025-12-10 10:44:41","extension":"xml","order_by":21,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":103574,"visible":true,"origin":"","legend":"","description":"","filename":"rs83133260structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/f12e31ea24ab86c8b9ad8ed8.xml"},{"id":97873593,"identity":"58fc223e-5791-4670-8989-eb9742197ad5","added_by":"auto","created_at":"2025-12-10 10:44:41","extension":"html","order_by":22,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":114347,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/037750a03f9cf23c4a2b2789.html"},{"id":97899744,"identity":"70d2a9fd-62e1-4435-8b50-03047ae68a92","added_by":"auto","created_at":"2025-12-10 15:44:52","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":190081,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eDFUC2021 class distribution showing severe class imbalance. 
Ischaemia represents only 3.8% of samples.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/3521bb7a0d2c696669edb0b9.png"},{"id":97873563,"identity":"291917de-a6cd-4734-a0fe-e9fd3167eb07","added_by":"auto","created_at":"2025-12-10 10:44:40","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":279317,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eTraining dynamics showing loss curves and validation macro-F1 across epochs. Best checkpoint at epoch 20.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/0f23aa835d17b6b2ea2c572d.png"},{"id":97899727,"identity":"aa948f6c-528f-415d-bd0b-2e7a0c2ca752","added_by":"auto","created_at":"2025-12-10 15:44:51","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":154894,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eModel performance comparison across architectures.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/06c4fc7ec149bc0f9050bd86.png"},{"id":97899551,"identity":"9634cf29-ae65-4f40-beaf-69d7f296d17e","added_by":"auto","created_at":"2025-12-10 15:44:43","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":100390,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003ePer-class performance visualization.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/ce4156993d7fe2bd1f63cc1a.png"},{"id":97900221,"identity":"5579adf8-efa1-4cbc-9267-6ec663a77b03","added_by":"auto","created_at":"2025-12-10 15:45:18","extension":"png","order_by":5,"title":"Figure 
5","display":"","copyAsset":false,"role":"figure","size":180633,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eROC curves for each class. Macro-AUC: 0.9834.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/5ddfeccf8032c2f95e57799d.png"},{"id":97873571,"identity":"e3970711-ad04-47d4-a1ac-4e4c8aba45a2","added_by":"auto","created_at":"2025-12-10 10:44:40","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":335356,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eNormalized confusion matrix showing error distribution across classes.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/48f2548bd9785f1bec1a7cbc.png"},{"id":97900181,"identity":"81404588-21f8-4ce4-ba57-2470d45cb7ec","added_by":"auto","created_at":"2025-12-10 15:45:17","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":2296469,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eRepresentative misclassifications with prediction confidence scores.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/0f900615cb7f2cd0007cbcd1.png"},{"id":97873578,"identity":"0944cc0d-8aab-40c1-92c2-7bff16eff0f8","added_by":"auto","created_at":"2025-12-10 10:44:40","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":2618324,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eGrad-CAM attention maps showing model focus regions for each 
class.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/32a46aec5cc842ed207416e7.png"},{"id":97873586,"identity":"b7e054a4-7271-4aca-8193-0f9b2bcd179e","added_by":"auto","created_at":"2025-12-10 10:44:40","extension":"jpeg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":281858,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eiOS application interface. (A) Region selection. (B) AI classification with confidence. (C) Clinical assessment. (D) Referral management.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage9.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/f090f4b1f9c43cf23e2c9746.jpeg"},{"id":98420963,"identity":"585742db-eca3-45cc-b610-fdc28e706a9a","added_by":"auto","created_at":"2025-12-17 16:21:12","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":9483321,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8313326/v1/dbb4dd0f-c2dd-423c-a3fa-35a2e5a075c5.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eVision Transformer-based diabetic foot ulcer classification for mobile deployment: development, validation, and implementation of an iOS clinical decision support tool\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Background","content":"\u003cp\u003eDiabetic foot ulcers (DFUs) represent one of the most serious complications of diabetes mellitus. 
They affect 15\u0026ndash;25% of diabetic patients during their lifetime, with substantial impact on quality of life and healthcare costs (\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e). The clinical consequences are severe: approximately 50\u0026ndash;60% of DFUs become infected, and 15\u0026ndash;20% of moderate to severe infections result in lower extremity amputation (\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e). These amputations carry devastating consequences, with 5-year mortality rates exceeding 50% post-amputation, comparable to many aggressive malignancies (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe clinical classification of DFUs requires distinguishing between several pathological states: infection (characterized by inflammatory signs, purulent discharge, and tissue destruction), ischaemia (resulting from peripheral arterial disease with manifestations including pallor, delayed capillary refill, and tissue necrosis), and combined pathology where both conditions coexist (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). However, visual differentiation between early infection and normal healing tissue presents significant challenges even for experienced clinicians, as inflammatory responses in healing wounds can mimic early infection (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e). 
Similarly, subtle ischaemic changes\u0026mdash;often manifesting as mild pallor, temperature differences, or delayed blanching\u0026mdash;may be overlooked in routine clinical assessments (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eMobile-based clinical decision support could help address this challenge, particularly in resource-limited settings where specialist access is constrained (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e). A smartphone application capable of real-time DFU classification could assist primary care providers and community health workers in triaging patients appropriately\u0026mdash;identifying those who need urgent specialist referral versus those who can be managed with routine wound care.\u003c/p\u003e\u003cp\u003eVision Transformers (ViTs) have shown promise for medical imaging tasks (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e). Unlike convolutional neural networks that rely on local receptive fields, ViTs use self-attention to capture global dependencies across image regions (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e). We hypothesized this architecture might benefit DFU classification, where relationships between wound bed, surrounding tissue, and anatomical context inform clinical assessment.\u003c/p\u003e\u003cp\u003eIn this study, we aimed to develop and validate a DFU classification system suitable for mobile deployment. 
Our objectives were: (1) train a Vision Transformer model on the DFUC2021 dataset with rigorous held-out evaluation; (2) validate generalizability on an independent external dataset; (3) understand which training components contribute most through ablation studies; (4) assess probability calibration for clinical decision support; and (5) deploy the full-accuracy model in an iOS application without sacrificing performance for reduced model size.\u003c/p\u003e"},{"header":"Methods","content":"\u003ch2\u003eDatasets\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eTraining and evaluation dataset (DFUC2021):\u0026nbsp;\u003c/strong\u003eThe Diabetic Foot Ulcer Challenge 2021 dataset served as the primary development and evaluation resource. This dataset was curated by Cassidy et al. for the MICCAI 2021 Grand Challenge on diabetic foot ulcer classification (15, 16). 
The dataset contains 5,955 labeled DFU images distributed across four mutually exclusive categories: \u003cem\u003eNormal\u0026nbsp;\u003c/em\u003e(n=2,552, 42.9%), representing ulcers without clinical signs of infection or ischaemia; \u003cem\u003eInfection\u0026nbsp;\u003c/em\u003e(n=2,555, 42.9%), representing ulcers with clinical signs of infection such as erythema, edema, warmth, purulent discharge, or malodor; \u003cem\u003eIschaemia\u0026nbsp;\u003c/em\u003e(n=227, 3.8%), representing ulcers with signs of compromised perfusion such as pallor, delayed capillary refill, or necrotic tissue; and \u003cem\u003eBoth\u0026nbsp;\u003c/em\u003e(n=621, 10.4%), representing ulcers with combined infection and ischaemia (Figure 1).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExternal validation dataset (DFU_Kaggle):\u0026nbsp;\u003c/strong\u003eFor external validation, we used the publicly available DFU_Kaggle dataset, which contains 1,055 images with binary labels (543 Normal, 512 Ulcer). This dataset differs from DFUC2021 in imaging protocols, patient demographics, and annotation criteria. Notably, the \u0026quot;Ulcer\u0026quot; category in DFU_Kaggle pools heterogeneous presentations, including ischaemic ulcers and ulcers at different stages, without fine-grained infection/ischaemia labels. This heterogeneity provides a stringent test of model generalizability across annotation schemas.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData partitioning and preprocessing\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe DFUC2021 dataset was partitioned into training, validation, and test sets using stratified random sampling to maintain class proportions across splits. We used a 70/15/15 split ratio: training set (n=4,169), validation set (n=893), and test set (n=893). Stratification ensured that each split contained representative samples of all four classes. This is critical given the severe class imbalance: Ischaemia makes up only 3.8% of images. 
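\u003c/p\u003e\n\u003cp\u003eA stratified 70/15/15 partition of this kind can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the code used in the study. The helper name stratified_split, the fixed seed, and the per-class rounding rule are all hypothetical. Because rounding happens within each class, split sizes can differ by a few images from an exact global 70/15/15 cut.\u003c/p\u003e

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    # Hypothetical helper: returns (train, val, test) lists of image indices,
    # splitting every class separately so class proportions are preserved.
    # The test fraction is implied: whatever remains of each class goes to test.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * fractions[0])
        n_val = round(len(idxs) * fractions[1])
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])
    return train, val, test


# Class counts reported for DFUC2021; one label per image.
counts = {'Normal': 2552, 'Infection': 2555, 'Ischaemia': 227, 'Both': 621}
labels = [name for name, k in counts.items() for _ in range(k)]
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

\u003cp\u003eWith these rounding rules, the Ischaemia class contributes 34 images to the test split, the same count as in Table 3. Library routines such as scikit-learn train_test_split with its stratify argument implement the same idea.\u003c/p\u003e\n\u003cp\u003e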
No patient-level overlap or duplicate images existed across partitions; each unique image appeared in exactly one split.\u003c/p\u003e\n\u003cp\u003eThe test set (subsequently adjusted to n=894 after final sampling) was held out completely throughout model development and hyperparameter tuning. All architecture selection, hyperparameter optimization, and ablation decisions were made exclusively using the training and validation sets. The test set was evaluated only once, with the final selected model configuration. This prevents overfitting to the test data through repeated evaluation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImage Preprocessing:\u0026nbsp;\u003c/strong\u003eAll images were resized to 224\u0026times;224 pixels using bilinear interpolation to match the Vision Transformer input size. Pixel values were scaled to the [0, 1] range and standardized using ImageNet statistics (mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]) for compatibility with the pretrained weights. No dataset-specific normalization was applied, to preserve transferability.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Augmentation:\u0026nbsp;\u003c/strong\u003eTraining data underwent on-the-fly augmentation to broaden coverage of the training distribution and reduce overfitting. The augmentation pipeline included random horizontal flips (p=0.5), random rotations of up to \u0026plusmn;15 degrees, random scale variations (0.9\u0026ndash;1.1\u0026times;), color jitter (brightness \u0026plusmn;0.2, contrast \u0026plusmn;0.2, saturation \u0026plusmn;0.2, hue \u0026plusmn;0.1), and random cropping with padding. 
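\u003c/p\u003e\n\u003cp\u003eThe normalization and augmentation settings above can be written down as a small dependency-free sketch. The ImageNet mean and standard deviation are the values given in the text, and the helper names are hypothetical. The jitter ranges assume torchvision-style ColorJitter semantics, which the paper does not state explicitly: brightness 0.2 means a multiplicative factor drawn from [0.8, 1.2], and hue 0.1 means a hue shift drawn from [\u0026minus;0.1, 0.1]. Random cropping with padding is omitted because its padding size is not reported.\u003c/p\u003e

```python
import random

# ImageNet statistics quoted in the Image Preprocessing paragraph.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    # Scale 8-bit RGB values to [0, 1], then standardize each channel.
    return tuple((v / 255.0 - m) / s
                 for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def sample_augmentation(rng):
    # Hypothetical helper: draws one set of training-time augmentation
    # parameters. Validation and test images skip this step entirely.
    return {
        'hflip': rng.random() >= 0.5,             # horizontal flip, p = 0.5
        'rotation_deg': rng.uniform(-15.0, 15.0),
        'scale': rng.uniform(0.9, 1.1),
        'brightness': rng.uniform(0.8, 1.2),      # assumed ColorJitter semantics
        'contrast': rng.uniform(0.8, 1.2),
        'saturation': rng.uniform(0.8, 1.2),
        'hue_shift': rng.uniform(-0.1, 0.1),
    }


print(normalize_pixel((124, 116, 104)))  # each channel close to 0
print(sample_augmentation(random.Random(42)))
```

\u003cp\u003eThe parameters are redrawn for every image in every epoch, so the model rarely sees exactly the same input twice.\u003c/p\u003e\n\u003cp\u003e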
Augmentations were applied only during training. Validation and test images received only the deterministic preprocessing described above, so that evaluation is reproducible.\u003c/p\u003e\n\u003ch2\u003eModel architecture\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eVision Transformer (ViT-Small)\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe primary model used the Vision Transformer architecture introduced by Dosovitskiy et al. (12). We chose the ViT-Small variant (approximately 22 million parameters), which balances model capacity against computational cost and is suitable for both research evaluation and mobile deployment.\u003c/p\u003e\n\u003cp\u003eThe ViT-Small architecture processes input images by: (1) dividing the 224\u0026times;224 image into non-overlapping 16\u0026times;16 patches, yielding a 14\u0026times;14 grid of 196 patch tokens; (2) linearly projecting each patch into a 384-dimensional embedding; (3) prepending a learnable [CLS] classification token; (4) adding learnable positional embeddings to preserve spatial information; (5) processing the sequence through 12 transformer encoder layers, each containing multi-head self-attention (6 heads) and a feed-forward network with GELU activation; and (6) passing the final [CLS] token representation through a linear layer to produce scores for the 4 output classes.\u003c/p\u003e\n\u003cp\u003eThe model was initialized with weights pretrained on ImageNet-21k (14 million images, 21,843 classes), providing robust low-level visual features and mid-level semantic representations. The final classification head was randomly initialized and trained from scratch for the 4-class DFU classification task. All parameters were fine-tuned during training (full fine-tuning rather than a frozen backbone).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEfficientNet-B2\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor architectural comparison, we trained EfficientNet-B2 under identical conditions (17). 
EfficientNet-B2 contains 9.2 million parameters. It is a strong convolutional baseline that uses compound scaling of depth, width, and resolution. The model was initialized with ImageNet-1k pretrained weights and fine-tuned identically to ViT-Small.\u003c/p\u003e\n\u003ch2\u003eTraining procedure\u003c/h2\u003e\n\u003cp\u003eModels were trained using the AdamW optimizer with decoupled weight decay regularization (18). The initial learning rate was 1\u0026times;10⁻⁴ with a weight decay coefficient of 0.01. The learning rate followed cosine annealing with warm restarts, decaying along a cosine curve within each cycle. This schedule is intended to favor convergence to flatter minima, which are associated with improved generalization.\u003c/p\u003e\n\u003cp\u003eTraining proceeded for a maximum of 25 epochs with early stopping based on validation macro-F1. The checkpoint with the highest validation macro-F1 was selected, typically around epoch 20. Batch size was set to 12 due to GPU memory constraints. Gradient clipping (maximum norm 1.0) guarded against exploding gradients, and dropout with probability 0.1 was applied in the transformer attention layers for regularization.\u003c/p\u003e\n\u003cp\u003eTo address the severe class imbalance, we used weighted cross-entropy loss with class weights inversely proportional to class frequency. Specifically, weights were computed as w\u003csub\u003ec\u003c/sub\u003e = N / (C \u0026times; n\u003csub\u003ec\u003c/sub\u003e), where N is the total number of samples, C is the number of classes, and n\u003csub\u003ec\u003c/sub\u003e is the number of samples in class c (19, 20). Under this weighting, the samples of each class carry the same total weight (N/C) in the loss. This prevents the model from optimizing primarily for the majority classes (Normal, Infection) at the expense of the minority classes (Ischaemia, Both).\u003c/p\u003e\n\u003ch2\u003eEvaluation metrics\u003c/h2\u003e\n\u003cp\u003eThe primary metric was macro-F1 (the unweighted mean of per-class F1 scores), which weights all classes equally regardless of prevalence (21). 
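\u003c/p\u003e\n\u003cp\u003eBoth the class-weighting rule and the macro-F1 definition are easy to check numerically. In this sketch the weights use the full DFUC2021 class counts for illustration. The exact training-split counts are not reported, but stratification keeps the proportions nearly unchanged. The macro-F1 check uses the per-class F1 values from Table 2, and the helper names are hypothetical.\u003c/p\u003e

```python
def inverse_frequency_weights(counts):
    # w_c = N / (C * n_c): N samples in total, C classes, n_c samples in class c.
    total = sum(counts.values())
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


def macro_f1(per_class_f1):
    # Unweighted mean of per-class F1 scores, so every class counts equally.
    return sum(per_class_f1) / len(per_class_f1)


counts = {'Normal': 2552, 'Infection': 2555, 'Ischaemia': 227, 'Both': 621}
weights = inverse_frequency_weights(counts)
for name, w in weights.items():
    # Total weight per class, w_c * n_c, equals N / C = 1488.75 for every class.
    print(f'{name:10s} w = {w:.3f}  w * n = {w * counts[name]:.2f}')

# Per-class F1 from Table 2 (Normal, Infection, Ischaemia, Both).
print(round(macro_f1([0.897, 0.892, 0.882, 0.935]), 4))  # 0.9015
```

\u003cp\u003eThe mean of the four per-class F1 scores reproduces the reported macro-F1 of 0.9015. In PyTorch, a weight vector of this kind is typically passed through the weight argument of torch.nn.CrossEntropyLoss.\u003c/p\u003e\n\u003cp\u003e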
Secondary metrics included macro-AUC (one-vs-rest), Cohen\u0026apos;s kappa for chance-corrected agreement (22), and per-class Brier scores for probability calibration (23, 24).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStatistical Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe computed 95% confidence intervals for macro-F1 and macro-AUC using non-parametric bootstrap resampling with 1,000 iterations. Each bootstrap sample was drawn with replacement from the test set and had the same size as the test set. Interval bounds were the 2.5th and 97.5th percentiles of the bootstrap distribution (percentile method) (25).\u003c/p\u003e\n\u003cp\u003ePairwise model comparison between ViT-Small and EfficientNet-B2 used McNemar\u0026apos;s test for paired binary outcomes (26). This test evaluates whether discordant predictions (cases where one model is correct and the other incorrect) are asymmetrically distributed. The test statistic is \u0026chi;\u0026sup2; = (b \u0026minus; c)\u0026sup2; / (b + c). Here b is the number of cases classified correctly only by ViT-Small, and c is the number classified correctly only by EfficientNet-B2. Under the null hypothesis of equal accuracy, the statistic approximately follows a chi-squared distribution with 1 degree of freedom.\u003c/p\u003e\n\u003cp\u003eStatistical significance was set at \u0026alpha; = 0.05, with p \u0026lt; 0.001 considered highly significant.\u003c/p\u003e\n\u003ch2\u003eModel interpretability\u003c/h2\u003e\n\u003cp\u003eGradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize model attention patterns (27). Grad-CAM weights each feature-map channel by the global average of the gradients of the target class score with respect to that channel\u0026apos;s activations. It then combines the weighted feature maps and applies a ReLU to produce a localization map highlighting discriminative regions. 
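\u003c/p\u003e\n\u003cp\u003eOnce activations and gradients are available, the Grad-CAM steps can be written out directly. This dependency-free sketch assumes they have already been extracted as channel \u0026times; height \u0026times; width arrays; in practice this is usually done with framework hooks. For ViT-Small these would typically be the 196 patch tokens reshaped to a 14 \u0026times; 14 grid with 384 channels, with the [CLS] token dropped. That layout is a common convention, not a detail reported in the paper. The final rescaling to [0, 1] is for display only.\u003c/p\u003e

```python
def grad_cam(activations, gradients):
    # activations, gradients: nested lists indexed [channel][row][col].
    # Step 1: channel weights = global average of the class-score gradients.
    weights = []
    for grad_map in gradients:
        cells = [g for row in grad_map for g in row]
        weights.append(sum(cells) / len(cells))

    # Step 2: weighted sum of activation maps, then ReLU keeps positive evidence.
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w, act_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * act_map[i][j]
    cam = [[max(0.0, v) for v in row] for row in cam]

    # Step 3: rescale to [0, 1] for display as a heat map.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: two channels on a 2 x 2 grid; the second has a negative weight.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]], [[-0.25, -0.25], [-0.25, -0.25]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

\u003cp\u003eIn the toy example, the second channel receives a negative weight. Its activation is therefore suppressed, and only the region highlighted by the first channel survives the ReLU.\u003c/p\u003e\n\u003cp\u003e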
For Vision Transformers, we applied Grad-CAM to the final attention layer, visualizing which image patches contribute most to each classification decision.\u003c/p\u003e\n\u003ch2\u003eMobile deployment\u003c/h2\u003e\n\u003cp\u003eThe trained ViT-Small model was converted to CoreML format (.mlpackage) for iOS deployment using coremltools. The conversion pipeline comprised: (1) tracing the PyTorch model with a representative input; (2) converting to CoreML using the neural network converter with FP16 precision to reduce model size; and (3) validating numerical consistency between PyTorch and CoreML predictions. The final CoreML package size was 41 MB.\u003c/p\u003e\n\u003cp\u003eThe iOS application was developed in Swift, using SwiftUI for the user interface and the Vision framework for camera integration. On-device inference uses the Apple Neural Engine when available, falling back to GPU computation on devices without dedicated neural hardware. The application follows offline-first principles, performing all inference locally so that no network connection is required.\u003c/p\u003e"},{"header":"Results","content":"\u003ch2\u003eTraining Dynamics and Model Convergence\u003c/h2\u003e\n\u003cp\u003eThe ViT-Small model was trained for up to 25 epochs with early stopping based on validation macro-F1. Figure 2 illustrates the training dynamics, showing the evolution of training loss, validation loss, and validation macro-F1 across training epochs. The model converged stably, with training loss decreasing monotonically from an initial value of approximately 1.2 to below 0.1. Validation loss decreased initially and then stabilized around epoch 15. This suggests that later epochs brought little further generalization gain and no substantial overfitting. The validation macro-F1 score increased progressively, reaching its maximum at epoch 20. 
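\u003c/p\u003e\n\u003cp\u003eCheckpoint selection by validation macro-F1 with early stopping can be sketched as below. Only the 25-epoch cap comes from the text. The patience parameter and the macro-F1 trajectory are hypothetical placeholders, since neither is reported.\u003c/p\u003e

```python
def select_best_epoch(val_macro_f1, max_epochs=25, patience=5):
    # Returns (best_epoch, stop_epoch), both 1-indexed. Training stops once
    # patience consecutive epochs pass without a new best validation macro-F1.
    best_epoch, best_score, stale, stop_epoch = 0, float('-inf'), 0, 0
    for epoch, score in enumerate(val_macro_f1[:max_epochs], start=1):
        stop_epoch = epoch
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, stop_epoch


# Hypothetical trajectory: steady gains up to epoch 20, then a plateau.
scores = [0.60 + 0.015 * e for e in range(20)] + [0.880, 0.884, 0.883, 0.881, 0.879]
print(select_best_epoch(scores))              # (20, 25)
print(select_best_epoch(scores, patience=3))  # (20, 23)
```

\u003cp\u003eThe checkpoint carried forward to evaluation is the one from best_epoch, not from the final epoch trained.\u003c/p\u003e\n\u003cp\u003e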
The best checkpoint, selected based on maximum validation macro-F1, was retained for all subsequent evaluations. The gap between training and validation metrics remained modest throughout training, suggesting that the regularization strategies (weight decay, dropout, and data augmentation) effectively controlled overfitting despite the relatively small dataset size.\u003c/p\u003e\n\u003ch2\u003eClassification performance\u003c/h2\u003e\n\u003cp\u003eTable 1 summarizes performance on the held-out test set (n=894). The ViT-Small achieved macro-F1 of 0.9015 (95% CI: 0.871\u0026ndash;0.926) and macro-AUC of 0.9834 (95% CI: 0.977\u0026ndash;0.989). Cohen\u0026apos;s kappa of 0.8356 indicates substantial agreement between predictions and ground truth. The overall Brier score of 0.1519 suggests reasonable probability calibration.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eTable 1. Classification performance on held-out test set (n=894). 95% CIs from 1,000 bootstrap resamples.\u003c/em\u003e\u003c/p\u003e\n\u003cdiv align=\"\"\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMetric\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eValue\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e\u003cstrong\u003e95% CI\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003eMacro-F1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e0.9015\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e0.871\u0026ndash;0.926\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003eMacro-AUC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e0.9834\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e0.977\u0026ndash;0.989\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003eCohen\u0026apos;s \u0026kappa;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e0.8356\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e\u0026mdash;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003eBrier Score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e0.1519\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 33.3333%;\"\u003e\n \u003cp\u003e\u0026mdash;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003ch2\u003eArchitecture comparison\u003c/h2\u003e\n\u003cp\u003eTo test whether the difference between ViT-Small and EfficientNet-B2 (trained under identical conditions) was statistically significant, we applied McNemar\u0026apos;s test to their paired test-set predictions. The test yielded \u0026chi;\u0026sup2;=13.26, p=2.71\u0026times;10⁻⁴, indicating a statistically significant difference in accuracy in favor of the Vision Transformer. ViT-Small outperformed EfficientNet-B2 by 4.4 percentage points in macro-F1 (0.9015 vs. 0.8574). 
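\u003c/p\u003e\n\u003cp\u003eThe two test-set procedures from the Statistical Analysis subsection can be sketched in plain Python. The first is a percentile bootstrap of the kind behind the Table 1 intervals; the second is McNemar\u0026apos;s test. For one degree of freedom, the \u0026chi;\u0026sup2; tail probability equals erfc(\u0026radic;(\u0026chi;\u0026sup2;/2)). This confirms that the reported \u0026chi;\u0026sup2; = 13.26 corresponds to p \u0026asymp; 2.7 \u0026times; 10⁻⁴. The predictions and discordant counts in the usage lines are synthetic or hypothetical, because per-image predictions and the values of b and c are not reported.\u003c/p\u003e

```python
import math
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample test cases with replacement at the
    # original test-set size, recompute the metric each time, and take the
    # alpha/2 and 1 - alpha/2 percentiles of the resulting distribution.
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]


def chi2_sf_df1(x):
    # P(X >= x) for a chi-squared variable with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def mcnemar(b, c):
    # b: cases only model A classifies correctly; c: cases only model B does.
    # Uncorrected statistic chi2 = (b - c)^2 / (b + c), as in the Methods.
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, chi2_sf_df1(chi2)


# Synthetic predictions reproducing the 803/894 overall accuracy.
y_true = [0] * 894
y_pred = [0] * 803 + [1] * 91
print(bootstrap_ci(y_true, y_pred, accuracy))

print(f'{chi2_sf_df1(13.26):.1e}')  # 2.7e-04, matching the reported p
print(mcnemar(60, 25))              # hypothetical discordant counts
```

\u003cp\u003eThe McNemar test uses only discordant cases, so it compares accuracy rather than macro-F1. The macro-F1 difference is reported separately.\u003c/p\u003e\n\u003cp\u003e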
This statistical significance, combined with the non-overlapping bootstrap confidence intervals, indicates that the improvement is unlikely to reflect random test-set variation. However, the two models were also pretrained on different data (ImageNet-21k vs. ImageNet-1k), so the gain may not be attributable to architecture alone.\u003c/p\u003e\n\u003ch2\u003ePer-class performance\u003c/h2\u003e\n\u003cp\u003eTable 2 presents per-class performance metrics, showing consistently high performance across all four categories despite severe class imbalance. For the Normal class (n=382 test samples, 42.7%), the model achieved precision of 88.4% and recall of 91.1%, yielding an F1 score of 89.7%. The high recall indicates that the model rarely misses normal cases, which is important for avoiding unnecessary treatment escalation. For the Infection class (n=384 test samples, 43.0%), precision of 90.4% and recall of 88.0% produced an F1 score of 89.2%. The slightly lower recall compared with Normal likely reflects the inherent difficulty of distinguishing early infection from inflammatory healing responses.\u003c/p\u003e\n\u003cp\u003eNotably, the model achieved strong performance on the minority classes despite their limited representation. For Ischaemia (n=34 test samples, 3.8%), precision and recall both reached 88.2%, yielding an F1 of 88.2% and a class-specific AUC of 0.993, second only to the Both class. This discriminative ability for ischaemia is clinically significant, as ischaemic ulcers require urgent vascular intervention and delayed identification substantially reduces limb salvage success rates. With only 34 ischaemic test images, however, these estimates remain imprecise. For the Both category (combined infection and ischaemia, n=94 test samples, 10.5%), the model achieved the highest per-class metrics, with precision 94.5%, recall 92.5%, F1 93.5%, and AUC 0.994. These combined-pathology cases, which represent the most severe clinical presentations, were thus classified more accurately than any other class.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eTable 2. 
Per-class classification metrics on held-out test set.\u003c/em\u003e\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eClass\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePrecision\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRecall\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eF1\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003eNormal\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e88.4%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e91.1%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e89.7%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e0.974\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003eInfection\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e90.4%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e88.0%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e89.2%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 
20%;\"\u003e\n \u003cp\u003e0.972\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003eIschaemia\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e88.2%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e88.2%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e88.2%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e0.993\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003eBoth\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e94.5%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e92.5%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e93.5%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e0.994\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003ch2\u003eConfusion Matrix and Error analysis\u003c/h2\u003e\n\u003cp\u003eFigure 6 presents the normalized confusion matrix, revealing the distribution of classification errors across categories. Of 894 test samples, the model correctly classified 803 (89.8%) with only 91 errors (10.2% error rate). The confusion matrix reveals that Normal\u0026harr;Infection misclassification accounts for approximately 80% of all errors (73/91 cases), reflecting the inherent visual similarity between early infection and normal healing tissue\u0026mdash;a challenge that confronts human clinicians as well. Specifically, 31 Normal cases were misclassified as Infection, while 42 Infection cases were misclassified as Normal. 
This bidirectional confusion pattern suggests visual overlap between inflammatory healing responses and early infectious changes.\u003c/p\u003e\n\u003cp\u003eTable 3 provides detailed error distribution analysis. Importantly, Ischaemia errors were minimal despite severe class imbalance\u0026mdash;only 4 of 34 ischaemic cases (11.8%) were misclassified, with 3 confused as Normal and 1 as Infection. Given that ischaemia detection is clinically critical for triggering vascular intervention, this low error rate for minority-class ischaemia is particularly encouraging. The Both category showed the lowest error rate at 7.5% (7/94), with most errors (5 cases) involving misclassification as Infection (correctly identifying at least one pathology component).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eTable 3. Error distribution (91 total errors, 10.2% overall).\u003c/em\u003e\u003c/p\u003e\n\u003cdiv align=\"\"\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.625%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTrue Class\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eErrors (Rate)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 44.375%;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePrimary Confusion\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.625%;\"\u003e\n \u003cp\u003eNormal (n=382)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e34 (8.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 44.375%;\"\u003e\n \u003cp\u003e31 \u0026rarr; Infection\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.625%;\"\u003e\n 
\u003cp\u003eInfection (n=384)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e46 (12.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 44.375%;\"\u003e\n \u003cp\u003e42 \u0026rarr; Normal\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.625%;\"\u003e\n \u003cp\u003eIschaemia (n=34)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e4 (11.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 44.375%;\"\u003e\n \u003cp\u003e3 \u0026rarr; Normal\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.625%;\"\u003e\n \u003cp\u003eBoth (n=94)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e7 (7.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 44.375%;\"\u003e\n \u003cp\u003e5 \u0026rarr; Infection\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eFigure 7 presents exemplar misclassifications, showing the three highest-confidence errors for each class. On visual inspection, misclassified cases often exhibit ambiguous features that could reasonably support more than one interpretation. For example, cases at the boundary between Normal healing and early Infection display subtle inflammatory changes that challenge even expert annotation. These edge cases appear to reflect inherent diagnostic uncertainty rather than systematic model failure.\u003c/p\u003e\n\u003ch2\u003eCalibration analysis\u003c/h2\u003e\n\u003cp\u003eWe assessed model calibration using Brier scores and reliability diagrams. 
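\u003c/p\u003e\n\u003cp\u003eBecause the calibration analysis rests on Brier scores, a short sketch of the computation may help when reading Table 4. It assumes the common multi-class definition: squared error summed over classes and averaged over images. Under that definition, the overall score equals the sum of the one-vs-rest per-class scores. The sketch also compares each reported per-class score with the score \u0026pi;(1 \u0026minus; \u0026pi;) of a constant forecast equal to the class prevalence \u0026pi;, via the skill score 1 \u0026minus; BS/reference. The function names and toy probabilities are illustrative.\u003c/p\u003e

```python
def brier_multiclass(probs, labels, num_classes):
    # Mean over images of the squared error summed over all classes.
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((p[c] - (1.0 if c == y else 0.0)) ** 2
                     for c in range(num_classes))
    return total / len(labels)


def brier_one_vs_rest(probs, labels, c):
    # Per-class Brier score: class c against all other classes.
    errors = [(p[c] - (1.0 if y == c else 0.0)) ** 2
              for p, y in zip(probs, labels)]
    return sum(errors) / len(errors)


# Toy check: the overall score equals the sum of the per-class scores.
probs = [[0.7, 0.2, 0.05, 0.05], [0.1, 0.6, 0.1, 0.2], [0.2, 0.2, 0.5, 0.1]]
labels = [0, 1, 3]
overall = brier_multiclass(probs, labels, 4)
per_class = [brier_one_vs_rest(probs, labels, c) for c in range(4)]
print(round(overall, 6), round(sum(per_class), 6))

# Reported per-class scores (Table 4) against a constant-prevalence forecast.
test_counts = {'Normal': 382, 'Infection': 384, 'Ischaemia': 34, 'Both': 94}
reported = {'Normal': 0.0656, 'Infection': 0.0674,
            'Ischaemia': 0.0081, 'Both': 0.0106}
n_test = sum(test_counts.values())
for name, k in test_counts.items():
    prevalence = k / n_test
    reference = prevalence * (1 - prevalence)
    skill = 1 - reported[name] / reference
    print(f'{name:10s} reference {reference:.4f}  skill {skill:.2f}')
```

\u003cp\u003eAgainst this prevalence reference, the skill scores for all four classes fall between about 0.72 and 0.89. That is a much narrower spread than the raw Brier values suggest.\u003c/p\u003e\n\u003cp\u003e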
The Brier score is the mean squared difference between predicted probabilities and observed outcomes, and lower scores are better. It reflects both calibration and discrimination, so it measures overall probabilistic accuracy rather than calibration alone. It is 0 for perfect predictions and 0.25 for a constant forecast of 0.5 in binary classification.\u003c/p\u003e\n\u003cp\u003eTable 4 presents per-class (one-vs-rest) Brier scores. Their sum (0.1517) matches the overall score of 0.1519 to rounding, consistent with the standard multi-class definition. The overall score indicates reasonable calibration for clinical decision support. The minority classes had the lowest scores (Ischaemia 0.0081, Both 0.0106), but a one-vs-rest Brier score scales with class prevalence. A constant forecast equal to the prevalence \u0026pi; scores \u0026pi;(1 \u0026minus; \u0026pi;): about 0.037 for Ischaemia versus about 0.245 for Normal. These low values therefore partly reflect the rarity of the classes rather than uniquely reliable probability estimates. The majority classes (Normal: 0.0656, Infection: 0.0674) showed higher Brier scores, consistent with the greater difficulty of distinguishing these visually similar categories.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eTable 4. Brier scores by class (lower is better).\u003c/em\u003e\u003c/p\u003e\n\u003cdiv align=\"\"\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eClass\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eBrier Score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eClass\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eBrier Score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003eNormal\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e0.0656\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003eIschaemia\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e0.0081\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003eInfection\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e0.0674\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003eBoth\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 25%;\"\u003e\n \u003cp\u003e0.0106\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eIn deployment scenarios where a probability threshold (e.g., 0.5) triggers urgent specialist referral, these calibration results suggest that the predicted probabilities can support such decisions. High-confidence predictions of infection or ischaemia can therefore reasonably be used to flag cases for prompt attention. Temperature scaling or Platt scaling may further refine probability estimates for institution-specific threshold optimization.\u003c/p\u003e\n\u003ch2\u003eModel interpretability\u003c/h2\u003e\n\u003cp\u003eTo provide clinical transparency and support model validation, we generated Gradient-weighted Class Activation Mapping (Grad-CAM) attention maps visualizing the regions the model prioritizes for each classification decision. 
Figure 8 presents representative attention maps for correctly classified cases across all four categories.\u003c/p\u003e\n\u003cp\u003eThe attention patterns demonstrate clinically interpretable behavior: for Infection cases, attention concentrates on regions showing erythema, swelling, and purulent discharge; for Ischaemia cases, attention focuses on pallid tissue, eschar, and areas suggesting compromised perfusion; for Normal cases, attention distributes more broadly across healthy granulation tissue; for Both cases, attention appropriately spans regions exhibiting both inflammatory and ischaemic features. While these attention patterns align qualitatively with clinical diagnostic criteria, we acknowledge that quantitative evaluation of interpretability reliability and consistency across similar cases remains an area for future investigation.\u003c/p\u003e\n\u003ch2\u003eExternal validation\u003c/h2\u003e\n\u003cp\u003eTo assess cross-dataset generalizability\u0026mdash;a critical requirement for clinical translation\u0026mdash;we evaluated the trained model on the independent DFU_Kaggle dataset (n=1,055 images). This external dataset differs from DFUC2021 in image acquisition protocols, patient demographics, and annotation procedures, providing a stringent test of model robustness to distribution shift.\u003c/p\u003e\n\u003cp\u003eBecause DFU_Kaggle provides only binary labels (Ulcer vs. Normal) without fine-grained infection/ischaemia annotations, we performed binary classification by mapping DFUC2021 classes accordingly: Normal \u0026rarr; Normal; Infection, Ischaemia, Both \u0026rarr; Ulcer. Table 5 presents external validation results.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eTable 5. External validation on DFU_Kaggle (n=1,055). 
ROC-AUC: 0.951.\u003c/em\u003e\u003c/p\u003e\n\u003cdiv align=\"\"\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eClass\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePrecision\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRecall\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eF1\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e\u003cstrong\u003en\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003eNormal\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e71%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e99%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e83%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e543\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003eUlcer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e99%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e57%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n \u003cp\u003e73%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20%;\"\u003e\n 
\u003cp\u003e512\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe external validation achieved a ROC-AUC of 0.951, indicating that the model\u0026apos;s discriminative ability largely transfers across datasets despite distribution differences. The asymmetric precision-recall pattern (Normal: 71% precision / 99% recall; Ulcer: 99% precision / 57% recall) shows that the model is conservative in ulcer detection on this external dataset, with very high specificity (99%) but moderate sensitivity (57%). This pattern may partly reflect annotation differences: DFU_Kaggle\u0026apos;s \u0026quot;ulcer\u0026quot; category includes ischaemic and staging variability that may not align with DFUC2021\u0026apos;s multi-class definitions. The class mapping may also contribute, because DFUC2021\u0026apos;s Normal class consists of ulcers without infection or ischaemia, whereas DFU_Kaggle\u0026apos;s Normal class denotes healthy skin; uncomplicated ulcers that the model assigns to Normal are therefore scored as missed ulcers. The high ROC-AUC despite this label heterogeneity suggests that the learned representations are robust. Domain adaptation techniques may further improve cross-institutional performance by explicitly accounting for annotation protocol differences.\u003c/p\u003e\n\u003ch2\u003eAblation studies\u003c/h2\u003e\n\u003cp\u003eTo quantify the contribution of each training component to overall performance, we conducted systematic ablation experiments, removing or replacing one component at a time while holding all others constant. Table 6 presents the results.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eTable 6. 
Ablation study results showing component contributions.\u003c/em\u003e\u003c/p\u003e\n\u003cdiv align=\"\"\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 39.8876%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eConfiguration\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMacro-F1\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMacro-AUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026Delta; Macro-F1 (percentage points)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 39.8876%;\"\u003e\n \u003cp\u003eViT-Small (full)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e0.9015\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e0.9834\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e\u0026mdash;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 39.8876%;\"\u003e\n \u003cp\u003eWithout weighted loss\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e0.8723\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e0.9712\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e-2.92\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 39.8876%;\"\u003e\n 
\u003cp\u003eWithout augmentation\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e0.8856\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e0.9789\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e-1.59\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 39.8876%;\"\u003e\n \u003cp\u003eEfficientNet-B2 (replacing ViT-Small)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e0.8574\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e0.9644\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 20.0375%;\"\u003e\n \u003cp\u003e-4.41\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eWeighted Cross-Entropy Loss (macro-F1 +2.9 percentage points):\u0026nbsp;\u003c/strong\u003eRemoving class-balanced loss weighting reduced macro-F1 from 0.9015 to 0.8723. This 2.9 percentage point degradation is concentrated in minority class performance: without weighting, the model optimizes predominantly for the majority classes (Normal, Infection) at the expense of Ischaemia and Both detection. The weighted loss assigns each class a weight inversely proportional to its training frequency, so that errors on rare classes contribute proportionally more to the optimization objective.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Augmentation (macro-F1 +1.6 percentage points):\u0026nbsp;\u003c/strong\u003eRemoving augmentation (random rotations, flips, color jitter, and scale variations) reduced macro-F1 from 0.9015 to 0.8856. 
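The augmentation step can be illustrated with a simplified, dependency-free sketch. It is a stand-in built on stated assumptions, not the authors' pipeline: images are nested lists of 0 to 255 intensities, rotations are restricted to multiples of 90 degrees, and a single brightness factor stands in for colour jitter. The arbitrary-angle rotations, colour jitter, and scale variations described above would normally come from an image library. All function names are hypothetical.

```python
import random


def hflip(img):
    '''Horizontal flip: reverse every row of a nested-list image.'''
    return [row[::-1] for row in img]


def rot90(img):
    '''Rotate the image 90 degrees clockwise.'''
    return [list(row) for row in zip(*img[::-1])]


def jitter_brightness(img, factor):
    '''Scale intensities by factor and clip them to the 0-255 range.'''
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img, rng, max_brightness_shift=0.2):
    '''Apply a random flip, a random multiple-of-90-degree rotation and a
    random brightness change. A new combination is drawn on every call, so
    each pass over the training data sees a slightly different image.'''
    if rng.choice((True, False)):
        img = hflip(img)
    for _ in range(rng.randrange(4)):  # 0, 1, 2 or 3 quarter turns
        img = rot90(img)
    factor = rng.uniform(1 - max_brightness_shift, 1 + max_brightness_shift)
    return jitter_brightness(img, factor)


rng = random.Random(0)  # seeded so that runs are reproducible
patch = [[10, 20, 30], [40, 50, 60]]
print(augment(patch, rng))
```

Passing the random generator in explicitly keeps augmentation reproducible, which matters when comparing ablation runs such as those in Table 6.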
This 1.6 percentage point difference suggests that augmentation broadens coverage of the training distribution, reducing overfitting and improving generalization to test-time variations in image orientation, lighting, and scale.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eVision Transformer Architecture (+4.4 percentage points over CNN):\u0026nbsp;\u003c/strong\u003eComparing ViT-Small against EfficientNet-B2 trained under identical conditions (same loss function, augmentation, optimizer, and training duration) shows a 4.4 percentage point advantage for the transformer architecture (0.9015 vs. 0.8574 macro-F1). This advantage may reflect ViT\u0026apos;s global attention mechanism, which can capture long-range spatial dependencies and contextual relationships between wound regions that local convolutions may miss.\u003c/p\u003e\n\u003ch2\u003eMobile application\u003c/h2\u003e\n\u003cp\u003eWe deployed the same ViT-Small model, without compression or distillation, in an iOS application (Figure 9). We deliberately chose to preserve full accuracy (90.15% macro-F1) rather than reduce model size. EfficientNet-B0 would be smaller (7.8 MB vs. 41 MB), but its macro-F1 is substantially lower (83.46%). For clinical applications, where classification errors have real consequences, we believe accuracy takes priority over storage (11).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eTable 7. 
Mobile application specifications (tested on iPhone 12 Pro).\u003c/em\u003e\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" class=\"fr-table-selection-hover\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSpecification\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eValue\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38.4615%;\"\u003e\n \u003cp\u003e\u003cstrong\u003eNotes\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003eModel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003eViT-Small\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38.4615%;\"\u003e\n \u003cp\u003e22M parameters\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003eModel size\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003e41 MB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38.4615%;\"\u003e\n \u003cp\u003eCoreML .mlpackage\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003eInference time\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003e50\u0026ndash;80 ms\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38.4615%;\"\u003e\n \u003cp\u003eApple Neural Engine\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003eFrame 
rate\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003e12\u0026ndash;20 FPS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38.4615%;\"\u003e\n \u003cp\u003eEvery 10th frame\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003eMemory\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003e200\u0026ndash;300 MB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38.4615%;\"\u003e\n \u003cp\u003ePeak usage\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003eMacro-F1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 30.7692%;\"\u003e\n \u003cp\u003e90.15%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 38.4615%;\"\u003e\n \u003cp\u003eIdentical to research model\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eThe application includes clinical workflow features: structured assessment forms aligned with IWGDF guidelines (9, 10), patient management, referral generation with AI-derived urgency prioritization, and an offline-first architecture for connectivity-limited settings.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study demonstrates a complete pipeline from model development through external validation to mobile deployment for DFU classification. Using the publicly available DFUC2021 dataset, we trained a Vision Transformer that achieved 90.15% macro-F1 on the held-out test set, validated its generalizability on an external dataset (ROC-AUC 0.951), and deployed the full-accuracy model in an iOS application.\u003c/p\u003e\u003cp\u003eOur results compare favorably with prior work on automated DFU analysis. 
Previous deep learning approaches for DFU classification have reported varying performance levels depending on the task and dataset. Goyal et al. achieved 80% accuracy using CNNs for DFU detection (\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e), while Alzubaidi et al. reported F1 scores ranging from 0.72 to 0.85 for wound segmentation and classification tasks (\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e). In the DFUC2021 challenge, which used the same image collection as our study, submissions achieved macro-F1 scores of up to 0.63 using ensemble approaches (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e). Our single ViT-Small model achieved 0.90 macro-F1, which suggests that transformer architectures may offer advantages for this task. However, direct comparison is limited by differences in evaluation protocols: challenge entries were scored on the organizers' separate test set, whereas our results come from a held-out split of the 5,955 publicly labeled images. Notably, few prior studies have addressed the complete pipeline from model development through external validation to mobile deployment, which we consider essential for clinical translation.\u003c/p\u003e\u003cp\u003eSeveral findings have practical implications for DFU classifier development. First, addressing class imbalance is essential. The DFUC2021 dataset reflects real-world epidemiology, in which ischaemia represents only 3.8% of cases (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e). Without weighted loss, macro-F1 fell from 0.9015 to 0.8723, with the degradation concentrated in the minority classes. Because ischaemia detection triggers vascular intervention that may prevent amputation, any practical DFU classifier must handle imbalance explicitly (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eSecond, the Vision Transformer architecture appears well-suited to this task. 
Under identical training conditions, ViT-Small outperformed EfficientNet-B2 by 4.4 percentage points. We hypothesize that this reflects the global attention mechanism's ability to integrate information across the entire wound image (\u003cspan additionalcitationids=\"CR13\" citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e). Clinical DFU assessment involves relating the wound bed, surrounding tissue, and anatomical context, relationships that self-attention can model directly.\u003c/p\u003e\u003cp\u003eThird, the dominant error pattern\u0026mdash;Normal\u0026harr;Infection confusion\u0026mdash;reflects a genuine clinical challenge. Early infection and normal healing both involve inflammatory responses, and distinguishing them visually is difficult even for specialists (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e). This suggests that we may be approaching ceiling performance for single-image visual classification. Further improvement might require additional modalities such as thermal imaging, patient history, laboratory markers, or longitudinal imaging.\u003c/p\u003e\u003cp\u003eExternal validation showed strong ranking ability (AUC 0.951) despite distribution shift, suggesting that the learned representations generalize across datasets (\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e). However, the conservative behavior (high specificity, moderate sensitivity) indicates that threshold recalibration would be needed before deployment in other clinical settings (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eFor mobile deployment, we prioritized accuracy over model size. 
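The storage and speed figures in Table 7 can be sanity-checked with simple arithmetic. The sketch below assumes about 22 million parameters and ignores file metadata. At 4 bytes per weight (32-bit floats), the weights alone would occupy roughly 88 MB. At 2 bytes per weight (16-bit floats), they would occupy roughly 44 MB, or about 42 MiB, which is close to the reported 41 MB package; the weight precision used in the converted package is not specified here. Likewise, a per-image latency of 50 to 80 ms allows at most 1000/80 = 12.5 to 1000/50 = 20 classifications per second, in line with the 12 to 20 FPS row. The function names are hypothetical.

```python
def weight_storage_mb(n_params, bytes_per_param):
    '''Approximate size of the raw weights in decimal megabytes (10**6 bytes).'''
    return n_params * bytes_per_param / 1e6


def max_throughput_fps(latency_ms):
    '''Upper bound on classifications per second if each one takes latency_ms.'''
    return 1000.0 / latency_ms


N_PARAMS = 22_000_000  # ViT-Small parameter count reported in Table 7

for precision, nbytes in (('float32', 4), ('float16', 2)):
    print(f'{precision}: about {weight_storage_mb(N_PARAMS, nbytes):.0f} MB')

for latency in (80, 50):
    print(f'{latency} ms per image: at most {max_throughput_fps(latency):.1f} per second')
```

Running the script prints about 88 MB and 44 MB for the two precisions, and 12.5 and 20.0 classifications per second for the two latencies.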
While smaller models would reduce storage requirements, the accuracy gap is substantial for medical applications (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e). Modern smartphones handle 41 MB models without difficulty, and 50\u0026ndash;80 ms inference is sufficient for real-time use.\u003c/p\u003e\u003cp\u003eThis study has several limitations. First, training data comes from a single publicly available dataset with specific imaging protocols and likely limited geographic diversity, potentially limiting generalizability to other clinical settings and patient populations. Second, external validation used a dataset with only binary labels, preventing direct assessment of multi-class generalization to infection and ischaemia categories. Third, we did not perform temporal or longitudinal validation to assess model stability across different time periods or wound progression stages. Fourth, we did not compare model performance against clinician diagnostic accuracy; such comparison would be valuable for understanding the clinical utility of AI-assisted diagnosis. Fifth, the interpretability analysis is qualitative\u0026mdash;formal validation of attention patterns against expert annotations remains future work. Finally, and most importantly, prospective clinical validation with patient outcome tracking is needed before deployment to establish that improved classification accuracy translates to better clinical decisions and patient outcomes.\u003c/p\u003e\u003cdiv id=\"Sec25\" class=\"Section2\"\u003e\u003ch2\u003eFuture Work\u003c/h2\u003e\u003cp\u003eThe logical next step for this work is prospective clinical validation comparing AI-assisted diagnosis against clinician performance. We plan to conduct studies where primary care providers assess DFU images both with and without AI support, measuring diagnostic accuracy, confidence, and time-to-decision against specialist ground truth. 
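In such a paired design, the same readers assess the same cases with and without AI support. For the accuracy endpoint, one standard analysis is McNemar's test on the discordant cases. The sketch below shows only one possible option; the function name and input format are hypothetical, and no specific analysis plan is stated here. It uses the exact binomial form of the test: under the null hypothesis of no difference, each discordant case is equally likely to favour either condition.

```python
import math


def mcnemar_exact(unaided_correct, aided_correct):
    '''Exact two-sided McNemar test for paired binary outcomes.

    unaided_correct, aided_correct: parallel sequences of booleans, one entry
    per case, saying whether the reader was correct without and with AI support.
    Returns (b, c, p_value), where b counts cases correct only without AI and
    c counts cases correct only with AI.
    '''
    pairs = list(zip(unaided_correct, aided_correct))
    b = sum(1 for unaided, aided in pairs if unaided and not aided)
    c = sum(1 for unaided, aided in pairs if aided and not unaided)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant cases, so no evidence of a difference
    # Under the null hypothesis b ~ Binomial(n, 0.5); double the smaller tail.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


unaided = [True, True, False, False, False, False, False, False]
aided = [True, False, True, True, True, True, True, True]
print(mcnemar_exact(unaided, aided))  # (1, 6, 0.125)
```

Only the discordant cases carry information about a difference between conditions. For this reason, the test is suited to paired reader studies in which both conditions are applied to every case.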
Such head-to-head comparison would establish whether the system genuinely augments clinical judgment or merely provides redundant information. Randomized controlled trials could then evaluate downstream patient outcomes\u0026mdash;time-to-appropriate-treatment, ulcer healing rates, amputation incidence, and healthcare utilization\u0026mdash;to determine whether improved classification accuracy translates to meaningful clinical benefit.\u003c/p\u003e\u003cp\u003eBeyond diagnostic support, we envision developing an integrated referral system connecting different levels of care. In this model, community health workers or primary care providers would capture wound images using the mobile application, receive AI-generated classification and urgency scores, and transmit structured referrals to specialists when indicated. The system would facilitate triage by prioritizing patients with detected ischaemia or combined pathology for urgent vascular consultation, while routing uncomplicated infected ulcers to appropriate antimicrobial management. Such a tiered referral pathway could be particularly valuable in resource-limited settings where specialist access is constrained, enabling earlier intervention for high-risk cases while reducing unnecessary referrals for routine wounds.\u003c/p\u003e\u003cp\u003eTo support reliable deployment across diverse clinical environments, additional technical development is needed. Multi-center validation across different geographic regions, patient populations, and imaging equipment would identify domain shifts requiring adaptation. Standardized annotation protocols developed collaboratively across institutions would reduce inter-rater variability and enable consistent model updates. Post-hoc calibration methods could optimize probability thresholds for institution-specific referral criteria. 
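Two common post-hoc steps can be sketched under stated assumptions. Neither is the authors' procedure, and all names and the grid of candidate temperatures are hypothetical. Temperature scaling (Guo et al.) divides the logits by a single scalar T, fitted to minimise negative log-likelihood on local validation data; here the fit uses a coarse grid search rather than a gradient-based optimiser. A referral threshold can then be chosen as the highest ulcer probability that still flags a target fraction of locally confirmed ulcers. This directly targets the moderate external sensitivity reported above.

```python
import math


def softmax(logits, temperature=1.0):
    '''Softmax of logits divided by a temperature (max subtracted for stability).'''
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    '''Average negative log-likelihood of the true class at a given temperature.'''
    return -sum(math.log(softmax(z, temperature)[y])
                for z, y in zip(logit_rows, labels)) / len(labels)


def fit_temperature(logit_rows, labels, grid=None):
    '''Temperature scaling by grid search over candidate values of T.'''
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # 0.5, 0.55, ..., 5.0
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))


def threshold_for_sensitivity(ulcer_probs, is_ulcer, target):
    '''Highest threshold at which flagging every case whose probability is at
    least the threshold still catches a fraction target of confirmed ulcers.'''
    positives = sorted((p for p, y in zip(ulcer_probs, is_ulcer) if y), reverse=True)
    if not positives:
        raise ValueError('at least one confirmed ulcer is required')
    # Number of ulcers that must be flagged; rounding guards against float error.
    needed = max(1, math.ceil(round(target * len(positives), 9)))
    return positives[needed - 1]


val_logits = [[4.0, 0.0], [3.5, 0.5], [4.0, 0.0], [0.0, 3.0]]
val_labels = [0, 0, 1, 1]
print('fitted temperature:', round(fit_temperature(val_logits, val_labels), 2))
print('threshold:', threshold_for_sensitivity(
    [0.95, 0.8, 0.6, 0.3, 0.2], [1, 1, 1, 1, 0], 0.75))  # 0.6
```

Both steps would normally be fitted on locally labelled cases that were not used for training. The chosen threshold would also need re-checking as case mix changes.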
Ultimately, the goal is a robust clinical decision support system that assists\u0026mdash;rather than replaces\u0026mdash;clinician judgment, providing reliable second opinions that improve care quality, particularly where specialist expertise is scarce.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusions","content":"\u003cp\u003eWe developed and validated a Vision Transformer-based DFU classification system achieving 90.15% macro-F1, with strong performance on minority classes critical for clinical decision-making. External validation on an independent dataset with binary (ulcer versus normal) labels yielded a ROC-AUC of 0.951, supporting cross-dataset generalizability. By deploying the identical full-accuracy model in an iOS application without compression, the system maintains research-grade performance for point-of-care clinical use. This work demonstrates a pathway from deep learning development through rigorous validation to practical mobile deployment for medical imaging applications.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eDFU: Diabetic foot ulcer\u003c/p\u003e\n\u003cp\u003eViT: Vision Transformer\u003c/p\u003e\n\u003cp\u003eCNN: Convolutional neural network\u003c/p\u003e\n\u003cp\u003eAUC: Area under the receiver operating characteristic curve\u003c/p\u003e\n\u003cp\u003eCI: Confidence interval\u003c/p\u003e\n\u003cp\u003eIWGDF: International Working Group on the Diabetic Foot\u003c/p\u003e\n\u003cp\u003eFPS: Frames per second\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAcknowledgements\u003c/h2\u003e\n\u003cp\u003eThe authors thank the creators of the DFUC2021 and DFU_Kaggle datasets for making their data publicly available for research.\u003c/p\u003e\n\u003ch2\u003eAuthors\u0026apos; contributions\u003c/h2\u003e\n\u003cp\u003eP.T.N.H conceived and designed the study, developed and trained the deep learning models, performed all experiments and statistical analyses, and wrote the manuscript. P.T.N.H and B.T.N.H developed the iOS mobile application. N.Y. supervised the study. 
P.T.N.H, T.V, R.D, B.T.N.H, and N.Y. reviewed and edited the manuscript. All authors critically revised and approved the final version of the manuscript.\u003c/p\u003e\n\u003ch2\u003eFunding\u003c/h2\u003e\n\u003cp\u003eThis research received no specific funding from any agency in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\n\u003ch2\u003eData availability\u003c/h2\u003e\n\u003cp\u003eThe DFUC2021 dataset is publicly available from https://dfu-challenge.github.io/. The DFU_Kaggle dataset is available at https://www.Kaggle.com/datasets/laithjj/diabetic-foot-ulcer-dfu. Training code, model weights, and iOS application source code are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003ch2\u003eEthics approval and consent to participate\u003c/h2\u003e\n\u003cp\u003eThis study used publicly available, anonymized datasets (DFUC2021 and DFU_Kaggle). No human subjects were directly involved, and no new data collection was performed. Because this was a retrospective analysis of publicly available, anonymized data, institutional review board approval was not required.\u003c/p\u003e\n\u003ch2\u003eConsent for publication\u003c/h2\u003e\n\u003cp\u003eNot applicable. 
This study used only publicly available anonymized datasets.\u003c/p\u003e\n\u003ch2\u003eRelevant guidelines and regulations\u003c/h2\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003ch2\u003eCompeting interests\u003c/h2\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor details\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e1\u003c/sup\u003eLaboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Settsu, Osaka 566-0002, Japan.\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e2\u003c/sup\u003eInstitute of Medical and Pharmaceutical Education, Thu Dau Mot University, 06 Tran Van On, Phu Hoa Ward, Thu Dau Mot, Binh Duong, Vietnam.\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e3\u003c/sup\u003eDepartment of Diagnostic Imaging, Children\u0026rsquo;s Hospital 2, 14 Ly Tu Trong, Ben Nghe Ward, District 1, Ho Chi Minh City, Vietnam.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eArmstrong DG, Boulton AJM, Bus SA. Diabetic Foot Ulcers and Their Recurrence. New England Journal of Medicine. 2017;376(24):2367\u0026ndash;75. https://doi.org/10.1056/NEJMra1615439\u003c/li\u003e\n\u003cli\u003eMcDermott K, Fang M, Boulton AJM, Selvin E, Hicks CW. Etiology, Epidemiology, and Disparities in the Burden of Diabetic Foot Ulcers. Diabetes Care. 2022;46(1):209\u0026ndash;21. https://doi.org/10.2337/dci22-0043\u003c/li\u003e\n\u003cli\u003eSingh N, Armstrong DG, Lipsky BA. Preventing Foot Ulcers in Patients With Diabetes. JAMA. 2005;293(2):217\u0026ndash;28. https://doi.org/10.1001/jama.293.2.217\u003c/li\u003e\n\u003cli\u003eLipsky BA, Senneville \u0026Eacute;, Abbas ZG, Arag\u0026oacute;n-S\u0026aacute;nchez J, Diggle M, Embil JM, et al. Guidelines on the diagnosis and treatment of foot infection in persons with diabetes (IWGDF 2019 update). Diabetes/Metabolism Research and Reviews. 
2020;36(S1):e3280. https://doi.org/10.1002/dmrr.3280\u003c/li\u003e\n\u003cli\u003eEdmonds M, Manu C, Vas P. The current burden of diabetic foot disease. Journal of Clinical Orthopaedics and Trauma. 2021;17:88\u0026ndash;93. https://doi.org/10.1016/j.jcot.2021.01.017\u003c/li\u003e\n\u003cli\u003eArmstrong DG, Tan TW, Boulton AJM, Bus SA. Diabetic Foot Ulcers: A Review. JAMA. 2023;330(1):62\u0026ndash;75. https://doi.org/10.1001/jama.2023.10578\u003c/li\u003e\n\u003cli\u003eMills JL, Conte MS, Armstrong DG, Pomposelli FB, Schanzer A, Sidawy AN, et al. The Society for Vascular Surgery Lower Extremity Threatened Limb Classification System: Risk stratification based on Wound, Ischemia, and foot Infection (WIfI). Journal of Vascular Surgery. 2014;59(1):220\u0026ndash;34.e2. https://doi.org/10.1016/j.jvs.2013.08.003\u003c/li\u003e\n\u003cli\u003eConte MS, Bradbury AW, Kolh P, White JV, Dick F, Fitridge R, et al. Global vascular guidelines on the management of chronic limb-threatening ischemia. Journal of Vascular Surgery. 2019;69(6, Supplement):3S\u0026ndash;125S.e40. https://doi.org/10.1016/j.jvs.2019.02.016\u003c/li\u003e\n\u003cli\u003eBus SA, Lavery LA, Monteiro-Soares M, Rasmussen A, Raspovic A, Sacco ICN, et al. Guidelines on the prevention of foot ulcers in persons with diabetes (IWGDF 2019 update). Diabetes/Metabolism Research and Reviews. 2020;36(S1):e3269. https://doi.org/10.1002/dmrr.3269\u003c/li\u003e\n\u003cli\u003eSchaper NC, van Netten JJ, Apelqvist J, Bus SA, Hinchliffe RJ, Lipsky BA, et al. Practical Guidelines on the prevention and management of diabetic foot disease (IWGDF 2019 update). Diabetes/Metabolism Research and Reviews. 2020;36(S1):e3266. https://doi.org/10.1002/dmrr.3266\u003c/li\u003e\n\u003cli\u003eKelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Medicine. 2019;17(1):195. https://doi.org/10.1186/s12916-019-1426-2\u003c/li\u003e\n\u003cli\u003eDosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. 
An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. 2020.\u003c/li\u003e\n\u003cli\u003eShamshad F, Khan S, Zamir SW, Khan MH, Hayat M, Khan FS, et al. Transformers in medical imaging: A survey. Medical Image Analysis. 2023;88:102802. https://doi.org/10.1016/j.media.2023.102802\u003c/li\u003e\n\u003cli\u003eVaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.\u003c/li\u003e\n\u003cli\u003eYap MH, Cassidy B, Pappachan JM, O\u0026rsquo;Shea C, Gillespie D, Reeves ND. Analysis Towards Classification of Infection and Ischaemia of Diabetic Foot Ulcers. 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). 2021:1\u0026ndash;4.\u003c/li\u003e\n\u003cli\u003eCassidy B, Kendrick C, Reeves ND, Pappachan JM, O\u0026rsquo;Shea C, Armstrong DG, et al. Diabetic Foot Ulcer Grand Challenge 2021: Evaluation and Summary. Cham: Springer International Publishing; 2022.\u003c/li\u003e\n\u003cli\u003eTan M, Le Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: Chaudhuri K, Salakhutdinov R, editors. Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research. PMLR; 2019. p. 6105\u0026ndash;14.\u003c/li\u003e\n\u003cli\u003eLoshchilov I, Hutter F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101. 2017.\u003c/li\u003e\n\u003cli\u003eLin TY, Goyal P, Girshick R, He K, Doll\u0026aacute;r P. Focal Loss for Dense Object Detection. 2017 IEEE International Conference on Computer Vision (ICCV); 22\u0026ndash;29 Oct 2017.\u003c/li\u003e\n\u003cli\u003eJohnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. Journal of Big Data. 2019;6(1):27. https://doi.org/10.1186/s40537-019-0192-5\u003c/li\u003e\n\u003cli\u003eJapkowicz N, Shah M. 
Evaluating Learning Algorithms: A Classification Perspective. Cambridge: Cambridge University Press; 2011.\u003c/li\u003e\n\u003cli\u003eCohen J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement. 1960;20(1):37\u0026ndash;46. https://doi.org/10.1177/001316446002000104\u003c/li\u003e\n\u003cli\u003eBrier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Review. 1950;78(1):1\u0026ndash;3. https://doi.org/10.1175/1520-0493(1950)078\u0026lt;0001:VOFEIT\u0026gt;2.0.CO;2\u003c/li\u003e\n\u003cli\u003eGuo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. Proceedings of the 34th International Conference on Machine Learning - Volume 70; Sydney, NSW, Australia: JMLR.org; 2017. p. 1321\u0026ndash;30.\u003c/li\u003e\n\u003cli\u003eEfron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman and Hall/CRC; 1994.\u003c/li\u003e\n\u003cli\u003eMcNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12(2):153\u0026ndash;7. https://doi.org/10.1007/BF02295996\u003c/li\u003e\n\u003cli\u003eSelvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision (ICCV); 22\u0026ndash;29 Oct 2017.\u003c/li\u003e\n\u003cli\u003eGoyal M, Reeves ND, Rajbhandari S, Ahmad N, Wang C, Yap MH. Recognition of ischaemia and infection in diabetic foot ulcers: Dataset and techniques. Computers in Biology and Medicine. 2020;117:103616. https://doi.org/10.1016/j.compbiomed.2020.103616\u003c/li\u003e\n\u003cli\u003eAlzubaidi L, Fadhel MA, Oleiwi SR, Al-Shamma O, Zhang J. DFU_QUTNet: diabetic foot ulcer classification using novel deep convolutional neural network. Multimedia Tools and Applications. 
2020;79(21):15655\u0026ndash;77. https://doi.org/10.1007/s11042-019-07820-w\u003c/li\u003e\n\u003cli\u003eQui\u0026ntilde;onero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND, editors. Dataset Shift in Machine Learning. The MIT Press; 2009.\u003c/li\u003e\n\u003cli\u003eSubbaswamy A, Saria S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics. 2019;21(2):345\u0026ndash;52. https://doi.org/10.1093/biostatistics/kxz041\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
Keywords: Diabetic foot ulcer, Vision Transformer, Deep learning, Mobile health, Clinical decision support, CoreML, External validation, Class imbalance
Posted December 10th, 2025. Subject area: Medical Informatics.