Comparison of Deep Learning Tools for Optic Nerve Axon Quantification Finds Limited Generalizability on Independent Validation

doi:10.64898/2026.03.11.710915

2026 · doi:10.64898/2026.03.11.710915

preprint OA: closed

Abstract

Purpose

Machine learning approaches for automated quantification of optic nerve histology have emerged as potential tools for objective assessment of axonal injury in experimental glaucoma models. However, the generalizability of these models to independent datasets remains unclear. Guided by a scoping review of the literature, this study performed independent validation testing of publicly available models on a novel rat optic nerve dataset to assess their generalizability.

Methods

We conducted a scoping review following PRISMA-ScR guidelines. PubMed, EMBASE, Scopus, and Cochrane CENTRAL were searched from 2000 through 2025. Two reviewers independently screened records and extracted data on model characteristics and performance metrics. Additionally, we performed independent validation of three models (AxoNet, AxonDeepSeg, AxoNet 2.0) on a novel rat optic nerve dataset comprising 57 images with 9,514 manually annotated axons. Because AxonDeep is not publicly available, we instead evaluated AxonDeepSeg, a separate publicly available deep learning-based tool that, while not previously applied to optic nerve tissue, is widely used for nerve fiber segmentation.

Results

From 2,036 records, four manuscripts describing three deep learning models met inclusion criteria. Published correlation coefficients between model predictions and reference counts ranged from 0.959 to 0.99. On independent validation, performance was reduced: AxoNet 2.0 achieved the highest correlation (r = 0.89), followed by AxonDeepSeg (r = 0.86) and AxoNet (r = 0.79). Segmentation quality metrics revealed high precision (>0.94) but low recall (0.18 to 0.27), with Dice coefficients of 0.29 to 0.40, substantially below published benchmarks of 0.81.
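To make the reported metrics concrete, the following is a minimal Python sketch of how a count correlation and the segmentation scores could be computed. It is not the authors' evaluation code. The function names (`pearson_r`, `segmentation_metrics`) and the toy masks are hypothetical, and it assumes the segmentation metrics are computed pixel-wise on binary masks, which the abstract does not state.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of counts."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def segmentation_metrics(pred, truth):
    """Pixel-wise precision, recall, and Dice for flat binary masks (0/1)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, recall, dice


# Hypothetical toy masks: 10 true axon pixels, of which the model marks
# only 2, with no false positives (a conservative under-segmentation).
truth = [1] * 10 + [0] * 10
pred = [1] * 2 + [0] * 18

precision, recall, dice = segmentation_metrics(pred, truth)
print(f"precision={precision:.2f} recall={recall:.2f} dice={dice:.2f}")
```

In this toy case every predicted pixel is correct, yet most true pixels are missed, so Dice stays low. For a single pixel-wise confusion matrix, Dice equals the harmonic mean of precision and recall, 2PR / (P + R). At a precision of 0.94, recall values of 0.18 to 0.27 give Dice values of roughly 0.30 to 0.42, which is broadly in line with the reported range of 0.29 to 0.40. Exact agreement is not expected if the published metrics were averaged per image or computed per object.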

Conclusions

Deep learning models for optic nerve histology demonstrate strong within-study performance but show meaningful performance decrements when applied to independent datasets. The observed generalizability gap (correlations 0.07 to 0.182 points below published values) demonstrates the need for standardized validation datasets and multi-center testing before widespread adoption of these tools.

Competing Interest Statement

The authors have declared no competing interest.

