All Models are Wrong, Some are Annotated: Automating Metadata in Biomedical Repositories

Abstract

Objective

High-quality metadata is essential for scientific discovery, yet sparse annotations in rapidly growing repositories leave many biologically relevant details uncaptured. We evaluated whether large language models (LLMs) can accurately infer ion channel and receptor subtype metadata from source code in a neuroscience repository.

Materials and methods

We extracted 5,133 model files from ModelDB. A subset of 1,100 was manually annotated; 253 were held out for testing, and the remainder was split into training (80%) and validation (20%) sets. LLM-based approaches (GPT-5.2 and GPT-mini) were evaluated under zero-shot and heuristic-augmented prompting. Performance was assessed at the type and subtype levels using accuracy, precision, recall, and F1 score. A feature-engineered XGBoost model using text- and simulation-derived features served as a baseline.
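
The split and the four metrics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the random seed, the rounding of the 80/20 boundary, and the choice of macro averaging for precision, recall, and F1 are all assumptions, since the abstract does not specify them. The toy labels are hypothetical.

```python
import random


def split_annotated(ids, n_test=253, train_frac=0.8, seed=0):
    """Hold out n_test items, then split the remainder into train/validation.

    The seed and the rounding rule are assumptions made for this sketch.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_train = round(train_frac * len(rest))
    return rest[:n_train], rest[n_train:], test


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# 1,100 annotated files: 253 held out, the other 847 split roughly 80/20.
train_ids, val_ids, test_ids = split_annotated(range(1100))

# Hypothetical type-level labels, used only to exercise the metric code.
toy_true = ["K", "K", "Na", "Ca"]
toy_pred = ["K", "Na", "Na", "Ca"]
toy_scores = macro_metrics(toy_true, toy_pred)
```

With these assumptions the 847 non-test files cannot be divided exactly 80/20, so the sketch rounds the training share to the nearest whole file; the abstract does not report the exact set sizes.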

Results

LLMs outperformed the XGBoost baseline. At the type level, GPT-mini with heuristic augmentation achieved the highest performance (accuracy 96.0%, F1 0.962). At the subtype level, GPT-5.2+heuristics and GPT-mini+heuristics achieved identical accuracy (88.1%), with GPT-5.2+heuristics achieving the highest F1 (0.878). Model outputs were consistent across runs, and errors were confined to related mechanistic families.
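
To make the distinction between type-level and subtype-level scoring concrete, the following Python sketch scores the same predictions at both levels and counts how many subtype errors stay within the correct type. It is an assumption-laden illustration: the `type:subtype` label format, the function name, and the toy labels are hypothetical and are not taken from the paper or from ModelDB.

```python
def two_level_scores(y_true, y_pred):
    """Score predictions at the type and subtype levels.

    Labels are assumed to look like "type:subtype" (a hypothetical format).
    A subtype error counts as "same family" when the predicted type is right.
    """
    type_hits = subtype_hits = same_family_errors = 0
    for true_label, pred_label in zip(y_true, y_pred):
        true_type = true_label.partition(":")[0]
        pred_type = pred_label.partition(":")[0]
        if true_type == pred_type:
            type_hits += 1
        if true_label == pred_label:
            subtype_hits += 1
        elif true_type == pred_type:
            same_family_errors += 1
    n = len(y_true)
    subtype_errors = n - subtype_hits
    return {
        "type_accuracy": type_hits / n,
        "subtype_accuracy": subtype_hits / n,
        "same_family_error_share": (
            same_family_errors / subtype_errors if subtype_errors else 0.0
        ),
    }


# Hypothetical labels: one within-family confusion and one cross-family error.
example_true = ["K:A", "K:DR", "Na:T", "Ca:L"]
example_pred = ["K:A", "K:A", "Na:T", "K:A"]
example_scores = two_level_scores(example_true, example_pred)
```

In this toy case the type-level accuracy is higher than the subtype-level accuracy because a within-family confusion still counts as a correct type, which mirrors the gap between the two levels reported above.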

Discussion and Conclusion

LLMs demonstrate strong potential for metadata annotation directly from source code, outperforming feature-engineering approaches with minimal tuning. However, performance varied across subtypes, and errors often reflected ambiguity or bias toward more common labels. These findings suggest LLMs may serve as practical tools for scalable metadata generation in biomedical repositories, although careful evaluation and domain-specific validation remain important. While demonstrated in computational neuroscience, this approach may generalize to repository-agnostic metadata annotation in other scientific code repositories.

Competing Interest Statement

Robert A. McDougal currently runs the ModelDB repository and is one of the main developers of the NEURON simulator. This work was supported in part by university funds. All other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Footnotes

Meetings: American Medical Informatics Association (AMIA) 2025 Annual Meeting, Atlanta, GA, November 17, 2025; Computational Neuroscience (CNS), Florence, Italy, July 6, 2025.