Benchmarking Pre-trained Genomic Language Models for RNA Sequence-Related Predictive Applications

doi:10.1101/2025.03.05.641574

2025 · doi:10.1101/2025.03.05.641574

preprint OA: closed

ABSTRACT

RNA plays a pivotal role in diverse cellular functions across organisms. Developing computational algorithms for RNA sequence-related questions is highly valuable. Recently, pre-trained genomic language models (gLMs) have emerged, offering flexibility for various downstream prediction tasks. However, comprehensive and fair evaluations of gLMs are lacking. In this study, we benchmark eight gLMs on prediction tasks covering four RNA processes, highlighting their strengths and limitations. While gLMs perform well overall, larger models are not always better. Interestingly, models that integrate biological information consistently perform well in related tasks. Notably, gLMs demonstrate superior performance with limited training data, whereas task-specific methods achieve comparable performance with better computational efficiency when sufficient training data is available. Finally, we provide recommendations for model selection in different scenarios. These evaluation results underscore the potential of gLMs and suggest areas for future improvement.

Competing Interest Statement

The authors have declared no competing interest.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00