BOTANIC-0: a series of foundation models for plant genomic data

preprint OA: closed
Abstract

Genomic language models (gLMs) have emerged as a powerful paradigm for learning regulatory biology directly from DNA sequence. Here, we introduce Botanic0¶, a family of plant genomic foundation models spanning 100M to 1B parameters and pretrained on 43 phylogenetically diverse plant genomes. The Botanic0-S, Botanic0-M, and Botanic0-L models form the first generation of a long-term research initiative dedicated to advancing crop improvement research, genotype-to-phenotype modeling, and sequence-based genome editing. The architecture, pre-training pipeline, and pre-training dataset of Botanic0 follow the seminal work of [1]. Across a broad suite of genomic and genetic prediction tasks, including regulatory element annotation, gene expression inference, and variant effect prediction, Botanic0 models achieve performance competitive with state-of-the-art foundation models, both in zero-shot settings and after fine-tuning. Scaling analyses reveal consistent improvements in predictive power with increased model capacity, highlighting the benefits of large-model pretraining for plant genomics. This work establishes our ability to train foundation models at scale and lays the groundwork for the next generations of models. To support reproducible research and community benchmarking, we release all Botanic0 models at https://huggingface.co/living-models/models.

Competing Interest Statement

The authors declare the existence of a financial competing interest. All authors are or were employed by Living Models during their time on the project.

Footnotes

This revision fixes a few typos and reference errors and updates the acknowledgments.

¶ Biological Omics Transformer for Agricultural and Nutritional trait Inference in Crops

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00