Evaluating Expert Specialization in Mixture-of-Experts Antibody Language Models

Preprint (2026)
Abstract

Antibody language models (AbLMs) show an impressive aptitude for learning antibody features but tend to struggle to learn the highly diverse, non-templated regions of antibodies. Existing AbLMs use dense architectures, in which all model parameters process every amino acid token. We hypothesized that the modular nature of antibodies could benefit from a sparse mixture-of-experts (MoE) architecture, which allows specific subsets of parameters (referred to as ‘experts’) to specialize in distinct antibody features. Although MoE architectures are widely adopted and optimized in natural language processing, they are less common in biological modeling. To this end, we assess existing MoE routing strategies and find that token-choice routing outperforms expert-choice routing, presumably because of its specialization in CDRH3 residues. We further optimized the token-choice router for AbLMs by minimizing the routing of padding tokens, which enables pre-training with varying sequence lengths. Finally, we show that a large-scale baseline antibody language model with a Top-2 MoE architecture (BALM-MoE), trained on a mixture of unpaired and paired antibody sequences, outperforms its dense counterpart with the same number of active parameters.

Competing Interest Statement

BB is an equity shareholder in Infinimmune and a member of their Scientific Advisory Board.
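The abstract contrasts token-choice routing (each token picks its experts, as in the Top-2 BALM-MoE) with expert-choice routing (each expert picks its tokens), and mentions keeping padding tokens from being routed. The following is a minimal pure-Python sketch of these two generic routing schemes under stated assumptions; it is not the authors' implementation. The function names, the softmax-over-experts scoring, the fixed per-expert `capacity`, and the choice to simply skip padding tokens are all illustrative assumptions, and the paper's actual padding optimization may work differently.

```python
import math
from typing import Dict, List, Tuple


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one token's expert logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_choice_top_k(
    logits: List[List[float]], pad_mask: List[bool], k: int = 2
) -> Dict[int, List[Tuple[int, float]]]:
    """Token-choice routing (hypothetical helper): each non-padding token picks its k best experts.

    Returns {token_index: [(expert_index, gate_weight), ...]}. The chosen experts'
    gate weights are renormalized to sum to 1. Padding tokens are skipped entirely,
    an assumed simplification for keeping them out of the experts.
    """
    routes = {}
    for t, (row, is_pad) in enumerate(zip(logits, pad_mask)):
        if is_pad:
            continue
        probs = softmax(row)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        routes[t] = [(e, probs[e] / norm) for e in top]
    return routes


def expert_choice(
    logits: List[List[float]], pad_mask: List[bool], capacity: int
) -> Dict[int, List[int]]:
    """Expert-choice routing (hypothetical helper): each expert picks its `capacity` best tokens.

    Returns {expert_index: [token_index, ...]}. A token may be picked by several
    experts or by none; padding tokens are excluded from the candidate pool.
    """
    probs = [softmax(row) for row in logits]
    n_experts = len(logits[0])
    candidates = [t for t, is_pad in enumerate(pad_mask) if not is_pad]
    picks = {}
    for e in range(n_experts):
        ranked = sorted(candidates, key=lambda t: probs[t][e], reverse=True)
        picks[e] = ranked[:capacity]
    return picks


if __name__ == "__main__":
    # Hypothetical router logits: 4 tokens x 3 experts; the last token is padding.
    logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 1.4], [0.3, 0.1, 2.2], [0.0, 0.0, 0.0]]
    pad_mask = [False, False, False, True]
    print(token_choice_top_k(logits, pad_mask, k=2))
    print(expert_choice(logits, pad_mask, capacity=1))
```

The contrast shows up in the return shapes: token-choice guarantees every real token exactly k experts but lets the load on each expert vary, whereas expert-choice fixes each expert's load at `capacity` but can leave some tokens unrouted and send others to several experts.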
