High-Throughput Machine Learning-Aided Antibody Discovery for Cell Surface Antigens

Preprint

Abstract

Machine learning (ML) has the potential to revolutionize antibody design and selection, but its success depends on access to extensive, well-curated datasets of antibody-antigen interactions. To address this need, we developed a synthetic Fab yeast display library optimized for seamless ML integration, focusing on sequence diversity within the CDRH3 loop. The library incorporates key sequence features derived from human B cell repertoires that are essential for efficient antibody generation, captured in a compact antigen recognition module (ARM) format. Built using the VH1-69 heavy chain and four light chains, the library was evaluated against ten human and murine cell surface antigens, including PD-L1, TIGIT, and ROBO1. This approach yielded hundreds of antibodies with robust biophysical properties, validated for functional performance in flow cytometry and immunohistochemistry. Furthermore, ML analysis identified additional antibodies for ROBO2 and PD-L2 from the aggregate sequencing data, demonstrating utility for hybrid in silico and experimental workflows. We provide a publicly accessible dataset comprising more than 68,000 Fab sequences and 486 characterized antibodies. This study establishes an ML-compatible framework designed to accelerate and streamline antibody discovery and development.

Competing Interest Statement

The authors have declared no competing interest.

Publication year: 2025