Symmetric Self-play Online Preference Optimization for Protein Inverse Folding

preprint OA: closed CC-BY-NC-4.0
Abstract

Multi-objective reinforcement learning based on predicted structure feedback has been introduced into protein inverse folding. However, existing methods typically rely on a single model to optimize multiple structural objectives via a scalarized reward, which can bias the optimization toward dominant objectives and limit the exploration of diverse solutions. Here, we propose an online Symmetric Self-play Preference Optimization (SSP) framework that decouples the optimization of multiple structural objectives by training separate preference models with distinct reward signals, while enabling interaction through a shared sampling pool. This design allows the models to explore diverse optimization trajectories without enforcing a single dominant direction. Extensive experiments on both natural and de novo binder backbone inverse folding tasks demonstrate that SSP consistently improves sequence design self-consistency compared to single-model and existing baselines. Further analysis shows that different structural objectives are only partially aligned and induce distinct optimization directions, as evidenced by metric correlation and white-box analyses. This supports the effectiveness of decoupling objectives to enable higher design quality in protein design.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

{wwz_cs{at}hnu.edu.cn,hnulixy{at}hnu.edu.cn,zht{at}hnu.edu.cn,ytdou{at}hnu.edu.cn}
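The abstract's core idea (separate models, each with its own reward signal, interacting through a shared sampling pool) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the two toy rewards (`hydrophobic_fraction`, `charged_fraction`) stand in for unspecified structural objectives, `sample_sequences` stands in for sampling from an inverse folding model, and the best-versus-worst pairing rule in `build_preference_pairs` is one simple way to form preference pairs.

```python
import random
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
RewardFn = Callable[[str], float]


def hydrophobic_fraction(seq: str) -> float:
    """Toy reward A (hypothetical): fraction of hydrophobic residues."""
    return sum(aa in "AVILMFWY" for aa in seq) / len(seq)


def charged_fraction(seq: str) -> float:
    """Toy reward B (hypothetical): fraction of charged residues."""
    return sum(aa in "DEKR" for aa in seq) / len(seq)


def sample_sequences(rng: random.Random, n: int, length: int) -> List[str]:
    """Stand-in for one model's sampler: uniform random residues."""
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]


def self_play_round(rng: random.Random, n_per_model: int, length: int) -> List[str]:
    """Two models each contribute samples to a single shared pool."""
    pool_a = sample_sequences(rng, n_per_model, length)  # samples from model A
    pool_b = sample_sequences(rng, n_per_model, length)  # samples from model B
    return pool_a + pool_b


def build_preference_pairs(
    pool: List[str], rewards: Dict[str, RewardFn]
) -> Dict[str, Tuple[str, str]]:
    """For each objective, pick (chosen, rejected) from the shared pool
    using only that objective's reward (assumed best-vs-worst pairing)."""
    return {
        name: (max(pool, key=fn), min(pool, key=fn))
        for name, fn in rewards.items()
    }
```

Under these assumptions, each model would receive its own (chosen, rejected) pair drawn from the same pool and apply a preference-optimization update to it. The update step is omitted because the abstract does not specify the loss. Note that the two objectives can rank the same pool in opposite orders, which is the kind of partial misalignment the abstract describes.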


Publication year: 2026

License: CC-BY-NC-4.0