Symmetric Self-play Online Preference Optimization for Protein Inverse Folding

doi:10.64898/2026.03.26.714453

Preprint, 2026

Abstract

Multi-objective reinforcement learning based on predicted-structure feedback has been introduced into protein inverse folding. However, existing methods typically rely on a single model to optimize multiple structural objectives through a scalarized reward. This can bias optimization toward dominant objectives and limit the exploration of diverse solutions. Here, we propose an online Symmetric Self-play Preference Optimization (SSP) framework that decouples the optimization of multiple structural objectives by training separate preference models with distinct reward signals, while letting the models interact through a shared sampling pool. This design allows the models to explore diverse optimization trajectories without enforcing a single dominant direction. Extensive experiments on both natural and de novo binder backbone inverse folding tasks show that SSP consistently improves sequence design self-consistency compared with single-model and existing baselines. Further analysis shows that different structural objectives are only partially aligned and induce distinct optimization directions, as evidenced by metric-correlation and white-box analyses. This supports decoupling objectives as an effective way to achieve higher design quality in protein design.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes: wwz_cs{at}hnu.edu.cn, hnulixy{at}hnu.edu.cn, zht{at}hnu.edu.cn, ytdou{at}hnu.edu.cn
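The data flow the abstract describes for SSP (separate models, each with its own reward, drawing preference pairs from one shared sampling pool) can be illustrated with a minimal sketch. Everything concrete here is our own assumption, not the authors' method: the toy string rewards stand in for predicted-structure metrics, the "best vs. worst" pairing rule and the names `build_shared_pool`, `preference_pairs`, `ssp_round`, and `scalarized` are hypothetical, and the actual preference-optimization update of each model is omitted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a predicted-structure reward (maps a sequence to a score).
Reward = Callable[[str], float]


def build_shared_pool(samples_per_model: Dict[str, List[str]]) -> List[str]:
    """Merge every model's sampled sequences into one deduplicated pool, keeping first-seen order."""
    pool: List[str] = []
    seen = set()
    for seqs in samples_per_model.values():
        for seq in seqs:
            if seq not in seen:
                seen.add(seq)
                pool.append(seq)
    return pool


def preference_pairs(pool: List[str], reward: Reward, k: int = 1) -> List[Tuple[str, str]]:
    """Rank the pool by one reward and pair the i-th best (chosen) with the i-th worst (rejected)."""
    ranked = sorted(pool, key=reward, reverse=True)
    k = min(k, len(ranked) // 2)  # keeps chosen and rejected at distinct positions
    return [(ranked[i], ranked[-1 - i]) for i in range(k)]


def ssp_round(samples_per_model: Dict[str, List[str]],
              rewards: Dict[str, Reward], k: int = 1) -> Dict[str, List[Tuple[str, str]]]:
    """One symmetric round: each model gets pairs from the SAME pool, ranked by ITS OWN reward."""
    pool = build_shared_pool(samples_per_model)
    return {name: preference_pairs(pool, reward, k) for name, reward in rewards.items()}


def scalarized(rewards: Dict[str, Reward], weights: Dict[str, float]) -> Reward:
    """Single-model baseline (hypothetical): collapse all objectives into one weighted-sum reward."""
    return lambda seq: sum(w * rewards[name](seq) for name, w in weights.items())


# Toy objectives (hypothetical): fraction of 'A' vs. fraction of 'G' residues.
def reward_a(seq: str) -> float:
    return seq.count("A") / len(seq)


def reward_b(seq: str) -> float:
    return seq.count("G") / len(seq)


samples = {"model_a": ["AAAG", "GGGG"], "model_b": ["AAGG", "AGGG"]}
rewards = {"model_a": reward_a, "model_b": reward_b}

print(ssp_round(samples, rewards))
# {'model_a': [('AAAG', 'GGGG')], 'model_b': [('GGGG', 'AAAG')]}
print(preference_pairs(build_shared_pool(samples), scalarized(rewards, {"model_a": 0.8, "model_b": 0.2})))
# [('AAAG', 'GGGG')]
```

In this toy setting, the 0.8/0.2 scalarized reward yields exactly the pair preferred by the first objective, so the second objective never gets a direction of its own. In the decoupled round, each model still receives a pair favoring its own objective while both draw from the same pool.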

License: CC-BY-NC-4.0