Semi-supervised Retrieval of Functional Residues Through the Integration of Protein Language Models and Gene Ontology Data

Preprint (OA: closed)

Abstract

Motivation

Experimental studies of protein function often focus on mechanistic descriptions, characterizing how specific sites and residues contribute to activity. Abstractions such as domains and active sites enable quantitative descriptions of how protein features act biologically. Thanks to the abundance of high-quality sequence and function data, machine learning has been highly successful at predicting protein function directly. However, translating functional characterizations into mechanistic ones at the level of domains, binding sites, or motifs remains challenging. This is a semi-supervised problem: sequences and global functional labels are available, but local annotations must be inferred.

Results

We investigate the unsupervised discovery of functionally active protein regions by integrating protein sequence models with functional information. We first formalize the residue-level functional annotation problem by constructing unified evaluation datasets that link Gene Ontology functions to annotated residues. We assemble eight datasets, spanning levels of specificity from single active-site residues to domains covering up to 60% of a protein. We then introduce a new class of function-conditioned generative models that predict functionally important residues more accurately than existing approaches, including interpretability methods and PSSM entropy estimation, across multiple benchmark datasets.

Availability

github.com/mofradlab/go_interp

Contact

mofrad{at}berkeley.edu

Competing Interest Statement

The authors have declared no competing interest.
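The abstract names PSSM entropy estimation as one of the baselines and describes evaluating residue predictions against annotated residues. The sketch below illustrates that general idea and is not the authors' implementation. It rests on several assumptions: conservation is measured as the Shannon entropy of raw amino-acid frequencies in each column of a multiple sequence alignment (no pseudocounts or sequence weighting, which real PSSM pipelines usually add), lower entropy is read as "more likely functional", and predictions are scored with average precision against a set of annotated positions. The paper's actual metric is not stated here. The names `column_entropies`, `conservation_scores`, and `average_precision` are hypothetical.

```python
import math
from collections import Counter

# Assumption: only the 20 standard amino acids are counted; gaps ('-') and
# nonstandard letters (e.g. 'X') are ignored when estimating column frequencies.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
MAX_ENTROPY = math.log2(len(AMINO_ACIDS))  # entropy of a uniform column, in bits


def column_entropies(msa):
    """Shannon entropy (bits) of each column of an aligned MSA."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa if seq[i] in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            # All-gap column: no evidence of conservation, so treat it as
            # maximally variable rather than perfectly conserved.
            entropies.append(MAX_ENTROPY)
            continue
        # H = sum_a p_a * log2(1 / p_a), with p_a = count / total
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


def conservation_scores(msa):
    """Higher score = lower entropy = more conserved (assumed more likely functional)."""
    return [-h for h in column_entropies(msa)]


def average_precision(scores, positives):
    """Average precision of ranking positions by descending score.

    `positives` is the set of annotated functional positions (0-based).
    Ties keep their original positional order (Python's sort is stable).
    """
    if not positives:
        return 0.0
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(ranked, start=1):
        if idx in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)


if __name__ == "__main__":
    # Toy alignment of four sequences; columns 0, 1 and 4 are fully conserved.
    msa = ["ACDEG", "ACDFG", "ACEKG", "ACDWG"]
    print([round(h, 3) for h in column_entropies(msa)])
    scores = conservation_scores(msa)
    print(average_precision(scores, {0, 1}))  # annotated residues in conserved columns
    print(average_precision(scores, {2}))     # annotated residue in a variable column
```

A conservation baseline like this uses no functional label at all, so it ranks the same residues highly for every GO term. That limitation is what a function-conditioned model can, in principle, address.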

Year: 2025
