Protein secondary structure and remote homology detection

preprint OA: closed
📄 Open PDF Full text JSON View at publisher
Full text 1,895 characters · extracted from oa-doi-fallback · click to expand
1 Abstract A protein can be represented by its primary, secondary, or tertiary structure. With recent advances in AI, there is now as much tertiary as primary structural data available. Fast and accurate search methods exist for both types of data, with searches over both representations being highly precise. However, primary structure data can sometimes be incomplete. As a result, tertiary structure has become the gold standard for remote homology detection. How does secondary structure perform in remote homology detection? Secondary structure interprets proteins as a sequence using an alphabet representing helices, strands, or loops. It shares its sequential nature with primary structure while retaining topological information similar to tertiary structure. To assess the effectiveness of secondary structure in remote homology detection, we devised a challenging classification task aimed at determining the superfamily membership of very distantly related protein domains. We used benchmarks from the CATH and SCOP databases and evaluated sequence and structure alignment algorithms on primary, secondary, and tertiary structures. As expected, both basic and advanced sequence alignment algorithms applied to primary structure achieved high precision, but their overall area under the curve was lower compared to the gold standard of structural alignment using tertiary structure. Surprisingly, a simple string comparison algorithm applied to secondary structure performed close to the gold standard. This result supports the hypothesis that key structural information is already encoded in secondary structure and suggests that secondary structure may be a promising representation to use when high-confidence structural data is unavailable, such as in cases involving protein flexibility and disorder. Competing Interest Statement The authors have declared no competing interest.

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: oa-doi-fallback

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2024) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00