NeuroCDS: Integrating Local and Global Neural Network Representations via Structural Constrained Viterbi Decoding for Robust CDS Annotation

preprint OA: closed

Abstract

Motivation

Robust annotation of Coding Sequences (CDS) is critical for downstream transcriptomics, yet heavily fragmented de novo RNA-Seq assemblies pose a severe challenge. Traditional computational tools rely on fixed, hand-crafted features that are prone to fail when canonical sequence signals are truncated. While recent deep learning models excel at automatically extracting complex representations, they predominantly treat these as isolated prediction tasks. Lacking a joint inference mechanism to enforce structural constraints, existing models occasionally output biologically invalid predictions. Therefore, a computational framework capable of fusing heterogeneous neural network representations for joint annotation is critically needed.

Results

We present NeuroCDS, a reliable framework that bridges the effective representation capabilities of deep neural networks with the structural rigor of dynamic programming. NeuroCDS employs a dual-branch architecture: a Convolutional Neural Network (CNN) acts as a local sensor to extract Translation Initiation Sites (TIS), while a Temporal Convolutional Network (TCN) acts as a global sensor to evaluate continuous regional coding potential. The primary contribution of NeuroCDS lies in a structurally constrained Viterbi Decoding algorithm designed to fuse these heterogeneous signals. This joint inference mechanism strictly enforces biological grammars (e.g., reading frame preservation) to dynamically calculate the globally optimal transcript structure via a tripartite state space. Crucially, by introducing a dynamic length normalization mechanism, our formulation adaptively leverages global continuous representations to stably annotate both intact transcripts and highly truncated fragments. Comprehensive evaluations demonstrate that NeuroCDS achieves high F1-scores on full-length transcripts and maintains robust performance on complex Ribo-seq validated datasets, comparing favorably against traditional HMM-based and heuristic methodologies.

Availability

Source code, pre-trained models, and datasets are freely available at https://github.com/hgcwei/NeuroCDS.

Competing Interest Statement

The authors have declared no competing interest.
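The decoding idea described under Results can be illustrated with a small sketch. This is not the NeuroCDS implementation: the abstract does not specify the state space, the scoring, or the length normalization, so everything below is an assumption made for illustration. The sketch splits the coding region into three frame-phased states (giving five states in total, rather than the paper's tripartite layout), takes two hypothetical per-position score lists (`tis_score` standing in for the CNN output, `coding_score` for the TCN output, both treated as log-odds), and omits length normalization entirely. The names `viterbi_cds` and `cds_span` are invented here.

```python
import math

STOPS = {"TAA", "TAG", "TGA"}
# Assumed state layout: 5' UTR, CDS in three codon phases, 3' UTR.
# State C_f holds a CDS whose codons start at positions i with i % 3 == f.
U5, C0, C1, C2, U3 = range(5)
CDS_STATES = (C0, C1, C2)


def viterbi_cds(seq, tis_score, coding_score):
    """Return the best-scoring state label for every position of seq."""
    n = len(seq)
    if n == 0:
        return []
    neg_inf = -math.inf

    def emit(state, i):
        # Coding states are rewarded by the coding score, UTR states penalized.
        return coding_score[i] if state in CDS_STATES else -coding_score[i]

    def stop_just_ended(i, frame):
        # True if an in-frame stop codon occupies positions i-3 .. i-1.
        return i >= 3 and (i - 3) % 3 == frame and seq[i - 3:i] in STOPS

    score = [[neg_inf] * 5 for _ in range(n)]
    back = [[-1] * 5 for _ in range(n)]

    # A path may start in the 5' UTR or, for 5'-truncated fragments,
    # directly inside a CDS in any frame (no TIS term in that case).
    score[0][U5] = emit(U5, 0)
    for state in CDS_STATES:
        score[0][state] = emit(state, 0)

    for i in range(1, n):
        options = {U5: [(score[i - 1][U5], U5)], U3: [(score[i - 1][U3], U3)]}
        for frame, state in enumerate(CDS_STATES):
            options[state] = []
            if stop_just_ended(i, frame):
                # Frame constraint: the CDS must close right after its stop codon.
                options[U3].append((score[i - 1][state], state))
            else:
                options[state].append((score[i - 1][state], state))
        if seq[i:i + 3] == "ATG":
            # A CDS may open only at an ATG, in the frame that ATG defines.
            options[CDS_STATES[i % 3]].append((score[i - 1][U5] + tis_score[i], U5))
        for state, cands in options.items():
            if cands:
                best, prev = max(cands)
                score[i][state] = best + emit(state, i)
                back[i][state] = prev

    # Ending inside a CDS state is allowed, which covers 3'-truncated fragments.
    state = max(range(5), key=lambda s: score[n - 1][s])
    path = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    path.reverse()
    return path


def cds_span(path):
    """Half-open (start, end) interval of CDS-labelled positions, or None."""
    idx = [i for i, s in enumerate(path) if s in CDS_STATES]
    return (idx[0], idx[-1] + 1) if idx else None


seq = "GGATGAAACCCTAAGG"                    # toy transcript: ATG AAA CCC TAA
tis = [-5.0] * len(seq)
tis[2] = 2.0                                # made-up TIS log-odds, peak at the ATG
coding = [-1.0] * 2 + [1.0] * 12 + [-1.0] * 2   # made-up coding log-odds
print(cds_span(viterbi_cds(seq, tis, coding)))  # (2, 14)
```

Because each coding state carries its own codon phase, an out-of-frame stop codon cannot terminate the CDS and an in-frame one cannot be read through, which is one simple way to express a reading-frame constraint inside the dynamic program. Allowing paths to begin or end inside a coding state is the sketch's stand-in for handling truncated fragments.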

Publication year: 2026

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00