Gene-centered identification of cis-regulatory islands reveals regulatory landscapes complementary to motif-centric approaches

doi:10.64898/2026.01.05.697455

2026 · doi:10.64898/2026.01.05.697455

preprint OA: closed

Abstract

Cis-regulatory elements constitute a fundamental layer of gene regulation, yet their computational identification has largely relied on transcription factor (TF)–centric frameworks that assume genome-wide background normalization and explicit TF binding models. While effective at the genome scale, such assumptions are less appropriate for gene-centered analyses, where local sequence composition rather than global averages defines the relevant regulatory context. Here, we introduce a TF-independent framework for the gene-centered identification of cis-regulatory islands (GCIC), which detects regulatory structure based on the local enrichment and diversity of short cis-regulatory sequence words derived from curated plant regulatory elements. Cis-regulatory islands are identified through the spatial overlap of independently enriched motif families, without relying on TF identity, binding affinity, or genome-wide normalization. Application of the GCIC framework to the DROOPING LEAF (DL) locus in rice identifies discrete cis-regulatory islands, including one that coincides with a previously characterized intronic regulatory region, and reveals spatial patterns distinct from those detected by PWM-based motif scanning and motif clustering approaches. Genome-wide analyses further show that cis-regulatory islands are broadly distributed across genes but exhibit heterogeneous motif-family usage: regulatory vocabulary diversity expands at the gene level, whereas individual islands preferentially reuse a limited set of motif-family combinations. These results indicate that cis-regulatory organization is best described as a gene-centered property of sequence vocabulary usage, in which regulatory diversity arises through gene-specific deployment and constrained reuse of motif-family combinations rather than unrestricted combinatorial complexity. The GCIC framework thus provides a complementary representation of regulatory landscapes tailored to gene-centered analyses, capturing regulatory features that are not readily detected by motif-centric approaches optimized for genome-wide inference.

Competing Interest Statement

The authors have declared no competing interest.
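
The island-calling idea described in the abstract (local enrichment of short sequence words per motif family, followed by spatial overlap across families) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the enrichment statistic, window size, or thresholds, so the sliding-window fold test, all parameter defaults, the function names, and the toy families `famA` and `famB` are assumptions made purely for illustration. The only property taken from the abstract is that the expected count comes from the locus itself rather than from a genome-wide background.

```python
def find_hits(seq, words):
    """Start positions of every (possibly overlapping) occurrence of any word."""
    hits = []
    for word in words:
        start = seq.find(word)
        while start != -1:
            hits.append(start)
            start = seq.find(word, start + 1)
    return sorted(hits)


def enriched_intervals(hits, seq_len, window=50, step=10, min_fold=2.0, min_hits=2):
    """Merge sliding windows whose hit count exceeds the locus-wide expectation.

    The expectation is computed from this locus only (assumed statistic):
    expected = total hits in the locus * window / locus length.
    """
    if not hits or seq_len < window:
        return []
    expected = len(hits) * window / seq_len
    intervals = []
    for start in range(0, seq_len - window + 1, step):
        end = start + window
        count = sum(1 for h in hits if start <= h < end)
        if count >= min_hits and count >= min_fold * expected:
            if intervals and start <= intervals[-1][1]:
                # Overlapping or adjacent enriched window: extend the last interval.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


def call_islands(seq, families, min_families=2, **window_args):
    """Return (start, end, families) where >= min_families enriched intervals overlap."""
    # Per-base set of enriched families; adequate for gene-scale sequences.
    coverage = [set() for _ in range(len(seq))]
    for name, words in families.items():
        hits = find_hits(seq, words)
        for s, e in enriched_intervals(hits, len(seq), **window_args):
            for i in range(s, e):
                coverage[i].add(name)

    islands = []
    start, members = None, set()
    for i, fams in enumerate(coverage):
        if len(fams) >= min_families:
            if start is None:
                start, members = i, set()
            members |= fams
        elif start is not None:
            islands.append((start, i, sorted(members)))
            start = None
    if start is not None:
        islands.append((start, len(seq), sorted(members)))
    return islands


# Toy locus: two hypothetical word families clustered in a uniform background.
toy_seq = "T" * 100 + "ACGTACGT" + "GGCCGGCC" + "T" * 100
toy_families = {"famA": ["ACGT"], "famB": ["GGCC"]}
toy_islands = call_islands(toy_seq, toy_families)
print(toy_islands)
```

Each family is tested for enrichment independently, and an island is reported only where enriched intervals from at least `min_families` families coincide, so a single dense family does not produce an island on its own. No transcription factor identity or position weight matrix enters the calculation, which mirrors the TF-independent framing of the abstract.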

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00