Gene-centered identification of cis-regulatory islands reveals regulatory landscapes complementary to motif-centric approaches

Preprint (2026)
Abstract

Cis-regulatory elements constitute a fundamental layer of gene regulation, yet their computational identification has largely relied on transcription factor (TF)–centric frameworks that assume genome-wide background normalization and explicit TF binding models. While effective at the genome scale, such assumptions are less appropriate for gene-centered analyses, where local sequence composition rather than global averages defines the relevant regulatory context. Here, we introduce a TF-independent framework for the gene-centered identification of cis-regulatory islands (GCIC), which detects regulatory structure based on the local enrichment and diversity of short cis-regulatory sequence words derived from curated plant regulatory elements. Cis-regulatory islands are identified through the spatial overlap of independently enriched motif families, without relying on TF identity, binding affinity, or genome-wide normalization. Application of the GCIC framework to the DROOPING LEAF (DL) locus in rice identifies discrete cis-regulatory islands, including one that coincides with a previously characterized intronic regulatory region, and reveals spatial patterns distinct from those detected by PWM-based motif scanning and motif clustering approaches. Genome-wide analyses further show that cis-regulatory islands are broadly distributed across genes but exhibit heterogeneous motif-family usage: regulatory vocabulary diversity expands at the gene level, whereas individual islands preferentially reuse a limited set of motif-family combinations. These results indicate that cis-regulatory organization is best described as a gene-centered property of sequence vocabulary usage, in which regulatory diversity arises through gene-specific deployment and constrained reuse of motif-family combinations rather than unrestricted combinatorial complexity.
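As a rough illustration of the island idea described above (local word enrichment per motif family, then spatial overlap of independently enriched families), the following is a minimal sketch under stated assumptions; it is not the GCIC implementation. Assumed, not taken from the paper: sliding windows, a Poisson upper-tail test against each family's rate within the gene region itself (standing in for "local" background), the toy word lists, and every function name and parameter (`find_word_hits`, `poisson_sf`, `enriched_windows`, `call_islands`, `window`, `step`, `alpha`, `min_hits`, `min_families`), all of which are hypothetical.

```python
import math
from collections import defaultdict


def find_word_hits(seq, words):
    """Return sorted start positions of every (possibly overlapping) match of any word."""
    seq = seq.upper()
    hits = []
    for w in words:
        w = w.upper()
        start = seq.find(w)
        while start != -1:
            hits.append(start)
            start = seq.find(w, start + 1)
    return sorted(hits)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(seq, families, window=50, step=10, alpha=0.01, min_hits=2):
    """Map each (hypothetical) motif family to the window starts where it is locally enriched."""
    n = len(seq)
    starts = list(range(0, max(n - window, 0) + 1, step))
    result = {}
    for fam, words in families.items():
        hits = find_word_hits(seq, words)
        # Background is the family's rate within this gene region, not a genome-wide average.
        rate = len(hits) / n if n else 0.0
        expected = rate * window
        enriched = set()
        for s in starts:
            count = sum(1 for h in hits if s <= h < s + window)
            if count >= min_hits and poisson_sf(count, expected) < alpha:
                enriched.add(s)
        result[fam] = enriched
    return result


def call_islands(seq, families, window=50, step=10, min_families=2, **kw):
    """Merge overlapping windows where at least min_families distinct families are enriched."""
    per_family = enriched_windows(seq, families, window, step, **kw)
    support = defaultdict(set)
    for fam, starts in per_family.items():
        for s in starts:
            support[s].add(fam)
    islands = []  # list of (start, end, families) tuples, end exclusive
    for s in sorted(support):
        fams = support[s]
        if len(fams) < min_families:
            continue
        end = s + window
        if islands and s <= islands[-1][1]:
            prev_start, prev_end, prev_fams = islands[-1]
            islands[-1] = (prev_start, max(prev_end, end), prev_fams | fams)
        else:
            islands.append((s, end, set(fams)))
    return islands
```

For example, with `seq = "T" * 200 + "ACGTGATA" * 5 + "T" * 200` and the toy families `{"famA": ["ACGT"], "famB": ["GATA"]}`, `call_islands(seq, fams)` returns a single island `(180, 250, {'famA', 'famB'})`: each family alone is enriched in several windows, but only the windows where both are enriched at once are kept and merged. Requiring co-enrichment of distinct families is one simple way to encode "diversity"; the authors' actual scoring may differ.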
The GCIC framework thus provides a complementary representation of regulatory landscapes tailored to gene-centered analyses, capturing regulatory features that are not readily detected by motif-centric approaches optimized for genome-wide inference.

Competing Interest Statement

The authors have declared no competing interest.
