Beyond Pathway Boundaries: A Degree-Aware Network Clustering Test for Gene Sets

Preprint · OA: closed

ABSTRACT
Over-representation analysis (ORA) is the most commonly used tool for interpreting gene lists despite well-documented limitations: pathway boundaries are fixed, genes are assumed to be independent, and results depend on the background set. Network-based methods address these limitations using interaction-network modularity but introduce hub bias: highly connected genes appear clustered under naive nulls because curated networks overrepresent well-studied genes. Existing corrections are imperfect: edge permutation destroys the topology the test should condition on, and propagation methods hide the confound in parameter tuning. We introduce MANGO (Moran’s Autocorrelation for Network Gene Over-representation), which asks one conditional question: does a gene set’s spatial autocorrelation on a fixed biological network exceed what its degree composition alone would predict? MANGO computes Global Moran’s I under a null that conditions on both the network and the binned degree distribution of the gene set, then decomposes significant signals at the component and gene levels. In benchmarks, uniform nulls produce a false positive rate of 1.0 on hub-enriched gene sets with no real clustering, whereas ten-bin degree-stratified nulls reduce it to 0.0 with no loss of power (AUC ≥ 0.98; on degree-typical signals, |ΔAUC| ≤ 0.004). Pathway-spiking simulations confirm that real biological clustering is detected across diverse pathway sizes and degree profiles. Applied to the FIGI colorectal cancer GWAS (204 SNPs), MANGO finds the gene set degree-typical (KS p = 0.83), yet its Moran’s I is highly significant (p < 0.001). A component-level jackknife localizes the entire signal to a single 24-gene module spanning TGF-β, Wnt/cadherin, and related pathways, with four bottleneck genes (SMAD3, MYC, CTNNB1, PTPN1) that match established CRC driver biology.
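To make the conditional question concrete, the sketch below computes Global Moran's I for a 0/1 membership indicator x on a network, I = (N / W) · Σ_ij w_ij z_i z_j / Σ_i z_i², with z_i = x_i − x̄ and W = Σ_ij w_ij, and compares it against a degree-stratified permutation null. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes an unweighted, undirected network stored as an adjacency dictionary, ten equal-count degree bins defined by degree rank, and a one-sided permutation p-value. The function names (morans_i, degree_bins, degree_stratified_test) are hypothetical.

```python
import random
from collections import defaultdict


def morans_i(adj, members):
    """Global Moran's I of a 0/1 gene-set indicator on an undirected network.

    adj maps each node to the set of its neighbours; every edge has weight 1,
    so the total weight W equals 2|E| (each edge is counted from both ends).
    """
    nodes = list(adj)
    n = len(nodes)
    mean = sum(1 for v in nodes if v in members) / n
    z = {v: (1.0 if v in members else 0.0) - mean for v in nodes}
    denom = sum(zv * zv for zv in z.values())
    if denom == 0:
        return 0.0  # constant indicator: I is undefined; 0.0 is an assumed convention
    w_total = sum(len(adj[v]) for v in nodes)
    num = sum(z[u] * z[v] for u in nodes for v in adj[u])
    return (n / w_total) * (num / denom)


def degree_bins(adj, n_bins=10):
    """Assign every node to one of n_bins equal-count bins by degree rank.

    This binning scheme is an assumption; ties are broken by str(node)
    so that the bins are reproducible.
    """
    ranked = sorted(adj, key=lambda v: (len(adj[v]), str(v)))
    n = len(ranked)
    return {v: i * n_bins // n for i, v in enumerate(ranked)}


def degree_stratified_test(adj, members, n_bins=10, n_perm=999, seed=0):
    """One-sided permutation test of Moran's I under a degree-stratified null.

    The network is never rewired: each null set draws, within every degree
    bin, as many genes as the observed set has in that bin. n_bins=1 gives a
    uniform (degree-blind) null for comparison.
    """
    members = set(members) & set(adj)  # genes absent from the network are dropped
    rng = random.Random(seed)
    bins = degree_bins(adj, n_bins)
    pool = defaultdict(list)
    for v, b in bins.items():
        pool[b].append(v)
    need = defaultdict(int)
    for v in members:
        need[bins[v]] += 1
    observed = morans_i(adj, members)
    hits = 0
    for _ in range(n_perm):
        null_set = set()
        for b, k in sorted(need.items()):
            null_set.update(rng.sample(pool[b], k))
        if morans_i(adj, null_set) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy example: a 40-gene ring network; the gene set is five consecutive genes.
    ring = {i: {(i - 1) % 40, (i + 1) % 40} for i in range(40)}
    print(degree_stratified_test(ring, {0, 1, 2, 3, 4}))
```

Holding the network fixed and resampling only within degree bins is what distinguishes this kind of null from edge permutation: a hub-rich observed set is compared with equally hub-rich random sets, so degree composition alone cannot yield a small p-value.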
eTOC blurb
MANGO tests whether a gene set’s spatial autocorrelation on a biological network exceeds what its degree composition predicts by conditioning Global Moran’s I on the binned degree distribution while holding the network fixed. Significant signals are decomposed into modules, bottleneck genes, and statistical drivers through component-jackknife, articulation-point, and gene-jackknife analyses.

Competing Interest Statement
The authors have declared no competing interest.
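The three decomposition steps named in the blurb can be sketched as follows. This is a minimal illustration under our own assumptions, not MANGO's code: a jackknife is read here as the drop in Global Moran's I when one induced component (or one gene) is removed from the set with the network held fixed, and "bottleneck genes" are taken to be articulation points of the subgraph induced on a module. All function names are hypothetical, and a constant indicator is assigned I = 0 by convention.

```python
def morans_i(adj, members):
    """Global Moran's I of a 0/1 indicator on an unweighted, undirected network."""
    nodes = list(adj)
    n = len(nodes)
    mean = sum(1 for v in nodes if v in members) / n
    z = {v: (1.0 if v in members else 0.0) - mean for v in nodes}
    denom = sum(zv * zv for zv in z.values())
    if denom == 0:
        return 0.0  # constant indicator: assumed convention
    w_total = sum(len(adj[v]) for v in nodes)
    num = sum(z[u] * z[v] for u in nodes for v in adj[u])
    return (n / w_total) * (num / denom)


def induced_components(adj, members):
    """Connected components of the subgraph induced on the gene set."""
    members = set(members)
    seen, comps = set(), []
    for start in sorted(members, key=str):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v in members and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def component_jackknife(adj, members):
    """(component, drop in I) for each induced component removed from the set."""
    members = set(members)
    full = morans_i(adj, members)
    return [(comp, full - morans_i(adj, members - comp))
            for comp in induced_components(adj, members)]


def gene_jackknife(adj, members):
    """Drop in I when each gene is removed from the set on its own."""
    members = set(members)
    full = morans_i(adj, members)
    return {g: full - morans_i(adj, members - {g}) for g in members}


def articulation_points(adj, nodes):
    """Cut vertices of the subgraph induced on `nodes` (Tarjan low-link DFS).

    Recursive for clarity; very large modules would need an iterative version.
    """
    nodes = set(nodes)
    disc, low, cut = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in nodes:
                continue
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)  # u separates v's subtree from the rest
            elif v != parent:
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:
            cut.add(u)  # a DFS root is a cut vertex iff it has 2+ children

    for start in sorted(nodes, key=str):
        if start not in disc:
            dfs(start, None)
    return cut
```

In use, one would take the component with the largest jackknife drop as the candidate module, call articulation_points on it to list bottleneck genes, and rank genes within it by gene_jackknife to find statistical drivers.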

Publication year: 2026