Inferring fungal cis-regulatory networks from genome sequences via unsupervised and interpretable representation learning

preprint OA: closed
Abstract

Gene expression patterns are determined to a large extent by transcription factor binding to non-coding regulatory regions in the genome. However, gene expression cannot yet be systematically predicted from genome sequences, in part because non-functional matches to the sequence patterns (motifs) recognized by transcription factors (TFs) occur frequently throughout the genome. Large-scale functional genomics data for many TFs have enabled characterization of regulatory networks in experimentally accessible cells such as budding yeast. Beyond yeast, fungi are important industrial organisms and pathogens, but large-scale functional data are only sporadically available. Uncharacterized regulatory networks control key pathways and gene expression programs associated with fungal phenotypes. Here we explore a sequence-only approach to inferring regulatory networks by leveraging the hundreds of genomes now available for many clades of fungi. We use gene orthology as the learning signal to infer interpretable, TF motif-based representations of non-coding regulatory regions. Using these representations to identify conserved signals for motifs, comparative genomics can be scaled to evolutionary comparisons where sequence similarity cannot be detected. We show that similarity of these conserved motif signals predicts gene expression and regulation better than using experimental data, and that we can infer known and novel regulatory connections in diverse fungi. Our new predictions include a pathway for recombination in C. albicans and pathways for mating and an RNAi immune response in Neurospora. Taken together, our results indicate that specific hypotheses about transcriptional regulation in fungi can be obtained for many genes from genome sequence analysis alone.

Competing Interest Statement

The authors have declared no competing interest.
Footnotes

Specific highlights of the revisions include:
- Improved accessibility for fungal biologists by making plain-text versions of all 4 inferred regulatory maps and HTML output of TomTom motif searches available for download at the Zenodo page: doi.org/10.5281/zenodo.14920043
- Highlighted the originality of the method by changing the title to include unsupervised and interpretable representation learning
- More clearly demonstrated that our approach is a significant advance on other approaches (including phylogenetic footprinting and in vivo transcription factor binding experiments)
- Identified Ste12-regulated effectors in Fusarium (thanks to a reviewer's suggestion)
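
The abstract states that similarity of conserved motif signals between genes can predict shared expression and regulation. The sketch below illustrates only that general idea, and it is not the authors' method: it assumes each gene's regulatory region has already been summarized as a vector of per-motif conservation scores, and it uses cosine similarity as a stand-in similarity measure. The gene names, motif names, scores, and the functions `cosine_similarity` and `most_similar` are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length score vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A gene with no motif signal at all is treated as similar to nothing.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, profiles):
    """Rank every other gene by similarity of its motif profile to the query."""
    query_vector = profiles[query]
    scores = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one conservation score per motif, per gene.
MOTIFS = ["motif_A", "motif_B", "motif_C"]
PROFILES = {
    "gene_1": [0.9, 0.1, 0.0],
    "gene_2": [0.8, 0.2, 0.1],
    "gene_3": [0.0, 0.1, 0.9],
}

ranked = most_similar("gene_1", PROFILES)
for name, score in ranked:
    print(f"{name}\t{score:.3f}")
```

In this toy example, gene_2 ranks above gene_3 for the query gene_1 because both are dominated by the same motif. Candidate co-regulated genes of this kind would be hypotheses to check against expression data, not confirmed regulatory connections.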


Publication year: 2025
