Deciphering the 3D genome organization across species from Hi-C data

doi:10.1101/2024.11.14.623548

2024 · doi:10.1101/2024.11.14.623548

preprint OA: closed

ABSTRACT Three-dimensional genome organization is essential for gene regulation, yet it is driven by different biological mechanisms in different species. Species-specific factors and DNA sequences influence chromatin folding, complicating cross-species comparisons. Leveraging Hi-C data and machine learning, we introduce Chimaera, a convolutional neural network that predicts Hi-C maps from DNA sequences, enabling exploration of genome folding in evolution. Chimaera’s latent representations revealed an unsupervised atlas of key chromatin features (such as insulation, loops, fountains/jets) and supported the detection and quantification of structural signatures in processes such as the cell cycle and embryogenesis. Targeted search in the latent space linked DNA sequence elements to specific chromatin structures. Applying Chimaera across multiple species confirmed the insulator roles of CTCF in vertebrates and BEAF-32 in D. melanogaster and identified a previously unreported insulator motif in D. melanogaster. In the amoeba D. discoideum, gene orientation on the DNA strand was shown to influence loop formation. Models for other organisms also showed chromatin folding patterns associated with gene location. Finally, using cross-species predictions, we tested the transferability of chromatin folding patterns and revealed evolutionary relationships, culminating in a chromatin structure-based cluster tree spanning plants to mammals.
COMPETING INTEREST STATEMENT The authors have declared no competing interest.
FOOTNOTES † Aleksei Shkolikov and Aleksandra Galitsyna should be regarded as joint first authors. New species have been added to the analysis, some methods have been revised, and the results have been supplemented.
DATA AVAILABILITY Raw and processed Hi-C and Micro-C data were obtained from BioProject accessions PRJNA606649 (Xenopus tropicalis), PRJNA630123 (Anopheles merus), PRJNA665323 (Culex quinquefasciatus), PRJNA749654 (Sarcoptes scabiei), PRJNA683935 (Archegozetes longisetosus), PRJNA680311 (Arion vulgaris), PRJNA427478 (Pomacea canaliculata), PRJNA792953 (Cataglyphis hispanica), PRJCA014302 (Arabidopsis thaliana); BioSample accession SAMN13118423 (Apis cerana); GEO datasets GSE178982 (Mus musculus ESC), GSE129997 (Mus musculus cell cycle), GSE178982 (Mus musculus with depleted structural proteins), GSE171396 (Drosophila melanogaster), GSE128568 (Caenorhabditis elegans), GSE151553 (Saccharomyces cerevisiae wild type), GSE217017 (Saccharomyces cerevisiae with exogenous DNA), GSE85220 (Schizosaccharomyces pombe), GSE260572 (Trichoplax adhaerens, Mnemiopsis leidyi), GSE152150 (Symbiodinium microadriaticum), GSE195609 (Danio rerio embryos), GSE134055 (Danio rerio muscle cells), GSE247397 (Dictyostelium discoideum), GSM7120275 (Bombyx mori); 4DN dataset 4DNBSZOFFFM6 (Homo sapiens). Preprocessed data for all studied organisms and trained models are posted online at OSF (doi: 10.17605/OSF.IO/YF7CR). Chimaera code and illustrative examples are available at Zenodo (doi: 10.5281/zenodo.17418710).
