Highly conserved sequence-specific DNA binding networks of 716 transcription factors associated with continuing divergent genomic evolution of human and chimpanzee brain development.

Research Article

Gennadi Glinsky

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-5442388/v1 This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Emergence during mammalian evolution of common and divergent traits of genomic regulatory networks (GRNs) encompassing ubiquitous, compositionally nearly identical yet quantitatively distinct panels of DNA sequences of transcription factor binding sites (TFBS) for 716 proteins is reported.
The evolutionarily conserved foundation of these GRNs appears assembled from arrays of DNA codes for ~770 TFBS, including 65 instances of immediately adjacent TFBS for two distinct TFs. A majority of protein constituents of these GRNs (700 of 716; 98%) is defined by the Gene Ontology (GO) feature of sequence-specific double-stranded DNA binding (GO: 1990837). Genome-wide and individual chromosome-level analyses of 17,935 ATAC-seq-defined brain development regulatory regions (BDRRs) revealed nearly universal representations of TFBS for TF-constituents of these networks, TFBS densities of which appear consistently higher within thousands of BDRRs of Modern Humans compared to Chimpanzee. Transposable elements (TE), including LTR/HERV, SINE/Alu, SVA, and LINE families, appear to harbor consensus regulatory nodes of the highly conserved sequence-specific double-stranded DNA binding networks identified herein. Notably, selections of quantitative features of TFBS panels of these GRNs manifest individual chromosome-specific profiles and species-specific divergence patterns. Collectively, this contribution highlights a previously unrecognized essential function of genomic DNA sequences derived from multiple TE families in providing genome-wide regulatory seed templates of sequence-specific double-stranded DNA binding GRNs. Since DNA sequences of TFBS panels for 716 proteins are encoded by transposons that remain active in genomes of present-day humans, namely SVA and LINE families, retrotransposition-mediated spread of seeds for these GRNs may contribute to continuing divergent genomic evolution of human and chimpanzee brain development.
Epigenetics & Genomics

transcription factor binding sites (TFBS), transposable elements (TE), human endogenous retrovirus type H (HERVH), human endogenous retrovirus type L (HERVL), human endogenous retrovirus type K (HERVK), LTR7, MLT2A1, MLT2A2, LTR5_Hs/HERVK, LINE, SINE/Alu, retrotransposition, primate evolution, mammalian offspring survival genes, human embryogenesis, brain development regulatory regions, human endogenous complexomes, viral-host protein-protein interactions, neoplasm metastasis, neurodevelopmental disorders, neurodegenerative diseases, human-specific phenotypic traits, human-specific regulatory sequences.

Introduction

DNA sequences derived from transposable elements (TE) constitute ~50% of the human genome, contributing to a multitude of structural features and regulatory functions at different levels of genomic organization (Elbarbary et al., 2016; Sundaram and Wysocka, 2020). During evolution, concurrently with TE colonization of primates’ genomes, a highly sophisticated defense system co-evolved that is designed to restrict uncontrolled TE expansion and diminish potentially deleterious effects of TE insertions on genome integrity, while providing TE family sequence-specific co-option mechanisms integrating TE-derived sequences into cell type- and tissue-specific genomic regulatory networks (Jacobs et al., 2014; Pontis et al., 2019). While potentially destructive effects of TE on genome integrity represent their ubiquitous biological feature defined by the very nature of transpositionally competent TE, it is unknown whether TE-derived sequences may possess similarly ubiquitous non-deleterious functions, perhaps contributing to evolutionary fitness of the host. TE-derived sequences are often integrated into species’ genomic regulatory networks.
In mammalian genomes, TE-derived sequences rewired the core regulatory circuitry of embryonic stem cells (Kunarso et al., 2010); they are intrinsically activated in human preimplantation embryos (Grow et al., 2015), operating as long-range enhancers (Fuentes et al., 2018) and providing transcription factor binding sites (TFBS) for TP53 (Wang et al., 2007), STAT1 (Schmid et al., 2010), CTCF (Schmidt et al., 2012), and a set of transcription factors (TFs) defined as master pluripotency regulators (Kunarso et al., 2010; Schmidt et al., 2012), including candidate human-specific TFBS for POU5F1 (OCT4), NANOG, SOX2, and CTCF (Glinsky, 2015; Glinsky et al., 2018). Overall, chromatin immunoprecipitation and sequencing (ChIP-seq) experiments demonstrated that TE-derived loci harbor thousands of TFBS and presumably exert global regulatory effects on gene expression in various pathophysiological conditions. However, the distribution of TFBS is non-uniform and highly variable within specific TE subfamilies that evolutionarily emerged from identical or highly similar sequences. TE sequence-dependent distribution of TFBS is likely influenced by genetic drift and other mutational processes as well as by cell type-specific epigenetic contexts affecting chromatin states of genomic regions harboring TE insertions. Based on these considerations, it was reasonable to assume that ChIP-seq experiments could capture only a fraction of the TFBS repertoire encoded by the ancestral TE loci that may have existed at the time of the TE insertion and expansion in a host genome. Therefore, it was of interest to catalogue all TFs that have potential TFBS located within genomic loci encoded by a specific TE subfamily, which should facilitate the comparative analyses of TFBS encoded by distinct TE families to infer their concordant, non-overlapping, and discordant regulatory functions.
Results

Highly conserved sequence-specific double-stranded DNA binding genomic regulatory networks (GRNs) encoded by distinct families of human embryo regulatory LTRs.

To catalogue all TFs that have potential TFBS located within genomic loci encoded by a specific TE subfamily of human embryo regulatory LTRs (Glinsky, 2022; 2024), the Jaspar algorithm was employed to identify all TFBS located within 606 LTR5_Hs loci previously characterized as functionally active regulatory elements in human cells (Fuentes et al., 2018; Glinsky, 2022). The LTR5_Hs loci are likely to manifest relatively minor divergence of TFBS profiles because LTR5_Hs/HERVK successfully infected and colonized primates’ germline most recently compared to other HERVs with documented regulatory functions in human embryogenesis (Glinsky, 2022; 2024). The Jaspar algorithm output was processed to identify at a single-nucleotide resolution all TFs having predicted TFBS within LTR5_Hs loci and the number of TFBS for each TF was calculated (Supplementary Table S1). Overall, LTR5_Hs loci harbor 771 distinct TFBS for 716 individual TFs, including 65 paired TFBS features for 68 TFs comprising the immediately adjacent TFBS for two different TFs. Since individual LTR5_Hs loci harbor subsets of a composite set of 771 TFBS, it was designated a composite ancestral set of TFBS encoded by LTR5_Hs loci. Gene Set Enrichment Analyses (GSEA) employing the ENRICHR bioinformatics platform documented the exceedingly broad engagement of the LTR5_Hs loci-residing TFs in manifestations of physiological phenotypes and pathological conditions of Homo sapiens (Table 1; Supplementary Table S1).
As expected, GSEA employing GO Molecular Function 2023 database identified among most significantly enriched categories 556 genes of Cis-Regulatory Region Sequence-Specific DNA Binding (GO: 0000987) category; 581 genes of RNA Polymerase II Cis-Regulatory Region Sequence-Specific DNA Binding (GO: 0000978) category; 609 genes of RNA Polymerase II Transcription Regulatory Region Sequence-Specific DNA Binding (GO:0000977) category. Notably, 700 of 716 (98%) of TFs having TFBS within LTR5_Hs loci represent constituents of the single Gene Ontology classification category designated the Sequence-Specific Double-Stranded DNA Binding (GO: 1990837). These observations indicate that identified herein set of LTR5_Hs-residing TFs may constitute a sequence-specific double-stranded DNA binding pathway exerting exceedingly broad and predominantly regulatory impacts on phenotypes of Modern Humans. Quantitative features of the ancestral LTR5_Hs set of TFBS were documented by calculating the numbers of TFBS for each of the 771 distinct TFBS and reporting the computed values as numbers of TFBS per LTR5_Hs locus (defined as TFBS frequency) and the estimated TFBS density normalized to 1 Kb of the locus length (defined as TFBS density; Supplementary Table S1). It has been observed that 533 of 771 distinct TFBS (69%) have at least one TFBS per LTR5_Hs locus and TFBS density more than 1 binding site per 1 Kb of regulatory DNA, while top 100 TFBS have TFBS frequency of more than 8.7 binding sites per LTR5_Hs locus and TFBS density more than 9.9 binding sites per 1 kb of regulatory DNA. These observations suggest that a significant fraction of the LTR5_Hs ancestral set of TFBS has been preserved in LTR5_Hs loci residing in the human genome. Analysis of TFBS density values indicate that individual LTR5_Hs loci harbor TFBS for numerous TFs (Supplementary Table S1). 
This feature appears similar to the 3583 experimentally defined regulatory loci in the mouse ESC which are documented to have TFBS for multiple TFs (Chen et al., 2008). It was of interest to identify all potential TFBS residing within the 3583 mouse ESC multi-TFs-binding regulatory loci and compare them to the ancestral LTR5_Hs set of 771 TFBS. These analyses demonstrated that the 3583 mouse ESC multi-TFs-binding loci harbor 776 distinct TFBS (Supplementary Table S2). Notably, all 771 distinct TFBS comprising the ancestral set of LTR5_Hs-encoded TFBS were identified among TFBS residing within mouse ESC regulatory sequences binding multiple TFs. These observations suggest that the ancestral set of 771 TFBS encoded by LTR5_Hs elements identified herein may reflect the presence of an evolutionarily conserved genomic regulatory network (GRN) operating during mammalian embryogenesis. It was of interest to characterize the spectrum of TFBS encoded by other LTR families known to play a regulatory role during human embryogenesis and collectively defined as human embryo regulatory LTRs (Glinsky, 2022; 2024). To this end, all potential TFBS encoded by LTR7; MLT2A1; and MLT2A2 loci were identified employing the Jaspar algorithm and corresponding quantitative metrics were computed for each set of TFBS (Supplementary Table S3). Intriguingly, despite clearly discernable marked divergence of DNA sequences, all regulatory LTR elements analyzed herein appear to harbor nearly identical sets of TFBS (Figure 1; Supplementary Table S3). Since introductions into primates’ germline of the different families of regulatory LTRs analyzed herein were separated by millions of years, these observations are consistent with the hypothesis of evolutionary conservation of the GRN identified in this contribution.
In agreement with this concept, correlation analyses of TFBS density profiles (Figure 1) revealed highly concordant patterns of TFBS densities between mouse ESC multi-TFs-binding loci and distinct families of human embryo regulatory LTRs, namely MLT2A2 (r = 0.948); MLT2A1 (r = 0.921); LTR7 (r = 0.900); and LTR5_Hs (r = 0.834). Interestingly, networks of TFBS encoded by evolutionarily ancient LTRs (MLT2A2 and MLT2A1) appear more closely related to the TFBS profile of mouse ESC multi-TFs-binding regulatory loci compared to TFBS networks encoded by human embryo regulatory LTRs introduced into primate germlines relatively recently (LTR7 and LTR5_Hs). Despite overall similarities of TFBS density profiles (Figure 1), human embryo regulatory LTRs manifest notable quantitative differences of the TFBS numbers for specific TFs (Figure 2; Supplementary Figure S1; Supplementary Table S4). Interestingly, LTR5_Hs loci appear to exhibit the largest divergence of the TFBS density profiles when comparisons were made to either mouse ESC multi-TFs-binding loci or other families of human embryo regulatory LTRs, including MLT2A1; MLT2A2; and LTR7 sequences (Figure 2; Supplementary Figure S1; Supplementary Table S4). For example, among LTR5_Hs TFs with top 100 TFBS density scores, 19 TFs have at least 50% higher TFBS density compared to mouse ESC multi-TFs-binding regulatory loci and no TFs have lower TFBS density at this threshold. In contrast, within LTR7 loci there are 9 TFs manifesting at least 50% TFBS density gain and 4 TFs exhibiting TFBS density loss compared to mouse ESC multi-TFs-binding regulatory loci. Similarly, within MLT2A1 loci there are 7 TFs manifesting gain and 8 TFs exhibiting loss of at least 50% TFBS density compared to mouse ESC multi-TFs-binding regulatory loci. There were no TFs with TFBS density gain of at least 50% and 4 TFs with TFBS density loss within MLT2A2 loci.
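The two metrics defined earlier (TFBS frequency per locus and TFBS density per 1 kb) and the correlation of density profiles between locus sets can be illustrated with a minimal sketch. The function names and the toy numbers below are hypothetical and are not taken from the supplementary tables. The sketch assumes that density is pooled over the combined length of all loci in a set and that the reported r values are Pearson coefficients; the text does not state either detail explicitly.

```python
from math import sqrt


def tfbs_metrics(site_count, n_loci, total_length_bp):
    """Return (frequency, density) for one TF across a set of loci.

    frequency: predicted binding sites per locus
    density:   predicted binding sites per 1 kb of sequence
    Assumption: density is pooled over the combined length of all loci.
    """
    frequency = site_count / n_loci
    density = site_count / (total_length_bp / 1000.0)
    return frequency, density


def pearson_r(x, y):
    """Pearson correlation of two equally long numeric profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical density profiles (sites per kb) for the same ordered list of TFs
# in two locus sets, e.g. a reference set and one LTR family.
reference_profile = [9.1, 4.2, 1.3, 0.7]
ltr_profile = [10.4, 3.9, 1.8, 0.5]
r = pearson_r(reference_profile, ltr_profile)
```

Both profiles must list the same TFs in the same order; TFs absent from one set would need an explicit zero or be dropped before the comparison.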
Direct comparisons of TFBS density scores between different families of human embryo regulatory LTRs reveal that among LTR5_Hs TFs with top 100 TFBS density scores, there are 23; 33; and 36 TFs having at least 50% higher TFBS density scores within LTR5_Hs loci compared to LTR7; MLT2A2; and MLT2A1 loci, respectively. In contrast, there are 1; 1; and 7 TFs having at least 50% higher TFBS density scores within LTR7; MLT2A2; and MLT2A1 loci compared to LTR5_Hs loci, respectively. Comparative analyses of TFBS density scores between LTR5_Hs and LTR7 loci among TFs with top 533 TFBS density scores (see above) demonstrate that there are 159 TFs and 55 TFs manifesting at least 50% higher TFBS density scores within LTR5_Hs versus LTR7 loci and LTR7 versus LTR5_Hs loci, respectively. At the threshold of at least 100% gains of TFBS density scores, there are 107 TFs having higher TFBS density scores within LTR5_Hs loci versus LTR7 loci, whereas there are no TFs harboring higher TFBS density scores within LTR7 loci versus LTR5_Hs loci (Supplementary Table S1). These findings suggest that LTR5_Hs loci may have made a significant impact on the divergence of GRNs governed by the human embryo regulatory LTRs by providing regulatory loci with increased numbers of TFBS for numerous TFs (Figure 2; Supplementary Figure S1; Supplementary Tables S1 and S4).
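The threshold-based counts above (TFs with at least 50% or at least 100% higher TFBS density in one locus set than in another) can be sketched as follows. This is an illustrative reading, not the author's pipeline: it assumes that "at least 50% higher in A than in B" means density_A >= 1.5 × density_B, and the function name, TF labels, and density values are hypothetical.

```python
def compare_density(density_a, density_b, threshold=0.5):
    """Split TFs by which locus set has the higher TFBS density.

    density_a, density_b: dicts mapping TF name -> TFBS density (sites per kb).
    threshold: minimal relative excess (0.5 = at least 50% higher).
    Returns (higher_in_a, higher_in_b) as sorted lists of TF names.
    Only TFs present in both sets are compared.
    """
    factor = 1.0 + threshold
    higher_in_a, higher_in_b = [], []
    for tf in sorted(set(density_a) & set(density_b)):
        a, b = density_a[tf], density_b[tf]
        if a >= factor * b:
            higher_in_a.append(tf)
        elif b >= factor * a:
            higher_in_b.append(tf)
    return higher_in_a, higher_in_b


# Hypothetical densities for four TFs in two locus sets.
set_a = {"TF1": 3.0, "TF2": 1.0, "TF3": 2.0, "TF4": 2.0}
set_b = {"TF1": 2.0, "TF2": 2.0, "TF3": 1.9, "TF4": 1.0}

gains_50, losses_50 = compare_density(set_a, set_b, threshold=0.5)
gains_100, losses_100 = compare_density(set_a, set_b, threshold=1.0)
```

Raising the threshold from 0.5 to 1.0 can only shrink both lists, which matches the pattern in the text where the counts fall when moving from the 50% to the 100% threshold.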
GSEA of TF-coding genes manifesting most significant increases of TFBS densities within LTR5_Hs loci compared to either mouse ESC multi-TFs-binding loci and/or other human embryo regulatory LTRs demonstrated that these TFs contribute to development of preimplantation embryogenesis phenotypic traits (Embryonic stem cells; Trophoblast stem cells; Blastocyst; Germ cells; Ectoderm); as well as multiple types of cells and tissues of central nervous system, including Neural stem/precursor cells; Neural crest; Neuroblast; Neural tube; Motor neurons; Oligodendrocyte precursor/progenitor cells; Fetal brain; Cerebellum; Prefrontal cortex; Superior frontal gyrus; Peripheral nerve (Table 2; Supplementary Figure S1; Supplementary Tables S1 and S4). These findings are consistent with the previously reported observations based on gene ontology-guided proximity placement analyses of GRNs governed by human embryo regulatory LTRs, which implicated LTR-associated GRNs in regulation of development and functions of primates’ central nervous system (Glinsky, 2022; 2024).

Comparative analyses of highly conserved sequence-specific double-stranded DNA binding GRNs within ATAC-seq-defined regulatory regions of human and chimpanzee brain development.

It was of interest to extend further this line of inquiry by determining whether the sets of TFs and TFBS reported herein, which constitute the Sequence-Specific Double-Stranded DNA Binding GRNs associated with human embryo regulatory LTRs, could be identified within genomic regulatory loci known to contribute to regulation of human and chimpanzee brain development. To this end, 17,935 genomic regulatory regions defined in the organoid single-cell genomic atlas of human and chimpanzee brain development (Kanton et al., 2019) were analyzed to catalogue all TFs having putative TFBS within these regions (Table 3).
Kanton et al. (2019) reported 8099 human-specific and 9836 chimpanzee-specific brain development regulatory regions (BDRRs), which were identified employing the DNA sequence-unbiased open chromatin accessibility screening method termed the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq; Buenrostro et al., 2013; 2015). Genome-wide and individual chromosome-level analyses of these regions revealed that TFs and corresponding TFBS of the Sequence-Specific Double-Stranded DNA Binding pathway comprise the intrinsic elements of the ATAC-seq-defined human and chimpanzee BDRRs (Table 3). Notably, quantitative characteristics of the Sequence-Specific Double-Stranded DNA Binding networks appear significantly distinct between human and chimpanzee ATAC-seq-defined regulatory regions of brain development. While sets of individual TFs and corresponding TFBS independently identified in both species are indistinguishable, the TFBS frequencies (numbers of TFBS per ATAC-seq-defined regulatory locus) and the TFBS densities (TFBS numbers per 1 kb of ATAC-seq-defined regulatory locus) are consistently higher in Modern Humans compared to Chimpanzee BDRRs (Table 3). These findings suggest that the reported distinct quantitative characteristics of the Sequence-Specific Double-Stranded DNA Binding pathway within ATAC-seq-defined brain development regulatory loci may have contributed to the previously observed different trajectories of Modern Humans and Chimpanzee brain development (Kanton et al., 2019). Next, quantitative characteristics of gains and/or losses for individual LTR5_Hs loci-associated TFs having top 100 TFBS density scores (see above) were calculated in 17,935 ATAC-seq-defined BDRRs of human and chimpanzee, corresponding quantitative metrics were estimated for individual species’ chromosomes, and multiple comparative analyses were carried out.
Comparisons of TFBS density profiles within BDRRs of the individual chromosomes of human and chimpanzee genomes that were acquired as a result of TFBS density gains or losses during mammalian evolution (Figure 3) identified five chromosomes manifesting highly concordant changes of TFBS densities, namely chr19 (r = 0.99); chr22 (r = 0.98); chr4 (r = 0.94); chr17 (r = 0.93); chr13 (r = 0.90); while eight chromosomes exhibit weak or no concordance of TFBS density changes, namely chr21 (r = -0.12); chr14 (r = 0.12); chrX (r = 0.31); chr9 (r = 0.35); chr7 (r = 0.37); chr11 (r = 0.40); chr15 (r = 0.43); chr12 (r = 0.47). Examples of correlation plots for two chromosomes manifesting most concordant (chr19 and chr22) and discordant (chr21 and chr14) profiles of TFBS density changes are shown in Figure 3. The remaining 10 chromosomes appear to have moderate concordance levels of TFBS density profile changes within BDRRs of human and chimpanzee (Figure 3), suggesting largely divergent patterns of evolutionary TFBS density gains and losses in the two species. To explore further the hypothesis of the divergent evolution of TFBS density gains within BDRRs of human and chimpanzee, TFBS density gains and losses compared to mouse ESC multi-TFs-binding loci were computed for each chromosome and numbers of events manifesting TFBS density changes of at least 50% were plotted for visualization (Figure 4). Results of these analyses documented clearly discernable distinct patterns of TFBS density gains and losses acquired by human and chimpanzee during mammalian evolution (Figure 4). Interestingly, humans appear to manifest predominantly gains of TFBS density within BDRRs, while chimpanzees seem to exhibit prevalent losses within regulatory regions of brain development at several chromosomes (Figure 4).
These observations were corroborated by direct comparisons of TFBS density values within BDRRs of human versus chimpanzee, which were calculated for each individual TFBS, recorded, and reported for each chromosome as the numbers of events manifesting most significant species-specific gains of TFBS densities (Figure 4). Computation of gains and losses for individual TFs of the Sequence-Specific Double-Stranded DNA Binding networks in 17,935 brain development genomic regulatory regions and recording of corresponding quantitative metrics of TFBS density changes for all individual chromosomes of human and chimpanzee genomes facilitated both interspecies and within-species comparative genome-wide chromosome-level analyses. To this end, genome-wide chromosome-level pairwise correlation matrices of TFBS density changes acquired during mammalian evolution within human (Table 4) and chimpanzee (Table 5) BDRRs defined by the ATAC-seq analysis were developed and analyzed for within-species genome concordance and divergence patterns. Using numerical values of correlation coefficients reported in Tables 4 and 5, the mean values of correlation coefficients were calculated for each chromosome of human and chimpanzee genomes and corresponding divergence scores were quantified by subtracting the mean values from a perfect correlation coefficient value of 1.0. Additionally, for each individual human and chimpanzee chromosome, coefficients of variation were estimated as the ratio of standard deviation to the mean value expressed as a percentage. Corresponding numerical values were plotted for visualization and reported in Figure 5. In the human genome, 19 chromosomes appear to exhibit similar values of divergence scores ranging from 0.208 (chr10) to 0.325 (chr20). In contrast, seemingly higher values of divergence scores were documented for 4 chromosomes, namely 0.553 (chr16); 0.831 (chr17); 0.960 (chr22); and 1.047 (chr19).
These findings were corroborated by the results of the analysis of variation coefficients which identified chr17; chr19; and chr22 as chromosomes having largest values of coefficients of variation among human chromosomes (Figure 5). In chimpanzee genome, values of divergence scores appear to manifest a somewhat broader degree of variability ranging from 0.194 (chr7) to 0.472 (chr6), while 2 chromosomes had seemingly higher divergence score values of 0.646 (chr13) and 0.757 (chr4). Consistent with these observations, results of the analysis of variation coefficients identified chr13 and chr4 as chromosomes associated with largest values of coefficients of variation in chimpanzee genome (Figure 5). Therefore, analyses of patterns of divergence and concordance of chromosome-level TFBS density changes acquired by Modern Humans and Chimpanzee during mammalian evolution identified chr17; chr19; and chr22 as chromosomes exhibiting most divergent profiles of TFBS density changes in human genome, while chr4 and chr13 appear to manifest most divergent patterns of TFBS density changes in chimpanzee genome (Figure 5). Notably, chromosomes that appear to manifest most divergent patterns of TFBS density changes within human (chr17; chr19; and chr22) or chimpanzee (chr4; chr13) genomes are the same chromosomes that have most similar profiles of TFBS density gains/losses in brain development regulatory regions of human and chimpanzee (Figure 4). However, most divergent chromosomes within chimpanzee genome, namely chr4 and chr13, manifest variation coefficients closely related to 18 other chromosomes within human genome (Figure 5). Conversely, chr17; chr19; and chr22 that are most divergent within human genome have coefficients of variation closely related to 18 other chromosomes of chimpanzee genome (Figure 5). 
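The divergence score and coefficient of variation described above follow directly from a pairwise chromosome correlation matrix, and a short sketch may make the arithmetic concrete. The matrix below is a toy example with hypothetical values, not data from Tables 4 or 5. The sketch assumes that the diagonal (self-correlation) is excluded from the mean and that the sample standard deviation is used; the text specifies neither.

```python
from statistics import mean, stdev


def chromosome_divergence(corr):
    """Summarize each chromosome's row of a pairwise correlation matrix.

    corr: dict mapping chromosome -> {other chromosome: correlation coefficient}.
    Returns {chromosome: (mean_r, divergence_score, cv_percent)} where
      divergence_score = 1.0 - mean_r
      cv_percent       = sample standard deviation / mean_r * 100
    Assumption: self-correlations are ignored if present in a row.
    """
    summary = {}
    for chrom, row in corr.items():
        values = [r for other, r in row.items() if other != chrom]
        mean_r = mean(values)
        divergence = 1.0 - mean_r
        # The coefficient of variation is undefined when the mean is zero.
        cv = stdev(values) / mean_r * 100.0 if mean_r != 0 else float("nan")
        summary[chrom] = (mean_r, divergence, cv)
    return summary


# Hypothetical three-chromosome matrix (off-diagonal entries only).
toy_matrix = {
    "chrA": {"chrB": 0.8, "chrC": 0.6},
    "chrB": {"chrA": 0.8, "chrC": 0.4},
    "chrC": {"chrA": 0.6, "chrB": 0.4},
}
toy_summary = chromosome_divergence(toy_matrix)
```

Under this definition a divergence score above 1.0, such as the 1.047 reported for human chr19, corresponds to a slightly negative mean correlation with the other chromosomes. The coefficient of variation becomes unstable when the mean correlation is close to zero, which is worth keeping in mind when reading its largest values.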
Distinct and common association patterns of TFBS density changes within BDRRs of human and chimpanzee revealed by genome-wide chromosome-level alignment analyses with signatures of TFBS density changes of different families of human embryo regulatory LTRs.

Establishment of genome-wide chromosome-level quantitative profiles of TFBS density changes within ATAC-seq-defined BDRRs of human and chimpanzee (Tables 2 – 4; Figures 3 – 5) prompted investigation of association patterns of TFBS density changes within BDRRs and signatures of TFBS density changes of 4 different families of human embryo regulatory LTRs (Figure 6). Visualization of the results of chromosome-level alignments of the corresponding profiles of TFBS density changes revealed clearly discernable common and distinct patterns of associations of TFBS density changes acquired during mammalian evolution within human and chimpanzee BDRRs and within DNA sequences encoded by different families of human embryo regulatory LTRs, namely MLT2A1 (Figure 6A); MLT2A2 (Figure 6C); LTR7 (Figure 6D); and LTR5_Hs (Figure 6B). Notably, the concordant and discordant association patterns of TFBS density changes were observed in interspecies comparisons of TFBS density changes as well as in analyses of within-species profiles of TFBS density changes of BDRRs aligned to signatures of TFBS density changes of different families of regulatory LTRs (Figure 6). Alignments of TFBS density change profiles of BDRRs to the MLT2A1 and LTR5_Hs signatures generated larger values of correlation coefficients compared to the MLT2A1 and LTR7 alignments, while highly concordant alignment patterns were observed for MLT2A1 and MLT2A2 analyses as well as for LTR5_Hs and LTR7 analyses. These trends were observed in interspecies and within individual species comparisons.
Comparisons of alignments within an individual species’ genome revealed striking negative correlations of the association patterns generated by the alignments of BDRRs TFBS density changes profiles to TFBS density changes profiles of the MLT2A1 versus LTR5_Hs regulatory LTRs (Figure 6), which were documented in analyses of either chimpanzee or human BDRRs. However, BDRRs residing on only 4 human chromosomes, namely chr19; chr22; chr17; and chr6, manifested positive correlation coefficients of TFBS density changes profiles alignments to the LTR5_Hs, in contrast to BDRRs residing on 12 chimpanzee chromosomes. Conversely, BDRRs housed on 19 human chromosomes had positive correlation coefficients of TFBS density changes profiles alignments to the MLT2A1 loci, in contrast to BDRRs housed on 10 chimpanzee chromosomes (Figure 6). These findings were corroborated by the results of the interspecies comparisons of the alignments of TFBS density changes profiles of human and chimpanzee BDRRs to the profiles of TFBS density changes of either MLT2A1 or LTR5_Hs (see Figure 9). Combining these alignments into one plot has resulted in the visual depiction of common and distinct patterns of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes (see Figure 9). Overall, these observations are in agreement with the hypothesis that divergence of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes occurs along distinct alignment patterns to profiles of TFBS density changes of different families of human embryo regulatory LTRs, namely MLT2A1 and LTR5_Hs.
This model was further corroborated by results of the analyses aggregating the chromosome-level observations into a simplified genome-wide data set highlighting panels of TFs manifesting either common or divergent patterns of TFBS density changes acquired during primate evolution within BDRRs of human and chimpanzee (Figures 6; 9; Supplementary Table S5). GSEA of TF-coding genes manifesting divergent profiles of TFBS within BDRRs of human and chimpanzee underscores their exceedingly broad developmental and pathophysiological impacts on phenotypic traits of Modern Humans (Supplementary Table S5), including key constituents of central nervous system development and functions.

Potential impacts of currently active LINE and SVA transposons in shaping the continuing divergent genomic evolution of BDRRs of Modern Humans and Chimpanzee.

Observations reported in this contribution are congruent with recent findings implicating TE-encoded regulatory sequences derived from multiple TE families in development of human and chimpanzee hippocampal intermediate progenitor cells (Patoori et al., 2022). It was of interest to investigate whether the TF-constituents of the sequence-specific double-stranded DNA binding networks reported herein are engaged in TE-governed hippocampal neurogenesis regulatory pathways, which were discovered by Patoori et al. (2022) utilizing a model of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells. Results of these analyses (Figure 7; Supplementary Table S6) documented ubiquitous presence within TE-constituents of regulatory pathways of hippocampal intermediate progenitors of qualitatively nearly identical arrays of TFBS for TFs constituting protein components of the sequence-specific double-stranded DNA binding networks reported herein.
These observations indicate that one of universal features of multiple families of TEs, including LTR/HERV, SINE/Alu, SVA, and LINE families, is their intrinsic propensity to harbor and spread genome-wide consensus regulatory nodes of identified herein highly conserved sequence-specific double-stranded DNA binding networks, selections of TFBS panels of which manifest individual chromosome-specific profiles and species-specific divergence patterns. Consistent with this hypothesis, it has been observed that TE subfamilies that became more divergent from consensus TFBS patterns due to mutational processes are less likely to be represented within differentially-accessible (DA) ATAC-seq-defined regulatory loci of hippocampal intermediate progenitors’ development (Figure 7), while DA ATAC-regions intersecting larger numbers of SINE/Alu loci appear to harbor TFBS for more TF-constituents of highly conserved sequence-specific double-stranded DNA binding networks (Figure 7). Notably, different families of TE-constituents of regulatory pathways of hippocampal intermediate progenitors’ development manifest different degrees of conservation of arrays of TFBS for a consensus panel of TF-constituents of highly conserved sequence-specific double-stranded DNA binding networks. SINE/Alu subfamily members seem to exhibit the highest diversity and least apparent conservation profiles reaching the maximum divergence of 63.4%. In contrast, LINE subfamily members manifest the maximum divergence of only 14.56%, while the maximum divergence observed for LTR subfamily members was 36.21% (Figure 7). Intriguingly, TE families that are currently active as retrotransposons in human genome, namely LINE and SVA transposable elements, harbor arrays of TFBS for essentially all TF-constituents of highly conserved sequence-specific double-stranded DNA binding networks (Figure 7; Supplementary Table S6). 
Therefore, it was of interest to investigate association patterns of TFBS density changes within human and chimpanzee BDRRs and signatures of TFBS density changes acquired during mammalian evolution by SVA and LINE retrotransposons, and to compare the results with similar analyses carried out for human embryo regulatory LTRs (Figure 6). Visualization of the results of chromosome-level alignments of the corresponding profiles of TFBS density changes revealed clearly discernible common and distinct patterns of associations of TFBS density changes acquired during mammalian evolution within human and chimpanzee BDRRs and within DNA sequences encoded by SVA and L1PA6 loci (Figures 8-9). Concordant and discordant patterns of association of TFBS density changes were observed in interspecies comparisons as well as in analyses of within-species profiles of TFBS density changes of BDRRs aligned to signatures of TFBS density changes of SVA and L1PA6 retrotransposons (Figure 8). Alignments of TFBS density change profiles of BDRRs to those of the SVA and L1PA6 loci generated larger values of correlation coefficients compared to the MLT2A2 and LTR5_Hs alignments (Figures 6 and 8), while highly concordant alignment patterns were observed for MLT2A1 and L1PA6 analyses as well as for LTR5_Hs and SVA analyses. These trends were observed in interspecies comparisons and in comparisons within individual species. Comparisons of alignments within the genome of an individual species revealed striking negative correlations between the association patterns of the alignments of BDRR TFBS density change profiles to the TFBS density change profiles of L1PA6 versus LTR5_Hs and of L1PA6 versus SVA, in contrast to highly positive correlations for L1PA6 versus MLT2A1 (Figure 8). These patterns of associations were consistently observed in analyses of either chimpanzee or human BDRRs.
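As an illustration only, the alignment of two TFBS density change profiles by correlation can be sketched in Python as follows. The sketch assumes that a density change is expressed as a percent change relative to a reference density and that two profiles are compared with the Pearson correlation coefficient (the statistic named in the Methods). The function names and all numerical values are hypothetical and are not taken from the reported data.

```python
import math


def percent_change(density, reference):
    """Percent change of a TFBS density relative to a reference density."""
    if reference == 0:
        raise ValueError("reference density must be non-zero")
    return 100.0 * (density - reference) / reference


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical TFBS densities (sites per 1 Kb) for four TFs.
reference_density = [2.0, 4.0, 1.0, 5.0]  # assumed reference loci
bdrr_density = [3.0, 3.0, 1.5, 5.0]       # assumed BDRRs on one chromosome
sva_density = [2.5, 3.0, 1.2, 5.5]        # assumed SVA loci

bdrr_changes = [percent_change(d, r) for d, r in zip(bdrr_density, reference_density)]
sva_changes = [percent_change(d, r) for d, r in zip(sva_density, reference_density)]
r_value = pearson_r(bdrr_changes, sva_changes)
```

Repeating such a calculation per chromosome would yield one correlation coefficient per chromosome and retrotransposon family, which is the kind of quantity compared in Figures 8-9.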
BDRRs housed on 19 human chromosomes had positive correlation coefficients of TFBS density change profile alignments to the L1PA6 loci, in contrast to BDRRs housed on 6 chimpanzee chromosomes, including chr4 and chr13 (Figure 9). Conversely, BDRRs residing on only 6 human chromosomes, including chr19, chr22, chr17, and chr6, manifested positive correlation coefficients of TFBS density change profile alignments to the SVA loci, in contrast to BDRRs residing on 17 chimpanzee chromosomes. Results of these analyses are strikingly similar to the alignments of TFBS density changes to the TFBS density patterns of MLT2A1 and LTR5_Hs loci (Figures 6 and 9) and the interspecies comparisons of the alignments of TFBS density change profiles of human and chimpanzee BDRRs to the profiles of TFBS density changes of MLT2A1 and LTR5_Hs (Figure 9). Combining these alignments into one plot has resulted in the visual depiction of common and distinct patterns of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes (Figure 9). These findings were corroborated by the results of the interspecies comparisons of the alignments of TFBS density change profiles of human and chimpanzee BDRRs to the profiles of TFBS density changes of either L1PA6 or SVA (Figure 9). Combining these two alignments into a single plot has resulted in the visual depiction of common and distinct patterns of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes (Figure 9). These observations are in agreement with the hypothesis that divergence of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes occurs along distinct alignment patterns to profiles of TFBS density changes of retrotransposons that are currently active in the human genome, namely SVA and LINE families.
Discussion
In this contribution, the emergence during mammalian evolution of genomic regulatory networks (GRNs) encompassing ubiquitous, qualitatively nearly identical and quantitatively markedly distinct arrays of sequences of TFBS for 716 proteins is reported. A vast majority of TFs (98% of 716) comprising protein constituents of these networks appear to share the common Gene Ontology (GO) feature of sequence-specific double-stranded DNA binding (GO: 1990837). Among the most significantly enriched categories, GSEA employing the GO Molecular Function 2023 database identified 556 genes assigned to the Cis-Regulatory Region Sequence-Specific DNA Binding (GO: 0000987) category; 581 genes of the RNA Polymerase II Cis-Regulatory Region Sequence-Specific DNA Binding (GO: 0000978) category; and 609 genes of the RNA Polymerase II Transcription Regulatory Region Sequence-Specific DNA Binding (GO: 0000977) category. Assignments of essentially all TF constituents of these networks to GO functional categories of sequence-specific double-stranded DNA binding, cis-regulatory region sequence-specific DNA binding, and RNA Polymerase II cis-regulatory and transcription regulatory region sequence-specific DNA binding strongly imply their structural-functional engagement in the assembly and activities of heterochromatin and euchromatin multiprotein-DNA complexes. To date, ubiquitous, qualitatively nearly identical and quantitatively markedly distinct representations of sequence-specific TFBS arrays of these networks have been observed within genomic regulatory loci encoded by all analyzed TE families, including TE families coopted into GRNs contributing to development and functions of the central nervous system.
TE families, including LTR/HERV, SINE/Alu, SVA, and LINE subfamilies, appear to harbor and spread genome-wide consensus regulatory nodes of the highly conserved GRNs identified herein, selected TFBS panels of which manifest individual chromosome-specific profiles and species-specific divergence patterns. Markedly distinct quantitative characteristics of these networks, in particular, changes of TFBS densities, have been inferred from genome-wide chromosome-level analyses of BDRRs of Modern Humans and Chimpanzee, suggesting that species-specific differences of the activities of these networks may have contributed to the continuing divergent genomic evolution of brain development of humans and non-human primates. Results of chromosome-level analyses of quantitative metrics reported in this contribution suggest that GRNs emanating from sequence-specific double-stranded DNA binding of ~700 proteins may achieve marked functional diversity by operating in chromosome territory-specific patterns. Observed conservation of these GRNs beyond the boundaries of confidently mapped TE-derived regulatory loci suggests that considerations of contributions of TEs to creation of mammalian genomic DNA could be extended to more than the currently estimated ~50% of genomes.
Methods
Data source and analytical protocols
Solely publicly available datasets and resources were used in this contribution. Initial analyses were focused on human embryo regulatory LTR loci (see Introduction) that were identified as highly-conserved pan-primate regulatory sequences because they have been present in genomes of primate species for at least ~15 million years. Four distinct LTR families meeting these criteria, namely MLT2A1 (2416 loci), MLT2A2 (3069 loci), LTR7 (3354 loci), and LTR5_Hs (606 loci), were analyzed.
A total of 9445 fixed non-polymorphic sequences of human embryo regulatory LTR elements residing in genomes of Modern Humans (hg38 human reference genome database) were retrieved as described in recent studies (Hashimoto et al., 2021; Carter et al., 2022; Glinsky, 2022; 2024), and the numbers of highly conserved orthologous loci in genomes of sixteen non-human primates (NHP) were determined exactly as previously reported (Glinsky, 2022; 2024). Briefly, a fixed non-polymorphic regulatory LTR locus residing in the human genome (hg38 human reference genome database) was considered highly conserved in the genome of an NHP only if the following two requirements were met: (1) During the direct LiftOver test (https://genome.ucsc.edu/cgi-bin/hgLiftOver), the human LTR sequence was mapped in the NHP genome to a single orthologous locus with a threshold of at least 95% sequence identity; (2) During the reciprocal LiftOver test, the NHP sequence identified in the direct LiftOver test was remapped, with a threshold of at least 95% sequence identity, to exactly the same human orthologous sequence that was queried during the direct LiftOver test. A set of 3583 experimentally defined regulatory loci of the mouse ESC, which are documented to harbor TFBS for multiple TFs (Chen et al., 2008), was utilized as a reference to estimate changes of TF-binding patterns and TFBS densities during mammalian evolution. To identify TFBS within genomic regulatory loci known to contribute to regulation of human and chimpanzee brain development, a total of 17,935 genomic regulatory regions reported in the organoid single-cell genomic atlas of human and chimpanzee brain development (Kanton et al., 2019) were analyzed. A catalogue of TFs having putative TFBS within 8099 human-specific and 9836 chimpanzee-specific brain development regulatory regions (BDRRs) was compiled.
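The two-requirement conservation criterion described above (direct and reciprocal LiftOver tests) can be expressed as a simple decision rule. The following Python sketch is illustrative only: it assumes that the mapping results have already been obtained and summarized as (locus, identity) records, and the `Hit` record and `is_highly_conserved` function are hypothetical names introduced here, not part of the LiftOver tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Hit:
    """Hypothetical summary of one mapping result."""
    locus: str       # coordinates of the mapped locus, e.g. "chr1:1000-1500"
    identity: float  # sequence identity as a fraction between 0 and 1


def is_highly_conserved(
    human_locus: str,
    direct_hits: Sequence[Hit],
    reciprocal_hit: Optional[Hit],
    min_identity: float = 0.95,
) -> bool:
    """Apply the direct and reciprocal mapping requirements to one locus."""
    # Requirement 1: the direct test yields a single orthologous locus
    # that meets the identity threshold.
    if len(direct_hits) != 1 or direct_hits[0].identity < min_identity:
        return False
    # Requirement 2: the reciprocal test returns exactly the queried
    # human locus and also meets the identity threshold.
    if reciprocal_hit is None or reciprocal_hit.identity < min_identity:
        return False
    return reciprocal_hit.locus == human_locus


# Hypothetical example: one human LTR locus tested against one NHP genome.
query = "chr1:1000-1500"
conserved = is_highly_conserved(
    query,
    direct_hits=[Hit("chr1:2000-2500", 0.97)],
    reciprocal_hit=Hit("chr1:1000-1500", 0.97),
)
```

Counting the loci for which this rule holds in each of the sixteen NHP genomes would give the per-species numbers of highly conserved orthologous loci.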
BDRRs were identified employing the DNA sequence-unbiased open chromatin accessibility screening method termed the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq; Buenrostro et al., 2013; 2015). A set of 751 TE loci of the SVA families coopted as functional cis-regulatory elements in human induced pluripotent stem cells (Barnada et al., 2022) and a set of 3,265 TE loci engaged in TE-governed hippocampal neurogenesis regulatory pathways of human and chimpanzee (Patoori et al., 2022) were analyzed. TE-governed hippocampal neurogenesis regulatory pathways were discovered by Patoori et al. (2022) utilizing a model of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells. Identification of transcription factor binding sites (TFBS) within candidate genomic regulatory loci was performed employing the Jaspar algorithm (Jaspar Transcription Factors) accessible through the UCSC Genome Browser Table Browser functions (https://genome.ucsc.edu/cgi-bin/hgTables), which facilitate downloading, filtering, analyzing, and retrieving data from the Genome Browser. TFBS identification and data retrieval were carried out using the default threshold settings, inputting up to 1,000 loci per screen to achieve full coverage of the specified genomic regions of interest based on coordinates of the hg38 and hg19 human reference genome databases. All identified TFBS were retrieved and all individual TFs having TFBS were catalogued. For each set of TFBS, quantitative features were documented by counting the events recorded for each distinct TFBS and reporting the computed values as the number of TFBS per regulatory locus (defined as TFBS frequency) and as the TFBS frequency normalized to 1 Kb of the locus length (defined as TFBS density).
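A minimal Python sketch of the two metrics defined above is given below. It assumes that TFBS frequency is the number of sites for a TF divided by the number of loci in the set, and that TFBS density is that frequency divided by the mean locus length in Kb (equivalently, total sites per total Kb of the set); the exact normalization used in the original analyses may differ. The function name and the input layout are hypothetical.

```python
from collections import Counter


def tfbs_frequency_and_density(hits, locus_lengths_bp):
    """Compute per-TF (frequency, density) for one set of regulatory loci.

    hits: iterable of (locus_id, tf_name) pairs, one pair per identified TFBS.
    locus_lengths_bp: dict mapping every locus_id in the set to its length in bp.
    """
    n_loci = len(locus_lengths_bp)
    if n_loci == 0:
        raise ValueError("at least one locus is required")
    mean_length_kb = sum(locus_lengths_bp.values()) / n_loci / 1000.0
    if mean_length_kb <= 0:
        raise ValueError("locus lengths must be positive")
    counts = Counter(tf for _, tf in hits)
    metrics = {}
    for tf, count in counts.items():
        frequency = count / n_loci            # TFBS per regulatory locus
        density = frequency / mean_length_kb  # TFBS per 1 Kb of locus length
        metrics[tf] = (frequency, density)
    return metrics


# Hypothetical input: two loci and four identified binding sites.
lengths = {"locus_a": 500, "locus_b": 1500}
sites = [
    ("locus_a", "SOX2"),
    ("locus_b", "SOX2"),
    ("locus_b", "SOX2"),
    ("locus_b", "CTCF"),
]
metrics = tfbs_frequency_and_density(sites, lengths)
```

Loci without any site for a given TF still count toward the number of loci, so the frequency reflects the whole set rather than only the loci that carry the site.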
The significance of the differences in the expected and observed numbers of events was calculated using the two-tailed Fisher’s exact test. Multiple proximity placement enrichment tests were performed for individual families and sub-sets of LTRs, BDRRs, and human-specific regulatory regions (HSRS) taking into account the size in bp of the corresponding genomic regions, size distributions of topologically associating domains in human cells, distances to putative regulatory targets, and bona fide regulatory targets identified in targeted genetic interference and/or epigenetic silencing experiments. Additional details of methodological and analytical approaches are provided in the text, Supplementary Materials and previously reported contributions [Barakat et al. 2018; Fuentes et al. 2018; Glinsky 2015; 2016a, b; 2018; 2019; 2020a, b, c, 2021; 2022; Guffanti et al. 2018; Glinsky and Barakat, 2019; McLean et al. 2010; 2011; Pontis et al. 2019; Wang et al. 2014].
Gene set enrichment and genome-wide proximity placement analyses
Gene set enrichment analyses were carried out using the Enrichr bioinformatics platform, which enables the interrogation of nearly 200,000 gene sets from more than 100 gene set libraries. The Enrichr API (January 2018 through January 2023 releases) [Chen et al. 2013; Kuleshov et al. 2016; Xie et al. 2021] was used to test genes linked to regulatory LTR elements, HSRS, or other regulatory loci of interest for significant enrichment in numerous functional categories. When technically feasible, larger sets of genes comprising several thousand entries were analyzed. Regulatory connectivity maps between HSRS, regulatory LTRs and coding genes and additional functional enrichment analyses were performed with the Genomic Regions Enrichment of Annotations Tool (GREAT) algorithm [McLean et al. 2010; 2011] at default settings.
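For readers unfamiliar with the two-tailed Fisher’s exact test mentioned at the start of this section, a minimal pure-Python sketch for a 2x2 table of event counts follows. It uses the common convention of summing the probabilities of all tables that are no more likely than the observed one; the function name and the example counts are hypothetical and serve only as an illustration, not as the software used in this work.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables at least as extreme (no more probable) than observed;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: loci with and without a feature in two groups.
p_example = fisher_exact_two_sided(8, 2, 1, 5)
```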
The reproducibility of the results was validated by implementing two releases of the GREAT algorithm: GREAT version 3.0.0 (02/15/2015 to 08/18/2019) and GREAT version 4.0.4 (08/19/2019), applying default settings at differing maximum extension thresholds as previously reported (Glinsky 2020a, b, c; 2021; 2022; 2024). The GREAT algorithm allows investigators to identify and annotate the genome-wide connectivity networks of user-defined distal regulatory loci and their putative target genes. Concurrently, the GREAT algorithm performs functional Gene Ontology (GO) annotations and analyses of statistical enrichment of GO annotations of identified genomic regulatory elements (GREs) and target genes, thus enabling the inference of potential biological impacts of interrogated genomic regulatory networks. The GREAT algorithm was employed to identify putative down-stream target genes of human embryo regulatory LTRs. Concurrently with the identification of putative regulatory target genes of GREs, the GREAT algorithm performs stringent statistical enrichment analyses of functional annotations of identified down-stream target genes, thus enabling the inference of potential significance of phenotypic impacts of interrogated GRNs. Importantly, the assignment of phenotypic traits as putative statistically valid components of GRN actions entails the assessment of statistical significance of the enrichment of both GREs and down-stream target genes by applying independent statistical tests. The validity of statistical definitions of genomic regulatory networks (GRNs) and genomic regulatory modules (GRMs) based on the binomial (regulatory elements) and hypergeometric (target genes) FDR Q values was evaluated using a directed acyclic graph (DAG) test based on the enriched terms from a single ontology-specific table generated by the GREAT algorithm (Glinsky, 2024).
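The two independent tests referred to above, a binomial test over regulatory elements and a hypergeometric test over target genes, can be illustrated with the upper-tail probabilities below. This Python sketch only shows the form of the two statistics; it is not the GREAT implementation, it does not compute FDR Q values, and the function names and all input numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    n: number of regulatory elements tested.
    p: assumed fraction of the genome associated with the annotation term.
    k: number of elements falling in that fraction.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeometric_upper_tail(k, population, annotated, drawn):
    """P(X >= k) when drawing `drawn` genes without replacement.

    population: total number of genes.
    annotated: genes in the population carrying the annotation term.
    drawn: number of putative target genes selected.
    k: selected genes that carry the annotation term.
    """
    highest = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, highest + 1)
    )
    return favourable / comb(population, drawn)


# Hypothetical inputs for a single annotation term.
p_regions = binomial_upper_tail(k=12, n=100, p=0.05)
p_genes = hypergeometric_upper_tail(k=6, population=20000, annotated=300, drawn=80)
```

A term would be retained only if both probabilities remain significant after multiple-testing correction, which corresponds to the requirement of independent support from regulatory elements and from target genes.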
The DAG test draws patterns and directions of connections between significantly enriched GO modules based on the experimentally-documented temporal logic of developmental processes and structural/functional relationships between gene ontology enrichment analysis-defined statistically significant terms. A specific DAG test utilizes only a sub-set of statistically significant GRMs from a single gene ontology-specific table generated by the GREAT algorithm by extracting GRMs manifesting connectivity patterns defined by experimentally documented developmental and/or structure/function/activity relationships. These GRMs are deemed valid observations and visualized as a consensus hierarchy network of the ontology-specific DAGs (Glinsky, 2024). Based on these considerations, the DAG algorithm draws the developmental and structure/function/activity relationships-guided hierarchy of connectivity between statistically significant gene ontology enriched GRMs. Genome-wide Proximity Placement Analysis (GPPA) of down-stream target genes and distinct genomic features co-localizing with regulatory LTRs, HSRS, BDRRs and other regulatory loci was carried out as described previously and originally implemented for human-specific transcription factor binding sites [Glinsky et al. 2018; Glinsky, 2015, 2016a, 2016b, 2017, 2018, 2019, 2020a, 2020b, 2020c, 2021; 2022; 2024; Guffanti et al. 2018].
Differential GSEA to infer the relative contributions of distinct subsets of regulatory LTR elements and down-stream target genes on phenotypes of interest.
When technically and analytically feasible, different sets of regulatory LTRs and candidate down-stream target genes defined at several significance levels of statistical metrics and comprising from dozens to several thousand individual genetic loci were analyzed using differential GSEA.
This approach was utilized to gain insights into biological effects of regulatory LTRs and down-stream target genes and to infer potential mechanisms of phenotype-affecting activities. Previously, this approach was successfully implemented for identification and characterization of human-specific regulatory networks governed by human-specific transcription factor-binding sites [Glinsky et al. 2018; Glinsky, 2015, 2016a, 2016b, 2017, 2018, 2019, 2020a, 2020b, 2020c, 2021; 2022; 2024; Guffanti et al. 2018] and functional enhancer elements [Barakat et al. 2018; Glinsky et al. 2018; Glinsky and Barakat 2019; Glinsky 2015, 2016a, 2016b, 2017, 2018, 2019, 2020a, 2020b, 2020c, 2021; 2022; 2024]. The differential GSEA approach has been utilized for characterization of phenotypic impacts of 13,824 genes associated with 59,732 human-specific regulatory sequences [Glinsky, 2020a], 8,405 genes associated with 35,074 human-specific neuroregulatory single-nucleotide changes [Glinsky, 2020b], 8,384 genes regulated by stem cell-associated retroviral sequences (SCARS) [Glinsky 2021], as well as human genes and medicinal molecules affecting the susceptibility to SARS-CoV-2 coronavirus [Glinsky, 2020c]. Initial GSEA entailed interrogations of each specific set of candidate down-stream target genes using ~70 distinct genomic databases, including comprehensive pathway enrichment Gene Ontology (GO) analyses. Upon completion, these analyses were followed by in-depth interrogations of the identified significantly-enriched gene sets employing selected genomic databases deemed most statistically informative at the initial GSEA.
In all reported tables and plots (unless stated otherwise), in addition to the nominal p values and adjusted p values, the Enrichr software calculates the “combined score”, which is a product of the significance estimate and the magnitude of enrichment (combined score c = log(p) * z, where p is the Fisher’s exact test p-value and z is the z-score deviation from the expected rank).
Statistical Analyses of the Publicly Available Datasets
All statistical analyses of the publicly available genomic datasets, including error rate estimates, background and technical noise measurements and filtering, feature peak calling, feature selection, assignments of genomic coordinates to the corresponding builds of the reference human genome, and data visualization, were performed exactly as reported in the original publications and associated references linked to the corresponding data visualization tracks (http://genome.ucsc.edu/). Additional elements or modifications of statistical analyses are described in the corresponding sections of the Results. Statistical significance of the Pearson correlation coefficients was determined using GraphPad Prism version 6.00 software. Both nominal and Bonferroni adjusted p values were estimated and considered as reported in corresponding sections of the Results. The significance of the differences in the numbers of events between the groups was calculated using two-sided Fisher’s exact and Chi-square tests, and the significance of the overlap between the events was determined using the hypergeometric distribution test [Tavazoie et al. 1999].
Declarations
Supplementary Information is available online. Supplementary information includes Supplementary Tables S1-S6; Supplementary Figure S1; and Supplementary Summaries S1-S4.
Acknowledgements.
This work was made possible by the open public access policies of major grant funding agencies and international genomic databases and the willingness of many investigators worldwide to share their primary research. The author would like to thank Victoria Glinskii for invaluable expert assistance with graphical presentation of the results of this study.
Author Contributions
This is a single author contribution. All elements of this work, including the conception of ideas, formulation, and development of concepts, execution of experiments, analysis of data, and writing of the paper, were performed by the author.
Funding
In part, this work was supported by OncoScar, LLC.
Conflict of interest statement
No conflicts of interest to declare.
Data availability statement
All data supporting the reported observations and required to reproduce the findings are provided in the main body of the paper and Supplementary materials.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
References
Barnada SM, Isopi A, Tejada-Martinez D, Goubert C, Patoori S, Pagliaroli L, et al. (2022). Genomic features underlie the co-option of SVA transposons as cis-regulatory elements in human pluripotent stem cells. PLoS Genet 18(6): e1010225. https://doi.org/10.1371/journal.pgen.1010225 Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. "Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position". Nature Methods. 2013; 10 (12): 1213–8. doi:10.1038/nmeth.2688. PMC 3959825. PMID 24097267. Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. "ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide". Current Protocols in Molecular Biology. 2015; 109: 21.29.1–21.29.9. doi:10.1002/0471142727.mb2129s109. PMC 4374986. PMID 25559105. Carter T, Singh M, Dumbovic G, Chobirko JD, Rinn JL, Feschotte C. 2022.
Mosaic cis-regulatory evolution drives transcriptional partitioning of HERVH endogenous retrovirus in the human embryo. Elife. 11: e76257. doi: 10.7554/eLife.76257. Chen, EY, et al., Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics, 2013. 14: 128. Chuong EB, Rumi MAK, Soares MJ, Baker JC. 2013. Endogenous retroviruses function as species-specific enhancer elements in the placenta. Nat Genet. 45: 325–9. https://doi.org/10.1038/ng.2553. Chuong EB, Elde NC, Feschotte C. 2017. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet. 18: 71–86. https://doi.org/10.1038/nrg.2016.139 . Elbarbary RA, Lucas BA, Maquat LE. Retrotransposons as regulators of gene expression. Science. 2016; 351:aac7247. https://doi.org/10.1126/science.aac7247 . Fort A, Hashimoto K, Yamada D, Salimullah M, Keya CA, Saxena A, et al. 2014. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat Genet. 46: 558–66. https://doi.org/10.1038/ng.2965. Fuentes DR, Swigut T, Wysocka J. 2018. Systematic perturbation of regulatory LTRs reveals widespread long-range effects on human gene regulation. Elife. 7:e35989. https://doi.org/10.7554/eLife.35989 Glinsky GV. 2015. Transposable Elements and DNA Methylation Create in Embryonic Stem Cells Human- Specific Regulatory Sequences Associated with Distal Enhancers and Noncoding RNAs. Genome Biol Evol 7 :1432–1454. https://doi.org/10.1093/gbe/evv081 Glinsky, G.V. 2016a. Mechanistically Distinct Pathways of Divergent Regulatory DNA Creation Contribute to Evolution of Human-Specific Genomic Regulatory Networks Driving Phenotypic Divergence of Homo sapiens. Genome Biol Evol 8: 2774-2788. Glinsky GV. 2016b. Single cell genomics reveals activation signatures of endogenous SCARS networks in aneuploid human embryos and clinically intractable malignant tumors. Cancer Lett. 381: 176-193. Glinsky G.V. 2017. 
Human-specific features of pluripotency regulatory networks link NANOG with fetal and adult brain development. BioRxiv. https://www.biorxiv.org/content/10.1101/022913v3 doi: https://doi.org/10.1101/022913 Glinsky G, Durruthy-Durruthy J, Wossidlo M, Grow EJ, Weirather JL, Au KF, Wysocka J, Sebastiano V. Single cell expression analysis of primate-specific retroviruses-derived HPAT lincRNAs in viable human blastocysts identifies embryonic cells co-expressing genetic markers of multiple lineages. Heliyon. 2018. 4: e00667. https://doi.org/10.1016/j.heliyon.2018.e00667 . PMID: 30003161; PMCID: PMC6039856. Glinsky GV. 2020a. A catalogue of 59,732 human-specific regulatory sequences reveals unique to human regulatory patterns associated with virus-interacting proteins, pluripotency and brain development. DNA and Cell Biology 39: 126-143. https://doi.org/10.1089/dna.2019.4988 Glinsky GV. 2020b. Impacts of genomic networks governed by human-specific regulatory sequences and genetic loci harboring fixed human-specific neuro-regulatory single nucleotide mutations on phenotypic traits of Modern Humans. Chromosome Res. 28: 331-354. https://doi.org/10.1007/s10577-020-09639-w Glinsky GV. 2021. Genomics-Guided Drawing of Molecular and Pathophysiological Components of Malignant Regulatory Signatures Reveals a Pivotal Role in Human Diseases of Stem Cell-Associated Retroviral Sequences and Functionally-Active hESC Enhancers. Frontiers in Oncology. 11: 974. https://doi.org/10.3389/fonc.2021.638363 Glinsky GV. 2022. Molecular diversity and phenotypic pleiotropy of ancient genomic regulatory loci derived from human endogenous retrovirus type H (HERVH) promoter LTR7 and HERVK promoter LTR5_Hs and their contemporary impacts on pathophysiology of Modern Humans. Mol Genet Genomics. 297: 1711-1740. Glinsky GV. 2024. 
Gene ontology-guided proximity placement analyses of pan-primate regulatory LTR elements contributing to embryogenesis, development of physiological traits and pathological phenotypes of Modern Humans. Under review. Göke J, Lu X, Chan Y-S, Ng H-H, Ly L-H, Sachs F, Szczerbinska I. 2015. Dynamic Transcription of Distinct Classes of Endogenous Retroviral Elements Marks Specific Populations of Early Human Embryonic Cells. Cell Stem Cell 16: 135–141. doi:10.1016/j.stem.2015.01.005 Goubert C, Zevallos NA, Feschotte C. 2020. Contribution of unfixed transposable element insertions to human regulatory variation. Philos Trans R Soc B Biol Sci. 375: 20190331. https://doi.org/10.1098/rstb.2019.0331. Grow EJ, Flynn RA, Chavez SL, Bayless NL, Wossidlo M, Wesche DJ, Martin L, Ware CB, Blish CA, Chang HY, Pera RA, Wysocka J. 2015. Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. Nature 522: 221–225. https://doi.org/10.1038/nature14308 Guffanti G, Bartlett A, Klengel T, Klengel C, Hunter R, Glinsky G, Macciardi F. 2018. Novel bioinformatics approach identifies transcriptional profiles of lineage-specific transposable elements at distinct loci in the human dorsolateral prefrontal cortex. Mol Biol Evol. 35: 2435-2453. Jacobs FMJ, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, et al. 2014. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature. 516: 242–5. https://doi.org/10.1038/nature13760. Jacques P-É, Jeyakani J, Bourque G. 2013. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 9: e1003504. https://doi.org/10.1371/journal.pgen.1003504. Kanton S, Boyle MJ, He Z, Santel M, Weigert A, Sanchís-Calleja F, Guijarro P, Sidow L, Fleck JS, Han D, Qian Z, Heide M, Huttner WB, Khaitovich P, Pääbo S, Treutlein B, Camp JG. 2019. Organoid single-cell genomic atlas uncovers human-specific features of brain development. Nature. 574: 418–422.
https://doi.org/10.1038/s41586-019-1654-9 Kuleshov MV, et al., Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res, 2016. 44(W1): W90-7. Kunarso G, Chia N-Y, Jeyakani J, Hwang C, Lu X, Chan Y-S, Ng H-H, Bourque G. 2010. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nature Genetics 42 :631–634. doi:10.1038/ng.600 Lu X, Sachs F, Ramsay L, Jacques PÉ, Göke J, Bourque G, et al. 2014. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat Struct Mol Biol. 21: 423–5. https://doi.org/10.1038/nsmb.2799 McLean, CY, Bristor, D, Hiller, M, Clarke, SL, Schaar, BT, Lowe, CB, Wenger, AM. Bejerano, G. 2010. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28: 495-501. McLean CY, Reno PL, Pollen AA, Bassan AI, Capellini TD, Guenther C, Indjeian VB, Lim X, Menke DB, Schaar BT, Wenger AM, Bejerano G, Kingsley DM. 2011. Human-specific loss of regulatory DNA and the evolution of human-specific traits. Nature 471: 216-9. Patoori S, Barnada SM, Large C, Murray JI, Trizzino M. 2022. Young transposable elements rewired gene regulatory networks in human and chimpanzee hippocampal intermediate progenitors. Development. 149: dev200413. doi: 10.1242/dev.200413. Pontis J, Planet E, Offner S, Turelli P, Duc J, Coudray A, et al. 2019. Hominoid-Specific Transposable Elements and KZFPs Facilitate Human Embryonic Genome Activation and Control Transcription in Naive Human ESCs. Cell Stem Cell. 24:724–735.e5. https://doi.org/10.1016/j.stem.2019.03.012 . Rayan NA, del Rosario RCH, Prabhakar S. 2016. Massive contribution of transposable elements to mammalian regulatory sequences. Semin Cell Dev Biol. 57: 51–6. https://doi.org/10.1016/j.semcdb.2016.05.004. Sasaki T, Nishihara H, Hirakawa M, Fujimura K, Tanaka M, Kokubo N, et al. 2008. Possible involvement of SINEs in mammalian-specific brain formation. Proc Natl Acad Sci U S A. 
105: 4220–5. https://doi.org/10.1073/pnas.0709398105. Schmid CD, Bucher P. 2010 MER41 repeat sequences contain inducible STAT1 binding sites. PLoS ONE 5, (doi:10.1371/journal.pone.0011425) Schmidt D et al. 2012 Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148, 335–348. (doi:10.1016/j.cell.2011.11.058) Sundaram V, Cheng Y, Ma Z, Li D, Xing X, Edge P, et al. 2014. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 24: 1963–76. https://doi.org/10.1101/gr.168872.113. Sundaram V, Wysocka J. 2020 Transposable elements as a potent source of diverse cis-regulatory sequences in mammalian genomes. Phil. Trans. R. Soc. B 375: 20190347. http://dx.doi.org/10.1098/rstb.2019.0347 Xie Z, Bailey A, Kuleshov MV, Clarke DJB, Evangelista JE, Jenkins SL, Lachmann A, Wojciechowicz ML, Kropiwnicki E, Jagodnik KM, Jeon M, & Ma’ayan A. Gene set knowledge discovery with Enrichr. Current Protocols, 1, e90. 2021. doi: 10.1002/cpz1.90 Wang T, Zeng J, Lowe CB, Sellers RG, Salama SR, Yang M, Burgess SM, Brachmann RK, Haussler D. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl Acad. Sci. USA 104, 18 613–18 618. (doi:10.1073/pnas.0703637104) Wang J, Xie G, Singh M, Ghanbarian AT, Raskó T, Szvetnik A, Cai H, Besser D, Prigione A, Fuchs NV, Schumann GG, Chen W, Lorincz MC, Ivics Z, Hurst LD, Izsvák Z. 2014. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature 516: 405–409. https://doi.org/10.1038/nature13804 Wang L, Rishishwar L, Mariño-Ramírez L, Jordan IK. 2016. Human population specific
Tables
Tables 1 to 5 are available in the Supplementary Files section.
Additional Declarations
The authors declare no competing interests.
Supplementary Files SupSummaryS1S6..zip Tables.docx Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5442388","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":377381177,"identity":"0ac3966f-4ce1-403b-9b2f-d85be0a94a88","order_by":0,"name":"Gennadi Glinsky","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA+0lEQVRIiWNgGAWjYHACxgMJDAdADDaGDyCSnQg9IC0SPEDFjDNAWpiJ0cIA1cLMA+IS0iLv3nvgwINfd+rs2ZufPbb5tU2ej5mB8cPHHNxaDM+cSziQ2PdMgofnmLlxbt9twzZmBmbJmdvwaJmRY3AgseewBI9EDpt0bs9tRqAWNmZefFrmv4FqkX/DJm3Zc9ueoBZ5CR6DAwk/QLbwsEkz/LidSFCLAQ/IYQ2HJXvOpJlJ9jbcTm5jZmzG6xf59jOGD3/8OczP3n74mcSPP7dt57c3H/zwEZ8tB4AEYxuUB2EwNuBWD7IFLP0Hxv2DU+EoGAWjYBSMYAAA38NU0Q9v/GwAAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0002-2523-8264","institution":"University of California, San 
Diego","correspondingAuthor":true,"prefix":"","firstName":"Gennadi","middleName":"","lastName":"Glinsky","suffix":""}],"badges":[],"createdAt":"2024-11-12 22:24:04","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-5442388/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5442388/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":69050706,"identity":"1d31df90-4ab6-4e3b-9776-4590be941b89","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":138491,"visible":true,"origin":"","legend":"\u003cp\u003eDNA sequences of four distinct families of human embryo regulatory LTRs (LTR5_Hs; MLT2A1; MLT2A2; LTR7) harbor nearly identical arrays of ~770 transcription factor binding sites (TFBS) for 716 proteins, the vast majority of which (700 of 716; 98%) share the common Gene Ontology (GO) feature of sequence-specific double-stranded DNA binding (GO: 1990837). \u003cbr\u003e\nA. Graphical representations of identity profiles of TFBS harbored by four distinct families of human embryo regulatory LTRs (LTR5_Hs; MLT2A1; MLT2A2; LTR7).\u003cbr\u003e\nB. Correlation patterns of TFBS density profiles of human MLT2A2 loci and multiTF-binding loci of mouse ESC. \u003cbr\u003e\nC. Correlation patterns of TFBS density profiles of human MLT2A1 loci and multiTF-binding loci of mouse ESC. \u003cbr\u003e\nD. Correlation patterns of TFBS density profiles of human LTR7 loci and multiTF-binding loci of mouse ESC. \u003cbr\u003e\nE. 
Correlation patterns of TFBS density profiles of human LTR5_Hs loci and multiTF-binding loci of mouse ESC. \u003cbr\u003e\nTFBS densities were computed for 100 top-scoring TFs having TFBS within 606 LTR5_Hs loci; as well as MLT2A1; MLT2A2; LTR7 regulatory elements and aligned to TFBS density profiles estimated for 3583 mouse ESC multi-TFs-binding regulatory loci (see text for details and Supplementary Tables S1 and S2 for additional information).\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/d83a54c7ef7e4d9ce932c212.png"},{"id":69051229,"identity":"e5c51c15-740a-4c22-9aea-50d94520e9b8","added_by":"auto","created_at":"2024-11-15 04:42:11","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":167475,"visible":true,"origin":"","legend":"\u003cp\u003eGraphical summary of the analyses of TFBS density gains and losses acquired by regulatory loci of four families of human embryo regulatory LTRs.\u003cbr\u003e\nA. Transcription factors manifesting largest TFBS density gains compared with mouse ESC multi-TFs-binding loci.\u003cbr\u003e\nB. Transcription factors of human embryo LTRs manifesting more than 50% gain/loss of TFBS density compared with mouse ESC multi-TFs-binding loci and brief summary of GSEA 30 TFs of human embryo LTRs manifesting more than 50% gain of TFBS density.\u003cbr\u003e\nC. Correlation patterns of TFBS density profiles of human LTR5_Hs and LTR7 loci for 100 top-scoring TFs.\u003cbr\u003e\nD. Correlation patterns of TFBS density profiles of human LTR5_Hs and LTR7 loci for 563 top-scoring TFs.\u003cbr\u003e\nE-F. 
LTR5_Hs transcription factors manifesting largest TFBS density gains compared with LTR7, MLT2A1, and MLT2A2 loci at 100% gain threshold (E) and 50% gain threshold (F).\u003cbr\u003e\n \u003cbr\u003e\nTFBS densities were computed for 100 top-scoring TFs having TFBS within 606 LTR5_Hs loci; as well as MLT2A1; MLT2A2; LTR7 regulatory elements and aligned to TFBS density profiles estimated for 3583 mouse ESC multi-TFs-binding regulatory loci to estimate gains and/or losses relative to corresponding TFBS density values of mouse ESC multi-TFs-binding regulatory loci (see text for details and Supplementary Tables S1-S4 for additional information).\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/2a9859a9d9aea4109533dbe9.png"},{"id":69050699,"identity":"54f36db8-7877-4d17-b3c3-302a1aecc259","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":101268,"visible":true,"origin":"","legend":"\u003cp\u003eGenome-wide similarity and divergence matrix of TFBS density changes acquired during mammalian evolution within 17,935 brain development regulatory regions (BDRRs) on individual chromosomes of human and chimpanzee.\u003cbr\u003e\nA. Similarity matrix of TFBS gains/losses within brain development regulatory regions acquired on individual chromosomes by human and chimpanzee during 80 MYR of mammalian evolution.\u003cbr\u003e\nB. Common patterns of TFBS density changes within BDRRs acquired on chromosome 19 by Chimpanzee and Modern Humans.\u003cbr\u003e\nC. Common patterns of TFBS density changes within BDRRs acquired on chromosome 22 by Chimpanzee and Modern Humans.\u003cbr\u003e\nD. Distinct patterns of TFBS density changes within BDRRs acquired on chromosome 14 by Chimpanzee and Modern Humans. \u003cbr\u003e\nE. 
Distinct patterns of TFBS density changes within BDRRs acquired on chromosome 21 by Chimpanzee and Modern Humans. \u003cbr\u003e\nSee text and Table 3 for details and additional information.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/3719c39fb37f2108a1e93429.png"},{"id":69050698,"identity":"e24a2a1d-ade3-429d-aa75-4af26445e339","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":173628,"visible":true,"origin":"","legend":"\u003cp\u003eGenome-wide chromosome-level visualization of distinct patterns of TFBS density gains and losses within 17,935 brain development regulatory regions of Modern Humans and Chimpanzee acquired during mammalian evolution. \u003cbr\u003e\nA. Divergence patterns of TFBS density gains in Human and Chimpanzee regulatory regions of brain development during mammalian evolution.\u003cbr\u003e\nB. Divergence patterns of TFBS density loss in Human and Chimpanzee regulatory regions of brain development during mammalian evolution.\u003cbr\u003e\nC. Divergence patterns of TFBS density gains in Human versus Chimpanzee regulatory regions of brain development. \u003cbr\u003e\nD. Divergent patterns of TFBS density changes of at least 25% acquired by Chimpanzee and Modern Humans during primate evolution after segregation from last common ancestor. \u003cbr\u003e\nTFBS densities were computed for BDRRs located on individual chromosomes of human and chimpanzee genomes and for each TF gain or loss of TFBS density were estimated by comparisons to corresponding values of TFBS densities of mouse ESC multi-TFs-binding regulatory loci and reported as numbers of gains (A) and losses (B) of at least 25%. 
Panels (C) and (D) show results of the direct TFBS density comparisons within BDRRs of human versus chimpanzee.\u003cbr\u003e\nResults of the analyses of gains and losses of TFBS densities are reported for 100 top-scoring TFs as defined by the analyses of TFBS within LTR5_Hs loci (Supplementary Tables S1-S4).\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/30ff7796c87396b83c78a1ad.png"},{"id":69050702,"identity":"998368fc-30f0-4978-a802-27dddc3319e4","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":145266,"visible":true,"origin":"","legend":"\u003cp\u003ePatterns of divergence and concordance within human and chimpanzee genomes ascertained from genome-wide chromosome-level correlation matrices of TFBS density changes acquired during mammalian evolution within Modern Humans (A; B) and Chimpanzee (C; D) brain development regulatory regions defined by the ATAC-seq analysis.\u003cbr\u003e\nA. Visualization of human genome-wide divergence scores ascertained from genome-wide chromosome-level correlation matrix reported in the Table 4. \u003cbr\u003e\nB. Visualization of human genome-wide variation coefficients ascertained from genome-wide chromosome-level correlation matrix reported in the Table 4. \u003cbr\u003e\nC. Visualization of chimpanzee genome-wide divergence scores ascertained from genome-wide chromosome-level correlation matrix reported in the Table 5. \u003cbr\u003e\nD. Visualization of chimpanzee genome-wide variation coefficients ascertained from genome-wide chromosome-level correlation matrix reported in the Table 5. 
\u003cbr\u003e\nGenome-wide chromosome-level correlation matrices of TFBS density changes acquired during mammalian evolution within Modern Humans (Table 4) and Chimpanzee (Table 5) brain development regulatory regions defined by the ATAC-seq analysis were created and utilized for computation of divergence scores and variation coefficients for individual chromosomes. See text for details.\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/9a15b9894ba8f486d85a7b2b.png"},{"id":69050708,"identity":"5bffc8f5-4cca-44b0-aa97-8474c959afa6","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":212097,"visible":true,"origin":"","legend":"\u003cp\u003eDistinct and common association patterns of TFBS density changes within brain development regulatory regions of human and chimpanzee revealed by genome-wide chromosome level alignment analyses with TFBS density profiles of different families of human embryo regulatory LTRs. \u003cbr\u003e\nA. Chromosome-level correlation patterns of TFBS density changes acquired during mammalian evolution within human and chimpanzee brain development regulatory regions aligned to MLT2A1 loci.\u003cbr\u003e\nB. Chromosome-level correlation patterns of TFBS density changes acquired during mammalian evolution within human and chimpanzee brain development regulatory regions aligned to LTR5_Hs loci. \u003cbr\u003e\nC. Chromosome-level correlation patterns of TFBS density changes acquired during mammalian evolution within human and chimpanzee brain development regulatory regions aligned to MLT2A2 loci.\u003cbr\u003e\nD. Chromosome-level correlation patterns of TFBS density changes acquired during mammalian evolution within human and chimpanzee brain development regulatory regions aligned to LTR7 loci.\u003cbr\u003e\nE-F. 
Chromosome-level discordant correlation patterns of TFBS density changes acquired during mammalian evolution within human (E) and chimpanzee (F) brain development regulatory regions aligned to LTR5_Hs and MLT2A1 loci.\u003cbr\u003e\nG-H. Chromosome-level negative correlation patterns of TFBS density changes acquired during mammalian evolution within human (G) and chimpanzee (H) brain development regulatory regions aligned to LTR5_Hs and MLT2A1 loci. \u003cbr\u003e\nI. Common patterns of genome-wide TFBS density changes acquired by Chimpanzee and Modern Humans during primate evolution identified by comparisons to TFBS density values within mouse ESC multi-TFs-binding regulatory loci.\u003cbr\u003e\nJ. Divergent patterns of TFBS density changes of at least 25% acquired by Chimpanzee and Modern Humans during primate evolution identified by direct comparisons of TFBS density gains and losses within BDRRs of human versus chimpanzee. Highlighted are changes of at least 45%.\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/a4085de1a67df4a93726300f.png"},{"id":69051230,"identity":"e6c18134-aefc-4b4c-929c-8acd60267575","added_by":"auto","created_at":"2024-11-15 04:42:11","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":128647,"visible":true,"origin":"","legend":"\u003cp\u003eDNA sequences of LTR5_Hs subfamily of human embryo regulatory LTRs and five distinct families of transposable elements (TE) coopted in genomic regulatory networks of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells harbor nearly identical arrays of ~770 transcription factor binding sites (TFBS) for 716 proteins, the vast majority of which (700 of 716; 98%) share the common Gene Ontology (GO) feature of sequence-specific 
double-stranded DNA binding (GO: 1990837). \u003cbr\u003e\nA. Graphical representations of identity profiles of TFBS harbored by LTR5_Hs subfamily of human embryo regulatory LTRs and five distinct families of TE implicated in genomic regulatory networks of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells.\u003cbr\u003e\nB. Graphical representations of identity profiles of TFBS harbored by LTR5_Hs subfamily of human embryo regulatory LTRs and six distinct subfamilies of LINE1 retrotransposons implicated in genomic regulatory networks of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells.\u003cbr\u003e\nC. Diminished representation of divergent LTR loci within differentially-accessed (DA) ATAC regulatory regions of differentiation of human induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells.\u003cbr\u003e\nD. Diminished representation of divergent SINE/Alu loci within differentially-accessed (DA) ATAC regulatory regions of differentiation of human induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells.\u003cbr\u003e\nE. DA ATAC regulatory regions of differentiation of human induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells intersecting high numbers of SINE/Alu loci house more transcription factors with TFBS. \u003cbr\u003e\nF. 
Diminished representation of divergent LINE loci within differentially-accessed (DA) ATAC regulatory regions of differentiation of human induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells.\u003cbr\u003e\nTFs identities were determined and TFBS densities were computed for 100 top-scoring TFs having TFBS within 606 LTR5_Hs loci as well as TE-encoded regulatory elements coopted in genomic regulatory networks of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells.\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/c8d5e2b4c7daa631a59e1cd8.png"},{"id":69050705,"identity":"bc49a4dc-a1d8-492d-b4fc-7b4a626097b6","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":216435,"visible":true,"origin":"","legend":"\u003cp\u003eDistinct and common association patterns of TFBS density changes within brain development regulatory regions of human and chimpanzee revealed by genome-wide chromosome level alignment analyses with TFBS density profiles of distinct families of TE coopted in genomic regulatory networks of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells. \u003cbr\u003e\nA. Chromosome-level correlation patterns of TFBS density changes acquired during mammalian evolution within human and chimpanzee brain development regulatory regions aligned to SVA loci.\u003cbr\u003e\nB. Chromosome-level discordant correlation patterns of TFBS density changes acquired during mammalian evolution within human brain development regulatory regions aligned to SVA and MLT2A1 loci. \u003cbr\u003e\nC. 
Chromosome-level discordant correlation patterns of TFBS density changes acquired during mammalian evolution within chimpanzee brain development regulatory regions aligned to SVA and MLT2A1 loci.\u003cbr\u003e\nD. Chromosome-level correlation patterns of TFBS density changes acquired during mammalian evolution within human and chimpanzee brain development regulatory regions aligned to L1PA6 loci.\u003cbr\u003e\nE-F. Chromosome-level discordant correlation patterns of TFBS density changes acquired during mammalian evolution within human (E) and chimpanzee (F) brain development regulatory regions aligned to L1PA6 and SVA loci.\u003cbr\u003e\nG. Chromosome-level negative correlation patterns of TFBS density changes acquired during mammalian evolution within human (left panel) and chimpanzee (right panel) brain development regulatory regions aligned to LTR5_Hs and L1PA6 loci. \u003cbr\u003e\nH. Chromosome-level positive correlation patterns of TFBS density changes acquired during mammalian evolution within human (left panel) and chimpanzee (right panel) brain development regulatory regions aligned to MLTA1 and L1PA6 loci.\u003cbr\u003e\nI-J. 
Chromosome-level negative correlation patterns of TFBS density changes acquired during mammalian evolution within human (I) and chimpanzee (J) brain development regulatory regions aligned to SVA and L1PA6 loci.\u003c/p\u003e","description":"","filename":"image8.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/33bb83a9b05749c90fd74fbf.png"},{"id":69050703,"identity":"9f7b8b69-5471-44b3-bf5b-713a00d81963","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":118804,"visible":true,"origin":"","legend":"\u003cp\u003eInterspecies association patterns of TFBS density changes within brain development regulatory regions of human and chimpanzee revealed by genome-wide chromosome level alignment analyses with TFBS density profiles of different families of human embryo regulatory LTRs. \u003cbr\u003e\nA-C. Interspecies chromosome-level correlation profiles of TFBS density changes acquired during mammalian evolution within human and chimpanzee brain development regulatory regions aligned to MLT2A1 loci (A); or to LTR5_Hs loci (B); and combining alignment patterns to both MLT2A1 and LTR5_Hs loci (C).\u003cbr\u003e\nD-F. 
Interspecies chromosome-level correlation profiles of TFBS density changes acquired during mammalian evolution within human and chimpanzee brain development regulatory regions aligned to SVA loci (D); or to L1PA6 loci (E); and combining alignment patterns to both SVA and L1PA6 loci (F).\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/4969cf69d8480f1ab7d945ad.png"},{"id":69051285,"identity":"9fcd59c1-f5f4-477e-b808-43f242b45319","added_by":"auto","created_at":"2024-11-15 04:50:11","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1992029,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/5a4c57a5-98c9-4525-988d-19e4cd7a29f4.pdf"},{"id":69050700,"identity":"5ae9c8bc-dd32-452c-9dbf-21a5d972d8d9","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"zip","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":1765615,"visible":true,"origin":"","legend":"","description":"","filename":"SupSummaryS1S6..zip","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/b02e231791b274c09439b529.zip"},{"id":69050707,"identity":"408c3b10-7ce2-4b6c-afec-7e82817c6a81","added_by":"auto","created_at":"2024-11-15 04:34:11","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":78367,"visible":true,"origin":"","legend":"","description":"","filename":"Tables.docx","url":"https://assets-eu.researchsquare.com/files/rs-5442388/v1/9bdf45934c5d7460757c1424.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eHighly conserved sequence-specific DNA binding networks of 716 transcription factors associated with continuing divergent genomic evolution of human and chimpanzee brain 
development.\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eDNA sequences derived from transposable elements (TE) constitute ~50% of the human genome, contributing to a multitude of structural features and regulatory functions at different levels of genomic organization (Elbarbary et al., 2016; Sundaram and Wysocka, 2020). During evolution, concurrently with TE colonization of primates\u0026rsquo; genomes, a highly sophisticated defense system co-evolved to restrict uncontrolled TE expansion and diminish potentially deleterious effects of TE insertions on genome integrity, while providing TE family sequence-specific co-option mechanisms integrating TE-derived sequences into cell type- and tissue-specific genomic regulatory networks (Jacobs et al., 2014; Pontis et al., 2019). While potentially destructive effects of TE on genome integrity represent their ubiquitous biological feature defined by the very nature of transpositionally-competent TE, it is unknown whether TE-derived sequences may possess similarly ubiquitous non-deleterious functions, perhaps contributing to evolutionary fitness of the host.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTE-derived sequences are often integrated into species\u0026rsquo; genomic regulatory networks. In mammalian genomes, TE-derived sequences rewired the core regulatory circuitry of embryonic stem cells (Kunarso et al., 2010); they are intrinsically activated in human preimplantation embryos (Grow et al., 2015), operating as long-range enhancers (Fuentes et al., 2018) and providing transcription factor binding sites (TFBS) for TP53 (Wang et al., 2007), STAT1 (Schmid et al., 2010), CTCF (Schmidt et al., 2012), and a set of transcription factors (TFs) defined as master pluripotency regulators (Kunarso et al., 2010; Schmidt et al., 2012), including candidate human-specific TFBS for POU5F1 (OCT4), NANOG, SOX2, and CTCF (Glinsky, 2015; Glinsky et al., 2018). 
Overall, chromatin immunoprecipitation and sequencing (ChIP-seq) experiments demonstrated that TE-derived loci harbor thousands of TFBS and presumably exert global regulatory effects on gene expression in various pathophysiological conditions. However, the distribution of TFBS is non-uniform and highly variable within specific TE subfamilies that evolutionarily emerged from identical or highly similar sequences. TE sequence-dependent distribution of TFBS is likely influenced by genetic drift and other mutational processes as well as cell type-specific epigenetic contexts affecting chromatin states of genomic regions harboring TE insertions. Based on these considerations, it was reasonable to assume that ChIP-seq experiments could capture only a fraction of the TFBS repertoire encoded by the ancestral TE loci that may have existed at the time of the TE insertion and expansion in a host genome. Therefore, it was of interest to catalogue all TFs that have potential TFBS located within genomic loci encoded by a specific TE subfamily, which should facilitate the comparative analyses of TFBS encoded by distinct TE families to infer their concordant, non-overlapping, and discordant regulatory functions.\u0026nbsp;\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eHighly conserved sequence-specific double-stranded DNA binding genomic regulatory networks (GRNs) encoded by distinct families of human embryo regulatory LTRs.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo catalogue all TFs that have potential TFBS located within genomic loci encoded by a specific TE subfamily of human embryo regulatory LTRs (Glinsky, 2022; 2024), the Jaspar algorithm was employed to identify all TFBS located within 606 LTR5_Hs loci previously characterized as functionally active regulatory elements in human cells (Fuentes et al., 2018; Glinsky, 2022). 
The LTR5_Hs loci are likely to manifest the relatively minor divergence of TFBS profiles because LTR5_Hs/HERVK successfully infected and colonized primates\u0026rsquo; germline most recently compared to other HERVs with documented regulatory functions in human embryogenesis (Glinsky, 2022; 2024). The Jaspar algorithm output was processed to identify at a single-nucleotide resolution all TFs having predicted TFBS within LTR5_Hs loci and the number of TFBS for each TF was calculated (Supplementary Table S1). Overall, LTR5_Hs loci harbor 771 distinct TFBS for 716 individual TFs, including 65 paired TFBS features for 68 TFs comprising the immediately adjacent TFBS for two different TFs. Since individual LTR5_Hs loci harbor subsets of a composite set of 771 TFBS, it was designated a composite ancestral set of TFBS encoded by LTR5_Hs loci. Gene Set Enrichment Analyses (GSEA) employing the ENRICHR bioinformatics platform documented the exceedingly broad engagement of the LTR5_Hs loci-residing TFs in manifestations of physiological phenotypes and pathological conditions of Homo sapiens (Table 1; Supplementary Table S1). As expected, GSEA employing GO Molecular Function 2023 database identified among most significantly enriched categories 556 genes of Cis-Regulatory Region Sequence-Specific DNA Binding (GO: 0000987) category; 581 genes of RNA Polymerase II Cis-Regulatory Region Sequence-Specific DNA Binding (GO: 0000978) category; 609 genes of RNA Polymerase II Transcription Regulatory Region Sequence-Specific DNA Binding (GO:0000977) category. Notably, 700 of 716 (98%) of TFs having TFBS within LTR5_Hs loci represent constituents of the single Gene Ontology classification category designated the Sequence-Specific Double-Stranded DNA Binding (GO: 1990837). 
These observations indicate that the set of LTR5_Hs-residing TFs identified herein may constitute a sequence-specific double-stranded DNA binding pathway exerting exceedingly broad and predominantly regulatory impacts on phenotypes of Modern Humans.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eQuantitative features of the ancestral LTR5_Hs set of TFBS were documented by calculating the numbers of TFBS for each of the 771 distinct TFBS and reporting the computed values as numbers of TFBS per LTR5_Hs locus (defined as TFBS frequency) and the estimated TFBS density normalized to 1 kb of the locus length (defined as TFBS density; Supplementary Table S1). It has been observed that 533 of 771 distinct TFBS (69%) have at least one TFBS per LTR5_Hs locus and TFBS density of more than 1 binding site per 1 kb of regulatory DNA, while the top 100 TFBS have TFBS frequency of more than 8.7 binding sites per LTR5_Hs locus and TFBS density of more than 9.9 binding sites per 1 kb of regulatory DNA. \u0026nbsp;These observations suggest that a significant fraction of the LTR5_Hs ancestral set of TFBS has been preserved in LTR5_Hs loci residing in the human genome. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAnalysis of TFBS density values indicates that individual LTR5_Hs loci harbor TFBS for numerous TFs (Supplementary Table S1). This feature appears similar to the 3583 experimentally defined regulatory loci in the mouse ESC, which are documented to have TFBS for multiple TFs (Chen et al., 2008). It was of interest to identify all potential TFBS residing within 3583 mouse ESC multi-TFs-binding regulatory loci and compare them to the ancestral LTR5_Hs set of 771 TFBS. These analyses demonstrated that 3583 mouse ESC multi-TFs-binding loci harbor 776 distinct TFBS (Supplementary Table S2). Notably, all 771 distinct TFBS comprising the ancestral set of LTR5_Hs-encoded TFBS were identified among TFBS residing within mouse ESC regulatory sequences binding multiple TFs. 
These observations suggest that the ancestral set of 771 TFBS encoded by LTR5_Hs elements identified herein may reflect the presence of an evolutionarily conserved genomic regulatory network (GRN) operating during mammalian embryogenesis.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIt was of interest to characterize the spectrum of TFBS encoded by other LTR families known to play a regulatory role during human embryogenesis and collectively defined as human embryo regulatory LTRs (Glinsky, 2022; 2024). To this end, all potential TFBS encoded by LTR7; MLT2A1; and MLT2A2 loci were identified employing the Jaspar algorithm and corresponding quantitative metrics were computed for each set of TFBS (Supplementary Table S3). Intriguingly, despite clearly discernible marked divergence of DNA sequences, all regulatory LTR elements analyzed herein appear to harbor nearly identical sets of TFBS (Figure 1; Supplementary Table S3). Since introductions into primates\u0026rsquo; germline of the different families of regulatory LTRs analyzed herein were separated by millions of years, these observations are consistent with the hypothesis of evolutionary conservation of the GRN identified in this contribution. In agreement with this concept, correlation analyses of TFBS density profiles (Figure 1) revealed highly concordant patterns of TFBS densities between mouse ESC multi-TFs-binding loci and distinct families of human embryo regulatory LTRs, namely MLT2A2 (r = 0.948); MLT2A1 (r = 0.921); LTR7 (r = 0.900); and LTR5_Hs (r = 0.834). 
Interestingly, networks of TFBS encoded by evolutionarily ancient LTRs (MLT2A2 and MLT2A1) appear more closely related to the TFBS profile of mouse ESC multi-TFs-binding regulatory loci compared to TFBS networks encoded by human embryo regulatory LTRs introduced into primate germlines relatively recently (LTR7 and LTR5_Hs).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eDespite overall similarities of TFBS density profiles (Figure 1), human embryo regulatory LTRs manifest notable quantitative differences of the TFBS numbers for specific TFs (Figure 2; Supplementary Figure S1; Supplementary Table S4). Interestingly, LTR5_Hs loci appear to exhibit the largest divergence of the TFBS density profiles when comparisons were made to either mouse ESC multi-TFs-binding loci or other families of human embryo regulatory LTRs, including MLT2A1; MLT2A2; and LTR7 sequences (Figure 2; Supplementary Figure S1; Supplementary Table S4). For example, among LTR5_Hs TFs with top 100 TFBS density scores, 19 TFs have at least 50% higher TFBS density compared to mouse ESC multi-TFs-binding regulatory loci and no TFs have lower TFBS density at this threshold. In contrast, within LTR7 loci there are 9 TFs manifesting at least 50% TFBS density gain and 4 TFs exhibiting TFBS density loss compared to mouse ESC multi-TFs-binding regulatory loci. Similarly, within MLT2A1 loci there are 7 TFs manifesting gain and 8 TFs exhibiting loss of at least 50% TFBS density compared to mouse ESC multi-TFs-binding regulatory loci. There were no TFs with TFBS density gain of at least 50% and 4 TFs with TFBS density loss within MLT2A2 loci. Direct comparisons of TFBS density scores between different families of human embryo regulatory LTRs reveal that among LTR5_Hs TFs with top 100 TFBS density scores, there are 23; 33; and 36 TFs having at least 50% higher TFBS density scores within LTR5_Hs loci compared to LTR7; MLT2A2; and MLT2A1 loci, respectively. 
In contrast, there are 1; 1; and 7 TFs having at least 50% higher TFBS density scores within LTR7; MLT2A2; and MLT2A1 loci compared to LTR5_Hs loci, respectively. Comparative analyses of TFBS density scores between LTR5_Hs and LTR7 loci among TFs with top 533 TFBS density scores (see above) demonstrate that there are 159 TFs and 55 TFs manifesting at least 50% higher TFBS density scores within LTR5_Hs versus LTR7 loci and LTR7 versus LTR5_Hs loci, respectively. At the threshold of at least 100% gains of TFBS density scores, there are 107 TFs having higher TFBS density scores within LTR5_Hs loci versus LTR7 loci, whereas there are no TFs harboring higher TFBS density scores within LTR7 loci versus LTR5_Hs loci (Supplementary Table S1). These findings suggest that LTR5_Hs loci may have made a significant impact on the divergence of GRNs governed by the human embryo regulatory LTRs by providing regulatory loci with increased numbers of TFBS for numerous TFs (Figure 2; Supplementary Figure S1; Supplementary Tables S1 and S4). GSEA of TF-coding genes manifesting the most significant increases of TFBS densities within LTR5_Hs loci compared to mouse ESC multi-TFs-binding loci and/or other human embryo regulatory LTRs demonstrated that these TFs contribute to development of preimplantation embryogenesis phenotypic traits (Embryonic stem cells; Trophoblast stem cells; Blastocyst; Germ cells; Ectoderm); as well as multiple types of cells and tissues of central nervous system, including Neural stem/precursor cells; Neural crest; Neuroblast; Neural tube; Motor neurons; Oligodendrocyte precursor/progenitor cells; Fetal brain; Cerebellum; Prefrontal cortex; Superior frontal gyrus; Peripheral nerve (Table 2; Supplementary Figure S1; Supplementary Tables S1 and S4). 
These findings are consistent with the previously reported observations based on gene ontology-guided proximity placement analyses of GRNs governed by human embryo regulatory LTRs, which implicated LTR-associated GRNs in regulation of development and functions of primates\u0026rsquo; central nervous system (Glinsky, 2022; 2024).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eComparative analyses of highly conserved sequence-specific double-stranded DNA binding GRNs within ATAC-seq-defined regulatory regions of human and chimpanzee brain development.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIt was of interest to extend further this line of inquiry by determining whether reported herein sets of TFs and TFBS constituting the Sequence-Specific Double-Stranded DNA Binding GRNs associated with human embryo regulatory LTRs could be identified within genomic regulatory loci known to contribute to regulation of human and chimpanzee brain development. To this end, 17,935 genomic regulatory regions defined in the organoid single-cell genomic atlas of human and chimpanzee brain development (Kanton et al, 2019) were analyzed to catalogue all TFs having putative TFBS within these regions (Table 3). Kanton et al (2019) reported 8099 human-specific and 9836 chimpanzee-specific brain development regulatory regions (BDRRs), which were identified employing DNA sequence unbiased open chromatin accessibility screening method termed\u0026nbsp;the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq; Buenrostro et al, 2013; 2015).\u0026nbsp;Genome-wide and individual chromosome level analyses of these regions revealed that TFs and corresponding TFBS of\u0026nbsp;the Sequence-Specific Double-Stranded DNA Binding pathway comprise the intrinsic elements of the\u0026nbsp;ATAC-seq-defined human and chimpanzee BDRRs\u0026nbsp;(Table 3). 
Notably, quantitative characteristics of\u0026nbsp;the Sequence-Specific Double-Stranded DNA Binding networks appear significantly distinct between human and chimpanzee ATAC-seq-defined regulatory regions of brain development. While sets of individual TFs and corresponding TFBS independently identified in both species are indistinguishable, the TFBS frequencies (numbers of TFBS per ATAC-seq-defined regulatory locus) and the TFBS densities (TFBS numbers per 1 kb of ATAC-seq-defined regulatory locus) are consistently higher in Modern Humans compared to Chimpanzee BDRRs (Table 3). These findings suggest that the reported distinct quantitative characteristics of the Sequence-Specific Double-Stranded DNA Binding pathway within ATAC-seq-defined brain development regulatory loci may have contributed to the previously observed different trajectories of Modern Humans and Chimpanzee brain development (Kanton et al., 2019).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eNext, quantitative characteristics of gains and/or losses for individual LTR5_Hs loci-associated TFs having top 100 TFBS density scores (see above) were calculated in 17,935 ATAC-seq-defined BDRRs of human and chimpanzee, corresponding quantitative metrics were estimated for individual species\u0026rsquo; chromosomes, and multiple comparative analyses were carried out. Comparisons of TFBS density profiles within BDRRs of the individual chromosomes of human and chimpanzee genomes that were acquired as a result of TFBS density gains or losses during mammalian evolution (Figure 3) identified five chromosomes manifesting highly concordant changes of TFBS densities, namely chr19 (r = 0.99); chr22 (r = 0.98); chr4 (r = 0.94); chr17 (r = 0.93); chr13 (r = 0.90); while eight chromosomes exhibit weak or no concordance of TFBS density changes, namely chr21 (r = -0.12); chr14 (r = 0.12); chrX (r = 0.31); chr9 (r = 0.35); chr7 (r = 0.37); chr11 (r = 0.40); chr15 (r = 0.43); chr12 (r = 0.47). 
Examples of correlation plots for two chromosomes manifesting the most concordant (chr19 and chr22) and the most discordant (chr21 and chr14) profiles of TFBS density changes are shown in Figure 3. The remaining 10 chromosomes appear to have moderate concordance levels of TFBS density profile changes within BDRRs of human and chimpanzee (Figure 3), suggesting largely divergent patterns of evolutionary TFBS density gains and losses in the two species.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTo explore further the hypothesis of the divergent evolution of TFBS density gains within BDRRs of human and chimpanzee, TFBS density gains and losses compared to mouse ESC multi-TFs-binding loci were computed for each chromosome and numbers of events manifesting TFBS density changes of at least 50% were plotted for visualization (Figure 4). Results of these analyses documented clearly discernible distinct patterns of TFBS density gains and losses acquired by humans and chimpanzee during mammalian evolution (Figure 4). Interestingly, humans appear to manifest predominantly gains of TFBS density within BDRRs, while chimpanzee seem to exhibit prevalent losses within regulatory regions of brain development at several chromosomes (Figure 4). 
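\u003c/p\u003e\n\u003cp\u003eThe chromosome-level concordance values quoted above are correlation coefficients between human and chimpanzee TFBS density changes. A minimal sketch of the two quantities involved, TFBS density per 1 kb and a correlation coefficient, is given below. The function names, the input layout, and the choice of Pearson\u0026rsquo;s r are assumptions made for illustration, because the text does not name the correlation statistic that was used.\u003c/p\u003e

```python
from math import sqrt
from typing import Sequence


def tfbs_density(tfbs_count: int, total_locus_length_bp: int) -> float:
    """TFBS count normalized to 1 kb of regulatory sequence (assumed layout)."""
    return tfbs_count / (total_locus_length_bp / 1000.0)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long vectors of density changes.

    Each position is assumed to hold the density change of one TF,
    e.g. xs for a human chromosome and ys for its chimpanzee counterpart.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)
```

\u003cp\u003e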
These observations were corroborated by direct comparisons of TFBS density values within BDRRs of human versus chimpanzee, which were calculated for each individual TFBS, recorded, and reported for each chromosome as the numbers of events manifesting most significant species-specific gains of TFBS densities (Figure 4).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eComputation of gains and losses for individual TFs of the Sequence-Specific Double-Stranded DNA Binding networks in 17,935 brain development genomic regulatory regions and recording of corresponding quantitative metrics of TFBS density changes for all individual chromosomes of human and chimpanzee genomes facilitated both interspecies and within-species comparative genome-wide chromosome-level analyses.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTo this end, genome-wide chromosome-level pairwise correlation matrices of TFBS density changes acquired during mammalian evolution within human (Table 4) and chimpanzee (Table 5) BDRRs defined by the ATAC-seq analysis were developed and analyzed for within-species genome concordance and divergence patterns. Using numerical values of correlation coefficients reported in Tables 4 and 5, the mean values of correlation coefficients were calculated for each chromosome of human and chimpanzee genomes and corresponding divergence scores were quantified by subtracting the mean values from a perfect correlation coefficient value of 1.0. Additionally, for each individual human and chimpanzee chromosome, coefficients of variation were estimated as the ratio of standard deviation to the mean value expressed as a percentage. Corresponding numerical values were plotted for visualization and reported in Figure 5. In the human genome, 19 chromosomes appear to exhibit similar values of divergence scores ranging from 0.208 (chr10) to 0.325 (chr20). 
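\u003c/p\u003e\n\u003cp\u003eThe divergence score and the coefficient of variation defined above can be sketched as follows. This is an illustrative sketch under stated assumptions: the input is taken to be one chromosome\u0026rsquo;s row of pairwise correlation coefficients with the other chromosomes, the function names are hypothetical, and the sample standard deviation is assumed because the text does not say whether the sample or population form was used.\u003c/p\u003e

```python
from statistics import mean, stdev
from typing import Sequence


def divergence_score(correlations: Sequence[float]) -> float:
    """1.0 minus the mean pairwise correlation coefficient of one chromosome."""
    return 1.0 - mean(correlations)


def coefficient_of_variation(correlations: Sequence[float]) -> float:
    """Standard deviation divided by the mean, expressed as a percentage.

    Assumption: sample standard deviation (n - 1 in the denominator).
    The ratio is undefined when the mean correlation is zero.
    """
    m = mean(correlations)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * stdev(correlations) / m
```

A divergence score above 1.0 corresponds to a negative mean correlation under this definition.

\u003cp\u003e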
In contrast, seemingly higher values of divergence scores were documented for 4 chromosomes, namely 0.553 (chr16); 0.831 (chr17); 0.960 (chr22); and 1.047 (chr19). These findings were corroborated by the results of the analysis of variation coefficients which identified chr17; chr19; and chr22 as chromosomes having largest values of coefficients of variation among human chromosomes (Figure 5). In chimpanzee genome, values of divergence scores appear to manifest a somewhat broader degree of variability ranging from 0.194 (chr7) to 0.472 (chr6), while 2 chromosomes had seemingly higher divergence score values of 0.646 (chr13) and 0.757 (chr4). Consistent with these observations, results of the analysis of variation coefficients identified chr13 and chr4 as chromosomes associated with largest values of coefficients of variation in chimpanzee genome (Figure 5).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTherefore, analyses of patterns of divergence and concordance of chromosome-level TFBS density changes acquired by Modern Humans and Chimpanzee during mammalian evolution identified chr17; chr19; and chr22 as chromosomes exhibiting most divergent profiles of TFBS density changes in human genome, while chr4 and chr13 appear to manifest most divergent patterns of TFBS density changes in chimpanzee genome (Figure 5). Notably, chromosomes that appear to manifest most divergent patterns of TFBS density changes within human (chr17; chr19; and chr22) or chimpanzee (chr4; chr13) genomes are the same chromosomes that have most similar profiles of TFBS density gains/losses in brain development regulatory regions of human and chimpanzee (Figure 4). However, most divergent chromosomes within chimpanzee genome, namely chr4 and chr13, manifest variation coefficients closely related to 18 other chromosomes within human genome (Figure 5). 
Conversely, chr17; chr19; and chr22 that are most divergent within human genome have coefficients of variation closely related to 18 other chromosomes of chimpanzee genome (Figure 5).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDistinct and common association patterns of TFBS density changes within BDRRs of human and chimpanzee revealed by genome-wide chromosome level alignment analyses with signatures of TFBS density changes of different families of human embryo regulatory LTRs.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEstablishment of genome-wide chromosome-level quantitative profiles of TFBS density changes within ATAC-seq-defined BDRRs of human and chimpanzee (Tables 2 \u0026ndash; 4; Figures 3 \u0026ndash; 5) prompted investigation of association patterns of TFBS density changes within BDRRs and signatures of TFBS density changes of 4 different families of human embryo regulatory LTRs (Figure 6). Visualization of the results of chromosome-level alignments of the corresponding profiles of TFBS density changes revealed clearly discernible common and distinct patterns of associations of TFBS density changes acquired during mammalian evolution within human and chimpanzee BDRRs and within DNA sequences encoded by different families of human embryo regulatory LTRs, namely MLT2A1 (Figure 6A); MLT2A2 (Figure 6C); LTR7 (Figure 6D); and LTR5_Hs (Figure 6B). Notably, the concordant and discordant association patterns of TFBS density changes were observed in interspecies comparisons of TFBS density changes as well as in analyses of within-species profiles of TFBS density changes of BDRRs aligned to signatures of TFBS density changes of different families of regulatory LTRs (Figure 6). 
Alignments of TFBS density changes profiles of BDRRs to the MLT2A1 and LTR5_Hs generated larger values of correlation coefficients compared to the MLT2A2 and LTR7 alignments, while highly concordant alignment patterns were observed for MLT2A1 and MLT2A2 analyses as well as for LTR5_Hs and LTR7 analyses. These trends were observed in interspecies and within-species comparisons. Comparisons of alignments within an individual species genome revealed striking negative correlations of the association patterns generated by the alignments of BDRRs TFBS density changes profiles to TFBS density changes profiles of the MLT2A1 versus LTR5_Hs regulatory LTRs (Figure 6), which were documented in analyses of either chimpanzee or human BDRRs. However, BDRRs residing only on 4 human chromosomes, namely chr19; chr22; chr17; and chr6, manifested positive correlation coefficients of TFBS density changes profiles alignments to the LTR5_Hs in contrast to BDRRs residing on 12 chimpanzee chromosomes. Conversely, BDRRs housed on 19 human chromosomes had positive correlation coefficients of TFBS density changes profiles alignments to the MLT2A1 loci in contrast to BDRRs housed on 10 chimpanzee chromosomes (Figure 6). These findings were corroborated by the results of the interspecies comparisons of the alignments of TFBS density changes profiles of human and chimpanzee BDRRs to the profiles of TFBS density changes of either MLT2A1 or LTR5_Hs (see Figure 9). Combining these alignments into one plot has resulted in the visual depiction of common and distinct patterns of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes (see Figure 9). 
Overall, these observations are in agreement with the hypothesis that divergence of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes occurs along distinct alignment patterns to profiles of TFBS density changes of different families of human embryo regulatory LTRs, namely MLT2A1 and LTR5_Hs. This model was further corroborated by results of the analyses aggregating the chromosome-level observations into a simplified genome-wide data set highlighting panels of TFs manifesting either common or divergent patterns of TFBS density changes acquired during primate evolution within BDRRs of human and chimpanzee (Figures 6; 9; Supplementary Table S5). GSEA of TF-coding genes manifesting divergent profiles of TFBS within BDRRs of human and chimpanzee underscore their exceedingly broad developmental and pathophysiological impacts on phenotypic traits of Modern Humans (Supplementary Table S5), including key constituents of central nervous system development and functions.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePotential impacts of currently active LINE and SVA transposons in shaping the continuing divergent genomic evolution of BDRRs of Modern Humans and Chimpanzee.\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eObservations reported in this contribution are congruent with recent findings implicating TE-encoded regulatory sequences derived from multiple TE families in development of human and chimpanzee hippocampal intermediate progenitor cells (Patoori et al., 2022). It was of interest to investigate whether the TF-constituents of reported herein sequence-specific double-stranded DNA binding networks are engaged in TE-governed hippocampal neurogenesis regulatory pathways, which were discovered by Patoori et al. (2022) utilizing a model of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells. 
Results of these analyses (Figure 7; Supplementary Table S6) documented ubiquitous presence within TE-constituents of regulatory pathways of hippocampal intermediate progenitors of qualitatively nearly identical arrays of TFBS for TFs constituting protein components of reported herein sequence-specific double-stranded DNA binding networks. These observations indicate that one of universal features of multiple families of TEs, including LTR/HERV, SINE/Alu, SVA, and LINE families, is their intrinsic propensity to harbor and spread genome-wide consensus regulatory nodes of identified herein highly conserved sequence-specific double-stranded DNA binding networks, selections of TFBS panels of which manifest individual chromosome-specific profiles and species-specific divergence patterns. Consistent with this hypothesis, it has been observed that TE subfamilies that became more divergent from consensus TFBS patterns due to mutational processes are less likely to be represented within differentially-accessible (DA) ATAC-seq-defined regulatory loci of hippocampal intermediate progenitors\u0026rsquo; development (Figure 7), while DA ATAC-regions intersecting larger numbers of SINE/Alu loci appear to harbor TFBS for more TF-constituents of highly conserved sequence-specific double-stranded DNA binding networks (Figure 7). Notably, different families of TE-constituents of regulatory pathways of hippocampal intermediate progenitors\u0026rsquo; development manifest different degrees of conservation of arrays of TFBS for a consensus panel of TF-constituents of highly conserved sequence-specific double-stranded DNA binding networks. SINE/Alu subfamily members seem to exhibit the highest diversity and least apparent conservation profiles reaching the maximum divergence of 63.4%. 
In contrast, LINE subfamily members manifest the maximum divergence of only 14.56%, while the maximum divergence observed for LTR subfamily members was 36.21% (Figure 7).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIntriguingly, TE families that are currently active as retrotransposons in the human genome, namely LINE and SVA transposable elements, harbor arrays of TFBS for essentially all TF-constituents of highly conserved sequence-specific double-stranded DNA binding networks (Figure 7; Supplementary Table S6). Therefore, it was of interest to investigate association patterns of TFBS density changes within human and chimpanzee BDRRs and signatures of TFBS density changes acquired during mammalian evolution by SVA and LINE retrotransposons and compare the results with similar analyses carried out for human embryo regulatory LTRs (Figure 6). Visualization of the results of chromosome-level alignments of the corresponding profiles of TFBS density changes revealed clearly discernible common and distinct patterns of associations of TFBS density changes acquired during mammalian evolution within human and chimpanzee BDRRs and within DNA sequences encoded by SVA and L1PA6 loci (Figures 8-9). The concordant and discordant patterns of association of TFBS density changes were observed in interspecies comparisons of TFBS density changes as well as in analyses of within-species profiles of TFBS density changes of BDRRs aligned to signatures of TFBS density changes of SVA and L1PA6 retrotransposons (Figure 8). Alignments of TFBS density changes profiles of BDRRs to the SVA and L1PA6 generated larger values of correlation coefficients compared to the MLT2A2 and LTR5_Hs alignments (Figures 6 and 8), while highly concordant alignment patterns were observed for MLT2A1 and L1PA6 analyses as well as for LTR5_Hs and SVA analyses. These trends were observed in interspecies and within-species comparisons. 
Comparisons of alignments within an individual species genome revealed striking negative correlations of the association patterns of the alignments of BDRRs TFBS density changes profiles to TFBS density changes profiles of the L1PA6 versus LTR5_Hs and the L1PA6 versus SVA, in contrast to highly positive correlations of TFBS density changes profiles of the L1PA6 versus MLT2A1 (Figure 8). These patterns of associations were consistently observed in analyses of either chimpanzee or human BDRRs.\u003c/p\u003e\n\u003cp\u003eBDRRs housed on 19 human chromosomes had positive correlation coefficients of TFBS density changes profiles alignments to the L1PA6 loci in contrast to BDRRs housed on 6 chimpanzee chromosomes, including chr4 and chr13 (Figure 9). In contrast, BDRRs residing only on 6 human chromosomes, including chr19; chr22; chr17; and chr6, manifested positive correlation coefficients of TFBS density changes profiles alignments to the SVA in contrast to BDRRs residing on 17 chimpanzee chromosomes. Results of these analyses are strikingly similar to the alignments of TFBS density changes to the TFBS density patterns of MLT2A1 and LTR5_Hs loci (Figures 6 and 9) and the interspecies comparisons of the alignments of TFBS density changes profiles of human and chimpanzee BDRRs to the profiles of TFBS density changes of MLT2A1 and LTR5_Hs (Figure 9). Combining these alignments into one plot has resulted in the visual depiction of common and distinct patterns of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes (Figure 9).\u003c/p\u003e\n\u003cp\u003eThese findings were corroborated by the results of the interspecies comparisons of the alignments of TFBS density changes profiles of human and chimpanzee BDRRs to the profiles of TFBS density changes of either L1PA6 or SVA (Figure 9). 
Combining these two alignments into a single plot has resulted in the visual depiction of common and distinct patterns of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes (Figure 9). These observations are in agreement with the hypothesis that divergence of TFBS density changes within BDRRs residing on different chromosomes of human and chimpanzee genomes occurs along distinct alignment patterns to profiles of TFBS density changes of retrotransposons that are currently active in human genome, namely SVA and LINE families.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this contribution, the emergence during mammalian evolution of genomic regulatory networks (GRNs) encompassing ubiquitous, qualitatively nearly identical and quantitatively markedly distinct arrays of sequences of TFBS for 716 proteins is reported. A vast majority of TFs (770 of 716; 98%) comprising protein constituents of these networks appear to share common Gene Ontology (GO) features of sequence-specific double-stranded DNA binding (GO: 1990837). Among most significantly enriched categories, GSEA employing GO Molecular Function 2023 database identified 556 genes assigned to Cis-Regulatory Region Sequence-Specific DNA Binding (GO: 0000987) category; 581 genes of RNA Polymerase II Cis-Regulatory Region Sequence-Specific DNA Binding (GO: 0000978) category; and 609 genes of RNA Polymerase II Transcription Regulatory Region Sequence-Specific DNA Binding (GO:0000977) category. Assignments of essentially all TFs constituents of these networks to GO functional categories of sequence-specific double-stranded DNA binding, cis-regulatory region sequence-specific DNA binding, RNA Polymerase II cis-regulatory and transcription regulatory region sequence-specific DNA binding strongly imply their structural-functional engagements into assembly and activities of heterochromatin and euchromatin multiprotein-DNA complexes. 
To date, ubiquitous, qualitatively nearly identical and quantitatively markedly distinct representations of sequence-specific TFBS arrays of these networks have been observed within genomic regulatory loci encoded by all analyzed TE families, including TE families coopted into GRNs contributing to development and functions of central nervous system. TE families, including LTR/HERV, SINE/Alu, SVA, and LINE subfamilies, appear to harbor and spread genome-wide consensus regulatory nodes of identified herein highly conserved GRNs, selections within which of TFBS panels manifest individual chromosome-specific profiles and species-specific divergence patterns. Markedly distinct quantitative characteristics of these networks, in particular, changes of TFBS densities, have been inferred from genome-wide chromosome-level analyses of BDRRs of Modern Humans and Chimpanzee, suggesting that species-specific differences of the activities of these networks may have contributed to continuing divergent genomic evolutions of brain development of humans and non-human primates. Reported in this contribution results of chromosome-level analyses of quantitative metrics of GRNs emanating from sequence-specific double-stranded DNA binding of ~700 proteins may achieve a marked functional diversity by operating in chromosome territory-specific patterns. Observed conservation of these GRNs beyond the boundaries of confidently mapped TE-derived regulatory loci suggest that considerations of contributions of TEs to creation of mammalian genomic DNA could be extended to more than currently estimated ~50% of genomes.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cem\u003eData source and analytical protocols\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eSolely publicly available datasets and resources were used in this contribution. 
Initial analyses were focused on human embryo regulatory LTR loci (see Introduction) that were identified as highly-conserved pan-primate regulatory sequences because they have been present in genomes of primate species for at least ~15 MYA. Four distinct LTR families meeting these criteria, namely MLT2A1 (2416 loci), MLT2A2 (3069 loci), LTR7 (3354 loci), and LTR5_Hs (606 loci), were analyzed. A total of 9445 fixed non-polymorphic sequences of human embryo regulatory LTR elements residing in genomes of Modern Humans (hg38 human reference genome database) were retrieved as described in recent studies (Hashimoto et al., 2021; Carter et al., 2022; Glinsky, 2022; 2024) and the numbers of highly conserved orthologous loci in genomes of sixteen non-human primates (NHP) were determined exactly as previously reported (Glinsky, 2022; 2024). Briefly, fixed non-polymorphic regulatory LTR loci residing in the human genome (hg38 human reference genome database) were considered highly conserved in the genome of NHP only if the following two requirements were met:\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e(1) During the direct LiftOver test (https://genome.ucsc.edu/cgi-bin/hgLiftOver ), the human LTR sequence has been mapped in the NHP genome to the single orthologous locus with a threshold of at least 95% sequence identity;\u003c/p\u003e\n\u003cp\u003e(2) During the reciprocal LiftOver test, the NHP sequence identified in the direct LiftOver test has been remapped with at least 95% sequence identity threshold to exactly the same human orthologous sequence which was queried during the direct LiftOver test.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eA set of 3583 experimentally defined regulatory loci of the mouse ESC which are documented to harbor TFBS for multiple TFs (Chen et al., 2008) was utilized as a reference to estimate changes of TF-binding patterns and the TFBS densities during mammalian evolution. 
To identify TFBS within genomic regulatory loci known to contribute to regulation of human and chimpanzee brain development, a total of 17,935 genomic regulatory regions reported in the organoid single-cell genomic atlas of human and chimpanzee brain development (Kanton et al, 2019) were analyzed. A catalogue of TFs having putative TFBS within\u0026nbsp;8099 human-specific and 9836 chimpanzee-specific brain development regulatory regions (BDRRs) was compiled. BDRRs were identified employing DNA sequence unbiased open chromatin accessibility screening method termed the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq; Buenrostro et al, 2013; 2015). A set of 751 TE loci of the SVA families coopted as functional cis-regulatory elements in human induced pluripotent stem cells (Barnada et al., 2022) and a set of 3,265 TE loci engaged in TE-governed hippocampal neurogenesis regulatory pathways of human and chimpanzee (Patoori et al., 2022) were analyzed. TE-governed hippocampal neurogenesis regulatory pathways were discovered by Patoori et al. (2022) utilizing a model of differentiation of human and chimpanzee induced pluripotent stem cells into TBR2 (or EOMES)-positive hippocampal intermediate progenitor cells.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIdentification of transcription factor binding sites (TFBS) within candidate genomic regulatory loci was performed employing the Jaspar algorithm (Jaspar Transcription Factors) accessible through the UCSC Genome Browser Table Browser functions ( https://genome.ucsc.edu/cgi-bin/hgTables ) facilitating downloading, filtering, analyzing, and retrieving data from the Genome Browser. TFBS identification and data retrieval were carried out using the default threshold settings of imputing up to 1,000 loci per screen to achieve the full coverage of the specified genomic regions of interest based on coordinates of hg38 and hg19 human reference genome databases. 
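\u003c/p\u003e\n\u003cp\u003eThe two LiftOver requirements listed above amount to a reciprocal-mapping check, which can be sketched as follows. This is an illustrative sketch and not the pipeline used in this contribution: it does not call the LiftOver tool, the Hit record and the two lookup tables are hypothetical stand-ins for parsed direct and reciprocal LiftOver results, and requiring a single reciprocal hit is an added assumption.\u003c/p\u003e

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one mapped locus from a LiftOver run."""
    locus: str        # e.g. "chr1:1000-1500" in the target genome
    identity: float   # fraction of sequence identity, 0.0 to 1.0


def is_highly_conserved(
    human_locus: str,
    direct_hits: Dict[str, List[Hit]],      # human locus -> hits in the NHP genome
    reciprocal_hits: Dict[str, List[Hit]],  # NHP locus -> hits in the human genome
    min_identity: float = 0.95,
) -> bool:
    """Apply requirement (1), then requirement (2), to one human LTR locus."""
    forward = direct_hits.get(human_locus, [])
    # Requirement (1): a single orthologous NHP locus at >= 95% identity.
    if len(forward) != 1 or forward[0].identity < min_identity:
        return False
    back = reciprocal_hits.get(forward[0].locus, [])
    # Requirement (2): the NHP locus maps back to exactly the queried human locus.
    # Assumption: exactly one reciprocal hit is expected.
    return (
        len(back) == 1
        and back[0].identity >= min_identity
        and back[0].locus == human_locus
    )
```

\u003cp\u003e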
\u0026nbsp;All identified TFBS were retrieved and all individual TFs having TFBS were catalogued. For each set of TFBS, quantitative features were documented by calculating the numbers of events recorded for each distinct TFBS and reporting the computed values as numbers of TFBS per regulatory locus (defined as TFBS frequency) and the estimated TFBS density calculated as TFBS frequency normalized to 1 Kb of the locus length (defined as TFBS density).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe significance of the differences in the expected and observed numbers of events was calculated using two-tailed Fisher\u0026rsquo;s exact test. Multiple proximity placement enrichment tests were performed for individual families and sub-sets of LTRs, BDRRs, and human-specific regulatory regions (HSRS) taking into account the size in bp of corresponding genomic regions, size distributions in human cells of topologically associating domains, distances to putative regulatory targets, bona fide regulatory targets identified in targeted genetic interference and/or epigenetic silencing experiments. Additional details of methodological and analytical approaches are provided in the text, Supplementary Materials and previously reported contributions [Barakat et al. 2018; Fuentes et al. 2018; Glinsky 2015; 2016a, b; 2018; 2019; 2020a, b, c, 2021; 2022; Guffanti et al. 2018; Glinsky and Barakat, 2019; McLean et al. 2010; 2011; Pontis et al. 2019; Wang et al. 2014].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eGene set enrichment and genome-wide proximity placement analyses\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eGene set enrichment analyses were carried-out using the Enrichr bioinformatics platform, which enables the interrogation of nearly 200,000 gene sets from more than 100 gene set libraries. The Enrichr API (January 2018 through January 2023 releases) [Chen et al. 2013; Kuleshov et al. 2016; Xie et al. 
2021] was used to test genes linked to regulatory LTR elements, HSRS, or other regulatory loci of interest for significant enrichment in numerous functional categories. When technically feasible, larger sets of genes comprising several thousand entries were analyzed. Regulatory connectivity maps between HSRS, regulatory LTRs and coding genes and additional functional enrichment analyses were performed with the Genomic Regions Enrichment of Annotations Tool (GREAT) algorithm [McLean et al. 2010; 2011] at default settings. The reproducibility of the results was validated by implementing two releases of the GREAT algorithm: GREAT version 3.0.0 (02/15/2015 to 08/18/2019) and GREAT version 4.0.4 (08/19/2019) applying default settings at differing maximum extension thresholds as previously reported (Glinsky 2020a, b, c; 2021; 2022; 2024). The GREAT algorithm allows investigators to identify and annotate the genome-wide connectivity networks of user-defined distal regulatory loci and their putative target genes. Concurrently, the GREAT algorithm performs functional Gene Ontology (GO) annotations and analyses of statistical enrichment of GO annotations of identified genomic regulatory elements (GREs) and target genes, thus enabling the inference of potential biological impacts of interrogated genomic regulatory networks. The Genomic Regions Enrichment of Annotations Tool (GREAT) algorithm was employed to identify putative down-stream target genes of human embryo regulatory LTRs. Concurrently with the identification of putative regulatory target genes of GREs, the GREAT algorithm performs stringent statistical enrichment analyses of functional annotations of identified down-stream target genes, thus enabling the inference of potential significance of phenotypic impacts of interrogated GRNs. 
Importantly, the assignment of phenotypic traits as putative statistically valid components of GRN actions entails the assessments of statistical significance of the enrichment of both GREs and down-stream target genes by applying independent statistical tests.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe validity of statistical definitions of genomic regulatory networks (GRNs) and genomic regulatory modules (GRMs) based on the binomial (regulatory elements) and hypergeometric (target genes) FDR Q values was evaluated using a directed acyclic graph (DAG) test based on the enriched terms from a single ontology-specific table generated by the GREAT algorithm (Glinsky, 2024). The DAG test draws patterns and directions of connections between significantly enriched GO modules based on the experimentally-documented temporal logic of developmental processes and structural/functional relationships between gene ontology enrichment analysis-defined statistically significant terms. A specific DAG test utilizes only a sub-set of statistically significant GRMs from a single gene ontology-specific table generated by the GREAT algorithm by extracting GRMs manifesting connectivity patterns defined by experimentally documented developmental and/or structure/function/activity relationships. These GRMs are deemed valid observations and visualized as a consensus hierarchy network of the ontology-specific DAGs (Glinsky, 2024). 
Based on these considerations, the DAG algorithm draws the developmental and structure/function/activity relationships-guided hierarchy of connectivity between statistically significant gene ontology enriched GRMs.\u003c/p\u003e\n\u003cp\u003eGenome-wide Proximity Placement Analysis (GPPA) of down-stream target genes and distinct genomic features co-localizing with regulatory LTRs, HSRS, BDRRs and other regulatory loci was carried out as described previously and originally implemented for human-specific transcription factor binding sites [Glinsky et al. 
2018; Glinsky, 2015, 2016a, 2016b, 2017, 2018, 2019, 2020a, 2020b, 2020c, 2021; 2022; 2024; Guffanti et al. 2018].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eDifferential GSEA to infer the relative contributions of distinct subsets of regulatory LTR elements and down-stream target genes to phenotypes of interest.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWhen technically and analytically feasible, different sets of regulatory LTRs and candidate down-stream target genes defined at several significance levels of statistical metrics and comprising from dozens to several thousand individual genetic loci were analyzed using differential GSEA. This approach was utilized to gain insights into biological effects of regulatory LTRs and down-stream target genes and to infer potential mechanisms of phenotype-affecting activities. Previously, this approach was successfully implemented for identification and characterization of human-specific regulatory networks governed by human-specific transcription factor-binding sites [Glinsky et al. 2018; Glinsky, 2015, 2016a, 2016b, 2017, 2018, 2019, 2020a, 2020b, 2020c, 2021; 2022; 2024; Guffanti et al. 2018] and functional enhancer elements [Barakat et al. 2018; Glinsky et al. 2018; Glinsky and Barakat 2019; Glinsky 2015, 2016a, 2016b, 2017, 2018, 2019, 2020a, 2020b, 2020c, 2021; 2022; 2024]. 
The differential GSEA approach has been utilized for characterization of phenotypic impacts of 13,824 genes associated with 59,732 human-specific regulatory sequences [Glinsky, 2020a], 8,405 genes associated with 35,074 human-specific neuroregulatory single-nucleotide changes [Glinsky, 2020b], 8,384 genes regulated by stem cell-associated retroviral sequences (SCARS) [Glinsky 2021], as well as human genes and medicinal molecules affecting the susceptibility to SARS-CoV-2 coronavirus [Glinsky, 2020c].\u003c/p\u003e\n\u003cp\u003eInitial GSEA entailed interrogations of each specific set of candidate down-stream target genes using ~70 distinct genomic databases, including comprehensive pathway enrichment Gene Ontology (GO) analyses. Upon completion, these analyses were followed by in-depth interrogations of the identified significantly-enriched gene sets employing selected genomic databases deemed most statistically informative in the initial GSEA. In all reported tables and plots (unless stated otherwise), in addition to the nominal p values and adjusted p values, the Enrichr software calculates the \u0026ldquo;combined score\u0026rdquo;, which is a product of the significance estimate and the magnitude of enrichment (combined score c = log(p) * z, where p is the Fisher\u0026rsquo;s exact test p-value and z is the z-score deviation from the expected rank).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eStatistical Analyses of the Publicly Available Datasets\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eAll statistical analyses of the publicly available genomic datasets, including error rate estimates, background and technical noise measurements and filtering, feature peak calling, feature selection, assignments of genomic coordinates to the corresponding builds of the reference human genome, and data visualization, were performed exactly as reported in the original publications and associated references linked to the corresponding data visualization tracks (http://genome.ucsc.edu/ 
). Additional elements or modifications of statistical analyses are described in the corresponding sections of the Results. Statistical significance of the Pearson correlation coefficients was determined using GraphPad Prism version 6.00 software. Both nominal and Bonferroni adjusted p values were estimated and considered as reported in corresponding sections of the Results. The significance of the differences in the numbers of events between the groups was calculated using two-sided Fisher\u0026rsquo;s exact and Chi-square tests, and the significance of the overlap between the events was determined using the hypergeometric distribution test [Tavazoie et al. 1999].\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eSupplementary Information is available online.\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSupplementary information includes Supplementary Tables S1-S6; Supplementary Figure S1; and Supplementary Summaries S1-S4.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements.\u0026nbsp;\u003c/strong\u003eThis work was made possible by the open public access policies of major grant funding agencies and international genomic databases and the willingness of many investigators worldwide to share their primary research. The author would like to thank Victoria Glinskii for invaluable expert assistance with graphical presentation of the results of this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis is a single author contribution. 
All elements of this work, including the conception of ideas, formulation, and development of concepts, execution of experiments, analysis of data, and writing of the paper, were performed by the author.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn part, this work was supported by OncoScar, LLC.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflict of interest statement\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo conflicts of interest to declare.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll data supporting the reported observations and required to reproduce the findings are provided in the main body of the paper and Supplementary materials.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eBarnada SM, Isopi A, Tejada-Martinez D, Goubert C, Patoori S, Pagliaroli L, et al. (2022). Genomic features underlie the co-option of SVA transposons as cis-regulatory elements in human pluripotent stem cells. PLoS Genet 18(6): e1010225. https://doi.org/10.1371/journal.pgen.1010225\u003c/li\u003e\n\u003cli\u003eBuenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. \"Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position\". Nature Methods. 2013; 10 (12): 1213\u0026ndash;8. doi:10.1038/nmeth.2688. PMC 3959825. PMID 24097267.\u003c/li\u003e\n\u003cli\u003eBuenrostro JD, Wu B, Chang HY, Greenleaf WJ. \"ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide\". Current Protocols in Molecular Biology. 
2015; 109: 21.29.1\u0026ndash;21.29.9. doi:10.1002/0471142727.mb2129s109. PMC 4374986. PMID 25559105.\u003c/li\u003e\n\u003cli\u003eCarter T, Singh M, Dumbovic G, Chobirko JD, Rinn JL, Feschotte C. 2022. Mosaic cis-regulatory evolution drives transcriptional partitioning of HERVH endogenous retrovirus in the human embryo. Elife. 11: e76257. doi: 10.7554/eLife.76257.\u003c/li\u003e\n\u003cli\u003eChen, EY, et al., Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics, 2013. 14: 128.\u003c/li\u003e\n\u003cli\u003eChuong EB, Rumi MAK, Soares MJ, Baker JC. 2013. Endogenous retroviruses function as species-specific enhancer elements in the placenta. Nat Genet. 45: 325\u0026ndash;9. https://doi.org/10.1038/ng.2553.\u003c/li\u003e\n\u003cli\u003eChuong EB, Elde NC, Feschotte C. 2017. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet. 18: 71\u0026ndash;86. https://doi.org/10.1038/nrg.2016.139 .\u003c/li\u003e\n\u003cli\u003eElbarbary RA, Lucas BA, Maquat LE. Retrotransposons as regulators of gene expression. Science. 2016; 351:aac7247. https://doi.org/10.1126/science.aac7247 .\u003c/li\u003e\n\u003cli\u003eFort A, Hashimoto K, Yamada D, Salimullah M, Keya CA, Saxena A, et al. 2014. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat Genet. 46: 558\u0026ndash;66. https://doi.org/10.1038/ng.2965.\u003c/li\u003e\n\u003cli\u003eFuentes DR, Swigut T, Wysocka J. 2018. Systematic perturbation of regulatory LTRs reveals widespread long-range effects on human gene regulation. Elife. 7:e35989. https://doi.org/10.7554/eLife.35989\u003c/li\u003e\n\u003cli\u003eGlinsky GV. 2015. Transposable Elements and DNA Methylation Create in Embryonic Stem Cells Human- Specific Regulatory Sequences Associated with Distal Enhancers and Noncoding RNAs. 
\u003cem\u003eGenome Biol Evol \u003c/em\u003e\u003cstrong\u003e7\u003c/strong\u003e:1432\u0026ndash;1454. https://doi.org/10.1093/gbe/evv081\u003c/li\u003e\n\u003cli\u003eGlinsky, G.V. 2016a. Mechanistically Distinct Pathways of Divergent Regulatory DNA Creation Contribute to Evolution of Human-Specific Genomic Regulatory Networks Driving Phenotypic Divergence of Homo sapiens. \u003cem\u003eGenome Biol Evol\u003c/em\u003e 8: 2774-2788.\u003c/li\u003e\n\u003cli\u003eGlinsky GV. 2016b. Single cell genomics reveals activation signatures of endogenous SCARS networks in aneuploid human embryos and clinically intractable malignant tumors. Cancer Lett. 381: 176-193.\u003c/li\u003e\n\u003cli\u003eGlinsky G.V. 2017. Human-specific features of pluripotency regulatory networks link NANOG with fetal and adult brain development. BioRxiv. https://www.biorxiv.org/content/10.1101/022913v3 doi: https://doi.org/10.1101/022913\u003c/li\u003e\n\u003cli\u003eGlinsky G, Durruthy-Durruthy J, Wossidlo M, Grow EJ, Weirather JL, Au KF, Wysocka J, Sebastiano V. Single cell expression analysis of primate-specific retroviruses-derived HPAT lincRNAs in viable human blastocysts identifies embryonic cells co-expressing genetic markers of multiple lineages. Heliyon. 2018. 4: e00667. https://doi.org/10.1016/j.heliyon.2018.e00667 . PMID: 30003161; PMCID: PMC6039856.\u003c/li\u003e\n\u003cli\u003eGlinsky GV. 2020a. A catalogue of 59,732 human-specific regulatory sequences reveals unique to human regulatory patterns associated with virus-interacting proteins, pluripotency and brain development. DNA and Cell Biology 39: 126-143. https://doi.org/10.1089/dna.2019.4988\u0026nbsp;\u003c/li\u003e\n\u003cli\u003eGlinsky GV. 2020b. Impacts of genomic networks governed by human-specific regulatory sequences and genetic loci harboring fixed human-specific neuro-regulatory single nucleotide mutations on phenotypic traits of Modern Humans. Chromosome Res. 28: 331-354. 
https://doi.org/10.1007/s10577-020-09639-w\u0026nbsp;\u003c/li\u003e\n\u003cli\u003eGlinsky GV. 2021. Genomics-Guided Drawing of Molecular and Pathophysiological Components of Malignant Regulatory Signatures Reveals a Pivotal Role in Human Diseases of Stem Cell-Associated Retroviral Sequences and Functionally-Active hESC Enhancers. Frontiers in Oncology. 11: 974. https://doi.org/10.3389/fonc.2021.638363\u003c/li\u003e\n\u003cli\u003eGlinsky GV. 2022. Molecular diversity and phenotypic pleiotropy of ancient genomic regulatory loci derived from human endogenous retrovirus type H (HERVH) promoter LTR7 and HERVK promoter LTR5_Hs and their contemporary impacts on pathophysiology of Modern Humans. Mol Genet Genomics. 297: 1711-1740.\u003c/li\u003e\n\u003cli\u003eGlinsky GV. 2024. Gene ontology-guided proximity placement analyses of pan-primate regulatory LTR elements contributing to embryogenesis, development of physiological traits and pathological phenotypes of Modern Humans. Under review.\u003c/li\u003e\n\u003cli\u003eG\u0026ouml;ke J, Lu X, Chan Y-S, Ng H-H, Ly L-H, Sachs F, Szczerbinska I. 2015. Dynamic Transcription of Distinct Classes of Endogenous Retroviral Elements Marks Specific Populations of Early Human Embryonic Cells. \u003cem\u003eCell Stem Cell \u003c/em\u003e\u003cstrong\u003e16\u003c/strong\u003e:135\u0026ndash;141. doi:10.1016/j.stem.2015.01.005\u003c/li\u003e\n\u003cli\u003eGoubert C, Zevallos NA, Feschotte C. 2020. Contribution of unfixed transposable element insertions to human regulatory variation. Philos Trans R Soc B Biol Sci. 375: 20190331. https://doi.org/10.1098/rstb.2019.0331.\u003c/li\u003e\n\u003cli\u003eGrow EJ, Flynn RA, Chavez SL, Bayless NL, Wossidlo M, Wesche DJ, Martin L, Ware CB, Blish CA, Chang HY, Pera RA, Wysocka J. 2015. Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. Nature 522:221\u0026ndash;225. 
https://doi.org/10.1038/nature14308 \u0026nbsp;\u003c/li\u003e\n\u003cli\u003eGuffanti G, Bartlett A, Klengel T, Klengel C, Hunter R, Glinsky G, Macciardi F. 2018. Novel bioinformatics approach identifies transcriptional profiles of lineage-specific transposable elements at distinct loci in the human dorsolateral prefrontal cortex. Mol Biol Evol. 35: 2435-2453.\u003c/li\u003e\n\u003cli\u003eJacobs FMJ, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, et al. 2014. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature. 516: 242\u0026ndash;5. https://doi.org/10.1038/nature13760.\u003c/li\u003e\n\u003cli\u003eJacques P-\u0026Eacute;, Jeyakani J, Bourque G. 2013. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 9: e1003504. https://doi.org/10.1371/journal.pgen.1003504.\u003c/li\u003e\n\u003cli\u003eKanton S, Boyle MJ, He Z, Santel M,Weigert A, Sanch\u0026iacute;s-Calleja F, Guijarro P, Sidow L, Fleck JS, Han D, Qian Z, Heide M, Huttner WB, Khaitovich P, P\u0026auml;\u0026auml;bo S, Treutlein B, Camp JG. 2019. Organoid single-cell genomic atlas uncovers human-specific features of brain development. Nature. 574: 418\u0026ndash;422.\u0026nbsp;https://doi.org/10.1038/s41586-019-1654-9\u003c/li\u003e\n\u003cli\u003eKuleshov MV, et al., Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res, 2016. 44(W1): W90-7.\u003c/li\u003e\n\u003cli\u003eKunarso G, Chia N-Y, Jeyakani J, Hwang C, Lu X, Chan Y-S, Ng H-H, Bourque G. 2010. Transposable elements have rewired the core regulatory network of human embryonic stem cells. \u003cem\u003eNature Genetics\u003c/em\u003e \u003cstrong\u003e42\u003c/strong\u003e:631\u0026ndash;634. doi:10.1038/ng.600\u003c/li\u003e\n\u003cli\u003eLu X, Sachs F, Ramsay L, Jacques P\u0026Eacute;, G\u0026ouml;ke J, Bourque G, et al. 2014. 
The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat Struct Mol Biol. 21: 423\u0026ndash;5. https://doi.org/10.1038/nsmb.2799\u003c/li\u003e\n\u003cli\u003eMcLean, CY, Bristor, D, Hiller, M, Clarke, SL, Schaar, BT, Lowe, CB, Wenger, AM. Bejerano, G. 2010. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28: 495-501.\u003c/li\u003e\n\u003cli\u003eMcLean CY, Reno PL, Pollen AA, Bassan AI, Capellini TD, Guenther C, Indjeian VB, Lim X, Menke DB, Schaar BT, Wenger AM, Bejerano G, Kingsley DM. 2011. Human-specific loss of regulatory DNA and the evolution of human-specific traits. Nature 471: 216-9.\u003c/li\u003e\n\u003cli\u003ePatoori S, Barnada SM, Large C, Murray JI, Trizzino M. 2022. Young transposable elements rewired gene regulatory networks in human and chimpanzee hippocampal intermediate progenitors. Development. 149: dev200413. doi: 10.1242/dev.200413.\u003c/li\u003e\n\u003cli\u003ePontis J, Planet E, Offner S, Turelli P, Duc J, Coudray A, et al. 2019. Hominoid-Specific Transposable Elements and KZFPs Facilitate Human Embryonic Genome Activation and Control Transcription in Naive Human ESCs. Cell Stem Cell. 24:724\u0026ndash;735.e5. https://doi.org/10.1016/j.stem.2019.03.012 .\u003c/li\u003e\n\u003cli\u003eRayan NA, del Rosario RCH, Prabhakar S. 2016. Massive contribution of transposable elements to mammalian regulatory sequences. Semin Cell Dev Biol. 57: 51\u0026ndash;6. https://doi.org/10.1016/j.semcdb.2016.05.004.\u003c/li\u003e\n\u003cli\u003eSasaki T, Nishihara H, Hirakawa M, Fujimura K, Tanaka M, Kokubo N, et al. 2008. Possible involvement of SINEs in mammalian-specific brain formation. Proc Natl Acad Sci U S A. 105: 4220\u0026ndash;5. https://doi.org/10.1073/pnas.0709398105 .\u003c/li\u003e\n\u003cli\u003eSchmid CD, Bucher P. 2010 MER41 repeat sequences contain inducible STAT1 binding sites. 
PLoS ONE 5, (doi:10.1371/journal.pone.0011425)\u003c/li\u003e\n\u003cli\u003eSchmidt D et al. 2012 Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148, 335\u0026ndash;348. (doi:10.1016/j.cell.2011.11.058)\u003c/li\u003e\n\u003cli\u003eSundaram V, Cheng Y, Ma Z, Li D, Xing X, Edge P, et al. 2014. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 24: 1963\u0026ndash;76. https://doi.org/10.1101/gr.168872.113 .\u003c/li\u003e\n\u003cli\u003eSundaram V, Wysocka J. 2020 Transposable elements as a potent source of diverse cis-regulatory sequences in mammalian genomes. Phil. Trans. R. Soc. B 375: 20190347. http://dx.doi.org/10.1098/rstb.2019.0347\u003c/li\u003e\n\u003cli\u003eXie Z, Bailey A, Kuleshov MV, Clarke DJB, Evangelista JE, Jenkins SL, Lachmann A, Wojciechowicz ML, Kropiwnicki E, Jagodnik KM, Jeon M, \u0026amp; Ma\u0026rsquo;ayan A. Gene set knowledge discovery with Enrichr. Current Protocols, 1, e90. 2021. doi: 10.1002/cpz1.90\u003c/li\u003e\n\u003cli\u003eWang T, Zeng J, Lowe CB, Sellers RG, Salama SR, Yang M, Burgess SM, Brachmann RK, Haussler D. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl Acad. Sci. USA 104, 18613\u0026ndash;18618. (doi:10.1073/pnas.0703637104)\u003c/li\u003e\n\u003cli\u003eWang J, Xie G, Singh M, Ghanbarian AT, Rask\u0026oacute; T, Szvetnik A, Cai H, Besser D, Prigione A, Fuchs NV, Schumann GG, Chen W, Lorincz MC, Ivics Z, Hurst LD, Izsv\u0026aacute;k Z. 2014. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. \u003cem\u003eNature \u003c/em\u003e\u003cstrong\u003e516\u003c/strong\u003e:405\u0026ndash;409. https://doi.org/10.1038/nature13804\u003c/li\u003e\n\u003cli\u003eWang L, Rishishwar L, Mari\u0026ntilde;o-Ram\u0026iacute;rez L, Jordan IK. 2016. 
Human population specific\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 to 5 are available in the Supplementary Files section.\u003c/p\u003e\n"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"OncoScar, LLC","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"transcription factor binding sites (TFBS), transposable elements (TE), human endogenous retrovirus type H (HERVH), human endogenous retrovirus type L (HERVL), human endogenous retrovirus type K (HERVK), LTR7, MLT2A1, MLT2A2, LTR5_Hs/HERVK, LINE, SINE/Alu, retrotransposition, primate evolution, mammalian offspring survival genes, human embryogenesis, brain development regulatory regions, human endogenous complexomes, viral-host protein-protein interactions, neoplasm metastasis, neurodevelopmental disorders, neurodegenerative diseases, human-specific phenotypic traits, human-specific regulatory sequences.","lastPublishedDoi":"10.21203/rs.3.rs-5442388/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5442388/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eEmergence during mammalian evolution of common and divergent traits of genomic 
regulatory networks (GRNs) encompassing ubiquitous, compositionally nearly identical yet quantitatively distinct panels of DNA sequences of transcription factor binding sites (TFBS) for 716 proteins is reported. The evolutionary-conserved foundation of these GRNs appears assembled from arrays of DNA codes for ~770 TFBS, including 65 instances of immediately adjacent TFBS for two distinct TFs. A majority of protein constituents of these GRNs (700 of 716; 98%) is defined by Gene Ontology (GO) features of sequence-specific double-stranded DNA binding (GO: 1990837). Genome-wide and individual chromosome-level analyses of 17,935 ATAC-seq-defined brain development regulatory regions (BDRRs) revealed nearly universal representations of TFBS for TF-constituents of these networks, TFBS densities of which appear consistently higher within thousands of BDRRs of Modern Humans compared to Chimpanzee. Transposable elements (TE), including LTR/HERV, SINE/Alu, SVA, and LINE families, appear to harbor consensus regulatory nodes of the highly conserved sequence-specific double-stranded DNA binding networks identified herein. Notably, selections of quantitative features of TFBS panels of these GRNs manifest individual chromosome-specific profiles and species-specific divergence patterns. Collectively, this contribution highlights a previously unrecognized essential function of genomic DNA sequences derived from multiple TE families in providing genome-wide regulatory seed templates of sequence-specific double-stranded DNA binding GRNs. 
Since DNA sequences of TFBS panels for 716 proteins are encoded by transposons that remain active in genomes of present day humans, namely SVA and LINE families, retrotransposition-mediated spread of seeds for these GRNs may contribute to continuing divergent genomic evolution of human and chimpanzee brain development.\u003c/p\u003e","manuscriptTitle":"Highly conserved sequence-specific DNA binding networks of 716 transcription factors associated with continuing divergent genomic evolution of human and chimpanzee brain development.","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-11-15 04:34:06","doi":"10.21203/rs.3.rs-5442388/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"4fee1f2e-df8a-4be9-a052-f427a35199a4","owner":[],"postedDate":"November 15th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":40177556,"name":"Epigenetics \u0026 Genomics"}],"tags":[],"updatedAt":"2024-11-15T04:34:06+00:00","versionOfRecord":[],"versionCreatedAt":"2024-11-15 04:34:06","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5442388","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5442388","identity":"rs-5442388","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}