Cis element length variability does not confer differential transcription factor occupancy at the D. melanogaster histone locus

preprint OA: closed CC-BY-NC-4.0

Abstract

Histone genes require precise regulation to maintain histone homeostasis and ensure nucleosome stoichiometry. Animal histone genes often have unique clustered genomic organization. However, histone gene number, organization, and regulation vary across species. The Drosophila melanogaster histone locus is unusual in that it consists of ~100 highly regular, tandemly repeated arrays of the five replication-dependent histone genes at a single locus. Yet D. melanogaster are viable with only 12 transgenic histone gene arrays. We hypothesized that the histone genes across the locus are differentially regulated. We discovered that the GA-repeat within the H3/H4 promoter is the only variable sequence across the histone gene arrays. The H3/H4 promoter GA-repeat is targeted by CLAMP to promote histone gene expression. We also show that two additional GA-binding transcription factors, GAGA Factor (GAF) and Pipsqueak (Psq), target the GA-repeat. When we further examined CLAMP and GAF targeting, we determined that neither CLAMP nor GAF shows a bias for any GA-repeat length. Furthermore, we found that the distribution of GA-repeats targeted by both CLAMP and GAF does not change throughout early development. Together, our results suggest that the transcription factors targeting the H3/H4 GA-repeat do not drive differential regulation of the histone genes, and indicate that future studies should interrogate additional cis elements or factors that impact histone gene regulation.

bioRxiv preprint doi: https://doi.org/10.1101/2024.06.24.600460; this version posted June 28, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Introduction

Histone genes need to be strictly regulated so there are neither too few nor too many histones at any given time in the cell. The canonical histone genes, H3, H4, H2A, H2B, and H1, are replication dependent; their regulation is strictly coupled to the cell cycle. Histone genes are expressed during S phase to package newly replicated DNA, and their expression halts by the end of G2. In part due to the requirement for strict coordinated cell cycle regulation, animal histone genes often have unique clustered genomic organization; however, histone gene organization and regulation vary across species. Histone genes were originally cloned and sequenced in the late 1970s from the purple and green sea urchins, S. purpuratus and P. miliaris, respectively, which revealed that sea urchin genomes have two sets of histone genes. The first set consists of a tandem repeat of the five canonical histone genes, termed the “early histone genes”, and the second set consists of 39 genes that are separated from the early genes, termed the “late histone genes” (Marzluff et al. 2006). These two histone gene sets are differentially regulated based on cell type and timing. The early histone genes are only expressed in the egg through the blastula stage, whereas the late histone genes are expressed during late embryogenesis and continue expression through adulthood in all somatic cells (Marzluff et al. 2006). The human genome also carries two clusters of histone genes, a major cluster on chromosome 6 and a minor cluster on chromosome 1. All H1 genes are located in the major cluster, which is spread across several megabases and contains ~60 histone genes in smaller subclusters, while the minor cluster contains only around 10-12 histone genes (Seal et al. 2022; Ghule et al. 2023).
Recent Hi-C data show distinct promoter-promoter interactions between the subclusters of the major histone locus on chromosome 6, suggesting that regulatory mechanisms could differ between the major and minor loci (Carty et al. 2017; Ghule et al. 2023). Transcription factors that regulate histone genes are shared between these loci; however, the loci are differentially regulated to ensure correct stoichiometries of H3, H4, H2A, H2B and H1. The major histone locus also associates with the Cajal body throughout the cell cycle, whereas the minor locus only associates with it during S phase (Ma et al. 2000; Shopland et al. 2001). From work in human embryonic stem cells, the H4 genes may have distinct regulation patterns between the major and minor loci based on tumor cell type. However, the patterns of H4 gene expression show only minor differences between loci in embryonic stem cells. This suggests that the overall contributions of histone transcripts from the major and minor loci are similar, implying that there are mechanisms of differential regulation that maintain this equilibrium despite differences in histone gene copy number between the loci (Becker et al. 2007). Fission yeast are an even more extreme example of how histone genes are differentially regulated. Fission yeast genomes contain three pairs of H3-H4 genes, along with just a single pair of H2Aalpha-H2B, and a lone H2Abeta gene. A study investigating the three pairs of H3-H4 genes found that the first and third pairs are upregulated while the second pair is normally downregulated, exhibiting oscillation of expression through the cell cycle (Takayama and Takahashi 2007).

The histone genes in Drosophila melanogaster are a unique example of clustered histone gene organization. The D. melanogaster genome carries a single repetitive histone locus on chromosome 2L. Based on recent locus assembly from long-read sequencing (Bongartz and Schloissnig 2019), the D. melanogaster histone locus includes approximately 100 tandemly repeated histone gene arrays. Each 5 kb array includes the five canonical histone genes, H1, H3, H4, H2A and H2B, along with their respective cis regulatory elements and promoters. H3 and H4 share a bidirectional promoter that contains a GA-repeat cis element critical for histone gene expression (Salzler et al. 2013; Rieder et al. 2017; Hodkinson et al. 2024). D. melanogaster can survive with a 12-array histone transgene (Günesdogan et al. 2010; Salzler et al. 2013; McKay et al. 2015), indicating that not every gene is necessary. The studies from yeast, sea urchin, and even humans led us to hypothesize that individual histone genes or groups of genes are differentially regulated based on developmental timing, gene copy number, and number of loci. To explore how the arrays at the endogenous D. melanogaster histone locus might be functionally different, we utilized a recent histone locus assembly completed through long-read sequencing (Bongartz and Schloissnig 2019) to search for sequence differences between the histone gene arrays. We discovered that the arrays are nearly identical in sequence, but the GA-repeat in the H3/H4 promoter is variable in length, ranging from 16 to 35 nucleotides. The H3/H4 promoter sequence can nucleate recruitment of specific histone regulatory factors, and the GA-repeat is specifically targeted by the transcription factor CLAMP (Salzler et al. 2013; Rieder et al. 2017; Koreski et al. 2020). Further, we recently confirmed that the GA-repeats are critical for histone locus factor recruitment (Rieder et al.
2017; Hodkinson et al. 2024). Therefore, we hypothesized that histone genes might experience differential regulation through variability of the GA-repeat and transcription factor occupancy. To test this hypothesis, we obtained existing ChIP-seq data from CLAMP and other GA-repeat binding factors, GAGA Factor (GAF) and Pipsqueak (Psq), and investigated their differential occupancy over the histone GA-repeats. We discovered that all three factors bind the range of GA-repeat lengths and, furthermore, show that CLAMP and GAF are unbiased in the GA-repeat lengths they bind. Our discovery of variable GA-repeats at the histone locus uncovered a previously unknown distinction between the ~100 histone gene arrays and may provide a target for future studies on histone array uniqueness and functionality. Furthermore, our observations suggest that GA-repeat variability likely does not contribute to differential occupancy of transcription factors at histone gene arrays, and imply that other cis elements or cofactors might contribute to differential histone gene regulation.

Results

The GA-repeat is variable in length across the histone gene arrays

Bongartz and Schloissnig (2019) produced a de novo assembly of the Drosophila melanogaster repetitive histone gene locus, identifying that the locus contains ~107 histone gene arrays, in agreement with previous estimates (Lifton et al. 1978; McKay et al. 2015) and recently confirmed by another group (Shukla et al. 2024). We aligned the gene arrays (Figure 1A) and discovered that they are nearly identical in sequence other than length variability of a GA-repeat present in the bidirectional promoter of genes H3 and H4 (Figure 1B). The GA-repeat varies in length from 16 base pairs to 35 base pairs (Figure 1B, C). The most common GA-repeat length is 21 bp (29 of the 107 arrays). We found some clustering of arrays with similar-length GA-repeats, such as those with 29 or 31 bp GA-repeats (Figure 1D).

Figure 1: The GA-repeat length is variable across the histone gene arrays. (A) A diagram of a single histone gene array and the GA-repeat located in the H3/H4 promoter (H3/H4p). (B) We utilized the previously assembled histone locus from Bongartz and Schloissnig (2019) to compare the sequences of the histone gene arrays. The arrays are virtually identical other than the GA-repeat in the H3/H4p, which varies in length. (C) We aligned six of the 300 bp H3/H4p (arrays 15-20, TATA boxes in maroon). Other than a single SNP (purple), the GA-repeat remains the only sequence variability. (D) A heatmap shows the positions of different GA-repeat lengths across the locus. Each array is represented by one vertical bar.
(E) We designed primers to amplify the H3/H4p of the histone arrays to confirm the variability of the GA-repeat in vivo. Laddering of PCR products in an acrylamide gel confirmed GA-repeat variability.

To confirm the GA-repeat variability in vivo, we designed primers to amplify 115 bp of the endogenous H3/H4p region that includes the GA-repeat region. PCR from genomic DNA is predicted to produce amplicons ranging from 110 bp (16 bp GA-repeat) to 129 bp (35 bp GA-repeat). We observed the expected laddering of PCR products on an acrylamide gel, confirming GA-repeat length variability in vivo (Figure 1E). We noticed several amplicons that exceeded the predicted length, possibly because the GA-repeat forms secondary structure.

CLAMP, GAF, and Psq all target the GA-repeats in the H3/H4p

Our observations indicate that the most dramatic sequence difference across the histone arrays is the wide variability of the GA-repeat length. We previously demonstrated that this sequence is targeted by the CLAMP transcription factor (Rieder et al. 2017) and that the interaction is important for histone locus body (HLB) factor recruitment and histone gene expression (Rieder et al. 2017; Hodkinson et al. 2024). However, the Drosophila genome encodes two other GA-repeat binding transcription factors: GAGA Factor (GAF) and Pipsqueak (Psq) (Lehmann et al. 1998; van Steensel et al. 2003) (Figure 2A). We therefore hypothesized that these other GA-repeat binding factors also target the GA-repeats in the histone gene array. To test our hypothesis, we aligned previously generated ChIP-sequencing data to the histone gene array (Gutierrez-Perez et al. 2019; Gaskill et al.
2021; Duan et al. 2021).

Figure 2: GAF, CLAMP, and Psq target the H3/H4p GA-repeat. (A) The binding motifs for GAF, CLAMP and Psq all contain GA-repeats. Binding motifs for GAF and CLAMP are from the open-access database JASPAR (Castro-Mondragon et al. 2022), and the Psq binding motif is recreated from Gutierrez-Perez et al. (2019). (B) We aligned ChIP-seq data for GAF (pink, two replicates overlaid; Gaskill et al. 2021) in 2-3 hr embryos, CLAMP (green, three replicates overlaid; Duan et al. 2021) in 2-4 hr embryos, and Psq (purple, two replicates overlaid; Gutierrez-Perez et al. 2019) in 2-4 hr embryos to the single histone gene array. We normalized GAF and CLAMP data to their respective inputs. We did not normalize Psq data because no inputs were provided in the original dataset. All three factors target the GA-repeat in the H3/H4p of the histone gene array. A representative input (blue) from the GAF and CLAMP ChIP-seq data is shown for comparison.

Because of the repetitive nature of the histone locus, aligning sequencing data such as ChIP-seq becomes impractical, as each read would map to more than one or even all of the ~100 arrays. Historically, to align sequencing data to the histone gene array, we utilized a condensed or custom version of the histone gene array, similar to the single histone gene array in McKay et al. (2015). Using the condensed histone gene array also means there is only one GA-repeat cis element length (21 bp). First, we confirmed that CLAMP robustly targets the H3/H4p GA-repeat (Figure 2B).
CLAMP shows a clear peak over this region, as previously observed (Rieder et al. 2017; Koreski et al. 2020). Previously published GAF ChIP data from 2-3 hr embryos (Gaskill et al. 2021) and Psq data from Drosophila embryonic stem cells (Kc167 cells; Gutierrez-Perez et al. 2019) also show a clear peak at the H3/H4p. Based on these data, CLAMP, GAF and Psq all target the histone locus. The factors may compete with each other or act synergistically.

GA-binding factors do not show preference for GA-repeat length at the histone locus

Although the ChIP peaks shown in Figure 2 for all three GA-repeat binding factors indicate that they target the histone genes, we cannot deduce which array(s) they target because we are only looking at the data aligned to a single histone array rather than the entire locus. We next wanted to deduce which arrays each of the three GA-repeat binding factors might target by examining whether they have a bias for GA-repeats of certain lengths. Because the GA-repeats are the only sequence that differs between the ~100 histone gene arrays, determining what length GA-repeats CLAMP, GAF, and Psq target can help us infer which arrays they target. CLAMP shows preference for binding longer GA-repeats on the X chromosome, while GAF shows preference for short GA-repeats (Kaye et al. 2018). In vitro, CLAMP binds DNA probes with long GA-repeats up to 30 nucleotides in length by EMSA, whereas GAF will only shift probes with shorter GA-repeats of 8 nucleotides (Kaye et al. 2018). Therefore, we hypothesized that these GA-repeat binding factors might target different histone arrays based on their binding preference for GA-repeat length. We developed a bioinformatics script that selected H3/H4p sequences from the ChIP-seq dataset by defining two anchor sequences, one upstream (5’) and one downstream (3’) of the GA-repeat, each long enough to ensure specificity to the H3/H4p.
The code then extracts the reads that match both anchors, scans to identify the GA-repeat and counts the number of nucleotides that make up the GA-repeat in that read. We utilized the ChIP input as a positive control, expecting to recover the GA-repeat lengths and frequencies we retrieved from the long-read sequencing results (Figure 1B).

Figure 3: The H3/H4 promoter GA-repeat length variability is observed in different datasets. We designed a bioinformatics script to parse ChIP-seq datasets, extract reads containing the H3/H4 promoter GA-repeat and count the number of nucleotides within the repeat. We extracted reads from input ChIP-seq datasets of (A) 0-2 hr embryos (three replicates) and (B) 2-4 hr embryos (three replicates) and created histograms based on the GA-repeat lengths. The X-axis shows all GA-repeat lengths (nucleotides), and the Y-axis shows the frequency of each length, represented as the percentage of extracted reads that contained that GA-repeat length. Data from Duan et al. 2021.

When we generated histograms for GA-repeat length frequency from the input libraries of 0-2 hr and 2-4 hr embryos, we observed that the distribution of GA-repeat lengths mirrored the distribution we found from the long-read data of Bongartz and Schloissnig (2019) (Figure 1B). However, we did notice a few differences.
We identified some GA-repeats that were shorter than expected due to SNPs in the middle of the GA-repeat (Supplementary Figure 1). In addition, we noticed some minor differences in the frequencies of element lengths (Figure 1B vs. Figure 3). This is likely due to genotype, as the Bongartz assembly was obtained from OregonR Drosophila, while the ChIP-seq datasets were produced from yellow-, white- maternal triple GAL-4 driver (MTD-GAL4, Bloomington #31777) Drosophila (Ni et al. 2011). Because large, repetitive regions of the genome are subjected to frequent expansion and contraction due to unequal crossing over (Smith 1976; Shukla et al. 2024), few Drosophila strains have exactly the same GA-repeat length distribution (Shukla et al. 2024). Even individuals within an interbreeding population may have different numbers of arrays and therefore different frequencies of GA-repeat lengths. Overall, however, we confirmed that the variability and length distribution of the histone locus GA-repeats is relatively reproducible.

To determine the binding profiles of CLAMP and GAF at the variable GA-repeat, we used an available CLAMP ChIP-seq dataset from Duan et al. (2021) and generated a GAF ChIP dataset, both of which include data from 0-2 and 2-4 hr Drosophila embryos. These are relevant time points for histone gene expression; the early Drosophila embryo undergoes 14 nuclear division cycles during which the entire genome is replicated every 8-12 minutes. Therefore, a large number of histones are rapidly required (Tadros and Lipshitz 2009; Farrell and O’Farrell 2014; Harrison and Eisen 2015). The histone genes are targeted by specific factors as early as nuclear cycle 9 (Terzo et al. 2015), and zygotic histone genes are expressed by nuclear cycle 11 (Edgar and Schubiger 1986). CLAMP is maternally deposited and targets the histone locus in the early embryo, prior to detectable histone gene expression (Rieder et al. 2017).
GAF is not thought to target the zygotic histone locus unless CLAMP is depleted (Rieder et al. 2017), although we discovered that it likely does so, at least in some datasets (Figure 2).

Figure 4: GAF and CLAMP target the same length GA-repeats. We extracted reads containing the H3/H4 promoter GA-repeat from ChIP-seq data for GAF in (A) 0-2 hr embryos (three replicates) and (B) 2-4 hr embryos (three replicates). We also extracted reads containing the H3/H4 promoter GA-repeat from ChIP-seq data for CLAMP in (C) 0-2 hr embryos (three replicates) and (D) 2-4 hr embryos (three replicates). In all histograms, the X-axis shows GA-repeat lengths, and the Y-axis shows the average frequency of each length, represented as the percentage of extracted reads.

Using these embryonic ChIP-seq datasets, we investigated the frequencies of GA-repeat lengths targeted by CLAMP and GAF. We found that all GA-repeat lengths were bound by CLAMP, which does not show any bias for specific GA-repeat lengths despite preferring long X-linked GA-repeats (Kaye et al. 2018). Furthermore, CLAMP seems to target each of the GA-repeat lengths at frequencies similar to their respective counts across the locus (input, Figure 3). Lastly, we found no difference between the distribution of GA-repeat lengths targeted by CLAMP based on age of embryo, suggesting that developmental timing does not impact CLAMP occupancy at the histone locus GA-repeats (Figure 4C, D). We next performed a similar analysis for GAF and obtained similar results. GAF showed no bias for specific GA-repeat lengths in either 0-2 or 2-4 hr embryos (Figure 4A, B).
Further, we found that GAF also seems to bind each of the GA-repeat lengths at frequencies similar to their respective counts across the locus (input, Figure 3), similar to CLAMP (Figure 4C, D). Although we identified a previously generated Psq ChIP-seq dataset from 3 hr embryos, we were unable to interrogate GA-repeat binding preference due to the short length (50 bp) of the sequencing reads, which is not sufficient to identify reads that contain both the anchor sequences and the GA-repeat.
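
The read-parsing approach described in this section (select reads carrying both anchors, then measure the GA-repeat between them) can be illustrated with a minimal Python sketch. This is not the script used in the study. The two anchor sequences are hypothetical placeholders, the handling of reverse-complement reads is an assumption, and the GA-repeat is assumed here to be the longest uninterrupted run of alternating G and A between the anchors, so that a SNP within the repeat yields a shorter count.

```python
import re
from collections import Counter

# HYPOTHETICAL anchors: placeholders only, not the sequences used in the study.
UPSTREAM_ANCHOR = "CGTTCACTGC"    # assumed 5' flank of the GA-repeat
DOWNSTREAM_ANCHOR = "TTGCCATACG"  # assumed 3' flank of the GA-repeat

# Assumption: a GA-repeat is a run of strictly alternating G and A,
# which may start or end on either base (so odd lengths such as 21 are allowed).
ALTERNATING_GA = re.compile(r"A?(?:GA)+G?")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def ga_repeat_length(read):
    """Return the GA-repeat length (nt) if the read spans both anchors, else None."""
    # Reads may come from either strand, so try both orientations.
    for seq in (read, reverse_complement(read)):
        start = seq.find(UPSTREAM_ANCHOR)
        if start == -1:
            continue
        inner_start = start + len(UPSTREAM_ANCHOR)
        end = seq.find(DOWNSTREAM_ANCHOR, inner_start)
        if end == -1:
            continue
        between = seq[inner_start:end]
        runs = [m.group(0) for m in ALTERNATING_GA.finditer(between)]
        if runs:
            # Longest uninterrupted run; a SNP splits the repeat into shorter runs.
            return max(len(run) for run in runs)
    return None


def length_distribution(reads):
    """Map each GA-repeat length to the percentage of anchored reads carrying it."""
    counts = Counter()
    for read in reads:
        length = ga_repeat_length(read.upper())
        if length is not None:
            counts[length] += 1
    total = sum(counts.values())
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}
```

Under these assumptions, calling `length_distribution` on the read sequences of an input library and of a ChIP library gives two percentage histograms that can be compared length by length, in the spirit of Figures 3 and 4. A read can only be counted if it is long enough to contain both anchors and the full repeat, which is consistent with the 50 bp Psq reads being unusable for this analysis.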

Discussion

The Drosophila melanogaster histone locus comprises ~100 virtually identical histone gene arrays and is regulated by a unique nuclear body. It is unknown whether all ~100 histone genes are targeted by the same transcription factors and produce the same mRNA output, as we are unable to map histone transcripts to their genes of origin. Some evidence points toward differential expression of genes. Animals carrying 12-array transgenes in the background of an endogenous locus deletion are viable (Günesdogan et al. 2010; McKay et al. 2015; Zhang et al. 2019) and express histone mRNAs at the same level as the endogenous locus, indicating that 100 genes are not required for viability. Other species, including other drosophilids, have varying numbers of histone genes in differing genomic arrangements, such as the closely related D. simulans, whose genome carries only 15 histone arrays (unpublished), or D. virilis, which diverged ~40 million years ago and whose genome carries two histone loci that together contain only 32 arrays (Russo et al. 1995; Schienman et al. 1998; Xie et al. 2022). It is difficult to assay how the D. melanogaster genes might be differentially expressed, as the histone coding sequences are virtually identical. We therefore sought to uncover mechanisms for differential histone gene regulation by investigating sequence differences between arrays. Using a recent long-read histone locus assembly (Bongartz and Schloissnig 2019), we discovered that the GA-repeat in the H3/H4 promoter is variable in length across the histone locus, while the rest of the 5 kb arrays are nearly identical in sequence. We previously demonstrated that CLAMP targets the GA-repeat in the H3/H4 promoter and confirmed that the GA-repeats are important for histone locus factor recruitment (Rieder et al. 2017; Hodkinson et al. 2024).
These observations indicated the importance of the GA-repeat in overall histone gene regulation, so we hypothesized that this sequence variability might be functionally important in recruiting different transcription factors. We found that all GA-binding transcription factors target the element, but that none seems to have a bias for longer or shorter repeats. GA dinucleotide repeats are fairly common in many genomes and serve a variety of functions. Short tandem repeats (STRs) are commonly found in core promoter sequences, where they serve as targets for transcription factors or pioneer factors like GAF and CLAMP (Duan et al. 2021), which displace nucleosomes to ready the gene for transcription. Recent work looking at conserved GA-repeats in the core promoters of early human embryonic development genes shows that differences in GA-repeat length at these genes can cause differences in expression levels (Valipour et al. 2013). These data confirm that GA-repeat length itself is sufficient to drive differential gene expression and, furthermore, imply that the length of the GA-repeat in the histone gene arrays may also impact differential expression of the histone genes, even if this is not due to changes in CLAMP, GAF, or Psq binding. GA dinucleotide repeats can also serve as insulators. GAF binding at GA-repeats is critical for insulation between genes and unrelated, neighboring enhancer sequences (Lehmann 2004; Gaskill et al. 2021). In mice, GA-repeat motifs within the Hox gene clusters are nucleosome-free and, when GAF targets these regions, chromatin boundaries are established to create domains so the Hox genes are insulated from their neighboring regulatory elements (Srivastava et al. 2013). Similarly, in D. melanogaster GAF localizes to the Fab-7 boundary element from the Hox genes Ubx, Abd-A and Abd-B. GAF can target GA-repeats at the Fab-7 element, which can determine its function as an insulator at different developmental time points (Schweinsberg et al. 2004). It is possible that the GA-repeat in the histone array acts as an insulator and, although it is located within the H3/H4 promoter, it may serve multiple functions, acting as the target for binding factors at a subset of arrays and as an insulator at others to modulate the expression of histone genes in different arrays. Studies in sea urchins, which have two clusters of histone genes that are differentially regulated, show that the specific downregulation of the “early” H2A genes is regulated by an upstream (5’) GA-repeat serving as an insulator (Di Caro et al. 2004). This study suggests that the regulation of individual histone genes can be governed by cis elements. Furthermore, these data emphasize that dinucleotide repeats, and specifically GA-repeats, have important functions across species and in many genomic contexts. Our observations suggest the GA-repeat may not impact differential expression of the Drosophila histone genes; however, here we did not explore whether GA-repeat length impacts individual histone gene expression levels. The repetitiveness of the histone locus makes it impossible to assess the expression of individual histone genes because there is little coding sequence variation. Future experiments could leverage a “barcoded” 12-array transgene where silent mutations in the histone coding sequences allow determination of differential expression from each gene. Using this system, we could investigate how GA-repeat length impacts histone gene expression rather than just GA-repeat binding factor occupancy.
GA-repeats are only one cis element that can affect gene expression, and it is likely that there are secondary or several additional cis elements that are responsible for regulating histone genes (Horton et al. 2022; Hodkinson et al. 2023). The work here focused specifically on the GA-repeat as it is important for CLAMP binding and is the only variable sequence between the arrays. However, additional cis elements in the H3/H4 or H2A/H2B promoter could impact differential gene regulation. Furthermore, it is possible that all the arrays are targeted by GA-repeat binding factors, but additional transcription factors then activate the genes. Here, we only consider the occupancy of the GA-repeat binding factors in the histone arrays, yet a body of factors regulates histone gene expression, known as the histone locus body (HLB) (Duronio and Marzluff 2017), the full composition of which is still unknown. Future studies exploring what other DNA-binding factors target the histone arrays, like the recently published screen from Hodkinson et al. 2024, as well as investigating how differential targeting may impact histone gene expression, will provide greater understanding of the intricacies involved in histone gene regulation.

By utilizing the previously assembled histone locus sequencing data (Bongartz and Schloissnig 2019), we revealed that the H3/H4 promoter GA-repeat cis element is the only variable sequence between the ~100 gene arrays. We leveraged previously published ChIP-sequencing datasets and determined that the variability in the GA-repeat does not impact the occupancy of GAF and CLAMP, suggesting that these factors alone are not responsible for any differential regulation of the histone genes. Overall, our results have expanded our understanding of the sequence features of the D. melanogaster histone locus and given insight into histone gene regulation mechanisms.

Methods

Promoter Alignment

We obtained the H3/H4 promoter sequences from the Bongartz and Schloissnig (2019) genome assembly (Figure 2A) and used reads extracted from input ChIP-sequencing data (Supplemental Figure 2). We aligned sequences using T-Coffee Multiple Sequence Alignment (Notredame et al. 2000) to create a ClustalW output and formatted the shading and features with Jalview (Waterhouse et al. 2009).

ChIP-analysis and Data Visualization - IGV plots

We directly imported individual FASTQ datasets into the web-based platform Galaxy (The Galaxy Community 2022) through the NCBI SRA Run Selector by selecting the desired runs and using the Run Selector's computing Galaxy download feature. We retrieved the FASTQ files from SRA using the “Faster Download and Extract Reads in FASTQ format from NCBI SRA” Galaxy command. Because the ~100 histone gene arrays are extremely similar in sequence, we did not use the dm6 or dm3 genomes and instead collapsed ChIP-seq data onto a single histone array. We used a custom “genome” that includes a single Drosophila melanogaster histone array similar to that in McKay et al. (2015), which we directly uploaded to Galaxy using the “upload data” feature and normalized using the Galaxy command “NormalizeFasta”, specifying an 80 bp line length for the output .fasta file. We aligned ChIP reads to the normalized histone gene array using Bowtie2 (Langmead and Salzberg 2012) to create .bam files, using the user built-in index and the “very sensitive end-to-end” parameter settings. We converted the .bam files to .bigwig files using the “bamCoverage” Galaxy command, in which we set the bin size to 1 bp and set the effective genome size to user specified: 5000 bp (approximate size of 1 histone array). If an input dataset was available, we normalized ChIP datasets to input using the “bamCompare” Galaxy command, in which we set the bin size to 1 bp. We visualized the .bigwig files using the Integrative Genomics Viewer (Robinson et al. 2011).

Table 1 ChIP-sequencing datasets. Specifics for the NCBI GEO datasets used, including the GEO Accession number, the SRA Run Selector numbers, the developmental time of each sample, and the cited source.

Factor | GEO Accession # | SRA Run Selector | Developmental Timepoint | Citation
GAF, GAGA Factor (Trl) | GSE152773 | Anti GAF-GFP: 1 - SRR12045586, 2 - SRR12045588; Input: 1 - SRR12045585, 2 - SRR12045587 | 2-3 hr, stage 5 embryos | (Gaskill et al. 2021)
CLAMP, Chromatin linked adaptor for MSL proteins | GSE152613 | CLAMP antibody: 1 - SRR12024931, 2 - SRR12024949, 3 - SRR12024967; Input: 1 - SRR12024933, 2 - SRR12024951, 3 - SRR12024969 | 2-4 hr embryos | (Duan et al. 2021)
Psq, Pipsqueak | GSE118047 | PsqM (PsqTot) antibody: 1 - SRR7638403, 2 - SRR7638404 | Kc167 Drosophila embryonic cell line | (Gutierrez-Perez et al. 2019)

ChIP-analysis and Bioinformatics Pipeline - GA-Repeat Histograms

Our annotated pipeline is available on GitHub (https://github.com/rieder-lab/count_GA) in the script entitled count_ag_repeats.py. We utilized the SeqIO package from biopython (Cock et al. 2009) to parse through fastq files and regex from anaconda or pip for all functional outputs. We also utilized logging from anaconda or pip to create a built-in log for the run as an informational output. We designed the code to first identify and extract reads that contain the H3/H4 promoter sequence by using two short, flanking “anchor” sequences to the left (5’) and the right (3’) of the GA-repeat (left sequence: TAGCAATCGT; right sequence: CATTTCATTTGACGAGC). We used a counting mechanism to ensure that only reads with both the left and the right anchor were extracted; however, the script also reports single-anchor matches as an informational output.
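The anchor-based extraction described above can be illustrated with a minimal sketch. This is not the code from count_ag_repeats.py: the function names classify_read and extract_between_anchors are hypothetical, the sketch works on plain strings rather than SeqIO records to stay dependency-free, and it assumes that a usable read carries the left anchor upstream of the right anchor in the forward orientation only. Reads from the opposite strand would need to be reverse-complemented first, which this sketch does not attempt.

```python
# Anchor sequences quoted in the Methods text.
LEFT_ANCHOR = "TAGCAATCGT"          # 5' of the GA-repeat
RIGHT_ANCHOR = "CATTTCATTTGACGAGC"  # 3' of the GA-repeat


def classify_read(seq: str) -> str:
    """Label a read by which anchors it contains (hypothetical helper)."""
    seq = seq.upper()
    left = seq.find(LEFT_ANCHOR)
    right = seq.find(RIGHT_ANCHOR)
    if left != -1 and right != -1:
        # Assumption: a well-formed read has the left anchor before the right one;
        # any other arrangement is treated as an unexpected configuration.
        return "dual_match" if left < right else "strange_match"
    if left != -1:
        return "left_only"
    if right != -1:
        return "right_only"
    return "no_match"


def extract_between_anchors(seq: str):
    """Return the sequence between the two anchors, or None if not a dual match."""
    seq = seq.upper()
    if classify_read(seq) != "dual_match":
        return None
    start = seq.find(LEFT_ANCHOR) + len(LEFT_ANCHOR)
    end = seq.find(RIGHT_ANCHOR, start)
    if end == -1:
        return None
    return seq[start:end]


if __name__ == "__main__":
    read = "AAA" + LEFT_ANCHOR + "AGAGAGAG" + RIGHT_ANCHOR + "TT"
    print(classify_read(read), extract_between_anchors(read))
```

The category labels mirror the output file names listed in the Methods (dual_match, left_only, right_only, no_match, strange_match), but how the actual script decides on a strange match is not specified here, so that branch is only a guess at one reasonable rule.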
We then designed the code to scan through the extracted reads until encountering the specified string “AGAGAG” as a seed sequence for the GA-repeat. Once the GA-repeat is identified, the code counts the number of nucleotides within the repeat. Of note, we designed the code to allow for 0 mismatches in the repeat, meaning that repeats in which two “A” nucleotides or two “G” nucleotides are adjacent to each other are only counted until that “AA” or “GG” appears. We identified a handful of GA-repeats that contain SNPs causing “AA” or “GG” stretches (Supplementary Figure 1). However, this feature of the pipeline can be changed to allow for any specified number of mismatches.

The script outputs 6 files to a specified path destination. These outputs include a .tsv file with four columns of information: the first column is the nucleotide count of the GA-repeat, the second column is the extracted repeat itself, the third column is the trimmed read from which the repeat originated, and the last column is the sequence ID (Supplementary Figure 2). This file allows confirmation of the GA-repeat nucleotide counts as well as access to the reads the pipeline extracted. The other 5 files are .fastq.gz files that partition the reads from the entire sequencing file: dual_match.fastq.gz, containing all the full-length reads that had both anchor sequences; left_only.fastq.gz, containing reads that only matched the left anchor sequence; no_match.fastq.gz, containing reads that did not have either anchor sequence; right_only.fastq.gz, containing reads that only matched the right anchor sequence; and strange_match.fastq.gz, containing reads with unexpected configurations of the anchor sequences, such as forward and reverse complements of these sequences.

ChIP Analysis – GA-Repeat Histograms

CLAMP ChIP-seq datasets from Duan et al. 2021 are deposited at NCBI GEO under accession number GSE152598. GAF ChIP was performed as described in Duan et al. 2021 with 10 µL of GAF antibody (Fuda et al. 2015).

Table 2 ChIP-seq data used to generate GA-repeat length histograms.

Target TF | Developmental Timepoint | GEO Accession # | SRA Run Selector #
Input | 0-2 hr embryo | GSE152613 | Input: 1 - SRR12024924, 2 - SRR12024942, 3 - SRR12024960
Input | 2-4 hr embryo | GSE152613 | Input: 1 - SRR12024933, 2 - SRR12024951, 3 - SRR12024969
GAF | 0-2 hr embryo | pending | GAF antibody: pending
GAF | 2-4 hr embryo | pending | GAF antibody: pending
CLAMP | 0-2 hr embryo | GSE152613 | CLAMP antibody: 1 - SRR12024922, 2 - SRR12024940, 3 - SRR12024958
CLAMP | 2-4 hr embryo | GSE152613 | CLAMP antibody: 1 - SRR12024931, 2 - SRR12024949, 3 - SRR12024967

Acknowledgments

We would like to thank the Rieder Lab for supporting this work. We would specifically like to thank Dr. Casey Schmidt for her invaluable insight into early experimental design and execution. We would also like to thank Dr. David Gorkin, who wrote the Python pipeline and was an invaluable source of support and bioinformatics knowledge throughout this project. This work was supported by T32GM00008490 and F31HD105452 to LJH; and R00HD092625 and R35GM142724 to LER.

Supplemental Figures

Supplementary Figure 1: The GA-repeat can contain a variety of mismatches and sequence variation. We aligned a representative set of reads extracted by our bioinformatics pipeline in which the H3/H4 promoter GA-repeat contains SNPs or stretches of repeating A or G nucleotides. One of the TATA boxes is labeled in maroon and the GA-repeat is labeled in green. SNPs are shown in purple and stretches of A or G nucleotides are shown in teal. (Note that these sequences have been extracted and trimmed by our Python script.)

Supplementary Figure 2: Sample .tsv file output for the GA-repeat counting bioinformatics pipeline. Our pipeline parses through sequence.fastq.gz files, extracts reads with the H3/H4 promoter GA-repeat, and counts the number of nucleotides that make up the repeat. The main output file for this script is a .tsv file containing four columns. The first column specifies the number of nucleotides that make up the GA-repeat, the second column is the extracted repeat itself, the third column is the trimmed read from which the repeat originated, and the last column is the sequence ID.
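To make the zero-mismatch counting rule and the four-column .tsv layout concrete, the following is a minimal sketch written under stated assumptions; it is not the published count_ag_repeats.py. The helper names (count_ga_repeat, tabulate, rows_to_tsv) are hypothetical, the repeat is assumed to be extended only forward from the first “AGAGAG” seed, and counting stops at the first base that breaks strict A/G alternation (for example “AA” or “GG”).

```python
import csv
import io
from collections import Counter

SEED = "AGAGAG"  # seed string named in the Methods


def count_ga_repeat(trimmed_read: str, seed: str = SEED):
    """Return (length, repeat) for the perfect A/G run starting at the first seed."""
    seq = trimmed_read.upper()
    start = seq.find(seed)
    if start == -1:
        return 0, ""
    end = start + len(seed)
    # Extend while the next base is A or G and differs from the previous base,
    # so an "AA" or "GG" step ends the count (zero mismatches allowed).
    while end < len(seq) and seq[end] in "AG" and seq[end] != seq[end - 1]:
        end += 1
    return end - start, seq[start:end]


def tabulate(reads):
    """reads: iterable of (seq_id, trimmed_read). Returns rows and a length histogram."""
    rows = []
    for seq_id, read in reads:
        length, repeat = count_ga_repeat(read)
        if length:
            # Column order follows the .tsv description: count, repeat, read, ID.
            rows.append((length, repeat, read, seq_id))
    histogram = Counter(row[0] for row in rows)
    return rows, histogram


def rows_to_tsv(rows) -> str:
    """Serialize rows as tab-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    demo = [("r1", "TTAGAGAGAGAGGAGACC"), ("r2", "CCAGAGAGTT"), ("r3", "ACGT")]
    rows, histogram = tabulate(demo)
    print(rows_to_tsv(rows), end="")
    print(dict(histogram))
```

In this toy input, read r1 is counted only up to the “GG” step, which is the behavior the Methods describe for repeats containing SNPs. The histogram returned by tabulate corresponds to the kind of repeat-length distribution compared between ChIP and input samples, although the plotting and any normalization used in the paper are not reproduced here.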

References

Becker K. A., J. L. Stein, J. B. Lian, A. J. van Wijnen, and G. S. Stein, 2007 Establishment of histone gene regulation and cell cycle checkpoint control in human embryonic stem cells. Journal of Cellular Physiology 210: 517–526. https://doi.org/10.1002/jcp.20903
Bongartz P., and S. Schloissnig, 2019 Deep repeat resolution-the assembly of the Drosophila Histone Complex. Nucleic Acids Res 47: e18. https://doi.org/10.1093/nar/gky1194
Carty M., L. Zamparo, M. Sahin, A. González, R. Pelossof, et al., 2017 An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data. Nat Commun 8: 15454. https://doi.org/10.1038/ncomms15454
Cock P. J. A., T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, et al., 2009 Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25: 1422–1423. https://doi.org/10.1093/bioinformatics/btp163
Di Caro D., R. Melfi, C. Alessandro, G. Serio, V. Di Caro, et al., 2004 Down-regulation of early sea urchin histone H2A gene relies on cis regulative sequences located in the 5’ and 3’ regions and including the enhancer blocker sns. J Mol Biol 342: 1367–1377. https://doi.org/10.1016/j.jmb.2004.07.101
Duan J., L. Rieder, M. M. Colonnetta, A. Huang, M. Mckenney, et al., 2021 CLAMP and Zelda function together to promote Drosophila zygotic genome activation. eLife.
Duronio R. J., and W. F. Marzluff, 2017 Coordinating cell cycle-regulated histone gene expression through assembly and function of the Histone Locus Body. RNA Biol 14: 726–738. https://doi.org/10.1080/15476286.2016.1265198
Edgar B. A., and G. Schubiger, 1986 Parameters controlling transcriptional activation during early Drosophila development. Cell 44: 871–877. https://doi.org/10.1016/0092-8674(86)90009-7
Farrell J. A., and P. H. O’Farrell, 2014 From egg to gastrula: how the cell cycle is remodeled during the Drosophila mid-blastula transition. Annu Rev Genet 48: 269–294. https://doi.org/10.1146/annurev-genet-111212-133531
Fuda N. J., M. J. Guertin, S. Sharma, C. G. Danko, A. L. Martins, et al., 2015 GAGA Factor Maintains Nucleosome-Free Regions and Has a Role in RNA Polymerase II Recruitment to Promoters. PLOS Genetics 11: e1005108. https://doi.org/10.1371/journal.pgen.1005108
Gaskill M. M., T. J. Gibson, E. D. Larson, and M. M. Harrison, 2021 GAF is essential for zygotic genome activation and chromatin accessibility in the early Drosophila embryo, (Y. M. Yamashita, and K. Struhl, Eds.). eLife 10: e66668. https://doi.org/10.7554/eLife.66668
Ghule P. N., J. R. Boyd, F. Kabala, A. J. Fritz, N. A. Bouffard, et al., 2023 Spatiotemporal higher-order chromatin landscape of human histone gene clusters at histone locus bodies during the cell cycle in breast cancer progression. Gene 872: 147441. https://doi.org/10.1016/j.gene.2023.147441
Günesdogan U., H. Jäckle, and A. Herzig, 2010 A genetic system to assess in vivo the functions of histones and histone modifications in higher eukaryotes. EMBO Rep 11: 772–6. https://doi.org/10.1038/embor.2010.124
Gutierrez-Perez I., M. J. Rowley, X. Lyu, V. Valadez-Graham, D. M. Vallejo, et al., 2019 Ecdysone-Induced 3D Chromatin Reorganization Involves Active Enhancers Bound by Pipsqueak and Polycomb. Cell Rep 28: 2715-2727.e5. https://doi.org/10.1016/j.celrep.2019.07.096
Harrison M. M., and M. B. Eisen, 2015 Transcriptional Activation of the Zygotic Genome in Drosophila. Curr Top Dev Biol 113: 85–112. https://doi.org/10.1016/bs.ctdb.2015.07.028
Hodkinson L. J., C. Smith, H. S. Comstra, B. A. Ajani, E. H. Albanese, et al., 2023a A bioinformatics screen reveals hox and chromatin remodeling factors at the Drosophila histone locus. BMC Genom Data 24: 54. https://doi.org/10.1186/s12863-023-01147-0
Hodkinson L. J., J. Gross, C. A. Schmidt, P. P. Diaz-Saldana, T. Aoki, et al., 2023b Sequence reliance of a Drosophila context-dependent transcription factor. 2023.12.07.570650.
Hodkinson L. J., J. Gross, C. A. Schmidt, P. P. Diaz-Saldana, T. Aoki, et al., 2024 Sequence reliance of the Drosophila context-dependent transcription factor CLAMP. Genetics iyae060. https://doi.org/10.1093/genetics/iyae060
Horton C. A., A. M. Alexandari, M. G. B. Hayes, E. Marklund, J. M. Schaepe, et al., 2022 Short tandem repeats bind transcription factors to tune eukaryotic gene expression. 2022.05.24.493321.
Kaye E. G., M. Booker, J. V. Kurland, A. E. Conicella, N. L. Fawzi, et al., 2018 Differential Occupancy of Two GA-Binding Proteins Promotes Targeting of the Drosophila Dosage Compensation Complex to the Male X Chromosome. Cell Rep 22: 3227–3239. https://doi.org/10.1016/j.celrep.2018.02.098
Koreski K. P., L. E. Rieder, L. M. McLain, A. Chaubal, W. F. Marzluff, et al., 2020 Drosophila histone locus body assembly and function involves multiple interactions. Mol Biol Cell 31: 1525–1537. https://doi.org/10.1091/mbc.E20-03-0176
Langmead B., and S. L. Salzberg, 2012 Fast gapped-read alignment with Bowtie 2. Nat Methods 9: 357–359. https://doi.org/10.1038/nmeth.1923
Lehmann M., T. Siegmund, K. G. Lintermann, and G. Korge, 1998 The pipsqueak protein of Drosophila melanogaster binds to GAGA sequences through a novel DNA-binding domain. J Biol Chem 273: 28504–28509. https://doi.org/10.1074/jbc.273.43.28504
Lehmann M., 2004 Anything else but GAGA: a nonhistone protein complex reshapes chromatin structure. Trends in Genetics 20: 15–22. https://doi.org/10.1016/j.tig.2003.11.005
Lifton R. P., M. L. Goldberg, R. W. Karp, and D. S. Hogness, 1978 The organization of the histone genes in Drosophila melanogaster: functional and evolutionary implications. Cold Spring Harb Symp Quant Biol 42 Pt 2: 1047–51. https://doi.org/10.1101/sqb.1978.042.01.105
Ma T., B. A. Van Tine, Y. Wei, M. D. Garrett, D. Nelson, et al., 2000 Cell cycle-regulated phosphorylation of p220(NPAT) by cyclin E/Cdk2 in Cajal bodies promotes histone gene transcription. Genes Dev 14: 2298–2313. https://doi.org/10.1101/gad.829500
Marzluff W. F., S. Sakallah, and H. Kelkar, 2006 The sea urchin histone gene complement. Developmental Biology 300: 308–320. https://doi.org/10.1016/j.ydbio.2006.08.067
McKay D. J., S. Klusza, T. J. Penke, M. P. Meers, K. P. Curry, et al., 2015 Interrogating the function of metazoan histones using engineered gene clusters. Dev Cell 32: 373–86. https://doi.org/10.1016/j.devcel.2014.12.025
Ni J.-Q., R. Zhou, B. Czech, L.-P. Liu, L. Holderbaum, et al., 2011 A genome-scale shRNA resource for transgenic RNAi in Drosophila. Nat Methods 8: 405–407. https://doi.org/10.1038/nmeth.1592
Notredame C., D. G. Higgins, and J. Heringa, 2000 T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 302: 205–217. https://doi.org/10.1006/jmbi.2000.4042
Rieder L. E., K. P. Koreski, K. A. Boltz, G. Kuzu, J. A. Urban, et al., 2017 Histone locus regulation by the Drosophila dosage compensation adaptor protein CLAMP. Genes Dev 31: 1494–1508. https://doi.org/10.1101/gad.300855.117
Robinson J. T., H. Thorvaldsdóttir, W. Winckler, M. Guttman, E. S. Lander, et al., 2011 Integrative Genomics Viewer. Nat Biotechnol 29: 24–26. https://doi.org/10.1038/nbt.1754
Russo C. A., N. Takezaki, and M. Nei, 1995 Molecular phylogeny and divergence times of drosophilid species. Mol Biol Evol 12: 391–404. https://doi.org/10.1093/oxfordjournals.molbev.a040214
Salzler H. R., D. C. Tatomer, P. Y. Malek, S. L. McDaniel, A. N. Orlando, et al., 2013 A sequence in the Drosophila H3-H4 Promoter triggers histone locus body assembly and biosynthesis of replication-coupled histone mRNAs. Dev Cell 24: 623–34. https://doi.org/10.1016/j.devcel.2013.02.014
Schienman J. E., E. R. Lozovskaya, and L. D. Strausbaugh, 1998 Drosophila virilis has atypical kinds and arrangements of histone repeats. Chromosoma 107: 529–539. https://doi.org/10.1007/s004120050339
Schweinsberg S., K. Hagstrom, D. Gohl, P. Schedl, R. P. Kumar, et al., 2004 The Enhancer-Blocking Activity of the Fab-7 Boundary From the Drosophila Bithorax Complex Requires GAGA-Factor-Binding Sites. Genetics 168: 1371–1384. https://doi.org/10.1534/genetics.104.029561
Seal R. L., P. Denny, E. A. Bruford, A. K. Gribkova, D. Landsman, et al., 2022 A standardized nomenclature for mammalian histone genes. Epigenetics & Chromatin 15: 1–18. https://doi.org/10.1186/s13072-022-00467-2
Shopland L. S., M. Byron, J. L. Stein, J. B. Lian, G. S. Stein, et al., 2001 Replication-dependent histone gene expression is related to Cajal body (CB) association but does not require sustained CB contact. Mol Biol Cell 12: 565–576. https://doi.org/10.1091/mbc.12.3.565
Shukla H. G., M. Chakraborty, and J. J. Emerson, 2024 Genetic variation in recalcitrant repetitive regions of the Drosophila melanogaster genome. 2024.06.11.598575.
Smith G. P., 1976 Evolution of Repeated DNA Sequences by Unequal Crossover. Science 191: 528–535. https://doi.org/10.1126/science.1251186
Srivastava S., D. Puri, H. S. Garapati, J. Dhawan, and R. K. Mishra, 2013 Vertebrate GAGA factor associated insulator elements demarcate homeotic genes in the HOX clusters. Epigenetics & Chromatin 6: 8. https://doi.org/10.1186/1756-8935-6-8
Steensel B. van, J. Delrow, and H. J. Bussemaker, 2003 Genomewide analysis of Drosophila GAGA factor target genes reveals context-dependent DNA binding. Proc Natl Acad Sci U S A 100: 2580–2585. https://doi.org/10.1073/pnas.0438000100
Tadros W., and H. D. Lipshitz, 2009 The maternal-to-zygotic transition: a play in two acts. Development 136: 3033–42. https://doi.org/10.1242/dev.033183
Takayama Y., and K. Takahashi, 2007 Differential regulation of repeated histone genes during the fission yeast cell cycle. Nucleic Acids Res 35: 3223–3237. https://doi.org/10.1093/nar/gkm213
Terzo E. A., S. M. Lyons, J. S. Poulton, B. R. S. Temple, W. F. Marzluff, et al., 2015 Distinct self-interaction domains promote Multi Sex Combs accumulation in and formation of the Drosophila histone locus body. Mol Biol Cell 26: 1559–1574. https://doi.org/10.1091/mbc.E14-10-1445
The Galaxy Community, 2022 The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Research 50: W345–W351. https://doi.org/10.1093/nar/gkac247
Valipour E., A. Kowsari, H. Bayat, M. Banan, S. Kazeminasab, et al., 2013 Polymorphic core promoter GA-repeats alter gene expression of the early embryonic developmental genes. Gene 531: 175–179. https://doi.org/10.1016/j.gene.2013.09.032
Waterhouse A. M., J. B. Procter, D. M. A. Martin, M. Clamp, and G. J. Barton, 2009 Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189–1191. https://doi.org/10.1093/bioinformatics/btp033
Xie M., L. J. Hodkinson, H. S. Comstra, P. P. Diaz-Saldana, H. E. Gilbonio, et al., 2022 MSL2 targets histone genes in Drosophila virilis. 2022.12.14.520423.
Zhang W., X. Zhang, Z. Xue, Y. Li, Q. Ma, et al., 2019 Probing the Function of Metazoan Histones with a Systematic Library of H3 and H4 Mutants. Dev Cell 48: 406-419.e5. https://doi.org/10.1016/j.devcel.2018.11.047
