Histone diversity in the archaeal domain of life

Karolin Luger, Shawn Laursen

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6985588/v1

This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted.

Abstract

Archaea represent a distinct domain of life that is genetically and biochemically different from bacteria and eukaryotes. Two-thirds of all archaea encode histones, proteins that are used ubiquitously to structure chromatin in eukaryotes. Archaeal histone sequences are much less conserved than their eukaryotic counterparts, yet insight into how they structure DNA is limited to only a few species that fail to represent the diversity of the archaeal domain. Archaea have adapted to the most diverse and extreme environments on our planet, requiring protection of the genome against a multitude of external pressures.
Here, we use bioinformatics, structure prediction, and molecular dynamics simulations to survey the diversity of histone-like sequences in all available archaeal genomes and to understand how they might interact with DNA. We have identified five distinct types of histones, which are combined in seven different strategies involving either single histones, multiple histones of the same type, or combinations of several types of histones in one genome. We show that some strategies correlate with environmental pressures, and some are phylogenetically restricted. Despite highly divergent amino acid sequences, structure predictions and simulations suggest similar histone-DNA binding modes for most classes. Our work provides a guide for surveying, in a targeted manner and with biophysical and structural approaches, the diverse strategies of histone-based DNA organization in archaea, toward a complete view of their rich diversity across the archaeal domain.

Subject areas: Biological sciences/Computational biology and bioinformatics/Data mining; Biological sciences/Molecular biology/Chromatin/Nucleosomes

Keywords: Histone, Archaea, Chromatin, Evolution, DNA binding protein

Introduction

Histones are small, highly basic proteins consisting of three α helices connected by two short loops (the ‘histone fold’) that form either homo- or heterodimers via a ‘handshake motif’ 1, 2. These proteins are present in genomes across all domains of life and are also found in some viruses 3–8. The best-studied histones are from the eukaryotic domain, where heterodimers of histones H2B-H2A and H3-H4 assemble into an octamer that wraps 147 base pairs of DNA to form nucleosomes 3. Eukaryotic histones are highly conserved and ubiquitously present across the entire domain. Homologues of the four types of histones are also encoded in the genomes of several ancient double-stranded DNA viruses that infect amoebae 9, 10.
The amino acid sequences of these histones are rather divergent among giant viruses and differ in many ways from those of eukaryotes. While the overall topology of nucleosomes reconstituted from these viruses appears to be conserved (at least for the two distantly related viruses in which this has been studied 6, 7, 11), the individual histone chains occur in a variety of tandem, triple, and even quadruple combinations, and in truncated forms, in different viruses 10. A subset of bacteria also encode histone-like proteins, which were likely acquired through horizontal gene transfer 12. While some of these are attached to other domains of mostly unknown function, many are standalone histones that are abundantly expressed and associated with the nucleoid 5. Only two of these putative bacterial histones have been studied in detail, and their interaction with DNA is markedly different from that of eukaryotic histones. Histones from Bdellovibrio bacteriovorus create long protein-coated DNA filaments through ‘edge-on’ binding rather than wrapping the DNA to form discrete nucleosomes, although the binding mode on longer DNA is somewhat controversial 5, 13, 14. A recent preprint suggests yet another binding mode for a histone encoded by Leptospira perolatii 15. Clearly, more research is needed to understand how histones are used in bacterial genome organization.

Histones are widespread in the archaeal domain. They were first discovered by John Reeve and coworkers in 1990 16, and we now know that the majority of archaeal genomes encode at least one type of histone. As more archaeal species are discovered at a rapid rate through advances in genomic sequencing, there is ever more diversity to consider 17–19. Archaeal histones exhibit much more sequence divergence than their eukaryotic counterparts 20, which are among the most conserved proteins known 21.
Because archaea are found in many different and often punishing environments, their proteins must have evolved to cope with extreme conditions 22–26. Unlike in bacteria, where histone genes are sparse, histones seem to be a deeply rooted feature of archaea, occurring in most higher taxa, with the notable exception of Thermoplasmata (formerly Crenarchaeota) 26–28. A select few archaea encode histones with tails, which have the potential for post-translational modifications. These organisms are mainly from the Asgard phylum, which is thought to be most closely related to eukaryotes 20, 29. At least two closely related hyperthermophilic archaea, Thermococcus kodakarensis and Methanothermus fervidus, have histones that package DNA into so-called ‘hypernucleosomes’: slinky-like assemblies in which the geometry of the DNA superhelix closely mimics the superhelix formed by stacked eukaryotic nucleosomes, using near-identical features of the histones to engage the DNA backbone 4, 30. To date, research into archaeal histone-DNA complexes is limited to these two organisms (but see a recent preprint 31). In T. kodakarensis, histones contribute to transcription regulation 4. Additional studies using molecular modeling of sequences from methanogenic archaea and ChIP-seq in Halobacterium salinarum have begun to shed light on the function of histones in these organisms 32, 33.

Here, we parsed the diversity of histone sequences in archaea by mining databases of predicted proteomes. We grouped archaeal histones into five major clusters based on four biophysical properties (length, isoelectric point, hydrophobicity, and instability index). We then identified seven strategies by which different organisms combine histones, employing either a single histone or various combinations of histones in one genome.
To understand possible co-dependencies between histones, we analyzed the seven strategies separately. This allowed us to tease apart, for example, whether basic histones that are the only histone in an organism have features different from those that co-exist with other basic or acidic histones. We predicted the structures of the main histone combinations and inferred their ability to bind DNA using molecular dynamics simulations, providing a starting point for targeted structural and biophysical analysis.

Results

Archaeal histones can be grouped into five clusters

We first identified putative histones in the predicted proteomes of all 5,869 archaeal genomes available in release 220 of the Genome Taxonomy Database (GTDB) 34. Protein-coding sequences in this database were predicted from single genomic assemblies representing unique species. Metadata including sampling location, genome size, and GC content were also calculated or collected. To identify histone sequences, we used HMMSearch with the archaeal (PF00808) and eukaryotic (PF00125) histone Pfam models. We tested a variety of search strategies using different HMMER tools with a range of stringency cutoffs and found that HMMSearch with a liberal stringency captured most of the diversity in the sequence space without adding too much noise (Supplemental Figure 1). We then applied DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for unsupervised clustering of the presumptive histone sequences, using the four easily calculated physical parameters with the most variance: length, instability index, isoelectric point (pI), and hydrophobicity (GRAVY score). Full clustering details can be found in the Methods. Briefly, we optimized the clustering parameters on small, randomly sampled datasets, extracted the physical parameters that define each cluster, and used those bounds to label proteins in the overall dataset (Supplemental Table 1).
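The sample-then-bound procedure described above can be sketched in plain Python. This is a minimal sketch and not the authors' pipeline. It assumes the four features (length, instability index, pI, GRAVY) have already been computed for each sequence. For example, Biopython's `Bio.SeqUtils.ProtParam.ProteinAnalysis` provides `instability_index()`, `isoelectric_point()`, and `gravy()`. The sketch also makes three further assumptions: the features are z-scored before clustering; "bounds" is read as a per-cluster min/max box on the raw features; and the toy feature values, `eps`, and `min_pts` are illustrative rather than the parameters used in the study. In practice, `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` would replace the hand-written `dbscan` function.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]  # (length, instability index, pI, GRAVY)
NOISE = -1


def standardize(points: Sequence[Point]) -> List[Point]:
    """Z-score each feature so that no single feature dominates the distance."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    sds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
           for d in range(dims)]
    return [tuple((p[d] - means[d]) / sds[d] for d in range(dims)) for p in points]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain O(n^2) DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbours(i: int) -> List[int]:
        # Includes i itself, matching scikit-learn's min_samples convention.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:     # j is a core point: keep growing the cluster
                queue.extend(more)
        cluster += 1
    return [int(lab) for lab in labels]  # every point has been labelled by now


def cluster_bounds(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Tuple[Point, Point]]:
    """Per-cluster (min, max) box on each raw feature, ignoring noise."""
    bounds: Dict[int, Tuple[Point, Point]] = {}
    for p, lab in zip(points, labels):
        if lab == NOISE:
            continue
        lo, hi = bounds.get(lab, (p, p))
        bounds[lab] = (tuple(map(min, lo, p)), tuple(map(max, hi, p)))
    return bounds


def label_by_bounds(p: Point, bounds: Dict[int, Tuple[Point, Point]]) -> int:
    """Assign a full-dataset sequence to the first cluster whose box contains it."""
    for lab, (lo, hi) in sorted(bounds.items()):
        if all(l <= x <= h for x, l, h in zip(p, lo, hi)):
            return lab
    return NOISE


if __name__ == "__main__":
    # Illustrative feature vectors, not data from the study.
    sample = [(68, 30.0, 9.8, -0.30), (70, 32.0, 10.0, -0.20), (69, 31.0, 9.9, -0.25),
              (68, 40.0, 4.5, -0.40), (70, 42.0, 4.3, -0.35), (69, 41.0, 4.4, -0.38),
              (140, 55.0, 4.0, -0.90)]
    labels = dbscan(standardize(sample), eps=1.0, min_pts=2)
    print(labels)  # [0, 0, 0, 1, 1, 1, -1]
    bounds = cluster_bounds(sample, labels)
    print(label_by_bounds((69, 31.5, 9.85, -0.22), bounds))  # 0
```

Clustering only a small sample and then labeling the full set by simple bounds avoids running DBSCAN's pairwise-distance step over all sequences at once. The trade-off is that sequences falling outside every box end up as noise.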
We used these physical parameters to group histones into five distinct clusters of histone-like proteins (Figure 1a): basic singlets (cluster 1, blue), acidic singlets (cluster 2, red), acidic doublets (cluster 3, orange), acidic ‘miniatures’ (cluster 4, yellow), and acidic quadruplets (cluster 5, green). We selected the centroid sequence from each cluster and predicted its structure with AlphaFold3 (Figure 1b, Table 1). Basic and acidic singlets form the characteristic histone fold, resembling the experimentally determined structure of the basic singlet HMfB (PDB 1A7W). Cluster 3 histones comprise two histone fold domains linked in a single polypeptide chain (colored black in Figure 1b), and they are predicted to form a structure that resembles the HMfB homodimer (PDB 1A7W). The acidic miniature histone (cluster 4) is predicted to have an α2 helix shortened by one turn and a very rudimentary α3 helix; in this respect it resembles the bacterial histone Bd0055 (PDB 8VVX), although the latter is positively charged overall. Finally, cluster 5 histones are unusual in that they consist of a long acidic chain with four predicted histone fold motifs. The charge of archaeal histones has a bimodal distribution: a surprisingly large percentage (26.7%) of the 7,157 predicted histones are acidic, while histones with neutral charge are largely absent (Figure 1c).

Histones are used in seven different strategies in the archaeal domain

According to our cutoff, 3,931 (67%) of the 5,869 archaeal genomes in the GTDB encode at least one putative histone (Figure 2a). Because each sequence represented in Figure 1a is associated with a unique species, we were able to determine which genomes encode more than one histone and which combinations are the most prevalent.
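Once every histone carries a cluster label, the per-genome bookkeeping reduces to counting labels per genome and naming the combination. The sketch below is illustrative and has four assumptions. First, the input is a mapping from genome ID to the list of cluster labels of its histones. Second, the naming follows the seven strategies introduced in the next paragraphs (single 1, 2, 3, or 5; multiple 1; combination 1&2; combination 3&4). Third, genomes with extra copies inside a combination (e.g. two basic singlets plus one acidic singlet) are counted as that combination. Fourth, the genome IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def strategy(clusters: List[int]) -> str:
    """Name the histone strategy of one genome from its histones' cluster labels."""
    if not clusters:
        return "none"
    kinds = set(clusters)
    if len(clusters) == 1:
        # A lone cluster 4 histone is not one of the seven named strategies.
        return f"single {clusters[0]}" if clusters[0] != 4 else "other"
    if kinds == {1}:
        return "multiple 1"
    if kinds == {1, 2}:
        return "combination 1&2"
    if kinds == {3, 4}:
        return "combination 3&4"
    return "other"


def tally(genomes: Dict[str, List[int]]) -> Counter:
    """Count strategies over histone-encoding genomes only."""
    return Counter(strategy(c) for c in genomes.values() if c)


def coverage(counts: Counter) -> float:
    """Fraction of histone-encoding genomes that fall into a named strategy."""
    total = sum(counts.values())
    return (total - counts["other"]) / total if total else 0.0


if __name__ == "__main__":
    genomes = {"g1": [1], "g2": [1, 1], "g3": [1, 2], "g4": [3, 4],
               "g5": [5], "g6": [], "g7": [1, 3]}  # hypothetical genomes
    counts = tally(genomes)
    print(counts.most_common())
    print(f"share in named strategies: {coverage(counts):.2f}")
```

Applied to the full label table, `coverage` corresponds to the "> 98% of histone-encoding genomes" figure quoted below.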
About 60% of all histone-encoding genomes have only a single histone gene, from cluster 1, 2, 3, or 5 (Figure 2a, b), and species encoding more than three histones are rare. We classify genomes encoding only a single histone as ‘single 1’, ‘single 2’, ‘single 3’, or ‘single 5’ according to its cluster, to distinguish them from genomes that contain combinations of histones, which may also include histones from the same clusters. Among genomes harboring more than one histone, the most prevalent strategy is two or more histones from cluster 1 (basic singlets), termed ‘multiple 1’ (the model organism M. fervidus is an example of this strategy). We also observe combinations of representatives from clusters 1 and 2 and from clusters 3 and 4, termed combination 1&2 and combination 3&4, respectively (Figure 2b, Supplemental Figure 2). Representatives from cluster 4 (acidic miniatures) are almost always paired with an acidic doublet (cluster 3), and cluster 5 histones always occur as the sole histone-encoding gene. To simplify our analysis, we focused on general trends and restricted further analysis to the seven most prevalent strategies (single 1, 2, 3, and 5; multiple 1; combination 1&2; and combination 3&4), which together represent > 98% of histone-encoding archaeal genomes (indicated by a line in Figure 2b).

Some strategies are taxonomically restricted

Histones that fall into cluster 1 (basic singlets) are widely dispersed across the entire archaeal domain and indeed seem to represent the typical archaeal histone (Figure 3a). They occur either as the sole histone or in combination with other basic singlets throughout the domain. Acidic singlets (cluster 2) are also pervasive, either as the only histone in the genome or paired with a basic singlet. In contrast, histones from clusters 3, 4, and 5 are phylogenetically restricted to specific taxa.
In particular, histones from cluster 3 (acidic doublets) are mostly restricted to the class Halobacteria, while representatives of cluster 5 (acidic quadruplets) are exclusive to members of the order Poseidonales (Figure 3b). Of the two most frequent combinations of histones, combination 1&2 (basic and acidic singlet) occurs in large groups in Methanobacteria and Halobacteria and in smaller groups elsewhere in the domain (Figure 3c, Supplemental Figure 3). Combination 3&4 (acidic doublet and acidic miniature) is restricted to Halobacteria (Figure 3d). We also confirmed previous findings that histones are exceedingly rare in the class Thermoplasmata (formerly Crenarchaeota) and in the order Sulfolobales (Figure 3b, Supplemental Figure 3) 23. A full list of histones and their corresponding genomes can be found in the supplemental materials.

Selective pressures may influence strategy type

To understand the selective pressures associated with specific histone clusters or strategies, we searched the metadata linked to the GTDB genomes for correlations, focusing on genome size, GC content, coding density, and sampling location. Only two of the histone strategies (single 3 and combination 3&4) are found in organisms whose genomes are significantly larger and have a higher GC content than those of organisms that do not encode histones (Figure 4a, b). This is probably because increased GC content is a known adaptation to highly saline environments, and it is mostly halophiles that employ these strategies 35. Protein coding density is somewhat higher in genomes encoding cluster 5 histones (single 5; Figure 4c). Beyond these subtle differences, our analysis does not explain why a subgroup of archaea does not appear to employ histones. We also coded keywords found in the sampling-location annotations of the genomes and found that some environmental pressures appear to correlate with specific combinations of histones (Supplemental Table 2).
For example, archaea growing in anaerobic conditions tend to have combination 1&2 histones, and archaea found in extremely saline conditions seem to be enriched for combination 3&4 histones (Figure 4d, Supplemental Figure 4). It should be noted that these parameters are harder to quantify and verify, so these correlations should be interpreted with caution.

Sequence bias and conservation of archaeal histones

To better understand the differences in physical parameters among archaeal histones, we compiled the overall amino acid composition of each histone cluster (Supplemental Figure 5). There is an abundance of amino acids with a high propensity to form α-helices, such as alanine, isoleucine, leucine, and valine, as expected for histones, which are primarily α-helical. Within individual histone clusters, we saw enrichment of, or bias against, specific residues compared with the overall sequence composition of all archaea. Notably, archaeal histones outside cluster 1 have acidic isoelectric points (Figure 1c, Table 1). This is surprising, as eukaryotic histones invariably have a positive overall charge and require basic residues (arginine and lysine) to bind DNA effectively 36. Besides an enrichment in acidic residues, the acidic histones from clusters 3, 4, and 5 exhibit the classic halophilic protein adaptation of a compositional bias from lysine to arginine 35, 37. Histones from these groups are also characterized by a higher percentage of the aromatic amino acids phenylalanine and tyrosine, both of which are known to stabilize proteins in harsh environments (Supplemental Figure 5d, e, h) 37. Across all archaeal histones, tryptophan and cysteine are underrepresented compared with the universal proteome, perhaps because they are metabolically expensive 38. Overall, archaeal histones are much more divergent in their amino acid sequence than eukaryotic histones, which are among the most conserved proteins known 39.
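The composition comparisons described above amount to simple counting. The sketch below is illustrative, not the authors' code, and assumes plain one-letter protein sequences. It computes four things:

- per-residue percentages;
- a log2 enrichment of a histone group relative to a background set;
- the lysine-to-arginine ratio, used to spot the halophilic K-to-R shift;
- a crude net charge (Lys + Arg - Asp - Glu).

The net charge ignores His, the termini, and pKa shifts, so it is only a rough proxy for the isoelectric point. The example sequences are made up.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Percentage of each residue type, pooled over all sequences."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(s.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def log2_enrichment(group: Dict[str, float], background: Dict[str, float]) -> Dict[str, float]:
    """log2(group % / background %); positive values mean enriched in the group."""
    return {aa: math.log2(pct / background[aa])
            for aa, pct in group.items()
            if pct > 0.0 and background.get(aa, 0.0) > 0.0}


def k_to_r_ratio(seqs: Iterable[str]) -> float:
    """Lys/Arg count ratio; halophilic proteins tend to have low values."""
    c = Counter("".join(seqs).upper())
    return c["K"] / c["R"] if c["R"] else math.inf


def crude_net_charge(seq: str) -> int:
    """K + R - D - E: a rough charge proxy ignoring His, termini and pKa shifts."""
    c = Counter(seq.upper())
    return c["K"] + c["R"] - c["D"] - c["E"]


if __name__ == "__main__":
    basic = ["MAKKRVTKAG", "MGKRKLSKA"]      # made-up sequences
    acidic = ["MSEDRRVTDE", "MEDRRLTDEA"]    # made-up sequences
    background = composition(basic + acidic)
    print("log2 Glu enrichment in acidic set:",
          log2_enrichment(composition(acidic), background)["E"])
    print("K/R ratios:", k_to_r_ratio(basic), k_to_r_ratio(acidic))
    print("crude net charges:", [crude_net_charge(s) for s in acidic])
```

For real comparisons, the background would be the pooled predicted proteomes rather than the histones themselves.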
The degree of conservation is particularly high in histones from halophilic organisms (cluster 3 and 4 histones). This could reflect either a sampling bias toward closely related halophilic organisms or a biological restriction to residues that facilitate function in hypersaline environments (acidic and aromatic residues, see above). Cluster 1 and 2 histones that are the sole histone in the genome exhibit no characteristic sequence features compared with histones that co-exist with others (Supplemental Figure 3, compare panels a and b with e, f, and g). Similarly, cluster 3 histones have the same sequence features whether they occur alone or with cluster 4 histones. Cluster 4 is notable: its ~150 members are almost universally 55 amino acids long and share many characteristic features with the bacterial histone Bd0055 5 (Figure 1b). Cluster 5 histones are unique in that they always occur as the sole histone-encoding gene, and their sequences are not well conserved. Aside from the first histone fold motif, these sequences lack many of the classic histone signatures (see below), suggesting that they may have co-opted the histone fold to perform a different function in the cell.

Specific ‘histone signature motifs’ are common to the majority of histones (boxed sequences in Supplemental Figure 6, and shown for HMfB in panel j). The ‘RKTV motif’ is located in the L2 loop connecting helices α2 and α3. The first three amino acids of this motif (RKT) are present in nearly all archaeal histones. The V is often substituted by I or L, but this position always holds a hydrophobic amino acid. In all known histone structures from all domains of life, this loop pairs with the less conserved L1 loop of the second histone in the histone fold dimer to form the L1L2 DNA-binding motif 2. In the L1 loop, the ‘RV motif’ is found in the majority of archaeal histones.
The valine of the RV motif packs against the conserved hydrophobic side chain of the L2 RKTV motif to stabilize the underside of the paired L1-L2 loops. The arginine of the RV motif, stabilized by the threonine of the RKTV motif (the RT pair), extends into the minor groove of the DNA. We also note the strong conservation of a salt bridge that holds the L2 loop in its critical conformation (the ‘R-D clamp’). This salt bridge involves the arginine of the RKTV motif and a conserved aspartate invariably located 7 amino acids downstream in the α3 helix of the histone fold, even in the rudimentary α3 helix of cluster 4 histones (Supplemental Figure 6i). In eukaryotic, archaeal, and viral nucleosomes of known structure, the fixed L1L2 configuration poises the main chain of both loops to contact the DNA phosphodiester backbone and orients the arginine in the L1 loop (RV motif) to point into the compressed minor groove of the DNA. These conserved amino acids in the L1 and L2 loops therefore represent a universal histone signature, in addition to the ability to form histone fold dimers, that might be useful for identifying other histone-like proteins. Unique to archaeal histones is a glycine in the L1 loop that we previously showed to be essential for hypernucleosome formation by T. kodakarensis histone HTkA 4. This glycine is also highly conserved throughout histone clusters 1, 2, and 3 (for cluster 3, only in the N-terminal histone fold domain), but not in clusters 4 and 5. This suggests that histones of clusters 1–3 might be able to form closely stacked hypernucleosomes.

Structural prediction of histone complexes: histone homo- and heterodimers

To predict whether multiples of histones might bind DNA and fold into nucleosome-like structures, we used AlphaFold3 to build models of a representative histone from each strategy as dimers or tetramers 40.
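The choice of representative is detailed in the next paragraph: the sequence closest to the center of mass of each strategy in the four-parameter space. This can be sketched as a nearest-to-centroid search. The text does not say whether the features were rescaled, so z-scoring them first is an assumption of this sketch. It keeps length, which spans tens of residues, from swamping GRAVY, which spans fractions of a unit. The sequence IDs and feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, float, float, float]  # length, instability index, pI, GRAVY


def zscore_rows(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Column-wise z-scores; a constant column maps to all zeros."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def representative(features: Dict[str, Features]) -> str:
    """ID of the sequence nearest the unweighted center of mass of the group."""
    ids = list(features)
    rows = zscore_rows([features[i] for i in ids])
    centre = [sum(col) / len(rows) for col in zip(*rows)]
    return min(zip(ids, rows), key=lambda item: math.dist(item[1], centre))[0]


if __name__ == "__main__":
    group = {"seqA": (66, 28.0, 9.6, -0.31),   # hypothetical strategy members
             "seqB": (70, 34.0, 10.1, -0.22),
             "seqC": (68, 31.0, 9.9, -0.27)}
    print(representative(group))
```

Picking an actual member of the group, rather than the centroid itself, guarantees that the representative is a real sequence that can be submitted to structure prediction.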
To choose unbiased representative histones for each of the seven strategies, we calculated the center of mass, in the four dimensions used in Figure 1, of the histones in each strategy (Figure 2) and identified the sequence closest to that point. For strategies involving two histones per genome, we chose the genome encoding the histone closest to the center of mass of one of the clusters and used both histones from that genome as representatives (Table 2). The representatives from clusters 1, 2, and 3 form homodimers that resemble known structures of histone fold homodimers (Supplemental Figure 7). The N- and C-terminal regions of single 5 histones can adopt conformations similar to histone fold dimers through intra- and interchain interactions, respectively. Basic histones that occur in genomes together with other histones are likely able to form both homo- and heterodimers (as shown experimentally for M. fervidus, an organism that employs the multiple 1 strategy) 41. In the median organism representing this strategy, the second histone lacks a well-defined α3 helix, yet it is also able to homo- and heterodimerize in silico (Supplemental Figure 7b). A number of histones from this cluster appear to have a prematurely terminated α3 helix, yet are still able to form homo- and heterodimers (not shown). In the median organism combining a basic and an acidic singlet in its genome (combination 1&2), homo- and heterodimers are predicted with similarly high confidence (Supplemental Figure 7c). Notably, even the models of acidic histones maintain a basic putative DNA-binding ridge on their outer surface. In our representative of the combination 3&4 strategy, which combines one acidic doublet with one acidic miniature, the doublet folds into a structure very similar to a canonical histone fold dimer.
The acidic miniature histone from cluster 4 is predicted to fold into a homodimer that more closely resembles the bacterial histone Bd0055 than HMfA or HMfB.

Some, but not all, archaeal histone fold dimers form tetramers via a four-helix bundle structure

The ability of histone fold dimers to form tetramers through well-defined four-helix bundle (4HB) structures is a hallmark of all canonical nucleosomes 3, 4, 6. This interface is formed by the pairing of the C-terminal end of the long α2 helix and the α3 helix of two separate histone dimers (Supplemental Figure 8a, circled). We used AlphaFold3 to predict whether the histone fold dimers shown in Supplemental Figure 7 are capable of forming tetramers through the 4HB or by any other means. Note that we display the solution with the highest confidence, with the acknowledgement that in some instances alternative solutions are generated with only slightly less favorable ipTM scores (see, e.g., Supplemental Figure 8d, inset). Representative histones from strategies using a single cluster 2 or 3 histone are all predicted to form homotetramers through canonical 4HB assemblies that resemble those of archaeal HMfB and of eukaryotic histones H2B-H4 and H3-H3’ 3, 4. Basic singlets from ‘single 1’ may form closed tetramers (as is the case for our median histone sequence) as well as open, canonical tetramers. Note that such closed tetramers have not been observed experimentally for any archaeal histone with the typical 28-amino-acid-long α helix, whereas open tetramers have been visualized in various complexes with DNA 4, 30, 31. In our experience, these predictions should be taken with a healthy dose of skepticism. For HMfB, whose structures are known, AlphaFold3 predicts a closed and an open tetramer, as well as a ‘back-to-back’ tetramer that does not involve the 4HB, with closely spaced confidences. When the calculation includes DNA, however, it generates an open tetramer resembling the experimentally determined structure.
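The caveat above about closely spaced confidences can be made explicit. Instead of reporting only the top-ranked model, the sketch also reports every alternative whose ipTM falls within a chosen margin of it. The model records are plain dictionaries with an `iptm` key; AlphaFold3 reports ipTM per prediction, but loading those values is left out here. Both the 0.05 margin and the example scores are arbitrary, illustrative choices, not values from the study.

```python
from typing import Dict, List, Tuple

Model = Dict[str, object]  # e.g. {"name": "model_0", "iptm": 0.64}


def best_and_close_calls(models: List[Model], margin: float = 0.05) -> Tuple[Model, List[Model]]:
    """Return the top model by ipTM and every other model within `margin` of it."""
    if not models:
        raise ValueError("no models to rank")
    ranked = sorted(models, key=lambda m: float(m["iptm"]), reverse=True)
    best = ranked[0]
    top = float(best["iptm"])
    close = [m for m in ranked[1:] if top - float(m["iptm"]) <= margin]
    return best, close


if __name__ == "__main__":
    models = [{"name": "model_0", "iptm": 0.62}, {"name": "model_1", "iptm": 0.64},
              {"name": "model_2", "iptm": 0.60}, {"name": "model_3", "iptm": 0.40}]
    best, close = best_and_close_calls(models)
    print(best["name"], [m["name"] for m in close])
    if close:
        print("warning: top arrangement is not clearly separated from alternatives")
```

A non-empty `close` list is a prompt to inspect the alternative arrangements, not evidence for any one of them.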
No combination of histone fold dimers from the single 5 representative is predicted to form higher-order assemblies mediated by a 4HB (Supplemental Figure 8b, green). Our representative basic histone that co-occurs with a second basic histone (multiple 1) is predicted to form tetramers only from a homodimer of histone A. Histone B alone, or combined with histone A, does not form a tetramer in silico, but we did not explore whether this is a general phenomenon of histones from this cluster (Supplemental Figure 8c). Histones from combination 1&2 (a basic and an acidic singlet; a widespread combination) are predicted to form open tetramers either from the basic histone alone or from basic-acidic histone fold dimers (Supplemental Figure 8d). The acidic histone fold homodimer can form a tetramer through a variety of arrangements with nearly the same confidence (inset). Finally, the combination of an acidic doublet and an acidic miniature (combination 3&4), which is specific to and prevalent in halophiles, is not predicted to form a heterotetramer. While the acidic doublet forms an open ‘tetramer’ structure, the acidic miniature assembles into either ‘back-to-back’ (shown) or face-to-face tetramers with similar confidence.

AlphaFold3 predictions and simulations suggest that most histones form stable structures with DNA

To explore whether these systems might form plausible complexes with DNA, we used AlphaFold3 to predict structures with DNA and then evaluated their physical stability using all-atom molecular dynamics simulations. We predicted nucleosome models containing the equivalent of 8 histone folds from each histone strategy and 147 bp of DNA, sufficient to form a nucleosome-like arrangement.
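Setting up such a prediction mostly comes down to two steps. Chain copy numbers are chosen so that the complex contains the equivalent of eight histone folds, and two complementary 147-nt DNA strands are added. The sketch below builds a job dictionary in the shape of the AlphaFold 3 input JSON (`name`, `modelSeeds`, `sequences` with `protein` and `dna` entries, `dialect`, `version`). The field layout should be checked against the documentation of the AlphaFold 3 version in use. Splitting the fold budget evenly between the histones of a multi-histone strategy is an assumption. The protein and DNA sequences are placeholders, not those used in the study.

```python
import json
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")
CHAIN_IDS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def copies_for_fold_budget(folds_per_chain: Dict[str, int], total_folds: int = 8) -> Dict[str, int]:
    """Copies of each chain so its histone folds fill an equal share of total_folds."""
    share, rem = divmod(total_folds, len(folds_per_chain))
    if rem:
        raise ValueError("fold budget does not split evenly between chain types")
    copies = {}
    for name, folds in folds_per_chain.items():
        n, rem = divmod(share, folds)
        if rem:
            raise ValueError(f"{name}: {share} folds is not a multiple of {folds}")
        copies[name] = n
    return copies


def af3_job(name: str, proteins: Dict[str, str], folds_per_chain: Dict[str, int], dna: str) -> dict:
    """Histone chains plus both strands of a DNA duplex, in AlphaFold 3 input-JSON layout."""
    copies = copies_for_fold_budget(folds_per_chain)
    ids = iter(CHAIN_IDS)
    entries = []
    for chain, seq in proteins.items():
        chain_ids = [next(ids) for _ in range(copies[chain])]  # one ID per copy
        entries.append({"protein": {"id": chain_ids, "sequence": seq}})
    entries.append({"dna": {"id": next(ids), "sequence": dna.upper()}})
    entries.append({"dna": {"id": next(ids), "sequence": reverse_complement(dna)}})
    return {"name": name, "modelSeeds": [1], "sequences": entries,
            "dialect": "alphafold3", "version": 1}


if __name__ == "__main__":
    dna_147 = "ACGT" * 36 + "ACG"  # placeholder 147-nt strand, not a positioning sequence
    job = af3_job("multiple1_A_plus_B",
                  {"histone_A": "MAKKGE", "histone_B": "MSKREA"},  # placeholder sequences
                  {"histone_A": 1, "histone_B": 1}, dna_147)
    print(json.dumps(job, indent=2))
```

With this rule, a singlet gets 8 copies, a quadruplet gets 2, and a doublet-plus-miniature pair gets 2 doublets and 4 miniatures.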
All combinations that are able to form canonical, open tetramers via the 4HB are predicted to wrap DNA around the outside of the histone torus (Supplemental Figure 9). So is the representative that is predicted to form closed tetramers in the absence of DNA (Supplemental Figure 9c; this is also the case for HMfB). Cluster 4 and 5 histones are not predicted to wrap DNA. Note that the ipTM scores are rather low for all models except those of combination 1&2 (Supplemental Figure 9d). We ran molecular dynamics simulations of all nucleosome-like models for 100 ns in triplicate to allow for relaxation and to sample conformational flexibility. Our goal was to determine whether the AlphaFold3 models shown in Supplemental Figure 9 are energetically plausible. In these simulations, the human nucleosome and the archaeal HMfB complex (for which there is a structure on shorter DNA, PDB 5T5K) remained tightly wound and underwent little conformational change, as judged by minimal movement of the DNA during the simulation (Figure 5b). The median representative histone from an organism employing the same strategy (multiple 1) also formed stable nucleosome-like structures. Acidic single 2 histones form plausible structures, although they seem somewhat destabilized compared with those formed by the basic histones. Structures predicted with the single 3 histone unraveled, eventually losing the protein-protein interactions crucial for maintaining a tightly wound nucleosome. Because the model built from the original median histone had an open structure, we also simulated a second nucleosome of this type that was predicted to form a closed structure (shown in Supplemental Figure 9). Both simulations, however, resulted in similar ‘final’ structures with open conformations. Nevertheless, even these acidic cluster 2 and cluster 3 histones maintain their interaction with DNA throughout the simulation.
Note that the single 2 and single 3 strategies are mostly found in halophilic organisms, so simulations of these complexes should likely be performed at much higher ionic strength. Indeed, similar simulations in 2 M KCl resulted in structures that remained closed (not shown). We also ran simulations of nucleosomes from genomes that encode multiple histones. We analyzed these histones in isolation and in combination (Figure 5), allowing us to ask whether they might require a partner to form stable nucleosomes. For the two basic histones from the genome representing multiple 1 genomes, nucleosomes made from both histones combined or from histone B alone remained closed over the simulation. In contrast, nucleosomes made from histone A alone appeared unstable (Figure 5a, b). This could suggest a means by which genome accessibility might be regulated, an idea recently supported by experimental data 31. AlphaFold3 was unable to assemble both histones from combination 3&4 into a nucleosome structure. We therefore manually placed the cluster 4 histone (which co-occurs with cluster 3 histones) into the obvious gap left in the nucleosome predicted from cluster 3 histone dimers alone. We also simulated a nucleosome made from cluster 3 histones in isolation. Both approaches resulted in unstable structures, except for one of the three combination 3&4 simulations, in which a wrapped conformation was maintained for the full duration of the simulation. Whether this acidic doublet combines with the acidic miniature histone to form a nucleosome-like structure remains unknown. These structures likely require precisely oriented histones or high ionic strength to function properly, as organisms encoding them often use a ‘salt-in’ strategy to cope with high levels of extracellular salt. Predictions with cluster 4 or cluster 5 histones alone did not result in nucleosome-like structures.
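The 2 M KCl simulations mentioned above need far more ions than a physiological-salt setup, and the difference is easy to estimate. The number of KCl pairs is concentration × Avogadro's number × volume, which works out to about 0.602 pairs per nm³ at 1 M. Counter-ions are then added to neutralize the net charge of the DNA-histone complex. This back-of-the-envelope sketch uses the whole box volume and ignores the volume occupied by solute, which is a simplification. In practice, MD setup tools such as GROMACS `gmx genion` (with `-conc` and `-neutral`) perform this bookkeeping. The box size and net charge in the example are hypothetical.

```python
from typing import Tuple

AVOGADRO = 6.02214076e23  # 1/mol, exact SI value
LITRES_PER_NM3 = 1e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def salt_pairs(conc_molar: float, volume_nm3: float) -> int:
    """1:1 salt ion pairs giving conc_molar in volume_nm3 (whole box, solute not excluded)."""
    return round(conc_molar * AVOGADRO * volume_nm3 * LITRES_PER_NM3)


def kcl_ions(conc_molar: float, volume_nm3: float, net_charge: int) -> Tuple[int, int]:
    """(K+, Cl-) counts: bulk KCl plus counter-ions that neutralize net_charge (in e)."""
    pairs = salt_pairs(conc_molar, volume_nm3)
    return pairs + max(0, -net_charge), pairs + max(0, net_charge)


if __name__ == "__main__":
    box = 15.0 ** 3  # hypothetical 15 nm cubic box, in nm^3
    for c in (0.15, 2.0):
        # net_charge is illustrative: a DNA-histone complex is typically net negative
        print(f"{c} M KCl -> (K+, Cl-) =", kcl_ions(c, box, net_charge=-150))
```

At 2 M, the bulk salt outnumbers the neutralizing counter-ions by roughly an order of magnitude more than at 0.15 M. This is one reason such systems take longer to equilibrate.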
Above all, these investigations highlight the limitations of AlphaFold3 in predicting histone assemblies beyond histone fold dimers and emphasize the need for at least some degree of energy minimization.

Discussion

Archaea are a diverse group of organisms that have adapted to a wide range of extreme environments. Over billions of years of evolution, this domain of life has diversified to meet unique challenges, and these adaptations presumably include strategies to protect and package genomes. This is particularly relevant for organisms that thrive at extreme temperatures, pH, and ionic strengths, all of which challenge genome integrity. While some archaea package their genomes exclusively with non-histone proteins such as Cren7, Alba, or Sul7d, the majority rely on histones 28. Our search of available archaeal genomes reveals that 67% of them encode histones, which we group into five clusters based on length, charge, hydrophobicity, and instability index. Because we used a rather conservative cutoff, the true percentage could well be higher. The majority of histones are predicted to comprise a single, mostly tail-less histone fold domain with basic charge. Four of the five clusters of archaeal histones (together 26.7% of all histones) are acidic in character. They are either of canonical length (cluster 2) or encode multiple histone folds in a single chain (clusters 3 and 5). Alternatively, they are predicted to have shorter α2 and α3 helices, resembling bacterial histones at least architecturally, if not in overall charge (cluster 4). Histones are encoded in genomes either alone or together with other histones, most commonly with another histone of the same cluster or with a histone from a different cluster.
Intriguingly, 269 genomes encode an acidic and a basic histone with < 60% sequence identity, a diversification of histone sequences that precedes the split into H2A, H2B, H3, and H4 that must have occurred in early eukaryotes. Using the structure prediction tool AlphaFold3, we predicted the structures of the ‘median histone’ for each cluster/strategy, in different oligomerization states and in complex with DNA. We deliberately chose the median histones rather than the nearest model organism to avoid bias and to best represent each cluster and strategy. Of note, many of these histones are derived from metagenomes, and the corresponding organisms have not yet been cultivated. Our analyses suggest that four of the five clusters form canonical histone fold dimers, most of which tetramerize via a four-helix bundle (4HB) interface, a hallmark of eukaryotic histone interactions. Cluster representatives that tetramerize via a 4HB, either alone or in combination, are predicted to organize DNA into nucleosome-like structures that remain stable in molecular dynamics simulations. These structures are similar to the experimentally determined structure of a cluster 1 histone in complex with DNA, which we showed forms a ‘hypernucleosome’ that may flex and open stochastically 4 , 30 . Histone-DNA interactions are maintained throughout the simulations for representatives of most clusters and strategies, even for those with an overall acidic character. This is probably because even the acidic histones retain the ‘basic ridge’ along their outer surface that serves as a DNA-binding surface. Of note, our simulations have not yet considered the diverse environments that our ‘median organisms’ might inhabit. For example, cluster 3 histones are found mostly in halophiles, so simulations at high (> 2 M) ionic strength would better test the plausibility of their histone-DNA complexes.
Our data suggest that cluster 5 histones, even though they form plausible histone fold dimers, might not function in genome organization, and the role of cluster 4 histones (short acidic histones whose length is tightly restricted to around 55 amino acids and which co-occur with cluster 3 histones) remains unresolved. Importantly, given the limitations of AlphaFold (also demonstrated here), these predictions must be verified experimentally. While eukaryotes have largely selected for a narrow and conserved set of four histone sequences (plus a variety of histone variants 8 ), archaea appear to achieve genome organization with histones of much higher sequence diversity, using multiple combinatorial strategies. Nevertheless, the vast sequence space has brought into focus universal, functionally linked histone signatures: the RKTV motif and the R-D clamp in the L2 loop of the histone fold, the RV motif in the L1 loop, and the RT pair (Supplemental Fig. 6j). In combination, these motifs rigidify the L1L2 pairing so that it can make main-chain interactions with the phosphodiester backbone of the DNA, and orient an arginine to protrude into the compressed minor groove of DNA (referred to as a sprocket arginine) 42 . These signatures were described more than 25 years ago and are reinforced here in a vastly expanded sequence space. The diversification of histone sequences outside these motifs likely allows archaea to adapt to a diverse and extreme set of intracellular conditions that could not be tolerated by eukaryotic systems, and might allow them to live in these environments without compartmentalizing their genomes. Recently, other groups have used different tools to sample histone diversity across both archaea and bacteria. In a study by Dame and colleagues, histone sequences from archaea and bacteria were clustered into different groups based on sequence features 14 .
This approach led the team to focus mainly on an array of bacterial histone sequences that are fused to other functional domains and whose functions are largely unknown. The work highlighted the power of approaches like HMMsearch to find disparate sequences that may fold into similar structures. Our study emphasizes the need to explore these understudied and diversified classes of histones, and the biology of organisms that may otherwise be overlooked. By selecting organisms that broadly sample the diversity of archaeal histones, we can allocate resources strategically to maximize discovery. As many of these organisms have never been cultured, a logical next step is to use structural biology and biochemistry to uncover how these histones physically structure DNA. An intriguing addition to the sparse set of experimental structures has recently been published as a preprint, suggesting subtleties of archaeal chromatin structure that arise from variations in histone sequence 31 . As recent breakthroughs in culturing (and, one would hope, genetically manipulating) archaea revolutionize the field 43 – 46 , hypotheses derived from biophysical characterization could eventually be tested in the cell.
Perspective
Our work highlights the power of structure prediction tools such as AlphaFold, yet also demonstrates that they cannot (yet) replace experimental structures and biophysical analyses. To use these predictive tools properly, context and prior knowledge are necessary to avoid over-interpretation. For example, AlphaFold predicted the tetrameric structures of many histones to adopt conformations that appeared ‘closed’, yet when the same histones were modeled together with a DNA sequence that is overrepresented among nucleosome structures in the PDB, they formed nucleosome-like structures. AlphaFold and similar tools are built on massive amounts of training data and usually perform well when re-predicting structures they were trained on.
Some models violate basic physical constraints, placing atoms on top of other atoms and predicting structures that fall apart in molecular dynamics simulations (Fig. 5). At least for now, and for this system, the predictions are not ready to stand on their own without experimental validation, especially for complex models beyond the histone tetramer, and in the presence of DNA.
Methods
Histone identification and HMMsearch optimization
Predicted archaeal protein sequences were downloaded from GTDB, release 220 (https://gtdb.ecogenomic.org/). This dataset included 11,277,496 proteins from 5,869 genomes, each with a specific taxonomic lineage. 7,140 putative histones were identified using HMMsearch against PF00125 (Pfam family for eukaryotic histones) and PF00808 (Pfam family for archaeal histones). To establish confidence thresholds for JackHMMER and HMMsearch, we screened a range of expectation values (E-values) for each search strategy, from values low enough to return no hits to values high enough that the number of hits was limited by filters built into the HMM algorithm (Supplemental Fig. 1b). Most search strategies collected hits slowly up to an inflection point, after which the number of hits increased rapidly. We reasoned that beyond this point the search models return mostly noise. By iteratively clustering around this inflection point, we determined that hits above these E-values mostly constituted noise. Although most hits at the inflection point overlapped between search strategies, small outlier groups existed, so we combined the hits from both Pfam searches around the inflection point and performed the rest of our analysis on this set (Supplemental Fig. 1c). We ultimately used E = 4.0 for PF00125 and E = 0.1 for PF00808.
DBSCAN clustering
Histone sequences were imported with associated metadata from GTDB. Ambiguous sequences were filtered out.
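The E-value screen described under Histone identification and HMMsearch optimization (hits accumulating slowly, then rising sharply past an inflection point) can be illustrated with a small, dependency-free Python sketch. It assumes the screen yields paired lists of ascending E-values and hit counts, and it locates the inflection with a simple "knee" heuristic: after rescaling log10(E) and hit counts to [0, 1], the knee is the point lying farthest below the straight line joining the first and last points. This heuristic, the hypothetical function names `find_inflection` and `combine_hits`, and the toy numbers are illustrative assumptions, not the authors' procedure, which relied on iterative clustering around the inflection point.

```python
import math


def find_inflection(evalues, hits):
    """Return the E-value at the knee of a hits-versus-E-value curve.

    Simplified knee heuristic (an assumption, not the published method):
    x = log10(E) and y = hits are rescaled to [0, 1]; for a curve that rises
    slowly and then sharply, the knee is the point farthest below the chord.
    """
    if len(evalues) != len(hits) or len(evalues) < 3:
        raise ValueError("need at least three matched (E-value, hits) points")
    xs = [math.log10(e) for e in evalues]
    x0, x1 = xs[0], xs[-1]
    y0, y1 = hits[0], hits[-1]
    if x1 == x0 or y1 == y0:
        raise ValueError("curve must span a range of E-values and hit counts")
    best_index, best_gap = 0, float("-inf")
    for i, (x, y) in enumerate(zip(xs, hits)):
        x_norm = (x - x0) / (x1 - x0)
        y_norm = (y - y0) / (y1 - y0)
        gap = x_norm - y_norm  # vertical distance below the diagonal chord
        if gap > best_gap:
            best_index, best_gap = i, gap
    return evalues[best_index]


def combine_hits(hits_a, hits_b):
    """Union of sequence IDs passing each search's threshold (e.g. two Pfam searches)."""
    return sorted(set(hits_a) | set(hits_b))


if __name__ == "__main__":
    # Toy screen: hits grow slowly, then jump after E = 1e-2.
    evalues = [1e-8, 1e-6, 1e-4, 1e-2, 1.0, 100.0]
    hits = [6000, 6500, 6800, 7000, 20000, 40000]
    print(find_inflection(evalues, hits))  # prints 0.01
```

On the toy curve the knee falls at E = 0.01, the last point before the hit count climbs steeply. With real search output, such a value would only be a starting point for the clustering-based check described in the Methods.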
Physical parameters of the sequences were calculated using the ProtParam module of the Bio.SeqUtils Python package (Biopython). Histones were clustered with DBSCAN, as implemented in scikit-learn, with parameters (ε = 0.5, n = 40) optimized for a silhouette score of 0.25, on the four parameters with the highest variance: length, pI, GRAVY, and helical propensity. Parameters were standardized prior to clustering by z-score normalization. To determine these clustering parameters, the data were randomly subsampled and clustered with a range of parameters to optimize the silhouette score (Supplemental Fig. 1). A target silhouette score of 0.25 was chosen because it reproduced clusters reliably over many rounds of clustering. After optimization, the parameters were applied to two additional data subsets, verifying that the same number of clusters of roughly the same size was found in each. The physical parameters of each cluster were then calculated from the three subsets to define cluster boundaries for the whole dataset. These ranges were tested on another three random subsets that were clustered independently, verifying that the labels matched 95% of points in each test. The verified ranges were then used to label all points in the overall dataset. Proteins that failed to fall into one of the five clusters were removed from further analysis. Centers of mass and nearest neighbors were calculated for each cluster in standardized space and mapped back to real space. Edges were drawn linking histones from the same organism. Histones were then sorted into genomes, and common strategies were identified. Taxonomic data from GTDB were then used to map histone strategies onto a taxonomic tree using iTOL 47 .
Metadata correlation
After histones were assigned to strategies, a practical cutoff of 100 histones per group was applied to simplify analysis. Histones from groups that did not meet the cutoff were not used for further analysis but were still included in the database.
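The steps under DBSCAN clustering (z-score standardization, density-based clustering, and picking the member nearest each cluster's center of mass) can be sketched in plain Python, which makes the algorithm's logic visible. This is a minimal re-implementation under stated assumptions, not the authors' scikit-learn code: each row is assumed to hold per-histone features (in the study, length, pI, GRAVY, and helical propensity), `n` in the text is read as DBSCAN's minimum-samples parameter, standardization uses the population standard deviation, and the representative is taken as the member nearest to the cluster's center of mass in standardized space. The function names and toy data are hypothetical.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and population SD 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by 0
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def dbscan(points, eps, min_samples):
    """Plain DBSCAN: one cluster label per point, -1 for noise.

    A point is a core point if at least `min_samples` points (itself
    included) lie within distance `eps`; clusters grow outward from cores.
    """
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # non-core point reached from a core: border
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points keep expanding
    return labels


def nearest_to_centroid(points, labels):
    """Map each cluster label to the index of the member closest to its center of mass."""
    result = {}
    for c in sorted(set(labels) - {-1}):
        members = [i for i, lab in enumerate(labels) if lab == c]
        centre = [statistics.fmean([points[i][k] for i in members])
                  for k in range(len(points[0]))]
        result[c] = min(members, key=lambda i: math.dist(points[i], centre))
    return result


if __name__ == "__main__":
    # Toy [length, pI] rows: two tight groups plus one outlier.
    rows = [[70, 9.5], [71, 9.4], [69, 9.6], [70, 9.6], [71, 9.5],
            [143, 4.7], [142, 4.8], [144, 4.6], [143, 4.8], [142, 4.7],
            [262, 7.0]]
    z = zscore_columns(rows)
    labels = dbscan(z, eps=0.5, min_samples=3)
    print(labels)  # prints [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
    print(nearest_to_centroid(z, labels))  # prints {0: 0, 1: 5}
```

In scikit-learn the corresponding calls would be `StandardScaler().fit_transform(X)` followed by `DBSCAN(eps=0.5, min_samples=40).fit_predict(...)`, with `silhouette_score` from `sklearn.metrics` driving the parameter search; the toy example uses a much smaller `min_samples` because it has only eleven points.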
Metadata associated with each strategy were aggregated, and comparisons to genomes without histones (No histones group) were performed using the Shapiro test from the SciPy.stats Python package. We chose this test to deal with comparisons between datasets with unequal variances. We chose the metadata we considered most relevant to understanding the presence of histones: genome size, genomic GC percentage, and gene coding density.
Environmental pressure correlation
We manually extracted the location data associated with each genome and coded keywords in each location to a set of standardized locations, which encompassed most of the genomes in the dataset. We then associated each of these locations with the environmental pressure(s) they most likely impose. A full list of keywords and coding can be found in the scripts.
Sequence conservation
Conservation of histones from each group was calculated by taking the average occupancy at each position of the aligned histones (aligned with MUSCLE) using a custom Python script 48 . ‘Highly conserved’ residues are those whose conservation is at least one standard deviation above the mean conservation for that alignment. Conservation was calculated for each type of histone, both before and after strategies were assigned.
Compositional bias
Amino acid composition of histone groups was calculated in Python using NumPy and plotted using Matplotlib. Composition was calculated on a per-histone basis, not as an aggregate of all residues from all histones in a group. Because most histones in each group were of similar length, this normalization did not have a drastic effect, but it still seemed appropriate to correct for a bias towards the composition of longer sequences.
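The calculations under Sequence conservation and Compositional bias can be sketched with the Python standard library (the authors used a custom script and NumPy). The sketch rests on assumptions not stated in the text: "occupancy" is read as the fraction of sequences carrying the most common residue in an aligned column, gaps never count toward that residue, the standard deviation is the population SD, and the input is already aligned (e.g. by MUSCLE). Function names are hypothetical.

```python
from collections import Counter
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(alignment):
    """Per-column conservation: fraction of sequences sharing the column's
    most common residue. Gaps ('-') never count as that residue (assumption)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


def highly_conserved(scores):
    """Indices of columns at least one SD above the alignment's mean score."""
    threshold = statistics.fmean(scores) + statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if s >= threshold]


def mean_composition(sequences, alphabet=AMINO_ACIDS):
    """Average of per-sequence residue fractions, so each histone counts
    equally regardless of its length (unlike pooling all residues)."""
    totals = dict.fromkeys(alphabet, 0.0)
    for seq in sequences:
        counts = Counter(seq)
        n = sum(counts[a] for a in alphabet)
        if n == 0:
            raise ValueError("sequence contains no standard residues")
        for a in alphabet:
            totals[a] += counts[a] / n
    return {a: totals[a] / len(sequences) for a in alphabet}


if __name__ == "__main__":
    aln = ["RKAVT", "RKAVT", "RELIS", "R-GMN"]
    scores = column_conservation(aln)
    print(scores)                    # prints [1.0, 0.5, 0.5, 0.5, 0.5]
    print(highly_conserved(scores))  # prints [0]
```

For the two toy sequences `KK` and `KEEE`, pooling all residues would give a lysine fraction of 0.5, whereas `mean_composition` averages the per-sequence fractions and gives 0.625; this per-histone weighting is the length correction described above.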
Structural prediction
We predicted the structures of histones from each strategy as dimers (two histone folds), tetramers (four histone folds), and nucleosomes (with the addition of 147 bp of dsDNA) using AlphaFold3, as implemented through the online server. We visually inspected each of the five models output by AlphaFold3 and proceeded with analysis on the highest-confidence model without major clashes (usually the top-ranked model, i.e., model 0). ipTM scores are reported in the figures.
Molecular dynamics simulations
AlphaFold3 nucleosome predictions were used as starting models for simulations. Models were prepared for simulation using ChimeraX 49 . The terminal phosphate of each DNA strand was removed (to prevent later simulation errors), models were protonated, and they were then subjected to a few frames of MD using the “Tug” function in ChimeraX, pulling on a single hydrogen atom at the terminus of a DNA strand. This “Tug” step allowed the AlphaFold3 model to relax atoms and resolve clashes orders of magnitude faster than doing so by hand. No gross topological changes were observed. All-atom molecular dynamics simulations with explicit solvent were carried out using AMBER with the ff14SB, bsc1, and TIP3P force fields (for protein, DNA, and water, respectively) 50 . Structures were protonated again with tLEaP, and hydrogen masses were repartitioned with ParmEd. Structures were placed in cubic boxes extending at least 25 Å beyond the structures, charge-neutralized using potassium and chloride ions, and solvated with water molecules. The structures were energy-minimized in two 5,000-step cycles: the first restraining the protein and DNA to allow solvent relaxation, and the second allowing full system relaxation. Minimized structures were then heated to 300 K and slowly brought to atmospheric pressure (1 atm = 1.01325 bar). The systems were then simulated for 100 ns with 4 fs time steps.
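As a worked example of the run-length arithmetic, 100 ns at 4 fs per step is 100,000,000 fs / 4 fs = 25,000,000 steps; a 4 fs step is normally made stable by hydrogen mass repartitioning, as used here. The Python sketch below turns that arithmetic into an illustrative AMBER production input. Only the `&cntrl` variable names are real AMBER options; the thermostat, barostat, cutoff, and output-frequency values, and the function name `production_mdin`, are assumptions rather than the authors' inputs (their scripts are in the repository listed under Code availability). AMBER expects `dt` in ps and `pres0` in bar, and replicate runs can differ in `ig`, the random seed, where `ig=-1` seeds from the clock.

```python
FS_PER_NS = 1_000_000  # femtoseconds per nanosecond


def production_mdin(length_ns=100, timestep_fs=4, temp_k=300.0,
                    pressure_bar=1.01325, seed=-1):
    """Return (n_steps, mdin_text) for an illustrative AMBER production run.

    Assumed settings (not taken from the authors' inputs): Langevin
    thermostat (ntt=3), Monte Carlo barostat with isotropic scaling
    (ntb=2, ntp=1, barostat=2), SHAKE on bonds to hydrogen (ntc=2, ntf=2),
    and a 9 Å direct-space cutoff. AMBER expects dt in ps and pres0 in bar.
    """
    total_fs = length_ns * FS_PER_NS
    if total_fs % timestep_fs:
        raise ValueError("run length must be a whole number of time steps")
    n_steps = total_fs // timestep_fs
    dt_ps = timestep_fs / 1000
    mdin = f"""Production MD, {length_ns} ns (illustrative sketch)
 &cntrl
  imin=0, irest=1, ntx=5,
  nstlim={n_steps}, dt={dt_ps},
  ntc=2, ntf=2, cut=9.0,
  ntt=3, gamma_ln=1.0, temp0={temp_k}, ig={seed},
  ntb=2, ntp=1, barostat=2, pres0={pressure_bar},
  ntpr=5000, ntwx=5000,
 /
"""
    return n_steps, mdin


if __name__ == "__main__":
    steps, text = production_mdin()
    print(steps)  # prints 25000000
    print(text)
```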
Simulations were performed in triplicate, restarting each simulation with a different random seed during the heating phase. Distances between phosphates on neighboring residues at the center of a DNA strand were calculated as a proxy for nucleosome unfolding in representative simulations.
Declarations
Supplemental information
Supplemental tables 1 and 2
Supplementary figures 1–9
Interactive 3D chart displaying clustering data
Spreadsheet containing physical parameters of all archaeal histones
Spreadsheet containing all classified histones organized by genome
Acknowledgments. This work utilized the Alpine high performance computing resource at the University of Colorado Boulder. Alpine is jointly funded by the University of Colorado Boulder, the University of Colorado Anschutz, Colorado State University, and the National Science Foundation (award 2201538) 51 . We also used the Blanca condo computing resource at the University of Colorado Boulder. Blanca is jointly funded by computing users and the University of Colorado Boulder 52 .
Funding. KL and SL are supported by the Howard Hughes Medical Institute. The Alpine computing cluster is jointly funded by the University of Colorado Boulder, the University of Colorado Anschutz, Colorado State University, and the National Science Foundation (2201538) 51 .
Conflicts of interest/Competing interests. The authors declare no competing interests.
Ethics approval and consent to participate. Not applicable.
Consent for publication. Not applicable.
Data availability. All underlying protein sequences and metadata were collected from GTDB. Histone sequences are provided in the source data (Excel spreadsheets).
Materials availability. Not applicable.
Code availability. Code used to perform the analysis and run simulations is available on GitHub: https://github.com/shla9937/archaeal histone diversity
Author contribution. SL conducted the analysis with input and editing from KL. SL and KL wrote the manuscript.
References
1. Arents, G., Burlingame, R. W., Wang, B. C., Love, W. E. & Moudrianakis, E. N. The nucleosomal core histone octamer at 3.1 Å resolution: a tripartite protein assembly and a left-handed superhelix. Proc. Natl. Acad. Sci. 88, 10148–10152 (1991).
2. Luger, K. & Richmond, T. J. DNA binding within the nucleosome core. Curr. Opin. Struct. Biol. 8, 33–40 (1998).
3. Luger, K., Mäder, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260 (1997).
4. Mattiroli, F. et al. Structure of histone-based chromatin in Archaea. Science 357, 609–612 (2017).
5. Hocher, A. et al. Histones with an unconventional DNA-binding mode in vitro are major chromatin constituents in the bacterium Bdellovibrio bacteriovorus. Nat. Microbiol. 8, 2006–2019 (2023).
6. Liu, Y. et al. Virus-encoded histone doublets are essential and form nucleosome-like structures. Cell 184, 4237–4250.e19 (2021).
7. Toner, C. M., Hoitsma, N. M., Weerawarana, S. & Luger, K. Characterization of Medusavirus encoded histones reveals nucleosome-like structures and a unique linker histone. Nat. Commun. 15, 9138 (2024).
8. Talbert, P. B. & Henikoff, S. Histone variants at a glance. J. Cell Sci. 134, jcs244749 (2021).
9. Talbert, P. B., Armache, K.-J. & Henikoff, S. Viral histones: pickpocket’s prize or primordial progenitor? Epigenetics Chromatin 15, 21 (2022).
10. Irwin, N. A. T. & Richards, T. A. Self-assembling viral histones are evolutionary intermediates between archaeal and eukaryotic nucleosomes. Nat. Microbiol. 9, 1713–1724 (2024).
11. Valencia-Sánchez, M. I. et al. The structure of a virus-encoded nucleosome. Nat. Struct. Mol. Biol. 28, 413–417 (2021).
12. Alva, V. & Lupas, A. N. Histones predate the split between bacteria and archaea. Bioinformatics 35, 2349–2353 (2019).
13. Hu, Y. et al. Bacterial histone HBb from Bdellovibrio bacteriovorus compacts DNA by bending. Nucleic Acids Res. 52, 8193–8204 (2024).
14. Schwab, S. et al. Histones and histone variant families in prokaryotes. Nat. Commun. 15, 7950 (2024).
15. Hu, Y. et al. DNA Wrapping by a Tetrameric Bacterial Histone. 2025.05.08.652872 Preprint at https://doi.org/10.1101/2025.05.08.652872 (2025).
16. Sandman, K., Krzycki, J. A., Dobrinski, B., Lurz, R. & Reeve, J. N. HMf, a DNA-binding protein isolated from the hyperthermophilic archaeon Methanothermus fervidus, is most closely related to histones. Proc. Natl. Acad. Sci. U. S. A. 87, 5788–5791 (1990).
17. Somboonna, N., Assawamakin, A., Wilantho, A., Tangphatsornruang, S. & Tongsima, S. Metagenomic profiles of free-living archaea, bacteria and small eukaryotes in coastal areas of Sichang island, Thailand. BMC Genomics 13 Suppl 7, S29 (2012).
18. Adams, M. W. W. Biochemical diversity among sulfur-dependent, hyperthermophilic microorganisms. FEMS Microbiol. Rev. 15, 261–277 (1994).
19. Epp Schmidt, D. J. et al. Metagenomics Reveals Bacterial and Archaeal Adaptation to Urban Land-Use: N Catabolism, Methanogenesis, and Nutrient Acquisition. Front. Microbiol. 10, 2330 (2019).
20. Henneman, B., Emmerik, C. van, Ingen, H. van & Dame, R. T. Structure and function of archaeal histones. PLOS Genet. 14, e1007582 (2018).
21. Patwal, I., Trinh, H., Golden, A. & Flaus, A. Histone sequence variation in divergent eukaryotes facilitates diversity in chromatin packaging. 2021.05.12.443918 Preprint at https://doi.org/10.1101/2021.05.12.443918 (2021).
22. Wurtzel, O. et al. A single-base resolution map of an archaeal transcriptome. Genome Res. 20, 133–141 (2010).
23. Hocher, A. et al. Growth temperature and chromatinization in archaea. Nat. Microbiol. 7, 1932–1942 (2022).
24. Reed, C. J., Lewis, H., Trejo, E., Winston, V. & Evilia, C. Protein adaptations in archaeal extremophiles. Archaea Vanc. BC 2013, 373275 (2013).
25. Siglioccolo, A., Paiardini, A., Piscitelli, M. & Pascarella, S. Structural adaptation of extreme halophilic proteins through decrease of conserved hydrophobic contact surface. BMC Struct. Biol. 11, 50 (2011).
26. Guo, L. et al. Biochemical and structural characterization of Cren7, a novel chromatin protein conserved among Crenarchaea. Nucleic Acids Res. 36, 1129–1137 (2008).
27. Zhang, Z., Gong, Y., Guo, L., Jiang, T. & Huang, L. Structural insights into the interaction of the crenarchaeal chromatin protein Cren7 with DNA. Mol. Microbiol. 76, 749–759 (2010).
28. Laursen, S. P., Bowerman, S. & Luger, K. Archaea: The Final Frontier of Chromatin. J. Mol. Biol. 433, 166791 (2021).
29. Zaremba-Niedzwiedzka, K. et al. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 541, 353–358 (2017).
30. Bowerman, S., Wereszczynski, J. & Luger, K. Archaeal chromatin ‘slinkies’ are inherently dynamic complexes with deflected DNA wrapping pathways. bioRxiv (2020) doi:10.1101/2020.12.08.416859.
31. Ranawat, H. M. et al. Cryo-EM reveals open and closed Asgard chromatin assemblies. 2025.05.24.653377 Preprint at https://doi.org/10.1101/2025.05.24.653377 (2025).
32. Stevens, K. M. et al. Histone variants in archaea and the evolution of combinatorial chromatin complexity. Proc. Natl. Acad. Sci. (2020) doi:10.1073/pnas.2007056117.
33. Dulmage, K. A., Todor, H. & Schmid, A. K. Growth-Phase-Specific Modulation of Cell Morphology and Gene Expression by an Archaeal Histone Protein. mBio 6, e00649-15 (2015).
34. Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2022).
35. Paul, S., Bag, S. K., Das, S., Harvill, E. T. & Dutta, C. Molecular signature of hypersaline adaptation: insights from genome and proteome composition of halophilic prokaryotes. Genome Biol. 9, R70 (2008).
36. Hossain, K. A. et al. How acidic amino acid residues facilitate DNA target site selection. Proc. Natl. Acad. Sci. 120, e2212501120 (2023).
37. Tadeo, X. et al. Structural Basis for the Aminoacid Composition of Proteins from Halophilic Archea. PLOS Biol. 7, 1–9 (2009).
38. Akashi, H. & Gojobori, T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl. Acad. Sci. U. S. A. 99, 3695–3700 (2002).
39. Sato, S. et al. Cryo-EM structure of the nucleosome core particle containing Giardia lamblia histones. Nucleic Acids Res. 49, 8934–8946 (2021).
40. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 1–3 (2024) doi:10.1038/s41586-024-07487-w.
41. Marc, F., Sandman, K., Lurz, R. & Reeve, J. N. Archaeal Histone Tetramerization Determines DNA Affinity and the Direction of DNA Supercoiling. J. Biol. Chem. 277, 30879–30886 (2002).
42. Hodges, A. J. et al. Histone Sprocket Arginine Residues Are Important for Gene Expression, DNA Repair, and Cell Viability in Saccharomyces cerevisiae. Genetics 200, 795–806 (2015).
43. Methyl-reducing methanogenesis by a thermophilic culture of Korarchaeia. Nature. https://www.nature.com/articles/s41586-024-07829-8.
44. Kohtz, A. J. et al. Cultivation and visualization of a methanogen of the phylum Thermoproteota. Nature 632, 1118–1123 (2024).
45. Lynes, M. M., Jay, Z. J., Kohtz, A. J. & Hatzenpichler, R. Methylotrophic methanogenesis in the Archaeoglobi revealed by cultivation of Ca. Methanoglobus hypatiae from a Yellowstone hot spring. ISME J. 18, wrae026 (2024).
46. Rodrigues-Oliveira, T. et al. Actin cytoskeleton and complex cell architecture in an Asgard archaeon. Nature 613, 332–339 (2023).
47. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
48. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
49. Goddard, T. D. et al. UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).
50. AmberTools. J. Chem. Inf. Model. https://pubs.acs.org/doi/10.1021/acs.jcim.3c01153.
51. Alpine | Research Computing | University of Colorado Boulder. https://www.colorado.edu/rc/alpine.
52. Blanca Condo Cluster | Research Computing | University of Colorado Boulder. https://www.colorado.edu/rc/resources/blanca.
Tables
Table 1. Median values of physical parameters of histone clusters. Number of sequences in each cluster and median values of protein length (amino acids), isoelectric point (pI), hydrophobicity (GRAVY score), instability index, and RMSD (Å) of the best-aligning histone fold to histone HMfB (PDB 1A7W), calculated for each histone cluster after the clustering shown in Figure 1. See Supplemental Table 2 for the range of values in each cluster.

| Cluster | Type | # of sequences | Length (aa) | pI | GRAVY | Instability index | RMSD to 1A7W (Å) |
|---|---|---|---|---|---|---|---|
| 1 | Basic singlet | 4969 | 70 | 9.5 | -0.26 | 34 | 1.978 |
| 2 | Acidic singlet | 685 | 68 | 6.6 | -0.11 | 34 | 1.055 |
| 3 | Acidic doublet | 536 | 143 | 4.7 | -0.18 | 41 | 1.810 |
| 4 | Acidic miniature | 153 | 55 | 4.3 | -0.60 | 43 | 3.310 |
| 5 | Acidic quadruplet | 130 | 262 | 5.2 | -0.35 | 46 | 1.795 |

Table 2. Representative genome for each strategy. Representative genomes were identified as those encoding the histone closest to the average of all histones in each strategy, using the four physical parameters from Figure 1. For strategies with multiple histones, genomes were chosen by finding the histones closest to the average for each type within the strategy, and then choosing the genome with the most prevalent histone composition for that strategy (two histones for multiple 1; one of each for combinations 1&2 and 3&4). All amino acid sequences are provided in the supplemental data.
| Strategy | Species | Histone ID(s) | # of sequences |
|---|---|---|---|
| Single 1 | Nitrosotalea sp028867735 | JAGWFW010000004.1_41 | 1,665 |
| Single 2 | Methanococcoides sp021108185 | JAIORJ010000016.1_63 | 225 |
| Single 3 | Haloferax marinum | NZ_WKJQ01000001.1_1390 | 431 |
| Single 5 | MGIIa-L1 sp8725u | DUJJ01000204.1_9 | 123 |
| Multiple 1 | JACPII01 sp016188175 | JACPII010000095.1_11, JACPII010000002.1_73 | 2,782 |
| Combination 1&2 | SZUA-1452 sp015662385 | Type 1: DQUH01000053.1_3; Type 2: DQUH01000042.1_12 | 269 |
| Combination 3&4 | Halopenitus persicus | Type 3: NZ_FNPC01000002.1_292; Type 4: NZ_FNPC01000016.1_27 | 100 |

Additional Declarations
There is no competing interest.
Supplementary Files
clusteredproteinsbyspeciesdb.csv (dataset 1)
proteinparameterdb.csv (dataset 2)
finalclusteringplot.html (interactive Figure 1)
SupplementalFigures.pdf
Supplemental Figure 1: Iterative clustering strategy used to generate Figure 1. a) All archaeal protein sequences from GTDB (v220) were collected and selected for homology to archaeal and eukaryotic histones using HMMsearch against various inputs (outlined in b). Matching sequences were then filtered, physical protein parameters were calculated, sequences were clustered using DBSCAN, and unassigned sequences were removed. Co-occurrence networks (proteins in the same genome) were defined, and the centroid of each cluster was taken as the sequence nearest to the cluster's center of mass. The resulting data were plotted against three of the four features used to cluster them and colored according to cluster (1 – blue, 2 – red, 3 – orange, 4 – yellow, 5 – green). b) Number of hits obtained for a given E-value for HMMsearch (PF00125 and PF00808) or JackHMMer (HMfB, Bd0055, H2A, H2B, H3, and H4) against all archaeal protein sequences. The inflection point, denoted by the dashed line, gave reproducible clusters when clustered over three randomly sampled subsets and was used to define histone hits for analysis.
c) Overlap between histone sequences returned by HMMsearch using PF00125 (eukaryotic histones) and PF00808 (archaeal histones). Although there is a high degree of overlap, ~440 sequences were unique to either the archaeal or the eukaryotic histone search. Combining these two datasets resulted in robust clustering and captured the majority of the diversity observed with the other search strategies.
Supplemental Figure 2: Breakdown of the number of each type of histone within genomes encoding multiple histones. Most genomes with multiple histones encode two. Combination 1&2 genomes often contain unequal ratios of cluster 1 to cluster 2 histones. Genomes from combination 3&4 encode cluster 3 histones at a 1:1 ratio with cluster 4 histones, except for three cases where only two cluster 4 histones are present.
Supplemental Figure 3: Number of genomes from each archaeal phylum employing a particular histone strategy. a) Phyla represented by a large number of genomes. b) Phyla with sparser representation (note the different x-axis scales).
Supplemental Figure 4: Histone strategy correlates with environmental pressure. Sampling locations for each genome in a strategy were curated from metadata and coded to common environmental pressures, then plotted as the percentage of genomes from that strategy associated with each pressure. Locations can be associated with multiple pressures. Pressures are ranked from most to least prevalent. Single 3 and combination 3&4 show a slight bias towards genomes from saline environments. Single 5 genomes are biased towards marine environments. Combination 1&2 genomes correlate with anaerobic environments.
Supplemental Figure 5: Amino acid composition of histones, grouped by strategy. The contribution of each histone is normalized for length, so that longer proteins do not dominate the average composition. Overall, archaeal histones are enriched in small, hydrophobic residues and basic residues.
Tryptophan and cysteine are rarer than in the ‘universal’ proteome, likely due to their metabolic burden. Arrows denote enrichment or depletion of a particular amino acid compared to the generic archaeal amino acid distribution. a) All archaeal histones. b) Basic singlets (single 1). c) Acidic singlets (single 2); arrows denote the shift in composition from lysine to glutamate responsible for their acidic character. d) Acidic doublets (single 3); arrows denote the increase in the aromatic residues tyrosine and phenylalanine, a marked shift away from lysine in favor of arginine, and an enrichment in aspartate over glutamate. e) Single 5 histones; arrows denote the decreased abundance of alanine and lysine. f) Multiple cluster 1 histones. g) Combination 1&2 histones; arrows indicate the increase in basic and reduction in acidic residues in cluster 1 histones (blue) relative to cluster 2 histones (red). h) Combination 3&4 histones; arrows indicate the increase in arginines and decrease in lysines in cluster 3 histones relative to cluster 4 histones.
Supplemental Figure 6: Conservation of histones by strategy. Amino acids whose conservation exceeds the mean conservation of the alignment by more than one standard deviation are highlighted in color. Contiguous stretches of such residues are shaded in a darker color for emphasis. The number of sequences in each alignment is denoted for each panel as “n=”, and the average alignment length as “len=”. Conserved sequence motifs (RKTV motif, R-D clamp, RT pair, and G) are boxed in the sequence, and H-bonds for the R-D clamp and RT pair are indicated. Predicted secondary structure designations (from Supplemental Figure 7) are shown to indicate histone fold elements. a–d) Singles; e–i) combinations; j) the L1L2 loop with conserved features, as indicated in the sequence alignments (PDB 5T5K).
Supplemental Figure 7: AlphaFold3 predictions of histone dimers.
Models predicted from representative (‘median’) histone fold domains from each histone strategy (Table 2). Helices missing from the classic three-helix histone fold motif are indicated by circles. a) Histone dimers from representative organisms with only a single histone gene. For the single 5 histone, which contains four histone fold domains, the two N-terminal histone folds were split and predicted as two separate chains, whereas the two C-terminal folds were predicted as a single chain. b) Prediction of homo- and heterodimer histone fold structures from a representative of the multiple 1 strategy. c) Combination 1&2: basic and acidic histones can form homo- and heterodimers. Charged surface representations of the histone binding ridge are shown to the side of each homodimer. Combination 3&4 histones were not folded together, as cluster 3 histones link two histone folds in one polypeptide chain. All predictions have a high confidence score (ipTM).
Supplemental Figure 8: AlphaFold3 prediction of representative histone tetramers. Models predicted from four histone fold domains for the median sequence of each histone strategy. a) 4HB structures from experimentally determined structures; the conserved histidine is shown in black. b) Predicted homotetramers from single histone strategies. The single 1 histone can form either closed (as shown here) or open tetramers with similar levels of confidence. Tetramers from the cluster 5 histone do not appear to form canonical histone tetramers via four-helix bundle structures. c) Homotetramers from the multiple 1 genome vary in their predicted ability to form canonical open histone tetramers. d) The acidic histone from combination 1&2 forms canonical histone tetramers if paired with its basic partner.
In isolation, it is predicted to fold into four completely different tetramers with similar confidence (inset; α3 helices are colored in grey and yellow for histone fold dimer 1 and 2, respectively; for orientation). The cluster 3 histone from combination 3&4 in isolation forms an open tetramer, but does not combine with its cluster 4 partner. The cluster 4 histone in isolation is predicted to form ‘back-to-back’ structures (shown), as well as closed tetramers with similar confidence. Supplemental Figure 9: Predicting histone-DNA complexes with AlphaFold3. Models predicted from eight histone fold domains for a representative from each histone strategy, and 147bp of a nucleosome positioning sequence (‘601’ Widom DNA sequence). These serve as the starting structures for the simulations shown in Figure 5. a) Prediction of control structures of human and M. fervidus nucleosomes, closely resembling experimentally determined structures. b) Predictions for a basic and acidic singlet, and for the acidic doublet. For single 3, a second structure was predicted of a closely related histone (NZ_A0AIB010000141_204) that formed a closed nucleosome structure. c) the representative histones from the multiple 1 strategy appear to form nucleosome-like particles in each combination. d) Basic and acidic histones that co-exist in one genome fold into nucleosome-like structures either individually or in combination. In contrast, the acidic doublet does not combine with the acidic miniature, which on its own does not wrap DNA, nor does single 5. SupplementalTables.docx Cite Share Download PDF Status: Under Review Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. 
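The highlighting rule in the Supplemental Figure 6 legend (alignment columns more than one standard deviation above the mean conservation, with contiguous stretches emphasized) can be expressed as a short sketch. It assumes per-column conservation scores have already been computed by some scoring method; the function name, the use of the population standard deviation, and the minimum stretch length `min_run` are illustrative assumptions, not details reported by the authors.

```python
from statistics import mean, pstdev


def highlight_conserved(scores, min_run=2):
    """Flag alignment columns whose conservation exceeds mean + 1 SD.

    scores:  per-column conservation values (higher = more conserved).
    min_run: hypothetical minimum length for a contiguous stretch to be
             shaded darker; the legend does not state a value.
    Returns (flagged column indices, list of inclusive (start, end) runs).
    """
    threshold = mean(scores) + pstdev(scores)
    flagged = [i for i, s in enumerate(scores) if s > threshold]

    # Walk the flagged indices and collect runs of consecutive columns.
    runs = []
    start = prev = None
    for i in flagged:
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            if prev - start + 1 >= min_run:
                runs.append((start, prev))
            start = prev = i
    if start is not None and prev - start + 1 >= min_run:
        runs.append((start, prev))
    return flagged, runs
```

For example, scores of [0.2, 0.9, 0.95, 0.3, 0.1, 0.92, 0.25, 0.2] flag columns 1, 2 and 5, of which only columns 1-2 form a contiguous stretch at the default `min_run`.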
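The length normalization described for Supplemental Figure 5, in which every histone contributes equally to the average composition regardless of its size, can be sketched by averaging per-sequence residue fractions rather than pooling raw counts. The arrow assignment below (a log2 ratio against a caller-supplied background distribution, with a hypothetical `cutoff` and pseudocount) is an assumption for illustration; the legend does not state how enrichment or depletion was scored.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mean_composition(sequences):
    """Average per-residue fractions so that every sequence carries equal weight.

    Each sequence's counts are divided by its own length first, which keeps
    long proteins from dominating the average. Non-standard letters are ignored.
    """
    totals = dict.fromkeys(AMINO_ACIDS, 0.0)
    used = 0
    for seq in sequences:
        counts = Counter(c for c in seq.upper() if c in totals)
        n = sum(counts.values())
        if n == 0:
            continue  # skip empty or fully non-standard sequences
        for aa in AMINO_ACIDS:
            totals[aa] += counts[aa] / n
        used += 1
    return {aa: v / used for aa, v in totals.items()} if used else totals


def enrichment_arrows(group, background, cutoff=0.5, pseudo=1e-4):
    """Return 'up' or 'down' for residues whose log2(group/background) passes cutoff.

    cutoff and pseudo are hypothetical choices, not values from the study.
    """
    arrows = {}
    for aa in AMINO_ACIDS:
        ratio = math.log2((group[aa] + pseudo) / (background[aa] + pseudo))
        if ratio >= cutoff:
            arrows[aa] = "up"
        elif ratio <= -cutoff:
            arrows[aa] = "down"
    return arrows
```

With the sequences "KK" and "AAAA", the averaged composition is 0.5 lysine and 0.5 alanine, whereas pooling raw counts would give lysine only a third of the total.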
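The DNA-stability readout described in the Figure 5 legend (RMSD of the DNA relative to the starting frame, averaged over the last 10 ns of each 100 ns simulation, with error bars as the standard error of the mean over three replicates) can be sketched in plain Python. This assumes the DNA coordinates of every frame have already been superposed on the starting frame (no fitting is done here) and that frame times are given in nanoseconds; the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def rmsd(coords, ref):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes the frame is already superposed on the reference.
    """
    if len(coords) != len(ref) or not coords:
        raise ValueError("coordinate sets must be non-empty and the same size")
    sq = sum((a - b) ** 2 for p, q in zip(coords, ref) for a, b in zip(p, q))
    return math.sqrt(sq / len(coords))


def late_rmsd(frames, times_ns, window_ns=10.0):
    """Mean RMSD to the first frame over the final `window_ns` of one run."""
    t_end = times_ns[-1]
    ref = frames[0]
    late = [rmsd(f, ref) for f, t in zip(frames, times_ns) if t >= t_end - window_ns]
    return mean(late)


def mean_and_sem(values):
    """Mean and standard error of the mean across replicates (needs >= 2 values)."""
    return mean(values), stdev(values) / math.sqrt(len(values))
```

In use, `late_rmsd` would be called once per replicate trajectory and the three resulting values passed to `mean_and_sem` to obtain a bar height and its error bar.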
We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6985588","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":480716951,"identity":"2d6b7533-54da-4b2f-b3e0-496d1fff7d53","order_by":0,"name":"Karolin Luger","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABCklEQVRIiWNgGAWjYBACCRDxwICBgQ/MrYAyeEDEATxaEoBa2MDcM1AGYS0MUJWMbURokWw/+/BBQsEdBjaJ9IuPK+fZybGx9z788KaCQY7vRgJWLdI86cYGCQbPgFpyig3Pbks2ZuM5biw55wyDsSQOLXIMaWwSCQaHQVrSJBu3HUhsk0hjkOZtY0jcgEsL/zO4lvSfjXPAWph/A7XU49IiLQG3Jf0YY2MDWAsbyJYEAxxaJGc8Ywb65TAPG88bZsmGYyC/HGOznHNGwnDmmQdYtUicT2N88OHPYTl+9vSHHxtq7ICMNuYbbyps5PmOY7cFBoARwWOAYhZe5VDAjt0do2AUjIJRMAoAXlBVlapd7VsAAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0001-5136-5331","institution":"University of Colorado at Boulder and HHMI","correspondingAuthor":true,"prefix":"","firstName":"Karolin","middleName":"","lastName":"Luger","suffix":""},{"id":480716952,"identity":"7e7ef33c-78ee-4662-a12f-0f14746441cb","order_by":1,"name":"Shawn Laursen","email":"","orcid":"","institution":"University of Colorado at Boulder and HHMI","correspondingAuthor":false,"prefix":"","firstName":"Shawn","middleName":"","lastName":"Laursen","suffix":""}],"badges":[],"createdAt":"2025-06-26 
17:20:20","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6985588/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6985588/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":86164610,"identity":"68fcedf6-a9b7-4b79-85cf-a21f641cf991","added_by":"auto","created_at":"2025-07-07 13:18:50","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":503062,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eArchaeal histones can be grouped into five clusters. a)\u003c/strong\u003e Clustering of 6,473 archaeal histone sequences using DBSCAN, plotted by length (number of residues), pI (isoelectric point), and GRAVY score (hydrophobicity). Sequences were clustered using these three dimensions plus a fourth, the instability index (not shown). Sequences closest to the center of each cluster in the four dimensions are denoted with a black dot; gene names for the centroids are listed in b). Control histones (human histones H2A, H2B, H3, and H4, bacterial histone Bd0055, and archaeal histone HTkA) are indicated by purple dots. An interactive version of this figure can be found in supplemental materials, all sequences are listed in supplemental spreadsheets. \u003cstrong\u003eb\u003c/strong\u003e) AlphaFold3 structure predictions of single chains of the centroids determined in \u003cstrong\u003ea), \u003c/strong\u003ealong with defining characteristic, gene name, genome, and AlphaFold3 PTM (confidence) score. N and C-termini are indicated. The linker region connecting the two histone fold domains in cluster 3 is shown in black. \u003cstrong\u003ec) \u003c/strong\u003eDistribution of isoelectric points of all archaeal histones shown in a). A multimodal distribution can be observed around pI values of 10, 8, 6.5 and 4.5. 26.7 % have an acidic isoelectric point. 
Isoelectric points around neutral are under-represented.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/3730649da862ca322c3bd3d8.png"},{"id":86164609,"identity":"b890be9c-7273-44f3-bb5b-bd0615af8baa","added_by":"auto","created_at":"2025-07-07 13:18:50","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":96380,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHistones are employed in seven major ‘histone strategies’ in archaea. a)\u003c/strong\u003e Distribution of predicted histone genes per genome. The majority of genomes contain three or fewer histones, and 33 % of genomes encode no clearly identifiable histone (using our relatively conservative cutoff). \u003cstrong\u003eb) \u003c/strong\u003eDistribution of genomes utilizing a particular histone strategy. These include genomes with only a single histone, genomes with multiples of the same type, or those with combinations of more than one type. Histone clusters are colored as in Figure 1. This study focuses on strategies that are represented by 100 or more genomes (indicated by vertical line)\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/09cdfc6707054479619cbb06.png"},{"id":86165527,"identity":"3bf39943-53cb-409b-9d32-68f63fe3ade7","added_by":"auto","created_at":"2025-07-07 13:26:50","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":245341,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSome, but not all, strategies are taxonomically restricted.\u0026nbsp;\u003c/strong\u003eColored dots indicate the presence of at least one genome in a taxon that employs a particular histone strategy. Histone clusters are colored as in Figure 1. \u003cstrong\u003ea\u003c/strong\u003e) Phylogeny of major archaeal groups. 
\u003cstrong\u003eb)\u003c/strong\u003e Phylogeny of Thermoplasmatota, where most taxa do not encode histones (highlighted in gray). Genomes encoding histones from cluster 5 are exclusive to the order Poseidoniales (families \u003cem\u003eThalassarchaeaceae\u003c/em\u003e and \u003cem\u003ePoseidoniaceae\u003c/em\u003e) and a single closely related genome (all highlighted in green). \u003cstrong\u003ec)\u003c/strong\u003e Phylogeny of the Phylum Thermoproteota showing the absence of histones from the order Sulfolobales (highlighted in gray). \u003cstrong\u003ed)\u003c/strong\u003e Phylogeny of Halobacteriota showing the exclusive presence of cluster 3 with cluster 4 (acidic doublet with acidic miniature histone) in the class Halobacteria (highlighted in orange), and the near exclusivity of the combination of single basic and acidic doublet histone, except for the genus Methanopyrus). More details are provided in Supplemental Figure 4. An interactive and expandable tree containing histone annotations of the seven major strategies is available through iTOL: https://itol.embl.de/tree/1281386427164781716417921\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/e20449f4648274c3d69afb97.png"},{"id":86164612,"identity":"317b72b8-cb04-4ca4-9507-e2871835d00d","added_by":"auto","created_at":"2025-07-07 13:18:50","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":123408,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCorrelation of histone presence and strategy with genome size, GC content, coding density, and environmental pressure. a\u003c/strong\u003e) genome sizes. \u003cstrong\u003eb\u003c/strong\u003e) GC content. \u003cstrong\u003ec\u003c/strong\u003e) gene coding density. 
\u003cstrong\u003ed)\u003c/strong\u003e Histone strategy versus environmental pressures (Supplemental Table 2).\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/9b64ec20db83818e8b0fce01.png"},{"id":86164620,"identity":"86e0c0e4-0a4b-4cc2-9d24-c85a04ca7686","added_by":"auto","created_at":"2025-07-07 13:18:50","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":592996,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMolecular dynamics simulations suggest that not all of the predicted nucleosome-like structures are stable.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll-atom molecular dynamics simulations of nucleosome-like particles predicted from each histone strategy (median representative, Table 2), in isolation or in combination. Simulations were started from an AlphaFold3 prediction shown in Supplemental Figure 9, using the equivalent of eight histone folds and 147 bp of Widom 601 double stranded DNA. AlphaFold models of the eukaryotic nucleosome and a nucleosome constructed from HMfA were used as controls. Simulations were run for 100 ns in triplicate. Side and face views of starting model and representative final model are shown. b) Change in DNA topology from beginning to end of simulation. RMSDs (Å) were calculated by averaging the RMSD of the DNA in each structure over the last 10 ns of the simulation to the starting frame. Because the DNA represents the topology of a nucleosome and is common to all the structures, we reasoned it was the most consistent way to monitor how much the models changed over time. 
Error bars represent standard error of the mean over three replicate simulations.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/421b9a392474ce4e59222872.png"},{"id":86166681,"identity":"c33e3d5c-f07f-4ad0-88cb-2ecb9bb67710","added_by":"auto","created_at":"2025-07-07 13:42:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2556100,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/f3ebbe06-1b53-4ded-8f75-c9eb91c583d5.pdf"},{"id":86165677,"identity":"6da4f348-2761-43ca-aeed-4b16b809c03c","added_by":"auto","created_at":"2025-07-07 13:34:50","extension":"csv","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":2241829,"visible":true,"origin":"","legend":"dataset 1","description":"","filename":"clusteredproteinsbyspeciesdb.csv","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/dcbf842973d59c3036915f58.csv"},{"id":86165528,"identity":"513abee3-fbdb-4dac-9c14-90db9b4a97ba","added_by":"auto","created_at":"2025-07-07 13:26:50","extension":"csv","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":3251050,"visible":true,"origin":"","legend":"dataset 2","description":"","filename":"proteinparameterdb.csv","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/41062bb516fab923fbb1663e.csv"},{"id":86164619,"identity":"0f7b2d50-2ed8-4573-925a-c34d432bd1cb","added_by":"auto","created_at":"2025-07-07 13:18:50","extension":"html","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":5851689,"visible":true,"origin":"","legend":"interactive Figure 
1","description":"","filename":"finalclusteringplot.html","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/a79ccd3e7ec6a203381f9580.html"},{"id":86164623,"identity":"d98d7346-4923-4b7a-94ea-f609f4b945c9","added_by":"auto","created_at":"2025-07-07 13:18:50","extension":"pdf","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":21290941,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplemental Figure 1: Iterative clustering strategy to generate Figure 1.\u003c/strong\u003e \u003cstrong\u003ea)\u003c/strong\u003e All archaeal protein sequences from the GTDB (v220) were collected and selected for homology to archaeal and eukaryotic histones using HMMsearch against various inputs (outlined in b). Matching sequences were then filtered, physical protein parameters calculated, clustered using DBSCAN, and then unassigned sequences were removed. Co-occurrence networks (proteins in the same genome) were defined and centroids for each cluster were calculated based on nearest neighbor to center of mass of each cluster. The resulting data were plotted against three of the four features used to cluster them and colored according to cluster (1 – blue, 2 – red, 3 – orange, 4 – yellow, 5 – green). \u003cstrong\u003eb) \u003c/strong\u003eNumber of hits obtained for a given E-value for HMMsearch (PF00125 and PF00808) or JackHMMer (Hmfb, Bd0055, H2A, H2B, H3 and H4 against all archaeal protein sequences. The inflection point denoted by the dashed line gave reproducible clusters when clustered over three randomly sampled subsets and was used to define histone hits for analysis. \u003cstrong\u003ec) \u003c/strong\u003eOverlap between histone sequences returned by HMMsearch using PF00125 (eukaryotic histones) and PF00808 (archaeal histones). Although there is a high degree of overlap, ~440 were unique to either archaeal or eukaryotic histone searches. 
Combining these two datasets resulted in robust clustering and captured the majority of diversity observed in the other search strategies.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplemental Figure 2: Breakdown of the number of each type of histone within genomes encoding multiple histones. \u003c/strong\u003eMost genomes with multiple histones encode two. Combination 1\u0026amp;2 genomes often contain unequal ratios of cluster 1 to cluster 2 histones. Genomes from combination 3\u0026amp;4 encode cluster 3 histones at a 1:1 ratio with cluster 4 histones, except for three cases where only two cluster 4 histones are present.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplemental Figure 3: Number of genomes from archaeal phyla across employing a particular histone strategy.\u003c/strong\u003e \u003cstrong\u003ea) \u003c/strong\u003ephyla which are represented by a large number of genomes. \u003cstrong\u003eb) \u003c/strong\u003ephyla with more sparse representation (note difference x-axes scale).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplemental Figure 4: Histone strategy correlates with environmental pressure. \u003c/strong\u003eSampling locations for each genome in a strategy were curated from metadata and coded to common environmental pressures, then plotted as the percentage of genomes from that strategy that are associated with that pressure. Locations can be associated with multiple pressures. Pressures are ranked from most prevalent to least. Single 3 and Combination 3\u0026amp;4 showed a slight bias towards genomes from saline environments. Single 5 genomes are biased towards marine environments. Combination 1\u0026amp;2 genomes correlate with anaerobic environments.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplemental Figure 5: Amino acid composition of histones, grouped by strategy.\u003c/strong\u003e Contribution of each histone is normalized for length, so that larger proteins do not dominate the average composition. 
Overall, archaeal histones are enriched in small, hydrophobic residues and basic residues. Tryptophan and cysteine are rarer compared to the ‘universal’ proteome, likely due to their metabolic burden. Arrows denote enrichment or depletion of a particular genome compared to the generic archaeal amino acid distribution. \u003cstrong\u003ea) \u003c/strong\u003eAll archaeal histones; \u003cstrong\u003eb)\u003c/strong\u003e basic singlets (single 1); \u003cstrong\u003ec) \u003c/strong\u003eacidic singlet (single 2), arrows denote the shift in composition from lysine to glutamate, responsible for their acidic character. \u003cstrong\u003ed)\u003c/strong\u003eAcidic doublets (single 3). Arrows denote the increase of aromatic residues tyrosine and phenylalanine, as well as a marked shift away from lysine in favor of arginine and an enrichment in aspartate over glutamate.\u003cstrong\u003e e) \u003c/strong\u003eSingle 5 histones, arrows denote the decrease in abundance of alanine and lysine. \u003cstrong\u003ef) \u003c/strong\u003eMultiple cluster 1 histones.\u003cstrong\u003e g) \u003c/strong\u003eCombination 1\u0026amp;2 histones. Arrows indicate the increase of basic and reduction of acidic residues in cluster 1 histones (blue) over cluster 2 histones (red). \u003cstrong\u003eh) \u003c/strong\u003eCombination 3\u0026amp;4 histones. Arrows indicate the increase of arginines and decrease of lysines in cluster 3 histones over cluster 4 histones.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplemental Figure 6: Conservation of histones by strategy. \u003c/strong\u003e\u0026nbsp;\u0026nbsp;Amino acids that are conserved by more than one standard deviation than the mean conservation of an alignment are highlighted in color. Continuous clusters of residues having greater than one standard deviation of conversation are shaded in darker color for emphasis. Number of sequences in each alignment is denoted for each panel as “n=”. 
The average alignment length is denoted by “len=”. Conserved sequence motifs (RKTV motif, R-D clamp, RT pair, and G are boxed in the sequence, H-bonds for R-DNA clamp and RT pair are indicated). Predicted secondary structure designation (from Supplemental Figure 7) are shown to indicate histone fold elements. \u003cstrong\u003ea-d) \u003c/strong\u003esingles, \u003cstrong\u003ee-i)\u003c/strong\u003e combinations,\u003cstrong\u003e j)\u003c/strong\u003eshows the L1L2 loop with conserved features, as indicated in the sequence alignments (pdb 5T5K).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplemental Figure 7: AlphaFold3 predictions of histone dimers. \u003c/strong\u003eModels predicted from representative (‘median’) histone fold domains from each histone strategy (Table 2). Helices missing from the classic three-helix histone fold motif are indicated by circles. \u003cstrong\u003ea) \u003c/strong\u003eHistone dimers from representative organisms with only a single histone gene. For the single 5 histone, which contains five histone fold domains, the N-terminal two histone folds are split into separate chains and predicted as if belonging to two separate chains, whereas the C-terminal two folds were predicted as a single chain.\u003cstrong\u003e b) \u003c/strong\u003ePrediction of homo- and heterodimer histone fold structures from a representative of the multiple 1 strategy. \u003cstrong\u003ec) \u003c/strong\u003eCombination 1\u0026amp;2\u003cstrong\u003e:\u003c/strong\u003ebasic and acidic histones can form homo- and heterodimers.\u003cstrong\u003e \u003c/strong\u003e\u0026nbsp;Charged surface representation of the histone binding ridge are shown to the side of each homodimer Combination 3\u0026amp;4 histones were not folded together, as cluster 3 histones links two histone folds together in one polypeptide chain. 
All predictions have a high confidence score (IPTM).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplemental Figure 8: AlphaFold3 prediction of representative histone tetramers. \u003c/strong\u003eModels predicted from four histone fold domains for the median sequence of each histone strategy. a) 4HB structures from experimentally determined structures. The conserved histidine is shown in black.\u003cstrong\u003e b) \u003c/strong\u003ePredicted homo-tetramers from single histone strategies. The Single 1 histone can form either closed (as shown here) or open tetramers with similar levels of confidence. Tetramers from the Cluster 5 histone do not appear to form canonical histone tetramers via four-helix bundle structures. \u003cstrong\u003ec) \u003c/strong\u003eHomo-tetramers from the multiple 1 genome vary in their predicted ability to form canonical open histone tetramers. \u003cstrong\u003ed) \u003c/strong\u003eThe acidic histone from combination 1\u0026amp;2 forms canonical histone tetramers if paired with its basic partner. In isolation, it is predicted to fold into four completely different tetramers with similar confidence (inset; α3 helices are colored in grey and yellow for histone fold dimer 1 and 2, respectively; for orientation). The cluster 3 histone from combination 3\u0026amp;4 in isolation forms an open tetramer, but does not combine with its cluster 4 partner. The cluster 4 histone in isolation is predicted to form ‘back-to-back’ structures (shown), as well as closed tetramers with similar confidence.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplemental Figure 9: Predicting histone-DNA complexes with AlphaFold3.\u003c/strong\u003e Models predicted from eight histone fold domains for a representative from each histone strategy, and 147bp of a nucleosome positioning sequence (‘601’ Widom DNA sequence). These serve as the starting structures for the simulations shown in Figure 5. 
\u003cstrong\u003ea)\u003c/strong\u003e Prediction of control structures of human and \u003cem\u003eM. fervidus \u003c/em\u003enucleosomes, closely resembling experimentally determined structures. \u003cstrong\u003eb)\u003c/strong\u003e Predictions for a basic and acidic singlet, and for the acidic doublet. For single 3, a second structure was predicted of a closely related histone (NZ_A0AIB010000141_204) that formed a closed nucleosome structure. \u003cstrong\u003ec) \u003c/strong\u003ethe representative histones from the multiple 1 strategy appear to form nucleosome-like particles in each combination. \u003cstrong\u003ed) \u003c/strong\u003eBasic and acidic histones that co-exist in one genome fold into nucleosome-like structures either individually or in combination. In contrast, the acidic doublet does not combine with the acidic miniature, which on its own does not wrap DNA, nor does single 5.\u003c/p\u003e","description":"","filename":"SupplementalFigures.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/6a64d14417c91322a6c2c4bf.pdf"},{"id":86165530,"identity":"b6ea0560-f17a-44f3-886e-5fa3901eb2c1","added_by":"auto","created_at":"2025-07-07 13:26:50","extension":"docx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":17766,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementalTables.docx","url":"https://assets-eu.researchsquare.com/files/rs-6985588/v1/86e34c887af04259ecf5b737.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Histone diversity in the archaeal domain of life","fulltext":[{"header":"Introduction","content":"\u003cp\u003eHistones are small, highly basic proteins consisting of three α helices connected by two short loops (the \u0026lsquo;histone fold\u0026rsquo;) that form either homo- or heterodimers via a \u0026lsquo;handshake motif\u0026rsquo; \u003csup\u003e\u003cspan citationid=\"CR1\" 
class=\"CitationRef\"\u003e1\u003c/span\u003e,\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. These proteins are present in genomes across all domains of life and are also found in some viruses\u003csup\u003e\u003cspan additionalcitationids=\"CR4 CR5 CR6 CR7\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. The best-studied histones are from the eukaryotic domain where heterodimers of histone H2B-H2A and H3-H4 assemble into an octamer that wraps 147 base pairs of DNA to form nucleosomes\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Eukaryotic histones are highly conserved and ubiquitously present across the entire domain. Homologues of the four types of histones also are encoded in the genomes of several ancient double-stranded DNA viruses that infect amoeba\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e,\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. The amino acid sequences of these histones are rather divergent amongst giant viruses, and differ in many ways from those of eukaryotes. 
While the overall topology of nucleosomes reconstituted from these viruses appears to be conserved (at least for the two distantly related viruses where this has been studied\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e), the individual histone chains can be found in a variety of tandem, triple, and even quadruple combinations and in truncated forms in different viruses\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eA subset of bacteria also have histone-like proteins, which were likely acquired through horizontal gene transfer\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. While some of these are attached to other domains of mostly unknown function, many are standalone histones that are abundantly expressed and associated with the nucleoid\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. Only two of these putative bacterial histones have been studied in detail and their interaction with DNA is markedly different to that of eukaryotic histones. Histones from \u003cem\u003eBdellovibrio bacteriovorus\u003c/em\u003e create long protein-coated DNA filaments through \u0026lsquo;edge-on\u0026rsquo; binding rather than wrapping the DNA to form discrete nucleosomes, although the binding mode on longer DNA is somewhat controversial\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e,\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. 
A recent preprint suggests yet another binding mode for a histone encoded by \u003cem\u003eLeptospira perolatii\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. Clearly, more research is needed to understand how histones are used in bacterial genome organization.\u003c/p\u003e \u003cp\u003eHistones are widespread in the domain of archaea. They were first discovered by John Reeve and coworkers in 1990\u003csup\u003e16\u003c/sup\u003e, and we now know that the majority of archaeal genomes encode at least one type of histone. As more archaeal species are discovered at a rapid rate through advances in genomic sequencing, there is evermore diversity to consider\u003csup\u003e\u003cspan additionalcitationids=\"CR18\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. Archaeal histones exhibit much more sequence divergence than their eukaryotic counterparts\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e, which are amongst the most conserved proteins known\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. Because archaea are found in many different and often punishing environments, their proteins must have evolved to cope with extreme conditions\u003csup\u003e\u003cspan additionalcitationids=\"CR23 CR24 CR25\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. 
Unlike in bacteria, where histone genes are sparse, histones seem to be a deeply rooted feature of archaea, occurring in most higher taxa, with the notable exception of \u003cem\u003eThermoplasmata\u003c/em\u003e (formerly \u003cem\u003eCrenarchaeota\u003c/em\u003e)\u003csup\u003e\u003cspan additionalcitationids=\"CR27\" citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. A select few archaea encode histones with tails, with the potential for post-translational modifications. These organisms are mainly from the Asgard phylum, which are thought to be most closely related to eukaryotes\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAt least two closely related hyperthermophilic archaea, \u003cem\u003eThermococcus kodakarensis\u003c/em\u003e and \u003cem\u003eMethanothermus fervidus\u003c/em\u003e, have histones that package DNA into so-called \u0026lsquo;hypernucleosomes\u0026rsquo;, slinky-like assemblies where the geometry of the DNA superhelix closely mimics the superhelix formed by stacked eukaryotic nucleosomes, using near-identical features of the histones to engage the DNA backbone\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. To date, research into archaeal histone-DNA complexes is limited to these two organisms (but see a recent preprint article\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e). In \u003cem\u003eT. 
kodakarensis\u003c/em\u003e, histones contribute to transcription regulation\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Additional studies utilizing molecular modeling of sequences from methanogenic archaea, and ChIP-seq in \u003cem\u003eHalobacterium salinarum\u003c/em\u003e have begun to shed light on the function of histones in these organisms\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e,\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. Here, we parsed the diversity of histone sequences in archaea by mining predicted proteins databases. We grouped archaeal histones into five major clusters based on four biophysical properties (length, isoelectric point, hydrophobicity, and instability index). We then identified seven strategies by which different organisms combine histones; employing either a single histone, or various combinations of histones in one genome. To understand possible co-dependencies between histones, we analyzed the seven strategies separately, to allow us to tease apart, for example, whether basic histones that occur as the only histone in an organism have different features compared to those that co-exist with other basic or acidic histones. We predicted the structure of the main histone combinations and inferred their ability to bind DNA using molecular dynamic simulations, providing a starting point for targeted structural and biophysical analysis.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eArchaeal histones can be grouped into five clusters\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe first identified putative histones in the predicted proteomes of all 5,869 available archaeal genomes in release 220 of the Genome Taxonomy Database (GTDB)\u003csup\u003e34\u003c/sup\u003e. 
Protein-coding sequences in this database were predicted from single genome assemblies, each representing a unique species. Metadata including sampling location, genome size, and GC content were also calculated or collected. To identify histone sequences, we used HMMSearch with the archaeal (PF00808) and eukaryotic (PF00125) histone PFAM models. We tested a variety of search strategies using different HMMER tools with a range of stringency cutoffs and found that HMMSearch with a liberal stringency captured most of the diversity in the sequence space without adding too much noise (Supplemental Figure 1).\u003c/p\u003e\n\u003cp\u003eWe then applied DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to perform unsupervised clustering of the presumptive histone sequences. For this, we used the four easily calculated physical parameters with the most variance: length, instability index, isoelectric point (pI), and hydrophobicity (GRAVY score). Full clustering details can be found in the Methods. Briefly, we optimized the clustering parameters on small, randomly sampled datasets, extracted the ranges of physical parameters that define each cluster, and used those bounds to label proteins in the overall dataset (Supplemental Table 1).\u003c/p\u003e\n\u003cp\u003eUsing these physical parameters, we grouped histones into five distinct clusters of histone-like proteins (Figure 1a): basic singlets (cluster 1, blue), acidic singlets (cluster 2, red), acidic doublets (cluster 3, orange), acidic ‘miniatures’ (cluster 4, yellow), and acidic quadruplets (cluster 5, green). We selected the centroid sequence from each cluster and predicted its structure with AlphaFold3 (Figure 1b, Table 1). Basic and acidic singlets are predicted to form the characteristic histone fold, resembling the experimentally determined structure of the basic singlet HMfB (PDB 1A7W).
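Two of the four features, GRAVY and pI, are simple enough to compute directly. The sketch below is a dependency-free stand-in for the ProtParam calculations named in the Methods, not the code used in this study. The pKa table is an assumed, commonly used set, so pI values will differ slightly from ProtParam's. The instability index is omitted because it requires a 400-entry dipeptide weight table.

```python
# Pure-Python sketch of two clustering features: GRAVY and isoelectric point.
# The pKa values are an assumed, commonly used set, not Biopython's own table.

KYTE_DOOLITTLE = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}

# Groups carrying +1 when protonated, and groups carrying -1 when deprotonated.
PKA_POSITIVE = {'Nterm': 8.6, 'K': 10.8, 'R': 12.5, 'H': 6.5}
PKA_NEGATIVE = {'Cterm': 3.6, 'D': 3.9, 'E': 4.1, 'C': 8.5, 'Y': 10.1}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in 'KRHDECY'}
    counts['Nterm'] = counts['Cterm'] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, low=0.0, high=14.0, tol=1e-4):
    """Bisect for the pH at which the net charge crosses zero."""
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

A lysine-rich sequence returns a pI well above 7, while an aspartate-rich one falls below 4.5. In this study, however, the split between basic and acidic clusters comes from DBSCAN, not from a fixed pI cutoff.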
Cluster 3 histones comprise two histone fold domains linked in a single polypeptide chain (colored in black in Figure 1b), and they are predicted to form a structure that resembles the HMfB homodimer (PDB 1A7W). The acidic miniature histone (cluster 4) is predicted to have an α2 helix that is shortened by one turn and only a rudimentary α3 helix. In this respect it resembles the bacterial histone Bd0055 (PDB 8VVX), although the latter is positively charged overall. Finally, cluster 5 histones are unusual in that they consist of a long acidic chain with four predicted histone fold motifs. Archaeal histones show a bimodal charge distribution: a surprisingly large percentage (26.7%) of the 7,157 predicted histones are acidic, while histones with near-neutral charge are largely absent (Figure 1c).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHistones are used in seven different strategies in the archaeal domain\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAccording to our cutoff, 3,931 (67%) of the 5,869 archaeal genomes in the GTDB encode at least one putative histone (Figure 2a). Because each sequence represented in Figure 1a is associated with a unique species, we were able to determine which genomes encode more than one histone and which combinations are the most prevalent. About 60% of all histone-encoding genomes have only a single histone gene, from cluster 1, 2, 3, or 5 (Figure 2a, b), and species encoding more than three histones are rare. We classify genomes encoding a single histone as ‘single 1’, ‘single 2’, ‘single 3’, or ‘single 5’ according to its cluster. This distinguishes them from genomes that contain combinations of histones, which may also include histones from the same clusters.
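The strategy labels used from here on amount to a rule applied to the cluster labels of each genome's histones. The sketch below shows that rule as we read it from the definitions in this section. The function name and the 'other' and 'no histone' labels are hypothetical. Genomes outside the seven strategies, for example a genome with only a cluster 4 histone, fall into 'other'.

```python
from collections import Counter

SINGLE_CLUSTERS = {1, 2, 3, 5}  # clusters that occur as a genome's only histone


def histone_strategy(cluster_labels):
    """Assign one genome to a histone strategy from its histones' cluster labels.

    Hypothetical helper following the definitions in the text; genomes that
    fit none of the seven strategies are returned as 'other'.
    """
    kinds = set(cluster_labels)
    if not cluster_labels:
        return 'no histone'
    if len(cluster_labels) == 1 and kinds <= SINGLE_CLUSTERS:
        return 'single {}'.format(cluster_labels[0])
    if kinds == {1}:
        return 'multiple 1'
    if kinds == {1, 2}:
        return 'combination 1&2'
    if kinds == {3, 4}:
        return 'combination 3&4'
    return 'other'


# Toy input: genome -> cluster labels of its predicted histones.
genomes = {'g1': [1], 'g2': [1, 1], 'g3': [1, 2], 'g4': [3, 4], 'g5': [], 'g6': [4]}
tally = Counter(histone_strategy(labels) for labels in genomes.values())
```

Tallying the labels over all genomes gives the kind of prevalence ranking plotted in Figure 2b.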
Among genomes harboring more than one histone, the most prevalent strategy is to encode two or more histones from cluster 1 (basic singlets), termed ‘multiple 1’ (the model organism \u003cem\u003eM. fervidus\u003c/em\u003e is an example of this strategy). We also observe combinations of histones from clusters 1 and 2 and from clusters 3 and 4, termed ‘combination 1\u0026amp;2’ and ‘combination 3\u0026amp;4’, respectively (Figure \u003ca href=\"#_bookmark2\"\u003e2\u003c/a\u003eb, Supplemental Figure 2). Representatives of cluster 4 (acidic miniatures) are almost always paired with an acidic doublet (cluster 3), and cluster 5 histones always occur as the sole histone-encoding gene. To simplify our analysis, we focused on general trends. We restricted further analysis to the seven most prevalent histone strategies (single 1, 2, 3, and 5; multiple 1; combination 1\u0026amp;2; and combination 3\u0026amp;4), which together represent \u0026gt;98% of histone-encoding archaeal genomes (indicated by a line in Figure 2b).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSome strategies are taxonomically restricted\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHistones that fall into cluster 1 (basic singlets) are widely dispersed across the entire archaeal domain and indeed seem to represent the typical archaeal histone (Figure 3a). They occur throughout the domain, either as the sole histone or in combination with other basic singlets. Acidic singlets (cluster 2) are also pervasive, either as the only histone in the genome or paired with a basic singlet. In contrast, histones from clusters 3, 4, and 5 are phylogenetically restricted to specific taxa.
In particular, histones from cluster 3 (acidic doublets) are mostly restricted to the class Halobacteria, while representatives of cluster 5 (acidic quadruplets) are exclusive to members of the order Poseidonales (Figure 3b).\u003c/p\u003e\n\u003cp\u003eOf the two most frequent combinations of histones, combination 1\u0026amp;2 (a basic and an acidic singlet) occurs in large groups in Methanobacteria and Halobacteria, and in smaller groups elsewhere in the domain (Figure 3c, Supplemental Figure 3). Combination 3\u0026amp;4 (an acidic doublet and an acidic miniature) is restricted to Halobacteria (Figure 3d). We also confirmed previous findings that histones are exceedingly rare in the class Thermoplasmata (formerly Crenarchaeota) and in the order Sulfolobales (Figure 3b, Supplemental Figure 3)\u003csup\u003e23\u003c/sup\u003e. A full list of histones and their corresponding genomes can be found in the supplemental materials.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSelective pressures may influence strategy type\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo understand the selective pressures associated with a specific histone cluster or strategy, we searched the metadata linked to the GTDB genomes for correlations, focusing on genome size, GC content, coding density, and sampling location. Only two of the histone strategies (single 3 and combination 3\u0026amp;4) are found in organisms whose genomes are significantly larger and have a higher GC content than those that do not encode histones (Figure 4a, b). The higher GC content is probably explained by the fact that increased GC content is a known adaptation to hypersaline environments, and these two strategies are employed mostly by halophiles\u003csup\u003e35\u003c/sup\u003e. Protein-coding density is somewhat higher in genomes encoding cluster 5 histones (single 5; Figure 4c). Despite these subtle differences, our analysis does not explain why a subgroup of archaea does not appear to employ histones.
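GC content and coding density are straightforward to recompute from an assembly and its gene calls. The minimal sketch below assumes 0-based, end-exclusive CDS coordinates on a single contig and counts bases shared by overlapping genes only once. These conventions are our assumptions and may not match how the GTDB metadata values were derived.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous A/C/G/T bases (N etc. are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in 'ACGT')
    return (seq.count('G') + seq.count('C')) / acgt


def coding_density(genome_length, cds_intervals):
    """Fraction of the genome covered by coding sequences.

    cds_intervals: (start, end) pairs, 0-based and end-exclusive, on one contig.
    Overlapping genes are merged so that shared bases are counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(cds_intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length
```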
\u003c/p\u003e\n\u003cp\u003eWe also coded keywords found in the sampling-location annotations of the genomes and found that some environmental pressures appear to correlate with specific combinations of histones (Supplemental Table 2). For example, archaea growing in anaerobic conditions tend to have combination 1\u0026amp;2 histones, and archaea found in extremely saline conditions seem to be enriched for combination 3\u0026amp;4 histones (Figure 4d, Supplemental Figure 4). These parameters are, however, harder to quantify and verify, so these correlations should be interpreted with caution.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSequence bias and conservation of archaeal histones\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo better understand the differences in physical parameters among archaeal histones, we compiled the overall amino acid composition of each histone cluster (Supplemental Figure 5). Alanine, isoleucine, leucine, and valine are abundant, as expected for histones, which are primarily α-helical and have a hydrophobic core. Compared with the overall sequence composition of all archaea, individual histone clusters are enriched in, or depleted of, specific residues. Notably, archaeal histones outside of cluster 1 have acidic isoelectric points (Figure 1c, Table 1). This is surprising, as eukaryotic histones invariably have a positive overall charge and require basic residues (arginine and lysine) to bind DNA effectively\u003csup\u003e36\u003c/sup\u003e. Besides an enrichment in acidic residues, the acidic histones from clusters 3, 4, and 5 exhibit the classic halophilic protein adaptation of a compositional bias from lysine to arginine\u003csup\u003e35,37\u003c/sup\u003e.
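The residue-level comparison summarized in Supplemental Figure 5 can be sketched as pooled amino acid frequencies for a cluster, compared with a background proteome. The R/(R+K) ratio serves as a one-number summary of the lysine-to-arginine shift. The pseudocount and the log2 ratio are illustrative choices, since the source does not state how enrichment was scored. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'


def pooled_frequencies(seqs, pseudocount=1.0):
    """Residue frequencies pooled over sequences, with a pseudocount per residue."""
    counts = Counter()
    for s in seqs:
        counts.update(s)
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log2_enrichment(cluster_seqs, background_seqs):
    """Positive values: residue enriched in the cluster relative to the background."""
    fc = pooled_frequencies(cluster_seqs)
    fb = pooled_frequencies(background_seqs)
    return {aa: math.log2(fc[aa] / fb[aa]) for aa in AMINO_ACIDS}


def arginine_fraction(seqs):
    """R / (R + K) pooled over sequences; values above 0.5 favor arginine."""
    r = sum(s.count('R') for s in seqs)
    k = sum(s.count('K') for s in seqs)
    return r / (r + k)
```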
Histones from clusters 3, 4, and 5 are also characterized by a higher percentage of the aromatic amino acids phenylalanine and tyrosine, both of which are known to stabilize proteins in harsh environments (Supplemental Figure 5d,e,h)\u003csup\u003e37\u003c/sup\u003e. Across all archaeal histones, tryptophan and cysteine are underrepresented compared with the universal proteome, perhaps because they are metabolically expensive\u003csup\u003e38\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eOverall, archaeal histones are much more divergent in amino acid sequence than eukaryotic histones, which are among the most conserved proteins known\u003csup\u003e39\u003c/sup\u003e. The degree of conservation is particularly high in histones from halophilic organisms (cluster 3 and 4 histones). This could reflect a sampling bias toward closely related halophilic organisms or a biological restriction to residues that facilitate function in hypersaline environments (acidic and aromatic residues, see above).\u003c/p\u003e\n\u003cp\u003eCluster 1 and 2 histones that are the sole histone in the genome exhibit no characteristic sequence features compared with those that co-exist with other histones (Supplemental Figure 3, compare panels a, b with e, f, and g). Similarly, cluster 3 histones have the same sequence features whether they occur alone or with cluster 4 histones. Cluster 4 is notable in that its ~150 members are almost universally 55 amino acids long and share many characteristic features with the bacterial histone Bd0055\u003csup\u003e5\u003c/sup\u003e (Figure 1b). Cluster 5 histones are unique in that they always occur as the sole histone-encoding gene, and their sequences are not well conserved.
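Statements about conservation, and the identity below 60% between co-occurring acidic and basic histones noted in the Discussion, rest on percent identity between aligned sequences. The minimal sketch below assumes the sequences have already been aligned and excludes columns in which either sequence has a gap. This is one of several common identity definitions, and the source does not say which one was used. The function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap='-'):
    """Identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError('sequences must be aligned to the same length')
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def mean_pairwise_identity(aligned):
    """Average identity over all pairs in an alignment: a simple conservation score."""
    scores = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return sum(scores) / len(scores)
```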
Apart from the first histone fold motif, cluster 5 sequences lack many of the classic histone signatures (see below), suggesting that they may have co-opted the histone fold to perform a different function in the cell.\u003c/p\u003e\n\u003cp\u003eSpecific ‘histone signature motifs’ are shared by the majority of histones (boxed sequences in Supplemental Figure 6; shown for HMfB in panel j). The ‘RKTV motif’ is located in the L2 loop connecting helices α2 and α3. The first three amino acids of this motif (RKT) are present in nearly all archaeal histones. V is often substituted by I or L, but this position is always occupied by a hydrophobic amino acid. In all known histone structures from all domains of life, this loop pairs with the less conserved L1 loop of the second histone in the histone fold dimer to form the L1L2 DNA-binding motif\u003csup\u003e2\u003c/sup\u003e. In the L1 loop, the ‘RV motif’ is found in the majority of archaeal histones. Its valine packs against the conserved hydrophobic side chain of the L2 RKTV motif to stabilize the underside of the paired L1-L2 loops. Its arginine, which extends into the minor groove of DNA, is stabilized by the threonine of the RKTV motif (the ‘RT pair’). We also note the strong conservation of a salt bridge that stabilizes the L2 loop in its critical conformation (the ‘R-D clamp’). This salt bridge involves the arginine of the RKTV motif and a conserved aspartate invariably located 7 amino acids downstream in the α3 helix of the histone fold, even in the rudimentary α3 helix of cluster 4 histones (Supplemental Figure 6i). In eukaryotic, archaeal, and viral nucleosomes of known structure, the fixed L1L2 configuration poises the main chain of both loops to contact the DNA phosphodiester backbone. It also orients the arginine of the L1 loop (RV motif) to point into the compressed minor groove of the DNA.
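Because the RKTV motif and the R-D clamp have a fixed spacing, they can be screened for directly. The sketch below is a naive regular-expression scan for illustration, not the alignment-based analysis used here. It reports each RKT followed by V, I, or L, and whether an aspartate sits 7 residues after the motif arginine. That spacing is our reading of '7 amino acids downstream' as position R + 7, and it can be changed through clamp_offset. The two-residue RV motif is too short to locate reliably without alignment context, so it is not scanned.

```python
import re

# RKT followed by a hydrophobic residue: V, or the common I/L substitutions.
RKTV = re.compile(r'RKT[VIL]')


def find_signatures(seq, clamp_offset=7):
    """Return (arginine_index, has_rd_clamp) for each RKTV-like match.

    has_rd_clamp is True when an aspartate sits clamp_offset residues after
    the motif arginine (0-based indices), our reading of the R-D clamp spacing.
    """
    hits = []
    for match in RKTV.finditer(seq):
        r = match.start()
        d = r + clamp_offset
        hits.append((r, d < len(seq) and seq[d] == 'D'))
    return hits
```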
\u003cstrong\u003eTogether with the ability to form histone fold dimers, the conserved amino acids of the L1-L2 loops described above therefore represent a universal histone signature that might be useful for identifying other histone-like proteins.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eUnique to archaeal histones, a glycine in the L1 loop is also highly conserved throughout histone clusters 1, 2, and 3 (for cluster 3, only in the N-terminal histone fold domain), but not in clusters 4 and 5. We previously showed this glycine to be essential for hypernucleosome formation by the \u003cem\u003eT. kodakarensis\u003c/em\u003e histone HTkA\u003csup\u003e4\u003c/sup\u003e. This suggests that histones from clusters 1-3 might be able to form closely stacked hypernucleosomes.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStructural prediction of histone complexes: histone homo- and heterodimers\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo predict whether multiple histones might bind DNA and fold into nucleosome-like structures, we used AlphaFold3 to model a representative histone from each strategy as dimers or tetramers\u003csup\u003e40\u003c/sup\u003e. To choose unbiased representative histones for each of the seven strategies, we calculated, for the histones of each strategy (Figure 2), the center of mass in the four dimensions used in Figure 1 and identified the sequence closest to that point. For genomes that encode two histones, we chose a genome encoding the histone closest to the center of mass of one of the clusters and used both histones from that genome as representatives (Table 2).\u003c/p\u003e\n\u003cp\u003eThe representatives from clusters 1, 2, and 3 are predicted to form homodimers that resemble known structures of histone fold homodimers (Supplemental Figure 7).
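Picking the sequence closest to a strategy's center of mass can be sketched as follows. The sketch assumes that the four features are first z-scored across all histones (as for clustering) and that distance is Euclidean. Both choices, the toy feature values, and the function names are illustrative assumptions.

```python
import math


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and unit (population) SD."""
    cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*cols)]


def nearest_to_centroid(names, vectors):
    """Return the name whose vector lies closest (Euclidean) to the group centroid."""
    centroid = [sum(col) / len(col) for col in zip(*vectors)]
    best = min(range(len(names)), key=lambda i: math.dist(vectors[i], centroid))
    return names[best]


# Toy features per histone: length, pI, GRAVY, instability index (hypothetical values).
features = {
    'h1': [68, 9.8, -0.30, 35.0],
    'h2': [70, 10.1, -0.25, 30.0],
    'h3': [69, 9.9, -0.28, 33.0],
    'h4': [110, 4.2, -0.60, 45.0],
}
names = list(features)
z = dict(zip(names, zscore_columns([features[n] for n in names])))
members = ['h1', 'h2', 'h3']  # histones assigned to one strategy
representative = nearest_to_centroid(members, [z[n] for n in members])
```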
The N- and C-terminal regions of single 5 histones can adopt conformations similar to histone fold dimers through intra- and interchain interactions, respectively.\u003c/p\u003e\n\u003cp\u003eBasic histones that occur in genomes together with other histones are likely able to form both homo- and heterodimers (as shown experimentally for \u003cem\u003eM. fervidus\u003c/em\u003e, an organism that employs the multiple 1 strategy)\u003csup\u003e41\u003c/sup\u003e. In the median organism representing this strategy, the second histone lacks a well-defined α3 helix, yet this histone is also able to homo- and heterodimerize \u003cem\u003ein silico\u003c/em\u003e (Supplemental Figure 7b). A number of histones from this cluster appear to have a prematurely terminated α3 helix, yet are still predicted to form homo- and heterodimers (not shown). In the median organism combining a basic and an acidic singlet within its genome (combination 1\u0026amp;2), homo- and heterodimers are predicted with similarly high levels of confidence (Supplemental Figure 7c). Importantly, even the models of acidic histones maintain a basic putative DNA-binding ridge on their outer surface. Our representative of the combination 3\u0026amp;4 strategy combines one acidic doublet with one acidic miniature, and its doublet folds into a structure very similar to a canonical histone fold dimer.
The acidic miniature histone from cluster 4 is predicted to fold into a homodimer that more closely resembles the bacterial histone Bd0055 than HMfA or HMfB.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSome, but not all, archaeal histone fold dimers form tetramers via a four-helix bundle structure\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe ability to form tetramers from histone fold dimers through well-defined four-helix bundle (4HB) structures is a hallmark of all canonical nucleosomes\u003csup\u003e3,4,6\u003c/sup\u003e. This interface is formed by the pairing of the C-terminal end of the long α2 helix and the α3 helix of two separate histone dimers (Supplemental Figure 8a, circled). We used AlphaFold3 to predict whether the histone fold dimers shown in Supplemental Figure 7 are capable of forming tetramers through the 4HB or by any other means. Note that we display the solution with the highest confidence. In some instances, alternative solutions are generated with only slightly less favorable IPTM scores (see, e.g., Supplemental Figure 8d, inset).\u003c/p\u003e\n\u003cp\u003eRepresentative histones from strategies using a single cluster 2 or 3 histone are all predicted to form homotetramers through canonical 4HB assemblies that resemble archaeal HMfB and eukaryotic histones H2B-H4 and H3-H3’\u003csup\u003e3,4\u003c/sup\u003e. Basic singlets from ‘single 1’ may form closed tetramers (as is the case for our median histone sequence) as well as open, canonical tetramers. Note that such closed tetramers have not been observed experimentally for any archaeal histone with the typical 28-amino-acid α2 helix, while open tetramers have been visualized in various complexes with DNA\u003csup\u003e4,30,31\u003c/sup\u003e.
In our experience, these predictions must be taken with a healthy dose of skepticism. For HMfB, whose structures are known, AlphaFold3 predicts closed and open tetramers, as well as a ‘back-to-back tetramer’ that does not involve the 4HB, with closely spaced confidence scores. When the prediction includes DNA, however, it generates an open tetramer resembling the experimentally determined structure. No combination of histone fold dimers from the single 5 representative is predicted to form higher-order assemblies mediated by a 4HB (Supplemental Figure 8b, green).\u003c/p\u003e\n\u003cp\u003eOur representative basic histone that co-occurs with a second basic histone (multiple 1) is predicted to form tetramers only from a homodimer of histone A. Histone B alone, or combined with histone A, does not form a tetramer \u003cem\u003ein silico\u003c/em\u003e, but we did not explore whether this is a general feature of histones from this cluster (Supplemental Figure 8c). Histones from combination 1\u0026amp;2 (a basic and an acidic singlet; a widespread combination) are predicted to form open tetramers either from the basic histone alone or from basic-acidic histone fold dimers (Supplemental Figure 8d). The acidic histone fold homodimer can form a tetramer through a variety of arrangements with nearly the same confidence (inset). Finally, the combination of an acidic doublet and an acidic miniature (combination 3\u0026amp;4), specific to and prevalent in halophiles, is not predicted to form a heterotetramer.
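Oligomer predictions such as these, and the DNA-bound models described next, are specified to AlphaFold3 as a list of chains. The hypothetical helper below only builds an input dictionary. Its field names follow our understanding of the open-source AlphaFold3 JSON input format (dialect 'alphafold3', version 1) and should be checked against that format's input documentation. The source does not say whether the local model or the AlphaFold Server, which uses a different JSON layout, was used. The DNA is supplied as a duplex by adding the reverse complement as a second chain.

```python
import json

COMPLEMENT = str.maketrans('ACGT', 'TGCA')
CHAIN_IDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def reverse_complement(dna):
    """Reverse complement of an unambiguous A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def af3_job(name, histones, copies, dna=None, seeds=(1,)):
    """Build an AlphaFold3-style input dict (hypothetical helper).

    Each histone sequence is added as `copies` identical chains; an optional
    DNA top strand is added together with its reverse complement.
    """
    ids = iter(CHAIN_IDS)
    sequences = []
    for seq in histones:
        chain_ids = [next(ids) for _ in range(copies)]
        sequences.append({'protein': {'id': chain_ids, 'sequence': seq}})
    if dna:
        sequences.append({'dna': {'id': next(ids), 'sequence': dna.upper()}})
        sequences.append({'dna': {'id': next(ids), 'sequence': reverse_complement(dna)}})
    return {
        'name': name,
        'modelSeeds': list(seeds),
        'sequences': sequences,
        'dialect': 'alphafold3',
        'version': 1,
    }


# Placeholder sequence, not a real histone: a homotetramer request.
example_json = json.dumps(af3_job('single1_tetramer', ['MSEKK'], copies=4), indent=2)
```

For a singlet histone, eight copies give the eight histone folds used for the nucleosome-scale models. For a doublet, four copies would give the same number of folds.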
In combination 3\u0026amp;4, the acidic doublet forms an open ‘tetramer’ structure, whereas the acidic miniature assembles into either ‘back-to-back’ (shown) or face-to-face tetramers with similar confidence.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAlphaFold3 predictions and simulations suggest that most histones form stable structures with DNA\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo explore whether these systems might form plausible complexes with DNA, we used AlphaFold3 to predict structures with DNA and then evaluated their physical stability using all-atom molecular dynamics simulations. We predicted nucleosome models with the equivalent of 8 histone folds for each histone strategy and 147 bp of DNA, sufficient to form a nucleosome-like arrangement. All combinations that are able to form canonical, open tetramers via the 4HB are predicted to wrap DNA around the outside of the histone torus (Supplemental Figure 9). The same is true for the representative that is predicted to form closed tetramers in the absence of DNA (Supplemental Figure 9c; this is also the case for HMfB). Cluster 4 and 5 histones are not predicted to wrap DNA. Note that the IPTM scores are rather low for all models except those of combination 1\u0026amp;2 (Supplemental Figure 9d).\u003c/p\u003e\n\u003cp\u003eWe ran molecular dynamics simulations of all models that formed nucleosome-like structures for 100 ns in triplicate, to allow for relaxation and sampling of conformational flexibility. Our goal was to determine whether the AlphaFold3 models shown in Supplemental Figure 9 are energetically plausible. In these simulations, the human nucleosome and the archaeal HMfB nucleosome (for which an experimental structure on shorter DNA exists, PDB 5T5K) remained tightly wound. They underwent little conformational change, as judged by minimal movement of the DNA during the simulation (Figure 5b).
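The text judges stability by how little the DNA moves over the 100 ns runs but does not name a metric. A common choice is the root-mean-square deviation (RMSD) of DNA atoms from the starting model, computed per frame after superposing each frame on a fixed reference. The minimal sketch below assumes that coordinates have already been extracted and superposed with an MD analysis package. The functions are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates.

    No superposition is done here; frames are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError('coordinate sets differ in length')
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rmsd_trace(frames):
    """RMSD of every frame relative to the first (starting) frame."""
    reference = frames[0]
    return [rmsd(reference, frame) for frame in frames]
```

A trace that plateaus at a low value would indicate a tightly wound model, while a steadily rising trace for the DNA atoms would flag unwrapping.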
The median representative histone from an organism employing the same strategy as \u003cem\u003eM. fervidus\u003c/em\u003e (multiple 1) also formed stable nucleosome-like structures. Acidic single 2 histones formed plausible structures, although these appear somewhat destabilized compared with those formed by the basic histones. Structures predicted for the single 3 representative unraveled, eventually losing the protein-protein interactions crucial for maintaining a tightly wound nucleosome. Because the original median histone formed an open structure, we also simulated a second nucleosome of this type that was predicted to form a closed structure (shown in Supplemental Figure 9). Both simulations, however, resulted in similar ‘final’ structures with open conformations. Nevertheless, even these acidic cluster 2 and cluster 3 histones maintained their interaction with DNA throughout the simulations. Note that the single 2 and single 3 strategies are mostly found in halophilic organisms, so simulations should likely be performed at much higher ionic strength. Indeed, similar simulations in 2 M KCl resulted in structures that remained closed (not shown).\u003c/p\u003e\n\u003cp\u003eWe also ran simulations of nucleosomes from genomes that encode multiple histones. We analyzed these histones in isolation and in combination (Figure 5), allowing us to ask whether they might require a partner to form stable nucleosomes. For the two basic histones from the genome representing the multiple 1 strategy, nucleosomes made from both histones combined, or from histone B alone, remained closed throughout the simulations. In contrast, nucleosomes made from histone A alone appeared unstable (Figure 5a, b). This could suggest a means by which genome accessibility might be regulated, an idea that was recently supported by experimental data\u003csup\u003e31\u003c/sup\u003e.
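The role of ionic strength for these acidic histones can be made concrete with the Debye screening length, the distance over which electrostatic interactions are damped by the surrounding salt. The sketch uses the standard approximation for a 1:1 electrolyte in water at 25 °C, λ_D ≈ 0.304 / √I nm with I in mol/L. The 0.15 M comparison point is an assumed physiological-like baseline, as the source does not state the salt concentration of its default runs.

```python
import math


def debye_length_nm(ionic_strength_molar):
    """Approximate Debye length (nm) for a 1:1 salt in water at 25 C.

    Uses the textbook approximation lambda_D ~ 0.304 / sqrt(I), I in mol/L.
    """
    if ionic_strength_molar <= 0:
        raise ValueError('ionic strength must be positive')
    return 0.304 / math.sqrt(ionic_strength_molar)


for salt in (0.15, 2.0):  # assumed baseline vs. the 2 M KCl runs
    print(f'{salt:>4} M: {debye_length_nm(salt):.2f} nm')
```

This gives roughly 0.78 nm at 0.15 M and 0.21 nm at 2 M. Electrostatic repulsion between acidic histone surfaces and the DNA backbone is therefore screened over a much shorter range in hypersaline conditions.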
\u003c/p\u003e\n\u003cp\u003eBecause AlphaFold3 was unable to assemble both histones from combination 3\u0026amp;4 into a nucleosome structure, we manually placed the cluster 4 histone (which co-occurs with cluster 3 histones) into the obvious gap left when predicting the nucleosome with cluster 3 histone dimers alone. We also simulated a nucleosome formed by cluster 3 histones in isolation. In the end, both approaches resulted in unstable structures, except for one of the three combination 3\u0026amp;4 simulations, in which a wrapped conformation was maintained throughout. Whether this acidic doublet combines with the acidic miniature histone to form a nucleosome-like structure therefore remains unknown. These structures likely require precisely oriented histones or high ionic strength to function properly, as organisms encoding them often use a “salt-in” strategy to cope with high levels of extracellular salt. Predictions with the cluster 4 or cluster 5 histones alone did not yield nucleosome-like structures. If anything, these investigations highlight the limitations of AlphaFold3 in predicting histone assemblies beyond histone fold dimers and emphasize the need for at least some degree of energy minimization.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eArchaea are a diverse group of organisms that have adapted to a wide range of environments, including extreme ones. Over billions of years of evolution, this domain of life has diversified to meet unique challenges, and these adaptations presumably include strategies to protect and package genomes. This is particularly relevant for organisms that thrive at extreme temperatures, pH, and ionic strengths, all of which challenge genome integrity.
While some archaea package their genomes exclusively with non-histone proteins such as Cren7, Alba, or Sul7d, the majority rely on histones\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. Our search of available archaeal genomes reveals that 67% of them encode histones, which we group into five clusters based on length, charge, hydrophobicity, and instability index. Because we used a rather conservative cutoff, the true percentage may be higher. The majority of histones are predicted to comprise a single, mostly tail-less histone fold domain with basic charge.\u003c/p\u003e \u003cp\u003eFour out of five clusters of archaeal histones (26.7% of all histones) are acidic in character. They are either of canonical length (cluster 2), encode multiple histone folds in a single chain (clusters 3 and 5), or are predicted to have shorter α2 and α3 helices, resembling bacterial histones at least architecturally, if not in overall charge (cluster 4). Histones are encoded in genomes either alone or along with other histones, most commonly with one other histone of the same cluster or with a histone from a different cluster. Intriguingly, 269 genomes encode an acidic and a basic histone with \u0026lt; 60% sequence identity. This diversification in histone sequences precedes the split into H2A, H2B, H3, and H4 that must have occurred in early eukaryotes.\u003c/p\u003e \u003cp\u003eUsing the structure prediction tool AlphaFold3, we predicted the structures of the ‘median histone’ for each cluster and strategy, in different oligomerization states and in complex with DNA. We deliberately chose the median histones rather than those of the nearest model organism to avoid bias and to best represent each cluster and strategy.
Of note, many of these histones are derived from metagenomes, and the corresponding organisms have not yet been cultivated.\u003c/p\u003e \u003cp\u003eOur analyses suggest that four out of the five clusters form canonical histone fold dimers, most of which tetramerize via a four-helix bundle (4HB) interface that is a hallmark of eukaryotic histone interactions. Cluster representatives that tetramerize via a 4HB, either alone or in combination, are predicted to organize DNA into nucleosome-like structures that remain stable in molecular dynamics simulations. These structures are similar to the experimentally determined structure of a cluster 1 histone in complex with DNA, which we showed forms a ‘hypernucleosome’ that may flex and open stochastically\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. Histone-DNA interactions are maintained throughout the simulations for representatives of most clusters and strategies, even for those that have an overall acidic character. This is probably because even these histones maintain a ‘basic ridge’ on their outer surface that serves as a DNA-binding surface. Of note, our simulations have not yet considered the diverse environments in which our ‘median organisms’ might dwell. For example, cluster 3 histones are mostly found in halophiles, so simulations at high (\u0026gt; 2 M) ionic strength would better predict the plausibility of their histone-DNA complexes. Our data suggest that cluster 5 histones, even though they form plausible histone fold dimers, might not function in genome organization. The role of cluster 4 histones remains unresolved; these are shorter acidic histones whose length is highly restricted to 55 amino acids and that co-occur with cluster 3 histones.
Importantly, given the limitations of AlphaFold (also demonstrated here), these predictions must be verified experimentally.\u003c/p\u003e \u003cp\u003eWhile eukaryotes have largely selected for a narrow and conserved set of four histone sequences (plus a variety of histone variants\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e), archaea seem able to organize their genomes with histones of much higher sequence diversity, using multiple combinatorial strategies. Nevertheless, this vast sequence space has brought into focus universal, functionally linked histone signatures: the RKTV motif and the R-D clamp in the L2 loop of the histone fold, the RV motif in the L1 loop, and the RT pair (Supplemental Fig.\u0026nbsp;6j). In combination, these motifs rigidify the L1L2 pairing, allowing it to make main-chain interactions with the phosphodiester backbone of the DNA and orienting an arginine to protrude into the compressed minor groove of DNA (referred to as a sprocket arginine)\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. These signatures were described over 25 years ago and are reinforced here in a vastly expanded sequence space. The diversification in histone sequence outside of these motifs likely allows archaea to adapt to a diverse and extreme set of intracellular conditions that could not be tolerated by eukaryotic systems. It might also allow them to live in these environments without compartmentalizing their genomes.\u003c/p\u003e \u003cp\u003eRecently, other groups have used different tools to sample histone diversity across both archaea and bacteria. In a study by Dame and colleagues, histone sequences from archaea and bacteria were clustered into different groups based on sequence features\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e.
This approach led the team to focus mainly on an array of bacterial histone sequences that are fused to other functional domains and whose functions are largely unknown. The work highlighted the power of approaches such as HMMSearch to find disparate sequences that may fold into similar structures.\u003c/p\u003e \u003cp\u003eOur study emphasizes the need to explore these understudied and diversified classes of histones and the biology of organisms that might otherwise be overlooked. By selecting organisms that broadly sample the diversity of archaeal histones, we can allocate resources strategically to maximize discovery. As many of these organisms have never been cultured, a logical next step is to use structural biology and biochemistry to uncover how these histones physically structure DNA. An intriguing addition to the sparse set of experimental structures has recently been published as a preprint. It suggests subtleties of archaeal chromatin structure that are caused by variations in histone sequence\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. As recent breakthroughs in culturing (and, one would hope, genetically manipulating) archaea are revolutionizing the field\u003csup\u003e\u003cspan additionalcitationids=\"CR44 CR45\" citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e–\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u003c/sup\u003e, hypotheses derived from biophysical characterization could eventually be put to the test in the cell.\u003c/p\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003ePerspective\u003c/h2\u003e \u003cp\u003eOur work highlights the power of structural prediction tools such as AlphaFold, yet demonstrates that they cannot (yet) replace experimental structures and biophysical analyses.
To use these predictive tools properly, context and prior knowledge are necessary to avoid over-interpretation. For example, AlphaFold predicted many histone tetramers to adopt conformations that appeared ‘closed’, yet when these same histones were modeled together with a DNA sequence that is heavily represented among nucleosome structures in the PDB, they formed nucleosome-like structures. AlphaFold and similar tools are built on massive amounts of training data and usually perform well when re-predicting structures they were trained on. Some models, however, ignore basic physical constraints, placing atoms on top of other atoms and predicting structures that fall apart in molecular dynamics simulations (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). At least for now, and for this system, the predictions are not ready to stand on their own without experimental validation, especially for models more complex than the histone tetramer and for models that include DNA.\u003c/p\u003e "},{"header":"Methods","content":"\u003ch2\u003eHistone identification and HMMSearch optimization\u003c/h2\u003e\u003cp\u003ePredicted archaeal protein sequences were downloaded from GTDB, release 220 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://gtdb.ecogenomic.org/\u003c/span\u003e\u003cspan address=\"https://gtdb.ecogenomic.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). This dataset included 11,277,496 proteins from 5,869 genomes, each assigned to a taxonomic lineage. A total of 7,140 putative histones were identified using HMMsearch against PF00125 (PFAM for eukaryotic histones) and PF00808 (PFAM for archaeal histones). 
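\u003c/p\u003e\u003cp\u003eThe threshold scan and hit merging described in this subsection can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' script: it assumes hmmsearch was run with --tblout, reads only the target-name and full-sequence E-value columns, and replaces the visual identification of the inflection point with a simple second-difference heuristic. The helper names and example rows are hypothetical.\u003c/p\u003e

```python
# Sketch only: helper names and example rows are hypothetical; the thresholds
# (E = 4.0 for PF00125, E = 0.1 for PF00808) are the ones reported in the text.
import math


def parse_tblout(lines):
    # hmmsearch --tblout rows are whitespace-separated: column 0 is the target
    # name and column 4 the full-sequence E-value; '#' lines are comments.
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.split()
        hits[fields[0]] = float(fields[4])
    return hits


def hits_at(hits, threshold):
    # Targets whose full-sequence E-value is at or below the threshold.
    return {name for name, evalue in hits.items() if threshold >= evalue}


def hit_curve(hits, thresholds):
    # Hit counts over an ascending series of E-value thresholds.
    return [len(hits_at(hits, t)) for t in thresholds]


def inflection_index(counts):
    # Index where log10(count + 1) bends upward most sharply (largest second
    # difference); a crude numerical stand-in for reading the curve by eye.
    logs = [math.log10(c + 1) for c in counts]
    second = [logs[i - 1] - 2 * logs[i] + logs[i + 1] for i in range(1, len(logs) - 1)]
    return 1 + max(range(len(second)), key=second.__getitem__)


def merge_hits(per_model, thresholds):
    # Union of the hits that pass each model's own E-value threshold.
    merged = set()
    for model, hits in per_model.items():
        merged |= hits_at(hits, thresholds[model])
    return merged


pf00125 = parse_tblout(['# target accession query accession E-value score',
                        'seqA - PF00125 - 1e-20 70.1',
                        'seqB - PF00125 - 2.5 5.0'])
pf00808 = parse_tblout(['seqB - PF00808 - 0.05 12.0',
                        'seqC - PF00808 - 0.5 8.0'])
merged = merge_hits({'PF00125': pf00125, 'PF00808': pf00808},
                    {'PF00125': 4.0, 'PF00808': 0.1})
print(sorted(merged))  # ['seqA', 'seqB']
```

\u003cp\u003e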
To establish which confidence thresholds to use with jackhmmer and hmmsearch, we screened, for each search strategy, a range of expectation values (E-values) extending from low enough to return no hits to high enough that the number of hits was limited only by the filters built into the HMMER search pipeline (Supplemental Fig.\u0026nbsp;1b). For most search strategies, hits accumulated slowly up to an inflection point, beyond which the number of hits increased rapidly. We reasoned that beyond this point the search models return mostly noise sequences, and by iteratively clustering around the inflection point we confirmed that hits above these E-values mostly constituted noise. Although most hits at the inflection point overlapped between search strategies, small outlier groups existed, so we combined the hits from both PFAMs around the inflection point and performed the rest of our analysis on this set (Supplemental Fig.\u0026nbsp;1c). The final thresholds were E = 4.0 for PF00125 and E = 0.1 for PF00808.\u003c/p\u003e\u003ch2\u003eDBSCAN clustering\u003c/h2\u003e\u003cp\u003eHistone sequences were imported with associated metadata from GTDB, and ambiguous sequences were filtered out. Physical parameters of the sequences were calculated with the ProtParam module of Biopython (Bio.SeqUtils). Histones were clustered with DBSCAN (as implemented in scikit-learn) on the four parameters with the highest variance: length, pI, GRAVY, and helical propensity. Parameters were standardized by z-score normalization prior to clustering, and the DBSCAN parameters (ε = 0.5, n = 40) were chosen to give a silhouette score of 0.25. To determine these clustering parameters, the data were randomly sub-sampled and tested with a range of parameters to optimize the silhouette score (Supplemental Fig.\u0026nbsp;1). A silhouette score of 0.25 was chosen as the target because it reproduced the clusters reliably over many rounds of clustering. 
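\u003c/p\u003e\u003cp\u003eThe clustering step can be illustrated with a small, dependency-free sketch: z-score standardization, a plain re-implementation of DBSCAN, and a mean silhouette score. The study used the scikit-learn implementation; this version only mirrors the algorithm (each neighborhood includes the point itself and uses distances of at most eps, as in scikit-learn), and excluding noise points from the silhouette score is our assumption. The feature values are hypothetical, and min_samples is lowered to 3 to suit the toy data.\u003c/p\u003e

```python
# Illustrative pure-Python sketch; not the scikit-learn code used in the study.
import math
import statistics


def zscore(rows):
    # Standardize each column to mean 0 and population standard deviation 1.
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def dbscan(points, eps, min_samples):
    # Label each point with a cluster index, or -1 for noise.
    n = len(points)
    neighbors = [[j for j in range(n) if eps >= math.dist(points[i], points[j])]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if min_samples > len(neighbors[i]):
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:  # only core points expand
                queue.extend(neighbors[j])
    return labels


def silhouette(points, labels):
    # Mean silhouette over clustered points (noise excluded, an assumption).
    idx = [i for i, lab in enumerate(labels) if lab != -1]
    clusters = sorted({labels[i] for i in idx})
    if 2 > len(clusters):
        raise ValueError('need at least two clusters')
    scores = []
    for i in idx:
        by_cluster = {c: [] for c in clusters}
        for j in idx:
            if j != i:
                by_cluster[labels[j]].append(math.dist(points[i], points[j]))
        own = by_cluster[labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster
            continue
        a = statistics.fmean(own)
        b = min(statistics.fmean(d) for c, d in by_cluster.items() if c != labels[i])
        scores.append((b - a) / max(a, b))
    return statistics.fmean(scores)


rows = [[70, 9.5], [71, 9.6], [69, 9.4],      # hypothetical (length, pI) pairs
        [143, 4.7], [142, 4.8], [144, 4.6],
        [262, 5.2]]
points = zscore(rows)
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
print(round(silhouette(points, labels), 2))
```

\u003cp\u003e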
After optimization, the parameters were applied to two additional data subsets to verify that the same number of clusters, of roughly the same size, was found in each. The physical parameter ranges of each cluster were then calculated from the three subsets to define cluster boundaries for the whole dataset. These ranges were tested on three further random subsets that were clustered independently, verifying that the range-based labels matched the cluster assignments for 95% of points in each test. The verified ranges were then used to label all points in the overall dataset. Proteins that did not fall into one of the five clusters were removed from further analysis. Centers of mass and nearest neighbors were calculated for each cluster in standardized space and mapped back to real space. Edges were drawn linking histones from the same organism. Histones were then sorted by genome, and common histone strategies were identified. Taxonomic data from GTDB were then used to map histone strategies onto a taxonomic tree using iTOL\u003csup\u003e\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003ch2\u003eMetadata correlation\u003c/h2\u003e\u003cp\u003eAfter histones were assigned to strategies, a practical cutoff of 100 histones per strategy group was applied to simplify analysis. Histones from groups that did not meet the cutoff were excluded from further analysis but were retained in the database. Metadata associated with each strategy were aggregated and compared with those of genomes without histones (No histones group) using the Shapiro test from the scipy.stats Python package. We chose this test to deal with comparisons between datasets containing uneven variances. 
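\u003c/p\u003e\u003cp\u003eA minimal sketch of the group-size cutoff and per-strategy aggregation is shown below, assuming one record per genome with hypothetical field names (strategy, n_histones, genome_size, gc, coding_density). It only prepares per-strategy medians next to the No histones group; the statistical comparison itself is not reproduced here.\u003c/p\u003e

```python
# Sketch only: the record layout (one dict per genome) and field names are
# hypothetical; the cutoff of 100 histones per strategy is taken from the text.
import statistics
from collections import defaultdict

MIN_HISTONES = 100


def strategies_to_keep(genomes, min_histones=MIN_HISTONES):
    # Total number of histones contributed to each strategy across all genomes.
    totals = defaultdict(int)
    for g in genomes:
        totals[g['strategy']] += g['n_histones']
    return {s for s, n in totals.items() if n >= min_histones}


def median_metadata(genomes, fields=('genome_size', 'gc', 'coding_density'),
                    min_histones=MIN_HISTONES):
    # Median of each metadata field per retained strategy; the histone-free
    # reference group is always kept for comparison.
    keep = strategies_to_keep(genomes, min_histones) | {'No histones'}
    grouped = defaultdict(list)
    for g in genomes:
        if g['strategy'] in keep:
            grouped[g['strategy']].append(g)
    return {s: {f: statistics.median([g[f] for g in members]) for f in fields}
            for s, members in grouped.items()}


genomes = [
    {'strategy': 'Single 1', 'n_histones': 1, 'genome_size': 1.2e6, 'gc': 35.0, 'coding_density': 0.88},
    {'strategy': 'Single 1', 'n_histones': 1, 'genome_size': 1.6e6, 'gc': 41.0, 'coding_density': 0.90},
    {'strategy': 'No histones', 'n_histones': 0, 'genome_size': 2.1e6, 'gc': 48.0, 'coding_density': 0.86},
]
print(median_metadata(genomes, min_histones=2))  # toy data, so a toy cutoff
```

\u003cp\u003e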
We chose the metadata we considered most relevant to understanding the presence of histones: genome size, genomic GC content, and gene coding density.\u003c/p\u003e\u003ch2\u003eEnvironmental pressure correlation\u003c/h2\u003e\u003cp\u003eWe manually extracted the location data associated with each genome and mapped keywords in each location description onto a set of standardized locations, which covered most of the genomes in the dataset. We then associated each of these locations with the environmental pressure(s) it most likely imposes. A full list of keywords and their coding is provided in the analysis scripts.\u003c/p\u003e\u003ch2\u003eSequence conservation\u003c/h2\u003e\u003cp\u003eConservation of histones from each group was calculated as the average occupancy at each position of histones aligned with MUSCLE\u003csup\u003e\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u003c/sup\u003e, using a custom Python script. ‘Highly conserved’ residues are those whose conservation is at least one standard deviation above the mean conservation for that alignment. Conservation was calculated for each histone type, both before and after strategies were assigned.\u003c/p\u003e\u003ch2\u003eCompositional bias\u003c/h2\u003e\u003cp\u003eAmino acid composition of histone groups was calculated in Python using NumPy and plotted using Matplotlib. Composition was calculated per histone and then averaged across the group, rather than by pooling all residues from all histones in the group. 
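\u003c/p\u003e\u003cp\u003eThe conservation and composition calculations can be sketched as follows. Two points are assumptions rather than details from the text: occupancy at an alignment column is taken as the fraction of sequences carrying that column's most common residue (gaps count against it), and the one-standard-deviation cutoff uses the population standard deviation. The composition functions contrast per-sequence averaging, which weights every histone equally, with pooling all residues, which lets longer sequences dominate. Function names and toy sequences are hypothetical.\u003c/p\u003e

```python
# Sketch only: the occupancy definition and function names are assumptions.
import statistics
from collections import Counter


def column_conservation(alignment):
    # alignment: equal-length aligned sequences, '-' for gaps.
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != '-')
        scores.append(max(counts.values()) / n if counts else 0.0)
    return scores


def highly_conserved(scores):
    # Columns at least one standard deviation above the alignment mean.
    cutoff = statistics.fmean(scores) + statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if s >= cutoff]


def per_sequence_composition(seqs):
    # Residue fractions computed within each sequence, then averaged, so every
    # histone contributes equally regardless of its length.
    residues = sorted(set(''.join(seqs)))
    fractions = []
    for s in seqs:
        c = Counter(s)
        fractions.append({r: c[r] / len(s) for r in residues})
    return {r: statistics.fmean([f[r] for f in fractions]) for r in residues}


def pooled_composition(seqs):
    # Pooling all residues: longer sequences dominate the result.
    c = Counter(''.join(seqs))
    total = sum(c.values())
    return {r: c[r] / total for r in sorted(c)}


aln = ['MKRA', 'MKRE', 'MKQ-', 'MARD']  # hypothetical aligned fragments
scores = column_conservation(aln)
print(scores)  # [1.0, 0.75, 0.75, 0.25]
print(highly_conserved(scores))  # [0]
seqs = ['KK', 'EEEEEEEE']  # a short and a long toy sequence
print(per_sequence_composition(seqs))  # {'E': 0.5, 'K': 0.5}
print(pooled_composition(seqs))  # {'E': 0.8, 'K': 0.2}
```

\u003cp\u003e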
Because most histones in each group were of similar length, this normalization did not have a drastic effect, but it still seemed appropriate to prevent longer sequences from dominating the composition.\u003c/p\u003e\u003ch2\u003eStructural prediction\u003c/h2\u003e\u003cp\u003eWe predicted the structures of histones from each strategy as dimers (two histone folds), tetramers (four histone folds), and nucleosomes (with the addition of 147 bp of dsDNA) using the AlphaFold3 online server. We visually inspected each of the five models output by AlphaFold3 and proceeded with the highest-confidence model that did not contain major clashes (usually the top-ranked model, i.e., model 0). ipTM scores are reported in the figures.\u003c/p\u003e\u003ch2\u003eMolecular dynamics simulations\u003c/h2\u003e\u003cp\u003eAlphaFold3 nucleosome predictions were used as starting models for simulations. Models were prepared for simulation in ChimeraX\u003csup\u003e\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e. The terminal phosphate of each DNA strand was removed (to prevent later simulation errors), and the models were protonated and then subjected to a few frames of MD using the “Tug” function in ChimeraX, pulling on a single hydrogen atom at the terminus of a DNA strand. This “Tug” step relaxed the AlphaFold3 models and resolved clashes orders of magnitude faster than manual correction would have. No gross topological changes were observed. All-atom molecular dynamics simulations with explicit solvent were carried out using AMBER with the ff14SB, bsc1, and TIP3P force fields (for protein, DNA, and water, respectively)\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e. Structures were protonated again with tLEaP, and hydrogen masses were repartitioned with ParmEd. 
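\u003c/p\u003e\u003cp\u003eHydrogen mass repartitioning can be illustrated through its mass bookkeeping alone. In this sketch each hydrogen is raised to a target mass and the added mass is taken from the bonded heavy atom, so the total mass is conserved while the fastest X-H vibrations slow down, which is the usual rationale for a 4 fs time step. The 3.024 Da target and the toy data layout are assumptions; the sketch does not treat water specially and is not a substitute for running ParmEd on a real topology.\u003c/p\u003e

```python
# Illustrative sketch of hydrogen mass repartitioning bookkeeping. The atom and
# bond lists and the 3.024 Da target are assumptions for illustration only.
def repartition(masses, elements, bonds, h_mass=3.024):
    # Raise each hydrogen to h_mass and subtract the added mass from the heavy
    # atom it is bonded to, so the total mass of the system is unchanged.
    new = list(masses)
    for i, j in bonds:
        for h, heavy in ((i, j), (j, i)):
            if elements[h] == 'H' and elements[heavy] != 'H':
                delta = h_mass - new[h]
                new[h] = h_mass
                new[heavy] -= delta
    return new


# Methane as a toy example: one carbon bonded to four hydrogens.
masses = [12.011, 1.008, 1.008, 1.008, 1.008]
elements = ['C', 'H', 'H', 'H', 'H']
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
new = repartition(masses, elements, bonds)
print([round(m, 3) for m in new])  # [3.947, 3.024, 3.024, 3.024, 3.024]
print(round(sum(new), 3) == round(sum(masses), 3))  # True
```

\u003cp\u003e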
Structures were placed in cubic boxes extending at least 25 Å beyond the structures, charge-neutralized with potassium and chloride ions, and solvated with water molecules. The structures were energy-minimized in two 5,000-step cycles: the first with the protein and DNA restrained to allow solvent relaxation, and the second allowing relaxation of the full system. Minimized structures were then heated to 300 K and gradually brought to atmospheric pressure (1.01325 bar). The systems were then simulated for 100 ns with a 4 fs time step. Simulations were performed in triplicate, each started with a different random seed during the heating phase. Distances between phosphates on neighboring residues at the center of a DNA strand were calculated as a proxy for nucleosome unfolding in representative simulations.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eSupplemental information\u003c/strong\u003e\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eSupplemental tables 1 and 2\u003c/li\u003e\n \u003cli\u003eSupplemental figures 1–9\u003c/li\u003e\n \u003cli\u003eInteractive 3D chart displaying clustering data\u003c/li\u003e\n \u003cli\u003eSpreadsheet containing physical parameters of all archaeal histones\u003c/li\u003e\n \u003cli\u003eSpreadsheet containing all classified histones organized by genome\u003c/li\u003e\n\u003c/ol\u003e\u003cp\u003e\u003cstrong\u003eAcknowledgments.\u0026nbsp;\u003c/strong\u003eThis work utilized the Alpine high-performance computing resource at the University of Colorado Boulder. Alpine is jointly funded by the University of Colorado Boulder, the University of Colorado Anschutz, Colorado State University, and the National Science Foundation (award 2201538)\u003csup\u003e51\u003c/sup\u003e. We also used the Blanca condo computing resource at the University of Colorado Boulder. 
Blanca is jointly funded by computing users and the University of Colorado Boulder\u003csup\u003e52\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding.\u0026nbsp;\u003c/strong\u003eKL and SL are supported by the Howard Hughes Medical Institute. The Alpine computing cluster is jointly funded by the University of Colorado Boulder, the University of Colorado Anschutz, Colorado State University, and the National Science Foundation (2201538)\u003csup\u003e51\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflicts of interest/Competing interests. \u0026nbsp;\u003c/strong\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate. \u0026nbsp;\u003c/strong\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication. \u0026nbsp;\u0026nbsp;\u003c/strong\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability. \u0026nbsp;\u0026nbsp;\u003c/strong\u003eAll underlying protein sequences and metadata were collected from GTDB. Histone sequences are provided in the source data (Excel spreadsheets).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMaterials availability. \u0026nbsp;\u0026nbsp;\u003c/strong\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode availability.\u0026nbsp;\u003c/strong\u003eCode used for the analyses and simulations is available on GitHub: https://github.com/shla9937/archaeal histone diversity\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contribution. \u0026nbsp;\u0026nbsp;\u003c/strong\u003eSL conducted the analysis with input and editing from KL. SL and KL wrote the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eArents, G., Burlingame, R. W., Wang, B. C., Love, W. E. \u0026amp; Moudrianakis, E. N. 
The nucleosomal core histone octamer at 3.1 \u0026Aring; resolution: a tripartite protein assembly and a left-handed superhelix. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e88\u003c/strong\u003e, 10148\u0026ndash;10152 (1991).\u003c/li\u003e\n\u003cli\u003eLuger, K. \u0026amp; Richmond, T. J. DNA binding within the nucleosome core. \u003cem\u003eCurr. Opin. Struct. Biol.\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 33\u0026ndash;40 (1998).\u003c/li\u003e\n\u003cli\u003eLuger, K., M\u0026auml;der, A. W., Richmond, R. K., Sargent, D. F. \u0026amp; Richmond, T. J. Crystal structure of the nucleosome core particle at 2.8 \u0026Aring; resolution. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e389\u003c/strong\u003e, 251\u0026ndash;260 (1997).\u003c/li\u003e\n\u003cli\u003eMattiroli, F. \u003cem\u003eet al.\u003c/em\u003e Structure of histone-based chromatin in Archaea. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e357\u003c/strong\u003e, 609\u0026ndash;612 (2017).\u003c/li\u003e\n\u003cli\u003eHocher, A. \u003cem\u003eet al.\u003c/em\u003e Histones with an unconventional DNA-binding mode in vitro are major chromatin constituents in the bacterium Bdellovibrio bacteriovorus. \u003cem\u003eNat. Microbiol.\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 2006\u0026ndash;2019 (2023).\u003c/li\u003e\n\u003cli\u003eLiu, Y. \u003cem\u003eet al.\u003c/em\u003e Virus-encoded histone doublets are essential and form nucleosome-like structures. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e184\u003c/strong\u003e, 4237\u0026ndash;4250.e19 (2021).\u003c/li\u003e\n\u003cli\u003eToner, C. M., Hoitsma, N. M., Weerawarana, S. \u0026amp; Luger, K. Characterization of Medusavirus encoded histones reveals nucleosome-like structures and a unique linker histone. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 9138 (2024).\u003c/li\u003e\n\u003cli\u003eTalbert, P. B. \u0026amp; Henikoff, S. 
Histone variants at a glance. \u003cem\u003eJ. Cell Sci.\u003c/em\u003e \u003cstrong\u003e134\u003c/strong\u003e, jcs244749 (2021).\u003c/li\u003e\n\u003cli\u003eTalbert, P. B., Armache, K.-J. \u0026amp; Henikoff, S. Viral histones: pickpocket\u0026rsquo;s prize or primordial progenitor? \u003cem\u003eEpigenetics Chromatin\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 21 (2022).\u003c/li\u003e\n\u003cli\u003eIrwin, N. A. T. \u0026amp; Richards, T. A. Self-assembling viral histones are evolutionary intermediates between archaeal and eukaryotic nucleosomes. \u003cem\u003eNat. Microbiol.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 1713\u0026ndash;1724 (2024).\u003c/li\u003e\n\u003cli\u003eValencia-S\u0026aacute;nchez, M. I. \u003cem\u003eet al.\u003c/em\u003e The structure of a virus-encoded nucleosome. \u003cem\u003eNat. Struct. Mol. Biol.\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 413\u0026ndash;417 (2021).\u003c/li\u003e\n\u003cli\u003eAlva, V. \u0026amp; Lupas, A. N. Histones predate the split between bacteria and archaea. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 2349\u0026ndash;2353 (2019).\u003c/li\u003e\n\u003cli\u003eHu, Y. \u003cem\u003eet al.\u003c/em\u003e Bacterial histone HBb from Bdellovibrio bacteriovorus compacts DNA by bending. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 8193\u0026ndash;8204 (2024).\u003c/li\u003e\n\u003cli\u003eSchwab, S. \u003cem\u003eet al.\u003c/em\u003e Histones and histone variant families in prokaryotes. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 7950 (2024).\u003c/li\u003e\n\u003cli\u003eHu, Y. \u003cem\u003eet al.\u003c/em\u003e DNA Wrapping by a Tetrameric Bacterial Histone. 2025.05.08.652872 Preprint at https://doi.org/10.1101/2025.05.08.652872 (2025).\u003c/li\u003e\n\u003cli\u003eSandman, K., Krzycki, J. A., Dobrinski, B., Lurz, R. \u0026amp; Reeve, J. N. 
HMf, a DNA-binding protein isolated from the hyperthermophilic archaeon Methanothermus fervidus, is most closely related to histones. \u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e \u003cstrong\u003e87\u003c/strong\u003e, 5788\u0026ndash;5791 (1990).\u003c/li\u003e\n\u003cli\u003eSomboonna, N., Assawamakin, A., Wilantho, A., Tangphatsornruang, S. \u0026amp; Tongsima, S. Metagenomic profiles of free-living archaea, bacteria and small eukaryotes in coastal areas of Sichang island, Thailand. \u003cem\u003eBMC Genomics\u003c/em\u003e \u003cstrong\u003e13 Suppl 7\u003c/strong\u003e, S29 (2012).\u003c/li\u003e\n\u003cli\u003eAdams, M. W. W. Biochemical diversity among sulfur-dependent, hyperthermophilic microorganisms. \u003cem\u003eFEMS Microbiol. Rev.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 261\u0026ndash;277 (1994).\u003c/li\u003e\n\u003cli\u003eEpp Schmidt, D. J. \u003cem\u003eet al.\u003c/em\u003e Metagenomics Reveals Bacterial and Archaeal Adaptation to Urban Land-Use: N Catabolism, Methanogenesis, and Nutrient Acquisition. \u003cem\u003eFront. Microbiol.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 2330 (2019).\u003c/li\u003e\n\u003cli\u003eHenneman, B., Emmerik, C. van, Ingen, H. van \u0026amp; Dame, R. T. Structure and function of archaeal histones. \u003cem\u003ePLOS Genet.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, e1007582 (2018).\u003c/li\u003e\n\u003cli\u003ePatwal, I., Trinh, H., Golden, A. \u0026amp; Flaus, A. Histone sequence variation in divergent eukaryotes facilitates diversity in chromatin packaging. 2021.05.12.443918 Preprint at https://doi.org/10.1101/2021.05.12.443918 (2021).\u003c/li\u003e\n\u003cli\u003eWurtzel, O. \u003cem\u003eet al.\u003c/em\u003e A single-base resolution map of an archaeal transcriptome. \u003cem\u003eGenome Res.\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 133\u0026ndash;141 (2010).\u003c/li\u003e\n\u003cli\u003eHocher, A. 
\u003cem\u003eet al.\u003c/em\u003e Growth temperature and chromatinization in archaea. \u003cem\u003eNat. Microbiol.\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 1932\u0026ndash;1942 (2022).\u003c/li\u003e\n\u003cli\u003eReed, C. J., Lewis, H., Trejo, E., Winston, V. \u0026amp; Evilia, C. Protein adaptations in archaeal extremophiles. \u003cem\u003eArchaea Vanc. BC\u003c/em\u003e \u003cstrong\u003e2013\u003c/strong\u003e, 373275 (2013).\u003c/li\u003e\n\u003cli\u003eSiglioccolo, A., Paiardini, A., Piscitelli, M. \u0026amp; Pascarella, S. Structural adaptation of extreme halophilic proteins through decrease of conserved hydrophobic contact surface. \u003cem\u003eBMC Struct. Biol.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 50 (2011).\u003c/li\u003e\n\u003cli\u003eGuo, L. \u003cem\u003eet al.\u003c/em\u003e Biochemical and structural characterization of Cren7, a novel chromatin protein conserved among Crenarchaea. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e36\u003c/strong\u003e, 1129\u0026ndash;1137 (2008).\u003c/li\u003e\n\u003cli\u003eZhang, Z., Gong, Y., Guo, L., Jiang, T. \u0026amp; Huang, L. Structural insights into the interaction of the crenarchaeal chromatin protein Cren7 with DNA. \u003cem\u003eMol. Microbiol.\u003c/em\u003e \u003cstrong\u003e76\u003c/strong\u003e, 749\u0026ndash;759 (2010).\u003c/li\u003e\n\u003cli\u003eLaursen, S. P., Bowerman, S. \u0026amp; Luger, K. Archaea: The Final Frontier of Chromatin. \u003cem\u003eJ. Mol. Biol.\u003c/em\u003e \u003cstrong\u003e433\u003c/strong\u003e, 166791 (2021).\u003c/li\u003e\n\u003cli\u003eZaremba-Niedzwiedzka, K. \u003cem\u003eet al.\u003c/em\u003e Asgard archaea illuminate the origin of eukaryotic cellular complexity. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e541\u003c/strong\u003e, 353\u0026ndash;358 (2017).\u003c/li\u003e\n\u003cli\u003eBowerman, S., Wereszczynski, J. \u0026amp; Luger, K. 
Archaeal chromatin \u0026lsquo;slinkies\u0026rsquo; are inherently dynamic complexes with deflected DNA wrapping pathways. \u003cem\u003ebioRxiv\u003c/em\u003e (2020) doi:10.1101/2020.12.08.416859.\u003c/li\u003e\n\u003cli\u003eRanawat, H. M. \u003cem\u003eet al.\u003c/em\u003e Cryo-EM reveals open and closed Asgard chromatin assemblies. 2025.05.24.653377 Preprint at https://doi.org/10.1101/2025.05.24.653377 (2025).\u003c/li\u003e\n\u003cli\u003eStevens, K. M. \u003cem\u003eet al.\u003c/em\u003e Histone variants in archaea and the evolution of combinatorial chromatin complexity. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e (2020) doi:10.1073/pnas.2007056117.\u003c/li\u003e\n\u003cli\u003eDulmage, K. A., Todor, H. \u0026amp; Schmid, A. K. Growth-Phase-Specific Modulation of Cell Morphology and Gene Expression by an Archaeal Histone Protein. \u003cem\u003emBio\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, e00649-15 (2015).\u003c/li\u003e\n\u003cli\u003eParks, D. H. \u003cem\u003eet al.\u003c/em\u003e GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, D785\u0026ndash;D794 (2022).\u003c/li\u003e\n\u003cli\u003ePaul, S., Bag, S. K., Das, S., Harvill, E. T. \u0026amp; Dutta, C. Molecular signature of hypersaline adaptation: insights from genome and proteome composition of halophilic prokaryotes. \u003cem\u003eGenome Biol.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, R70 (2008).\u003c/li\u003e\n\u003cli\u003eHossain, K. A. \u003cem\u003eet al.\u003c/em\u003e How acidic amino acid residues facilitate DNA target site selection. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e120\u003c/strong\u003e, e2212501120 (2023).\u003c/li\u003e\n\u003cli\u003eTadeo, X. 
\u003cem\u003eet al.\u003c/em\u003e Structural Basis for the Aminoacid Composition of Proteins from Halophilic Archea. \u003cem\u003ePLOS Biol.\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 1\u0026ndash;9 (2009).\u003c/li\u003e\n\u003cli\u003eAkashi, H. \u0026amp; Gojobori, T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. \u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e \u003cstrong\u003e99\u003c/strong\u003e, 3695\u0026ndash;3700 (2002).\u003c/li\u003e\n\u003cli\u003eSato, S. \u003cem\u003eet al.\u003c/em\u003e Cryo-EM structure of the nucleosome core particle containing Giardia lamblia histones. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e49\u003c/strong\u003e, 8934\u0026ndash;8946 (2021).\u003c/li\u003e\n\u003cli\u003eAbramson, J. \u003cem\u003eet al.\u003c/em\u003e Accurate structure prediction of biomolecular interactions with AlphaFold 3. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e630\u003c/strong\u003e, 493\u0026ndash;500 (2024).\u003c/li\u003e\n\u003cli\u003eMarc, F., Sandman, K., Lurz, R. \u0026amp; Reeve, J. N. Archaeal Histone Tetramerization Determines DNA Affinity and the Direction of DNA Supercoiling. \u003cem\u003eJ. Biol. Chem.\u003c/em\u003e \u003cstrong\u003e277\u003c/strong\u003e, 30879\u0026ndash;30886 (2002).\u003c/li\u003e\n\u003cli\u003eHodges, A. J. \u003cem\u003eet al.\u003c/em\u003e Histone Sprocket Arginine Residues Are Important for Gene Expression, DNA Repair, and Cell Viability in Saccharomyces cerevisiae. \u003cem\u003eGenetics\u003c/em\u003e \u003cstrong\u003e200\u003c/strong\u003e, 795\u0026ndash;806 (2015).\u003c/li\u003e\n\u003cli\u003eMethyl-reducing methanogenesis by a thermophilic culture of Korarchaeia. \u003cem\u003eNature\u003c/em\u003e (2024). https://www.nature.com/articles/s41586-024-07829-8.\u003c/li\u003e\n\u003cli\u003eKohtz, A. J. \u003cem\u003eet al.\u003c/em\u003e Cultivation and visualization of a methanogen of the phylum Thermoproteota. 
\u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e632\u003c/strong\u003e, 1118\u0026ndash;1123 (2024).\u003c/li\u003e\n\u003cli\u003eLynes, M. M., Jay, Z. J., Kohtz, A. J. \u0026amp; Hatzenpichler, R. Methylotrophic methanogenesis in the Archaeoglobi revealed by cultivation of Ca. Methanoglobus hypatiae from a Yellowstone hot spring. \u003cem\u003eISME J.\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, wrae026 (2024).\u003c/li\u003e\n\u003cli\u003eRodrigues-Oliveira, T. \u003cem\u003eet al.\u003c/em\u003e Actin cytoskeleton and complex cell architecture in an Asgard archaeon. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e613\u003c/strong\u003e, 332\u0026ndash;339 (2023).\u003c/li\u003e\n\u003cli\u003eLetunic, I. \u0026amp; Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e49\u003c/strong\u003e, W293\u0026ndash;W296 (2021).\u003c/li\u003e\n\u003cli\u003eEdgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 1792\u0026ndash;1797 (2004).\u003c/li\u003e\n\u003cli\u003eGoddard, T. D. \u003cem\u003eet al.\u003c/em\u003e UCSF ChimeraX: Meeting modern challenges in visualization and analysis. \u003cem\u003eProtein Sci.\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 14\u0026ndash;25 (2018).\u003c/li\u003e\n\u003cli\u003eCase, D. A. \u003cem\u003eet al.\u003c/em\u003e AmberTools. \u003cem\u003eJ. Chem. Inf. Model.\u003c/em\u003e \u003cstrong\u003e63\u003c/strong\u003e, 6183\u0026ndash;6191 (2023).\u003c/li\u003e\n\u003cli\u003eAlpine. Research Computing, University of Colorado Boulder. https://www.colorado.edu/rc/alpine.\u003c/li\u003e\n\u003cli\u003eBlanca Condo Cluster. Research Computing, University of Colorado Boulder. https://www.colorado.edu/rc/resources/blanca.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003e\u003cstrong\u003eTable 1. 
Median values of physical parameters of histone clusters.\u003c/strong\u003e Number of sequences in each cluster and median values of protein length (amino acids), isoelectric point (pI), hydrophobicity (GRAVY score), instability index, and RMSD (\u0026Aring;) between the best-aligning histone fold and histone HMfB (PDB 1A7W), calculated after the clustering shown in Figure\u0026nbsp;1. See Supplemental Table 2 for the range of values of each cluster.\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 56px;\"\u003e\n \u003cp\u003eCluster\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 120px;\"\u003e\n \u003cp\u003eType\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 100px;\"\u003e\n \u003cp\u003e# of Sequences\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003eLength (aa)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 41px;\"\u003e\n \u003cp\u003epI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eGRAVY\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 78px;\"\u003e\n \u003cp\u003eInstability\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003eRMSD to 1A7W (\u0026Aring;)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 56px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 120px;\"\u003e\n \u003cp\u003eBasic singlet\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 100px;\"\u003e\n \u003cp\u003e4,969\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 55px;\"\u003e\n \u003cp\u003e70\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 
…                                      9.5  -0.26   34   1.978
2   Acidic singlet         685    68   6.6  -0.11   34   1.055
3   Acidic doublet         536   143   4.7  -0.18   41   1.810
4   Acidic miniature       153    55   4.3  -0.60   43   3.310
5   Acidic quadruplet      130   262   5.2  -0.35   46   1.795

Table 2. Representative genome for each strategy. Representative genomes were identified as those encoding the histone closest to the average of all histones in each strategy, based on the four physical parameters from Figure 1. For strategies with multiple histones, the histone closest to the average was determined for each type within the strategy, and the genome with the most prevalent histone composition for that strategy was then chosen (two histones for Multiple 1; one of each type for Combinations 1&2 and 3&4). All amino acid sequences are provided in the supplemental data.

Strategy          Species                        Histone ID(s)                   No. of sequences
Single 1          Nitrosotalea sp028867735       JAGWFW010000004.1_41                       1,665
Single 2          Methanococcoides sp021108185   JAIORJ010000016.1_63                         225
Single 3          Haloferax marinum              NZ_WKJQ01000001.1_1390                       431
Single 5          MGIIa-L1 sp8725u               DUJJ01000204.1_9                             123
Multiple 1        JACPII01 sp016188175           JACPII010000095.1_11                       2,782
                                                 JACPII010000002.1_73
Combination 1&2   SZUA-1452 sp015662385          Type 1: DQUH01000053.1_3                     269
                                                 Type 2: DQUH01000042.1_12
Combination 3&4   Halopenitus persicus           Type 3: NZ_FNPC01000002.1_292                100
                                                 Type 4: NZ_FNPC01000016.1_27

Keywords: Histone, Archaea, Chromatin, Evolution, DNA binding protein
Subject areas: Biological sciences/Computational biology and bioinformatics/Data mining; Biological sciences/Molecular biology/Chromatin/Nucleosomes
Version 1 posted July 7th, 2025; under review at Nature Communications.
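The Table 2 caption describes a nearest-to-average selection of representative genomes. The Python sketch below is one hedged reading of that procedure, not the authors' code. It assumes the four physical parameters from Figure 1 are already on comparable scales, uses Euclidean distance, and, for multi-histone strategies, keeps only genomes with the most prevalent histone-type composition before picking the one whose histones lie closest (summed distance) to their per-type averages. The preprint does not state its distance metric, scaling, or tie-breaking; the names `Histone`, `representative_single` and `representative_multi` and all parameter values are hypothetical.

```python
"""Hypothetical sketch of representative-genome selection (Table 2 caption).

Assumptions, not from the source: parameters are pre-scaled, distance is
Euclidean, and ties resolve to the first candidate in iteration order.
"""
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Histone:
    histone_id: str
    genome: str
    htype: int      # histone type, 1-5
    params: tuple   # four physical parameters (hypothetical values)


def centroid(vectors):
    """Component-wise mean of equal-length parameter vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def representative_single(histones):
    """Single-histone strategy: genome encoding the histone nearest the strategy mean."""
    center = centroid([h.params for h in histones])
    return min(histones, key=lambda h: math.dist(h.params, center)).genome


def composition(histones):
    """Histone-type composition of one genome, e.g. (1, 1) or (3, 4)."""
    return tuple(sorted(h.htype for h in histones))


def representative_multi(genomes):
    """Multi-histone strategy; `genomes` maps genome name -> list of Histone.

    1. Compute a centroid per histone type over all histones in the strategy.
    2. Keep only genomes with the most prevalent type composition.
    3. Return the genome whose histones have the smallest summed distance
       to their type centroids.
    """
    by_type = {}
    for hs in genomes.values():
        for h in hs:
            by_type.setdefault(h.htype, []).append(h.params)
    centers = {t: centroid(vs) for t, vs in by_type.items()}

    top_comp, _ = Counter(composition(hs) for hs in genomes.values()).most_common(1)[0]
    candidates = {g: hs for g, hs in genomes.items() if composition(hs) == top_comp}
    return min(
        candidates,
        key=lambda g: sum(math.dist(h.params, centers[h.htype]) for h in candidates[g]),
    )


if __name__ == "__main__":
    single = [
        Histone("h1", "A", 1, (0.0, 0.0, 0.0, 0.0)),
        Histone("h2", "B", 1, (2.0, 2.0, 2.0, 2.0)),
        Histone("h3", "C", 1, (1.0, 1.0, 1.0, 1.0)),
    ]
    print(representative_single(single))  # C

    def make(hid, genome, x):
        """Toy type-1 histone with all four parameters equal to x."""
        return Histone(hid, genome, 1, (x, x, x, x))

    multi = {
        "G1": [make("a", "G1", 1.0), make("b", "G1", 1.0)],
        "G2": [make("c", "G2", 4.0), make("d", "G2", 4.0)],
        "G3": [make("e", "G3", 3.0), make("f", "G3", 3.0), make("g", "G3", 3.0)],
    }
    print(representative_multi(multi))  # G2
```

In the toy multi-histone case, G3 sits nearest the type-1 average but is excluded because its composition (three type-1 histones) is less common than the two-histone composition shared by G1 and G2. Whether the authors applied the composition filter before or after the distance step is not stated; this ordering is an assumption.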
