Abstract
Carbon monoxide dehydrogenases containing nickel-iron active sites ([NiFe]-CODHs) catalyze the
reversible oxidation of CO to CO₂, representing key targets for biocatalytic CO₂ reduction. Despite
dramatic differences in catalytic rates and O₂ tolerance between CODH variants, the molecular basis
for this functional diversity remains poorly understood. We applied comparative genomics and
synteny analysis to investigate the biochemical roles of CODH clades A-F using 1,376 CODH and
1,545 hybrid cluster protein sequences. Around 30% of genomes encode multiple CODH isoforms.
Analysis revealed distinct gene clustering patterns correlating with biochemical function. Clades A, E,
and F exhibit a degree of distributional exclusivity. Clades C and D frequently co-occur with active
CODHs, suggesting auxiliary roles. Operon architecture analysis revealed functional specialization:
clade A links to acetyl-CoA synthase; clades A, E, and F contain essential maturation machinery (CooC,
CooJ, CooT) correlating with catalytic activity; clade B associates with transporters; clade C with
electron transfer partners; clade D with transcriptional regulators. High CODH-HCP co-occurrence
(except clade A) suggests environmental interdependency. These findings establish clades A, E, and F as
primary biocatalyst targets while defining regulatory functions for clades C and D, providing a genomics
framework for predicting CODH phenotypes.
Introduction
Genomic enzymology has helped to elucidate protein (super)families since the mid-1990s,
connecting enzyme sequences to function through comparative genomics and
neighborhood analysis (Babbitt et al., 1996; Knox and Allen, 2023). In this study, we employ
genome neighborhood and co-occurrence analysis to better understand the reactivity and functionality of
the family of nickel-containing carbon monoxide dehydrogenases ([NiFe]-CODHs) and their
relationship to hybrid cluster proteins (HCPs).
bioRxiv preprint doi: https://doi.org/10.1101/2025.09.14.676152; this version posted September 19, 2025. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
perpetuity. It is made available under a CC-BY 4.0 International license.
[NiFe]-CODHs are ancient and diverse enzymes that catalyze the interconversion between carbon
monoxide (CO) and carbon dioxide (CO₂), a reaction of high interest for biotechnological applications,
including CO₂ capture and conversion. Research on this enzyme spans over 60 years, and recent
studies have provided important biochemical insights, such as turnover frequencies and oxygen
tolerance (Can et al., 2014). These properties vary greatly between enzymes, not only between
separate phylogenetic clades but also within clades, making functional prediction from sequence
alone challenging. Phylogenetic analyses of available [NiFe]-CODH (hereafter referred to as CODH)
sequences with different foci, such as gene transfer (Techtmann et al., 2012), primary structure (Inoue
et al., 2018), biodistribution (Inoue et al., 2022) and the human gut microbiome (Katayama et al., 2024),
have enriched our understanding of this old and diverse enzyme family. Datasets have grown from
initially 17 sequences (Lindahl and Chang, 2001) to well above 5000 sequences (this study). Up to
eight distinct phylogenetic clades (Figure 1) can be distinguished, all of which show
sequence variations while preserving the overall fold, as seen with cryo-electron
microscopy (Biester et al., 2024) and X-ray crystallography (Basak et al., 2025; Domnik et al., 2017;
Gong et al., 2008; Jeoung et al., 2022; Jeoung and Dobbek, 2007; Wittenborn et al., 2020).
The biochemical characterization of this enzyme family is still ongoing and reveals a wide range of
turnover frequencies as well as different degrees of O₂ tolerance. For example, consider two CODHs
from Carboxydothermus hydrogenoformans, both belonging to clade F (Fig. 1): ChCODH-II, a benchmark
CODH known for its high CO oxidation activity but low O₂ tolerance, contrasts with ChCODH-IV,
another enzyme from the same clade and organism, which retains 20% of activity after 1 h of O₂
exposure but displays reduced CO oxidation capacity with an increased activation barrier and much
lower KM (Domnik et al., 2017). Similarly, a less active CODH from clade E, NvCODH (formerly known
as DvCODH) from Nitratidesulfovibrio vulgaris, has been reported to fully reactivate after initial
inactivation by O₂ exposure (Hadj-Said et al., 2015). Also, the two CODHs from Thermococcus sp.
AM4, TcCODH-I and TcCODH-II, both belonging to clade E, react more slowly with O₂ than ChCODH-II
but have overall equal O₂ sensitivity (Benvenuti et al., 2020).
Figure 1. Schematic phylogenetic tree of [NiFe]-CODH, with selected CODHs marked at their respective positions. The tree
was built using IQ-TREE 2 with 1000 ultrafast bootstrap replicates and contains 5508 putative CODH sequences and one outgroup
(MBE6442607.1, hydroxylamine reductase [Desulfovibrio desulfuricans]) for rooting. A detailed searchable tree with
bootstrap values can be found in Supplementary File 9_Tree5.
In addition to the diversity in activity and oxygen tolerance within CODH clades and organisms
described above, it is known that some CODHs rely on maturases for full activation
while others do not. For example, RrCODH, from the phototroph Rhodospirillum rubrum, needs to be
expressed together with three maturases (CooC, CooJ and CooT) in order to be isolated in an active
form (Kerby et al., 1997). A similar situation arises for NvCODH; however, its genomic neighborhood
(Fig. 2) contains only one maturase (CooC), which is required for production of active enzyme
(Hadj-Said et al., 2015). In contrast, ChCODH-II can be heterologously expressed without
co-expression of any maturases (Merrouch et al., 2018). ChCODH-I, in turn, needs to be co-expressed
with CooC to reach high activity, but it can also be expressed without it, albeit with reduced activity
(Inoue et al., 2014). Interestingly, much of the diversity with regard to activity, O₂ tolerance and
maturase dependence occurs not only between the different clades but also within them.
Due to the homology between the CODHs in this study and the fact that active CODHs have been
demonstrated from several of the clades, it is reasonable to assume that CODHs from all clades are
able to interconvert CO₂/CO. However, a recent study by Dobbek and co-workers showed that
Carboxydothermus hydrogenoformans CODH-V (ChCODH-V) from clade D was not able to perform
this reaction (Jeoung et al., 2022). They showed that this enzyme more closely resembles the family
of hybrid cluster proteins (HCPs) owing to its morphing active site, composed of iron, sulfur and
oxygen, which responds to changes in its redox state with structural and stoichiometric changes. A
connection between HCPs and CODHs has been pointed out previously by Inoue et al. (2018)
due to their close phylogenetic relationship, and was further discussed by Fujishiro and Takaoka
(2023). HCPs can be divided into three phylogenetic classes, of which class III
exhibits a homodimeric structure like CODH. Generally, the two enzyme families share a similar
overall fold while their active sites differ greatly, both in terms of amino acid and metallocofactor
composition. Similar to ChCODH-V, HCPs do not catalyze CO₂/CO interconversion, but they do
display a range of activities at low rates, such as hydroxylamine reductase, peroxidase, nitric oxide
reductase and S-nitrosylase activity. The main natural function of HCPs is debated, but it was recently
concluded that they most likely act as nitric oxide reductases involved in nitric oxide detoxification
(Hagen, 2022).
Figure 2. Operons of selected [NiFe]-CODHs. * NtCODH, formerly known as MtCODH (itself formerly known as CtCODH), due to
renaming of the host organism (Gtari and Ventura, 2025). ** NvCODH, formerly known as DvCODH, due to renaming of the host
organism (Waite et al., 2020).
In this study, we contribute to a holistic phylogenetic picture of CODHs by analyzing
their genetic environment and by harnessing the concept of synteny, using a
semi-quantitative approach to predict clade- and subclade-specific characteristics of CODHs. We
present clade-specific trends in the operon composition of CODHs. Since many organisms are known
to have multiple CODH isoforms encoded in their genome, we analyze the
co-occurrence of CODHs of different clades in an organism, as well as the co-occurrence of CODH and
HCP. With our findings, we propose a systematic approach to the analysis of new CODHs, with
a focus on identifying promising CO₂ reduction catalysts suitable for biotechnological application.
Results
Co-occurrence and Correlation. Evaluating the assemblies with regard to their CODH count showed
that around 30% of all assemblies encode more than one CODH. For HCP this number is
much smaller, around 6%. As can be seen from Fig. 3A, the occurrence of multiple isoforms from
specific clades within organisms varies. Clades B, C and D almost exclusively occur only once within
a genome, while clades A, E and F are more likely to co-occur with another isoform from the same
clade. The overall trend that we observe most likely underrepresents the number of genomes
encoding multiple CODHs, since incomplete genomes are also included in these analyses. When
calculating the correlation of the co-occurrence of CODHs from two different clades in one organism, a
pattern emerges (Fig. 3B). Most obvious is the lack of co-occurrence of clades A, E and F with each
other. Furthermore, clade B CODHs seem to have little co-occurrence with other clades as well.
However, clades C and D more often co-occur with CODHs from other clades, especially A, E and F.
Clades C and D also have a higher probability of co-occurring with each other. As outlined in the
introduction, biochemical studies have shown that CODHs from clades A, E and F are active,
whereas CO₂/CO interconversion activity is missing in CODH from clade D. This co-occurrence
might suggest that the redox-sensing properties of CODH from clade D (and potentially clade C) are
useful for organisms already containing a functional CODH. Interestingly, a high co-occurrence was
also seen for CODH and HCP, with the exception of clade A CODH.
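The co-occurrence statistics described in this paragraph can be illustrated with a minimal sketch. It assumes, hypothetically, that each assembly is represented by the list of clade labels of its CODH genes. The toy data, the function names and the choice of a conditional probability P(clade j present | clade i present) are assumptions made for illustration and are not necessarily the calculation behind Fig. 3B.

```python
from collections import Counter
from itertools import product

CLADES = "ABCDEF"

def fraction_multi(assemblies):
    """Fraction of CODH-containing assemblies that encode more than one CODH."""
    with_codh = [a for a in assemblies.values() if a]
    return sum(len(a) > 1 for a in with_codh) / len(with_codh)

def cooccurrence_matrix(assemblies):
    """P(clade j present | clade i present), counted over assemblies.

    Diagonal entries require at least two copies of clade i,
    i.e. a second isoform from the same clade.
    """
    present = Counter()
    joint = Counter()
    for clades in assemblies.values():
        counts = Counter(clades)
        for i in counts:
            present[i] += 1
        for i, j in product(counts, repeat=2):
            if i != j or counts[i] > 1:
                joint[i, j] += 1
    return {
        i: {j: (joint[i, j] / present[i] if present[i] else 0.0) for j in CLADES}
        for i in CLADES
    }

# Hypothetical toy input: assembly accession -> clades of its CODH genes
toy = {
    "asm1": ["F", "F", "C", "D"],
    "asm2": ["A"],
    "asm3": ["E", "D"],
    "asm4": ["B"],
    "asm5": ["E"],
}
matrix = cooccurrence_matrix(toy)
```

Counting per assembly rather than per organism mirrors the caveat in the text: isoforms that are spread over several assemblies of the same organism are not counted together, so multi-isoform genomes tend to be underestimated.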
Figure 3. (A) Count of organisms that contain one or multiple [NiFe]-CODHs. (B) Probability matrix of co-occurrence of
[NiFe]-CODHs from different clades in one organism. Raw data can be found in Supplementary File 1_Table S1.
Neighbor Analysis. We semi-quantitatively evaluated the operon composition of 1351 CODHs (121 A,
130 B, 168 C, 253 D, 434 E, 245 F; Supplementary File 5_Tree1 and Supplementary File 6_Tree2),
considering neighboring proteins whose function we could predict using eggNOG (Huerta-Cepas et al.,
2019), the NCBI product prediction and manual curation. The results are summarized in Fig. 4. In the
following, we only report on neighbors that are encoded in the same operon as more than 10% of CODHs
per clade (see Table S1). Starting with CODHs from clade A, 93% contain a one-carbon pool-related gene
in their operon, followed by CooC (62%) and iron-sulfur (FeS) cluster-containing proteins (31%). We
defined one-carbon pool-related genes as genes associated either with the direct conversion of one-carbon
compounds, such as formate dehydrogenases, or with the Wood-Ljungdahl pathway. Clade B CODH
operons mainly encode ABC transporter-associated genes (64%). Furthermore, almost a quarter (24%)
of all CODHs from clade B could not be associated with any neighbor, and 12% are encoded close to
transcriptional regulators. For CODHs from clade C, the three main neighbor types are
FeS cluster-containing proteins (such as CooF) (72%), NAD(P)- or FAD-dependent
oxidoreductases (71%) and transcriptional (58%) or other (10%) regulators. The overall diversity of
neighboring proteins in clade D, and the fact that a major part of those CODHs are seemingly not
encoded close to any other genes (64%), made it challenging to summarize their different operons, and no
clear pattern could be observed. Only transcriptional regulators (9.9%) and general regulatory
proteins (9.5%) are worth mentioning in this context. Clades E and F both have a larger set of proteins
frequently observed in their associated operons. Operons encoding either clade E or F CODHs contain
CooC-like genes (59% E, 68% F), one-carbon pool-associated genes (49% E, 37% F) and FeS genes
(29% E, 53% F). Transcriptional regulators (17% E, 35% F) and NAD(P)/FAD-dependent
oxidoreductases (22% E, 42% F) were also found. The maturation protein CooT was found exclusively
in operons from clades E (16%) and F (6.1%). The same holds true for CooJ, but in clade F CooJ
was seen in even fewer operons (12% E, <5.0% F). Additional hydrogenases (25%) and their
maturation machinery (17%) are encoded primarily in clade F CODH operons, as are different types of
transporter proteins (11%).
Figure 4. (A) Unrooted phylogenetic tree of 1376 putative [NiFe]-CODH sequences, color-coded according to whether the operon
contains one or more proteins of a certain type. A detailed searchable tree with bootstrap values can be found in Supplementary
File 5_Tree1 and Supplementary File 6_Tree2. (B) Distribution of operon size for CODH and HCP genes. (C)
Proportion of [NiFe]-CODHs from one clade encoded near a certain type of protein. Raw data can be found in
Supplementary File 1_Table S1.
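The per-clade proportions and the 10% reporting threshold used in this paragraph can be sketched as follows. This is a minimal illustration, assuming each CODH has already been assigned a clade and a set of annotated neighbor categories; the function name, the category labels and the toy data are hypothetical and do not reproduce the study's pipeline.

```python
from collections import defaultdict

def neighbor_proportions(operons, threshold=0.10):
    """Per clade, share of CODH operons containing each neighbor category.

    operons: iterable of (clade, set_of_neighbor_categories), one entry per CODH.
    Only categories found in more than `threshold` of a clade's operons are kept.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for clade, categories in operons:
        totals[clade] += 1
        for cat in set(categories):  # count each category once per operon
            hits[clade][cat] += 1
    return {
        clade: {
            cat: n / totals[clade]
            for cat, n in hits[clade].items()
            if n / totals[clade] > threshold
        }
        for clade in totals
    }

# Hypothetical toy data: 20 clade B operons
toy = (
    [("B", {"ABC transporter"})] * 13
    + [("B", {"transcription regulator"})] * 2
    + [("B", set())] * 4
    + [("B", {"ABC transporter", "CooC"})] * 1
)
result = neighbor_proportions(toy)
```

In this toy set only the ABC transporter category passes the filter; a category present in exactly 10% of operons is dropped because the comparison is strict, matching the "more than 10%" wording.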
A similar analysis of the operons encoding HCP was performed; a total of 1476 HCP genes
were analyzed (class I: 1049, class II: 23, class III: 404; Supplementary File 7_Tree3 and
Supplementary File 8_Tree4), showing a low frequency of isoforms within organisms (Fig. S1). HCPs
exhibited a large variety of neighbors, making it difficult to extract meaningful information
from their operon composition (Fig. S2). Furthermore, classes I and III had a high proportion of entries
without any neighbors (49% and 75%, respectively), which is reflected in their tendency to have fewer
proteins encoded in their operons (see Fig. 4B). However, we observed a high frequency of FeS cluster
proteins (18%) and transcriptional regulators (17%) for class I, as well as NAD(P)- or FAD-dependent
oxidoreductases (96%) and transport proteins (78%) for class II HCP. It needs to be noted, however,
that our sample set for class II HCP is very small, so its informative value is considerably lower
compared to the other classes/clades.
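The effect of the small class II sample can be made concrete with a confidence interval for a proportion. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not a method used in the study; the counts are back-calculated from the rounded percentages and are therefore assumptions.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96, about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Counts are assumptions derived from the rounded percentages in the text.
class_ii = wilson_interval(18, 23)     # ~78% of 23 class II HCPs near transport proteins
class_i = wilson_interval(189, 1049)   # ~18% of 1049 class I HCPs near FeS cluster proteins
```

With only 23 sequences, the class II interval is several times wider than the class I interval, which is the quantitative counterpart of the caution expressed above.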
Discussion
Our analysis reveals substantial diversity in the occurrence, co-occurrence, and genomic context of
CODH and HCP genes, suggesting complex evolutionary and functional relationships within and
across microbial lineages. The observed differences in the frequency of multiple isoforms per genome
(~30% for CODH versus ~6% for HCP) indicate that CODHs are more often retained in multiple
copies, potentially pointing to functional diversification among their isoforms. A comparable value has been
reported by Techtmann et al., who found that a striking 43% of organisms encode more than
one CODH (Techtmann et al., 2012). On the other hand, Katayama et al., investigating only the human
gut microbiome, found a number as low as 5.5% (Katayama et al., 2024). We suspect that this number
underrepresents the number of organisms carrying multiple CODH isoforms in the human gut, since
data refinements that exclude potential CODHs were performed (such as the strict requirement for a
[4Fe-4S] D-cluster, even though Inoue et al. reported on the diversity of the D-cluster (Inoue et al.,
2018)).
There are many examples of organisms encoding multiple CODH isoforms, as outlined in Fig. 3
(Supplementary File 10_Tree6 and Supplementary File 11_Tree7). Many of them have long been known in
the literature (even though their CODH abundance has only been discussed sporadically),
with the most famous example being C. hydrogenoformans, which encodes five different CODHs (Wu et al.,
2005). Another interesting example is Clostridium formicoaceticum, which has a total of
six CODH isoforms encoded in its genome. It needs to be noted that in our analysis this organism did
not show up as an organism with six CODHs (see Fig. 3A). This is because our analysis only counts
CODHs as stemming from the same organism when their genes are associated with the same genome
assembly. We therefore tend to underestimate the counts of organisms with multiple CODHs, such as the
aforementioned. The only CODH from C. formicoaceticum that has so far been isolated, characterized
and discussed is one of its clade E CODHs, which is associated with acetyl-CoA synthase (ACS) (Bao et
al., 2019; Diekert and Thauer, 1978). Other studies have investigated the
influence of CODH isoforms on metabolism. Archaeoglobus fulgidus contains three CODH genes,
two from clade A and one from clade D. Its clade D CODH seems to have no role in the CO
metabolism of this organism (Hocking et al., 2015). Methanosarcina acetivorans
harbors three CODHs, all belonging to clade A; only two of them are associated with ACS
and are believed to be involved in CO metabolism, while the third is a lone gene and is
seemingly not involved in carbon metabolism (Matschiavelli et al.,
2012). Another interesting example is Thermoanaerobacter kivui, formerly known as Acetogenium
kivui (Collins et al., 1994). When TkCODH-I (clade C) is deleted from the organism, the strain
loses its ability to grow slowly on CO; however, if grown on H₂+CO₂ the overall acetate production is
greatly increased (Jain et al., 2021). Similar effects have been shown for Clostridium
autoethanogenum, which contains three CODH isoforms: if its clade C CODH is deleted, its
lag phase is reduced and its growth rate is greatly increased (Liew et al., 2016). The other two CODH
isoforms from this organism are from clades E and D. Deletion of the clade D CODH showed no
immediate effect on the organism, except a moderately lower overall biomass yield (Liew et al., 2016).
In our analysis we saw an increased frequency of co-occurrence of clades C and D with A, E, or F,
which together with the biological data may indicate a complementary role, possibly linked to redox
sensing or regulatory functions. This is especially evident in the examples of clade C CODHs from
T. kivui and C. autoethanogenum. Clade D, in contrast, lacks catalytic activity towards
CO/CO₂ interconversion (as reported by Jeoung et al. (2022) through their recombinant
production of ChCODH-V) and so far has an unknown influence on metabolism that might only
manifest in harsher environments, since it is believed to be involved in stress response (Jeoung et al.,
2022), similar to HCPs (Hagen, 2022; see below). However, experimental proof for this claim is still
missing.
Furthermore, clades A, E, and F rarely co-occur. Interestingly, however, many organisms contain
multiple copies of CODHs from one of these clades, such as M. acetivorans (clade A),
C. hydrogenoformans (clade F), and C. formicoaceticum (clade E). We suspect an evolutionary reason
behind this, as also outlined by others (Adama et al., 2018; Lindahl and Chang, 2001). Biochemical
data indicate that CODHs from these clades possess CO/CO₂ interconversion capability, as previously
mentioned. This is also in line with the genetic context of these CODHs, which is most often tuned for
this CO/CO₂ interconversion chemistry (Fig. 4C; see below).
The high rate of co-occurrence between CODH and HCP genes (except for clade A) suggests
functional integration, a shared metabolic niche, or involvement in a coordinated response to redox
stress, given that HCPs are thought to be involved in the response to nitrosative or oxidative stress
(Hagen, 2022). The lack of co-localization for clade A CODHs might point to distinct metabolic roles
or evolutionary constraints, or
reflect the high proportion of archaeal genes among clade A CODHs, even though HCP genes are also
known to be found in archaea (Hagen, 2022). Interestingly, the co-occurrence between clade D CODH and HCP
appears to be the highest in our data set; the reason for this is unclear. As for clade D CODH, empirical
proof that HCP influences the activity or expression of CODH is missing.
The genomic context analysis adds another layer of functional inference, as has been done before by
others with different foci (Inoue et al., 2018; Katayama et al., 2024; Matson et al., 2011; Techtmann et
al., 2012). We could confirm that operons containing clade A CODHs are highly conserved, with
one-carbon pool-related genes and CooC, as they are almost exclusively found as part of the
Wood-Ljungdahl pathway, with a typical arrangement similar to that of the Methanosarcina barkeri CODH
(MbCODH; Fig. 2, Supplementary File 6_Tree2). Recently, the structure of another representative of
this group, from M. thermophila (MetCODH), has been resolved (Biester et al., 2024).
In contrast, clade B CODHs appear largely alone or associated with transport-related genes, raising the
possibility of a non-canonical or even degenerated function. Their operon composition is also rather
consistent and the arrangement varies only to a small extent, as ABC transporters are encoded either
upstream (as for Ruminococcus flavefaciens CODH, RfCODH; see Fig. 2) or downstream of the
CODH gene. Almost none of the analyzed operons contain any maturases, except for a small cluster that
branches off rather early in the tree (Supplementary File 6_Tree2). This might indicate that the need
for a maturase was lost due to re-purposing of the CODH. Biochemical characterization
of any clade B CODH is still pending.
Clade C CODHs are associated with FeS cluster proteins, regulators and redox enzymes, pointing
towards more regulatory or redox-modulatory roles, as also indicated by knock-out
studies (Liew et al., 2016). The only isolated example from this clade is TkCODH-I. Its operon
composition is only partially representative of clade C CODHs, containing only one other
gene, which encodes an FeS protein (Fig. 2). Furthermore, TkCODH-I's sequence branches off early and
seems to be rather distinct (Supplementary File 9_Tree5), having only one other close relative, from
Aceticella autotrophica (Frolov et al., 2023). In addition, the isolated CODH reported by Jain and
co-workers (Jain et al., 2021) stems from a CO-adapted strain (Weghoff and Müller, 2016), which
might harbor mutations in the protein sequence that are not accessible to us at the moment. Taken
together, we conclude that TkCODH-I might currently not be an optimal representative of clade C
CODHs, and more clade C CODHs should be isolated to better understand their biochemical
properties.
The high operonic variability and frequency of solitary coding regions in clade D might reflect either
evolutionary drift or multifunctionality not restricted to operonic structure. Clade D will therefore not
be discussed further.
Operons from clades E and F are more functionally complex, including components from the
Wood-Ljungdahl pathway, hydrogenases, and additional redox partners, consistent with a diverse metabolic
role. That being said, their operon compositions and arrangements show some interesting
clustering. Clade F has the highest proportion of CODHs that might be associated
with hydrogenases, including the aforementioned ChCODH-I and RrCODH, both of which interact with
hydrogenases to produce hydrogen in vivo (Fox et al., 1996; Soboh et al., 2002), albeit with greatly
differing operon compositions (Fig. 2). ChCODH-I-like operons (see Supplementary File 6_Tree2)
contain their hydrogenase modules directly within the CODH operon, whereas RrCODH-like operons
do not include the hydrogenase module, which is encoded upstream of the CODH gene with an
intergenic space of > 400 bp (Fox et al., 1996). RrCODH-like operons are the only clade F operons that
include the two additional maturation enzymes CooT and CooJ. Clade F also contains many
ACS-associated CODHs, such as Neomoorella thermoacetica (formerly known as
Moorella thermoacetica) CODH, NtCODH (Gtari and Ventura, 2025). Those NtCODH-like operons all have
the same arrangement. This arrangement is distinct from clade A and E ACS-associated CODH operons.
In our dataset most of the hydrogenases and their maturation genes are associated with clade F,
suggesting active hydrogen metabolism, coupling CO oxidation to H₂ production or consumption, as
has been suggested earlier for a wider range of CODH clades (Inoue et al., 2018; Techtmann et al.,
2012). We believe that our data grossly underestimate this relationship overall, since operon
examples such as RrCODH and TcCODH-II show that hydrogenases associated with a CODH are
not necessarily encoded in the same operon. There has also been a report of a clade A CODH from the
methanogen M. thermophila (Terlesky and Ferry, 1988) being associated with a hydrogenase.
However, later studies showed that the genome of M. thermophila does not contain a
hydrogenase (Smith and Ingram-Smith, 2007), and investigation of the electron transport chain of its
membrane could not find a hydrogen-oxidizing complex (Welte and Deppenmeier, 2011). Together
with our analysis, we conclude that hydrogenase association is a trait almost exclusive to clade E and F
CODHs. ChCODH-II's operon seems to be rather uniquely constructed, as a similar operon
composition can only be found in other Carboxydothermus species. A similar situation can be seen
for ChCODH-IV, whose operon, containing FeS and NAD/FAD-dependent oxidoreductases, is
closer in similarity to those of some clade E CODHs.
Regarding the biggest clade, clade E, its diversity is striking. NvCODH, formerly known as DvCODH
(Waite et al., 2020), from the organism Nitratidesulfovibrio vulgaris, has a very small operon with
only two genes in its close proximity, a transcriptional regulator (Zhou et al., 2012) and a maturation
enzyme (CooC); see Fig. 2. This is seen for a large number of both clade E and F CODHs. The
scarcity of CooJ and CooT is striking: both appear only in two very distinct parts of clade E,
all of them associated with one-carbon pool metabolism, with one exception from Clostridium
pasteurianum BC1, which more closely resembles the clade F RrCODH-like operon. The previously
introduced archaeal CODHs, TcCODH-I and TcCODH-II from clade E, are contained in operons (Fig.
2) that are rather specific and only found in a few other Thermococcus or Pyrococcus species
(Benvenuti et al., 2020; Kim et al., 2015). It needs to be noted that TcCODH-I's CooC gene is encoded
outside of its operon, on the opposite strand. The CooC gene is therefore not included in our
analysis. We are only aware of examples from this type of operon and do not expect this to be a
common trait of the CODH maturation machinery. However, the CooC
proportion might be slightly underestimated. Interestingly, TcCODH-II-like CODHs all contain a
CooT-like protein in their operon, forming the only cluster of CODHs that contain only CooT-like proteins
without CooJ. Within clade E, another unique genomic neighborhood, that of TkCODH-II, must be
pointed out. From experimental data it is known that this CODH is associated with ACS (Jain et al.,
2021); however, in our analysis we did not see this ACS complex in TkCODH-II's operon. This is because
the ACS subunit is encoded further downstream of the CODH gene and was not taken into account
due to our initial parameters.
For HCPs, the high variability and low operon density, especially in classes I and III, point towards more modular or conditionally expressed roles, similar to clade D CODHs. The clear patterning in class II operons, though based on a limited sample, may reflect specialized functions, perhaps in niche-specific oxidoreductase activities.
Conclusion
As previously mentioned, the aim of this study is to identify which CODH clades harbor the most promising enzymes for future application in CO2 reduction. The operon compositions of CODHs from different clades show distinct differences, and from this information we infer that clades A, E and F are the most likely to harbor CODHs able to efficiently convert CO2 to CO. These clades are therefore the most interesting for CO2-reducing biotechnological applications, or as inspiration for new synthetic catalysts. In addition, the literature has shown that the activity of many CODHs depends on co-expression with maturation proteins such as CooC, and in some cases CooJ and CooT are also required for full activation. Although some CODHs (most notably CODH-II from C. hydrogenoformans) can function independently of maturases, our neighborhood analysis indicates that maturase-coding genes are predominantly found in operons from clades A, E, and F. This pattern again suggests that these clades may represent more biochemically active or catalytically optimized CODHs, making them promising targets for future functional studies and biotechnological applications.
The function of clade B could not be deduced from its genomic environment, but it appears to have a remarkably self-standing function that is not shared with any other CODHs. Its low co-occurrence with other CODH clades within organisms also supports a unique role for clade B. Clades C and D are more likely to show low or even no activity towards CO2/CO interconversion, as deduced from the literature and the lack of C1-metabolism-related genes in their operons. However, Jain et al.
recently showed low CO oxidation activity in a clade C CODH from T. kivui, but this enzyme originated from a strain that had acquired the ability to grow on CO through laboratory evolution (Jain et al., 2021). The sequence data used to classify the CODH into clade C were from the original strain (incapable of growing on CO), and data on the engineered strain are not available. It is therefore not known whether the active CODH is the wild-type or an engineered enzyme, and we cannot draw any conclusions regarding the activity of clade C CODHs. Taken together, this makes clades B, C and D less promising in the hunt for CO2 reduction catalysts. However, much is still unknown about these enzymes, such as their cellular function.
Future work should focus on experimental validation of the functional differences between CODH isoforms, particularly in organisms where multiple clades co-occur. Additionally, transcriptomic and proteomic studies could illuminate condition-dependent expression patterns and confirm proposed regulatory functions. Finally, deeper phylogenomic analyses may reveal the evolutionary drivers behind the observed distribution and diversification of these ancient redox enzymes.
Methods
Data collection and refinement. Multiple BLASTp searches (BLOSUM62, E < 0.05) against the NCBI database were carried out using NCBI accession numbers provided by Inoue et al. (Inoue et al., 2018) (A-1, WP_011305243; A-2, WP_010878596; A-3, OGW06734; A-4, OIP92259; A-5, ODS42986; A-6, OIP30420; B-1, WP_026514536; B-2, WP_015485077; B-3, WP_012645460; B-4, WP_011393470; C-1, WP_039226206; C-2, WP_013237576; C-3, WP_010870233; C-4, WP_044921150; D-1, WP_011342982; D-2, WP_015926279; D-3, WP_079933214; D-4, WP_096205957; E-1, WP_012571978; E-2, WP_010939375; E-3, WP_088535808; F-1, WP_011343033; F-2, WP_011389181; G-1, OGP75751) and Techtmann et al. (Techtmann et al., 2012) (mini CooS, WP_007288589.1). CODH from clade H (Inoue et al., 2022) was omitted due to limited host information. Duplicates were removed using seqkit's (Shen et al., 2016) rmdup. Sequences shorter than 400 amino acids (aa) were removed. Clustering was performed to further reduce data size, using cd-hit (Li et al., 2002, 2001; Li and Godzik, 2006) with a global sequence identity of 99% or 90%, the latter used only for tree generation. A high sequence identity threshold was necessary for clustering within organisms, since some organisms are known to carry multiple CODHs with strikingly similar sequences in their genome, such as Clostridium pasteurianum BC1 (taxid: 86416), which contains WP_015614757.1 and WP_015615315.1 with 93.27% similarity. For the dataset used in neighbor analysis, taxonomic information for each sequence was retrieved using the R packages taxize (Chamberlain et al., 2020; Chamberlain and Szocs, 2013) and taxizedb (Chamberlain et al., 2025), and only sequences that could be related to a recorded organism were kept (Supplementary File 3_Table S3 and Supplementary File 4_Table S4). Sequences were aligned using E-INS-i from mafft (Katoh and Standley, 2013), and sequences that had gaps in important positions related to the D, B or C cluster or the acid-base active site residues were removed. The alignment was
trimmed using trimAl's (Capella-Gutiérrez et al., 2009) automated1 option and a tree was generated using FastTree (Price et al., 2010). Further sequences that were not CODH sequences were removed by visual inspection. The final list of CODH sequences used in the neighbor and correlation analysis comprised 1376 sequences. A similar approach was used for HCP (class I, Q01770.2; class II, WP_000458809.1; class III, WP_013294878.1), and a final count of 1545 sequences was collected for neighbor and correlation analysis. Another set of CODH genes was curated using the 90% cd-hit cut-off. It was aligned using mafft's FFT-NS-1, and sequences that had gaps in important positions related to the D, B or C cluster or the acid-base active site residues were removed. The alignment was trimmed using trimAl. An initial tree was built using FastTree. Further sequences were removed after visual inspection, which yielded a final dataset containing 5508 sequences. See Fig. S3 for a detailed flowchart. Custom code can be retrieved from GitHub (Böhm, 2025a, 2025b, 2025c).
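The first two refinement steps (duplicate removal and the 400 aa length cut-off) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used seqkit and cd-hit. The function names `read_fasta` and `filter_records` are hypothetical, and the sketch assumes that "duplicate" means an identical amino acid sequence, which the text does not state explicitly.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH = 400  # residues; cut-off stated in the text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_LENGTH
) -> Dict[str, str]:
    """Keep the first copy of each distinct sequence, then drop short ones.

    Assumption: duplicates are defined by identical sequence, not by ID.
    """
    seen = set()
    kept: Dict[str, str] = {}
    for header, seq in records:
        if seq in seen:
            continue  # exact duplicate of an earlier record
        seen.add(seq)
        if len(seq) >= min_length:
            kept[header] = seq
    return kept
```

Clustering at 99% or 90% identity, alignment, trimming, and the manual inspection steps are deliberately left out, since they depend on the external tools named above.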
Neighbor analysis. Genome information was downloaded for the accession number lists generated for CODH and HCP, which led to the download of 955 and 1425 genomes, respectively. As neighboring genes we defined a maximum of 15 genes upstream and downstream of the target gene that had a maximum intergenic distance of 300 base pairs (bp), as was done previously by Inoue et al. (Inoue et al., 2018). We decided to use this rather large intergenic distance to include as many neighbors as possible, and we expect that unrelated genes will disappear in the noise. For the same reason, we allowed an overlap of up to 50 bp between genes in the same operon, which is rather high, as genes in E. coli, for example, usually overlap by 1 to 4 bp (Johnson and Chisholm, 2004). Amino acid sequences for those genes were retrieved from the NCBI database using Entrez (Sayers, 2022), and their function was predicted using eggNOG (Huerta-Cepas et al., 2019). The results from eggNOG as well as the product prediction from NCBI were taken into account in the manual assignment of selected functional groups. The data were plotted using R (R Core Team, 2023), tidyverse (Wickham et al., 2019), patchwork (Pedersen, 2025), ggnewscale (Elio Campitelli et al., 2025), ggtree (Yu et al., 2018, 2017), ggtreeExtra (Xu et al., 2021), and treeio (Wang et al., 2020); gggenes (Wilkins, 2023) was used for gene maps. Since CooJ could be identified neither with the NCBI prediction nor via eggNOG, we selected operons from clades E and F that contained CooS and CooT and manually extracted accession numbers of potential CooJs, which were used as PSI-BLAST queries (BLOSUM45, E < 0.001) to find further accession numbers; a summary can be found in Supplementary File 1_Table S1. These accession numbers were used to help annotate potential CooJs in our analysis, and 68 potential CooJ genes could be identified.
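The neighborhood definition above can be sketched in Python as follows. This is an illustrative reading rather than the published code: the `Gene` class and the `neighborhood` function are hypothetical names, and the sketch assumes that the 300 bp and 50 bp limits apply to each pair of adjacent genes, so that the walk away from the target stops at the first pair that violates them. It also assumes the input already contains only genes from one contig and one strand.

```python
from dataclasses import dataclass
from typing import List

MAX_NEIGHBORS = 15  # genes per direction, as stated in the text
MAX_GAP = 300       # bp, maximum intergenic distance
MAX_OVERLAP = 50    # bp, maximum tolerated overlap between adjacent genes


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # inclusive, end >= start


def _linked(left: Gene, right: Gene) -> bool:
    """True if two adjacent genes satisfy the distance/overlap limits."""
    gap = right.start - left.end - 1  # negative values mean overlap
    return -MAX_OVERLAP <= gap <= MAX_GAP


def neighborhood(genes: List[Gene], target: str) -> List[Gene]:
    """Return the target gene plus its linked neighbors, sorted by start."""
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == target)

    lo = idx  # extend upstream while adjacent pairs stay linked
    while lo > 0 and idx - lo < MAX_NEIGHBORS and _linked(ordered[lo - 1], ordered[lo]):
        lo -= 1

    hi = idx  # extend downstream in the same way
    while (hi < len(ordered) - 1 and hi - idx < MAX_NEIGHBORS
           and _linked(ordered[hi], ordered[hi + 1])):
        hi += 1

    return ordered[lo:hi + 1]
```

Under this reading, a gene encoded further away than the limits allow, or on the opposite strand, is not captured, which matches the cases of the TkCODH-II ACS subunit and the TcCODH-I CooC gene discussed above.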
Correlation analysis. Correlation of CODH and HCP from different clades/classes was calculated according to the formula

$$P(X \mid Y) = \frac{N_{XY}}{N_{Y}},$$
where $N_{Y}$ is the total number of assemblies containing a protein from clade/class Y, $N_{XY}$ is the total number of assemblies containing proteins from both clade/class X and Y, and $P(X \mid Y)$ is the probability that a genome coding for a protein from clade/class Y also codes for a protein from clade/class X.
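As a minimal sketch of this calculation, the following Python function computes the conditional co-occurrence for every ordered pair of clades/classes. The function name `conditional_cooccurrence`, the input layout (a mapping from assembly identifier to the set of clades/classes found in it), and the assembly labels in the usage example are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, Set, Tuple


def conditional_cooccurrence(
    assemblies: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], float]:
    """Return P(X|Y) = N_XY / N_Y for every ordered pair (X, Y).

    `assemblies` maps an assembly ID to the set of clades/classes it encodes.
    """
    groups = sorted(set().union(*assemblies.values()))
    # N_Y: number of assemblies containing at least one protein of group Y
    n = {g: sum(g in members for members in assemblies.values()) for g in groups}
    result: Dict[Tuple[str, str], float] = {}
    for x, y in product(groups, repeat=2):
        # N_XY: number of assemblies containing both X and Y
        n_xy = sum(x in members and y in members for members in assemblies.values())
        result[(x, y)] = n_xy / n[y]
    return result


# Hypothetical toy input: four assemblies with made-up clade content
toy = {
    "asm1": {"A", "E"},
    "asm2": {"E"},
    "asm3": {"E", "HCP_I"},
    "asm4": {"A"},
}
p = conditional_cooccurrence(toy)
print(p[("A", "E")], p[("E", "A")])
```

Note that the measure is asymmetric: in the toy input, one of three clade E assemblies also encodes clade A, whereas one of two clade A assemblies also encodes clade E.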
Tree generation. In total, five trees were generated. Trees carrying phylogenetic information were generated via iqtree2 (Minh et al., 2020) with the LG+I+R10 model and ultrafast bootstrapping with 1000 replicates for a dataset of 5508 CODH sequences, a dataset of 1351 CODH sequences, and a dataset of 1476 HCP sequences (see above for details on their generation). For the 5508-sequence CODH dataset, an outgroup (MBE6442607.1) was introduced to root the tree. Sequences were aligned within their dataset using mafft's FFT-NS-2. The alignment was again trimmed using trimAl, and the tree was built using iqtree2 with the above parameters. ggtree (Yu et al., 2017) was used for tree inspection and plotting. The two other trees are taxonomic trees, generated either from taxids alone using a custom Python script and ete3 (Huerta-Cepas et al., 2016), or from WoL: Reference Phylogeny for Microbes (Zhu, 2023; Zhu et al., 2019).
Acknowledgements
The Novo Nordisk Foundation (Grant reference number NNF21OC0066716) is gratefully acknowledged for funding.
References
Adam PS, Borrel G, Gribaldo S. 2018. Evolutionary history of carbon monoxide dehydrogenase/acetyl-CoA synthase, one of the oldest enzymatic complexes. Proc Natl Acad Sci U S A 115:E5836–E5837. doi:10.1073/pnas.1716667115
Babbitt PC, Hasson MS, Wedekind JE, Palmer DRJ, Barrett WC, Reed GH, Rayment I, Ringe D, Kenyon GL, Gerlt JA. 1996. The enolase superfamily: A general strategy for enzyme-catalyzed abstraction of the α-protons of carboxylic acids. Biochemistry 35:16489–16501. doi:10.1021/bi9616413
Bao T, Cheng C, Xin X, Wang J, Wang M, Yang S-T. 2019. Deciphering mixotrophic Clostridium formicoaceticum metabolism and energy conservation: Genomic analysis and experimental studies. Genomics 111:1687–1694. doi:10.1016/j.ygeno.2018.11.020
Basak Y, Lorent C, Jeoung J-H, Zebger I, Dobbek H. 2025. Metalloradical-driven enzymatic CO2 reduction by a dynamic Ni–Fe cluster. Nat Catal 1–10. doi:10.1038/s41929-025-01388-5
Benvenuti M, Meneghello M, Guendon C, Jacq-Bailly A, Jeoung JH, Dobbek H, Leger C, Fourmond V, Dementin S. 2020. The two CO-dehydrogenases of Thermococcus sp. AM4. Biochim Biophys Acta Bioenerg 1861:148188. doi:10.1016/j.bbabio.2020.148188
Biester A, Grahame DA, Drennan CL. 2024. Capturing a methanogenic carbon monoxide dehydrogenase/acetyl-CoA synthase complex via cryogenic electron microscopy. Proceedings of the National Academy of Sciences 121:e2410995121. doi:10.1073/pnas.2410995121
Böhm M. 2025a. protein-to-genome. https://doi.org/10.5281/zenodo.16736767
Böhm M. 2025b. protein-per-organism. https://doi.org/10.5281/zenodo.16736754
Böhm M. 2025c. protein-neighbours. https://doi.org/10.5281/zenodo.16736722
Can M, Armstrong FA, Ragsdale SW. 2014. Structure, function, and mechanism of the nickel metalloenzymes, CO dehydrogenase, and acetyl-CoA synthase. Chem Rev 114:4149–74. doi:10.1021/cr400461p
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. doi:10.1093/bioinformatics/btp348
Chamberlain S, Arendsee Z, Stirling T. 2025. taxizedb: Tools for Working with “Taxonomic” Databases. doi:10.5281/zenodo.1158055
Chamberlain S, Szocs E. 2013. taxize - taxonomic search and retrieval in R. F1000Research 2.
Chamberlain S, Szoecs E, Foster Z, Arendsee Z, Boettiger C, Ram K, Bartomeus I, Baumgartner J, O’Donnell J, Oksanen J, Tzovaras BG, Marchand P, Tran V, Salmon M, Li G, Grenié M. 2020. taxize: Taxonomic information from around the web (manual).
Collins MD, Lawson PA, Willems A, Cordoba JJ, Fernandez-Garayzabal J, Garcia P, Cai J, Hippe H, Farrow JAE. 1994. The Phylogeny of the Genus Clostridium: Proposal of Five New Genera and Eleven New Species Combinations. International Journal of Systematic and Evolutionary Microbiology 44:812–826. doi:10.1099/00207713-44-4-812
Diekert GB, Thauer RK. 1978. Carbon Monoxide Oxidation by Clostridium thermoaceticum and Clostridium formicoaceticum. Journal of Bacteriology 136:597–606. doi:10.1128/jb.136.2.597-606.1978
Domnik L, Merrouch M, Goetzl S, Jeoung JH, Leger C, Dementin S, Fourmond V, Dobbek H. 2017. CODH-IV: A High-Efficiency CO-Scavenging CO Dehydrogenase with Resistance to O2. Angew Chem Int Ed Engl 56:15466–15469. doi:10.1002/anie.201709261
Elio Campitelli, Teun van den Brand, olivroy. 2025. ggnewscale: Multiple Fill and Color Scales in ggplot2. doi:10.5281/ZENODO.2543762
Fox JD, He Y, Shelver D, Roberts GP, Ludden PW. 1996. Characterization of the region encoding the CO-induced hydrogenase of Rhodospirillum rubrum. Journal of Bacteriology 178:6200–6208. doi:10.1128/jb.178.21.6200-6208.1996
Frolov EN, Elcheninov AG, Gololobova AV, Toshchakov SV, Novikov AA, Lebedinsky AV, Kublanov IV. 2023. Obligate autotrophy at the thermodynamic limit of life in a new acetogenic bacterium. Front Microbiol 14. doi:10.3389/fmicb.2023.1185739
Fujishiro T, Takaoka K. 2023. Class III hybrid cluster protein homodimeric architecture shows evolutionary relationship with Ni, Fe-carbon monoxide dehydrogenases. Nat Commun 14:5609. doi:10.1038/s41467-023-41289-4
Gong W, Hao B, Wei Z, Ferguson DJ, Tallant T, Krzycki JA, Chan MK. 2008. Structure of the α2ε2 Ni-dependent CO dehydrogenase component of the Methanosarcina barkeri acetyl-CoA decarbonylase/synthase complex. Proceedings of the National Academy of Sciences 105:9558–9563. doi:10.1073/pnas.0800415105
Gtari M, Ventura S. 2025. Proposal of Neomoorella gen. nov. as a replacement name for the illegitimate prokaryotic genus name Moorella Collins et al. 1994. International Journal of Systematic and Evolutionary Microbiology 75:006779. doi:10.1099/ijsem.0.006779
Hadj-Said J, Pandelia ME, Leger C, Fourmond V, Dementin S. 2015. The Carbon Monoxide Dehydrogenase from Desulfovibrio vulgaris. Biochim Biophys Acta 1847:1574–83. doi:10.1016/j.bbabio.2015.08.002
Hagen WR. 2022. Structure and function of the hybrid cluster protein. Coordination Chemistry Reviews 457. doi:10.1016/j.ccr.2021.214405
Hocking WP, Roalkvam I, Magnussen C, Stokke R, Steen IH. 2015. Assessment of the Carbon Monoxide Metabolism of the Hyperthermophilic Sulfate-Reducing Archaeon Archaeoglobus fulgidus VC-16 by Comparative Transcriptome Analyses. Archaea 2015:235384. doi:10.1155/2015/235384
Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol Biol Evol 33:1635–1638. doi:10.1093/molbev/msw046
Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P. 2019. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research 47:D309–D314. doi:10.1093/nar/gky1085
Inoue M, Nakamoto I, Omae K, Oguro T, Ogata H, Yoshida T, Sako Y. 2018. Structural and Phylogenetic Diversity of Anaerobic Carbon-Monoxide Dehydrogenases. Front Microbiol 9:3353. doi:10.3389/fmicb.2018.03353
Inoue M, Omae K, Nakamoto I, Kamikawa R, Yoshida T, Sako Y. 2022. Biome-specific distribution of Ni-containing carbon monoxide dehydrogenases. Extremophiles 26:9. doi:10.1007/s00792-022-01259-y
Inoue T, Takao K, Fukuyama Y, Yoshida T, Sako Y. 2014. Over-expression of carbon monoxide dehydrogenase-I with an accessory protein co-expression: a key enzyme for carbon dioxide reduction. Biosci Biotechnol Biochem 78:582–7. doi:10.1080/09168451.2014.890027
Jain S, Katsyv A, Basen M, Muller V. 2021. The monofunctional CO dehydrogenase CooS is essential for growth of Thermoanaerobacter kivui on carbon monoxide. Extremophiles 26:4. doi:10.1007/s00792-021-01251-y
Jeoung J-H, Dobbek H. 2007. Carbon Dioxide Activation at the Ni,Fe-Cluster of Anaerobic Carbon Monoxide Dehydrogenase. Science 318:1461–1464. doi:10.1126/science.1148481
Jeoung JH, Fesseler J, Domnik L, Klemke F, Sinnreich M, Teutloff C, Dobbek H. 2022. A Morphing [4Fe-3S-nO]-Cluster within a Carbon Monoxide Dehydrogenase Scaffold. Angew Chem Int Ed Engl 61:e202117000. doi:10.1002/anie.202117000
Johnson ZI, Chisholm SW. 2004. Properties of overlapping genes are conserved across microbial genomes. Genome Res 14:2268–2272. doi:10.1101/gr.2433104
Katayama YA, Kamikawa R, Yoshida T. 2024. Phylogenetic diversity of putative nickel-containing carbon monoxide dehydrogenase-encoding prokaryotes in the human gut microbiome. Microbial Genomics 10. doi:10.1099/mgen.0.001285
Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 30:772–780. doi:10.1093/molbev/mst010
Kerby RL, Ludden PW, Roberts GP. 1997. In vivo nickel insertion into the carbon monoxide dehydrogenase of Rhodospirillum rubrum: molecular and physiological characterization of cooCTJ. Journal of Bacteriology 179:2259–2266. doi:10.1128/jb.179.7.2259-2266.1997
Kim M-S, Choi AR, Lee SH, Jung H-C, Bae SS, Yang T-J, Jeon JH, Lim JK, Youn H, Kim TW, Lee HS, Kang SG. 2015. A Novel CO-Responsive Transcriptional Regulator and Enhanced H2 Production by an Engineered Thermococcus onnurineus NA1 Strain. Applied and Environmental Microbiology 81:1708–1714. doi:10.1128/AEM.03019-14
Knox HL, Allen KN. 2023. Expanding the viewpoint: Leveraging sequence information in enzymology. Current Opinion in Chemical Biology 72:102246. doi:10.1016/j.cbpa.2022.102246
Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. doi:10.1093/bioinformatics/btl158
Li W, Jaroszewski L, Godzik A. 2002. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18:77–82. doi:10.1093/bioinformatics/18.1.77
Li W, Jaroszewski L, Godzik A. 2001. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17:282–283. doi:10.1093/bioinformatics/17.3.282
Liew F, Henstra AM, Winzer K, Köpke M, Simpson SD, Minton NP. 2016. Insights into CO2 Fixation Pathway of Clostridium autoethanogenum by Targeted Mutagenesis. mBio 7:10.1128/mbio.00427-16. doi:10.1128/mbio.00427-16
Lindahl PA, Chang B. 2001. The Evolution of Acetyl-CoA Synthase. Orig Life Evol Biosph 31:403–434.
Matschiavelli N, Oelgeschläger E, Cocchiararo B, Finke J, Rother M. 2012. Function and Regulation of Isoforms of Carbon Monoxide Dehydrogenase/Acetyl Coenzyme A Synthase in Methanosarcina acetivorans. J Bacteriol 194:5377–5387. doi:10.1128/JB.00881-12
Matson EG, Gora KG, Leadbetter JR. 2011. Anaerobic Carbon Monoxide Dehydrogenase Diversity in the Homoacetogenic Hindgut Microbial Communities of Lower Termites and the Wood Roach. PLOS ONE 6:e19316. doi:10.1371/journal.pone.0019316
Merrouch M, Benvenuti M, Lorenzi M, Leger C, Fourmond V, Dementin S. 2018. Maturation of the [Ni-4Fe-4S] active site of carbon monoxide dehydrogenases. J Biol Inorg Chem 23:613–620. doi:10.1007/s00775-018-1541-0
Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. 2020. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37:1530–1534. doi:10.1093/molbev/msaa015
Pedersen TL. 2025. patchwork: The Composer of Plots.
Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLOS ONE 5:e9490. doi:10.1371/journal.pone.0009490
R Core Team. 2023. R: A Language and Environment for Statistical Computing.
Sayers E. 2022. A General Introduction to the E-utilities. Entrez Programming Utilities Help [Internet]. National Center for Biotechnology Information (US).
Shen W, Le S, Li Y, Hu F. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11:e0163962. doi:10.1371/journal.pone.0163962
Smith KS, Ingram-Smith C. 2007. Methanosaeta, the forgotten methanogen? Trends in Microbiology 15:150–155. doi:10.1016/j.tim.2007.02.002
Soboh B, Linder D, Hedderich R. 2002. Purification and catalytic properties of a CO-oxidizing:H2-evolving enzyme complex from Carboxydothermus hydrogenoformans. Eur J Biochem 269:5712–21. doi:10.1046/j.1432-1033.2002.03282.x
Techtmann SM, Lebedinsky AV, Colman AS, Sokolova TG, Woyke T, Goodwin L, Robb FT. 2012. Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases. Front Microbiol 3:132. doi:10.3389/fmicb.2012.00132
Terlesky KC, Ferry JG. 1988. Ferredoxin requirement for electron transport from the carbon monoxide dehydrogenase complex to a membrane-bound hydrogenase in acetate-grown Methanosarcina thermophila. Journal of Biological Chemistry 263:4075–4079. doi:10.1016/S0021-9258(18)68892-1
Waite DW, Chuvochina M, Pelikan C, Parks DH, Yilmaz P, Wagner M, Loy A, Naganuma T, Nakai R, Whitman WB, Hahn MW, Kuever J, Hugenholtz P. 2020. Proposal to reclassify the proteobacterial classes Deltaproteobacteria and Oligoflexia, and the phylum Thermodesulfobacteria into four phyla reflecting major functional capabilities. International Journal of Systematic and Evolutionary Microbiology 70:5972–6016. doi:10.1099/ijsem.0.004213
Wang L-G, Lam TT-Y, Xu S, Dai Z, Zhou L, Feng T, Guo P, Dunn CW, Jones BR, Bradley T, Zhu H, Guan Y, Jiang Y, Yu G. 2020. Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data. Molecular Biology and Evolution 37:599–603. doi:10.1093/molbev/msz240
Weghoff MC, Müller V. 2016. CO Metabolism in the Thermophilic Acetogen Thermoanaerobacter kivui. Applied and Environmental Microbiology 82:2312–2319. doi:10.1128/AEM.00122-16
Welte C, Deppenmeier U. 2011. Membrane-Bound Electron Transport in Methanosaeta thermophila. Journal of Bacteriology 193:2868–2870. doi:10.1128/jb.00162-11
Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. 2019. Welcome to the Tidyverse. JOSS 4:1686. doi:10.21105/joss.01686
Wilkins D. 2023. gggenes: Draw Gene Arrow Maps in “ggplot2.”
Wittenborn EC, Guendon C, Merrouch M, Benvenuti M, Fourmond V, Leger C, Drennan CL, Dementin S. 2020. The Solvent-Exposed Fe-S D-Cluster Contributes to Oxygen-Resistance in Desulfovibrio vulgaris Ni-Fe Carbon Monoxide Dehydrogenase. ACS Catal 10:7328–7335. doi:10.1021/acscatal.0c00934
Wu M, Ren Q, Durkin AS, Daugherty SC, Brinkac LM, Dodson RJ, Madupu R, Sullivan SA, Kolonay JF, Haft DH, Nelson WC, Tallon LJ, Jones KM, Ulrich LE, Gonzalez JM, Zhulin IB, Robb FT, Eisen JA. 2005. Life in hot carbon monoxide: the complete genome sequence of Carboxydothermus hydrogenoformans Z-2901. PLoS Genet 1:e65. doi:10.1371/journal.pgen.0010065
Xu S, Dai Z, Guo P, Fu X, Liu S, Zhou L, Tang W, Feng T, Chen M, Zhan L, Wu T, Hu E, Jiang Y, Bo X, Yu G. 2021. ggtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data. Molecular Biology and Evolution 38:4039–4042. doi:10.1093/molbev/msab166
Yu G, Lam TT-Y, Zhu H, Guan Y. 2018. Two Methods for Mapping and Visualizing Associated Data on Phylogeny Using Ggtree. Molecular Biology and Evolution 35:3041–3043. doi:10.1093/molbev/msy194
Yu G, Smith DK, Zhu H, Guan Y, Lam TT-Y. 2017. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 8:28–36. doi:10.1111/2041-210X.12628
Zhou A, Chen YI, Zane GM, He Z, Hemme CL, Joachimiak MP, Baumohl JK, He Q, Fields MW, Arkin AP, Wall JD, Hazen TC, Zhou J. 2012. Functional Characterization of Crp/Fnr-Type Global Transcriptional Regulators in Desulfovibrio vulgaris Hildenborough. Applied and Environmental Microbiology 78:1168–1177. doi:10.1128/AEM.05666-11
Zhu Q. 2023. WoL: Reference Phylogeny for Microbes. WoL. https://biocore.github.io/wol/
Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, Kosciolek T, Yin JB, Huang S, Salam N, Jiao J-Y, Wu Z, Xu ZZ, Cantrell K, Yang Y, Sayyari E, Rabiee M, Morton JT, Podell S, Knights D, Li W-J, Huttenhower C, Segata N, Smarr L, Mirarab S, Knight R. 2019. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun 10:5477. doi:10.1038/s41467-019-13443-4