Investigating the native functions of [NiFe]-carbon monoxide dehydrogenases through genomic context analysis

preprint OA: closed CC-BY-4.0

Abstract

Carbon monoxide dehydrogenases containing nickel-iron active sites ([NiFe]-CODHs) catalyze the reversible oxidation of CO to CO₂, representing key targets for biocatalytic CO₂ reduction. Despite dramatic differences in catalytic rates and O₂ tolerance between CODH variants, the molecular basis for this functional diversity remains poorly understood. We applied comparative genomics and synteny analysis to investigate the biochemical roles of CODH clades A-F using 1,376 CODH and 1,545 hybrid cluster protein sequences. Around 30% of genomes encode multiple CODH isoforms. Analysis revealed distinct gene clustering patterns correlating with biochemical function. Clades A, E, and F exhibit a degree of distributional exclusivity. Clades C and D frequently co-occur with active CODHs, suggesting auxiliary roles. Operon architecture analysis revealed functional specialization: clade A links to acetyl-CoA synthase; clades A, E, F contain essential maturation machinery (CooC, CooJ, CooT) correlating with catalytic activity; clade B associates with transporters; clade C with electron transfer partners; clade D with transcriptional regulators. High CODH-HCP co-occurrence (except clade A) suggests environmental interdependency. These findings establish clades A, E, F as primary biocatalyst targets while defining regulatory functions for clades C, D, providing a genomics framework for predicting CODH phenotypes.

Introduction

Genomic enzymology has helped to elucidate protein (super)families since the mid-1990s, connecting enzyme sequences to function through comparative genomics and neighborhood analysis (Babbitt et al., 1996; Knox and Allen, 2023). In this study, we employ genome neighborhood and co-occurrence analysis to help understand the reactivity and functionality of the family of nickel-containing carbon monoxide dehydrogenases ([NiFe]-CODHs) and their relationship to hybrid cluster proteins (HCPs).

bioRxiv preprint doi: https://doi.org/10.1101/2025.09.14.676152; this version posted September 19, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

[NiFe]-CODHs are ancient and diverse enzymes that catalyze the interconversion between carbon monoxide (CO) and carbon dioxide (CO₂), a reaction of high interest for biotechnological applications, including CO₂ capture and conversion. Research on this enzyme family spans over 60 years, and recent studies have provided important biochemical insights, such as turnover frequency and oxygen tolerance (Can et al., 2014). These properties vary greatly between enzymes, not only between separate phylogenetic clades but also within clades, making functional prediction from sequence alone challenging. Phylogenetic analyses of available [NiFe]-CODH (hereafter referred to as CODH) sequences with different foci, such as gene transfer (Techtmann et al., 2012), primary structure (Inoue et al., 2018), biodistribution (Inoue et al., 2022) and the human gut microbiome (Katayama et al., 2024), have enriched our understanding of this old and diverse enzyme family. Datasets have grown from initially 17 sequences (Lindahl and Chang, 2001) to well above 5000 sequences (this study).
It has been shown that up to eight distinct phylogenetic clades (Figure 1) can be distinguished, all of them showing sequence variations while preserving the overall fold, as seen with cryo-electron microscopy (Biester et al., 2024) and X-ray crystallography (Basak et al., 2025; Domnik et al., 2017; Gong et al., 2008; Jeoung et al., 2022; Jeoung and Dobbek, 2007; Wittenborn et al., 2020).

Figure 1. Schematic phylogenetic tree of [NiFe]-CODH, with selected CODHs marked at their respective positions. The tree was built using IQ-TREE 2 with 1000 ultrafast bootstrap replicates and contains 5508 putative CODH sequences and one outgroup (MBE6442607.1, hydroxylamine reductase [Desulfovibrio desulfuricans]) for rooting. A detailed searchable tree with bootstrap values can be found in Supplementary File 9_Tree5.

The biochemical characterization of this enzyme family is still ongoing, and it shows a wide range of turnover frequencies as well as different degrees of O₂ tolerance. For example, consider two CODHs from Carboxydothermus hydrogenoformans, both belonging to clade F (Fig. 1): ChCODH-II, a benchmark CODH known for its high CO oxidation activity but low O₂ tolerance, contrasts with ChCODH-IV, another enzyme from the same clade and organism, which retains 20% of its activity after 1 h of O₂ exposure but displays reduced CO oxidation capacity with an increased activation barrier and a much lower KM (Domnik et al., 2017).
Similarly, a less active CODH from clade E, NvCODH (formerly known as DvCODH) from Nitratidesulfovibrio vulgaris, has been reported to fully reactivate after initial inactivation by O₂ exposure (Hadj-Said et al., 2015). Also, the two CODHs from Thermococcus sp. AM4, TcCODH-I and TcCODH-II, both belonging to clade E, react more slowly with O₂ than ChCODH-II but have overall equal O₂ sensitivity (Benvenuti et al., 2020).

In addition to the aforementioned diversity in activity and oxygen tolerance within CODH clades or organisms, it is known that some CODHs rely on maturases for full activation while others do not. For example, RrCODH, from the phototroph Rhodospirillum rubrum, needs to be expressed together with three maturases (CooC, CooJ and CooT) in order to be isolated in an active form (Kerby et al., 1997). A similar situation arises for NvCODH; however, its genomic neighborhood (Fig. 2) contains only one maturase (CooC), which is required for production of the active enzyme (Hadj-Said et al., 2015). In contrast, ChCODH-II can be heterologously expressed without co-expression of any maturases (Merrouch et al., 2018). ChCODH-I needs to be co-expressed with CooC in order to reach high activity, but it can also be expressed without it, albeit with reduced activity (Inoue et al., 2014). Interestingly, much of the diversity with regard to activity, O₂ tolerance and maturase dependence occurs not only between the different clades but also within them.

Due to the homology between the CODHs in this study and the fact that active CODHs have been demonstrated from several of the clades, it is reasonable to assume that CODHs from all clades are able to interconvert CO₂/CO. However, a recent study by Dobbek and co-workers showed that Carboxydothermus hydrogenoformans CODH-V (ChCODH-V) from clade D was not able to perform this reaction (Jeoung et al., 2022).
They showed that this enzyme more closely resembles the family of hybrid cluster proteins (HCPs) because of its morphing active site, composed of iron, sulfur and oxygen, which responds to changes in its redox state with structural and stoichiometric changes.

Figure 2. Operons of selected [NiFe]-CODH. * NtCODH, formerly known as MtCODH, itself formerly known as CtCODH, due to renaming of the host organism (Gtari and Ventura, 2025). ** NvCODH, formerly known as DvCODH, due to renaming of the host organism (Waite et al., 2020).

A connection between HCPs and CODHs has been pointed out previously by Inoue et al. (2018) due to their close phylogenetic relationship, and was further discussed by Fujishiro and Takaoka (2023). HCPs can be divided into three phylogenetic classes, of which class III exhibits a homodimeric structure like CODH. Generally, the two enzyme families share a similar overall fold while their active sites differ greatly, both in terms of amino acid and metallocofactor composition. Similar to ChCODH-V, HCPs do not catalyze CO₂/CO interconversion, but they do display a range of activities at low rates, such as hydroxylamine reductase, peroxidase, nitric oxide reductase and S-nitrosylase activity. The main natural function of HCPs is debated, but it was recently established that they are most likely nitric oxide reductases involved in nitric oxide detoxification (Hagen, 2022).
In this study, we contribute to a holistic phylogenetic picture of CODHs by analyzing their genetic environment and by harnessing the concept of synteny, using a semi-quantitative approach to predict clade- and subclade-specific characteristics of CODHs. We present clade-specific trends in the operon composition of CODHs. Since many organisms are known to have multiple CODH isoforms encoded in their genome, we analyze the co-occurrence of CODHs of different clades within an organism, as well as the co-occurrence of CODH and HCP. With our findings, we propose a systematic approach to the analysis of new CODHs, with a focus on identifying promising CO₂ reduction catalysts suitable for biotechnological application.

Results

Co-occurrence and Correlation. After evaluating the assemblies with regard to their CODH count, we found that around 30% of all assemblies encode more than one CODH. For HCP this number is much smaller, around 6%. As can be seen from Fig. 3A, the occurrence of multiple isoforms from specific clades within organisms varies. Clades B, C and D almost exclusively occur only once within a genome, while clades A, E and F are more likely to co-occur with another isoform from the same clade. The overall trend that we observe most likely underrepresents the number of genomes encoding multiple CODHs, since incomplete genomes are also included in these analyses. When calculating the correlation of the co-occurrence of CODHs from two different clades in one organism, a pattern emerges (Fig. 3B). Most obvious is the lack of co-occurrence of clades A, E and F with each other. Furthermore, clade B CODHs seem to have little co-occurrence with other clades as well. However, clades C and D more often co-occur with CODHs from other clades, especially A, E and F. Clades C and D also have a higher probability of co-occurring with each other. As outlined in the introduction, it is known from biochemical studies that CODHs from clades A, E, and F are active, whereas CO₂/CO interconversion activity is missing in CODH from clade D. This co-occurrence might suggest that the redox-sensing properties of CODH from clade D (and potentially clade C) are useful for organisms already containing a functional CODH.
Interestingly, a high co-occurrence was also seen for CODH and HCP, with the exception of clade A CODH.

Figure 3. (A) Count of organisms that contain one or multiple [NiFe]-CODH. (B) Probability matrix of co-occurrence of [NiFe]-CODH from different clades in one organism. Raw data can be found in Supplementary File 1_Table S1.

Neighbor Analysis. We semi-quantitatively evaluated the operon composition of 1351 CODHs (121 A, 130 B, 168 C, 253 D, 434 E, 245 F; Supplementary File 5_Tree1 and Supplementary File 6_Tree2) together with proteins whose function we could predict using eggNOG (Huerta-Cepas et al., 2019), the NCBI product prediction and manual curation. The results are summarized in Fig. 4. In the following, we only report neighbors that are encoded in the same operon as more than 10% of CODHs per clade (see Table S1). Starting with CODHs from clade A, 93% contain a one-carbon-pool-related gene in their operon, followed by CooC (62%) and iron-sulfur (FeS) cluster-containing proteins (31%). We defined one-carbon-pool-related genes as genes associated either with the direct conversion of one-carbon compounds, such as formate dehydrogenases, or with the Wood-Ljungdahl pathway. Clade B CODH operons mainly encode ABC transporter-associated genes (64%).
Furthermore, almost a quarter (24%) of all CODHs from clade B could not be associated with any neighbor, and 12% are encoded close to transcriptional regulators. For CODHs from clade C, the three main neighbors are FeS cluster-containing proteins (such as CooF) (72%), NAD(P)- or FAD-dependent oxidoreductases (71%) and transcriptional (58%) or other (10%) regulators. The overall diversity of neighboring proteins in clade D, and the fact that a major part of those CODHs are seemingly not encoded close to any other genes (64%), made it challenging to summarize their operons, and no clear pattern could be observed. Only transcription regulation proteins (9.9%) and general regulatory proteins (9.5%) are worth mentioning in this context. Clades E and F both have a larger set of proteins frequently observed in their associated operons. Operons encoding either clade E or F CODHs contain CooC-like genes (59% E, 68% F), one-carbon-pool-associated genes (49% E, 37% F) and FeS genes (29% E, 53% F). Transcription regulators (17% E, 35% F) and NAD(P)/FAD-dependent oxidoreductases (22% E, 42% F) have also been found. The maturation protein CooT was exclusively found in operons from clade E (16%) and F (6.1%). The same holds true for CooJ, but in clade F CooJ was seen in even fewer operons (12% E, <5.0% F). Additional hydrogenases (25%) and their maturation machinery (17%) are encoded primarily with clade F CODHs, as are different types of transporter proteins (11%).

Figure 4. (A) Unrooted phylogenetic tree of 1376 putative [NiFe]-CODH sequences, color-coded according to whether the operon contains one or more of a certain type of protein. A detailed searchable tree with bootstrap values can be found in Supplementary File 5_Tree1 and Supplementary File 6_Tree2. (B) Distribution of operon size for CODH and HCP genes. (C) Proportion of [NiFe]-CODHs from one clade encoded near a certain type of protein. Raw data can be found in Supplementary File 1_Table S1.

A similar analysis of the operons encoding HCPs was performed, and a total of 1476 HCP genes were analyzed (class I: 1049, class II: 23, class III: 404; Supplementary File 7_Tree3 and Supplementary File 8_Tree4), showing a low frequency of isoforms within organisms (Fig. S1). HCPs exhibited a large variety of neighbors, which made it difficult to extract meaningful information from their operon composition (Fig. S2). Furthermore, classes I and III had a high proportion of entries without any neighbors (49% and 75%, respectively), which is reflected in their tendency to have fewer proteins encoded in their operons (see Fig. 4B). However, we observed a high frequency of FeS cluster proteins (18%) and transcription regulators (17%) for class I, as well as NAD(P)- or FAD-dependent oxidoreductases (96%) and transport proteins (78%) for class II HCPs. It needs to be noted, however, that our sample set for class II HCPs is very small, so its informative value is considerably lower compared to the other classes/clades.
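
The co-occurrence statistics reported in this section can be illustrated with a short Python sketch. It is an illustration under stated assumptions only: the input is a hypothetical mapping from assembly identifiers to the clades of the CODHs found in each assembly, the function names are invented here, and each matrix entry is taken to be the fraction of assemblies containing clade a that also contain clade b. The exact statistic behind Fig. 3B is not specified in this section, so this choice is an assumption and not necessarily the authors' method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: assembly ID -> clades of the CODHs it encodes.
assemblies = {
    "asm1": ["F", "F", "C"],
    "asm2": ["E"],
    "asm3": ["A", "D"],
    "asm4": ["F", "D"],
}


def multi_isoform_fraction(assemblies):
    """Fraction of CODH-containing assemblies that encode more than one CODH."""
    counts = [len(clades) for clades in assemblies.values() if clades]
    return sum(1 for n in counts if n > 1) / len(counts)


def cooccurrence_matrix(assemblies, clades="ABCDEF"):
    """Conditional frequency of clade b being present given clade a is present.

    Assumption: presence/absence per assembly is used, so several isoforms
    of the same clade in one assembly count once.
    """
    present = Counter()  # assemblies containing each clade
    joint = Counter()    # assemblies containing both clades of an ordered pair
    for found in assemblies.values():
        unique = set(found)
        for clade in unique:
            present[clade] += 1
        for a, b in combinations(sorted(unique), 2):
            joint[(a, b)] += 1
            joint[(b, a)] += 1
    return {
        a: {
            b: (joint[(a, b)] / present[a] if present[a] else 0.0)
            for b in clades
            if b != a
        }
        for a in clades
    }


fraction = multi_isoform_fraction(assemblies)
matrix = cooccurrence_matrix(assemblies)
print(f"multi-isoform fraction: {fraction:.2f}")
print(f"P(C present | F present): {matrix['F']['C']:.2f}")
```

Because the matrix is conditional on the row clade, it is not symmetric: in the toy data, every assembly with a clade C CODH also has a clade F CODH, but only half of the clade F assemblies have a clade C CODH. A symmetric measure (for example a correlation coefficient on presence/absence vectors) would be an equally plausible reading of the text.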

Discussion

Our analysis reveals substantial diversity in the occurrence, co-occurrence, and genomic context of CODH and HCP genes, suggesting complex evolutionary and functional relationships within and across microbial lineages. The observed differences in the frequency of multiple isoforms per genome (~30% for CODH versus ~6% for HCP) indicate that CODHs are more often retained in multiple copies, potentially pointing to functional diversification among their isoforms. Similar values have been reported by Techtmann et al., who found that a striking 43% of organisms encode more than one CODH (Techtmann et al., 2012). On the other hand, Katayama et al., investigating only the human gut microbiome, found a number as low as 5.5% (Katayama et al., 2024). We suspect that this number underrepresents the number of organisms carrying multiple CODH isoforms in the human gut, since data refinements that exclude potential CODHs were performed (such as the strict requirement for a [4Fe-4S] D-cluster, even though Inoue et al. reported on the diversity of the D-cluster (Inoue et al., 2018)).

There are many examples of organisms encoding multiple CODH isoforms, as outlined in Fig. 3 (Supplementary File 10_Tree6 and Supplementary File 11_Tree7). Many of them have long been known in the literature (even though their CODH abundance has only been discussed sporadically), with the most famous example being C. hydrogenoformans, which encodes five different CODHs (Wu et al., 2005). Another interesting example is Clostridium formicoaceticum, since this organism has a total of six CODH isoforms encoded in its genome. It needs to be noted that in our analysis this organism did not show up as an organism with six CODHs (see Fig. 3A). This is because our analysis only counts CODHs as stemming from the same organism when their genes are associated with the same genome assembly.
We therefore tend to underestimate the number of organisms with multiple CODHs, such as the aforementioned one. The only CODH from C. formicoaceticum that has so far been isolated, characterized and discussed is one of its clade E CODHs, which is associated with acetyl-CoA synthase (ACS) (Bao et al., 2019; Diekert and Thauer, 1978). Other examples from the literature attempted to investigate the influence of CODH isoforms on metabolism. Archaeoglobus fulgidus contains three CODH genes, two from clade A and one from clade D. Its clade D CODH seems to have no role in the CO metabolism of this organism (Hocking et al., 2015). There has also been a report on Methanosarcina acetivorans, which harbors three CODHs, all of them belonging to clade A, of which only two are associated with ACS and are believed to be involved in CO metabolism, the third being a lone gene that is seemingly not involved in carbon metabolism (Matschiavelli et al., 2012). Another interesting example is Thermoanaerobacter kivui, formerly known as Acetogenium kivui (Collins et al., 1994). When TkCODH-I (clade C) is deleted from the organism, the strain loses its ability to grow slowly on CO; however, if grown on H₂+CO₂, the overall acetate production is greatly increased (Jain et al., 2021). Similar effects have been shown for Clostridium autoethanogenum, which contains three CODH isoforms: if its clade C CODH is deleted, its lag phase is reduced and its growth rate is greatly increased (Liew et al., 2016).
The other two CODH isoforms from this organism are from clades E and D. Deletion of the clade D CODH showed no immediate effect on the organism, except a moderately lower overall biomass yield (Liew et al., 2016).

In our analysis we saw an increased frequency of co-occurrence of clades C and D with A, E, or F, which, together with biological data, may indicate a complementary role, possibly linked to redox sensing or regulatory functions. This is especially evident in the examples of clade C CODHs from T. kivui and C. autoethanogenum. Clade D lacks catalytic activity towards CO/CO₂ interconversion (as reported by Jeoung et al. (2022) through their recombinant production of ChCODH-V) and until now has an unknown influence on metabolism that might only manifest in harsher environments, since it is believed to be involved in stress response (Jeoung et al., 2022) (similar to HCPs (Hagen, 2022), see below). However, experimental proof for this claim is still missing.

Furthermore, clades A, E, and F rarely co-occur. Interestingly, however, many organisms contain multiple copies of CODHs from one of these clades, such as M. acetivorans (clade A), C. hydrogenoformans (clade F), and C. formicoaceticum (clade E). We suspect an evolutionary reason behind this, as also outlined by others (Adama et al., 2018; Lindahl and Chang, 2001). Biochemical data indicate that CODHs from these clades possess CO/CO₂ interconversion capability, as previously mentioned. This is also in line with the genetic context of these CODHs, which is most often tuned for this CO/CO₂ interconversion chemistry (Fig. 4C), see below.
The high rate of co-occurrence between CODH and HCP genes (except for clade A) suggests functional integration, a shared metabolic niche, or involvement in a coordinated response to redox stress, given that HCPs are thought to regulate nitric or oxidative stress (Hagen, 2022). The lack of co-localization for clade A CODHs might point to distinct metabolic roles or evolutionary constraints, or to the high proportion of archaeal genes among clade A CODHs, even though HCP genes are known to also be found in archaea (Hagen, 2022). Interestingly, the co-occurrence between clade D CODH and HCP seems to be the highest in our data set; the reason for this is unclear. As for clade D CODH, empirical proof that HCP influences the activity or expression of CODH is missing.

The genomic context analysis adds another layer of functional inference, as has been done before by others with different foci (Inoue et al., 2018; Katayama et al., 2024; Matson et al., 2011; Techtmann et al., 2012). We could show again that operons containing clade A CODHs are highly conserved with one-carbon-pool-related genes and CooC, as they are almost exclusively found as part of the Wood-Ljungdahl pathway, which has a typical arrangement similar to that of the Methanosarcina barkeri CODH (MbCODH; Fig. 2, Supplementary File 6_Tree2). Recently, another representative of this group, from M. thermophila (MetCODH), has been structurally resolved (Biester et al., 2024).

In contrast, clade B CODHs appear largely alone or associated with transport-related genes, raising the possibility of a non-canonical or even degenerated function.
Their operon composition is also rather consistent and the arrangement varies only to a small extent, as ABC transporters are encoded either upstream (as for the Ruminococcus flavefaciens CODH, RfCODH, see Fig. 2) or downstream of the CODH gene. Almost none of the operons analyzed contain any maturases, except for a small cluster that branches off rather early in the tree (Supplementary File 6_Tree2). This might indicate that the need for a maturase was lost due to re-purposing of the CODH. Biochemical characterization of any clade B CODH is still awaited.

Clade C CODHs are associated with FeS cluster proteins, regulators and redox enzymes, pointing towards more regulatory or redox-modulatory roles, as is also indicated by knock-out studies (Liew et al., 2016). The only isolated example from this clade is TkCODH-I. Its operon exhibits a composition only partially representative of clade C CODHs, containing only one other gene, which encodes an FeS protein (Fig. 2). Furthermore, TkCODH-I's sequence branches off early and seems to be rather distinct (Supplementary File 9_Tree5), having only one other close relative, from Aceticella autotrophica (Frolov et al., 2023). Moreover, the isolated CODH reported by Jain and co-workers (Jain et al., 2021) stems from a CO-adapted strain (Weghoff and Müller, 2016), which might harbor mutations in the protein sequence that are not accessible to us at the moment. Taken together, we conclude that TkCODH-I might currently not be an optimal representative of clade C CODHs, and more clade C CODHs should be isolated to help us better understand their biochemical properties.

The high operonic variability and frequency of solitary coding regions in clade D might reflect either evolutionary drift or multifunctionality not restricted to operonic structure. Clade D will therefore not be discussed further.
Operons from clades E and F are more functionally complex, including components from the Wood-Ljungdahl pathway, hydrogenases, and additional redox partners, consistent with a diverse metabolic role. That being said, their operon compositions and arrangements showed some interesting clustering. Clade F has the highest proportion of CODHs that might be associated with hydrogenases, including the aforementioned ChCODH-I and RrCODH, both of which interact with hydrogenases to produce hydrogen in vivo (Fox et al., 1996; Soboh et al., 2002), albeit with greatly differing operon compositions (Fig. 2). ChCODH-I-like operons (see Supplementary File 6_Tree2) contain their hydrogenase modules directly within the CODH operon, whereas RrCODH-like operons do not include the hydrogenase module, which is encoded upstream of the CODH gene with an intergenic space of >400 bp (Fox et al., 1996). RrCODH-like operons are the only clade F operons that include two additional maturation enzymes, CooT and CooJ. Clade F also contains many ACS-associated CODHs, such as the CODH of Neomoorella thermoacetica (NtCODH), an organism formerly known as Moorella thermoacetica (Gtari and Ventura, 2025). Those NtCODH-like operons all have the same arrangement. This arrangement is distinct from clade A and E ACS-associated CODH operons.
In our dataset most of the hydrogenases and their maturation genes are associated with clade F, suggesting active hydrogen metabolism, coupling CO oxidation to H₂ production or consumption, as has been suggested earlier for a wider range of CODH clades (Inoue et al., 2018; Techtmann et al., 2012). We believe that our data grossly underestimate this relationship overall, since operon examples such as RrCODH and TcCODH-II show that hydrogenases associated with a CODH are not necessarily encoded in the same operon. There has also been a report of a clade A CODH from the methanogen M. thermophila (Terlesky and Ferry, 1988) being associated with a hydrogenase. However, later studies showed that the genome of M. thermophila does not contain a hydrogenase (Smith and Ingram-Smith, 2007). Investigation of the electron transport chain of its membrane could not find a hydrogen-oxidizing complex (Welte and Deppenmeier, 2011). Together with our analysis, we conclude that hydrogenase association is a trait almost exclusive to clade E and F CODHs. ChCODH-II's operon seems to be rather uniquely constructed, as a similar operon composition can only be found in other Carboxydothermus species. A similar situation can be seen for ChCODH-IV, whose operon, containing FeS and NAD/FAD-dependent oxidoreductases, is closer in similarity to those of some clade E CODHs.

Regarding the biggest clade, clade E, its diversity is striking. NvCODH, formerly known as DvCODH (Waite et al., 2020), from the organism Nitratidesulfovibrio vulgaris, has a very small operon with only two genes in its close proximity, a transcriptional regulator (Zhou et al., 2012) and a maturation enzyme (CooC), see Fig. 2. This is seen for a large number of both clade E and F CODHs.
The absence of both CooJ and CooT is striking; both only appear in two very distinct parts of clade E, all of them being associated with one-carbon-pool metabolism, with one exception from Clostridium pasteurianum BC1, which more closely resembles the clade F RrCODH-like operon. The previously introduced archaeal CODHs, TcCODH-I and TcCODH-II from clade E, are contained in operons (Fig. 2) that are rather specific and only found in a few other Thermococcus or Pyrococcus species (Benvenuti et al., 2020; Kim et al., 2015). It needs to be noted that TcCODH-I's CooC gene is encoded outside of its operon and on the opposite strand. The CooC gene is therefore not included in our analysis. We are only aware of examples from this type of operon, and do not expect this to be a common trait of the CODH maturation machinery. However, it needs to be noted that the CooC proportion might be slightly underestimated. Interestingly, TcCODH-II-like CODHs all contain a CooT-like protein in their operon, forming the only cluster of CODHs that contain CooT-like proteins without CooJ. Within clade E, another unique genomic neighborhood, that of TkCODH-II, must be pointed out. From experimental data it is known that this CODH is associated with ACS (Jain et al., 2021); however, in our analysis we did not see this ACS complex in TkCODH-II's operon. This is because the ACS subunit is encoded further downstream of the CODH gene and was not taken into account due to our initial parameters.
For HCPs, the high variability and low operon density, especially in classes I and III, point towards more modular or conditionally expressed roles, similar to clade D CODH. The clear patterning in class II operons, though based on a limited sample, may reflect specialized functions, perhaps in niche-specific oxidoreductase activities.

Conclusion

As previously mentioned, the aim of this study is to identify which CODH clades harbor the most promising enzymes for future application in CO₂ reduction. The operon composition of CODHs from different clades shows distinct differences, and what we can gather from this information is that clades A, E and F are the most likely to harbor CODHs able to efficiently convert CO₂ to CO. These clades are therefore the most interesting for CO₂-reducing biotechnological applications, or as inspiration for new synthetic catalysts. In addition, the literature has shown that the activity of many CODHs depends on co-expression with maturation proteins such as CooC, and in some cases CooJ and CooT are also required for full activation. Although some CODHs (most notably CODH-II from C. hydrogenoformans) can function independently of maturases, our neighborhood analysis indicates that maturase-coding genes are predominantly found in operons from clades A, E, and F. This pattern again implies that these clades may represent more biochemically active or catalytically optimized CODHs, making them promising targets for future functional studies and biotechnological applications.

The function of clade B could not be deduced from its genomic environment, but it seems to have a remarkable self-standing function that is not shared with any other CODHs. Its low co-occurrence with other CODH clades within organisms also supports a unique role for clade B. Clades C and D are more likely to show low or even no activity towards CO₂/CO interconversion, as was deduced from the literature and the lack of C1-metabolism-related genes in their operons. However, Jain et al. recently showed low CO oxidation activity in a clade C CODH from T. kivui, but this enzyme originated from a strain that had acquired the ability to grow on CO through laboratory evolution (Jain et al., 2021). The sequence data used to classify the CODH into clade C were from the original strain (incapable of growing on CO), and data on the engineered strain are not available. It is therefore not known whether the active CODH is the wild-type or an engineered enzyme, and we cannot draw any conclusions regarding the activity of clade C CODHs. Taken together, this makes clades B, C and D less promising in the hunt for CO₂ reduction catalysts. However, much is still unknown about these enzymes, such as their cellular function.

Future work should focus on experimental validation of the functional differences between CODH isoforms, particularly in organisms where multiple clades co-occur. Additionally, transcriptomic and proteomic studies could illuminate condition-dependent expression patterns and confirm proposed regulatory functions. Finally, deeper phylogenomic analyses may reveal the evolutionary drivers behind the observed distribution and diversification of these ancient redox enzymes.

Methods

Data collection and refinement. Multiple protein BLAST (BLASTp) searches (BLOSUM62, E < 0.05) in the NCBI database were carried out using the NCBI accession numbers provided by Inoue et al. (Inoue et al., 2018) (A-1, WP_011305243; A-2, WP_010878596; A-3, OGW06734; A-4, OIP92259; A-5, ODS42986; A-6, OIP30420; B-1, WP_026514536; B-2, WP_015485077; B-3, WP_012645460; B-4, WP_011393470; C-1, WP_039226206; C-2, WP_013237576; C-3, WP_010870233; C-4, WP_044921150; D-1, WP_011342982; D-2, WP_015926279; D-3, WP_079933214; D-4, WP_096205957; E-1, WP_012571978; E-2, WP_010939375; E-3, WP_088535808; F-1, WP_011343033; F-2, WP_011389181; G-1, OGP75751) and Techtmann et al. (Techtmann et al., 2012) (mini CooS, WP_007288589.1). CODH from clade H (Inoue et al., 2022) was omitted due to limited host information. Duplicates were removed using seqkit's (Shen et al., 2016) rmdup. Sequences shorter than 400 amino acids (aa) were removed. Clustering was performed with cd-hit (Li et al., 2002, 2001; Li and Godzik, 2006) to further reduce the data size, using a global sequence identity of 99% or 90%, the latter only for tree generation. A high identity threshold was necessary for clustering within organisms, since some organisms are known to have multiple CODHs with striking sequence similarity in their genome, such as Clostridium pasteurianum BC1 (taxid: 86416), which contains WP_015614757.1 and WP_015615315.1 with 93.27% similarity. For the dataset used in the neighbor analysis, taxonomic information for each sequence was retrieved using the R packages taxize (Chamberlain et al., 2020; Chamberlain and Szocs, 2013) and taxizedb (Chamberlain et al., 2025), and only sequences that could be related to a recorded organism were kept (Supplementary File 3_Table S3 and Supplementary File 4_Table S4).
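The de-duplication and length filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used seqkit and cd-hit): the function names are hypothetical, and it assumes that duplicates are identified by accession, since the text does not state which rmdup mode was used. Only the 400 aa cut-off is taken from the text.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH_AA = 400  # length cut-off stated in the text


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_and_filter(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_LENGTH_AA
) -> Dict[str, str]:
    """Keep the first record per accession and drop sequences shorter than min_length.

    Assumption: the accession is the first whitespace-delimited token of the header.
    """
    kept: Dict[str, str] = {}
    for header, seq in records:
        accession = header.split()[0]
        if accession in kept:
            continue  # duplicate accession, keep the first occurrence only
        if len(seq) < min_length:
            continue  # below the length cut-off
        kept[accession] = seq
    return kept
```

Clustering at 99% or 90% identity is deliberately left out of the sketch, because it relies on cd-hit's own algorithm.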
Sequences were aligned using E-INS-i from mafft (Katoh and Standley, 2013), and sequences that had gaps in important positions related to the D-, B- or C-cluster or to the acid-base active-site residues were removed. The alignment was trimmed using trimAl's (Capella-Gutiérrez et al., 2009) automated1 option and a tree was generated using FastTree (Price et al., 2010). Further sequences that were not CODH sequences were removed after visual inspection. The final list of CODH sequences used in the neighbor and correlation analysis comprised 1376 sequences. A similar approach was followed for HCP (class I, Q01770.2; class II, WP_000458809.1; class III, WP_013294878.1), and a final count of 1545 sequences was collected for the neighbor and correlation analysis. Another set of CODH genes was curated using the 90% cd-hit cut-off. It was aligned using mafft's FFT-NS-1, and sequences that had gaps in important positions related to the D-, B- or C-cluster or to the acid-base active-site residues were removed. The alignment was trimmed using trimAl. An initial tree was built using FastTree. Further sequences were removed after visual inspection, which yielded a final dataset containing 5508 sequences. See Fig. S3 for a detailed flowchart. Custom code can be retrieved from GitHub (Böhm, 2025a, 2025b, 2025c).

Neighbor analysis. Genome information was downloaded for the accession number lists generated for CODH and HCP, which led to the download of 955 and 1425 genomes, respectively.
We defined neighboring genes as a maximum of 15 genes upstream and downstream of the target gene with a maximum intergenic distance of 300 base pairs (bp), as was done previously by Inoue et al. (Inoue et al., 2018). We decided to use this rather large intergenic distance to include as many neighbors as possible, and we expect that unrelated genes will disappear in the noise. For the same reason, we allowed an overlap of up to 50 bp between genes in the same operon, which is rather high, as genes in E. coli, for example, usually overlap by 1 to 4 bp (Johnson and Chisholm, 2004). Amino acid sequences for those genes were retrieved from the NCBI database using Entrez (Sayers, 2022), and their function was predicted using eggNOG (Huerta-Cepas et al., 2019). The results from eggNOG as well as the product prediction from NCBI were taken into account in the manual assignment of selected functional groups. The data were plotted using R (R Core Team, 2023), tidyverse (Wickham et al., 2019), patchwork (Pedersen, 2025), ggnewscale (Elio Campitelli et al., 2025), ggtree (Yu et al., 2018, 2017), ggtreeExtra (Xu et al., 2021), and treeio (Wang et al., 2020); gggenes (Wilkins, 2023) was used for gene maps. Since CooJ could be identified neither with the NCBI prediction nor via eggNOG, we selected operons from clades E and F that contained CooS and CooT and manually extracted some accession numbers of potential CooJs, which were used as PSI-BLAST queries (BLOSUM45, E < 0.001) to obtain further accession numbers; a summary can be found in Supplementary File 1_Table S1. These accession numbers were used to help annotate potential CooJs in our analysis, and 68 potential CooJ genes could be identified.

Correlation analysis. Correlation of CODH and HCP from different clades/classes was calculated according to the formula

$$P(X \mid Y) = \frac{N_{XY}}{N_Y},$$

where $N_Y$ is the total number of assemblies containing a protein from clade/class Y, $N_{XY}$ is the total number of assemblies containing proteins from both clade/class X and Y, and $P(X \mid Y)$ is the probability that a genome coding for a protein from clade/class Y also codes for a protein from clade/class X.

Tree generation. In total, five trees were generated. Trees carrying phylogenetic information were generated via iqtree2 (Minh et al., 2020) with the LG+I+R10 model and ultrafast bootstrapping with 1000 replicates for a dataset of 5508 CODH sequences, a dataset of 1351 CODH sequences, and a dataset of 1476 HCP sequences (see above for details on their generation). For the 5508-sequence CODH dataset an outgroup (MBE6442607.1) was introduced to root the tree. Sequences were aligned within their dataset using mafft's FFT-NS-2. The alignment was again trimmed using trimAl, and the tree was built using iqtree2 with the above parameters. ggtree (Yu et al., 2017) was used for tree inspection and plotting. The two other trees are taxonomic trees, one based only on taxids and built with a custom Python script and ete3 (Huerta-Cepas et al., 2016), and one taken from WoL: Reference Phylogeny for Microbes (Zhu, 2023; Zhu et al., 2019).
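The neighborhood definition given under "Neighbor analysis" (at most 15 genes on each side, intergenic distance of at most 300 bp, overlap of at most 50 bp) can be sketched in Python as follows. This is an illustrative reading of the stated thresholds, not the authors' code: the `Gene` class and function names are hypothetical, the walk is assumed to stop at the first pair of adjacent genes that violates a threshold, and strand is ignored for simplicity.

```python
from dataclasses import dataclass
from typing import List

MAX_NEIGHBORS = 15   # genes per side, from the text
MAX_GAP_BP = 300     # maximum intergenic distance, from the text
MAX_OVERLAP_BP = 50  # maximum tolerated overlap, from the text


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def gap_bp(left: Gene, right: Gene) -> int:
    """Intergenic distance in bp; negative values mean the genes overlap."""
    return right.start - left.end - 1


def linked(left: Gene, right: Gene) -> bool:
    """True if two adjacent genes satisfy both the gap and the overlap threshold."""
    return -MAX_OVERLAP_BP <= gap_bp(left, right) <= MAX_GAP_BP


def neighborhood(genes: List[Gene], target_index: int) -> List[Gene]:
    """Return the target gene plus its contiguous neighbors.

    `genes` must be sorted by start coordinate. The walk in each direction
    stops at the first unlinked pair or after MAX_NEIGHBORS genes.
    """
    lo = target_index
    while lo > 0 and target_index - lo < MAX_NEIGHBORS and linked(genes[lo - 1], genes[lo]):
        lo -= 1
    hi = target_index
    while (hi < len(genes) - 1 and hi - target_index < MAX_NEIGHBORS
           and linked(genes[hi], genes[hi + 1])):
        hi += 1
    return genes[lo:hi + 1]
```

With a generous 300 bp threshold the walk rarely stops early inside a real operon, which matches the stated intent of including as many neighbors as possible and letting unrelated genes average out.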

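The co-occurrence measure P(X|Y) = N_XY / N_Y from the "Correlation analysis" paragraph can be computed with a few lines of Python. The sketch below is a minimal illustration under assumptions: the function name and the dictionary layout (assembly identifier mapped to the set of clades/classes found in it) are hypothetical and not taken from the authors' code.

```python
from typing import Dict, Set


def cooccurrence(assemblies: Dict[str, Set[str]], x: str, y: str) -> float:
    """Return P(X|Y) = N_XY / N_Y.

    N_Y counts assemblies containing clade/class y; N_XY counts assemblies
    containing both x and y.
    """
    n_y = sum(1 for groups in assemblies.values() if y in groups)
    if n_y == 0:
        raise ValueError(f"no assembly contains clade/class {y!r}")
    n_xy = sum(1 for groups in assemblies.values() if x in groups and y in groups)
    return n_xy / n_y
```

Note that the measure is asymmetric: P(X|Y) and P(Y|X) share the numerator N_XY but are normalized by N_Y and N_X respectively, so a rare clade can co-occur almost always with a common one without the reverse being true.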
Acknowledgements

The Novo Nordisk Foundation (Grant reference number NNF21OC0066716) is gratefully acknowledged for funding.

References

Adam PS, Borrel G, Gribaldo S. 2018. Evolutionary history of carbon monoxide dehydrogenase/acetyl-CoA synthase, one of the oldest enzymatic complexes. Proc Natl Acad Sci U S A 115:E5836–E5837. doi:10.1073/pnas.1716667115
Babbitt PC, Hasson MS, Wedekind JE, Palmer DRJ, Barrett WC, Reed GH, Rayment I, Ringe D, Kenyon GL, Gerlt JA. 1996. The enolase superfamily: A general strategy for enzyme-catalyzed abstraction of the α-protons of carboxylic acids. Biochemistry 35:16489–16501. doi:10.1021/bi9616413
Bao T, Cheng C, Xin X, Wang J, Wang M, Yang S-T. 2019. Deciphering mixotrophic Clostridium formicoaceticum metabolism and energy conservation: Genomic analysis and experimental studies. Genomics 111:1687–1694. doi:10.1016/j.ygeno.2018.11.020
Basak Y, Lorent C, Jeoung J-H, Zebger I, Dobbek H. 2025. Metalloradical-driven enzymatic CO2 reduction by a dynamic Ni–Fe cluster. Nat Catal 1–10. doi:10.1038/s41929-025-01388-5
Benvenuti M, Meneghello M, Guendon C, Jacq-Bailly A, Jeoung JH, Dobbek H, Leger C, Fourmond V, Dementin S. 2020. The two CO-dehydrogenases of Thermococcus sp. AM4. Biochim Biophys Acta Bioenerg 1861:148188. doi:10.1016/j.bbabio.2020.148188
Biester A, Grahame DA, Drennan CL. 2024. Capturing a methanogenic carbon monoxide dehydrogenase/acetyl-CoA synthase complex via cryogenic electron microscopy. Proceedings of the National Academy of Sciences 121:e2410995121. doi:10.1073/pnas.2410995121
Böhm M. 2025a. protein-to-genome. https://doi.org/10.5281/zenodo.16736767
Böhm M. 2025b. protein-per-organism. https://doi.org/10.5281/zenodo.16736754
Böhm M. 2025c. protein-neighbours. https://doi.org/10.5281/zenodo.16736722
Can M, Armstrong FA, Ragsdale SW. 2014. Structure, function, and mechanism of the nickel metalloenzymes, CO dehydrogenase, and acetyl-CoA synthase. Chem Rev 114:4149–74. doi:10.1021/cr400461p
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. doi:10.1093/bioinformatics/btp348
Chamberlain S, Arendsee Z, Stirling T. 2025. taxizedb: Tools for Working with “Taxonomic” Databases. doi:10.5281/zenodo.1158055
Chamberlain S, Szocs E. 2013. taxize - taxonomic search and retrieval in R. F1000Research 2.
Chamberlain S, Szoecs E, Foster Z, Arendsee Z, Boettiger C, Ram K, Bartomeus I, Baumgartner J, O’Donnell J, Oksanen J, Tzovaras BG, Marchand P, Tran V, Salmon M, Li G, Grenié M. 2020. taxize: Taxonomic information from around the web (manual).
Collins MD, Lawson PA, Willems A, Cordoba JJ, Fernandez-Garayzabal J, Garcia P, Cai J, Hippe H, Farrow JAE. 1994. The Phylogeny of the Genus Clostridium: Proposal of Five New Genera and Eleven New Species Combinations. International Journal of Systematic and Evolutionary Microbiology 44:812–826. doi:10.1099/00207713-44-4-812
Diekert GB, Thauer RK. 1978. Carbon Monoxide Oxidation by Clostridium thermoaceticum and Clostridium formicoaceticum. Journal of Bacteriology 136:597–606. doi:10.1128/jb.136.2.597-606.1978
Domnik L, Merrouch M, Goetzl S, Jeoung JH, Leger C, Dementin S, Fourmond V, Dobbek H. 2017. CODH-IV: A High-Efficiency CO-Scavenging CO Dehydrogenase with Resistance to O2. Angew Chem Int Ed Engl 56:15466–15469. doi:10.1002/anie.201709261
Elio Campitelli, Teun van den Brand, olivroy. 2025. ggnewscale: Multiple Fill and Color Scales in ggplot2. doi:10.5281/ZENODO.2543762
Fox JD, He Y, Shelver D, Roberts GP, Ludden PW. 1996. Characterization of the region encoding the CO-induced hydrogenase of Rhodospirillum rubrum. Journal of Bacteriology 178:6200–6208. doi:10.1128/jb.178.21.6200-6208.1996
Frolov EN, Elcheninov AG, Gololobova AV, Toshchakov SV, Novikov AA, Lebedinsky AV, Kublanov IV. 2023. Obligate autotrophy at the thermodynamic limit of life in a new acetogenic bacterium. Front Microbiol 14. doi:10.3389/fmicb.2023.1185739
Fujishiro T, Takaoka K. 2023. Class III hybrid cluster protein homodimeric architecture shows evolutionary relationship with Ni, Fe-carbon monoxide dehydrogenases. Nat Commun 14:5609. doi:10.1038/s41467-023-41289-4
Gong W, Hao B, Wei Z, Ferguson DJ, Tallant T, Krzycki JA, Chan MK. 2008. Structure of the α2ε2 Ni-dependent CO dehydrogenase component of the Methanosarcina barkeri acetyl-CoA decarbonylase/synthase complex. Proceedings of the National Academy of Sciences 105:9558–9563. doi:10.1073/pnas.0800415105
Gtari M, Ventura S. 2025. Proposal of Neomoorella gen. nov. as a replacement name for the illegitimate prokaryotic genus name Moorella Collins et al. 1994. International Journal of Systematic and Evolutionary Microbiology 75:006779. doi:10.1099/ijsem.0.006779
Hadj-Said J, Pandelia ME, Leger C, Fourmond V, Dementin S. 2015. The Carbon Monoxide Dehydrogenase from Desulfovibrio vulgaris. Biochim Biophys Acta 1847:1574–83. doi:10.1016/j.bbabio.2015.08.002
Hagen WR. 2022. Structure and function of the hybrid cluster protein. Coordination Chemistry Reviews 457. doi:10.1016/j.ccr.2021.214405
Hocking WP, Roalkvam I, Magnussen C, Stokke R, Steen IH. 2015. Assessment of the Carbon Monoxide Metabolism of the Hyperthermophilic Sulfate-Reducing Archaeon Archaeoglobus fulgidus VC-16 by Comparative Transcriptome Analyses. Archaea 2015:235384. doi:10.1155/2015/235384
Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol Biol Evol 33:1635–1638. doi:10.1093/molbev/msw046
Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P. 2019. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research 47:D309–D314. doi:10.1093/nar/gky1085
Inoue M, Nakamoto I, Omae K, Oguro T, Ogata H, Yoshida T, Sako Y. 2018. Structural and Phylogenetic Diversity of Anaerobic Carbon-Monoxide Dehydrogenases. Front Microbiol 9:3353. doi:10.3389/fmicb.2018.03353
Inoue M, Omae K, Nakamoto I, Kamikawa R, Yoshida T, Sako Y. 2022. Biome-specific distribution of Ni-containing carbon monoxide dehydrogenases. Extremophiles 26:9. doi:10.1007/s00792-022-01259-y
Inoue T, Takao K, Fukuyama Y, Yoshida T, Sako Y. 2014. Over-expression of carbon monoxide dehydrogenase-I with an accessory protein co-expression: a key enzyme for carbon dioxide reduction. Biosci Biotechnol Biochem 78:582–7. doi:10.1080/09168451.2014.890027
Jain S, Katsyv A, Basen M, Muller V. 2021. The monofunctional CO dehydrogenase CooS is essential for growth of Thermoanaerobacter kivui on carbon monoxide. Extremophiles 26:4. doi:10.1007/s00792-021-01251-y
Jeoung J-H, Dobbek H. 2007. Carbon Dioxide Activation at the Ni,Fe-Cluster of Anaerobic Carbon Monoxide Dehydrogenase. Science 318:1461–1464. doi:10.1126/science.1148481
Jeoung JH, Fesseler J, Domnik L, Klemke F, Sinnreich M, Teutloff C, Dobbek H. 2022. A Morphing [4Fe-3S-nO]-Cluster within a Carbon Monoxide Dehydrogenase Scaffold. Angew Chem Int Ed Engl 61:e202117000. doi:10.1002/anie.202117000
Johnson ZI, Chisholm SW. 2004. Properties of overlapping genes are conserved across microbial genomes. Genome Res 14:2268–2272. doi:10.1101/gr.2433104
Katayama YA, Kamikawa R, Yoshida T. 2024. Phylogenetic diversity of putative nickel-containing carbon monoxide dehydrogenase-encoding prokaryotes in the human gut microbiome. Microbial Genomics 10. doi:10.1099/mgen.0.001285
Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 30:772–780. doi:10.1093/molbev/mst010
Kerby RL, Ludden PW, Roberts GP. 1997. In vivo nickel insertion into the carbon monoxide dehydrogenase of Rhodospirillum rubrum: molecular and physiological characterization of cooCTJ. Journal of Bacteriology 179:2259–2266. doi:10.1128/jb.179.7.2259-2266.1997
Kim M-S, Choi AR, Lee SH, Jung H-C, Bae SS, Yang T-J, Jeon JH, Lim JK, Youn H, Kim TW, Lee HS, Kang SG. 2015. A Novel CO-Responsive Transcriptional Regulator and Enhanced H2 Production by an Engineered Thermococcus onnurineus NA1 Strain. Applied and Environmental Microbiology 81:1708–1714. doi:10.1128/AEM.03019-14
Knox HL, Allen KN. 2023. Expanding the viewpoint: Leveraging sequence information in enzymology. Current Opinion in Chemical Biology 72:102246. doi:10.1016/j.cbpa.2022.102246
Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. doi:10.1093/bioinformatics/btl158
Li W, Jaroszewski L, Godzik A. 2002. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18:77–82. doi:10.1093/bioinformatics/18.1.77
Li W, Jaroszewski L, Godzik A. 2001. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17:282–283. doi:10.1093/bioinformatics/17.3.282
Liew F, Henstra AM, Winzer K, Köpke M, Simpson SD, Minton NP. 2016. Insights into CO2 Fixation Pathway of Clostridium autoethanogenum by Targeted Mutagenesis. mBio 7:10.1128/mbio.00427-16. doi:10.1128/mbio.00427-16
Lindahl PA, Chang B. 2001. The Evolution of Acetyl-CoA Synthase. Orig Life Evol Biosph 31:403–434.
Matschiavelli N, Oelgeschläger E, Cocchiararo B, Finke J, Rother M. 2012. Function and Regulation of Isoforms of Carbon Monoxide Dehydrogenase/Acetyl Coenzyme A Synthase in Methanosarcina acetivorans. J Bacteriol 194:5377–5387. doi:10.1128/JB.00881-12
Matson EG, Gora KG, Leadbetter JR. 2011. Anaerobic Carbon Monoxide Dehydrogenase Diversity in the Homoacetogenic Hindgut Microbial Communities of Lower Termites and the Wood Roach. PLOS ONE 6:e19316. doi:10.1371/journal.pone.0019316
Merrouch M, Benvenuti M, Lorenzi M, Leger C, Fourmond V, Dementin S. 2018. Maturation of the [Ni-4Fe-4S] active site of carbon monoxide dehydrogenases. J Biol Inorg Chem 23:613–620. doi:10.1007/s00775-018-1541-0
Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. 2020. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37:1530–1534. doi:10.1093/molbev/msaa015
Pedersen TL. 2025. patchwork: The Composer of Plots.
Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLOS ONE 5:e9490. doi:10.1371/journal.pone.0009490
R Core Team. 2023. R: A Language and Environment for Statistical Computing.
Sayers E. 2022. A General Introduction to the E-utilities. Entrez Programming Utilities Help [Internet]. National Center for Biotechnology Information (US).
Shen W, Le S, Li Y, Hu F. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11:e0163962. doi:10.1371/journal.pone.0163962
Smith KS, Ingram-Smith C. 2007. Methanosaeta, the forgotten methanogen? Trends in Microbiology 15:150–155. doi:10.1016/j.tim.2007.02.002
Soboh B, Linder D, Hedderich R. 2002. Purification and catalytic properties of a CO-oxidizing:H2-evolving enzyme complex from Carboxydothermus hydrogenoformans. Eur J Biochem 269:5712–21. doi:10.1046/j.1432-1033.2002.03282.x
Techtmann SM, Lebedinsky AV, Colman AS, Sokolova TG, Woyke T, Goodwin L, Robb FT. 2012. Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases. Front Microbiol 3:132. doi:10.3389/fmicb.2012.00132
Terlesky KC, Ferry JG. 1988. Ferredoxin requirement for electron transport from the carbon monoxide dehydrogenase complex to a membrane-bound hydrogenase in acetate-grown Methanosarcina thermophila. Journal of Biological Chemistry 263:4075–4079. doi:10.1016/S0021-9258(18)68892-1
Waite DW, Chuvochina M, Pelikan C, Parks DH, Yilmaz P, Wagner M, Loy A, Naganuma T, Nakai R, Whitman WB, Hahn MW, Kuever J, Hugenholtz P. 2020. Proposal to reclassify the proteobacterial classes Deltaproteobacteria and Oligoflexia, and the phylum Thermodesulfobacteria into four phyla reflecting major functional capabilities. International Journal of Systematic and Evolutionary Microbiology 70:5972–6016. doi:10.1099/ijsem.0.004213
Wang L-G, Lam TT-Y, Xu S, Dai Z, Zhou L, Feng T, Guo P, Dunn CW, Jones BR, Bradley T, Zhu H, Guan Y, Jiang Y, Yu G. 2020. Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data. Molecular Biology and Evolution 37:599–603. doi:10.1093/molbev/msz240
Weghoff MC, Müller V. 2016. CO Metabolism in the Thermophilic Acetogen Thermoanaerobacter kivui. Applied and Environmental Microbiology 82:2312–2319. doi:10.1128/AEM.00122-16
Welte C, Deppenmeier U. 2011. Membrane-Bound Electron Transport in Methanosaeta thermophila. Journal of Bacteriology 193:2868–2870. doi:10.1128/jb.00162-11
Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. 2019. Welcome to the Tidyverse. JOSS 4:1686. doi:10.21105/joss.01686
Wilkins D. 2023. gggenes: Draw Gene Arrow Maps in “ggplot2.”
Wittenborn EC, Guendon C, Merrouch M, Benvenuti M, Fourmond V, Leger C, Drennan CL, Dementin S. 2020. The Solvent-Exposed Fe-S D-Cluster Contributes to Oxygen-Resistance in Desulfovibrio vulgaris Ni-Fe Carbon Monoxide Dehydrogenase. ACS Catal 10:7328–7335. doi:10.1021/acscatal.0c00934
Wu M, Ren Q, Durkin AS, Daugherty SC, Brinkac LM, Dodson RJ, Madupu R, Sullivan SA, Kolonay JF, Haft DH, Nelson WC, Tallon LJ, Jones KM, Ulrich LE, Gonzalez JM, Zhulin IB, Robb FT, Eisen JA. 2005. Life in hot carbon monoxide: the complete genome sequence of Carboxydothermus hydrogenoformans Z-2901. PLoS Genet 1:e65. doi:10.1371/journal.pgen.0010065
Xu S, Dai Z, Guo P, Fu X, Liu S, Zhou L, Tang W, Feng T, Chen M, Zhan L, Wu T, Hu E, Jiang Y, Bo X, Yu G. 2021. ggtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data. Molecular Biology and Evolution 38:4039–4042. doi:10.1093/molbev/msab166
Yu G, Lam TT-Y, Zhu H, Guan Y. 2018. Two Methods for Mapping and Visualizing Associated Data on Phylogeny Using Ggtree. Molecular Biology and Evolution 35:3041–3043. doi:10.1093/molbev/msy194
Yu G, Smith DK, Zhu H, Guan Y, Lam TT-Y. 2017. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 8:28–36. doi:10.1111/2041-210X.12628
Zhou A, Chen YI, Zane GM, He Z, Hemme CL, Joachimiak MP, Baumohl JK, He Q, Fields MW, Arkin AP, Wall JD, Hazen TC, Zhou J. 2012. Functional Characterization of Crp/Fnr-Type Global Transcriptional Regulators in Desulfovibrio vulgaris Hildenborough. Applied and Environmental Microbiology 78:1168–1177. doi:10.1128/AEM.05666-11
Zhu Q. 2023. WoL: Reference Phylogeny for Microbes. WoL. https://biocore.github.io/wol/
Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, Kosciolek T, Yin JB, Huang S, Salam N, Jiao J-Y, Wu Z, Xu ZZ, Cantrell K, Yang Y, Sayyari E, Rabiee M, Morton JT, Podell S, Knights D, Li W-J, Huttenhower C, Segata N, Smarr L, Mirarab S, Knight R. 2019. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun 10:5477. doi:10.1038/s41467-019-13443-4

License: CC-BY-4.0