{"paper_id":"1d68b0c2-89be-4e3f-8b44-fd75e236f765","body_text":"Investigating the native functions of [NiFe]-carbon monoxide dehydrogenases through genomic context analysis\nMaximilian Böhm(a) and Henrik Land(a,*)\na) Molecular Biomimetics, Department of Chemistry – Ångström Laboratory, Uppsala University, Uppsala SE-75120, Sweden\n* Corresponding author, email: henrik.land@kemi.uu.se\nAbstract\nCarbon monoxide dehydrogenases containing nickel-iron active sites ([NiFe]-CODHs) catalyze the reversible oxidation of CO to CO₂, making them key targets for biocatalytic CO₂ reduction. Despite dramatic differences in catalytic rates and O₂ tolerance between CODH variants, the molecular basis for this functional diversity remains poorly understood. We applied comparative genomics and synteny analysis to investigate the biochemical roles of CODH clades A-F using 1,376 CODH and 1,545 hybrid cluster protein sequences. About 30% of genomes encode multiple CODH isoforms. The analysis revealed distinct gene clustering patterns that correlate with biochemical function. Clades A, E, and F exhibit a degree of distributional exclusivity, whereas clades C and D frequently co-occur with active CODHs, suggesting auxiliary roles. Operon architecture analysis revealed functional specialization: clade A is linked to acetyl-CoA synthase; clades A, E, and F contain essential maturation machinery (CooC, CooJ, CooT) correlating with catalytic activity; clade B associates with transporters, clade C with electron transfer partners, and clade D with transcriptional regulators. High CODH-HCP co-occurrence (except for clade A) suggests environmental interdependency. These findings establish clades A, E, and F as primary biocatalyst targets while defining regulatory functions for clades C and D, providing a genomics framework for predicting CODH phenotypes. 
\nIntroduction\nGenomic enzymology has helped to elucidate protein (super)families since the mid-1990s, connecting enzyme sequences to function through comparative genomics and neighborhood analysis (Babbitt et al., 1996; Knox and Allen, 2023). In this study, we employ a genome neighborhood and co-occurrence analysis to help understand the reactivity and function of the family of nickel-containing carbon monoxide dehydrogenases ([NiFe]-CODHs) and their relationship to hybrid cluster proteins (HCPs).\nbioRxiv preprint doi: https://doi.org/10.1101/2025.09.14.676152; this version posted September 19, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.\n\n[NiFe]-CODHs are ancient and diverse enzymes that catalyze the interconversion of carbon monoxide (CO) and carbon dioxide (CO₂), a reaction of high interest for biotechnological applications, including CO₂ capture and conversion. Research on this enzyme family spans more than 60 years, and recent studies have provided important biochemical insights, such as their turnover frequency and oxygen tolerance (Can et al., 2014). These properties vary greatly, not only between phylogenetic clades but also within clades, making functional prediction from sequence alone challenging. Phylogenetic analyses of available [NiFe]-CODH (hereafter referred to as CODH) sequences with different foci, such as gene transfer (Techtmann et al., 2012), primary structure (Inoue et al., 2018), biodistribution (Inoue et al., 2022) and the human gut microbiome (Katayama et al., 2024), have enriched our understanding of this old and diverse enzyme family. 
Dataset sizes have grown from an initial 17 sequences (Lindahl and Chang, 2001) to well above 5000 sequences (this study). It has been shown that up to eight distinct phylogenetic clades (Figure 1) can be distinguished, all of which show sequence variation while preserving the overall fold, as seen by cryo-electron microscopy (Biester et al., 2024) and X-ray crystallography (Basak et al., 2025; Domnik et al., 2017; Gong et al., 2008; Jeoung et al., 2022; Jeoung and Dobbek, 2007; Wittenborn et al., 2020).\nFigure 1. Schematic phylogenetic tree of [NiFe]-CODH, with selected CODHs marked at their respective positions. The tree was built using IQ-TREE 2 with 1000 ultrafast bootstrap replicates and contains 5508 putative CODH sequences and one outgroup (MBE6442607.1 hydroxylamine reductase [Desulfovibrio desulfuricans]) for rooting. A detailed searchable tree with bootstrap values can be found in Supplementary File 9_Tree5.\nThe biochemical characterization of this enzyme family is ongoing and has revealed a wide range of turnover frequencies as well as different degrees of O2 tolerance. For example, two CODHs from Carboxydothermus hydrogenoformans that both belong to clade F (Fig. 1) differ markedly: ChCODH-II, a benchmark CODH known for its high CO oxidation activity but low O2 tolerance, contrasts with ChCODH-IV, which retains 20% of its activity after 1 h of O₂ exposure but displays reduced CO oxidation capacity, with an increased activation barrier and a much lower KM (Domnik et al., 2017). 
Similarly, NvCODH (formerly known as DvCODH) from Nitratidesulfovibrio vulgaris, a less active clade E CODH, has been reported to fully reactivate after initial inactivation by O₂ exposure (Hadj-Said et al., 2015). Likewise, the two clade E CODHs from Thermococcus sp. AM4, TcCODH-I and TcCODH-II, react more slowly with O2 than ChCODH-II but are overall equally O2-sensitive (Benvenuti et al., 2020).\nBeyond this diversity in activity and oxygen tolerance within clades and organisms, some CODHs rely on maturases for full activation while others do not. For example, RrCODH from the phototroph Rhodospirillum rubrum needs to be expressed together with three maturases (CooC, CooJ and CooT) to be isolated in an active form (Kerby et al., 1997). A similar situation arises for NvCODH; however, its genomic neighborhood (Fig. 2) contains only one maturase (CooC), which is required for production of the active enzyme (Hadj-Said et al., 2015). In contrast, ChCODH-II can be heterologously expressed without co-expression of any maturases (Merrouch et al., 2018). ChCODH-I needs to be co-expressed with CooC to reach high activity, but it can also be expressed without it, albeit with reduced activity (Inoue et al., 2014). Notably, much of the diversity in activity, O2 tolerance and maturase dependence occurs not only between the different clades but also within them.\nGiven the homology between the CODHs in this study and the fact that active CODHs have been demonstrated for several of the clades, it would be reasonable to assume that CODHs from all clades are able to interconvert CO2 and CO. 
However, a recent study by Dobbek and co-workers showed that Carboxydothermus hydrogenoformans CODH-V (ChCODH-V) from clade D is not able to catalyze this reaction (Jeoung et al., 2022). They showed that this enzyme is more similar to the family of hybrid cluster proteins (HCPs), owing to its morphing active site composed of iron, sulfur and oxygen, which responds to changes in its redox state with structural and stoichiometric changes. A connection between HCPs and CODHs has been pointed out previously by Inoue et al. (Inoue et al., 2018) due to their close phylogenetic relationship, and was further discussed by Fujishiro and Takaoka (Fujishiro and Takaoka, 2023). HCPs can be divided into three phylogenetic classes, of which class III exhibits a homodimeric structure like CODH. Generally, the two enzyme families share a similar overall fold, while their active sites differ greatly in both amino acid and metallocofactor composition. Like ChCODH-V, HCPs do not catalyze CO2/CO interconversion, but they display a range of low-rate activities, such as hydroxylamine reductase, peroxidase, nitric oxide reductase and S-nitrosylase activity. The main natural function of HCPs is debated, but it was recently argued that HCP most likely acts as a nitric oxide reductase involved in nitric oxide detoxification (Hagen, 2022).\nFigure 2. Operons of selected [NiFe]-CODHs. * NtCODH, formerly known as MtCODH and before that as CtCODH, owing to renaming of the host organism (Gtari and Ventura, 2025). ** NvCODH, formerly known as DvCODH, owing to renaming of the host organism (Waite et al., 2020).\nIn this study, we contribute to a holistic phylogenetic picture of CODHs by analyzing their genetic environment and by harnessing the concept of synteny, using a semi-quantitative approach to predict clade- and subclade-specific CODH characteristics. We present clade-specific trends in CODH operon composition. Since many organisms are known to encode multiple CODH isoforms in their genome, we analyze the co-occurrence of CODHs from different clades within an organism, as well as the co-occurrence of CODH and HCP. Based on our findings, we propose a systematic approach for the analysis of new CODHs, with a focus on identifying promising CO2 reduction catalysts suitable for biotechnological applications.\nResults\nCo-occurrence and Correlation. Counting the CODH genes per assembly showed that around 30% of all assemblies encode more than one CODH. For HCP, this fraction is much smaller, around 6%. As can be seen from Fig. 3 A, the occurrence of multiple isoforms from specific clades within organisms varies. Clades B, C and D almost always occur only once within a genome, while clades A, E and F are more likely to co-occur with another isoform from the same clade. The observed trend most likely underestimates the number of genomes encoding multiple CODHs, since incomplete genomes are also included in these analyses. When calculating the probability of co-occurrence of CODHs from two different clades in one organism, a clear pattern emerges (Fig. 3 B). Most obvious is that clades A, E and F rarely co-occur with each other. 
Clade B CODHs likewise show little co-occurrence with other clades. In contrast, clades C and D more often co-occur with CODHs from other clades, especially A, E and F, and they also have a higher probability of co-occurring with each other. As outlined in the Introduction, biochemical studies have shown that CODHs from clades A, E and F are active, whereas clade D CODH lacks CO2/CO interconversion activity. This co-occurrence might suggest that the redox-sensing properties of clade D CODHs (and potentially clade C CODHs) are useful for organisms that already contain a functional CODH. Interestingly, a high co-occurrence was also seen for CODH and HCP, with the exception of clade A CODH.\nNeighbor Analysis. We semi-quantitatively evaluated the operon composition of 1351 CODHs (121 A, 130 B, 168 C, 253 D, 434 E, 245 F; Supplementary File 5_Tree1 and Supplementary File 6_Tree2), considering neighboring proteins whose function we could predict using eggNOG (Huerta-Cepas et al., 2019), the NCBI product prediction and manual curation. The results are summarized in Fig. 4. 
In the following, we only report neighbor types that are encoded in the same operon as more than 10% of the CODHs of a clade (see Table S1). Of the clade A CODHs, 93% contain a one-carbon pool-related gene in their operon, followed by CooC (62%) and iron-sulfur (FeS) cluster-containing proteins (31%). We defined one-carbon pool-related genes as genes associated either with the direct conversion of one-carbon compounds, such as formate dehydrogenases, or with the Wood-Ljungdahl pathway. Clade B CODH operons mainly encode ABC transporter-associated genes (64%). Furthermore, almost a quarter (24%) of all clade B CODHs could not be associated with any neighbor, and 12% are encoded close to transcriptional regulators. For clade C CODHs, the main neighbors are FeS cluster-containing proteins such as CooF (72%), NAD(P)- or FAD-dependent oxidoreductases (71%), and transcriptional (58%) or other (10%) regulators. The overall diversity of neighboring proteins in clade D, together with the fact that most of these CODHs (64%) seemingly have no other genes in their operon, made it challenging to summarize their operons, and no clear pattern could be observed. Only transcription regulation proteins (9.9%) and general regulatory proteins (9.5%) are worth mentioning in this context. Clades E and F both have a larger set of proteins frequently observed in their operons. Operons encoding clade E or F CODHs contain CooC-like genes (59% E, 68% F), one-carbon pool-associated genes (49% E, 37% F) and FeS genes (29% E, 53% F). Transcription regulators (17% E, 35% F) and NAD(P)/FAD-dependent oxidoreductases (22% E, 42% F) were also found. The maturation protein CooT was found exclusively in operons from clades E (16%) and F (6.1%). The same holds true for CooJ, which was seen in even fewer clade F operons (12% E, <5.0% F). Additional hydrogenases (25%) and their maturation machinery (17%), as well as different types of transporter proteins (11%), are encoded primarily in clade F CODH operons.\nFigure 3. (A) Count of organisms that contain one or multiple [NiFe]-CODHs. (B) Probability matrix of co-occurrence of [NiFe]-CODHs from different clades in one organism. Raw data can be found in Supplementary File 1_Table S1.\nFigure 4. (A) Unrooted phylogenetic tree of 1376 putative [NiFe]-CODH sequences, color-coded if the operon contains one or more proteins of a certain type. A detailed searchable tree with bootstrap values can be found in Supplementary File 5_Tree1 and Supplementary File 6_Tree2. (B) Distribution of operon size for CODH and HCP genes. (C) Proportion of [NiFe]-CODHs from one clade encoded near a certain type of protein. Raw data can be found in Supplementary File 1_Table S1.
\nA similar analysis was performed for the operons encoding HCP, covering a total of 1476 HCP genes (class I: 1049, class II: 23, class III: 404; Supplementary File 7_Tree3 and Supplementary File 8_Tree4), which showed a low frequency of multiple isoforms within organisms (Fig. S1). HCP genes exhibited a large variety of neighbors, which made it difficult to extract meaningful information from their operon composition (Fig. S2). Furthermore, classes I and III had a high proportion of entries without any neighbors (49% and 75%, respectively), which is reflected in their tendency to have fewer proteins encoded in their operons (see Fig. 4 B). However, we observed FeS cluster proteins (18%) and transcription regulators (17%) relatively frequently for class I, as well as NAD(P)- or FAD-dependent oxidoreductases (96%) and transport proteins (78%) for class II HCP. It should be noted, however, that our sample set for class II HCP is very small, so its information value is considerably lower than that of the other classes and clades.\nDiscussion\nOur analysis reveals substantial diversity in the occurrence, co-occurrence, and genomic context of CODH and HCP genes, suggesting complex evolutionary and functional relationships within and across microbial lineages. The observed difference in the frequency of multiple isoforms per genome (~30% for CODH versus ~6% for HCP) indicates that CODHs are more often retained in multiple copies, potentially pointing to functional diversification among their isoforms. A comparable value was reported by Techtmann et al., who found that a striking 43% of organisms encoded more than one CODH (Techtmann et al., 2012). 
On the other hand, Katayama et al., investigating only the human gut microbiome, found a value as low as 5.5% (Katayama et al., 2024). We suspect that this number underrepresents the organisms carrying multiple CODH isoforms in the human gut, since data refinements that exclude potential CODHs were performed, such as the strict requirement for a [4Fe-4S] D-cluster, even though Inoue et al. reported on the diversity of the D-cluster (Inoue et al., 2018).\nThere are many examples of organisms encoding multiple CODH isoforms, as outlined in Fig. 3 (Supplementary File 10_Tree6 and Supplementary File 11_Tree7). Many of them have long been known in the literature (even though their multiple CODHs have only been discussed sporadically), the most famous example being C. hydrogenoformans, which encodes five different CODHs (Wu et al., 2005). Another interesting example is Clostridium formicoaceticum, which has a total of six CODH isoforms encoded in its genome. Note, however, that this organism does not appear as an organism with six CODHs in our analysis (Fig. 3 A), because we only count CODHs as stemming from the same organism when their genes are associated with the same genome assembly. We therefore tend to underestimate the number of organisms with multiple CODHs, such as C. formicoaceticum. The only CODH from C. formicoaceticum that has so far been isolated, characterized and discussed is one of its clade E CODHs, which is associated with acetyl-CoA synthase (ACS) (Bao et al., 2019; Diekert and Thauer, 1978).
Other studies have investigated the influence of CODH isoforms on metabolism. Archaeoglobus fulgidus contains three CODH genes, two from clade A and one from clade D; its clade D CODH seems to have no role in the CO metabolism of this organism (Hocking et al., 2015). Methanosarcina acetivorans harbors three CODHs, all belonging to clade A, of which only two are associated with ACS and believed to be involved in CO metabolism, whereas the third is a lone gene and seemingly not involved in carbon metabolism (Matschiavelli et al., 2012). Another interesting example is Thermoanaerobacter kivui, formerly known as Acetogenium kivui (Collins et al., 1994). When TkCODH-I (clade C) is deleted, the strain loses its ability to grow slowly on CO; however, when grown on H2 + CO2, its overall acetate production is greatly increased (Jain et al., 2021). Similar effects have been shown for Clostridium autoethanogenum, which contains three CODH isoforms: if its clade C CODH is deleted, its lag phase is shortened and its growth rate greatly increased (Liew et al., 2016). The other two CODH isoforms of this organism belong to clades E and D. Deletion of the clade D CODH showed no immediate effect on the organism, apart from a moderately lower overall biomass yield (Liew et al., 2016). In our analysis, we saw an increased frequency of co-occurrence of clades C and D with clades A, E, or F, which, together with these biological data, may indicate a complementary role, possibly linked to redox sensing or regulatory functions. 
This is especially evident for the clade C CODHs from T. kivui and C. autoethanogenum. For clade D, which lacks catalytic activity towards CO/CO2 interconversion (as reported by Jeoung et al. for recombinantly produced ChCODH-V (Jeoung et al., 2022)), the influence on metabolism is still unknown; it might only manifest in harsher environments, since clade D CODH is believed to be involved in the stress response (Jeoung et al., 2022), similar to HCPs (Hagen, 2022; see below). However, experimental proof for this claim is still missing.\nFurthermore, clades A, E, and F rarely co-occur. Interestingly, however, many organisms contain multiple copies of CODHs from one of these clades, such as M. acetivorans (clade A), C. hydrogenoformans (clade F), and C. formicoaceticum (clade E). We suspect an evolutionary reason behind this, as also outlined by others (Adama et al., 2018; Lindahl and Chang, 2001). As previously mentioned, biochemical data indicate that CODHs from these clades are capable of CO/CO₂ interconversion. This is also in line with the genomic context of these CODHs, which is most often tuned to CO/CO2 interconversion chemistry (Fig. 4 C; see below).\nThe high rate of co-occurrence between CODH and HCP genes (except for clade A) suggests functional integration, a shared metabolic niche, or involvement in a coordinated response to redox stress, given that HCPs are thought to be involved in the response to nitrosative or oxidative stress (Hagen, 2022). The lack of co-occurrence for clade A CODHs might point to distinct metabolic roles or evolutionary constraints, or might reflect the high proportion of archaeal genes among clade A CODHs, even though HCP genes are also known to occur in archaea (Hagen, 2022). Interestingly, the co-occurrence of clade D CODH and HCP appears to be the highest in our data set; the reason for this is unclear. As for clade D CODH, empirical proof that HCP influences the activity or expression of CODH is missing.
\nGenomic context analysis, as performed before by others with different foci (Inoue et al., 2018; Katayama et al., 2024; Matson et al., 2011; Techtmann et al., 2012), adds another layer of functional inference. We could confirm that operons containing clade A CODHs are highly conserved, consistently including one-carbon pool-related genes and CooC, as these CODHs are almost exclusively found as part of the Wood-Ljungdahl pathway, with a typical arrangement similar to that of the Methanosarcina barkeri CODH (MbCODH; Fig. 2, Supplementary File 6_Tree2). Recently, the structure of another representative of this group, from M. thermophila (MetCODH), has been resolved (Biester et al., 2024).\nIn contrast, clade B CODHs appear largely alone or associated with transport-related genes, raising the possibility of a non-canonical or even degenerate function. Their operon composition is also rather consistent, and the arrangement varies only to a small extent, as ABC transporters are encoded either upstream (as for the Ruminococcus flavefaciens CODH, RfCODH; see Fig. 2) or downstream of the CODH gene. 
Almost none of the analyzed operons contain any maturases, except for a small cluster that branches off rather early in the tree (Supplementary File 6_Tree2). This might indicate that the need for a maturase was lost due to repurposing of the CODH. Biochemical characterization of a clade B CODH is still awaited.\nClade C CODHs are associated with FeS cluster proteins, regulators and redox enzymes, pointing towards more regulatory or redox-modulatory roles, as also suggested by knock-out studies (Liew et al., 2016). The only isolated example from this clade is TkCODH-I. Its operon is only partially representative of clade C CODHs, containing just one other gene, which encodes an FeS protein (Fig. 2). Furthermore, the TkCODH-I sequence branches off early and seems to be rather distinct (Supplementary File 9_Tree5), with only one other close relative, from Aceticella autotrophica (Frolov et al., 2023). In addition, the CODH isolated by Jain and co-workers (Jain et al., 2021) stems from a CO-adapted strain (Weghoff and Müller, 2016), which might harbor mutations in the protein sequence that are currently not accessible to us. Taken together, we conclude that TkCODH-I might currently not be an optimal representative of clade C CODHs, and that more clade C CODHs should be isolated to better understand their biochemical properties.\nThe high operonic variability and the frequency of solitary coding regions in clade D might reflect either evolutionary drift or multifunctionality not restricted to operonic structure. Clade D will therefore not be discussed further.
\nOperons from clades E and F are more functionally complex, including components of the Wood-Ljungdahl pathway, hydrogenases, and additional redox partners, consistent with diverse metabolic roles. That being said, their operon compositions and arrangements show some interesting clustering. Clade F has the highest proportion of CODHs that might be associated with hydrogenases, including the aforementioned ChCODH-I and RrCODH, both of which interact with hydrogenases to produce hydrogen in vivo (Fox et al., 1996; Soboh et al., 2002), albeit with greatly differing operon compositions (Fig. 2). ChCODH-I-like operons (see Supplementary File 6_Tree2) contain their hydrogenase module directly within the CODH operon, whereas in RrCODH-like operons the hydrogenase module is not part of the operon but is encoded upstream of the CODH gene, separated by an intergenic space of >400 bp (Fox et al., 1996). RrCODH-like operons are the only clade F operons that include the two additional maturation enzymes CooT and CooJ. Clade F also contains many ACS-associated CODHs, such as the CODH from Neomoorella thermoacetica (NtCODH; the organism was formerly known as Moorella thermoacetica; Gtari and Ventura, 2025). These NtCODH-like operons all share the same arrangement, which is distinct from that of ACS-associated CODH operons in clades A and E.\n
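The RrCODH case above shows that whether a neighboring gene counts as part of a CODH operon depends on how operon boundaries are drawn. The parameters actually used in this study are not restated in this section. The following Python sketch therefore only illustrates one common heuristic, assuming that an operon is a run of adjacent genes on the same strand whose intergenic gaps do not exceed a cutoff. The Gene record, the operon_of and intergenic_gap functions, the locus coordinates and both cutoff values are hypothetical and were chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; coordinates are 1-based and inclusive.
    name: str
    start: int
    end: int
    strand: str  # '+' or '-'


def intergenic_gap(left, right):
    # Number of bases strictly between two genes (left precedes right).
    return right.start - left.end - 1


def operon_of(genes, focal_name, max_gap_bp):
    # Group the focal gene with its neighbors: extend in both directions while
    # the next gene lies on the same strand and the gap is <= max_gap_bp.
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == focal_name)
    strand = ordered[idx].strand
    first = last = idx
    while first > 0:
        prev = ordered[first - 1]
        if prev.strand != strand or intergenic_gap(prev, ordered[first]) > max_gap_bp:
            break
        first -= 1
    while last < len(ordered) - 1:
        nxt = ordered[last + 1]
        if nxt.strand != strand or intergenic_gap(ordered[last], nxt) > max_gap_bp:
            break
        last += 1
    return [g.name for g in ordered[first:last + 1]]


# Hypothetical locus: a hydrogenase module 450 bp upstream of an FeS protein,
# the CODH and CooC on the same strand, and a regulator on the opposite strand.
locus = [
    Gene('hydrogenase', 1, 1000, '+'),
    Gene('fes_protein', 1451, 2000, '+'),
    Gene('codh', 2011, 3900, '+'),
    Gene('cooC', 3911, 4700, '+'),
    Gene('regulator', 4800, 5500, '-'),
]

print(operon_of(locus, 'codh', max_gap_bp=300))
# ['fes_protein', 'codh', 'cooC']
print(operon_of(locus, 'codh', max_gap_bp=500))
# ['hydrogenase', 'fes_protein', 'codh', 'cooC']
```

With the stricter cutoff, the upstream hydrogenase falls outside the operon, which mirrors the situation described for RrCODH-like operons. Under both cutoffs, the gene on the opposite strand is excluded. This is also how a maturase encoded on the opposite strand can be missed by such a rule.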
253 \n In our dataset most of the hydrogenases and their maturation genes are associated with clade F, 254 \nsuggesting active hydrogen metabolism, coupling CO oxidation to H₂ production or consumption as 255 \nhas been suggested earlier  for a wider range of CODH clades  (Inoue et al., 2018; Techtmann et al., 256 \n2012). We believe that our data grossly underestimates this relat ionship overall, since operon 257 \nexamples such as RrCODH and TcCODH-II showcase that hydrogenases associated with a CODH are 258 \nnot necessarily encoded in the same operon. There has also been a report of a clade A CODH from  the 259 \nmethanogen M. thermophila (Terlesky and Ferry, 1988)  being associated with a hydrogenase.  260 \nHowever, in later studies it was shown that the genome of M. thermophila does not contain a 261 \nhydrogenase (Smith and Ingram-Smith, 2007) . Investigation of the electron transport chain of its 262 \nmembrane could not find a hydrogen oxidizing complex  (Welte and Deppenmeier, 2011) . Together 263 \nwith our analysis we conclude that hydrogenase association is a trait almost exclusive to clade E and F 264 \nCODHs. ChCODH-II’s operon seems to be rather uniquely constructed as a similar operon 265 \ncomposition only can be found for other Carboxydothermus species. A similar situation can be seen 266 \nfor ChCODH-IV, where its operon containing FeS and NAD/FAD -dependent oxidoreductases is 267 \ncloser in similarity to some clade E CODH. 268 \nRegarding the biggest clade, clade E, its diversity is striking . NvCODH, formerly known as DvCODH 269 \n(Waite et al., 2020) , from the organism Nitratidesulfovibrio vulgaris , has a very small operon with 270 \nonly two genes in its close proximity, a transcriptional regulator (Zhou et al., 2012)  and a maturation 271 \nenzyme (CooC), see Fig. 2 . This is seen for a huge number of both clade E and F CODHs.  
The 272 \noccurrence of neither CooJ nor CooT is striking, both only appear in two very distinct parts of clade E, 273 \nall of them being associated with one carbon pool  metabolism, with one exce ption from Clostridium 274 \npasteurianum BC1, which more resembles the clade F RrCODH-like operon . The previously 275 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 19, 2025. ; https://doi.org/10.1101/2025.09.14.676152doi: bioRxiv preprint \n\nintroduced archaeal CODHs, TcCODH-I and TcCODH-II from clade E, are contained in operons (Fig. 276 \n2), that are rather specific and only found for a few other Thermococcus or Pyrococcus species 277 \n(Benvenuti et al., 2020; Kim et al., 2015) . It needs to be noted, that TcCODH-I’s CooC gene is coded 278 \noutside of its operon, and on the opposite strand . The CooC gene is therefore not inc luded in our 279 \nanalysis. We are only aware of examples from this type of operon, and don’t expect this to be a 280 \ncommon trait of the CODH maturation machinery. However, it needs to be noted that the CooC 281 \nproportion might be slightly underestimated. Interestingly, TcCODH-II like CODH all contain a CooT 282 \nlike protein in their operon, forming the only cluster of CODH that contain  only CooT like proteins  283 \nwithout CooJ . Within clade E, another unique genomic neighborhood  from TkCODH-II must be 284 \npointed out. From experimental data it is known that this CODH is associated with ACS (Jain et al., 285 \n2021), however, in our analysis we did not see this ACS complex in TkCODH-II’s operon. This is due 286 \nto the ACS subunit being coded further downstream of the CODH gene,  not being taken into account 287 \ndue to our initial parameters.  
For HCPs, the high variability and low operon density, especially in classes I and III, point towards more modular or conditionally expressed roles, similar to clade D CODHs. The clear patterning in class II operons, although based on a limited sample, may reflect specialized functions, perhaps in niche-specific oxidoreductase activities.

Conclusion

As mentioned above, the aim of this study is to identify which CODH clades harbor the most promising enzymes for future application in CO₂ reduction. The operon composition of CODHs differs distinctly between clades. From this information we infer that clades A, E and F are the most likely to harbor CODHs able to efficiently convert CO₂ to CO. These clades are therefore the most interesting for CO₂-reducing biotechnological applications, or as inspiration for new synthetic catalysts. Moreover, the literature has shown that the activity of many CODHs depends on co-expression with maturation proteins such as CooC, and in some cases CooJ and CooT are also required for full activation. Some CODHs, most notably CODH-II from C. hydrogenoformans, can function independently of maturases. Even so, our neighborhood analysis indicates that maturase-coding genes are predominantly found in operons from clades A, E and F. This pattern again implies that these clades may represent more biochemically active or catalytically optimized CODHs, making them promising targets for future functional studies and biotechnological applications.

The function of clade B could not be deduced from its genomic environment, but it appears to have a distinct, self-standing function that is not shared with any other CODHs. Its low co-occurrence with other CODH clades within organisms also supports a unique role for clade B.
Clades C and D are more likely to show low or even no activity towards CO₂/CO interconversion, as deduced from the literature and from the lack of C1-metabolism-related genes in their operons. However, Jain et al. recently showed low CO oxidation activity in a clade C CODH from T. kivui. This enzyme originated from a strain that had acquired the ability to grow on CO through laboratory evolution (Jain et al., 2021). The sequence data used to classify this CODH into clade C came from the original strain, which is incapable of growing on CO, and data on the evolved strain are not available. It is therefore not known whether the active CODH is the wild-type enzyme or an evolved variant, and we cannot draw any conclusions regarding the activity of clade C CODHs. Taken together, this makes clades B, C and D less promising candidates for CO₂ reduction catalysts. However, much is still unknown about these enzymes, including their cellular function.

Future work should focus on experimental validation of the functional differences between CODH isoforms, particularly in organisms where multiple clades co-occur. Additionally, transcriptomic and proteomic studies could illuminate condition-dependent expression patterns and confirm the proposed regulatory functions. Finally, deeper phylogenomic analyses may reveal the evolutionary drivers behind the observed distribution and diversification of these ancient redox enzymes.

Methods

Data collection and refinement.
Multiple pBLAST searches (BLOSUM62, E < 0.05) were carried out in the NCBI database using the NCBI accession numbers provided by Inoue et al. (2018): A-1, WP_011305243; A-2, WP_010878596; A-3, OGW06734; A-4, OIP92259; A-5, ODS42986; A-6, OIP30420; B-1, WP_026514536; B-2, WP_015485077; B-3, WP_012645460; B-4, WP_011393470; C-1, WP_039226206; C-2, WP_013237576; C-3, WP_010870233; C-4, WP_044921150; D-1, WP_011342982; D-2, WP_015926279; D-3, WP_079933214; D-4, WP_096205957; E-1, WP_012571978; E-2, WP_010939375; E-3, WP_088535808; F-1, WP_011343033; F-2, WP_011389181; G-1, OGP75751. A further search used the accession provided by Techtmann et al. (2012) (mini CooS, WP_007288589.1). CODHs from clade H (Inoue et al., 2022) were omitted due to limited host information. Duplicates were removed using seqkit’s rmdup (Shen et al., 2016), and sequences shorter than 400 amino acids (aa) were removed. To further reduce the data size, sequences were clustered with cd-hit (Li et al., 2002, 2001; Li and Godzik, 2006) at a global sequence identity of 99% or 90%, the latter used only for tree generation. A high identity threshold was necessary because some organisms are known to encode multiple CODHs with strikingly similar sequences. One example is Clostridium pasteurianum BC1 (taxid: 86416), which contains WP_015614757.1 and WP_015615315.1 with 93.27% similarity. For the dataset used in the neighbor analysis, taxonomic information for each sequence was retrieved using the R packages taxize (Chamberlain et al., 2020; Chamberlain and Szocs, 2013) and taxizedb (Chamberlain et al., 2025). Only sequences that could be assigned to a recorded organism were kept (Supplementary File 3_Table S3 and Supplementary File 4_Table S4).
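The sequence-level filters described above can be sketched in Python as a single pass: duplicate removal, the 400 aa length cutoff, and retention of sequences with a recorded host organism. This is an illustration under stated assumptions, not the authors’ pipeline. Duplicates are taken to be identical sequences, as with seqkit rmdup’s -s/--by-seq mode (by default rmdup compares IDs). The (accession, organism, sequence) record layout and the accessions are hypothetical. The cd-hit clustering step that sits between the length filter and the taxonomy filter is not reproduced.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record layout: (accession, host organism or None, amino-acid sequence).
Record = Tuple[str, Optional[str], str]


def filter_records(records: Iterable[Record], min_len: int = 400) -> List[str]:
    seen = set()
    kept = []
    for accession, organism, seq in records:
        # 1. Drop exact duplicate sequences, keeping the first occurrence.
        if seq in seen:
            continue
        seen.add(seq)
        # 2. Drop sequences shorter than min_len residues (400 aa in the text).
        if len(seq) < min_len:
            continue
        # 3. Drop sequences that could not be linked to a recorded organism.
        if organism is None:
            continue
        kept.append(accession)
    return kept


records = [
    ('WP_A', 'Organism 1', 'M' * 450),
    ('WP_B', 'Organism 2', 'M' * 450),  # identical to WP_A
    ('WP_C', 'Organism 3', 'K' * 399),  # one residue too short
    ('WP_D', None, 'L' * 500),          # no taxonomic record
    ('WP_E', 'Organism 5', 'L' * 420),
]
print(filter_records(records))  # ['WP_A', 'WP_E']
```

Exact duplicates always have the same length, so deduplicating before or after the length filter gives the same result. The taxonomy filter is applied last, mirroring its position in the described workflow.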
Sequences were aligned using E-INS-i from MAFFT (Katoh and Standley, 2013). Sequences with gaps at important positions related to the D-, B- or C-cluster or to the acid-base active-site residues were discarded. The alignment was trimmed using trimAl’s automated1 option (Capella-Gutiérrez et al., 2009), and a tree was generated using FastTree (Price et al., 2010). Further sequences that visual inspection identified as non-CODHs were removed. The final CODH dataset used in the neighbor and correlation analyses comprised 1376 sequences. A similar approach was applied to HCPs (class I, Q01770.2; class II, WP_000458809.1; class III, WP_013294878.1), yielding a final count of 1545 sequences for the neighbor and correlation analyses. A second CODH dataset was curated using the 90% cd-hit cutoff. It was aligned using MAFFT’s FFT-NS-1, and sequences with gaps at important positions related to the D-, B- or C-cluster or to the acid-base active-site residues were discarded. The alignment was trimmed using trimAl, and an initial tree was built using FastTree. Further sequences were removed after visual inspection, yielding a final dataset of 5508 sequences. See Fig. S3 for a detailed flowchart. Custom code is available on GitHub (Böhm, 2025a, 2025b, 2025c).

Neighbor analysis. Genome information was downloaded for the accession number lists generated for CODH and HCP, which led to the download of 955 and 1425 genomes, respectively.
As neighboring genes, we defined up to 15 genes upstream and downstream of the target gene with a maximum intergenic distance of 300 base pairs (bp), as done previously by Inoue et al. (2018). We chose this rather large intergenic distance to include as many neighbors as possible, expecting unrelated genes to disappear in the noise. For the same reason, we allowed an overlap of up to 50 bp between genes in the same operon. This is rather high, as genes in E. coli, for example, usually overlap by 1 to 4 bp (Johnson and Chisholm, 2004). Amino acid sequences of these genes were retrieved from the NCBI database using Entrez (Sayers, 2022), and their functions were predicted using eggNOG (Huerta-Cepas et al., 2019). Both the eggNOG results and the NCBI product predictions were taken into account when manually assigning genes to selected functional groups. The data were plotted using R (R Core Team, 2023), tidyverse (Wickham et al., 2019), patchwork (Pedersen, 2025), ggnewscale (Elio Campitelli et al., 2025), ggtree (Yu et al., 2018, 2017), ggtreeExtra (Xu et al., 2021) and treeio (Wang et al., 2020). Gene maps were drawn with gggenes (Wilkins, 2023). CooJ could not be identified from either the NCBI predictions or eggNOG. We therefore selected operons from clades E and F that contained CooS and CooT and manually extracted accession numbers of potential CooJs. These were used as PSI-BLAST queries (BLOSUM45, E < 0.001) to retrieve further accession numbers, summarized in Supplementary File 1_Table S1. These accession numbers were used to help annotate potential CooJs in our analysis, and 68 potential CooJ genes were identified.

Correlation analysis.
Correlation of CODHs and HCPs from different clades/classes was calculated according to the formula

$$P(X \mid Y) = \frac{N_{XY}}{N_Y},$$

where $N_Y$ is the total number of assemblies containing a protein from clade/class Y, $N_{XY}$ is the total number of assemblies containing proteins from both clade/class X and clade/class Y, and $P(X \mid Y)$ is the probability that a genome encoding a protein from clade/class Y also encodes a protein from clade/class X.

Tree generation. In total, five trees were generated. Trees carrying phylogenetic information were generated with iqtree2 (Minh et al., 2020) using the LG+I+R10 model and ultrafast bootstrapping with 1000 replicates. These trees were built for three datasets: 5508 CODH sequences, 1351 CODH sequences and 1476 HCP sequences (see above for details on their generation). For the 5508-sequence CODH dataset, an outgroup (MBE6442607.1) was introduced to root the tree. Sequences were aligned within their dataset using MAFFT’s FFT-NS-2, the alignments were again trimmed using trimAl, and trees were built using iqtree2 with the above parameters. ggtree (Yu et al., 2017) was used for tree inspection and plotting. The other two trees are taxonomic trees. One was built from taxids alone using a custom Python script and ete3 (Huerta-Cepas et al., 2016), and the other was taken from WoL: Reference Phylogeny for Microbes (Zhu, 2023; Zhu et al., 2019).

Acknowledgements

The Novo Nordisk Foundation (Grant reference number NNF21OC0066716) is gratefully acknowledged for funding.

References

Adam PS, Borrel G, Gribaldo S. 2018. Evolutionary history of carbon monoxide dehydrogenase/acetyl-CoA synthase, one of the oldest enzymatic complexes. Proc Natl Acad Sci U S A 115:E5836–E5837. doi:10.1073/pnas.1716667115
Babbitt PC, Hasson MS, Wedekind JE, Palmer DRJ, Barrett WC, Reed GH, Rayment I, Ringe D, Kenyon GL, Gerlt JA. 1996. The enolase superfamily: A general strategy for enzyme-catalyzed abstraction of the α-protons of carboxylic acids. Biochemistry 35:16489–16501. doi:10.1021/bi9616413
Bao T, Cheng C, Xin X, Wang J, Wang M, Yang S-T. 2019. Deciphering mixotrophic Clostridium formicoaceticum metabolism and energy conservation: Genomic analysis and experimental studies. Genomics 111:1687–1694. doi:10.1016/j.ygeno.2018.11.020
Basak Y, Lorent C, Jeoung J-H, Zebger I, Dobbek H. 2025. Metalloradical-driven enzymatic CO2 reduction by a dynamic Ni–Fe cluster. Nat Catal 1–10. doi:10.1038/s41929-025-01388-5
Benvenuti M, Meneghello M, Guendon C, Jacq-Bailly A, Jeoung JH, Dobbek H, Leger C, Fourmond V, Dementin S. 2020. The two CO-dehydrogenases of Thermococcus sp. AM4. Biochim Biophys Acta Bioenerg 1861:148188. doi:10.1016/j.bbabio.2020.148188
Biester A, Grahame DA, Drennan CL. 2024. Capturing a methanogenic carbon monoxide dehydrogenase/acetyl-CoA synthase complex via cryogenic electron microscopy. Proceedings of the National Academy of Sciences 121:e2410995121. doi:10.1073/pnas.2410995121
Böhm M. 2025a. protein-to-genome. https://doi.org/10.5281/zenodo.16736767
Böhm M. 2025b. protein-per-organism. https://doi.org/10.5281/zenodo.16736754
Böhm M. 2025c. protein-neighbours. https://doi.org/10.5281/zenodo.16736722
Can M, Armstrong FA, Ragsdale SW. 2014. Structure, function, and mechanism of the nickel metalloenzymes, CO dehydrogenase, and acetyl-CoA synthase. Chem Rev 114:4149–74. doi:10.1021/cr400461p
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. doi:10.1093/bioinformatics/btp348
Chamberlain S, Arendsee Z, Stirling T. 2025. taxizedb: Tools for Working with “Taxonomic” Databases. doi:10.5281/zenodo.1158055
Chamberlain S, Szocs E. 2013. taxize - taxonomic search and retrieval in R. F1000Research 2.
Chamberlain S, Szoecs E, Foster Z, Arendsee Z, Boettiger C, Ram K, Bartomeus I, Baumgartner J, O’Donnell J, Oksanen J, Tzovaras BG, Marchand P, Tran V, Salmon M, Li G, Grenié M. 2020. taxize: Taxonomic information from around the web (manual).
Collins MD, Lawson PA, Willems A, Cordoba JJ, Fernandez-Garayzabal J, Garcia P, Cai J, Hippe H, Farrow JAE. 1994. The Phylogeny of the Genus Clostridium: Proposal of Five New Genera and Eleven New Species Combinations. International Journal of Systematic and Evolutionary Microbiology 44:812–826. doi:10.1099/00207713-44-4-812
Diekert GB, Thauer RK. 1978. Carbon Monoxide Oxidation by Clostridium thermoaceticum and Clostridium formicoaceticum. Journal of Bacteriology 136:597–606. doi:10.1128/jb.136.2.597-606.1978
Domnik L, Merrouch M, Goetzl S, Jeoung JH, Leger C, Dementin S, Fourmond V, Dobbek H. 2017. CODH-IV: A High-Efficiency CO-Scavenging CO Dehydrogenase with Resistance to O2. Angew Chem Int Ed Engl 56:15466–15469. doi:10.1002/anie.201709261
Elio Campitelli, Teun van den Brand, olivroy. 2025. ggnewscale: Multiple Fill and Color Scales in ggplot2. doi:10.5281/ZENODO.2543762
Fox JD, He Y, Shelver D, Roberts GP, Ludden PW. 1996. Characterization of the region encoding the CO-induced hydrogenase of Rhodospirillum rubrum. Journal of Bacteriology 178:6200–6208. doi:10.1128/jb.178.21.6200-6208.1996
Frolov EN, Elcheninov AG, Gololobova AV, Toshchakov SV, Novikov AA, Lebedinsky AV, Kublanov IV. 2023. Obligate autotrophy at the thermodynamic limit of life in a new acetogenic bacterium. Front Microbiol 14. doi:10.3389/fmicb.2023.1185739
Fujishiro T, Takaoka K. 2023. Class III hybrid cluster protein homodimeric architecture shows evolutionary relationship with Ni, Fe-carbon monoxide dehydrogenases. Nat Commun 14:5609. doi:10.1038/s41467-023-41289-4
Gong W, Hao B, Wei Z, Ferguson DJ, Tallant T, Krzycki JA, Chan MK. 2008. Structure of the α2ε2 Ni-dependent CO dehydrogenase component of the Methanosarcina barkeri acetyl-CoA decarbonylase/synthase complex. Proceedings of the National Academy of Sciences 105:9558–9563. doi:10.1073/pnas.0800415105
Gtari M, Ventura S. 2025. Proposal of Neomoorella gen. nov. as a replacement name for the illegitimate prokaryotic genus name Moorella Collins et al. 1994. International Journal of Systematic and Evolutionary Microbiology 75:006779. doi:10.1099/ijsem.0.006779
Hadj-Said J, Pandelia ME, Leger C, Fourmond V, Dementin S. 2015. The Carbon Monoxide Dehydrogenase from Desulfovibrio vulgaris. Biochim Biophys Acta 1847:1574–83. doi:10.1016/j.bbabio.2015.08.002
Hagen WR. 2022. Structure and function of the hybrid cluster protein. Coordination Chemistry Reviews 457. doi:10.1016/j.ccr.2021.214405
Hocking WP, Roalkvam I, Magnussen C, Stokke R, Steen IH. 2015. Assessment of the Carbon Monoxide Metabolism of the Hyperthermophilic Sulfate-Reducing Archaeon Archaeoglobus fulgidus VC-16 by Comparative Transcriptome Analyses. Archaea 2015:235384. doi:10.1155/2015/235384
Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol Biol Evol 33:1635–1638. doi:10.1093/molbev/msw046
Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P. 2019. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research 47:D309–D314. doi:10.1093/nar/gky1085
Inoue M, Nakamoto I, Omae K, Oguro T, Ogata H, Yoshida T, Sako Y. 2018. Structural and Phylogenetic Diversity of Anaerobic Carbon-Monoxide Dehydrogenases. Front Microbiol 9:3353. doi:10.3389/fmicb.2018.03353
Inoue M, Omae K, Nakamoto I, Kamikawa R, Yoshida T, Sako Y. 2022. Biome-specific distribution of Ni-containing carbon monoxide dehydrogenases. Extremophiles 26:9. doi:10.1007/s00792-022-01259-y
Inoue T, Takao K, Fukuyama Y, Yoshida T, Sako Y. 2014. Over-expression of carbon monoxide dehydrogenase-I with an accessory protein co-expression: a key enzyme for carbon dioxide reduction. Biosci Biotechnol Biochem 78:582–7. doi:10.1080/09168451.2014.890027
Jain S, Katsyv A, Basen M, Müller V. 2021. The monofunctional CO dehydrogenase CooS is essential for growth of Thermoanaerobacter kivui on carbon monoxide. Extremophiles 26:4. doi:10.1007/s00792-021-01251-y
Jeoung J-H, Dobbek H. 2007. Carbon Dioxide Activation at the Ni,Fe-Cluster of Anaerobic Carbon Monoxide Dehydrogenase. Science 318:1461–1464. doi:10.1126/science.1148481
Jeoung JH, Fesseler J, Domnik L, Klemke F, Sinnreich M, Teutloff C, Dobbek H. 2022. A Morphing [4Fe-3S-nO]-Cluster within a Carbon Monoxide Dehydrogenase Scaffold. Angew Chem Int Ed Engl 61:e202117000. doi:10.1002/anie.202117000
Johnson ZI, Chisholm SW. 2004. Properties of overlapping genes are conserved across microbial genomes. Genome Res 14:2268–2272. doi:10.1101/gr.2433104
Katayama YA, Kamikawa R, Yoshida T. 2024. Phylogenetic diversity of putative nickel-containing carbon monoxide dehydrogenase-encoding prokaryotes in the human gut microbiome. Microbial Genomics 10. doi:10.1099/mgen.0.001285
Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 30:772–780. doi:10.1093/molbev/mst010
Kerby RL, Ludden PW, Roberts GP. 1997. In vivo nickel insertion into the carbon monoxide dehydrogenase of Rhodospirillum rubrum: molecular and physiological characterization of cooCTJ. Journal of Bacteriology 179:2259–2266. doi:10.1128/jb.179.7.2259-2266.1997
Kim M-S, Choi AR, Lee SH, Jung H-C, Bae SS, Yang T-J, Jeon JH, Lim JK, Youn H, Kim TW, Lee HS, Kang SG. 2015. A Novel CO-Responsive Transcriptional Regulator and Enhanced H2 Production by an Engineered Thermococcus onnurineus NA1 Strain. Applied and Environmental Microbiology 81:1708–1714. doi:10.1128/AEM.03019-14
Knox HL, Allen KN. 2023. Expanding the viewpoint: Leveraging sequence information in enzymology. Current Opinion in Chemical Biology 72:102246. doi:10.1016/j.cbpa.2022.102246
Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. doi:10.1093/bioinformatics/btl158
Li W, Jaroszewski L, Godzik A. 2002. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18:77–82. doi:10.1093/bioinformatics/18.1.77
Li W, Jaroszewski L, Godzik A. 2001. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17:282–283. doi:10.1093/bioinformatics/17.3.282
Liew F, Henstra AM, Winzer K, Köpke M, Simpson SD, Minton NP. 2016. Insights into CO2 Fixation Pathway of Clostridium autoethanogenum by Targeted Mutagenesis. mBio 7:e00427-16. doi:10.1128/mbio.00427-16
Lindahl PA, Chang B. 2001. The evolution of acetyl-CoA synthase. Orig Life Evol Biosph 31:403–434.
Matschiavelli N, Oelgeschläger E, Cocchiararo B, Finke J, Rother M. 2012. Function and Regulation of Isoforms of Carbon Monoxide Dehydrogenase/Acetyl Coenzyme A Synthase in Methanosarcina acetivorans. J Bacteriol 194:5377–5387. doi:10.1128/JB.00881-12
Matson EG, Gora KG, Leadbetter JR. 2011. Anaerobic Carbon Monoxide Dehydrogenase Diversity in the Homoacetogenic Hindgut Microbial Communities of Lower Termites and the Wood Roach. PLOS ONE 6:e19316. doi:10.1371/journal.pone.0019316
Merrouch M, Benvenuti M, Lorenzi M, Leger C, Fourmond V, Dementin S. 2018. Maturation of the [Ni-4Fe-4S] active site of carbon monoxide dehydrogenases. J Biol Inorg Chem 23:613–620. doi:10.1007/s00775-018-1541-0
Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. 2020. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37:1530–1534. doi:10.1093/molbev/msaa015
Pedersen TL. 2025. patchwork: The Composer of Plots.
Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLOS ONE 5:e9490. doi:10.1371/journal.pone.0009490
R Core Team. 2023. R: A Language and Environment for Statistical Computing.
Sayers E. 2022. A General Introduction to the E-utilities. In: Entrez Programming Utilities Help [Internet]. National Center for Biotechnology Information (US).
Shen W, Le S, Li Y, Hu F. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11:e0163962. doi:10.1371/journal.pone.0163962
Smith KS, Ingram-Smith C. 2007. Methanosaeta, the forgotten methanogen? Trends in Microbiology 15:150–155. doi:10.1016/j.tim.2007.02.002
Soboh B, Linder D, Hedderich R. 2002. Purification and catalytic properties of a CO-oxidizing:H2-evolving enzyme complex from Carboxydothermus hydrogenoformans. Eur J Biochem 269:5712–21. doi:10.1046/j.1432-1033.2002.03282.x
Techtmann SM, Lebedinsky AV, Colman AS, Sokolova TG, Woyke T, Goodwin L, Robb FT. 2012. Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases. Front Microbiol 3:132. doi:10.3389/fmicb.2012.00132
Terlesky KC, Ferry JG. 1988. Ferredoxin requirement for electron transport from the carbon monoxide dehydrogenase complex to a membrane-bound hydrogenase in acetate-grown Methanosarcina thermophila. Journal of Biological Chemistry 263:4075–4079. doi:10.1016/S0021-9258(18)68892-1
Waite DW, Chuvochina M, Pelikan C, Parks DH, Yilmaz P, Wagner M, Loy A, Naganuma T, Nakai R, Whitman WB, Hahn MW, Kuever J, Hugenholtz P. 2020. Proposal to reclassify the proteobacterial classes Deltaproteobacteria and Oligoflexia, and the phylum Thermodesulfobacteria into four phyla reflecting major functional capabilities. International Journal of Systematic and Evolutionary Microbiology 70:5972–6016. doi:10.1099/ijsem.0.004213
Wang L-G, Lam TT-Y, Xu S, Dai Z, Zhou L, Feng T, Guo P, Dunn CW, Jones BR, Bradley T, Zhu H, Guan Y, Jiang Y, Yu G. 2020. Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data. Molecular Biology and Evolution 37:599–603. doi:10.1093/molbev/msz240
Weghoff MC, Müller V. 2016. CO Metabolism in the Thermophilic Acetogen Thermoanaerobacter kivui. Applied and Environmental Microbiology 82:2312–2319. doi:10.1128/AEM.00122-16
Welte C, Deppenmeier U. 2011. Membrane-Bound Electron Transport in Methanosaeta thermophila. Journal of Bacteriology 193:2868–2870. doi:10.1128/jb.00162-11
Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. 2019. Welcome to the Tidyverse. JOSS 4:1686. doi:10.21105/joss.01686
Wilkins D. 2023. gggenes: Draw Gene Arrow Maps in “ggplot2.”
Wittenborn EC, Guendon C, Merrouch M, Benvenuti M, Fourmond V, Leger C, Drennan CL, Dementin S. 2020. The Solvent-Exposed Fe-S D-Cluster Contributes to Oxygen-Resistance in Desulfovibrio vulgaris Ni-Fe Carbon Monoxide Dehydrogenase. ACS Catal 10:7328–7335. doi:10.1021/acscatal.0c00934
Wu M, Ren Q, Durkin AS, Daugherty SC, Brinkac LM, Dodson RJ, Madupu R, Sullivan SA, Kolonay JF, Haft DH, Nelson WC, Tallon LJ, Jones KM, Ulrich LE, Gonzalez JM, Zhulin IB, Robb FT, Eisen JA. 2005. Life in hot carbon monoxide: the complete genome sequence of Carboxydothermus hydrogenoformans Z-2901. PLoS Genet 1:e65. doi:10.1371/journal.pgen.0010065
Xu S, Dai Z, Guo P, Fu X, Liu S, Zhou L, Tang W, Feng T, Chen M, Zhan L, Wu T, Hu E, Jiang Y, Bo X, Yu G. 2021. ggtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data. Molecular Biology and Evolution 38:4039–4042. doi:10.1093/molbev/msab166
Yu G, Lam TT-Y, Zhu H, Guan Y. 2018. Two Methods for Mapping and Visualizing Associated Data on Phylogeny Using Ggtree. Molecular Biology and Evolution 35:3041–3043. doi:10.1093/molbev/msy194
Yu G, Smith DK, Zhu H, Guan Y, Lam TT-Y. 2017. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 8:28–36. doi:10.1111/2041-210X.12628
Zhou A, Chen YI, Zane GM, He Z, Hemme CL, Joachimiak MP, Baumohl JK, He Q, Fields MW, Arkin AP, Wall JD, Hazen TC, Zhou J. 2012. Functional Characterization of Crp/Fnr-Type Global Transcriptional Regulators in Desulfovibrio vulgaris Hildenborough. Applied and Environmental Microbiology 78:1168–1177. doi:10.1128/AEM.05666-11
Zhu Q. 2023. WoL: Reference Phylogeny for Microbes. WoL. https://biocore.github.io/wol/
Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, Kosciolek T, Yin JB, Huang S, Salam N, Jiao J-Y, Wu Z, Xu ZZ, Cantrell K, Yang Y, Sayyari E, Rabiee M, Morton JT, Podell S, Knights D, Li W-J, Huttenhower C, Segata N, Smarr L, Mirarab S, Knight R. 2019. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun 10:5477. doi:10.1038/s41467-019-13443-4","source_license":"CC-BY-4.0","license_restricted":false}