{"paper_id":"21e21469-a85c-450d-bd8a-0249b7a6d1e1","body_text":"This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.\nCarbonic anhydrases (CAs) attract interest for their critical roles in various physiological processes and their potential application in CO2 sequestration to combat global warming. Despite the importance of this enzyme family, the classification and evolution of CAs remain elusive due to their high sequence diversity and long evolutionary history. In this paper, an in-silico strategy, Motif-weighted Alignment for Structure-based Protein Classification (MASPC), was developed. It combines OmegaFold-simulated CA structures with a weighted structural motif alignment, TM-weighted, to facilitate more precise and robust polymorphic analysis of large enzyme datasets. The MASPC strategy was first validated on 74 ground-truth CA structures extracted from the PDB, showing improved performance compared to sequence-based polymorphic analysis (ClustalO-RAxML). Subsequently, MASPC was applied to a representative database of 1603 CAs from 117 model organisms, with a focus on the α-, β-, and γ-CA classes, to cover organisms from across the evolutionary history of life. The results indicated that α-, β-, and γ-CAs were well grouped in their own classes, with clearer clustering associated with the CA’s organism. The structural differences among the α-, β-, and γ-CAs revealed by MASPC supported the current understanding that CA classes are the result of convergent evolution. 
The sub-clusters in α- and β-CAs are strongly associated with the order in which their source organisms appeared in evolutionary history, demonstrating a close correlation between CA evolution and the evolution of life. Furthermore, the MASPC method was applied to identify 27 potential α-CAs from the NCBI database with less than 40% sequence similarity to a template human carbonic anhydrase II (HCA-II) sequence, demonstrating possible applications in enzyme identification studies.\nhttps://doi.org/10.32942/X25S7R\nBioinformatics, Life Sciences\nProtein, alignment, evolution, Carbonic Anhydrase, carbon capture\nPublished: 2025-02-24 16:01\nLast Updated: 2025-02-24 16:01\nCC BY Attribution 4.0 International\nConflict of interest statement:\nNone\nData and Code Availability Statement:\nData and code are available at https://github.com/resplendentHSHI/TMweighted\nLanguage:\nEnglish","source_license":"CC-BY-4.0","license_restricted":false}