{"paper_id":"194a2ac5-3c86-4b93-bcd9-c9b8fe147a67","body_text":"Abstract\nThe rapid growth of whole-genome data has revolutionized the field of phylogenomics, i.e., the problem of reconstructing the tree of life. Numerous studies have demonstrated that genome phylogeny can be inferred using the generalized gene content approach. Two simple types are widely used: the first-order gene content (J=1), which records the presence or absence of a gene family, and the second-order gene content (J=2), the extended gene content (absence, single copy, or duplicates). Moreover, a specific form of the birth-death-input process has been invoked to model the evolution of a gene family, taking gene duplication, gene loss, and new gene origin or horizontal gene transfer into account [Gu X. Genome distance and phylogenetic inference accommodating gene duplication, loss and new gene input, Mol Phylogenet Evol 2023].\nThough genome distance methods have been successful for genome phylogeny inference, the maximum likelihood (ML) approach carries a heavy computational burden. In this article, I formulate a finite-state ML approximation to solve this problem. For a given J-order gene content, the evolution of a gene family along a phylogeny is modeled by a stochastic process with a finite number (J+1) of states. Consequently, the computational cost of a finite-state likelihood for a given phylogeny is comparable to that of a typical sequence-based likelihood function. Two analyses were carried out as a proof of concept: a simulation study to examine the performance of phylogenetic inference, and a case study to evaluate to what extent the finite-state ML can be used to determine the root of the genome phylogeny. Overall, the finite-state ML may shed light on the feasibility of phylogenetic likelihood analysis of the pattern of genome evolution.\nCompeting Interest Statement\nThe authors have declared no competing interest.","source_license":"CC-BY-4.0","license_restricted":false}