Finite-State Likelihood: an Approximation for Genome Phylogeny Inference based on Generalized Gene Contents

preprint OA: closed CC-BY-4.0
Abstract

The rapid growth of whole-genome data has revolutionized phylogenomics, i.e., the problem of reconstructing the tree of life. Numerous studies have demonstrated that genome phylogeny can be inferred using the generalized gene content approach. Two simple types are widely used: the first-order gene content (J=1), which records the presence or absence of a gene family, and the second-order gene content (J=2), the extended gene content (absence, single-copy, or duplicates). Moreover, a specific form of birth-death-input process has been invoked to model the evolution of a gene family, taking gene duplication, gene loss, and new gene origin or horizontal gene transfer into account [Gu X. Genome distance and phylogenetic inference accommodating gene duplication, loss and new gene input. Mol Phylogenet Evol 2023]. Though genome distance methods have been successful for genome phylogeny inference, the maximum likelihood (ML) approach carries a heavy computational burden. In this article, I formulate a finite-state ML approximation to solve this problem. For gene content of a given order J, the evolution of a gene family along a phylogeny is modeled by a stochastic process with a finite number (J+1) of states. Consequently, the computational cost of a finite-state likelihood for a given phylogeny is comparable to that of a typical sequence-based likelihood function. Two analyses were carried out as a proof of concept: a simulation study to examine the performance of phylogenetic inference, and a case study to evaluate to what extent the finite-state ML can be used to determine the root of the genome phylogeny. Overall, the finite-state ML may shed light on the feasibility of phylogenetic likelihood analysis of the pattern of genome evolution.

Competing Interest Statement

The authors have declared no competing interest.
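To make the cost argument in the abstract concrete, the following is a minimal sketch of how a likelihood over J+1 states could be evaluated on a fixed rooted tree. It uses Felsenstein's pruning recursion, the standard technique for sequence-based likelihoods, which is why the cost is comparable. This is an illustration under stated assumptions, not the paper's implementation: the rate matrix `Q`, the root frequencies, the toy tree, and all function names are hypothetical, and the rates are not derived from the birth-death-input process.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, terms=30, squarings=10):
    """Transition matrix P(t) = exp(Q t), via scaling and squaring
    with a truncated Taylor series (adequate for small toy matrices)."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def conditional_likelihoods(node, q, leaf_states):
    """Pruning step: entry s is P(observed states below node | node in state s).
    A leaf is a genome name (str); an internal node is a list of
    (child, branch_length) pairs."""
    n = len(q)
    if isinstance(node, str):
        return [1.0 if s == leaf_states[node] else 0.0 for s in range(n)]
    out = [1.0] * n
    for child, branch_length in node:
        p = mat_exp(q, branch_length)
        below = conditional_likelihoods(child, q, leaf_states)
        for s in range(n):
            out[s] *= sum(p[s][x] * below[x] for x in range(n))
    return out

def family_likelihood(tree, q, root_freq, leaf_states):
    """Likelihood of one gene family's state pattern across genomes."""
    cond = conditional_likelihoods(tree, q, leaf_states)
    return sum(f * c for f, c in zip(root_freq, cond))

# Hypothetical J=2 example: states 0 (absent), 1 (single-copy), 2 (duplicates).
# Rates are illustrative only; each row of Q sums to zero.
Q = [[-0.2, 0.2, 0.0],
     [0.3, -0.8, 0.5],
     [0.0, 0.4, -0.4]]
root_freq = [1 / 3, 1 / 3, 1 / 3]          # assumed root distribution
tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.3)]  # toy tree ((A,B),C)
pattern = {"A": 1, "B": 2, "C": 0}         # one gene family's observed states

print(family_likelihood(tree, Q, root_freq, pattern))
```

In this sketch, the work per gene family at each internal node scales with the square of the number of states, so J=1 or J=2 involves 2x2 or 3x3 matrices, smaller than the 4x4 matrices of a nucleotide model. The log-likelihood of a full data set would be the sum of `math.log(family_likelihood(...))` over gene families, assuming families evolve independently.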


Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-05-23T02:00:01.238055+00:00
License: CC-BY-4.0