{"paper_id":"1be937aa-0ca8-4fd1-ae8a-2cd0282a405b","body_text":"Phylogenetic modelling of compositional\nand exchange rate changes over time\nPeter G. Foster\nNatural History Museum, London SW7 5BD, UK\nAbstract\nChanges in the process of evolution occur over time, including compositional tree heterogeneity\n(CTH) and exchange rate tree heterogeneity (ERTH). Models that can accommodate CTH\nand ERTH in molecular evolution are described. Fit of these models was compared using a\nlikelihood ratio test in maximum likelihood, and in Bayesian analysis using the conditional\npredictive ordinate (CPO)-based log pseudomarginal likelihood (LPML), also described as\nleave-one-out cross-validation (LOO-CV). CTH and ERTH can be flexibly modelled in a Bayesian framework with\ntree-heterogeneous models that tune themselves to the amount of heterogeneity in the data being\nanalysed.\nSince phylogenetic analysis is usually done using tree-homogeneous models, effects of CTH and\nERTH on subsequent phylogenetic analysis using such models were described. Compositional\neffects due to CTH were seen as expected, for example where unrelated taxa with similar\ncompositions would group together in homogeneous analysis. Similar effects were also demonstrated\ndue to ERTH.\nDetection of CTH and ERTH by modelling is compared to detection using matched pairs\ntests (MPTs) that have been used to test molecular sequences for stationarity, reversibility,\nand homogeneity (SRH). Comparisons between modelling and MPTs on data simulated on very\nsimple trees showed that the two approaches were equivalent, but simulations on larger trees\nshowed that the two approaches differed greatly. 
Modelling showed greater power, especially\nin detection of ERTH, and some ERTH was completely invisible to MPTs but was decisively\ndetected by modelling.\nDetection and modelling of CTH and ERTH are shown in two empirical examples.\nKeywords\nphylogenetic models; time-heterogeneity; maximum likelihood; Bayesian analysis; CPO; LPML\n\nbioRxiv preprint doi: https://doi.org/10.1101/2025.03.14.643246; this version posted March 17, 2025. The copyright holder for this preprint\n(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is\nmade available under a CC-BY 4.0 International license.\n\nIntroduction\nSuccess of model-based molecular phylogenetic inference depends on whether the models that\nare used are adequate. While the simplest phylogenetic models had assumptions that were\nnot biologically realistic, phylogenetic models have been improved in various ways, relaxing\nassumptions to make the models more realistic and better fit the molecular data. This study\nlooks at relaxing the assumption of having an unchanging evolutionary process over time.\nA phylogenetic model such as the GTR model has free composition and exchange rate\nparameters (Tavaré, 1986). These parameters are generally re-optimized for every new phylogenetic\nanalysis. This is necessary in part because these parameters differ in different evolutionary\ngroups, because there are differences in evolutionary process over time. In a similar way there\nare group-specific empirical amino acid models (Abascal et al., 2007; Adachi & Hasegawa, 1996;\nLe et al., 2017; Rota-Stabelli et al., 2009; Yang et al., 1998). Again that is because the process\nof evolution changes over the tree of life. 
Like the parameters of the GTR model, these empirical\namino acid models are composed of a composition component and an exchange rate component;\nboth of these differ between models, which implies that there is heterogeneity in both.\nThis study describes methods used to detect and models used to accommodate these\ntree-heterogeneous processes. Detection by modelling is common in phylogenetics: molecular\nsequences are evaluated with and without the model component of interest, looking for\na better model fit when that component is included. That is done here in both a maximum\nlikelihood and a Bayesian framework, and models are compared by looking at how well the models\nfit the data. There are models that accommodate compositional changes over time (Blanquart &\nLartillot, 2006, 2008; Foster, 2004; Galtier et al., 1999; Galtier & Gouy, 1998; Yang & Roberts,\n1995). Models that accommodate exchange rate changes over time have not been used as much\n(Foster et al., 2009), and this will be looked at more closely here.\nMethods\nThe evolution of molecular sequences, DNA and protein, can be described by a continuous-time\nMarkov process. The simplest models, such as the Jukes-Cantor model, can be described with\nclosed-form equations, while more complex models, such as the GTR model, can be described\nby a rate matrix (Jukes & Cantor, 1969; Tavaré, 1986). As described by Swofford et al. (1996),\nthat rate matrix Q can be decomposed into a composition vector π and an exchange rate matrix R. This\nparameterization is commonly used (Minh et al., 2020; Ronquist et al., 2012; Swofford, 2002),\nand is used here (Figure 1).\nAlignments of molecular sequences will often have taxa that have different character state\ncompositions; this is further evidence that the process of evolution changes over time. Here this\nwill be referred to as compositional tree heterogeneity, or CTH (Table 1). 
This study will also\nlook at changes in exchange rates over time, which will be called exchange rate tree heterogeneity\nor ERTH. CTH and ERTH can be modelled, and here focus will be on NDCH, NDRH, NDCH2,\nand NDRH2 (Foster, 2004, 2025; Foster et al., 2009). Changes in the evolutionary process over\ntime can occur anywhere on the tree, gradually or suddenly. This is approximated in the\ntree-heterogeneous models described here by allowing changes to the model parameters only at nodes,\nand so “node discrete” in the ND– models described. Between nodes the process is modelled as\nhomogeneous.\n\n[Figure 1 diagram: composition and exchange rates (R) give the Q rate matrix; Q and branch lengths give the probability matrix P; P, the data, and the tree give the likelihood]\nFigure 1: Parameterization context. It is possible for the composition and the R exchange rates\nto differ over the tree. The R-parameters are often described as a matrix, but since the elements\non the diagonal are not applicable, and the remaining elements are usually symmetrical, the\nR-parameters are often described as a short vector. A transition probability matrix P can be\nmade from Q using the branch length ν by P = e^(Qν). Together with the sequence data and the\npruning algorithm on a tree, a likelihood can then be calculated.\n\nIn NDCH and NDRH the models would usually not be fully parameterized over the tree, where\neach branch or node has a separate set of parameters. Rather, to avoid overparameterization,\ncomposition vectors and exchange rate matrices would be shared such that they can be assigned to\nmore than one part of the tree. This is a partially relaxed version of the more usual homogeneous\ncase where the single composition vector and rate matrix are shared by all the nodes over the\ntree. There are many possible ways that a small number of composition vectors and exchange\nrate matrices can be assigned on the nodes of a large tree, and as implemented in maximum\nlikelihood that pattern is fixed for each analysis. It would be desirable to allow reassignment\nand optimization of how the composition vectors and exchange rate matrices are placed on the\ntree, and the way this problem was addressed in a Bayesian framework was to have MCMC\nproposals to allow reassignment of different composition vectors or rate matrices to different\nbranches. A problem with both NDCH and NDRH, however, is that the number of composition\nvectors and rate matrices is fixed at the beginning of an analysis, and usually one would not\nknow how many are needed to adequately model the heterogeneity until running the analysis. It\nis possible to fully parameterize the tree, with different composition vectors assigned to all the\nnodes of the tree or have different rate matrices assigned to all branches of the tree. When fully\nparameterized this way the model should be able to accommodate the maximal heterogeneity\nallowed by this strategy, although it comes with the possibility of over-parameterization. The\nproblem of over-parameterization is addressed with the NDCH2 and NDRH2 implementations\nof these tree heterogeneous models, which, in a Bayesian MCMC, use a prior probability on\nthe values of the compositions and exchange rates to decrease the tendency to overfit. In these\nmodels the values of the composition vectors or rate matrices are constrained with a Dirichlet\nprior that tends to keep the proposals more or less close to a sampled reference value. The\nprior has a hyperparameter that controls the strength of the constraint on the proposals. 
That\nhyperparameter would generally be free and sampled, and in this way the model can tune itself to\nthe amount of heterogeneity in the data. In the current implementation these priors are separate\nbetween the leaf nodes and the internal nodes.\nIn a maximum likelihood framework the likelihood ratio test is used to compare models, using\nthe χ² approximation to assess significance. A measure of model fit in a Bayesian context based\non the conditional predictive ordinate (CPO; cross-predictive ordinate in Lartillot 2023) was\ndescribed by Lewis et al. (2014), and is used here for model comparison. The CPO is a site-specific\nmeasure of model fit that can be combined over sites to make the log pseudomarginal likelihood\n(LPML), a measure of overall model fit. This method was further described and recommended\nin a commentary by Lartillot (2023). It is described there as the CPO approach to leave-one-out\ncross-validation (LOO-CV), and is especially attractive because it is computationally efficient.\nModels were implemented in the software p4 (Foster, 2025). DNA models were used in\nthis study, but the software also implements the models for protein and arbitrary (“standard”)\ndatatypes.\nMethods for matched pairs tests for symmetry, MPTs, were described in Ababneh et al. (2006).\nThe test for symmetry (MPTS, Bowker’s test) can be decomposed into two more specific tests:\nthe matched pairs test for marginal symmetry (MPTMS) and the matched pairs test for internal\nsymmetry (MPTIS). In their usual form these tests are pairwise, testing one pair of sequences,\nwhich is how it was done in this study. 
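As an illustration of the pairwise form of these tests, the following minimal sketch computes the statistic of Bowker's test for symmetry from the 4 × 4 divergence matrix of one pair of aligned DNA sequences. It is not the p4 implementation: the function names are hypothetical, and skipping sites with gaps or ambiguity codes is an assumption made here. The statistic would be compared to a χ² distribution with the returned degrees of freedom.

```python
from itertools import combinations

STATES = 'ACGT'  # unambiguous DNA states; other characters are skipped (assumption)


def divergence_matrix(seq_a, seq_b):
    '''Count site patterns: n[i][j] is the number of sites with state i in seq_a and j in seq_b.'''
    index = {state: i for i, state in enumerate(STATES)}
    n = [[0] * len(STATES) for _ in STATES]
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in index and b in index:
            n[index[a]][index[b]] += 1
    return n


def bowker_statistic(n):
    '''Return (statistic, degrees of freedom) for Bowker's test of symmetry.'''
    stat = 0.0
    df = 0
    for i, j in combinations(range(len(n)), 2):
        total = n[i][j] + n[j][i]
        if total > 0:  # off-diagonal pairs with no observations carry no information
            stat += (n[i][j] - n[j][i]) ** 2 / total
            df += 1
    return stat, df


# Hypothetical toy pair, far too short for a real test
matrix = divergence_matrix('AAAACCGT', 'ACCCCAGT')
statistic, dof = bowker_statistic(matrix)
```

A symmetric divergence matrix gives a statistic of zero, so larger values indicate a stronger departure from symmetry between the two sequences.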
The methods were re-implemented in p4.\nTree of Life rRNA data was obtained from C. J. Cox (Cox et al., 2008). Phylogenetic analyses\nof these were done with the data and model partitioned into SSU and LSU partitions. The\namong-partition rates were fixed ML estimates to facilitate faster convergence.\nThe alignment from the study of Asian geckos was obtained from Dryad, and the ND2 partition,\n1011 sites, was extracted (Brown et al., 2012b, 2012a). The single blank sequence was removed,\nleaving 40 sequences in the alignment. The remaining sequences ranged from 837 to 1011 nt,\nwith 489 alignment gaps. The Bayesian analysis tree was provided with the data, which was\nre-rooted on a bifurcating root on the outgroup Tarentola.mauritanic for analysis here. The\ntree had two unresolved internal splits. Bayesian MCMC analysis was done using the fixed\ntree, using models GTR, NDCH2, NDRH2, and NDCH2+NDRH2, all with gamma-distributed\namong-site rate variation (Yang, 1994).\n\nTable 1: Abbreviations\nCTH: Compositional tree heterogeneity\nERTH: Exchange rate tree heterogeneity\nNDCH: Node discrete compositional heterogeneity model, to model CTH. It has a fixed number of composition vectors over the tree, generally fewer than the number of nodes in the tree; the composition vectors would then be shared over a subset of nodes. As implemented, placement of composition vectors on the tree is fixed in simulations and in ML, but the placement can change in the Bayesian MCMC implementation. Notation such as NDCH(4) would mean having four composition vectors. NDCH-fully means fully parameterized, with a composition vector on all nodes.\nNDRH: Similar to NDCH, to model ERTH\nNDXH: NDCH or NDRH\nNDCH2: Fully parameterized NDCH, to model CTH, with a prior on the distance from composition vectors to a sampled reference value.\nNDRH2: Similar to NDCH2, to model ERTH\nNDXH2: NDCH2 or NDRH2\nLRT: Likelihood ratio test\nLPML, LOO-CV: Log pseudomarginal likelihood, a measure of model fit used in Bayesian analysis, based on the conditional predictive ordinate (CPO) (Lewis et al., 2014). Also the CPO (cross-predictive ordinate) implementation of leave-one-out cross validation (LOO-CV) (Lartillot, 2023).\nSRH: An evolutionary model or process that is stationary, reversible, homogeneous. Tree-heterogeneous model components might be locally SRH but not SRH tree-wide.\nMPT: Matched pairs test\nMPTS: Matched pairs test for symmetry. Bowker’s test\nMPTMS: Matched pairs test for marginal symmetry. Stuart’s test\nMPTIS: Matched pairs test for internal symmetry. Ababneh’s test\n\nResults\nComparing models using LRT and LPML\nWe assess whether an attribute of evolution is a worthwhile component in a model by comparing\nmodels with and without that component. In maximum likelihood we can compare using the\nLRT, the likelihood ratio test. 
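A minimal sketch of such a likelihood ratio test follows, assuming nested models and the χ² approximation. The log likelihood values and the three degrees of freedom (one extra DNA composition vector, taken here to add three free parameters) are hypothetical illustrations rather than values from this study, and the function names are not from p4.

```python
import math


def chi2_sf(x, df):
    '''Upper tail probability of a chi-square variate with integer df >= 1.'''
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 2 (even) or df = 1 (odd)
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    '''Return (test statistic, P-value) for nested models.'''
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log likelihoods: homogeneous (null) versus a model with one extra composition vector
stat, p_value = likelihood_ratio_test(-10234.7, -10228.1, 3)
```

The tail probability is computed with the standard library only, so the sketch does not depend on a statistics package; in practice a library routine for the χ² distribution would normally be used instead.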
For simple tree-heterogeneous models, such as the NDCH(2)\nmodel with two composition vectors instead of the single composition vector in the GTR model,\nthe null distribution is χ2-distributed (Supplementary Figure S1).\nFor larger phylogenetic problems with more complex models it can be better to analyse in a\nBayesian framework using an MCMC, and then compare models by comparing LPML values.\nUsing LPML was introduced in a phylogenetic context in Lewis et al. 2014 but has not been\ncommonly used (but see Lartillot 2023). To use it for comparing models to decide whether an\nincrease in LPML is meaningful, we need to know how it behaves for null comparisons such\nas comparing a well-fitting model with an over-parameterized one. Several such comparisons\nwere made using simulations on four-taxon trees (Figure 2) measuring LPML differences (Figure\n3). There it can be seen that for all conditions tested, for most replicates the LPML for the\noverparameterized MCMC was less than that for the simulation model, showing a penalty for\noverparameterization. To see if this was also the case with larger trees, some tests were done with\n40-taxon trees, under similar conditions as above with GTR simulations and analysis under the\nsimulation model versus overparameterized models (Supplementary Figure S2). The differences\nwere in the range -8 to 4, similar to that shown in the 4-taxon tree results in Figure 3. Although\nthis survey has been limited in scope and may well depend on the context, we can tentatively\nsuggest that an LPML difference of more than about 10 will be considered meaningful.\nA\nB\nC\nD\nFigure 2: Four-taxon simulation tree. For heterogeneous simulations new model components\nwere placed on the “D” branch.\nEnough parameters without overparameterization\nThe demonstration here shows the effect of the prior in the NDXH2 (NDCH2 or NDRH2) models,\ncompared to similar models, NDXH-Fully, without the prior. 
Both the NDXH-Fully and the\nNDXH2 models are fully parameterized over the tree, meaning that the tree is fully populated\nwith separate composition vectors or exchange rate matrices. Data sets were simulated with small\nor large amounts of CTH or ERTH and then analysed for model fit using these models (Figure 4).\nWith only a small amount of heterogeneity, either CTH or ERTH, fully parameterized models\nwould be overparameterized, and benefit from the constraint provided by the NDXH2 strategy\ncompared to the NDXH-Fully model (panels A and C). With a large amount of heterogeneity\nthe fit of the NDXH-Fully model is not much worse than the NDXH2 model (panels B and\nD). Sampled hyperparameters show that the NDXH2 model adjusts itself to the amount of\nheterogeneity in the data (Figure 5). Larger hyperparameter values exert a stronger constraint\non the parameters.\n\n[Figure 3 plot: Δ LPML (x-axis, -10 to 4) for five simulation model / overparameterized model pairs: GTR / NDCH2, GTR / NDRH2, GTR / NDCH2+NDRH2, NDCH / NDCH2+NDRH2, NDRH / NDCH2+NDRH2]\nFigure 3: Change in LPML due to overparameterization using data from four-taxon simulations.\nMCMC analyses were done with the simulation model and with the overparameterized model.\nThe LPML was calculated for both, and the difference plotted (overparameterized - simulation).\nEach condition used 100 replicates. Most differences showed an LPML penalty for\noverparameterization.\n\n[Figure 4 panels: A. Simulation: small CTH; B. Simulation: large CTH (analysis models GTR, NDCH2, NDCH-fully); C. Simulation: small ERTH; D. Simulation: large ERTH (analysis models GTR, NDRH2, NDRH-fully)]\nFigure 4: Comparing model fit between NDXH2 and NDXH-Fully. Model fit was measured by\nLPML. Datasets on an arbitrary 16-taxon tree were simulated containing either a small or a\nlarge amount of either CTH or ERTH, and then analysed for model fit on the fixed tree.\n\n[Figure 5 panels: sampled hyperparameter values for CTH small and CTH large; for ERTH small and ERTH large]\nFigure 5: Hyperparameters controlling the constraint in the NDXH2 analyses in Figure 4.\nSamples of hyperparameters for leaf nodes taken during the MCMC are shown.\n\nCross-estimation between CTH and ERTH\nThe NDCH model was formulated to accommodate CTH only, and the NDRH model was\nformulated to accommodate ERTH only. Ideally NDCH would not detect ERTH, nor would NDRH\ndetect CTH; however such cross-estimation is evident. To show this a 4 × 4 matrix of ML\nanalyses was done, using four simulation conditions (with neither CTH nor ERTH, with CTH, with\nERTH, and with both) and the corresponding four analysis models. Simulations were done on\nthe four-taxon tree shown in Figure 2. Likelihood ratios were calculated between the GTR\nanalysis and the other three models. Means are shown (Table 2). Likelihood ratios from simulations\nusing the GTR model (row A) have a mean of only a few log units, as expected only showing a\nslight increase due to overparameterization. Simulations including CTH (row B) show a mean\nlikelihood ratio of 1319 log units when using the NDCH model; such a large value is expected\nbecause the data were simulated with CTH. 
However, those simulations also show a mean likelihood\nratio of 96 log units when using the NDRH model, which is unexpected because the NDRH\nmodel does not model the CTH in those simulations. Similarly simulations incorporating ERTH\n(row C) have an expected large mean likelihood ratio of 2742 when using the NDRH model,\nbut also an unexpected mean likelihood ratio of 279 log units when using the NDCH model. We can\nhowever notice that in the simulations with CTH (row B) the mean likelihood ratio using the\nNDCH+NDRH model is only 2.5 log units larger than the likelihood ratio from NDCH alone,\nand does not reflect the addition of the erroneous NDRH contribution. Similarly in the ERTH\n(row C) simulations, the NDCH+NDRH mean likelihood ratio is only 1.6 log units more than\nthe likelihood ratio from NDRH alone. This observation about the combined NDCH+NDRH\nmodel allows us to infer the relative contributions of the NDCH and NDRH when there is\ncross-estimation as seen here. In the results with simulations with both CTH and ERTH (row D)\nthe mean likelihood ratio using NDCH+NDRH is much greater than either NDCH or NDRH\nseparately, and allows us to infer that these data have both CTH and ERTH.\n\nTable 2: Cross-estimation of CTH and ERTH using ML. Four simulations on 4-taxon trees, 2000\nreplicates each, were evaluated with GTR, NDCH, NDRH, and NDCH + NDRH. Likelihood\nratios between the heterogeneous model and GTR were evaluated; means are shown.\nSimulation | NDCH | NDRH | NDCH+NDRH\nA: no CTH, no ERTH | 1.54 | 2.47 | 4.03\nB: with CTH, no ERTH | 1319.36 | 95.99 | 1321.87\nC: no CTH, with ERTH | 278.70 | 2741.62 | 2743.17\nD: with CTH, with ERTH | 973.77 | 2425.77 | 3988.15\n\nA parallel set of analyses was done in a Bayesian framework using LPML to measure fit of\nthe model (Figure 6). Panel A shows LPML results from analysis of GTR simulations. The\nLPML for the GTR analysis was the highest. The LPML values for the other three models were,\nrelative to the GTR value, NDCH2: -2.3, NDRH2: -6.0, NDCH2+NDRH2: -9.0 (Supplementary\nTable S1), which are expected due to overparameterization. Panel B sequences were simulated\nwith CTH, and have a high LPML when modelled by NDCH2. However, there is also an increase\nin LPML of the NDRH2 model over the GTR model, erroneously showing ERTH. Reciprocally\nCTH was erroneously detected in panel C by NDCH2 in sequences containing ERTH but no\nCTH. However, we can notice that in such cases, modelling with both NDCH2 and NDRH2\ntogether appears to behave usefully. In Figure 6 panel B, where some ERTH is erroneously\ndetected, when analyzed with both NDCH2 and NDRH2 together the LPML is only about 6 log\nunits greater than the LPML value for NDCH2 alone. Similarly in Figure 6 panel C, analysis\nwith both NDCH2 and NDRH2 together results in an LPML value that is only about 4 log\nunits greater than the LPML value for NDRH2 alone. This suggests that when there is\ncross-estimation, by modelling CTH and ERTH separately and together we can estimate both CTH\nand ERTH, and their relative contributions. Figure 6 D shows an analysis of simulation data\nthat have both CTH and ERTH, where analysis with NDCH2 and NDRH2 together has a model\nfit greater than either separately, and allows inference of both CTH and ERTH.\nComposition and exchange rate effects in simulations analysed with\ntree-homogeneous models\nUsually molecular phylogenetic analysis is done with tree-homogeneous models, and so we would\nlike to know how tree-heterogeneous sequences behave when analysed with tree-homogeneous\nmodels. 
To demonstrate, simulated DNA datasets were generated on a tree with CTH or ERTH\nin various patterns over the simulation tree (Figure 7, left column). Results in the middle column\nof that Figure show tree-homogeneous analysis, showing topological distortions.\nIt may be that the most common and expected effect occurs when CTH or ERTH differences\nfollow, from the root to the leaves, the same evolutionary path as the tree (Figure 7 rows B,\nF). In this case we have attraction of sister taxa, which tends to raise support for grouping the\nsisters in homogeneous analysis. Of the effects shown, this one might be considered the most\nbenign because it increases support for groupings that should be together, but it still must be\nconsidered a distortion because it increases support undeservedly. Likely the most well-known\neffect shown is compositional attraction, where taxa that are remote on the tree but with similar\nsequence compositions are attracted in analysis (Figure 7 row C). Such an attraction can also\noccur due to ERTH, without any CTH (Figure 7 row G).\nWhen sister taxa differ in composition or exchange rates, they can repel each other (Figure 7\nrows D, H). Such repulsion can be considered as non-specific attraction to some other part of the\ntree. The complex scenario described in Figure 7 rows E and I is meant to describe an interplay\nof attractions and repulsions. While it would be easy to predict the distortions from the follow\nor attract scenarios on homogeneous analysis, it would be hard to predict the analysis topology\nfrom a repel or complex pattern.\nThe simulated alignments shown were also analysed with an NDCH2+NDRH2 model, which\nrecovered the simulation topology in all cases (right column in Figure 7) together with a better\nmodel fit (Supplementary Table S2). 
For those analyses involving CTH, posterior predictive\nsimulations show that the model measured with the heterogeneous analysis fit the composition\nwhile the homogeneous analysis did not (Supplementary Table S3).\nLooking more closely at Figure 7 row B, we can see that the model fit is much better in\nthe tree-heterogeneous analysis compared to the tree-homogeneous analysis (LPML increased by\n1094.8, Supplementary Table S2, Row B). This is an example of an analysis that had decreased\nsupport for topological resolution (from 100% to 70%), yet was a better fit.\n\n[Figure 6 panels: A. Simulation: no CTH, no ERTH; B. Simulation: with CTH, no ERTH; C. Simulation: no CTH, with ERTH; D. Simulation: with CTH, with ERTH; analysis models GTR, NDCH2, NDRH2, NDCH2+NDRH2]\nFigure 6: Cross-estimation of CTH and ERTH. Datasets were simulated with and without both\nCTH and ERTH, and then analysed in an MCMC with and without NDCH2 and NDRH2.\nModel fit was measured with LPML. Simulations were made on a fixed, arbitrary 7-taxon tree.\nFor simulations incorporating CTH or ERTH, three different fixed composition vectors or three\ndifferent exchange rate parameter sets were placed on the simulation tree semi-randomly such\nthat each model component was present in at least two locations on the tree.\n\nCross-estimation is not seen with matched pairs tests as it is with models\nMatched pairs tests (MPTs) have been described and advocated as a way to test SRH\n(stationarity, reversibility, and homogeneity) in molecular sequences (Ababneh et al., 2006). The MPTs\nare a family of three related tests. The most general MPT is Bowker’s test for symmetry, but\nit can be decomposed into two more specific tests: the test for marginal symmetry (MPTMS)\nthat tests for stationarity, and the test for internal symmetry (MPTIS). Here simulated\nalignments are tested using the MPTMS and MPTIS, and compared to testing using modelling and\nthe LRT as done above. These strategies, the LRT and MPTs, have similar goals, and although\nthe methods differ greatly we would expect them to agree.\nMPTs are performed on aligned sequences; they do not need trees, models, or a phylogenetic\nanalysis. This means that they can be used before phylogenetic analysis, an advantage over\nmodel comparison to identify CTH and ERTH. This has been incorporated into a suggested\nphylogenetic protocol that involves screening for alignments or sequences that do not meet the\nassumptions of the proposed analysis methods (Jermiin et al., 2020). However, in their usual\nform the MPTs are performed on pairs of sequences in an alignment, while what we generally\nwant is an assessment on an entire alignment. Using MPTs on many pairs in an alignment\npotentially leads to problems with multiple comparisons (Ababneh et al., 2006). 
Here the MPTs\nare applied to simulated data where heterogeneity of the pairs is known in advance, and only a\nsingle pair of sequences is measured with the MPTs, thereby avoiding multiple comparison.\nWe first ask whether the MPTs detect CTH and ERTH, and whether they suffer from the\nproblem of cross-estimation of CTH and ERTH as described above for model-based comparison.\nTo do this the MPTMS and MPTIS were applied to the simulations used in Table 2. Mean\nP-values for those tests are shown in Table 3, and are either zero or about 0.5. Mean P-values close\nto 0.50 arise because the P-values are uniformly distributed, with no significance except for the\nType-I error rate. Mean P-values of 0.0 reflect significance: the MPTMS is detecting CTH and\nthe MPTIS is detecting ERTH. If there was cross-estimation then we would expect to see\nlowering of the mean P-value from the MPTIS of the B-series simulations, and we would expect\nto see lowering of the mean P-value from the MPTMS of the C-series simulations. However,\nthere is no evidence in Table 3 for cross-estimation as seen above with models (Table 2,\nFigure 6).\n\nTable 3: Datasets simulated under conditions of CTH and ERTH were tested with MPTs. Mean\nP-values over 2000 replicates are shown.\nSimulations | P MPTMS | P MPTIS\nA: no CTH, no ERTH | 0.49 | 0.50\nB: with CTH, no ERTH | 0.00 | 0.52\nC: no CTH, with ERTH | 0.50 | 0.00\nD: with CTH, with ERTH | 0.00 | 0.00\n\n[Figure 7 panels, by row: simulation pattern; tip order and support (posterior probability percent/bootstrap percent) in tree-homogeneous analysis; support in tree-heterogeneous analysis.\nA. homogeneous: A B C D, 100/96; 100.\nCTH rows. B. follow: A B C D, 100/100; 70. C. attract: A C B D, 100/100; 100. D. repel: A D B C, 94/71; 100. E. complex: A C B D, 56/38; 100.\nERTH rows. F. follow: A B C D, 98/92; 82. G. attract: A C B D, 100/100; 100. H. repel: A D B C, 96/65; 100. I. complex: A C B D, 52/37; 45.]\nFigure 7: Effects of composition and exchange rate differences over the tree on subsequent analysis\nof simulated DNA datasets. Analysis used a tree-homogeneous model (middle column), or a\ntree-heterogeneous model (right column). Coloured branches on the simulation trees show the CTH\n(Rows B – E) or ERTH (Rows F – I) that was superimposed on the background Jukes-Cantor\nevolution shown in black. Resulting datasets were then analysed with a GTR model using\neither an MCMC in p4 or maximum likelihood and bootstrap with IQTree with results shown\nin the homogeneous analysis column (support shown as posterior probability percent/bootstrap\npercent). The datasets were also analysed using tree heterogeneous models with results as shown\nin the column on the right; the model was NDCH2+NDRH2 in all cases.\n\nPower of MPTs compared to LRT\nStatistical power of the MPTMS was first compared to the LRT using simulations on a\none-branch, two-taxon tree. To challenge the sensitivity of these tests, simulations were made with a\nsmall amount of CTH, such that the mean P-value over 2000 replicates was 0.29 for both tests,\nand then the two tests were compared using PP-plots (Figure 8, Panel A). The P-values of the\ntwo tests were almost identical, differing by less than 10^-5 each on average, showing that the\ntwo tests had the same sensitivity to detect CTH under these conditions. Panel B uses the same\ndata points as in panel A, but shows the P-value of each MPTMS plotted against the P-value of\nthe LRT for the same simulation, again showing complete agreement.\nThen the power of the MPTs was compared to that of the LRT using simulations on\nfour-taxon trees as in Figure 2, with results shown in Figure 8 panels C–F. To test the power to detect\nCTH, simulations were made with a small amount of CTH placed on leaf D in the simulation\ntree. Then the MPTMS was compared to the LRT using GTR versus NDCH(2) in a PP-plot,\nshowing that the LRT was more sensitive (Figure 8 Panel C). When the comparison was made\ndataset-by-dataset it was seen that the two tests greatly disagreed (Panel D). Although comparing\nthe tests this way from simulations on a one-branch tree showed complete agreement (panel B),\nthis was not so when done with simulations on a four-taxon tree (panel D). Panel E simulations\nsimilarly contained a small amount of ERTH on the branch leading to taxon D in the simulation\ntree, and compares the MPTIS with an LRT of GTR versus NDRH(2) using a PP-plot, showing\nthat the LRT was in general much more sensitive than the MPTIS here. Again the point-by-point\nindividual analyses greatly disagreed with each other (Panel F).\nSome ERTH is invisible to MPTIS\nIn comparing the MPTs with the LRT it became evident that some kinds of simulated data that\ncontain significant amounts of ERTH, as simulated by the NDRH model and as assessed with\nthe LRT, are such that the ERTH is not visible to the MPTIS (Figures 9 and 10). 
In the first\nexample (Figure 9), one pair of R-matrices shows ERTH with both the LRT and the MPTIS, but a similar\npair shows ERTH with the LRT but not with the MPTIS.\nExample using Tree of Life rRNA\nThis example uses an rRNA dataset composed of concatenated SSU and LSU genes spanning\nthe Tree of Life (Cox et al., 2008). This study revived the eocyte hypothesis for the origin of\neukaryotes (Rivera & Lake, 1992). It was analysed in Cox et al. (2008) with the GTR and the\nNDCH model, and it is re-analysed here.\nThis dataset was analysed in Cox et al. (2008) with the GTR model to show a “three-domains”\ntree of life with 73% posterior probability (as in Figure 11 A, which has that tree with 68%\nsupport). They then used the NDCH(2) model, which showed 75% support for the grouping\nof eocytes with eukaryotes. The repeat here using the NDCH2 model showed 96% support for\nthat split (Figure 11 B). Figure 12 shows a higher LPML for NDCH2+NDRH2 than for either\nNDCH2 or NDRH2 alone, meaning that the data have both CTH and ERTH (the difference\nbetween the LPML for NDCH2 and NDCH2+NDRH2 is 63.1 log units; Supplementary Table S4).\nUsing NDRH2 alone shows 60% support for eocytes plus eukaryotes, and using NDCH2+NDRH2\nincreases support for that grouping to 99%, higher than either NDCH2 or NDRH2 separately.\nIt appears that both CTH and ERTH contribute to the GTR model recovering the topology\nshown in Figure 11 A. As measured by posterior predictive simulation using the X^2 test quantity,\nmodel fit of the composition of the models used for the rRNA analyses showed that the models\nusing NDCH2 fit the data, while the GTR and the NDRH2-only models did not (Supplementary\nTable S5).\nFigure 8 (image not reproduced). Panels A, C and E: x-axis 'uniform', y-axis 'P from tests'. Panels B and D:\nx-axis 'P LRT', y-axis 'P MPTMS'. Panel F: x-axis 'P LRT', y-axis 'P MPTIS'. All axes run from 0.0 to 1.0.\nFigure 8: Comparing the statistical power of the MPTs with the LRT. 
Two thousand replicate\nalignments, each of length 100000, were made for each condition. Panel A shows a PP-plot from\nsimulations on a one-branch, two-taxon tree with a small amount of CTH, and shows the LRT\nP-values in black and the MPTMS P-values in red, both plotted against a uniform distribution.\nPanel B uses the same data points as in panel A, but plotted simulation-by-simulation. Panels\nC–F are from simulations on a four-taxon tree, with a small amount of CTH in panels C and D,\nand a small amount of ERTH in panels E and F. Panel C is a PP-plot showing LRT P-values in\nblack and MPTMS P-values in red, both plotted against a uniform distribution. Panel D uses\nthe same data points as panel C, but plotted simulation-by-simulation, showing that evaluations\nof the two tests for each simulated alignment differed greatly. Panels E and F are similar to\npanels C and D, but with an LRT using NDRH(2) and the MPTIS to detect ERTH.\nFigure 9: Some ERTH is invisible to the MPTIS. The R-matrices used were as shown in Panel\nA, where the order of the nucleotides is A, C, G, T. For the first test they were placed on the\ntree such that RblueV (V for visible) was on the blue branch and Rblack was on all the other\nbranches (Panel B). 
Composition was 25% for each character state. Under these simulation\nconditions datasets were made with ERTH but no CTH, and the resulting ERTH was significant\nwith both the LRT and the MPTIS (P=0 for all 2000 replicate simulations for both tests).\nHowever, when RblueV was replaced with the RblueI matrix (I for invisible), the simulated datasets\nremained significant with the LRT (P=0 for all 2000 replicate simulations) but\nwith the MPTIS were significant only at the Type-I error rate, as shown in the PP-plot in\nPanel C.\nFigure 10: Another example of ERTH that is invisible to the MPTIS. Simulations were made on\nthe three-taxon tree shown in Panel B with R-matrices as shown in Panel A (with nucleotide\norder A, C, G, T) placed on the tree such that Rblue was placed on the blue branch and Rblack\nwas placed on the other two branches. Compositions were equal (25% each character state)\nand there was no CTH. LRTs were done with GTR versus NDRH, and all 2000 replicates were\nsignificant (P=0 for all). However, the MPTIS P-values were uniform as shown in the PP-plot\nin Panel C.\nFigure 11 (image not reproduced). Each panel lists the model, the leaf order shown, and the support values shown.\nA. GTR: Nanoarchaeum, Euryarchaeota, Eocytes, Eukaryotes; 68, 76\nB. NDCH2: Nanoarchaeum, Euryarchaeota, Eocytes, Eukaryotes; 52, 96\nC. NDRH2: Euryarchaeota, Nanoarchaeum, Eocytes, Eukaryotes; 48, 60\nD. NDCH2+NDRH2: Euryarchaeota, Nanoarchaeum, Eocytes, Eukaryotes; 55, 99\nFigure 11: Topologies obtained for different models used in Cox et al. (2008) rRNA re-analysis.\nTrees are rooted on Bacteria.\nFigure 12 (image not reproduced). Categories: GTR, NDCH2, NDRH2, NDCH2+NDRH2; vertical axis: LPML, from -24000 to -23600.\nFigure 12: Analysis of Cox et al. rRNA. 
This shows evidence for both CTH and ERTH.\nExample using Asian gecko ND2\nThis reanalysis uses data from a study by Brown et al. (2012a, 2012b). While the Tree of Life example\nabove used highly diverged sequences and might therefore be expected to show CTH and ERTH,\nthe study in this example used the ND2 gene from a few genera of Asian geckos, and so the sequences are from\na much smaller taxonomic group than the Tree of Life rRNA example above. Pearson’s χ^2 test\nrejected compositional homogeneity (P=0) for this dataset. This dataset, including two more\nshort genes, was previously examined using the MaxSymTest as implemented in IQTree, finding\nsome compositional heterogeneity by the MPTMS, but not failing the MPTIS (see Figure 2 in\nNaser-Khdour et al., 2019). LPML values indicating model fit are shown in Figure 13, and show\nthat the LPML using NDCH2+NDRH2 is greater than the LPML for NDCH2 alone by 23.3\nlog units (Supplementary Table S6), meaning that the dataset has both CTH and some ERTH.\nPosterior predictive simulations using the X^2 test quantity to measure model fit showed that\nthe models that included NDCH2 fit the composition of the data, but the GTR model and the\nNDRH2-only model did not (Supplementary Table S7).\nFigure 13 (image not reproduced). Categories: GTR, NDCH2, NDRH2, NDCH2+NDRH2; vertical axis: LPML, from -24100 to -23850.\nFigure 13: Gecko ND2 analysis.\nDiscussion\nThis study demonstrates phylogenetic modelling of both compositional and exchange rate\ntree-heterogeneity (CTH and ERTH), using tree-heterogeneous NDCH and NDRH models both for\nsimulation and for analysis under maximum likelihood, and NDCH2 and NDRH2 for use in\nBayesian analyses. 
The NDXH2 models, NDCH2 and NDRH2, are fully parameterized over the\ntree, but avoid overparameterization by constraining the CTH and ERTH parameters, and so\nhave a better fit to tree-heterogeneous data than a similar model without that constraint (Figure\n4, NDXH2 versus NDXH-Fully).\nModels were compared in maximum likelihood using the likelihood ratio test. In Bayesian\nanalysis models were compared with the log pseudomarginal likelihood, LPML (also LOO-CV).\nThe LPML was introduced to phylogenetics in Lewis et al. (2014), but has not been commonly\nused. Tests made to explore its behaviour show that it often penalizes\noverparameterization (Figure 3). Posterior predictive simulations were useful in assessment of\nmodel fit, and were used in this study (for example Supplementary Table S3). They have the\nadvantage of providing an assessment of absolute fit of the model to the data (Bollback, 2002;\nFoster, 2004). However, here these were only used to assess fit of models to CTH because a test\nquantity to measure compositional heterogeneity, X^2, was at hand. Without a test quantity for\nERTH, posterior predictive simulations for ERTH were not done here.\nCross-estimation of CTH and ERTH was seen using the tree-heterogeneous models. This\nwas seen both with the LRT in ML, and with LPML in Bayesian analysis (Table 2, Figure 6).\nHowever, by modelling CTH and ERTH both separately and together it is possible to infer the\nrelative contributions of CTH and ERTH. This cross-estimation is not seen when using MPTs\n(Table 3).\nModels in common use are tree-homogeneous. 
When data that have\nCTH and ERTH are analysed with these models, the result may be poor parameter estimates and\ntopological distortions (Figure 7). For\nexample, compositional attraction was shown, where unrelated taxa with similar compositions are\nattracted to one another when using homogeneous analysis. A parallel effect with ERTH was also\nshown, where taxa with similar exchange rates are attracted to each other, showing a previously\nunknown way for phylogenetic analysis to go wrong when mis-modelled. Another pattern seen\nwas erosion of support as the model fit improved, when the poorly-fitting tree-homogeneous model\nhad erroneously high support, and the better-fitting heterogeneous model decreased support\n(Figure 7 Rows B and F). This is a reminder that the goal is accurate estimation, not simply high\nsupport for topological resolution, and that a model that fits well benefits parameter estimation\nas well as topology estimation.\nDetection of CTH and ERTH was compared between MPTs and model-based LRTs. It was\nshown that the MPTMS will detect CTH, and the MPTIS will detect ERTH (Table 3). An\nadvantage of MPTs is that they can be used before phylogenetic analysis, and this would allow\nidentification and removal of problematic taxa or alignments (Jermiin et al., 2020). However,\nsince CTH and ERTH are features of evolution, focussing only on tests that flag violations of SRH\nwith the intention of subsequent exclusive use of tree-homogeneous models is methodologically\nincomplete, because we may want to be able to also assess fit of tree-heterogeneous models to\ntree-heterogeneous data. Model-based comparisons are alignment-wide, which is generally what is\nwanted, while the MPTs are done between sequence pairs, complicating alignment-wide screening\n(see for example the solution described in Naser-Khdour et al., 2019). 
As mentioned above, MPTs\ndo not suffer from cross-estimation of CTH and ERTH as do model-based comparisons (Table 2\ncompared to Table 3), which makes interpretation of the MPTs more direct.\nIt would be expected that MPTs and model-based detection of CTH and ERTH would agree,\nand this is true in comparisons testing CTH on simulations on one-branch, two-taxon trees\n(Figure 8, panels A and B). However, the two approaches differ when tested on four-taxon trees\n(Figure 8 panels C – F). In the four-taxon tests the LRT was found to have more statistical\npower in general. Furthermore, comparing these approaches simulation-by-simulation shows that\nMPTs and modelling are very different (Figure 8 D,F). The two approaches differ to the extent\nthat some simulations that contain significant ERTH as measured by the model-based LRT are\ninvisible to the MPTIS (Figures 9, 10).\nERTH was seen in the Tree of Life rRNA dataset, which might be expected because those taxa\nwere highly diverged (Figure 12 panel A). However, we also measured some ERTH, as well as\nsubstantial CTH, in the gecko alignment, the taxa of which are more closely related (Figure 13).\nERTH was found to be extensive in a survey made using a version of the MPTIS (Naser-Khdour\net al., 2019). Modelling found some evidence for ERTH in gecko ND2, while it was not found\nby Naser-Khdour et al. (2019), possibly because modelling is more sensitive (Figure 8 panel E),\nor different in other ways (Figure 8, panel F).\nThere is compelling potential for further improvement of phylogenetic models when we note\nthat biological sequences will generally have compositional heterogeneity over alignment sites\n(Lartillot & Philippe, 2004). We can model that with profile-based models, such as the CAT\nmodel (Lartillot & Philippe, 2004). 
It would be useful to accommodate CTH and ERTH as\nwell as among-site compositional heterogeneity in the same model (Blanquart & Lartillot, 2008;\nFeuda et al., 2017; Naser-Khdour et al., 2019).\nReferences\nAbabneh, F., Jermiin, L. S., Ma, C., & Robinson, J. (2006). Matched-pairs tests of homogeneity\nwith applications to homologous nucleotide sequences. Bioinformatics, 22, 1225–1231.\nhttps://doi.org/10.1093/bioinformatics/btl064\nAbascal, F., Posada, D., & Zardoya, R. (2007). MtArt: a new model of amino acid replacement\nfor Arthropoda. Molecular Biology and Evolution, 24, 1–5. https://doi.org/10.1093/molbev\nAdachi, J., & Hasegawa, M. (1996). Model of amino acid substitution in proteins encoded by\nmitochondrial DNA. Journal of Molecular Evolution, 42, 459–468.\nhttps://doi.org/10.1007/BF02498640\nBlanquart, S., & Lartillot, N. (2006). A Bayesian compound stochastic process for modeling\nnonstationary and nonhomogeneous sequence evolution. Molecular Biology and Evolution, 23,\n2058–2071. https://doi.org/10.1093/molbev/msl091\nBlanquart, S., & Lartillot, N. (2008). A site- and time-heterogeneous model of amino acid\nreplacement. Molecular Biology and Evolution, 25, 842–858.\nhttps://doi.org/10.1093/molbev/msn018\nBollback, J. P. (2002). Bayesian model adequacy and choice in phylogenetics. Molecular\nBiology and Evolution, 19(7), 1171–1180.\nhttps://doi.org/10.1093/oxfordjournals.molbev.a004175\nBrown, R. M., Siler, C. D., Das, I., & Min, Y. (2012a). 
Data from: Testing the phylogenetic\naffinities of Southeast Asia’s rarest geckos: Flap-legged geckos (Luperosaurus), Flying geckos\n(Ptychozoon) and their relationship to the pan-Asian genus Gekko [Dataset].\nhttps://doi.org/10.5061/dryad.7bn0fr99\nBrown, R. M., Siler, C. D., Das, I., & Min, Y. (2012b). Testing the phylogenetic affinities\nof Southeast Asia’s rarest geckos: flap-legged geckos (Luperosaurus), flying geckos (Ptychozoon)\nand their relationship to the pan-Asian genus Gekko. Molecular Phylogenetics and Evolution,\n63, 915–921. https://doi.org/10.1016/j.ympev.2012.02.019\nCox, C. J., Foster, P. G., Hirt, R. P., Harris, S. R., & Embley, T. M. (2008). The\narchaebacterial origin of eukaryotes. Proceedings of the National Academy of Sciences (USA), 105,\n20356–20361. https://doi.org/10.1073/pnas.0810647105\nFeuda, R., Dohrmann, M., Pett, W., Philippe, H., Rota-Stabelli, O., Lartillot, N., Wörheide,\nG., & Pisani, D. (2017). Improved modeling of compositional heterogeneity supports sponges as\nsister to all other animals. Current Biology, 27, 3864–3870.\nhttps://doi.org/10.1016/j.cub.2017.11.008\nFoster, P. G. (2004). Modeling compositional heterogeneity. Systematic Biology, 53, 485–495.\nhttps://doi.org/10.1080/10635150490445779\nFoster, P. G. (2025). p4. A Python phyloinformatic toolkit, and an implementation of\ntree-heterogeneous models of evolution [Computer software].\nhttps://github.com/pgfoster/p4-phylogenetics\nFoster, P. G., Cox, C. J., & Embley, T. M. (2009). The primary divisions of life: a\nphylogenomic approach employing composition-heterogeneous methods. Philosophical Transactions\nof the Royal Society B: Biological Sciences, 364, 2197–2207.\nhttps://doi.org/10.1098/rstb.2009.0034\nGaltier, N., & Gouy, M. (1998). Inferring pattern and process: Maximum-likelihood\nimplementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis.\nMolecular Biology and Evolution, 15, 871–879. 
https://doi.org/10.1093/oxfordjournals.molbev.a025991\nGaltier, N., Tourasse, N., & Gouy, M. (1999). A nonhyperthermophilic common ancestor to\nextant life forms. Science, 283, 220–221. https://doi.org/10.1126/science.283.5399.220\nJermiin, L. S., Catullo, R. A., & Holland, B. R. (2020). A new phylogenetic protocol: dealing\nwith model misspecification and confirmation bias in molecular phylogenetics. NAR Genomics\nand Bioinformatics, 2, lqaa041. https://doi.org/10.1093/nargab/lqaa041\nJukes, T. H., & Cantor, C. R. (1969). Evolution of Protein Molecules. In H. Munro (Ed.),\nMammalian Protein Metabolism (pp. 21–132). Academic Press.\nhttps://doi.org/10.1016/B978-1-4832-3211-9.50009-7\nLartillot, N. (2023). Identifying the best approximating model in Bayesian phylogenetics:\nBayes factors, cross-validation or wAIC? Systematic Biology, 72, 616–638.\nhttps://doi.org/10.1093/sysbio/syad004\nLartillot, N., & Philippe, H. (2004). A Bayesian mixture model for across-site heterogeneities\nin the amino-acid replacement process. Molecular Biology and Evolution, 21, 1095–1109.\nhttps://doi.org/10.1093/molbev/msh112\nLe, V. S., Dang, C. C., & Le, Q. S. (2017). Improved mitochondrial amino acid substitution\nmodels for metazoan evolutionary studies. BMC Evolutionary Biology, 17, 1–13.\nhttps://doi.org/10.1186/s12862-017-0987-y\nLewis, P. O., Xie, W., Chen, M.-H., Fan, Y., & Kuo, L. (2014). Posterior predictive Bayesian\nphylogenetic model selection. Systematic Biology, 63, 309–321.\nhttps://doi.org/10.1093/sysbio/syt068\nMinh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., Von Haeseler,\nA., & Lanfear, R. 
(2020). IQ-TREE 2: new models and efficient methods for phylogenetic\ninference in the genomic era. Molecular Biology and Evolution, 37, 1530–1534.\nhttps://doi.org/10.1093/molbev/msaa015\nNaser-Khdour, S., Minh, B. Q., Zhang, W., Stone, E. A., & Lanfear, R. (2019). The prevalence\nand impact of model violations in phylogenetic analysis. Genome Biology and Evolution, 11,\n3341–3352. https://doi.org/10.1093/gbe/evz193\nRivera, M. C., & Lake, J. A. (1992). Evidence that eukaryotes and eocyte prokaryotes are\nimmediate relatives. Science, 257(5066), 74–76. https://doi.org/10.1126/science.1621096\nRonquist, F., Teslenko, M., Van Der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget,\nB., Liu, L., Suchard, M. A., & Huelsenbeck, J. P. (2012). MrBayes 3.2: efficient Bayesian\nphylogenetic inference and model choice across a large model space. Systematic Biology, 61,\n539–542. https://doi.org/10.1093/sysbio/sys029\nRota-Stabelli, O., Yang, Z., & Telford, M. J. (2009). MtZoa: a general mitochondrial amino\nacid substitutions model for animal evolutionary studies. Molecular Phylogenetics and Evolution,\n52, 268–272. https://doi.org/10.1016/j.ympev.2009.01.011\nSwofford, D. L. (2002). PAUP*. Phylogenetic analysis using parsimony (*and other methods),\nVersion 4 [Computer software]. Sinauer Associates, Sunderland, Massachusetts.\nSwofford, D. L., Olsen, G. J., Waddell, P. J., & Hillis, D. M. (1996). Phylogenetic inference.\nIn D. M. Hillis, C. Moritz, & B. K. Mable (Eds.), Molecular Systematics, 2nd Edition (pp.\n407–514). Sinauer, Sunderland, Massachusetts.\nTavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA\nsequences. Lectures on Mathematics in the Life Sciences, 17, 57–86.\nYang, Z. (1994). Maximum likelihood phylogenetic estimation from DNA sequences with\nvariable rates over sites: approximate methods. Journal of Molecular Evolution, 39, 306–314.\nhttps://doi.org/10.1007/BF00160154\nYang, Z., & Roberts, D. (1995). 
On the use of nucleic acid sequences to infer early branchings\nin the tree of life. Molecular Biology and Evolution, 12, 451–458.\nYang, Z., Nielsen, R., & Hasegawa, M. (1998). Models of amino acid substitution and\napplications to mitochondrial protein evolution. Molecular Biology and Evolution, 15, 1600–1611.\nhttps://doi.org/10.1093/oxfordjournals.molbev.a025888","source_license":"CC-BY-4.0","license_restricted":false}