{"paper_id":"cb3460d0-0640-465e-99b1-71c8f351689d","body_text":"Global diversity, predictors, and predictions of AMR evolutionary 1 \npathways in Klebsiella pneumoniae 2 \nOlav N. L. Aga1,2, Sabrina J. Moyo3,4,5, Joel Manyahi4, Upendo Kibwana4, Iren H. Löhr6,1, Nina 3 \nLangeland1,3,7,8, Bjørn Blomberg1,3,7, Iain G. Johnston9,2, * 4 \n1. Department of Clinical Science, University of Bergen, Bergen, Norway 5 \n2. Computational Biology Unit, University of Bergen, Bergen, Norway 6 \n3. Department of Medicine, Haukeland University Hospital, Bergen, Norway 7 \n4. Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania 8 \n5. Tropical Disease Biology Department, Liverpool School of Tropical Medicine, Liverpool, UK 9 \n6. Department of Medical Microbiology, Stavanger University Hospital, Stavanger, Norway 10 \n7. National Advisory Unit for Tropical Infectious Diseases, Haukeland University Hospital, Bergen, Norway 11 \n8. The Norwegian Institute of Public Health, Oslo, Norway 12 \n9. Department of Mathematics, University of Bergen, Bergen, Norway 13 \n* correspondence to iain.johnston@uib.no 14 \nAbstract 15 \nAntimicrobial resistance (AMR) is a substantial and growing global health burden. Understanding, 16 \nand predicting, its evolution in speciﬁc pathogens will help responses across scales from individual 17 \npatient cases to large-scale policy. Data-driven approaches to this question often focus more on 18 \ngenomes and less on the evolutionary dynamics generating these genomes. Here, we use global 19 \ndata on AMR features in Klebsiella pneumoniae with hypercubic transition path sampling 20 \n(HyperTraPS), a machine learning approach, for Bayesian inference of the evolutionary pathways of 21 \nAMR in K. pneumoniae (KpAMR) in 102 diUerent countries, territories and areas. We identify 22 \n“globally consistent” evolutionary behaviours that hold across countries, and “globally divergent” 23 \nbehaviours including carbapenem and ﬂuoroquinolone resistance that vary across countries. We 24 \nshow how these divergent dynamics covary both with public health superregion and drug use policy, 25 \nand reveal competing evolutionary pathways within and between countries.  Using newly-26 \nsequenced data across several decades from sub-Saharan Africa, we show that this inferred global 27 \nroadmap of KpAMR evolution successfully predicts prospective evolutionary dynamics. Together, 28 \nwe hope that the ability to characterize and predict evolutionary dynamics of AMR acquisition, 29 \nconnected to socio-economic and drug policy predictors, will help strengthen our understanding of 30 \nAMR evolution worldwide. 31 \nIntroduction 32 \nAntimicrobial resistance (AMR), the resistance of microbial pathogens to the medicines we use to 33 \ntreat them, is a major threat to global health. Antibiotic resistance, the resistance of bacteria to 34 \nantibacterials, is an important subset of AMR. In 2019 and 2021, 1.27m and 1.14m deaths 35 \nrespectively were attributable to bacterial AMR (many more were associated with AMR), with highest 36 \nrates in sub-Saharan Africa (Murray et al., 2022; Naghavi et al., 2024). The evolution of genetic and 37 \nphenotypic features that cause AMR in bacterial pathogens is the focus of tremendous research 38 \ninterest, from basic biology through epidemiology, clinical studies and health policy.  39 \nKlebsiella pneumoniae (Kp) is a gram-negative bacterium and a major cause of nosocomial 40 \ninfections across settings. (Related species and subspecies of Kp form the KpSC (species complex); 41 \nin this report we will focus on K. pneumoniae sensu stricto). Kp is found in multiple environmental 42 \nniches and can both cause opportunistic infections in hospitalized patients and community 43 \nacquired infections, including severe infections caused by hypervirulent strains (Wyres et al., 2020). 44 \nKp has been identiﬁed as an important traUicker for antibiotic resistance genes from diUerent 45 \necological niches into several other clinically important pathogenic bacteria (Wyres & Holt, 2018). 46 \nKp contributes signiﬁcantly to the global burden of AMR. In 2021, 45 600 and 36 000 deaths were 47 \nattributed to carbapenem and ﬂuoroquinolone resistant Kp respectively. In 2021, Kp accounted for 48 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n12.7% of all deaths attributable to AMR in the population over ﬁve years old and 19.4% of AMR-49 \nattributable deaths in children under ﬁve years (Naghavi et al., 2024). 50 \n 51 \nGiven the substantial health burden associated with the evolution of AMR in Kp, large-scale data-52 \ndriven research has explored the genomic structure of, and AMR features in, Kp populations 53 \n(Argimón et al., 2021; Hetland et al., 2025; Holt et al., 2015; Munk et al., 2022; Wyres et al., 2020). 54 \nHere, we consider a complementary perspective: the evolutionary pathways by which these 55 \nfeatures are acquired (Renz et al., 2024). Many related subquestions here can broadly be uniﬁed as: 56 \nfrom a global perspective, in what orders are AMR features acquired over diUerent instances of 57 \npathogen evolution? For example, given a newly observed isolate in the clinic with a given set of 58 \nAMR features, can we predict which drug resistance(s) will evolve next (and thereby inform 59 \ntreatment guidelines accordingly)? And can we identify which extrinsic features – from physical 60 \nenvironment, demography, drug policy, or others – inﬂuence the evolutionary acquisition of AMR?  61 \nThese questions require approaches that can use the expanding volumes of AMR data to make and 62 \ntrain descriptive and predictive evolutionary models. However, the large sets of features involved in 63 \ntypical AMR systems are often not addressable using traditional methods from evolutionary biology. 64 \nFor example, an often-studied dataset of Mycobacterium tuberculosis isolates contains 65 \nsusceptibility-resistance information for a panel of 10 diUerent drugs (Casali et al., 2014). 66 \nComparative methods like the Mk (Markov k-state) model often struggle for more than around 7 67 \npotentially coupled characters under study (Johnston & Diaz-Uriarte, 2024). An alternative class of 68 \nmethods for studying the evolution of multiple characters – evolutionary accumulation modelling or 69 \nEvAM – has emerged jointly from the cancer and evolution literatures (Diaz-Uriarte & Herrera-Nieto, 70 \n2022; Renz et al., 2024; Schill et al., 2024), but has traditionally considered independent samples, 71 \nwithout the phylogenetic relationships found in bacterial populations (Dewar et al., 2025). While 72 \nmany EvAM approaches have the ability to infer dependencies between multiple interacting 73 \ncharacters, neglecting this phylogenetic relatedness can lead to substantial artefacts in analysis 74 \n(Maddison & FitzJohn, 2015). 75 \nHypercubic transition path sampling (HyperTraPS) is an EvAM approach designed to incorporate 76 \nphylogenetic information into the (Bayesian) inference of evolutionary pathways for multiple 77 \ninteracting characters (Aga et al., 2024; Greenbury et al., 2020; Johnston & Williams, 2016). Other 78 \nEvAM approaches including TreeMHN (Luo et al., 2023), HyperHMM (Moen & Johnston, 2023), and 79 \nHyperMk (Johnston & Diaz-Uriarte, 2024), also support phylogenetically coupled data, but we focus 80 \non HyperTraPS here because of its ﬂexibility (Aga et al., 2024; Renz et al., 2024). In this article, we 81 \nharness large-scale global datasets on AMR in Kp (KpAMR) with HyperTraPS to learn the structure 82 \nand variability of evolutionary pathways of drug resistance in Kp across the globe, as well as the 83 \nextrinsic factors shaping these pathways.  84 \n  85 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nFigure 1. Global genomic data on KpAMR. (A) We use 47 721 Klebsiella pneumoniae (Kp) isolates from 86 \naround the world, with (i) country-speciﬁc counts of isolates given by colour and years of isolation given in (ii). 87 \n(B) UpSet plot giving counts of di\\erent KpAMR characters across these isolates. Character names: AGly 88 \n(resistance to aminoglycosides); Col (colistin); Fcyn (Fosfomycin); Flq (Fluoroquinolones); Gly 89 \n(Glycopeptides); MLS (Macrolides); Phe (Phenicols); Rif (Rifampin); Sul (Sulfonamides); Tet (Tetracyclines); Tgc 90 \n(Tigecycline); Tmt (Trimethoprim); Bla_a (β-lactam resistance without extended spectrum or inhibitor-91 \nresistance); Bla_inhR (resistance to β-lactam/inhibitor combinations); Bla_ESBL (resistance to extended-92 \nspectrum β-lactams); Bla_ESBL_inhR (resistance to extended-spectrum β-lactam/inhibitor combinations); 93 \nBla_Carb (resistance to carbapenems); SHV (SHV β-lactamase with expanded enzyme activity), Bla_chr (SHV 94 \nalleles conferring  resistance to ampicillin), Omp (outer membrane protein). Further details available via 95 \nKleborate documentation at https://github.com/klebgenomics/Kleborate. We will use -a: acquired (via 96 \nhorizontal gene transfer); -m: mutation; -chr: chromosomal (intrinsic). 97 \nResults 98 \nLocal and global inferred pathways of AMR evolution in Klebsiella 99 \nTo learn and compare the likely pathways by which KpAMR is acquired in diUerent countries, we 100 \nobtained a dataset of 47 721 genomes from 102 diUerent countries, territories and areas, and 101 \nextracted details of the presence or absence of genes corresponding to each of 22 AMR classes, 102 \nusing PathogenWatch and Kleborate (Argimón et al., 2021) (Lam et al., 2021) (Methods; Fig. 1A-B). 103 \nWe refer to presence/absence of resistance genes for each of these classes as binary “resistance 104 \ncharacters” (“character” referring to a particular property of a species in evolutionary biology). The 105 \nset of characters we consider, following Kleborate, is given in the Fig. 1 caption; in this report we will 106 \nuse the shorthand deﬁned there, grouping genes by drug resistance class (and Lahey class for β-107 \nlactamases) (Gupta et al., 2014; Lam et al., 2021; Tsang et al., 2024). 108 \nFor each country in our dataset, we next used hypercubic transition path sampling (HyperTraPS), a 109 \nmachine learning approach, to infer the “pathways” of KpAMR character acquisition: that is, the 110 \nordered sequences with which characters are acquired over time. Figs. 2A-C illustrate this pipeline 111 \nfor a single country example (Romania, chosen for its moderate number of samples). HyperTraPS 112 \ntakes presence-absence patterns of characters on a phylogeny as input (Fig. 2A) and outputs an 113 \nevolutionary model describing the pathways of character acquisition most supported by the data 114 \n(Aga et al., 2024; Greenbury et al., 2020; Johnston & Williams, 2016; Renz et al., 2024). This model 115 \ncan be summarized, for example, as the set of posterior probabilities Pij with which character i is 116 \nacquired at ordering j in a putative evolutionary process (Fig. 2B). We refer to a plot where Pij is 117 \nvisualised with point size, as in Fig. 2B, a “bubble plot” . The full output of the inference process is a 118 \nparameterised transition network describing the probability of diUerent sequences of transitions 119 \nthrough the state space of possible resistance characters (Fig. 2C). Respecting the fact that many 120 \nKpAMR characters may be acquired reversibly (for example, via gain and loss of plasmids) and 121 \nHyperTraPS assumes irreversible dynamics, we conﬁrmed with synthetic control studies that 122 \nreversible dynamics marginally increase uncertainty in our evolutionary inference but do not lead to 123 \nsystematic errors (Supp. Fig. 1). 124 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n125 \nFigure 2. Inferring evolutionary dynamics from genome data across countries. (A) Example source data 126 \nfrom Romania, with 118 Kp isolates connected via a putative phylogeny derived from LIN codes (see Methods) 127 \nand assigned presence (dark pixel) or absence (white pixel) markers for resistance to each of 22 antibiotic 128 \ngroups. (B) Summary “bubble plot” of evolutionary pathways inferred for Romanian data with HyperTraPS. The 129 \narea of a bubble gives the posterior probability that a given character (row) is acquired at a given ordering in 130 \nthe accumulative evolutionary process. For example, Bla_chr is inferred to be a likely early acquisition, while 131 \nBla_ESBL-a likely occurs later and Tgc_a is inferred likely later still (or never). (C) A subset of the hypercubic 132 \ntransition graph inferred by HyperTraPS for Romanian data. Each edge’s width gives the probability of a 133 \ntransition involving a single character acquisition from the upper state to the lower state. The uppermost state 134 \nis the ancestral state with no AMR characters. Only early transitions above a threshold probability 0.025 are 135 \nplotted for clarity. (D) Global plot, taking mean acquisition probability over all countries in the dataset. 136 \nProbabilities higher than the “null model” of uniform acquisitions are plotted in blue.  137 \n 138 \nThe large majority of genomes with timing information were isolated in the period 2015-2020 (Fig. 139 \n1A). Our approach here does not directly consider the speciﬁc, real-world timings of the 140 \nevolutionary transitions involved – although, given reliable data on the divergence times of all pairs 141 \nof isolates, HyperTraPS can readily analyse continuous-time data (Aga et al., 2024; Renz et al., 142 \n2025). Rather, our approach uses the relative evolutionary timings from the source data phylogeny 143 \n(for example, Fig. 2A) to report the orderings of events: which acquisitions precede or follow which 144 \nothers, given that characters may inﬂuence each other, with the timescale of these events (see 145 \nDiscussion).  146 \nWhile Figs. 2A-C demonstrate the approach for a single country, Fig. 2D gives the average 147 \nacquisition probabilities across all countries in our dataset, and Supp. Fig. 2 gives the 148 \ncorresponding summary plots for every country in the dataset. Resistance to some antibiotic groups 149 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nare consistently observed across all countries: following very common presence of Bla-chr and 150 \nSHV-m, resistance characters against betalactams and extended-spectrum betalactams (ESBLs) 151 \n(Bla-a, Bla_ESBL-a) are often acquired earlier, while resistance to colistin (drug of last resort for 152 \nmulti-drug resistant Kp; Col-a and Col-m) is acquired later. Country-speciﬁc diUerences exist in the 153 \nevolutionary acquisition of other characters, including resistance to carbapenems (Bla_Carb-a).  154 \nIn several instances, bimodality in the inferred evolutionary dynamics is observed. That is, a given 155 \ncharacter may be acquired at an earlier stage or a later stage in the evolutionary process, but not at 156 \nintermediate stages. Such bimodality is characteristic of several diUerent evolutionary pathways 157 \nexisting in a system. Examples can readily be seen in Fig. 2B for Romania and Fig. 2D for the global 158 \nmean behaviour, where SHV-m is likely acquired either very early or at more intermediate stages, 159 \nwith little probability in between. These competing dynamics are directly visible in Fig. 2C, where 160 \ndistinct, canalised pathways exist: one involving SHV-m as the ﬁrst acquisition, one involving 161 \nBla_chr as the ﬁrst acquisition and SHV-m only following rather later. Across countries (Supp. Fig. 2), 162 \nbimodality is observed in several characters: some examples are in Bla_chr (Tanzania, Zambia); Bla-163 \na (Ireland, S Korea; Bla_ESBL-a (Laos); and SHV-m (Romania and at least 15 other countries across 164 \nGDB regions). The bimodality in the mutational and chromosomal characters reﬂects the 165 \ndiUerences in evolutionary dynamics between horizontal-gene-transfer-mediated acquisitions and 166 \nthe other KpAMR characters we consider. 167 \nAcross subsets of countries, interactions between our KpAMR resistance characters are observed 168 \n(Supp. Fig. 3). For example, in around a quarter of cases, the acquisition of tigecycline resistance 169 \n(Tgc_a) is inferred to promote the acquisition of glycopeptide resistance (Gly_a), and the acquisition 170 \nof sulfonamide resistance (Sul_a) is inferred to promote the acquisition of trimethoprim resistance 171 \n(Tmt_a). In some cases, the acquisition of MLS resistance (MLS_a) is inferred to repress the 172 \nacquisition of carbapenem resistance (Bla_carb_a), suggesting possible interactions that could be 173 \nclinically exploited in combination therapies (see Discussion). 174 \nHowever, although Supp. Fig. 2 gives a detailed “roadmap” of KpAMR evolution across countries, it 175 \nis hard to immediately extract global-scale insights from this representation. We therefore next 176 \nconsidered how to compare these results in a reduced-dimensionality picture. 177 \nGlobal consistency in evolutionary acquisition of resistance to a subset of drug families  178 \nTo proceed, we used principal components analysis (PCA) to embed the high-dimensional set of 179 \nposterior probabilities in a 3D space (Fig. 3). Here, the axes of highest variance in the inferred 180 \nevolutionary pathways are identiﬁed and used to naturally lay out the individual country outputs. 181 \nComparison of country-by-country behaviour along these axes can then reveal major sources of 182 \nvariability (and similarity) in inferred KpAMR evolution across countries.  183 \nInferred evolutionary pathways for Kp across countries consistently involve early acquisition of 184 \ncharacters conferring resistance to aminoglycosides, sulfonamides, trimethoprim, and betalactams 185 \n(Agly, Sul, Tmt and Bla characters). Aminoglycosides and sulfonamides were both discovered and 186 \nentered clinical use early in the mid 19th century. Resistance properties inferred to be acquired at 187 \nlater orderings include extended spectrum beta-lactam and phenicol resistance, acquired via 188 \nmobile genetic elements (Bla_ESBL-a, Phe-a). Acquisitions inferred to be acquired late across 189 \ncountries include inhR-driven resistance (Bla_inhR-a, Bla_ESBL_inhR-a), and resistance to later- 190 \nand ﬁnal-line drugs including colistin, fosfomycin, tigecyclin, and glycopeptides (Col-a, Fcyn-a, Tgc-191 \na, Gly-a). These form a set of “globally consistent” evolutionary observations. 192 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n 193 \nFigure 3. Global structure of inferred pathways of AMR evolution. (A-B) Principal components analysis 194 \n(PCA) of country-speciﬁc “bubble plots” as in Fig. 2B, reﬂecting inferred evolutionary dynamics of KpAMR in 195 \ndi\\erent countries. Individual countries are plotted on PCAs 1-2 (A) and 2-3 (B), individually and with ellipses 196 \ndemarcating their corresponding GBD region (acronyms below). (C-D) Example “bubble plots” from countries 197 \nreﬂecting the extremes of PCA1 (C) and PCA2 (D). In (C), Venezuela reﬂects poorly characterised evolutionary 198 \ndynamics, with relatively uniform probabilities everywhere; Nigeria reﬂects precisely characterised 199 \nevolutionary dynamics, with individual characters having highly likely, precisely speciﬁed orderings. In (D), 200 \ncountry-speciﬁc di\\erences in ordering emerge, including in those characters highlighted with boxes. For 201 \nexample, KpAMR evolution in Argentina is likely to involve Flq-m acquisition earlier than other characters; 202 \nevolution in Senegal is likely to involve Flq-m acquisition later than other characters. GBD regions: CEEECA, 203 \nCentral Europe, Eastern Europe, Central Asia; HI, High-income; LAC, Latin American and Caribbean; NAME, 204 \nNorth Africa and Middle East; SA, South Asia; SEEAO, South-east and East Asia and Oceania; SSAf, Sub-205 \nSaharan Africa. Zanzibar (dark outline) is from data newly sequenced in this study. Character names given in 206 \nFig. 1 caption. 207 \n 208 \nCountry-to-country di@erences in evolutionary acquisition of resistance to carbapenems, 209 \nﬂuoroquinolones, tetracycline, and others  210 \nThe general patterns above appear across evolutionary dynamics across countries and regions, with 211 \nthe ﬁrst principal component of inferred Kp variability (accounting for 23% of the overall variance) 212 \nsimply reporting the precision with which these patterns can be characterised (Fig. 3C, Fig. 4). In 213 \ncountries supporting less precise inference (for example, with limited datasets), the expected 214 \nordering of these characters is more uniform and closer to the average null hypothesis of all 215 \ncharacters behaving equally; in countries with more precise output the expected orderings diverge 216 \nto early and late values (Fig. 3C, Fig. 4). 217 \n 218 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n 219 \nFigure 4. KpAMR characters determining aspects of global variability in evolutionary dynamics. Each 220 \ncountry has an expected acquisition ordering for each KpAMR character. The plots show how these patterns of 221 \nexpected ordering are linked to the principal components of global KpAMR evolution variability in Fig. 3: PCA1 222 \n(top, grey); PCA2 (centre, red); and PCA3 (bottom, blue). (A) “Globally consistent” characters: those that 223 \ncovary tightly with PCA1. All characters here have either early or late expected orderings across all countries 224 \n(either left or right of the central vertical line). Their position on PCA1 reﬂects the precision of the inference 225 \noutput. Less precise outputs (low PCA1 values, like Venezuela in Fig. 3C) will have orderings close to the 226 \ncentral “null”; more precise outputs (high PCA1 values, like Nigeria in Fig. 3C) will have more divergent “early” 227 \nand “late” orderings towards the edges of the plot. (B-C) “Globally divergent” characters: those that covary 228 \nmore tightly with PCA2 (B) or PCA3 (C). Expected orderings in these character sets span a wider range, 229 \nshowing sources of variability in the evolutionary dynamics that are independent of PCA1 (and hence less 230 \ninﬂuenced by observational uncertainty). An alternative visualisation is given in Supp. Fig. 4. Character names 231 \ngiven in Fig. 1 caption. 232 \n 233 \nOnce variability in this precision is accounted for (by considering the ﬁrst principal component), 234 \nvariability corresponding to other diUerences in the inferred dynamics can be extracted. Several 235 \ncharacters covary comparatively little with the (precision-aligned) PCA1 and instead show stronger 236 \ncovariance with PCA2 or PCA3, suggesting that there is genuine country-to-country variability in 237 \nthese characters after controlling for diUerences in observational noise (Fig. 3D, Fig. 4, Supp. Fig. 4). 238 \nPCA2 (accounting for 15% of overall variance) displays strong covariance with carbapenem, 239 \nﬂuoroquinolone, MLS, and tetracycline resistance and OMP mutation (Bla_Carb-a, Flq-m, MLS-a, 240 \nTet-a, Omp-m,). PCA3 (accounting for 8% of overall variance) displays covariance with Flq-a, Rif-a, 241 \nSHV-m. None of these “globally divergent” characters display the “null-to-extreme” divergence seen 242 \nin those characters most closely related to PCA1, suggesting genuine country-to-country 243 \ndiUerences in evolutionary dynamics, rather than observational diUerences, are responsible for this 244 \nvariability. 245 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n 246 \nFigure 5. Geographical variation in inferred AMR evolutionary pathways. (A-B) Distributions of country-247 \nspeciﬁc PCA projection values across di\\erent GBD regions for (A) PCA2 and (B) PCA3. In (A), sub-Saharan 248 \nAfrican countries show systematically higher positions in PCA2 than other regions. (C) Inferred acquisition 249 \norderings of KpAMR characters by GBD region. The left ﬁve characters covary with PCA2; the right three covary 250 \nwith PCA3. Carbapenem resistance and OMP mutations are inferred to be acquired substantially later in sub-251 \nSaharan Africa; rifampicin and ﬂuoroquinolone resistance are inferred to be acquired substantially earlier in S, 252 \nSE, E Asia and Oceania. New Zanzibar data from this study is shown by large circles. Character names given in 253 \nFig. 1 caption. 254 \nGeographical correlates with Kp evolutionary dynamics 255 \nTo explore the sources of this evolutionary variability in more detail, we asked whether geographical 256 \nregion was linked to diUerent variants of the inferred evolutionary dynamics (after accounting for 257 \nobservational diUerences). We ﬁrst asked whether countries from diUerent regions showed 258 \nsystematic diUerences in their PCA2 and PCA3 projections – corresponding to diUerent orderings of 259 \nthe “globally divergent” country-speciﬁc characters above (Fig. 5). We found that sub-Saharan 260 \nAfrica was a dramatic outlier in PCA2, involving notably later acquisition of carbapenem and 261 \nﬂuoroquinolone resistance and OMP mutation than the other regions (Fig. 5A, C). This ordering 262 \nresult agrees with the continuous-time history of treatment in the region, where carbapenem and 263 \nﬂuoroquinolone use was established later (especially for children) than in other regions. Other 264 \nregions were broadly consistent in PCA2 but showed substantial diUerences in PCA3. These 265 \ndiUerences correspond mainly to acquired resistance to ﬂuoroquinolone and rifampicin, with 266 \nsubstantially earlier inferred acquisition in South-Southeast-East Asia and Oceania, and 267 \nsubstantially later in Central and Eastern Europe, Central Asia, and sub-Saharan Africa (Fig. 5B, C). 268 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n 269 \nFigure 6. Connecting AMR character acquisition propensity with drug treatment regimens across 270 \ncountries. Median (across observed years 2016-2021) per-capita drug use in di\\erent drug classes (J codes), 271 \nwith expected acquisition ordering of associated resistance characters. “Globally consistent” characters Tet-272 \na, Bla_inhR-a, Sul-a, AGly-a were associated with PCA1 in the global analysis and display little correlation 273 \nbetween evolutionary ordering and drug use levels. “Globally divergent” characters Bla_Carb-a and MLS-a 274 \nwere associated with PCA2 in the global analysis and display stronger correlations, with earlier evolution of 275 \nresistance linked to higher levels of drug use. Character names given in Fig. 1 caption. 276 \nPolicy predictors of variance in AMR evolutionary dynamics  277 \nWe next asked whether known policy of antibiotic use inﬂuence the inferred evolutionary dynamics 278 \nfrom our analysis. We gathered statistics on drug use from 2016-2021 in countries available from 279 \nthe WHO GLASS surveillance programme (Global Antimicrobial Resistance and Use Surveillance 280 \nSystem (GLASS) Report 2022, 2022) and looked at correlations between per-capita drug use 281 \nstatistics and the inferred evolutionary dynamics of resistance to those drugs (Fig. 6). We found that 282 \nfor those drug types associated with PCA1 (where variability in inferred dynamics is largely 283 \ndetermined by observation uncertainty) there was little correlation between drug use and the 284 \nordering of resistance acquisition. However, for carbapenems, associated with PCA2 (more non-285 \nobservational country-to-country variability), there was a link between drug usage statistics and 286 \ninferred resistance evolution, with resistance evolving earlier in countries with high drug use rates. 287 \nThe direction of this correlation was also observed in MLS, also associated with PCA2, though this 288 \nrelationship did not show statistical support at the p < 0.05 level. This pattern of correlations is 289 \nconsistent with the picture above: acquisition of some (PCA1) KpAMR characters is relatively 290 \nconstant across countries and circumstances, while other (PCA2) characters are more plastic and 291 \ndepend more tightly on country- and region-speciﬁc drivers. 292 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nTesting predictions of AMR evolutionary dynamics with newly sequenced, cross-decadal 293 \ndata 294 \nOne motivating application of evolutionary inference to AMR problems is the ability to predict the 295 \nnext steps in evolutionary pathways (Renz et al., 2024). A question of potentially direct clinical 296 \nrelevance is: given a clinically observed isolate with a given set of AMR characters, which 297 \ncharacter(s) will likely evolve next – and can treatment guidelines be adapted accordingly? 298 \nPrevious studies of AMR with HyperTraPS have demonstrated its ability to predict properties of 299 \nevolutionary transitions in a withheld subset of the overall dataset, following the common training-300 \ntest machine learning paradigm (Aga et al., 2024). Here, we are in the unique position of using 301 \nindependently obtained and Kp genome data to test the predictions of the evolutionary models 302 \ntrained above. We sequenced Kp isolates from patients with bloodstream infections in 2001-2 and 303 \n2017-8 in Tanzania and 2015-6 in Zanzibar, collected in previous studies (Blomberg et al., 2007; 304 \nMoyo et al., 2020; Onken et al., 2015, 2024) (Fig. 7; see Methods). Bioinformatic analysis placed 305 \nthese new genomes in a phylogenetic tree with a subset of existing genomes from the training data, 306 \nincluding their AMR proﬁles (Fig. 7A, Supp. Fig. 5). Using only transitions independent of those in the 307 \ntraining data (see Methods), we tested the predictions of the HyperTraPS model trained on data from 308 \nTanzania. Speciﬁcally, for each new transition, we queried the trained model about likely next steps 309 \nfrom the ancestral state, and recorded the predicted ranks of the next steps actually present in the 310 \ntransition (Fig. 7B). By far the most common outcome was that the true next step was ranked ﬁrst 311 \n(predicted to be most likely); compared to predictions from an untrained model, the trained model 312 \nwas systematically much better at predicting the true next steps.  313 \nThis analysis demonstrates that within-country predictions of future evolutionary dynamics are 314 \npossible given a trained HyperTraPS model. We also asked whether the model could predict 315 \nevolutionary dynamics at the regional scale. We use Zanzibar as a test case here – a region of 316 \nTanzania that was not explicitly represented in the original Kleborate training dataset. Our ﬁndings 317 \nwould predict KpAMR in Zanzibar to fall into the sub-Saharan Africa set of behaviours, with high 318 \nvalues on PCA2 and associated signatures of late carbapenem resistance and others (Fig. 5). To this 319 \nend, we applied HyperTraPS inference to newly sequenced isolates from Zanzibar. The resulting 320 \ninferred dynamics clearly place Zanzibar within the set of sub-Saharan Africa behaviours (Fig. 3A-B), 321 \nwith high values on PCA2 arising from later acquisition of carbapenem resistance, and high values 322 \non PCA3 arising from later acquisition of rifampicin resistance (Fig. 5). 323 \nWe also explored the representation of diUerent KpAMR characters in the snapshot genome 324 \nsequences alone, without connecting with evolutionary dynamics. Fig. 7C shows the relative 325 \nprevalence of acquired KpAMR characters in existing and newly-sequenced data, not accounting for 326 \nphylogenetic relatedness (and without representation of an explicit non-AMR control group). With 327 \nthis sampling, most characters showed an increase from 2001-2 to 2015-6, particularly in mobile 328 \ngenetic acquisition of ﬂuoroquinolone resistance, while the relative prevalence of acquired 329 \nphenicol resistance and ﬂuoroquinolone resistance mutations decreased. Proportions of these 330 \ncharacters were in broad agreement with known coarser-grained observations from Tanzania 331 \n(Mapunjo et al., 2025; Sangeda et al., 2021) and existing Tanzanian genomes from Pathogenwatch, 332 \nbut we observed more acquired rifampicin resistance and lower acquired phenicol resistance than 333 \nin existing samples. Our samples from Zanzibar showed lower relative proportions of 334 \naminoglycoside and β-lactam resistance, and higher prevalence of outer membrane protein 335 \nmutations. 336 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n 337 \nFigure 7. Prediction of new evolutionary transitions in KpAMR. (A) Pathogenwatch (grey) and newly 338 \nsequenced 2001-2 (blue) and 2017-8 (red) genome properties from Tanzania, connected by a phylogeny 339 \nestimated from average nucleotide identity (ANI). (B) For each independent transition in the set of newly 340 \nsequenced observations, we used HyperTraPS trained on existing data to predict which characters would 341 \nlikely evolve next given the ancestral state. The bars show the predicted rankings of the characters that were 342 \ntruly observed to evolve in the data. The trained model (blue-green) overwhelmingly predicted top rankings for 343 \nthe characters that were truly acquired; an untrained model (red) predicts intermediate rankings. Character 344 \nnames given in Fig. 1 caption. (C) Proportion of Tanzanian isolates in Kleborate and new datasets (uncorrected 345 \nfor relatedness) where di\\erent KpAMR characters are present. 346 \n 347 \nDiscussion 348 \nPowerful data-driven approaches to study the evolution of AMR are emerging in parallel with large 349 \ngenomic datasets (Argimón et al., 2021; Hetland et al., 2025; Holt et al., 2015; Munk et al., 2022; 350 \nWyres et al., 2020). Most of these studies focus on the current genomic structure of pathogen 351 \npopulations. In this research we have attempted to take a complementary perspective, using large-352 \nscale genome data while focusing on the (inferred) evolutionary dynamics that produce these 353 \ngenomic patterns. We believe that this new way of working has substantial power to support other 354 \ndata-driven approaches, particularly in the prediction of unseen and future evolutionary behaviours 355 \nof pathogens (Renz et al., 2025).  356 \nThe core technology we use, HyperTraPS, is an instance of evolutionary accumulation modelling or 357 \nEvAM (Diaz-Uriarte & Herrera-Nieto, 2022; Diaz-Uriarte & Johnston, 2024). EvAM has previously 358 \nbeen used to learn AMR pathways, including in HIV (Beerenwinkel et al., 2005) and multidrug 359 \nresistance in tuberculosis (Aga et al., 2024; Greenbury et al., 2020; Moen & Johnston, 2023; Renz et 360 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nal., 2024). One question that has remained is to what extent such results reﬂect universal, general 361 \nbehaviors, versus a response to the speciﬁc selective pressures arising from a given country’s drug 362 \nuse (for example). Our global comparison suggests a form of answer to this question. A subset of 363 \nthe KpAMR characters we consider appear to behave similarly across countries regardless of 364 \nspeciﬁc selective pressure, drug regimes, and other diUerences (those aligned with PCA1). Another 365 \nset of characters (those aligned with PCA2-3) evolves in a way that is more dependent on country- 366 \nand region-speciﬁc inﬂuences, which we have shown include public health superregions and drug 367 \nuse levels – although certainly other factors, from population change to the prevalence of other 368 \ndiseases, will also inﬂuence these dynamics. While these results are speciﬁc to Kp, and the 369 \ncharacters we consider, this combination of global and country-speciﬁc behaviors may reasonably 370 \nhold across other pathogens. 371 \nInferring evolutionary dynamics of multiple, coupled traits across groups is a challenging problem. 372 \nThere will be countless, multiscale selective inﬂuences that impact the evolution of speciﬁc 373 \nlineages of Kp around the globe. Our perspective attempts to “coarse-grain” away from these 374 \n(unobservable) speciﬁc pressures and consider their aggregated eUect on large-scale genomic 375 \nproperties in populations. This perspective is not without challenges. Some of these are: (a) The 376 \nacquisition of many of the KpAMR characters we consider is reversible (for example, mediated by 377 \nmobile genetic elements), which challenges an assumption of our modelling. (b) We assume that 378 \ngenomes from a country can be reasonably placed on their own phylogeny. (c) We use datasets 379 \nwhich likely exhibit bias towards AMR presence: in most circumstances, due to timing and/or 380 \nhuman research focus, genome databases will be enriched for cases with AMR characters acquired. 381 \nAnd (d), in most of this analysis, we use relative evolutionary timings rather than the absolute real-382 \nworld (day-month-year) timings associated with observed genomes. For (a), we have shown with a 383 \nsynthetic control study that violating the irreversibility assumption does not dramatically challenge 384 \nour ﬁndings. (We note, in parallel to this point, that simultaneous acquisition of multiple features is 385 \nvery compatible with our approach; the resulting output simply places equal probabilities over all 386 \ncompatible ordering pathways.) For (b), coupling between countries (through, for example, host or 387 \npathogen migration) will not aUect the interpretation of our results, which make statements 388 \nconditioned on the set of pathogens reported within a country. And likewise for (c), the enrichment 389 \nof AMR characters in our source data does not introduce systematic bias in our results, which report 390 \non accumulation of AMR characters, controlling for the existing AMR proﬁle of an ancestral state in 391 \neach case. Our phylogenetic reconstruction bridges the gap between observed, AMR-enriched 392 \ngenomes and their unobserved, less-enriched ancestors, ensuring that the whole spectrum of 393 \ncharacter proﬁles is captured in the underlying model. 394 \nFor (d), the precise timescales of the evolutionary process are in principle addressable using a 395 \nversion of our approach called HyperTraPS-CT (continuous time) (Aga et al., 2024), but this requires 396 \na consistent estimation of divergence times between all isolates in the dataset. Importantly, our 397 \napproach here does not completely neglect a timescale – it is implicit in the phylogenetic 398 \nrelationships included in the source data, and this inclusion both accounts for the temporal 399 \nordering of events and “pseudoreplication” due to the relatedness of sampled genomes (Boyko & 400 \nBeaulieu, 2023; Dewar et al., 2025; Renz et al., 2025). However, this approach based on relative 401 \nevolutionary timings means that we cannot directly tie diUerences in inferred evolutionary 402 \nbehaviour to known real-world historical events shaping AMR proﬁles: shifts in drug policy, for 403 \nexample, or outbreaks or other events changing the eUective size of the evolving population. While 404 \nour results, focusing on the ordering of acquisition events, already demonstrate descriptive and 405 \npredictive power in KpAMR evolution across countries, a continuous time picture on a smaller scale 406 \nin the future (perhaps a single country or small set) would enable ﬁner-grained linking between 407 \nmodel prediction and the real-world details of policy and disease history. 408 \nIn learning estimates for evolutionary dynamics, HyperTraPS also infers potential interactions 409 \nbetween characters, including whether the acquisition of one character makes acquiring another 410 \ncharacter less likely. Such “repressive interactions” , if truly present, could form the basis of 411 \ncombination therapies: resistance to drug X makes resistance to drug Y less likely. These 412 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \ninteractions correspond to the phenomenon of collateral sensitivity in AMR, where resistance to one 413 \ndrug induces sensitivity to another (Pál et al., 2015; Szybalski & Bryson, 1952): for example, in 414 \nEscherichia coli, beta-lactamase expression is linked to heightened sensitivity to colistin (among 415 \nother) (Herencias et al., 2024). While not observed consistently throughout all countries in our 416 \ndataset, a subset of countries showed statistical support for some such “repressive interactions” 417 \n(Supp. Fig. 3). These include, for example, MLS resistance repressing the acquisition of carbapenem 418 \nresistance. Such indications could suggest more detailed followup investigation to verify and 419 \nexplore mechanisms behind these possibly exploitable collateral interactions. 420 \nIn conclusion, we hope to show that evolutionary accumulation modelling – inferring the historic 421 \nevolutionary pathways of AMR acquisition, in addition to considering contemporary properties – can 422 \ncomplement and enhance established ways of working in AMR genomic analysis. By considering 423 \nthe dynamics by which global KpAMR patterns have become established, a collection of research 424 \ndirections are opened or expanded, including the similarities and diUerences between countries, 425 \nlinks with social and environmental covariates, and predictions of future dynamics. We anticipate 426 \nthat these methods may readily be applied to provide insight into AMR evolution in other pathogens 427 \nin future. 428 \nMethods 429 \nExisting Klebsiella data acquisition 430 \nWe obtained all 47 721 Klebsiella pneumoniae sensu stricto records from pathogenwatch as of 431 \nMarch 2024 (Argimón et al., 2021). The most common isolation years were 2015-2020, with a peak 432 \nat 2018 (Fig. 1A). We used AMR features (genes and mutations) reported by drug class as reported 433 \nby Kleborate version 2.3 (Lam et al., 2021). Kleborate groups genes or mutations known to confer 434 \nresistance to clinically relevant antibiotic groups and related phenotypes. Betalactamases are 435 \nfurther grouped by enzyme activity (Lam et al., 2021). We consider a set of L = 22 drug classes for 436 \neach genome, dichotomizing Kleborate output by presence of any gene or mutation related to each 437 \ngroup. That is, if there is any gene or mutation present for a resistance phenotype, the isolate is 438 \nconsidered resistant, otherwise it is considered susceptible. This assumption holds for most 439 \nresistance phenotypes considered in this study. However, ﬂuoroquinolone resistance (Flq-m) often 440 \nrequire multiple mutations to cause resistance. Thus, the presence of these characters cannot be 441 \ninterpreted as a resistance phenotype, but rather an increased MIC (minimum inhibitory 442 \nconcentration). The isolates were divided by country based on metadata, and countries were 443 \ngrouped into superregions as deﬁned by the Global Burden of Disease regional classiﬁcation 444 \nsystem (Rudd et al., 2020). A coarse-grained phylogeny was generated via LIN-codes (Life 445 \nidentiﬁcation numbers) using lincoding.py supplied by (Hennart et al., 2022). 2 102 genomes were 446 \nexcluded due to missing metadata. Any country with less than two genomes available were 447 \nexcluded due to the inability to construct a tree. The resistance proﬁles and phylogenetic tree were 448 \ncombined into a phylogenetic tree annotated with resistance to drug classes.  449 \nEvolutionary pathway inference with hypercubic transition path sampling (HyperTraPS) 450 \nHyperTraPS (Aga et al., 2024) models an “evolutionary space” containing every possible state of a 451 \nsystem involving L binary characters, then infers the probabilities of (Markovian) transitions between 452 \nstates in this space (here, involving the ordered accumulation of diUerent AMR characters) that is 453 \nmost compatible with observed data (Greenbury et al., 2020; Johnston & Williams, 2016). A recent 454 \noverview of the method in an AMR context is given in (Renz et al., 2024). The source data is a 455 \ncollection of length-L “barcodes” labelling presence or absence of our characters -- resistance to 456 \ndrug classes as deﬁned in Kleborate -- on the tips of a phylogeny. Ancestral state reconstruction 457 \n(here, assuming that AMR resistance character accumulation is rare and irreversible, so that an 458 \nancestor possesses a character if and only if all descendants possess it) is used to infer ancestral 459 \nstates, and thereby construct a set of transitions from the data. These transitions are the input data 460 \nfor the HyperTraPS algorithm. The target of inference is an L x L matrix 𝜃, where 𝜃!! is the base rate 461 \nfor acquiring character i, and 𝜃!\" is the inﬂuence that acquisition of character j has on the base rate 462 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nof acquisition of character i (as in mutual hazard networks (Schill et al., 2020)). In this way, positive 463 \nand negative interactions between the acquisition of diUerent AMR characters is supported, so that 464 \n(for example) acquisition of resistance to drug X can make acquisition of resistance to drug Y more 465 \n(or less) likely. The output of the inference process is, most broadly, a transition matrix 𝜆, with 466 \nelements 𝜆#→% describing the probability with which an isolate in state a will next transition to state 467 \nb.  For each of the 102 countries, territories and areas we ran HyperTraPS on three diUerent random 468 \nseeds to ensure that the results were consistent across runs (Supp. Fig. 2). Simulations were run for 469 \n106 steps with 103 random walkers for sampling, employing a penalised likelihood. The US genomes 470 \nwere sub-sampled to a subset of 1000 genomes; three diUerent subsamples clustered closer 471 \ntogether than with any other countries in the PCA plot.  472 \nSynthetic control study for inference of reversible characters 473 \nTo determine the eUect of our assumption of irreversible accumulation of AMR characters (given 474 \nthat the loss of plasmids is not uncommon in Kp evolution (Holt et al., 2015)), we constructed a 475 \nsynthetic control study. A random phylogeny over 128 tips was constructed using a birth-death 476 \nprocess with birth rate 1 and death rate 0.5 (chosen empirically to roughly match the phylogenies 477 \nfrom our case studies) using phangorn (Schliep, 2011). A synthetic accumulation process reﬂecting 478 \na single evolutionary pathway was simulated on this phylogeny. Reversibility was captured by 479 \nmodelling random loss of the ﬁrst L/2 characters as a Poisson process with a given rate parameter. 480 \nThe same HyperTraPS pipeline (assuming irreversibility), including ancestral state reconstruction 481 \nand pathway inference, was run on both the pre-loss and post-loss datasets, and the outputs 482 \ncompared (Supp. Fig. 1). 483 \nGlobal structure in inferred pathways 484 \nThe outputs of HyperTraPS can be summarized as a matrix P, where element Pij gives the posterior 485 \nprobability that character i is acquired at ordering j in a putative evolutionary process from an 486 \nancestor with no AMR characters towards a ﬁnal state with all AMR characters. For example, P32 487 \ngives the probability that the third character is the second to be acquired in such an evolutionary 488 \nprocess. We obtained this matrix for each country in our dataset, then performed principal 489 \ncomponents analysis (PCA) on the set of matrices (Williams et al., 2013). We observed the structure 490 \nof matrices with “bubble plots” , where an array of points is plotted with areas proportional to Pij. 491 \nDrug usage data 492 \nWe collected antibiotic consumption data from 57 countries, territories and areas enrolled in the 493 \nWHO Glass surveillance program in the period 2016 to 2021 (Global Antimicrobial Resistance and 494 \nUse Surveillance System (GLASS) Report 2022, 2022). Overlaying the consumption data with the 495 \ncountries that had 3 or more available genomes reduced the number to 53. We use the median 496 \nconsumption statistics over the available time points for each country. 497 \nNew Klebsiella data acquisition 498 \nThe 79 newly sequenced Kp isolates in this article are derived from a collection of studies 499 \nperformed in Tanzania and Zanzibar over the last three decades. These comprise bacterial isolates 500 \nfrom (a) blood samples from pediatric patients with bloodstream infections in Dar-es-Salaam, 501 \nTanzania in 2001-2002; (b) blood samples from adult patients with bloodstream infections in Dar-502 \nes-Salaam, Tanzania in 2017-2018; (c) fecal samples from the patients in (b); and (d) blood samples 503 \nfrom adult and pediatric patients with bloodstream infections in Mnazi Mmoja hospital, Zanzibar in 504 \n2015-2016. These cultures were sequenced by MicrobesNG (MicrobesNG, Birmingham, UK)\tusing 505 \nIllumina HiSeq technology. Assembled contigs were provided by the sequencing service, which 506 \nformed the initial step for the bioinformatics pipeline below.  507 \nNew Klebsiella data analysis 508 \nThe analysis pipeline begins with the contigs from the initial sequencing and assembly process. We 509 \nﬁrst used dnadiQ, which wraps nucmer from the MUMmer 3.0 package (Kurtz et al., 2004), to 510 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nscaUold the contigs from a given isolate on the Klebsiella reference genome 511 \nGCA_000240185.2_ASM24018v2 (Liu et al., 2012) and report statistics of the alignment. We 512 \nretained only isolates with <20% unaligned bases as instances of Klebsiella. Combining the newly 513 \nsequenced isolates with the existing Tanzanian genomes from Pathogenwatch, we then ran pairwise 514 \ndnadiQ across the combined set. We extracted the average identity across the aligned regions for 515 \neach pair and recorded this as the ANI (average nucleotide identity). We then used d = 1-ANI as a 516 \ndistance measure for phylogenetic tree estimation using the unweighted pair group method with 517 \narithmetic mean (UPGMA) (Sokal & Michener, 1958) implemented in the phangorn (Schliep, 2011) 518 \npackage in R (R Core Team, 2022). We conﬁrmed that an alternative method, neighbourhood joining 519 \n(Saitou & Nei, 1987), did not give qualitatively diUerent results for the predictions of AMR evolution. 520 \nWe used AMR proﬁles of existing and new isolates to reconstruct ancestral states across this 521 \ncomplete tree, then identiﬁed the set of subtrees that had only newly-sequenced isolates at their 522 \ntips. As transitions within such subtrees are independent of transitions in the (Pathogenwatch) 523 \ntraining data set, we used this set of transitions as our independent test set. 524 \nTesting predicted dynamics from the trained model on new data 525 \nWe queried the model ﬁtted from the training data to predict the likely next states from each 526 \nprecursor state in this independent set of transitions (Aga et al., 2024; Renz et al., 2024). We ranked 527 \nthe possible next steps from each state from most likely to least likely under the ﬁtted model. We 528 \nthen recorded the ranks corresponding to the actual step(s) corresponding to the observed 529 \ntransition, and compared these to a null model where each possible transition could occur with 530 \nequal probability. 531 \nData and code availability 532 \nThe code base for this study is publically available at https://github.com/StochasticBiology/kp-533 \nevolution-inference. In addition to the software referenced above, we use R (R Core Team, 2022) 534 \nwith libraries hypertrapsct for evolutionary inference (Aga et al., 2024), dplyr (Wickham et al., 2023), 535 \ntidyverse (Wickham et al., 2019), countrycode (Arel-Bundock et al., 2018) for data manipulation, 536 \nlme4 (Bates et al., 2015) for statistics, phytools (Revell, 2012) for phylogenetic analysis, and ggplot2 537 \n(Wickham, 2016), ggpubr (Kassambara, 2020), ggbeeswarm (Clarke et al., 2016), ggrepel 538 \n(Slowikowski, 2021), ggupset (Ahlmann-Eltze, 2025) for visualisation. We also use a Python script for 539 \nLINcoding by Melanie Hennart, which uses numpy (Harris et al., 2020). Authors’ note: we are in the 540 \nprocess of uploading the new Kp sequences to a repository; a link will be provided when this 541 \nprocess is complete. In the meantime the raw FASTA ﬁles (which are the input for the pipeline) can 542 \nbe downloaded from https://osf.io/36r45 . 543 \nAcknowledgements 544 \nThis project has received funding from the European Research Council (ERC) under the European 545 \nUnion’s Horizon 2020 Research and Innovation Programme [Grant agreement No. 805046 546 \n(EvoConBiO) to I.G.J.]. This project was supported by the Trond Mohn Foundation [project HyperEvol 547 \nunder grant agreement No. TMS2021TMT09], through the Centre for Antimicrobial Resistance in 548 \nWestern Norway (CAMRIA) [TMS2020TMT11], This project has received support from ERA-Net: JPI-549 \nAMR STRESST project (NFR333432). The authors are grateful to the CAMRIA collaboration and the 550 \nStochastic Biology Group at UiB for useful discussions. 551 \n 552 \nReferences 553 \nAga, O. N., Brun, M., Giannakis, K., Dauda, K. A., Diaz-Uriarte, R., & Johnston, I. G. (2024). HyperTraPS-CT: 554 \nInference and prediction for accumulation pathways with ﬂexible data and model structures. PLOS 555 \nComputational Biology, 20(9), e1012393. 556 \nAhlmann-Eltze, C. (2025). ggupset: Combination Matrix Axis for ‘ggplot2’ to Create ‘UpSet’ Plots [R package]. 557 \nArel-Bundock, V ., Enevoldsen, N., & Yetman, C. (2018). countrycode: An R package to convert country names 558 \nand country codes. Journal of Open Source Software, 3(28), 848. https://doi.org/10.21105/joss.00848 559 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nArgimón, S., David, S., Underwood, A., Abrudan, M., Wheeler, N. E., Kekre, M., Abudahab, K., Yeats, C. A., 560 \nGoater, R., Taylor, B., Harste, H., Muddyman, D., Feil, E. J., Brisse, S., Holt, K., Donado-Godoy, P ., 561 \nRavikumar, K. L., Okeke, I. N., Carlos, C., … NIHR Global Health Research Unit on Genomic 562 \nSurveillance of Antimicrobial Resistance. (2021). Rapid Genomic Characterization and Global 563 \nSurveillance of Klebsiella Using Pathogenwatch. Clinical Infectious Diseases, 73(Supplement_4), 564 \nS325–S335. https://doi.org/10.1093/cid/ciab784 565 \nBates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-E\\ects Models Using lme4. Journal 566 \nof Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01 567 \nBeerenwinkel, N., Däumer, M., Sing, T., Rahnenführer, J., Lengauer, T., Selbig, J., Ho\\mann, D., & Kaiser, R. 568 \n(2005). Estimating HIV evolutionary pathways and the genetic barrier to drug resistance. The Journal 569 \nof Infectious Diseases, 191(11), 1953–1960. 570 \nBlomberg, B., Manji, K. P ., Urassa, W. K., Tamim, B. S., Mwakagile, D. S. M., Jureen, R., Msangi, V ., Tellevik, M. 571 \nG., Holberg-Petersen, M., Harthug, S., Maselle, S. Y., & Langeland, N. (2007). Antimicrobial resistance 572 \npredicts death in Tanzanian children with bloodstream infections: A prospective cohort study. BMC 573 \nInfectious Diseases, 7, 43. https://doi.org/10.1186/1471-2334-7-43 574 \nBoyko, J. D., & Beaulieu, J. M. (2023). Reducing the biases in false correlations between discrete characters. 575 \nSystematic Biology, 72(2), 476–488. 576 \nCasali, N., Nikolayevskyy, V ., Balabanova, Y ., Harris, S. R., Ignatyeva, O., Kontsevaya, I., Corander, J., Bryant, J., 577 \nParkhill, J., Nejentsev, S., Horstmann, R. D., Brown, T., & Drobniewski, F. (2014). Evolution and 578 \ntransmission of drug-resistant tuberculosis in a Russian population. Nature Genetics, 46(3), Article 3. 579 \nhttps://doi.org/10.1038/ng.2878 580 \nClarke, E., Sherrill-Mix, S., & Dawson, C. (2016). ggbeeswarm: Categorical scatter (violin point) plots. CRAN: 581 \nContributed Packages. https://cir.nii.ac.jp/crid/1360019690351052672 582 \nDewar, A. E., Belcher, L. J., & West, S. A. (2025). A phylogenetic approach to comparative genomics. Nature 583 \nReviews Genetics, 26(6), 395–405. https://doi.org/10.1038/s41576-024-00803-0 584 \nDiaz-Uriarte, R., & Herrera-Nieto, P . (2022). EvAM-Tools: Tools for evolutionar y accumulation and cancer 585 \nprogression models. Bioinformatics, 38(24), 5457–5459. 586 \nDiaz-Uriarte, R., & Johnston, I. G. (2024). A picture guide to cancer progression and monotonic accumulation 587 \nmodels: Evolutionary assumptions, plausible interpretations, and alternative uses 588 \n(arXiv:2312.06824). arXiv. https://doi.org/10.48550/arXiv.2312.06824 589 \nGlobal Antimicrobial Resistance and Use Surveillance System (GLASS) Report 2022 (1st ed). (2022). World 590 \nHealth Organization. 591 \nGreenbury, S. F ., Barahona, M., & Johnston, I. G. (2020). HyperTraPS: Inferring probabilistic patterns of trait 592 \nacquisition in evolutionary and disease progression pathways. Cell Systems, 10(1), 39–51. 593 \nGupta, S. K., Padmanabhan, B. R., Diene, S. M., Lopez-Rojas, R., Kempf, M., Landraud, L., & Rolain, J.-M. 594 \n(2014). ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial 595 \ngenomes. Antimicrobial Agents and Chemotherapy, 58(1), 212–220. 596 \nhttps://doi.org/10.1128/AAC.01310-13 597 \nHarris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P ., Cournapeau, D., Wieser, E., Taylor, J., 598 \nBerg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, 599 \nJ. F., Wiebe, M., Peterson, P ., … Oliphant, T. E. (2020). Array programming with NumPy. Nature, 600 \n585(7825), 357–362. https://doi.org/10.1038/s41586-020-2649-2 601 \nHennart, M., Guglielmini, J., Bridel, S., Maiden, M. C. J., Jolley, K. A., Criscuolo, A., & Brisse, S. (2022). A Dual 602 \nBarcoding Approach to Bacterial Strain Nomenclature: Genomic Taxonomy of Klebsiella pneumoniae 603 \nStrains. Molecular Biology and Evolution, 39(7), msac135. https://doi.org/10.1093/molbev/msac135 604 \nHerencias, C., Álvaro-Llorente, L., Ramiro-Martínez, P ., Fernández-Calvet, A., Muñoz-Cazalla, A., DelaFuente, 605 \nJ., Graf, F. E., Jaraba-Soto, L., Castillo-Polo, J. A., Cantón, R., San Millán, Á., & Rodríguez-Beltrán, J. 606 \n(2024). β-lactamase expression induces collateral sensitivity in Escherichia coli. Nature 607 \nCommunications, 15(1), 4731. https://doi.org/10.1038/s41467-024-49122-2 608 \nHetland, M. A. K., Winkler, M. A., Kaspersen, H. P ., Håkonsholm, F ., Bakksjø, R.-J., Bernho\\, E., Delgado-Blas, 609 \nJ. F., Brisse, S., Correia, A., Fostervold, A., Lam, M. M. C., Lunestad, B.-T., Marathe, N. P., 610 \nRa\\elsberger, N., Samuelsen, Ø., Sunde, M., Sundsfjord, A., Urdahl, A. M., Wick, R. R., … Holt, K. E. 611 \n(2025). A genome-wide One Health study of Klebsiella pneumoniae in Norway reveals overlapping 612 \npopulations but few recent transmission events across reservoirs. Genome Medicine, 17(1), 42. 613 \nhttps://doi.org/10.1186/s13073-025-01466-0 614 \nHolt, K. E., Wertheim, H., Zadoks, R. N., Baker, S., Whitehouse, C. A., Dance, D., Jenney, A., Connor, T. R., Hsu, 615 \nL. Y ., Severin, J., Brisse, S., Cao, H., Wilksch, J., Gorrie, C., Schultz, M. B., Edwards, D. J., Nguyen, K. V ., 616 \nNguyen, T. V ., Dao, T. T., … Thomson, N. R. (2015). Genomic analysis of diversity, population structure, 617 \nvirulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. 618 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nProceedings of the National Academy of Sciences, 112(27), E3574–E3581. 619 \nhttps://doi.org/10.1073/pnas.1501049112 620 \nJohnston, I. G., & Diaz-Uriarte, R. (2024). A hypercubic Mk model framework for capturing reversibility in 621 \ndisease, cancer, and evolutionary accumulation modelling (p. 2024.06.27.600959). bioRxiv. 622 \nhttps://doi.org/10.1101/2024.06.27.600959 623 \nJohnston, I. G., & Williams, B. P . (2016). Evolutionary Inference across Eukaryotes Identiﬁes Speciﬁc Pressures 624 \nFavoring Mitochondrial Gene Retention. Cell Systems, 2(2), 101–111. 625 \nhttps://doi.org/10.1016/j.cels.2016.01.013 626 \nKassambara, A. (2020). Ggpubr:“ggplot2” based publication ready plots. R Package Version 0.4. 0, 438. 627 \nKurtz, S., Phillippy, A., Delcher, A. L., Smoot, M., Shumway, M., Antonescu, C., & Salzberg, S. L. (2004). 628 \nVersatile and open software for comparing large genomes. Genome Biology, 5(2), R12. 629 \nhttps://doi.org/10.1186/gb-2004-5-2-r12 630 \nLam, M. M. C., Wick, R. R., Watts, S. C., Cerdeira, L. T., Wyres, K. L., & Holt, K. E. (2021). A genomic 631 \nsurveillance framework and genotyping tool for Klebsiella pneumoniae and its related species 632 \ncomplex. Nature Communications, 12(1), 4188. https://doi.org/10.1038/s41467-021-24448-3 633 \nLiu, P ., Li, P ., Jiang, X., Bi, D., Xie, Y ., Tai, C., Deng, Z., Rajakumar, K., & Ou, H.-Y.  ( 2 0 1 2 ) .  C o m p l e t e  g e n o m e  634 \nsequence of Klebsiella pneumoniae subsp. Pneumoniae HS11286, a multidrug-resistant strain 635 \nisolated from human sputum. Journal of Bacteriology, 194(7), 1841–1842. 636 \nhttps://doi.org/10.1128/JB.00043-12 637 \nLuo, X. G., Kuipers, J., & Beerenwinkel, N. (2023). Joint inference of exclusivity patterns and recurrent 638 \ntrajectories from tumor mutation trees. Nature Communications, 14(1), Article 1. 639 \nhttps://doi.org/10.1038/s41467-023-39400-w 640 \nMaddison, W. P ., & FitzJohn, R. G. (2015). The Unsolved Challenge to Phylogenetic Correlation Tests for 641 \nCategorical Characters. Systematic Biology, 64(1), 127–136. https://doi.org/10.1093/sysbio/syu070 642 \nMapunjo, S., Mbwasi, R., Nkiligi, E. A., Wilbroad, A., Francis, E. N., Msovela, K., Yahya, T., Mpembeni, R., 643 \nMasunga, E., Nkungu, K., Saitoti, S., Lusaya, E., & Konduri, N. (2025). National consumption of 644 \nantimicrobials in Tanzania: 2020–2022. JAC-Antimicrobial Resistance, 7(2), dlaf026. 645 \nhttps://doi.org/10.1093/jacamr/dlaf026 646 \nMoen, M. T., & Johnston, I. G. (2023). HyperHMM: E\\icient inference of evolutionary and progressive dynamics 647 \non hypercubic transition graphs. Bioinformatics, 39(1), btac803. 648 \nMoyo, S. J., Manyahi, J., Blomberg, B., Tellevik, M. G., Masoud, N. S., Aboud, S., Manji, K., Roberts, A. P ., 649 \nHanevik, K., Mørch, K., & Langeland, N. (2020). Bacteraemia, Malaria, and Case Fatality Among 650 \nChildren Hospitalized With Fever in Dar es Salaam, Tanzania. Frontiers in Microbiology, 11, 2118. 651 \nhttps://doi.org/10.3389/fmicb.2020.02118 652 \nMunk, P ., Brinch, C., Møller, F . D., Petersen, T. N., Hendriksen, R. S., Seyfarth, A. M., Kjeldgaard, J. S., 653 \nSvendsen, C. A., van Bunnik, B., Berglund, F ., Larsson, D. G. J., Koopmans, M., Woolhouse, M., & 654 \nAarestrup, F . M. (2022). Genomic analysis of sewage from 101 countries reveals global landscape of 655 \nantimicrobial resistance. Nature Communications, 13(1), 7251. https://doi.org/10.1038/s41467-022-656 \n34312-7 657 \nMurray, C. J. L., Ikuta, K. S., Sharara, F ., Swetschinski, L., Aguilar, G. R., Gray, A., Han, C., Bisignano, C., Rao, P ., 658 \nWool, E., Johnson, S. C., Browne, A. J., Chipeta, M. G., Fell, F., Hackett, S., Haines-Woodhouse, G., 659 \nHamadani, B. H. K., Kumaran, E. A. P ., McManigal, B., … Naghavi, M. (2022). Global burden of 660 \nbacterial antimicrobial resistance in 2019: A systematic analysis. The Lancet, 399(10325), 629–655. 661 \nhttps://doi.org/10.1016/S0140-6736(21)02724-0 662 \nNaghavi, M., Vollset, S. E., Ikuta, K. S., Swetschinski, L. R., Gray, A. P ., Wool, E. E., Aguilar, G. R., Mestrovic, T., 663 \nSmith, G., Han, C., Hsu, R. L., Chalek, J., Araki, D. T., Chung, E., Raggi, C., Hayoon, A. G., Weaver, N. 664 \nD., Lindstedt, P . A., Smith, A. E., … Murray, C. J. L. (2024). Global burden of bacterial antimicrobial 665 \nresistance 1990–2021: A systematic analysis with forecasts to 2050. The Lancet, 404(10459), 1199–666 \n1226. https://doi.org/10.1016/S0140-6736(24)01867-1 667 \nOnken, A., Moyo, S., Miraji, M. K., Bohlin, J., Marijani, M., Manyahi, J., Kibwana, K. O., Müller, F ., Jenum, P . A., 668 \nAbeid, K. A., Reimers, M., Langeland, N., Mørch, K., & Blomberg, B. (2024). Predominance of 669 \nmultidrug-resistant Salmonella Typhi genotype 4.3.1 with low-level ciproﬂoxacin resistance in 670 \nZanzibar. PLOS Neglected Tropical Diseases, 18(4), e0012132. 671 \nhttps://doi.org/10.1371/journal.pntd.0012132 672 \nOnken, A., Said, A. K., Jørstad, M., Jenum, P . A., & Blomberg, B. (2015). Prevalence and Antimicrobial 673 \nResistance of Microbes Causing Bloodstream Infections in Unguja, Zanzibar. PLoS ONE, 10(12), 674 \ne0145632. https://doi.org/10.1371/journal.pone.0145632 675 \nPál, C., Papp, B., & Lázár, V. (2015). Collateral sensitivity of antibiotic-resistant microbes. Trends in 676 \nMicrobiology, 23(7), 401–407. https://doi.org/10.1016/j.tim.2015.02.009 677 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nR Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical 678 \nComputing, Vienna, Austria. 2012. 679 \nRenz, J., Dauda, K. A., Aga, O. N. L., Diaz-Uriarte, R., Löhr, I. H., Blomberg, B., & Johnston, I. G. (2024). 680 \nEvolutionary accumulation modelling in AMR: Machine learning to infer and predict evolutionary 681 \ndynamics of multi-drug resistance (arXiv:2411.00219). arXiv. 682 \nhttps://doi.org/10.48550/arXiv.2411.00219 683 \nRenz, J., Dauda, K. A., Aga, O. N. L., Diaz-Uriarte, R., Löhr, I. H., Blomberg, B., & Johnston, I. G. (2025). 684 \nEvolutionary accumulation modeling in AMR: Machine learning to infer and predict evolutionary 685 \ndynamics of multi-drug resistance. mBio, 16(6), e00488-25. https://doi.org/10.1128/mbio.00488-25 686 \nRevell, L. J. (2012). phytools: An R package for phylogenetic comparative biology (and other things). Methods 687 \nin Ecology and Evolution, 2, 217–223. 688 \nRudd, K. E., Johnson, S. C., Agesa, K. M., Shackelford, K. A., Tsoi, D., Kievlan, D. R., Colombara, D. V ., Ikuta, K. 689 \nS., Kissoon, N., Finfer, S., Fleischmann-Struzek, C., Machado, F . R., Reinhart, K. K., Rowan, K., 690 \nSeymour, C. W., Watson, R. S., West, T. E., Marinho, F ., Hay, S. I., … Naghavi, M. (2020). Global, 691 \nregional, and national sepsis incidence and mortality, 1990–2017: Analysis for the Global Burden of 692 \nDisease Study. The Lancet, 395(10219), 200–211. https://doi.org/10.1016/S0140-6736(19)32989-7 693 \nSaitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic 694 \ntrees. Molecular Biology and Evolution, 4(4), 406–425. 695 \nSangeda, R. Z., Saburi, H. A., Masatu, F . C., Aiko, B. G., Mboya, E. A., Mkumbwa, S., Bitegeko, A., Mwalwisi, Y . 696 \nH., Nkiligi, E. A., Chambuso, M., Sillo, H. B., Fimbo, A. M., & Horumpende, P . G. (2021). National 697 \nAntibiotics Utilization Trends for Human Use in Tanzania from 2010 to 2016 Inferred from Tanzania 698 \nMedicines and Medical Devices Authority Importation Data. Antibiotics, 10(10), 1249. 699 \nhttps://doi.org/10.3390/antibiotics10101249 700 \nSchill, R., Klever, M., Rupp, K., Hu, Y . L., Lösch, A., Georg, P ., Pfahler, S., Vocht, S., Hansch, S., Wettig, T., 701 \nGrasedyck, L., & Spang, R. (2024). Reconstructing Disease Histories in Huge Discrete State Spaces. KI 702 \n- Künstliche Intelligenz. https://doi.org/10.1007/s13218-023-00822-9 703 \nSchill, R., Solbrig, S., Wettig, T., & Spang, R. (2020). Modelling cancer progression using mutual hazard 704 \nnetworks. Bioinformatics, 36(1), 241–249. 705 \nSchliep, K. P . (2011). phangorn: Phylogenetic analysis in R. Bioinformatics, 27(4), 592–593. 706 \nhttps://doi.org/10.1093/bioinformatics/btq706 707 \nSlowikowski, K. (2021). ggrepel: Automatically Position Non-Overlapping Text Labels with’ggplot2’ . R package 708 \nversion 0.9. 1, 2021. 709 \nSokal, R. R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationships. University 710 \nof Kansas Science Bulletin, 38, 1409. 711 \nSzybalski, W., & Bryson, V . (1952). Genetic studies on microbial cross resistance to toxic agents i. Journal of 712 \nBacteriology, 64(4), 489–499. https://doi.org/10.1128/jb.64.4.489-499.1952 713 \nTsang, K. K., Lam, M. M. C., Wick, R. R., Wyres, K. L., Bachman, M., Baker, S., Barry, K., Brisse, S., Campino, S., 714 \nChiaverini, A., Cirillo, D. M., Clark, T., Corander, J., Corbella, M., Cornacchia, A., Cuénod, A., D’Alterio, 715 \nN., Di Marco, F ., Donado-Godoy, P ., … KlebNET-GSP AMR Genotype-Phenotype Group. (2024). 716 \nDiversity, functional classiﬁcation and genotyping of SHV β-lactamases in Klebsiella pneumoniae. 717 \nMicrobial Genomics, 10(10), 001294. https://doi.org/10.1099/mgen.0.001294 718 \nWickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag. 719 \nhttps://link.springer.com/book/10.1007/978-0-387-98141-3 720 \nWickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, 721 \nL., Hester, J., Kuhn, M., Pedersen, T., Miller, E., Bache, S., Müller, K., Ooms, J., Robinson, D., Seidel, D., 722 \nSpinu, V ., … Yutani, H. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 723 \n1686. https://doi.org/10.21105/joss.01686 724 \nWickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). Dplyr: A grammar of data manipulation. 725 \nR package version 1.1. 2. Computer Software. 726 \nWilliams, B. P ., Johnston, I. G., Covsho\\, S., & Hibberd, J. M. (2013). Phenotypic landscape inference reveals 727 \nmultiple evolutionary paths to C4 photosynthesis. eLife, 2, e00961. 728 \nhttps://doi.org/10.7554/eLife.00961 729 \nWyres, K. L., & Holt, K. E. (2018). Klebsiella pneumoniae as a key tra\\icker of drug resistance genes from 730 \nenvironmental to clinically important bacteria. Current Opinion in Microbiology, 45, 131–139. 731 \nhttps://doi.org/10.1016/j.mib.2018.04.004 732 \nWyres, K. L., Lam, M. M. C., & Holt, K. E. (2020). Population genomics of Klebsiella pneumoniae. Nature 733 \nReviews. Microbiology, 18(6), 344–359. https://doi.org/10.1038/s41579-019-0315-1 734 \n 735 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n  736 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \nSupplementary Information 737 \n 738 \n 739 \nSupplementary Figure 1. Synthetic control study on reversibility. (top) A synthetic dataset involving the 740 \nprogressive accumulation of characters X1 to X10. The output inferred from HyperTraPS accurately reports the 741 \naccumulation ordering of characters, with ambiguity about the ﬁrst three characters which are observed 742 \nacross all samples. (bottom) A synthetic dataset involving reversibility. The same accumulation process is 743 \nsimulated, but with the potential for acquired characters to be lost reversibly. The output from HyperTraPS 744 \n(which assumes irreversibility) is not dramatically di\\erent, except for a moderate ampliﬁcation of the 745 \nuncertainty in the early characters. The e\\ect of reversibility here is to increase uncertainty rather than to 746 \nintroduce any systematic bias in the outputs. 747 \n 748 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n 749 \nSupplementary Figure 2. All posterior orderings of AMR character acquisitions. This “global roadmap” 750 \nsummarises “bubble plots” describing inferred evolutionary dynamics of KpAMR characters in di\\erent 751 \ncountries. Each row is a country, ordered vertically by descending number of Kp samples. Each column is a 752 \nKpAMR character; the horizontal axis within columns gives evolutionary orderings (from early to late). The size 753 \nof a point gives the probability that that character is acquired at that ordering in that country. For example, 754 \nacross the vast majority of datasets (including the top row, USA, for example), Bla_chr has a high probability of 755 \nearly acquisition and Gly_acquired has a high probability of late acquisition. In Cameroon, all characters have 756 \nsimilar inferred evolutionary orderings, reﬂecting the limited data available to make more precise statements. 757 \nDi\\erent colour points give the output of runs with di\\erent random seeds, to demonstrate consistency of the 758 \nestimates. Character names given in Fig. 1 caption. 759 \n 760 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n 761 \nSupplementary Figure 3. Inferred interactions between KpAMR characters. An arrow from character X to 762 \ncharacter Y reports an inferred interaction between the evolutionary acquisition of those characters. Blue 763 \narrow: acquiring X makes acquiring Y more likely (“promoting”). Red arrow: acquiring X makes acquiring Y less 764 \nlikely (“repressing”). The number on each arrow gives the proportion of countries in which that interaction was 765 \ninferred. Character names given in Fig. 1 caption. 766 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n767 \nSupplementary Figure 4. KpAMR characters determining aspects of global variability in evolutionary 768 \ndynamics (alternative visualisation). Connected to Fig. 4. KpAMR characters (horizontal axis) are plotted by 769 \ntheir expected acquisition ordering (vertical axis) for each country (points). The country’s projection on PCA1, 770 \nPCA2, and PCA3, for those characters that most strongly covary with each PCA axis, are given by the colour of 771 \na point. As in Fig. 4, those characters that covary with PCA1 form a consistent spectrum linked to precision: 772 \nlow values of PCA1 correspond to less precise, more uniform timing, while higher values lead to more precise 773 \nearlier or later timings. Characters covarying with the other PCAs display wider ranges connected to PCA1-774 \nindependent variability. Character names given in Fig. 1 caption. 775 \n  776 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint \n\n \n \n 777 \nSupplementary Figure 5. New KpAMR data from Zanzibar and Dar es Salaam, Tanzania. Phylogenies and 778 \nKpAMR proﬁles from new datasets: (A) Zanzibar 2015-6. (B) Tanzania 2001-2. (C) Tanzania 2017-8. (D) 779 \nPathogenwatch Tanzania. (E) Circular projection of the phylogeny of Tanzanian isolates in Fig. 7A. Colours: 780 \ngreen, 2001-2; blue, 2017-8; red, existing Pathogenwatch data. 781 \n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted September 21, 2025. ; https://doi.org/10.1101/2025.09.20.677523doi: bioRxiv preprint","source_license":"CC-BY-4.0","license_restricted":false}