Abstract
Highly pathogenic avian influenza (HPAI) viruses cross species barriers and have the potential to cause
pandemics. In North America, HPAI A(H5N1) viruses related to the goose/Guangdong 2.3.4.4b
hemagglutinin phylogenetic clade have infected wild birds, poultry, and mammals. Our genomic analysis
and epidemiological investigation showed that a reassortment event in wild bird populations preceded a 5
single wild bird-to-cattle transmission episode. The movement of asymptomatic cattle has likely played a
role in the spread of HPAI within the United States dairy herd. Some molecular markers in virus populations
were detected at low frequency that may lead to changes in transmission efficiency and phenotype after
evolution in dairy cattle. Continued transmission of H5N1 HPAI within dairy cattle increases the risk for
infection and subsequent spread of the virus to human populations. 10
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
3
Introduction
Highly pathogenic avian influenza (HPAI) viruses have critical consequences for animal health, the
agricultural economy, and may have pandemic potential. HPAI related to the goose/Guangdong 2.3.4.4
hemagglutinin (HA) H5NX phylogenetic clade has spread to nearly 100 countries (1), infections resulted
in mortality events , and it is recognized as a panzootic , crossing multiple species barriers. Additionally, 5
HPAI virus circulation is enzootic in Europe, with consistent detections representing a potential shift in the
biology and transmission of HPAI (2). Following an initial trans-Atlantic incursion in late 2021 (3, 4), the
HPAI H5N1 clade 2.3.4.4b virus caused widespread outbreaks across North America (5). The outbreaks
resulted in extensive mortality events in wild bird species, mortality and culling of poultry when detected
in agricultural systems, a significant number of interspecies transmission events into wild mammals , and 10
human infections.
The relatively frequent and recent transmission of HPAI clade 2.3.4.4 between avian species and mammals
globally has resulted in persistence of genetic features associated with mammalian adaptation. Experimental
studies on a subset of viruses from the HPAI clade 2.3.4.4 have demonstrated that they can bind to both 15
human α2,6-linked and avian α2,3-linked sialic acid receptors (6, 7). Genomic analyses have documented
that approximately half of the sequences from mammals globally within the HPAI H5N1 clade 2.3.4.4b
have amino acid signatures in the polymerase basic (PB) 2 protein that have been associated with
mammalian adaptation through enhanced viral replication, host -specific polymerase activity, and
temperature sensitivity (E627K, D701N, and/or T271A) (5, 8 ). Additionally, the introduction of HPAI 20
H5N1 into farmed mink in Europe in 2022 provided evidence that transmission to, and within, a population
of mammalian hosts could result in mutations in the hemagglutinin associated with human receptor
recognition and in the neuraminidase protein that affected sialic acid binding in a manner similar to human
influenza A viruses (9-11). These results are particularly important as from January 2022 to April 1, 2024
there were 13 reported human cases of H5N1 from the HPAI clade 2.3.4.4b worldwide, with some having 25
severe consequences, including mortality (12). Consequently, it is critical to determine how evolution of
the HPAI clade 2.3.4.4b in wild birds and the associated spillovers and transmission in mammals impacts
genomic and phenotype features that alter the potential for human infection and transmission (13, 14).
On March 25 2024, HPAI H5N1 clade 2.3.4.4b was confirmed in dairy cattle in Texas following multistate 30
reports of decreased milk yields. Shortly thereafter, the virus was identified in cattle in eight other US states
by members of the National Animal Health Laboratory Network (NAHLN) (15-17). Virus was
predominantly found in mammary tissue and milk . It was also detected in cats and peridomestic animals
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
4
that died on affected premises. Overall, the detection of influenza A virus (IAV) in cattle has been rarely
documented (18, 19), but there is prior evidence for its replication within the mammary gland (20, 21),
association with a reduction in milk yield (22), and experimental studies have demonstrated that bovine
calves are susceptible to infection and may asymptomatically shed virus (23). The goal of this study was to
analyze available genetic sequence data collected following the introduction of HPAI H5N1 in late 2021 5
into the Atlantic flyway of North America and its onward circulation and reassortment with North American
wild-bird origin low pathogenicity viruses that resulted in over 100 distinct genotypes (24). These data were
synthesized with newly generated whole-genome sequence data and epidemiological information from the
outbreak among US dairy cattle to understand when the interspecies transmission event occurred and the
consequences of animal movement on the persistence and evolution of the virus. To achieve this, w e 10
performed phylodyn amic analysis of the US HPAI H5N1 viruses detected in dairy cattle alongside
epidemiologically linked wild bird, poultry, and peridomestic animal data. In addition, we present a within-
host evolutionary assessment of the virus to determine how onward transmission in dairy cattle affects
genomic diversity and whether this increases the potential for this host to serve as a reservoir for IAV with
zoonotic potential. 15
Results
H5 clade 2.3.4.4b introduction into the United States
The H5N1 clade 2.3.4.4b genotype A1 (24) was first identified in wild birds collected December 2021 (3).
Over 100 genotypes representing different gene constellations have been characterized but 70% of all US 20
viruses fall into only 7 genotypes (25). Despite these genetic differences, spillover events have been
associated with the predominance of a genotype rather than any specific link between genotype and host
specificity. Introduction of genotype A3 was identified in April 2022 likely via the Pacific flyway followed
by A4. Another two introductions were identified via the Atlantic flyway, including an A6 Eurasian virus
that had a reassorted Eurasian neuraminidase (H5N5). Widespread detections in wild bird populations (26) 25
continue to result in point source spillovers to poultry (27) but there is limited evidence for lateral
transmission among poultry flocks. Virus of multiple genotypes have also been detected in approximately
20 mammal species in the US often related to mortality or severe neurologic signs, but these appear to be
dead end hosts (28, 29). In early March 2024, HPAI H5N1 was identified in neurologic goat kids on a farm
where poultry had recently been depopulated for HPAI; this event was unrelated to the dairy event and 30
involved a different virus genotype.
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
5
Epidemiological investigations detected HPAI H5N1 in dairy cattle across the United States
In late January 2024, production veterinarians observed dairy cattle displaying unexplained reductions in
milk production, decreased feed intake, and changes in milk quality. Members of the National Animal
Health Laboratory Network (NAHLN) identified influenza A virus in milk and a few nasal swabs from a
Texas dairy and forwarded samples to the National Veterinary Services Laboratories (NVSL) for 5
confirmatory testing as epidemiologic investigations continued. Testing revealed the presence of H5N1
clade 2.3.4.4b genotype B3.13. Shortly after the identification of HPAI H5N1 genotype B3.13 in Texas, it
was confirmed in additional Texas herds and herds in other states (16). Samples collected between 7 March
2024 and 8 April 2024 have virus characterized as genotype B3.13 from 26 dairy cattle premises across
eight states and six poultry premises in three states (Data S1). The NVSL conducted whole genome 10
sequencing and analysis using a custom software program that identified transmission chains based on
genomic similarity (vSNP) to provide rapid feedback in support of epidemiologic investigations (Data S2,
Fig. S1). The phylogenetic and available epidemiologic data indicate d that the genotype B3.13 virus was
being moved between dairy cattle premises, as well as domestic poultry premises, via multiple transmission
routes (Fig. 1). Detections of the B3.13 genotype in cattle locations that have no known epidemiologic links 15
to confirmed premises suggest there are affected herds that have not yet been identified.
Genomic epidemiology demonstrated a single interspecies transmission event
To determine when HPAI H5 clade 2.3.4.4b virus w as introduced into cattle in the US, we conducted a
phylogenetic analysis using whole genome sequence data collected from poultry, wild birds, and mammals 20
(Fig. 2, Fig. S2). From 2022 to present, clade 2.3.4.4b viruses have been reported in over 9,000 wild birds
in at least 163 species across 49 states and Washington D.C. , in over 200 mammals in at least 20 species
across 29 states and Washington D.C., and over 1,120 poultry flocks across 48 states. For the HA gene
segment, the cattle isolated H5N1 2.3.4.4b sequences clustered within a single monophyletic clade strongly
supporting a hypothesis for a single spillover event followed by lateral transmission. The single spillover 25
was also supported by the phylogeny reconstructed from concatenated whole genome sequences of B3.13
genotype strains (Fig. 2), and the single introduction pattern was congruent across the PB2, PA, NP, NA,
MP, and NS gene segments (Fig. S3 -S6). For PB1, over 97% of cattle sequences also formed a single
monophyletic clade; the remaining sequences were placed in a parental clade due to high similarity among
the gene sequences and low phylogenetic signal (Fig. S6). The single introduction hypothesis was further 30
supported by single nucleotide variant calling analyses (Data S2, Fig. S1). The inferred evolutionary rate
was 6.23x10-3 substitutions/site/year (95% highest posterior density (HPD), 5.29x10-3-7.19x10-3). The time
to the most recent common ancestor (TMRCA) for the HA segment for the cattle clade 2.3.4.4b HPAI
H5N1 sequences was estimated as December 9, 2023 (95% HPD, October 12, 2023, to January 26, 2024.
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
6
A clade 2.3.4.4b genotype B3.13 infection in a dairy worker was diagnosed at the end of March 2024 and
was similar to the HPAIV from cattle. The TMRCA for the cattle group and the human virus was estimated
as November 10, 2023 (95% HPD, September 2, 2023, to December 24, 2023). These two groups shared a
TMRCA that was estimated as October 19, 2023 (95% HPD, August 22, 2023, to December 9, 2023). More
accurate TMRCA estimates may be inferred using host-specific molecular clocks models (30), but our dates 5
were supported by TMRCA estimated using maximum likelihood methods (Fig. 2) and they are feasible
given associated epidemiological information (Fig. 1). These data support a single introduction event from
wild bird origin virus into cattle, likely followed by limited local circulation for approximately 4 months
prior to confirmation by USDA.
10
Transmission within cattle resulted in subsequent interspecies transmission events to domestic cats,
poultry, and peridomestic animals
The phylogenetic tree topology indicated that following the introduction, the virus persisted in cattle
populations, with subsequent evidence for transmission from cattle into poultry and peridomestic animal
species (Fig. 1-2, Figs. S2, S7). There were as many as five cattle to poultry, one cattle to raccoon, two 15
cattle to domestic cats, and three cattle to wild bird transmission events. These were further supported by
epidemiological information, as these animals were collected from premises with cattle where genotype
B3.13 HPAIV was identified. Our Bayesian discrete state analysis (Fig. 3) that quantified the movement of
HPAIV between six different host categories (poultry, wild bird, cattle, wild mammal, domestic cat, and
humans) demonstrated sufficient evidence to support the proposition of HPAI in cattle resulted in infections 20
in other hosts. W e cannot exclude the possibility that this genotype is circulating in unsampled locations
and hosts as the existing analysis suggests that data are missing and undersurveillance may obscure
transmission inferred using phylogenetic methods (31). Th e gap in data is highlighted by the human
infection with genotype B3.13 HPAIV where the HA gene sequence was not nested within cattle HA gene
sequences. This could indicate that HPAIV in unsampled cows were the source of infection or within-host 25
evolution resulted in divergence sufficient to result in a different phylogenetic grouping. It is most likely,
however, that asymptomatic transmission and undersurveillance in epidemiologically important
populations drove this pattern. Our analysis of tran smission chains within the cattle B3.13 clade using a
phylogenomic approach suggested unsampled transmission in late 2023 and early 2024 (Fig. S8), and the
TMRCA indicates there may have been 4 months of circulation prior to confirmation by USDA. However, 30
given the decline in milk production in highly monitored dairy herds, it is unlikely that the spillover
occurred significantly outside of the described TMRCA ranges.
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
7
Reassortment in migratory bird populations resulted in a new genotype associated with emergence
in cattle
From January 2022 to April 2024, more than 482 commercial and 645 backyard flocks tested positive for
HPAI (5, 27). A major component in the dissemination of the 2.3.4.4b viruses is movement across four
migratory flyways, and infection of nearly 200 wild avian species (5). Extensive genetic reassortment with 5
existing North American wild bird LPAIVs is a consequence of the host- and geographic-breadth. This
generated a spectrum of different genotypes, with the HA, NA, and M gene s demonstrating preferential
pairings similar to mammalian adapted IAV (32) and transient detections of reassorted viruses with
different PB2, PB1, PA, NP, and NS genes (5). In this study, we detected 243 putative reassortment events
across the HA phylogeny (Fig. S10). These events were associated with PB2 (137/243), PB1 (78/243), PA 10
(36/243), NP (126/243), and NS (52/243) gene segments. Most of these events did not persist. However,
we identified 24 major reassortment events that resulted in new genotypes with more than 20 downstream
detections. The spillover into cattle was preceded by a reassortment event(s) involving different PB2 and
NP genes, likely derived from wild bird LPAI in late 2023. This reassortment resulted in the B3.13 genotype
that is maintained across the clade of epidemiologically linked cattle samples with no evidence for further 15
reassortment following the spillover. The NP gene acquired by the reassortment event may have resulted
in a phenotype change that mediated the emergence of this virus genotype in cattle. The NP gene is
associated with multiple processes in the virus lifecycle (33) and was implicated in increasing the
transmission efficiency of IAV in the swine host (34).
20
HPAI H5N1 dispersed across the United States tracking cattle movement
Epidemiological records documented dairy cattle movement from a Texas herd (at shipment, HPAI status
was unknown) to North Carolina and Idaho (Fig. 1). Records also indicated that asymptomatic dairy cattle,
some that were subsequently diagnosed as HPAIV positive, were moved from a Texas herd to Michigan
and to Ohio, and a Kansas herd to South Dakota. We also applied an asymmetric discrete-trait model with 25
Bayesian stochastic search variable selection to reconstruct how clade 2.3.4.4b HPAIV H5 isolated in cattle
moved among the eight US states (Idaho, Texas, Kansas, Michigan, New Mexico, North Carolina, Ohio,
and South Dakota). The interstate HPAIV movement, inferred by these phylodynamic techniques (Fig. 3),
demonstrated that following the first confirmed case in Texas, HPAIV moved rapidly across the US. The
phylogenetic signal within the HA gene of cattle B3.13 genes was relatively low, and we conservativel y 30
identified when a state transition occurred, i.e., Bayes factor >3 with an associated posterior probability
greater than 75%. Using these thresholds, there was phylogenetic evidence for the movement of the B3.13
genotype from Texas to Kansas, Michigan, and New Mexico. There was sufficient evidence in the
phylodynamic analyses to support the proposition that the B3.13, following its introduction into Michigan,
was moved into North Carolina, but it was more likely that a confirmed Texas to North Carolina anim al 35
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
8
movement moved the virus as we were unable to confirm a link between Michigan and North Carolina
dairy herds. This implies that direct movement of cattle based upon production practices allowed for virus
dissemination. The movement of viruses from one region to another will influence the opportunity for
reassortment and subsequent changes in genomic diversity. Areas that receive d large shipments of cattle
and their viruses will provide more opportunities for reassortment and the potential emergence of IAV 5
strains with phenotypes that may have increased zoonotic potential.
Transmission within cattle resulted in low-frequency variants associated with increased transmission
efficiency and changes in virulence and pathogenicity
We identified within -host sequence variants across the genome that were present in >0.5% of whole 10
genome sequencing reads, matching to the custom database from the Influenza Research Database (35) and
a literature search (Table S1). If a sequence variant were to arise that provided a selective advantage, it
could increase in frequency through transmission, and potentially alter virus phenotype. Generally, most
single-nucleotide variants (SNVs) were present at low frequencies (Data S5). There were 491 amino acid
sites with nonsynonymous amino acid changes in cattle; of the variable amino acid sites, 309 15
nonsynonymous mutations occurred at sites associated with functional changes (mean 44.32 ± 5.06
potential functional changes per sample, range: 24-64). We detected variants associated with changes in
pathogenicity in HA, MP, NP, and PB2 (e.g., increase from Q154R in HA and S207G in MP; a decrease in
V105M in NP; an increase in D701N in PB2 : Table 1: Tables S2-S5 for other animal groups). We also
detected SNVs previously associated with changes in virulence (e.g. Q134K/R in HA; increase in V67I in 20
NA; an increase in E627K in PB2) and increased host-range specificity (e.g., A55T, E57G, N71S in N1;
and E229K in NS). We detected variants in PB2 associated with mammalian adaptation (an increase in
E627K, M631L, and D701N and decrease in V495I), where the frequency was 33% in a single animal for
E627K, and at 99% in 214 cattle for M631L. We did not detect the key mammal adaptation mutation PB2
271A in cattle , even at low frequencies ; however, one mammal sample contained the mutation in the 25
consensus gene, and we detected mutations on HA that affected receptor binding affinity (Table 1) . We
calculated synonymous (π S) and nonsynonymous (π N) site ratio to assess natural selection; πN/πS 1 suggestive of positive selection . When we combined the
diversity estimates across genes, all cattle strains exhibited πN/πS 1 (HA, π N/πS = 0.042; MP, π N/πS = 0.566; NA, π N/πS = 0.188; NP, π N/πS = 0.032; NS, 30
πN/πS = 0.355; PA, πN/πS = 0.160; PB1, πN/πS = 0.048; PB2, πN/πS = 0.039), suggesting that the within-host
virus populations tended to exhibit weak purifying selection when compared to an ancestral reference.
There was no direct evidence that these minor populations of sequence variants alter phenotype of B3.13
genotype HPAIV in cattle. V ariants should be monitored to detect whether they increase in frequency as
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
9
some of the amino acid positions have been associated with increases in pathogenicity, virulence,
transmission efficiency and mammalian or human adaptation in laboratory experiments and other animals.
Discussion
5
The potential for HPAI H5N1 to become endemic in cattle will shape the zoonotic risk of the B3.13
genotype. There may be low levels of immunity against H5N1 viruses (36-39) and the immunological
landscape in the human population affects disease severity (40). Genetically similar viruses do have the
potential to cross the species barrier as there has already been a clade 2.3.4.4b B3.13 virus infection in a
person with conjunctivitis in March of 2024. The existing prepandemic candidate vaccine viruses (CVV) 10
do retain cross -reactivity with currently circulating clade 2.3.4.4b HPAI H5N1 (41). These CVVs are
coordinated and shared among the WHO Global Influenza Surveillance and Response Network for use by
academic, government, and industry partners for research and development (42). However, recent viruses
collected in the US had reduced reactivity with the A/Astrakhan/3212/2020 candidate vaccine virus and
based on these data and other genetic and epidemiologic measures, a new CVV for the clade 2.3.4.4b viruses 15
was proposed (41).
The HPAI H5N1 genotype B3.13 viruses circulating in cattle represent a potential zoonotic threat based on
the evidence we present for transmission in a mammalian host. Based upon current information, it appears
that once infected, a cow may shed virus for 2 -3 weeks. We detected some amino acid mutations at sites 20
associated with mammalian adaptation that had already become fixed in the virus population that likely
reflect the ~4 months of evolution and limited local circulation in dairy cattle. Notably, important low -
frequency sequence variants within cattle were also detected, even within the limited time following the
first spillover. If these low-frequency variants become dominant, they may have phenotypes that increase
the probability of interspecies transmission. Further studies are needed to understand the pathobiology and 25
evolution of the virus in dairy cattle. In addition, there is the potential for multiple animal species to be co-
located on agricultural premises, each species may be infected with endemic IAV strains, and an IAV co -
infection with HPAI could result in reassortment and the emergence of new strains that increase zoonotic
risk (43, 44). Monitoring of cattle for HPAI will inform epidemiological risk and provide an early warning
for whether this interspecies transmission event and dissemination of the viruses throughout the US dairy 30
cattle herd represents a future threat to human health.
35
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
10
Materials and methods
Sample isolation, whole genome sequencing, assembly, and single nucleotide variant calling
IAV extraction and reverse transcription real-time PCR (RT-rtPCR) were performed at the National Animal
Health Laboratory Network member labs and the U.S. Department of Agriculture, National Veterinary
Services Laboratories according to the standard operating procedures (3). Influenza A virus RNA from 5
samples was amplified (44) and after amplification was completed, we generated cDNA libraries for iSeq
by using the Illumina DNA Sample Preparation Kit, (M) Tagmentation
(Illumina, https://www.illumina.com) and the 300 -cycle iSeq Reagent Kit v2 (Illumina) according to
manufacturer instructions. We performed reference guided assembly of genome sequences using IRMA
v0.6.7 (45). 10
We developed a bioinformatics pipeline for processing, calling single nucleotide variant s (SNVs), and
analyzing Illumina short read data for influenza A virus called “Flumina” ( https://github.com/flu-
crew/Flumina). The pipeline uses Python v3.10, R v4.4 (46), and SnakeMake (47) to organize programs
and script execution. A custom Python script organize s raw reads for SnakeMake that subsequently 15
executes other programs for variant calling. The pipeline cleans the r aw reads of adapter contamination,
low complexity sequences, and other sequencing artifacts using the program FASTP (48). IRMA v0.6.7
(45) was used on the processed reads to generate consensus contigs that were used for phylogenetic and
phylodynamic analyses and other summary statistics and graphs. Following these steps, the pipeline maps
reads to a single ancestral reference strain, which was generated from ancestrally reconstructed sequences 20
as described below, by indexing using BWA (bwa index -a bwtsw function; (49)). Then it uses SAMtools
(50) to create an fai index with the faidx function and generate a sequence dictionary for GATK v4.4 (51)
using the CreateSequenceDictionary function.
For calling of high frequency SNVs, we used GATK v4.4 (51) following best practices for discovering and 25
calling SNVs (52). The pipeline processes the cleaned reads through the GATK4 functions FastqToSam,
RevertSam, and AddorReplaceReadGroups. Next, the BAM of the reads are converted back to Fastq and
mapped with BWA using the function bwa mem -M. The mapped reads BAM file are merged with the
unmapped reads BAM, and duplicate reads are marked after sorting the BAM file by coordinate with the
functions SortSam and MarkDuplicates. Finally, HaplotypeCaller is used (parameters: -ERC GVCF -ploidy 30
1) to call haplotypes, and GenotypeGVCFs is used to genotype the sample. Specific variants were selected
using the SelectVariants function for the SNP type of variant. Using VariantFiltration, a filtered set of
variants were created by applying the following filters : Qual < 30 (Quality), QD < 2 (Quality by Depth),
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
11
SOR < 3 (StrandOddsRatio), FS < 60 (FisherStrand), MQ < 40 (MapQuality), MQRankSum < 12.5
(MapQuality RankSumTest), and ReadPosRankSum < 8 (ReadPosRankSumTest). To call low frequency
SNVs, the program LoFreq (53) was applied, with the similar pre processing steps where the same BAM
was used as input from HaplotypeCaller. A database was generated using the Sequence Feature Variant
Types tool from the Influenza Research Database (34) for all eight genes, and SNVs associated with 5
nonsynonymous changes in each gene were screened to determine whether these had previously been
associated with phenotypic change. To estimate genome-wide estimates of natural selection, we used the
program SNPGenie on the VCF files (54).
Data sources, curation, and preparation 10
To generate a reference dataset of relevant HPAI H5N1 strains, we queried GISAID (55) for all H5 genomes
with 8 complete gene segments available collected between the dates of January 01 2020 and March 29
2024 in the United States. This query provided 23,322 sequences from 2,915 genomes : we extracted the
hemagglutinin (HA) gene sequence and aligned the data using MAFFT v7.525 (56) with default parameters.
We subsequently inferred a maximum likelihood phylogenetic tree with IQ -Tree v2.2.0 (57) following 15
automatic model selection (58). Based on this tree, we classified all data using a custom python clade
identifier called GenoFLU ( https://github.com/USDA-VS/GenoFLU) and filtered the data to only those
strains from the HPAI clade 2.3.4.4b. We then removed identical HA sequences, ensuring that we retained
the cattle strains that had been published in GISAID and the associated human case (A/Texas/37/2024,
GISAID accession EPI_ISL_19027114). This process resulted in a reference gene dataset of 1393 strains. 20
For Bayesian phylodynamic analyses, these data were then subsampled using smot v.1.0.0 (59) where we
maintained representation by sampling 20% of the tips within each monophyletic clade on the HA gene
phylogeny inferred using FastTree v2.1.11 (60). Using the subsampled dataset of n=964 strains, we inferred
the ancestral sequences of North American 2.3.4.4b strains for each gene segment using TreeTime v0.9.4
ancestral reconstruction functionality. The ancestral sequences were reconstructed at the deepest nodes on 25
each gene tree that comprised at least 99% of all strains in the subsampled dataset.
Phylogenetic analyses
The subsampled reference data were merged with whole genome sequences generated within this study.
For each gene segment, the data were aligned with MAFFT v7.525 (56) with default settings, and maximum 30
likelihood (ML) trees were inferred with IQ -Tree v2.2.0 (57) following automatic model selection (58).
These trees were used to assess root -to-tip divergence using TempEst v1.5.3 (61) and genes with
incongruent divergence and sampling date were removed. Subsequently, we realigned each gene dataset,
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
12
and inferred new ML phylogenetic trees using IQ -Tree v2.2.0. Using the ML phylogenies for each gene
segment as a starting point, we generated time-scaled phylogenies, rooted under the strict molecular clock
assumption using the TreeTime v0.8.4 phylodynamic toolkit (62). TreeTime was executed with the GTR
site substitution model with 10 iterations of optimization. The host state (poultry, wild bird, cattle, mammal,
domestic mammal, and humans) and the US location were inferred for each ancestral node on the time -5
scaled phylogeny using a TreeTime m igration model. For each ancestral node, TreeTime inferred a
probability of the respective virus belonging to the specific discrete trait (i.e., Texas or New Mexico; wild
bird or cattle) and we subsequently assigned the most likely trait (over 50% probability) to each node. This
approach allowed us to identify branches o n the tree where spillovers may have occurred and between
different locations and this approach complemented our Bayesian phylodynamic analyses. 10
We also tracked the emergence of single nucleotide variants (SNVs) within clade 2.3.4.4b consensus
sequences that were generated using IRMA v0.6.7 (45). SNVs were annotated and compared using a custom
pipeline called vSNP3 (https://github.com/USDA-VS/vSNP). The pipeline infers phylogenetic trees using
RAxML (63) and generates tables of SNVs relative to a reference composed of two North American wild 15
bird origin segments (PB2, NP) and six segments from a H5N1 clade 2.3.4.4b clade virus. These data were
used to determine viral genome sequence similarity and to identify genomic links between premises (Fig.
1, Fig. S1, Data S2).
Reassortment analyses 20
We assembled a separate dataset for reassortment analysis that comprehensively included all H5N1 whole
genomes collected from 2020 to present through downloading n=27,177 HPAI H5N1 gene sequences from
GISAID [accessed April 19, 2024 ] (55). We maintained only those strains with complete sequences for
each of the eight gene segments, resulting in n= 2893 whole genomes. We merged th ese data with the
assembled whole genomes from our study and generated alignments for each gene using MAFFT v7.490 25
(56) with default parameters. We subsequently inferred phylogenetic trees for each gene segment using IQ-
Tree v1.6.12 (64) under the generalized time-reversible (GTR) model of nucleotide substitutions with the
stationary probabilities estimated from the empirical base frequencies and 5 free-rate categories (65). To
identify reassortment events on the HA phylogeny, we applied TreeSort v.0.1.1 ( https://github.com/flu-
crew/TreeSort) with a maximum molecular clock deviation parameter of 2.5. TreeSort uses TreeTime (62) 30
to estimate the substitution rates for each gene segment and identifies branches on a tree with a high signal
of reassortment. We then removed n=35 strains that were molecular clock outliers in the HA gene and were
not related to the outbreak in dairy cattle and repeated the TreeSort analysis.
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
13
Phylogenomic analyses over concatenated B3.13 genomes
To improve the phylogenetic signal among the cattle and associated sequences, we concatenated the coding
regions of all gene segments into whole genome sequences for all reassortment-free B3.13 strains from the
reassortment analysis above. This resulted in n=256 aligned whole genome sequences, including n =220 5
cattle strains. We inferred a phylogenetic tree from these sequences using an edge -linked partition model
in IQ -Tree v.2.3.2 (57), where every partition corresponded to an individual gene segment and was
associated with a different GTR model with empirical base frequencies and 5 free-rate categories. IQ-Tree
was run with ultrafast bootstrap with 1000 replicates (66). We then used the resulting tree topology as input
to TreeTime v.0.9.4 (62) to infer a time-scaled tree of the B3.13 genomes. TreeTime was executed with an 10
option to account for covariation in tip dates due to shared ancestry.
Inference of B3.13 transmission chains in dairy cattle
Inference of transmission chains between hosts, sampled diversity, sampling time, and generation time were
estimated from the temporally scaled genome tree using the TransPhylo R package (67, 68). As input, we 15
used the B3.13 tree generated in the phylogenomic analysis. Multifurcations in the tree were suppressed to
bifurcations and all edge weights were rounded to at least 1e -9. The prior generation time and sampling
density were gamma-distributed parameters extracted from published studies (75, 76) resulting in a mean
of 5 days (69, 70) and a variance of 2 days. We adapted protocol 3 (68) ensuring an Effective Sample Size
(ESS) > 200 at the end of the simulations. The MCMC simulations were run for 1e7 iterations, discarding 20
the first 5% as burn-in. After the simulations converged, the medoid tree from the posterior set of inferred
trees was used as the final inferred transmission tree. Transmissions of HPAIV from host to host (e.g., wild
bird to cattle) were inferred by TransPhylo (75, 76) and the distribution of incident cases (sampled and
unsampled) was generated by adapting protocol 4 (Fig. S8) (68).
25
Bayesian evolutionary inference and discrete phylogeography
We estimated when HPAI H5N1 clade 2.3.4.4b were introduced into dairy cattle under a Bayesian
framework in BEAST v.1.10.4 (71) with the BEAGLE library (72, 73). We merged the subsampled public
data with newly generated whole genome data and aligned the hemagglutinin dataset (n=606 HA genes).
We implemented a generalized time reversible (GTR) nucleotide substitution model (74) with gamma-30
distributed site heterogeneity (75), an uncorrelated relaxed clock with lognormal distribution (76), and a
Gaussian Markov random field (GMRF) Bayesian skyride with time aware -smoothing for the coalescent
model (77). We conducted six independent Markov chain Monte Carlo (MCMC) sampling runs with 100
million iterations with sampling every 10,000 iterations. The results were analyzed using the GMRF skyride
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
14
reconstruction in Tracer v1.7.2 (78), and runs were combined to ensure an effective sample size of more
than 200. A time-scaled maximum clade credibility (MCC) tree was generated using TreeAnnotator v1.8.4
using median node heights and 20 percent burn-in (71). Dates of time to the most recent common ancestor
(TMRCA) were inferred from the nodes between clades, using the 95 percent higher posterior density
(HPD) as the range of uncertainty. 5
To reconstruct the spatial diffusion of the H5N1 virus across states within the US and different hosts
(poultry, wild bird, cattle, mammal, domestic mammal, and humans), we applied an asymmetric discrete-
trait phylogeographic analyses with Bayesian stochastic search variable selection, a strict molecular clock,
and an exponential growth model in BEAST v.1.10.4 (71). We conducted this analysis on the subsampled 10
and aligned hemagglutinin dataset and an extracted the monophyletic clade of cattle HA clade 2.3.4.4b
genes with any host category (host) or cattle only (location). In this approach, each HA gene had a US state
and host group designated as a trait, and transitions from one category to another (e.g. from Texas to New
Mexico) was inferred along the internal branches representing the evolutionary history of the virus. These
transitions are termed Markov jumps and state change counts for the HA clade trait were reconstructed in 15
BEAST v1.10.4. We conducted at least 5 independent Markov chain Monte Carlo (MCMC) chains with
50 million iterations with sampling every 5,000 iterations. Convergence and mixing was assessed in Tracer
v1.7.2, as described above, and runs were combined to ensure an effective sampling size of more than 200.
We used the resulting posterior trees to provide estimates of the ancestral region and host for each internal
node. We then used SpreaD3 v.0.9.6 (79) to estimate the Bayes Factor (BF) from the estimated state 20
transition rates and used these values as statistical support for spatial movements : we imposed thresholds
used in previously published studies, i.e., a definitive spatial movement had a BF > 100, sufficient evidence
was assessed as 100 > BF > 3 (1, 80).
Acknowledgments: 25
We gratefully acknowledge producers, veterinarians, and laboratories for participating in the
epidemiological investigations and sharing the results of sequence analysis. We thank the U.S. Department
of Agriculture (USDA), Animal and Plant Health Inspection Services veterinary epidemiology team for
contact tracing, sample collection, and coordinating the response to the outbreak: Nicole Amey, Bradley
Christenson, Michael Contente, Lindsey Holmstrom, Dillon McBride, Stacey Schwabenlander, and Jason 30
Lombard (Colorado State University). We thank Michael Zeller from Iowa State University for insight into
TransPhylo, and Brian Stucky and Jeffrey Silverstein for providing priority access to the SCInet High
Performance Computing infrastructure at the U.S. Departm ent of Agriculture, Agricultural Research
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
15
Service. We also acknowledge all data contributors, i.e., the Authors and their Originating laboratories
responsible for obtaining the specimens, and their submitting laboratories for generating the genetic
sequence and metadata and sharing via the GISAID Initiative, on which components of this research was
based. Mention of trade names or commercial products in this article is solely for the purpose of providing
specific information and does not imply recommendation or endorsement by the U.S. Department of 5
Agriculture (USDA), DOE, or ORISE. The funders had no role in study design, data collection and
interpretation, or the decision to submit the work for publication. The findings and conclusions in this
publication are those of the authors and should not be construed to represent any official USDA or U.S.
Government determination or policy. USDA is an equal opportunity provider and employer.
10
Funding:
National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of
Health and Human Services contract 75N93021C00015 (TKA, ALB)
U.S. Department of Agriculture Agricultural Research Service project 5030 -32000-231-000-D
(ALB, TKA) 15
U.S. Department of Agriculture Agricultural Research Service project 0201-88888-003-000D
(ALB, TKA)
U.S. Department of Agriculture Agricultural Research Service project 0201-88888-002-000D
(ALB, TKA)
U.S. Department of Agriculture Agricultural Research Service Research Participation Program of 20
the Oak Ridge Institute for Science and Education through the U.S. Department of Energy contract
DE-SC0014664 (GJ, CH, TN, SW)
Centers for Disease Control and Prevention of the U.S. Department of Health and Human Services
Interagency Agreement DE-SC0000001 (ALB, TKA)
Author contributions: 25
Conceptualization: MKT, TKA
Data curation: TN, CH, BI, KL, MLK
Formal Analysis: TN, CH, AM, MT, KL, SV, SW, GMJ, BI, SML, MKT, TKA
Funding acquisition: ALB, SRA, MKT, TKA
Investigation: KL, MLK, SML, SCA, KRJ 30
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
16
Methodology: TN, CH, AM, MT, KL, MLK, GMJ, SV, SW, BI
Project administration: TKA, MKT
Resources: DRM, GL, DGD, EAF, KMD, AKS, ACT, KRS, DLS, ES, SCA, KRJ, SRA
Software: CH, AM, SML
Supervision: MKT, TKA 5
Visualizations: TN, AM, GMJ, SML, SV, TKA
Writing – original draft: CH, AM, MKT, TKA
Writing – review & editing: TN, CH, AM, MT, KL, MLK, GMJ, SV, SW, DRM, GL, DGD, EF,
KMD, AKS, ACT, KRS, DLS, ES, SML, SCA, ALB, SRA, MKT, TKA
Competing interests: Authors declare that they have no competing interests. 10
Data and materials availability: All data, code, and materials used in the analysis are available at
https://github.com/flu-crew/dairy-cattle-hpai-2024. Sequence data generated within this study are
provided at NCBI GenBank and NCBI SRA and the accession numbers are provided in the
supplementary text data files.
15
Supplementary Text
Figs. S1 to S9
Tables S1 to S5
References
(45–81)
Data S1 to S5 20
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
17
References
1. R. Xie et al., The episodic resurgence of highly pathogenic avian influenza H5 virus. Nature 622,
810-817 (2023).
2. A. Pohlmann et al., Has Epizootic Become Enzootic? Evidence for a Fundamental Change in the
Infection Dynamics of Highly Pathogenic Avian Influenza in Europe, 2021. Mbio 13, (2022). 5
3. S. N. Bevins et al., Intercontinental Movement of Highly Pathogenic Avian Influenza A(H5N1)
Clade 2.3.4.4 Virus to the United States, 2021. Emerg Infect Dis 28, 1006-1011 (2022).
4. V. Caliendo et al., Transatlantic spread of highly pathogenic avian influenza H5N1 by wild birds
from Europe to North America in 2021. Sci Rep 12, 11729 (2022).
5. S. Youk et al., H5N1 highly pathogenic avian influenza clade 2.3.4.4b in wild and domestic 10
birds: Introductions into the United States and reassortments, December 2021-April 2022.
Virology 587, 109860 (2023).
6. J. A. Pulit-Penaloza et al., Characterization of highly pathogenic avian influenza H5Nx viruses in
the ferret model. Sci Rep 10, 12700 (2020).
7. R. Yamaji et al., Pandemic potential of highly pathogenic avian influenza clade 2.3.4.4 A(H5) 15
viruses. Rev Med Virol 30, e2099 (2020).
8. A. European Food Safety et al., Avian influenza overview December 2022 - March 2023. EFSA J
21, e07917 (2023).
9. M. Aguero et al., Highly pathogenic avian influenza A(H5N1) virus infection in farmed minks,
Spain, October 2022. Euro Surveill 28, (2023). 20
10. E. de Vries, C. A. de Haan, Letter to the editor: Highly pathogenic influenza A(H5N1) viruses in
farmed mink outbreak contain a disrupted second sialic acid binding site in neuraminidase,
similar to human influenza A viruses. Euro Surveill 28, (2023).
11. M. Aguero et al., Authors' response: Highly pathogenic influenza A(H5N1) viruses in farmed
mink outbreak contain a disrupted second sialic acid binding site in neuraminidase, similar to 25
human influenza A viruses. Euro Surveill 28, (2023).
12. N. C. f. I. a. R. D. N. Centers for Disease Control and Prevention. (2024).
13. A. Kandeil et al., Rapid evolution of A(H5N1) influenza viruses after intercontinental spread to
North America. Nat Commun 14, 3082 (2023).
14. T. Maemura et al., Characterization of highly pathogenic clade 2.3.4.4b H5N1 mink influenza 30
viruses. EBioMedicine 97, 104827 (2023).
15. X. Hu et al., Highly Pathogenic Avian Influenza A (H5N1) clade 2.3.4.4b Virus detected in dairy
cattle. bioRxiv 2024.04.16.588916, (2024).
16. USDA-APHIS, in Highly Pathogenic Avian Influenza (HPAI) Detections in Livestock. (2024),
vol. https://www.aphis.usda.gov/livestock-poultry-disease/avian/avian-influenza/hpai-35
detections/livestock.
17. E. R. Burrough et al., Highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection
in domestic dairy cattle and cats, United States, 2024. Emerg Infect Dis, (2024).
18. I. H. Brown, T. R. Crawshaw, P. A. Harris, D. J. Alexander, Detection of antibodies to influenza
A virus in cattle in association with respiratory disease and reduced milk yield. Vet Rec 143, 637-40
638 (1998).
19. C. H. Campbell, B. C. Easterday, R. G. Webster, Strains of Hong Kong influenza virus in calves.
J Infect Dis 135, 678-680 (1977).
20. C. A. Mitchell, O. Nordland, R. V. Walker, Myxoviruses And Their Propagation In The
Mammary Gland Of Ruminants. Can J Comp Med Vet Sci 22, 154-156 (1958). 45
21. C. A. Mitchell, R. V. L. Walker, G. L. Bannister, Studies Relating to the Formation of
Neutralizing Antibody Following the Propagation of Influenza and Newcastle Disease Virus in
the Bovine Mammary Gland. Can J Microbiol 2, 322-328 (1956).
22. T. R. Crawshaw, I. H. Brown, S. C. Essen, S. C. Young, Significant rising antibody titres to
influenza A are associated with an acute reduction in milk yield in cattle. Vet J 178, 98-102 50
(2008).
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
18
23. D. Kalthoff, B. Hoffmann, T. Harder, M. Durban, M. Beer, Experimental infection of cattle with
highly pathogenic avian influenza virus (H5N1). Emerg Infect Dis 14, 1132-1134 (2008).
24. USDA-APHIS. (https://github.com/USDA-VS/GenoFLU, 2024).
25. W. Puryear et al., Highly Pathogenic Avian Influenza A(H5N1) Virus Outbreak in New England
Seals, United States. Emerg Infect Dis 29, 786-791 (2023). 5
26. USDA-APHIS, in Detections of Highly Pathogenic Avian Influenza in Wild Birds. (2024), vol.
https://www.aphis.usda.gov/livestock-poultry-disease/avian/avian-influenza/hpai-detections/wild-
birds.
27. USDA-APHIS, in Confirmations of Highly Pathogenic Avian Influenza in Commercial and
Backyard Flocks. (2024), vol. https://www.aphis.usda.gov/livestock-poultry-disease/avian/avian-10
influenza/hpai-detections/commercial-backyard-flocks.
28. E. J. Elsmo et al., Highly Pathogenic Avian Influenza A(H5N1) Virus Clade 2.3.4.4b Infections
in Wild Terrestrial Mammals, United States, 2022. Emerg Infect Dis 29, 2451-2460 (2023).
29. USDA-APHIS, in Detections of Highly Pathogenic Avian Influenza in Mammals. (2024), vol.
https://www.aphis.usda.gov/livestock-poultry-disease/avian/avian-influenza/hpai-15
detections/mammals.
30. M. Worobey, G. Z. Han, A. Rambaut, Genesis and pathogenesis of the 1918 pandemic H1N1
influenza A virus. Proc Natl Acad Sci U S A 111, 8107-8112 (2014).
31. C. Colijn, J. Gardy, Phylogenetic tree shapes resolve disease transmission patterns. Evol Med
Public Health 2014, 96-108 (2014). 20
32. V. G. Dugan et al., The evolutionary genetics and emergence of avian influenza viruses in wild
birds. PLoS Pathog 4, e1000076 (2008).
33. S. Boulo, H. Akarsu, R. W. Ruigrok, F. Baudin, Nuclear traffic of influenza virus proteins and
ribonucleoprotein complexes. Virus Res 124, 12-21 (2007).
34. M. N. Thomas et al., Nucleoprotein reassortment enhanced transmissibility of H3 1990.4.a clade 25
influenza A virus in swine. J Virol, e0170323 (2024).
35. Y. Zhang et al., Influenza Research Database: An integrated bioinformatics resource for influenza
virus research. Nucleic Acids Res 45, D466-D474 (2017).
36. Y. Qi, H. B. Ni, X. Chen, S. Li, Seroprevalence of highly pathogenic avian influenza (H5N1)
virus infection among humans in mainland China: A systematic review and meta-analysis. 30
Transbound Emerg Dis 67, 1861-1871 (2020).
37. E. S. Toner et al., Assessment of serosurveys for H5N1. Clin Infect Dis 56, 1206-1212 (2013).
38. S. Nasreen et al., Seroprevalence of antibodies against highly pathogenic avian influenza A
(H5N1) virus among poultry workers in Bangladesh, 2009. PLoS One 8, e73200 (2013).
39. B. Khuntirat et al., Absence of neutralizing antibodies against influenza A/H5N1 virus among 35
children in Kamphaeng Phet, Thailand. J Clin Virol 69, 78-80 (2015).
40. K. M. Gostic, M. Ambrose, M. Worobey, J. O. Lloyd-Smith, Potent protection against H5N1 and
H7N9 influenza via childhood hemagglutinin imprinting. Science 354, 722-726 (2016).
41. W. H. Organization, "Genetic and antigenic characteristics of zoonotic influenza A viruses and
development of candidate vaccine viruses for pandemic preparedness," (2023). 40
42. J. S. Robertson et al., The development of vaccine viruses against pandemic A(H1N1) influenza.
Vaccine 29, 1836-1843 (2011).
43. A. Markin et al., Reverse-zoonoses of 2009 H1N1 pandemic influenza A viruses and evolution in
United States swine results in viruses with zoonotic potential. PLoS Pathog 19, e1011476 (2023).
44. E. J. Abente et al., A highly pathogenic avian-derived influenza virus H5N1 with 2009 pandemic 45
H1N1 internal genes demonstrates increased replication and transmission in pigs. J Gen Virol 98,
18-30 (2017).
45. E. Spackman, Avian Influenza Virus Detection and Quantitation by Real-Time RT-PCR. Methods
Mol Biol 2123, 137-148 (2020).
46. S. S. Shepard et al., Viral deep sequencing needs an adaptive approach: IRMA, the iterative 50
refinement meta-assembler. BMC Genomics 17, 708 (2016).
47. R. C. Team, R: A language and environment for statistical computing. (2015).
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
19
48. F. Molder et al., Sustainable data analysis with Snakemake. F1000Res 10, 33 (2021).
49. S. Chen, Y. Zhou, Y. Chen, J. Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor.
Bioinformatics 34, i884-i890 (2018).
50. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 25, 1754-1760 (2009). 5
51. P. Danecek et al., Twelve years of SAMtools and BCFtools. Gigascience 10, (2021).
52. A. McKenna et al., The Genome Analysis Toolkit: a MapReduce framework for analyzing next-
generation DNA sequencing data. Genome Res 20, 1297-1303 (2010).
53. G. A. Van der Auwera et al., From FastQ data to high confidence variant calls: the Genome
Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics 43, 11 10 11-11 10 33 10
(2013).
54. A. Wilm et al., LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering
cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 40,
11189-11201 (2012).
55. C. W. Nelson, L. H. Moncla, A. L. Hughes, SNPGenie: estimating evolutionary parameters to 15
detect natural selection using pooled next-generation sequencing data. Bioinformatics 31, 3709-
3711 (2015).
56. Y. Shu, J. McCauley, GISAID: Global initiative on sharing all influenza data - from vision to
reality. Euro Surveill 22, (2017).
57. K. Katoh, D. M. Standley, MAFFT multiple sequence alignment software version 7: 20
improvements in performance and usability. Molecular biology and evolution 30, 772-780
(2013).
58. B. Q. Minh et al., IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in
the Genomic Era. Mol Biol Evol 37, 1530-1534 (2020).
59. S. Kalyaanamoorthy, B. Q. Minh, T. K. F. Wong, A. von Haeseler, L. S. Jermiin, ModelFinder: 25
fast model selection for accurate phylogenetic estimates. Nat Methods 14, 587-589 (2017).
60. Z. W. Arendsee, A. L. Vincent Baker, T. K. Anderson, smot: a python package and CLI tool for
contextual phylogenetic subsampling. . Journal of Open Source Software 7, 4193 (2022).
61. M. N. Price, P. S. Dehal, A. P. Arkin, FastTree 2–approximately maximum-likelihood trees for
large alignments. PloS one 5, e9490 (2010). 30
62. A. Rambaut, T. T. Lam, L. Max Carvalho, O. G. Pybus, Exploring the temporal structure of
heterochronous sequences using TempEst (formerly Path-O-Gen). Virus evolution 2, vew007
(2016).
63. P. Sagulenko, V. Puller, R. A. Neher, TreeTime: Maximum-likelihood phylodynamic analysis.
Virus Evol 4, vex042 (2018). 35
64. A. Stamatakis, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large
phylogenies. Bioinformatics 30, 1312-1313 (2014).
65. L. T. Nguyen, H. A. Schmidt, A. von Haeseler, B. Q. Minh, IQ-TREE: a fast and effective
stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268-274
(2015). 40
66. J. Soubrier et al., The influence of rate heterogeneity among sites on the time dependence of
molecular rates. Mol Biol Evol 29, 3345-3358 (2012).
67. D. T. Hoang, O. Chernomor, A. von Haeseler, B. Q. Minh, L. S. Vinh, UFBoot2: Improving the
Ultrafast Bootstrap Approximation. Mol Biol Evol 35, 518-522 (2018).
68. X. Didelot, C. Fraser, J. Gardy, C. Colijn, Genomic Infectious Disease Epidemiology in Partially 45
Sampled and Ongoing Outbreaks. Mol Biol Evol 34, 997-1007 (2017).
69. X. Didelot, M. Kendall, Y. Xu, P. J. White, N. McCarthy, Genomic Epidemiology Analysis of
Infectious Disease Outbreaks Using TransPhylo. Curr Protoc 1, e60 (2021).
70. A. Bouma et al., Estimation of transmission parameters of H5N1 avian influenza virus in
chickens. PLoS Pathog 5, e1000281 (2009). 50
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
20
71. W. H. Kim, S. Cho, Estimation of the Basic Reproduction Numbers of the Subtypes H5N1,
H5N8, and H5N6 During the Highly Pathogenic Avian Influenza Epidemic Spread Between
Farms. Front Vet Sci 8, 597630 (2021).
72. A. J. Drummond, M. A. Suchard, D. Xie, A. Rambaut, Bayesian phylogenetics with BEAUti and
the BEAST 1.7. Molecular biology and evolution 29, 1969-1973 (2012). 5
73. D. L. Ayres et al., BEAGLE 3: Improved Performance, Scaling, and Usability for a High-
Performance Computing Library for Statistical Phylogenetics. Syst Biol 68, 1052-1061 (2019).
74. D. L. Ayres et al., BEAGLE: an application programming interface and high-performance
computing library for statistical phylogenetics. Systematic biology 61, 170-173 (2011).
75. S. Tavaré, Some probabilistic and statistical problems in the analysis of DNA sequences. Lectures 10
on mathematics in the life sciences 17, 57-86 (1986).
76. Z. Yang, Maximum likelihood phylogenetic estimation from DNA sequences with variable rates
over sites: approximate methods. Journal of Molecular evolution 39, 306-314 (1994).
77. A. J. Drummond, S. Y. Ho, M. J. Phillips, A. Rambaut, Relaxed phylogenetics and dating with
confidence. PLoS biology 4, e88 (2006). 15
78. V. N. Minin, E. W. Bloomquist, M. A. Suchard, Smooth skyride through a rough skyline:
Bayesian coalescent-based inference of population dynamics. Molecular biology and evolution
25, 1459-1471 (2008).
79. A. Rambaut, A. J. Drummond, D. Xie, G. Baele, M. A. Suchard, Posterior Summarization in
Bayesian Phylogenetics Using Tracer 1.7. Syst Biol 67, 901-904 (2018). 20
80. F. Bielejec et al., SpreaD3: interactive visualization of spatiotemporal history and trait
evolutionary processes. Molecular biology and evolution 33, 2167-2169 (2016).
81. A. Alfonso-Morales et al., Evaluation of a Phylogenetic Marker Based on Genomic Segment B of
Infectious Bursal Disease Virus: Facilitating a Feasible Incorporation of this Segment to the
Molecular Epidemiology Studies for this Viral Agent. PLoS One 10, e0125853 (2015). 25
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
21
Fig. 1. Putative transmission pathways of HPAI H5N1 clade 2.3.4.4b genotype B3.13 supported by
epidemiological links, animal movements, and genomic analysis. Three lines of evidence support a
single interspecies transmission event followed by lateral spread within dairy cattle with onward
transmission to poultry. This visualization represents 26 dairy cattle premises (blue boxes) and six poultry 5
premises (green boxes). Herds with unknown disease status are colored orange, and putative unsampled
transmission links are indicated with diamonds. Genomic clusters, specifically those strains that are more
similar to each other than other clusters, are bounded by progressively smaller circles. Genomic links are
light blue arrows, confirmed animal movements are orange arrows, and non-animal links are black arrows.
Premises that appear more than once indicate the presence of genetically distinct viruses sequenced from 10
independent samples on those premises. (A) The gray box represents an unsampled wild bird -origin
common ancestor. The white box is the B3.13 genotype human detection similar but distinct from the
sampled cattle and poultry clusters. Two major groups emerged from an unsampled common ancestor: one
containing dairy premises in Texas and New Mexico along with Parmer 01 poultry, and another containing
TX 001 and three clusters labeled (B), (C), and (D) respectively. An unsampled common source links the 15
premises shown in B, C, and D within each of the clusters. (B) Two known animal movements occurred in
this cluster: TX 007 to OH 001, and TX 007 to MI 001. These are shown in panel C, indicating that
Human
A/Texas/37
TX 001
Parmer
01
NM 004
NM 006
NM 005
NM 007
TX 001
?
Deaf Smith 01
OH 001 MI 001
BA
NM 006
NM 002
TX 007
TX 003
TX 007
?
TX 005
NM 003
TX 004TX 001 TX 011
?
C
MI 002
OH 001
MI 001
Ionia 01
Ionia 02
?
TX 007
TX 010
KS 001
TX 004
TX 010TX 010
Moore
01
TX 008
D
TX 010TX 002
TX 009
KS 003
ID 001
?
KS 003
TX 009
ID 001SD 001KS 003
TX 008
KS 003
NC 001
TX 012
?
KS 002
? TX
Dairy
? MI
Dairy
?
?
?
Wild Bird
Introduction
Non-Animal Epi Link
Affected
Poultry
Unknown
Dz
Status
Poultry
Genomic
Cluster
Unsampled
Transmissions
Affected
Dairy
Dairy
Genomic
ClusterGenomic Epi Link
Animal Epi Link
TX 007
Grant
01
TX 004
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
22
simultaneous transmission of genetically distinct viruses occurred. (C) MI 001 was genetically and
temporally linked to the Ionia 01 poultry premises, and Ionia 01 was genetically, epidemiologically, and
temporally linked to Ionia 02, suggesting a lateral transmission chain. Prior to detection, MI 002 received
an animal movement from an unknown Mich igan dairy premise of unknown disease status. (D)
Phylogenetic analysis suggested several directional transmissions: TX 010 to KS 001, KS 002 to KS 003, 5
and KS 002 to SD 001. The link from KS 002 to SD 001 was supported by an animal movement. Prior to
detection, NC 001 and ID 001 each received an animal movement from the same Texas dairy of unknown
disease status.
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
23
Fig. 2. Evolutionary history of H5N1 2.3.4.4b in North America prior to and during the introduction
and emergence in US cattle. (A) A time-scaled phylogeny of the HA gene between December 2021 and
April 2024 demonstrating the single introduction of the virus from wild birds to dairy cattle with an
estimated date of December 2023 (95% credible interval: October 2023 – January 2024). B3.13 strains 5
inherited PB1, PA, HA, NA, MP, and NS genes from a B3.6 ancestor and acquired different PB2 and NP
genes from North American LPAI viruses. This phylogeny was paraphyletically subsampled to maintain
tree topology and demonstrate the 12 subsequent spillovers from cattle to domestic cats, poultry, and
peridomestic animals and is presented in the supplemental figures . ( B) A simplified visualization of a
hypothesis on the evolutionary history of the H5N1 2.3.4.4b B3.13 genotype that emerged in cattle and a 10
range of other mammalian and avian hosts . This time -scaled phylogeny was inferred using maximum
likelihood methods from concatenated genomes: inferred dates presented at nodes for the most recent
common ancestor were congruent with those estimated using Bayesian methods.
15
Host
Cattle
Wild bird
Wild mammal
Domestic mammal
Poultry
PB2
NP
PB2
PB1
B
A
2022 2022.5 2023 2023.5 2024 2024.5
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
24
Fig. 3. Bayes factors for inferred movement between different discrete traits of H5N1 clade 2.3.4.4b
viruses demonstrating the frequency of movement. Bayes factors and posterior probabilities were plotted
to demonstrate the frequency and support for the movement of the clade 2.3.4.4b B3.13 HA gene between
different host species (A) or US locations with documented detections of 2.3.4.4b (B) with the source host 5
species or US state plotted on the y-axis, and the recipient host species or US state plotted on the x-axis.
This analysis was conducted on the subsampled and aligned hemagglutinin dataset (A), and an extracted
subset of the data that covered the monophyletic clade of cattle HA clade 2.3.4.4b genes (B). When
imposing a Bayes factor > 3 with an associated posterior probability greater than 75%, there were supported
transitions from Texas to Kansas, New Mexico, and Michigan, and from Michigan to North Carolina. (C) 10
Transitions between locations were inferred using a discrete trait model across time and using 14 -day
timepoints of the spatial spread corresponding to beginning, middle, and end of our dataset. T he centroid
of each state was used : for Texas we located the point in the panhandle where dairy cattle are commonly
farmed. The circular polygon reflects the number of lineages within a location and eastward movements
are depicted by lines with a downward curve, while westward movements are depicted by lines with a n 15
upward curve.
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
25
Table 1. Sequence variants detected in functionally relevant sites within H5N1 2.3.4.4b strains
isolated in cattle that have been associated with pathogenicity, host -adaptation, and virulence. Raw
read data from c attle samples were processed and high - and low -frequency single nucleotide variants
(SNVs) were identified relative to the most recent common ancestor of the phylogenetic group in this study.
The SNVs that induced a coding region change were screened against a database of positions associated 5
with functional change with a relevant selection shown here. The number of cattle samples with the SNV
were enumerated, the mean allele frequency was calculated, the presence of the mutation within the
consensus gene sequence was determined, and the variants detected at low frequencies were counted.
Gene
Coding
region
change Functional type
Cattle with
variant (#)
Mean
allele
frequency
Consensus
sequence
Low
frequency
variants
HA E91K increase human adapt 1 0.05 0 1
HA S137F change human adapt 1 0.375 0 1
HA Q154R change pathogenicity 6 0.012 0 6
HA 172A increase human adapt 213 0.999 213 0
HA A172T decrease human adapt 9 0.891 8 1
HA N209T change human adapt 1 0.012 0 1
HA Q234K/R change virulence 9 0.044 0 9
HA G240R increase human adapt 1 0.025 0 1
HA S336N change pathogenicity 17 0.885 15 2
HA P337L increase virulence 8 0.715 5 2
MP R77K change pathogenicity 1 0.006 0 1
MP S207G decrease pathogenicity 4 0.013 0 4
NA A55T virulence / host range 222 0.997 222 0
NA E57G virulence / host range 7 0.016 0 7
NA V67I virulence / host range 222 0.997 222 0
NA N70S/D virulence / host range 221 0.99 221 2
NA N71S virulence / host range 88 0.999 88 0
NA T438A/I increase antiviral resist 3 0.627 2 1
NP V105M decrease pathogenicity 221 0.998 221 0
NS V205I/G change virulence 222 0.998 222 0
NS E229K change virulence 18 0.968 18 0
PB2 V495I change mammal adapt 211 0.997 211 0
PB2 E627K increase virulence/adapt 1 0.327 0 1
PB2 M631L increase mammal adapt 214 0.999 214 0
PB2 D701N increase pathogenicity 2 0.015 0 2
10
and is also made available for use under a CC0 license.
was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted May 1, 2024. ; https://doi.org/10.1101/2024.05.01.591751doi: bioRxiv preprint
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.