Reference Unit, Public Health Microbiology - Reference Microbiology Division, Chief Scientific Officer’s Group, UKHSA, UK
6. NIHR Oxford Biomedical Research Centre, Oxford, UK
7. Oxford University Hospitals NHS Foundation Trust, Oxford, UK

Corresponding author:
Dorottya Nagy ([email protected])
Keywords
Bacterial genomics, Escherichia coli, Klebsiella spp., long-read sequencing, genome assembly
Repositories: Long and short-read sequencing data has been deposited in ENA (BioProject accession: PRJEB93885). Code used for bioinformatic and statistical analyses has been uploaded to GitHub (https://github.com/oxfordmmm/NEKSUS_ont_hybrid_assembly_comparison). Summary data files have been uploaded to FigShare (https://figshare.com/account/home#/projects/253775).
Nanopore long-read only genome assembly of clinical
Enterobacterales isolates is complete and accurate
bioRxiv preprint doi: https://doi.org/10.1101/2025.09.15.676237; this version posted September 17, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Abstract
With recent improvements in Nanopore sequencing accuracy, whole bacterial genome sequence reconstruction using Oxford Nanopore Technologies (“Nanopore”) long-read only sequencing may offer a lower-cost, higher-throughput alternative to ‘hybrid’ assembly for pathogen surveillance. We evaluated the accuracy, including plasmid reconstruction, of Nanopore long-read only genome assemblies of Enterobacterales.
We sequenced 92 genomes from clinical Enterobacterales isolates, collected in England under a national surveillance program, with long-read Nanopore (R10.4.1, Dorado v5.0.0 super-high-accuracy basecalled) and short-read Illumina (NovaSeq) sequencing approaches. Genomes were assembled using three long-read only assemblers (Flye; Hybracter long; Autocycler) and three hybrid assemblers (Hybracter hybrid; Unicycler normal; Unicycler bold). Three polishing modalities (Medaka v2 with subsampled or un-subsampled long-reads; Polypolish + Pypolca with short-reads) were investigated.
Autocycler circularised the most chromosomes (87/92 [95%]). Plasmid sequence reconstruction was comparable between all assemblers except Flye, all recovering 90-96% of plasmids, although the ‘ground truth’ was uncertain. Flye performed worse than other assemblers on almost all metrics. Autocycler + Medaka (un-subsampled long-reads) was the most accurate long-read only assembler/polisher combination, comparable to hybrid assemblies (median 0 [IQR: 0-0] SNPs and 0 [IQR: 0-1] indels per genome; quality value/Q score, 100 [IQR: 64-100]), with only 4/92 genome sequences having >10 SNPs/indels. Medaka polishing with un-subsampled long-reads resulted in small improvements in indels but not SNPs for both Flye and Autocycler assemblies. Seven-locus MLST, antimicrobial resistance, virulence, and stress gene annotations were equivalent across assembler/polisher combinations.
Nanopore long-read only bacterial genome assembly with Autocycler combined with Medaka polishing (using un-subsampled reads) is similarly accurate and possibly more complete than hybrid assemblies, representing a viable alternative for incorporating high-quality genomic data, including plasmids, into Enterobacterales surveillance.
Data Summary
Nanopore long-reads and Illumina short-reads from the 92 Enterobacterales isolates from this study have been uploaded to ENA (BioProject accession: PRJEB93885). Code for the Nextflow assembly pipeline, downstream analysis scripts, and R statistical analysis scripts is available on GitHub (https://github.com/oxfordmmm/NEKSUS_ont_hybrid_assembly_comparison). The following supplementary data tables are available on FigShare (https://figshare.com/account/home#/projects/253775):
• ENA Sample accessions and sample metadata (accessions_and_metadata.csv)
• Seqkit stats summaries of the Illumina and Nanopore reads (raw_qc_sup.csv)
• Summary of assembly contig features (contigs_summary_sup_cleaned.csv)
• Pairwise mash distances between contigs (mash_cleaned.csv)
• Plasmids matching across different assemblers compared to the Hybracter (hybrid) and manually-curated reference sets (plasmids_match_hybracter_mash.csv; plasmids_match_manual_mash.csv, respectively)
• Seven-locus multi-locus sequence type annotation (mlst_cleaned.csv)
• CheckM2 summaries of assemblies (checkm2_cleaned.csv)
• Nucleotide-level accuracy of assemblies (SNPs, indels, and quality value compared to short-read mapping; assembly_nucleotide_accuracy_cleaned.csv)
• Bakta annotation (bakta_by_contig_cleaned.csv)
• AMRFinderPlus annotations of contigs (amrfinder_plus_cleaned.csv)
• MOB-suite annotation summaries of contigs (mobsuite_cleaned.csv)
Impact Statement
Nanopore long-reads have historically been too error-prone to use alone for accurate bacterial genome assembly, necessitating additional Illumina short-reads to achieve structurally complete and accurate ‘hybrid’ genome assemblies for public health surveillance. This increases cost and complexity. Previous studies have shown that recent improvements in Nanopore chemistry (R10.4.1 flowcell) and basecalling (super-high accuracy) allow high-quality long-read only assemblies on a small number of laboratory reference strains. This is the first evaluation, to our knowledge, of Nanopore long-read only genome assembly compared with hybrid assembly on a large number of clinical isolates. In addition, this is the first large-scale evaluation of the recently released automated consensus long-read assembly tool, Autocycler.
We show that Autocycler long-read only assemblies are more structurally complete for chromosomal sequences, while reconstructing a similar number of plasmids to other long-read and hybrid assemblers. Most long-read polished, Autocycler-assembled genome sequences have 0 errors (median: 0 SNPs/indels) relative to short-read polished (hybrid) Autocycler assemblies, enabling accurate annotation of key genes.
Introduction
Hybrid assembly combining short- and long-read genomic sequencing is widely used in research to assemble complete and accurate bacterial genome sequences. Incremental improvements in Nanopore flowcells/chemistry (R10.4.1 flowcell/kit 14) and basecalling accuracy (Dorado v5.0.0 super-high accuracy DNA model)(1-5) have been shown in small-scale evaluations to facilitate long-read only assemblies that may now be comparable in accuracy to hybrid assembly(6, 7). Nanopore-only sequencing may also offer advantages over hybrid sequencing, including cost effectiveness, real-time data generation and decentralised implementation(8, 9).
Highly accurate bacterial genome reconstruction, with minimal noise from sequencing artefacts, is key for identifying closely-related clusters of isolates and plasmids for outbreak detection(10). Accurate reconstruction of mobile genetic elements (MGEs), plasmids in particular, is clinically and epidemiologically important as plasmids are common transmission vectors for antimicrobial resistance (AMR) genes in clinically-relevant Enterobacterales(11, 12). Long-read or hybrid assembly approaches can facilitate plasmid sequence reconstruction, and therefore analysis of AMR gene epidemiology, compared to short-reads, which may not be able to resolve highly repetitive sequences often associated with MGEs(13, 14). Nevertheless, Nanopore-only genome assembly accuracy has only been validated for a small number of reference bacterial isolates(15, 16), and has not yet been assessed on a large collection of clinical isolates, including for plasmids as well as chromosomes. This may be important because long-read basecalling models rely on training datasets of unknown size and diversity, so their performance may generalise poorly to clades not included in these training datasets. Similarly, although best-practice assembly guidelines have been proposed(6, 17, 18), multiple long-read assembly pipelines implement these guidelines with slight variations(16, 19-22), and no robust consensus exists, particularly regarding the optimal strategy for plasmid assembly.
In this study, we comprehensively evaluated the completeness and accuracy of 92 Nanopore long-read only assemblies (with and without polishing) compared to hybrid assembly in reconstructing both chromosomes and plasmids using isolates collected in the National Escherichia coli and KlebSiella spp. bloodstream infection (BSI) and Carbapenemase-producing Enterobacterales (CPE) UK Surveillance (NEKSUS) study.
Methods
Isolate collection
Nine English NHS Trusts (groups of hospitals under the same administration), representing the largest in terms of number of emergency admissions across all seven NHS England regions, were recruited to the NEKSUS consortium. Consecutive, unselected BSI and CPE-positive rectal screening isolates were collected between October 2023 and March 2024 as part of routine clinical practice. A convenience sample of the first 96 Enterobacterales isolates collected, mostly E. coli and Klebsiella spp. (Table S1), sequenced from three regions, was included in this analysis because our isolates were sequenced in batches of 96. Isolates were stored in brain-heart infusion (BHI) broth with 10% glycerol at -70°C, then grown on blood agar for 24h at 37°C, following which a colony sweep of the pure bacterial culture was suspended in 1 ml phosphate-buffered saline, pelleted, and cold-packed. Bacteria were subcultured for a further 24h at 37°C where there was insufficient growth after 24h.
DNA extraction and sequencing
DNA extraction, library preparation and sequencing were conducted at GENEWIZ Germany GmbH (Leipzig, Germany). DNA was extracted using the MagMAX Microbiome Ultra Nucleic Acid Isolation Kit with bead plate (Life Technologies, Carlsbad, CA, USA). Genomic DNA was quantified using the Qubit 4.0 Fluorometer and qualified using the Agilent 5600 Fragment Analyzer. The same DNA extract was sequenced by both methods.
For Nanopore sequencing the Rapid Barcoding Kit 96 V14 (Oxford Nanopore Technologies, Oxford, UK) was used according to the manufacturer's recommendations. Briefly, sequencing libraries were generated using a transposase, which simultaneously cleaves template molecules and attaches barcoded tags to the cleaved ends. The barcoded samples were then pooled (96-plexed) before solid phase reversible immobilisation (SPRI)-cleaning and addition of Rapid Adapters to the tagged ends. The library pools were loaded onto ONT PromethION flow cells (R10 [M Version]), one 96-plex pool per flow cell, and sequenced on a PromethION P2 Solo for 72 hours according to the manufacturer's instructions.
For Illumina sequencing the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA), including clustering and sequencing reagents, was used according to the manufacturer's recommendations. Briefly, the genomic DNA was fragmented by acoustic shearing with a Covaris LE220 instrument. Fragmented DNA was cleaned up and end repaired. Adapters were ligated after adenylation of the 3’ ends, followed by enrichment by limited cycle PCR. DNA libraries were validated using the Agilent TapeStation (Agilent Technologies, Palo Alto, CA, USA), and were quantified using a Qubit 4.0 Fluorometer. The libraries were multiplexed on a flowcell and loaded on the Illumina NovaSeq X Plus instrument according to the manufacturer's instructions. The samples were sequenced using a 2x150bp paired-end (PE) configuration. Raw sequencing data (.bcl files) generated from Illumina NovaSeq were converted into fastq files and de-multiplexed using Illumina's bcl2fastq(23) v2.20 software.
Bioinformatic analysis
Computational analysis was performed on a virtual machine in the Oracle Cloud Infrastructure. POD5 files were basecalled and demultiplexed using Dorado(24) v5.0.0 (super high accuracy 5mCG, 5hmCG and 6mA methylation aware simplex DNA model). All bioinformatic tools were run using default settings unless otherwise specified. Raw-read quality was evaluated with SeqKit(25) v2.9.0. Long-reads were randomly subsampled to 60x using the built-in subsampling and genome size estimation scripts from Autocycler(20) v0.2.1, and short-reads were randomly subsampled to 100x (50x for each paired-end read) with Rasusa(26) v2.1.0. Genome sequences were assembled using three long-read only assemblers (Flye(27) v2.9.5, Hybracter(19) (long) v0.11.2, the consensus assembler Autocycler(20) v0.2.1), and three hybrid assemblers (Hybracter(19) (hybrid), Unicycler(7) v0.5.1 (normal and bold modes)). The input long-read assemblies used for Autocycler were four assemblies each of Canu(28) v2.2, Flye(27), Raven(29) v1.8.3, Miniasm(30) v0.3, and Hybracter(19) (long) (which incorporates the plasmid assembly tool Plassembler(31)), where each of the four assemblies was derived from a randomly subsampled set of reads. The Flye and Hybracter (long) assemblies from the first subsampled read set were used in downstream analyses. Three polishing modalities were investigated: long-read polishing with one round of Medaka(32) v2.0.1 using 1) subsampled long-reads, 2) un-subsampled long-reads, or 3) short-read polishing with Polypolish(33) v0.6.0 and Pypolca(17, 34) v0.3.1 (‘--careful’ flag; Fig. 1).
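The arithmetic behind depth-based subsampling can be illustrated with a minimal Python sketch. It is not the implementation used by the Autocycler scripts or Rasusa; the function name and the example read-set size are hypothetical, and it simply assumes that mean depth equals total sequenced bases divided by genome size.

```python
def subsample_fraction(total_bases: int, genome_size: int, target_depth: float) -> float:
    """Fraction of sequenced bases to keep so that mean depth is about target_depth.

    Assumes mean depth = total bases / genome size. Returns 1.0 when the
    read set is already at or below the target depth.
    """
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    target_bases = target_depth * genome_size
    return min(1.0, target_bases / total_bases)


# Hypothetical example: 1 Gb of long-reads for a 5 Mb genome (200x),
# subsampled to the 60x long-read target described in the text.
fraction = subsample_fraction(1_000_000_000, 5_000_000, 60)
print(f"keep about {fraction:.0%} of bases")
```

The same calculation with a target of 100 would correspond to the short-read depth described above.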
Assembly quality control
Quality control of assemblies was done using SeqKit(25) stats and CheckM2(35-37) v1.0.2, excluding isolates where any assembly for that isolate fell below the completeness threshold or had >5% contamination. Four of the 96 isolates (4%) had >5% ‘contamination’ based on the CheckM2 output, likely corresponding to mixed isolate sequences (i.e. not pure cultures), so were excluded from subsequent analyses. The remaining 92/96 (96%) pure-culture isolates passed the completeness threshold.
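The contamination-based exclusion rule can be sketched in Python as follows. This is an illustrative sketch only: the input structure and isolate names are hypothetical, and the completeness criterion is omitted because its numeric threshold is not given in the text.

```python
CONTAMINATION_MAX = 5.0  # percent; the contamination threshold stated in the text


def isolates_passing_qc(contamination_by_isolate):
    """Return isolates for which every assembly has contamination <= 5%.

    contamination_by_isolate maps an isolate ID to the CheckM2 contamination
    values (percent) of all assemblies of that isolate. One failing assembly
    is enough to exclude the isolate.
    """
    return sorted(
        isolate
        for isolate, values in contamination_by_isolate.items()
        if all(value <= CONTAMINATION_MAX for value in values)
    )


# Hypothetical values for three isolates
example = {
    "isolate_A": [0.1, 0.3, 0.2],
    "isolate_B": [0.4, 7.9, 0.5],  # one assembly exceeds 5%, so excluded
    "isolate_C": [1.2, 4.9, 5.0],
}
print(isolates_passing_qc(example))
```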
Assembly annotation
Assemblies from all 12 assembler/polisher combinations were annotated using Bakta(38) v1.10.4, 7-locus MLST (mlst(39) v2.23.0), AMRFinderPlus v4.0.3 (species flag inferred from Kraken2(40) v2.1.3) and MOB-suite(41, 42) v3.1.9 (mob_recon and mob_typer).
Chromosome evaluation
Assemblies from the six different assemblers (without polishing) were evaluated for structural completeness of chromosomes and plasmids, as polishing is not expected to alter structure. Chromosomes were considered 'fully reconstructed' if the chromosomal contig was >4Mb and circularised.
Plasmid evaluation
Contigs ≥1,000bp and ≤400,000bp in length were considered potential plasmids. Mash distances between all potential pairwise plasmid combinations were calculated using Mash(43, 44) v2.3 (k-mer size = 21, sketch size = 10,000,000).
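For orientation, the Mash distance is derived from the estimated Jaccard index j of two k-mer sketches as D = -(1/k) ln(2j / (1 + j)). The Python sketch below applies that formula to a Jaccard value supplied directly; it does not build MinHash sketches, and the function name is hypothetical.

```python
import math


def mash_distance(jaccard: float, k: int = 21) -> float:
    """Mash distance from a Jaccard index, using k = 21 as in the text.

    D = -(1/k) * ln(2j / (1 + j)); sketches sharing no k-mers (j = 0)
    are given the maximum distance of 1.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be between 0 and 1")
    if jaccard == 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * jaccard / (1.0 + jaccard)) / k)


# Identical k-mer sets give distance 0; a Jaccard index of 0.5 still
# falls below the 0.025 distance threshold used for plasmid matching.
print(mash_distance(1.0), round(mash_distance(0.5), 4))
```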
Plasmid reconstruction was assessed by comparing with two alternative ‘reference’ plasmid sets generated from the assembly data in this study, due to the absence of a ‘ground truth’ for these isolates. The first ‘reference’ plasmid set included all circular potential plasmids recovered by Hybracter (hybrid), which incorporates the plasmid assembly tool Plassembler(31), recommended in best-practice assembly guidance(6). The second ‘reference’ plasmid set was created using a manually-curated consensus approach considering all six assemblies for each isolate. This latter manually-curated reference set was constructed by matching each potential plasmid contig from the six assembly methods to its most similar contig from each other assembler based on mash distance, forming a network with all pairwise assembler combinations. The R package igraph(45, 46) v2.1.4 was used to extract connected components (sub-networks within each sample with at least one mash-distance connection between nodes). Each connected component was assigned a ‘match-set’ ID. Three (out of 303) match-sets (connected components) contained more than one contig per assembler, and were corrected manually (two were likely partial plasmids and one was likely a chimeric Unicycler (bold) plasmid that joined two separate plasmid match-sets together; data not shown). ‘True’ match sets were retained in the manually-curated reference set where at least two assemblers’ contigs were present, circular, of similar length (±10%) and had a low mash distance (<0.025). The 0.025 mash distance threshold reflects the highest possible mash distance between draft and complete plasmid assemblies of the same plasmid from the original MOB-suite publication(42).
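The match-set construction can be illustrated with a small Python sketch that groups contigs into connected components. The study used the R package igraph for this step; the union-find approach, function names and contig labels below are assumptions chosen for a self-contained example.

```python
from collections import defaultdict


def match_sets(contigs, edges):
    """Group contigs into connected components ('match-sets').

    contigs: iterable of contig IDs; edges: pairs of contig IDs linked as
    most-similar matches between assemblers.
    """
    parent = {c: c for c in contigs}

    def find(x):
        # Follow parent links to the component root, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    return sorted(sorted(group) for group in groups.values())


def has_duplicate_assembler(component, assembler_of):
    """Flag a match-set with more than one contig from the same assembler."""
    names = [assembler_of[c] for c in component]
    return len(names) != len(set(names))


# Hypothetical contigs from one isolate
contigs = ["autocycler_p1", "flye_p1", "unicycler_p1", "autocycler_p2", "flye_p2"]
edges = [("autocycler_p1", "flye_p1"), ("flye_p1", "unicycler_p1"),
         ("autocycler_p2", "flye_p2")]
components = match_sets(contigs, edges)
assembler_of = {c: c.split("_")[0] for c in contigs}
print(components, [has_duplicate_assembler(c, assembler_of) for c in components])
```

A match-set flagged by `has_duplicate_assembler` corresponds to the kind of component that was corrected manually in the text.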
Plasmid reconstruction for each assembler was then evaluated, for the Hybracter (hybrid) reference set, by matching potential plasmid contigs to each reference plasmid based on circularity (i.e. circular or linear), length (±10%), and mash distance (<0.025). Plasmids were ‘present’ if all three match criteria were met, or ‘misassembled’ if at least one of the criteria was not met. Plasmids were ‘absent’ if none of the criteria were met, if only the circularity matched (but not length or mash distance), or if no contig from an assembler could be matched to that set. For the manually-curated reference set, where no single reference plasmid was available, mash distance and length similarity criteria were fulfilled if an assembler’s plasmid matched more than half of the other plasmids in a match set (see supplementary data file plasmids_match_manual_mash.csv).
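The three-way classification against a single reference plasmid can be written as a short Python sketch. The function names and the dictionary layout are hypothetical, and the ±10% tolerance is assumed here to be relative to the reference plasmid length, which the text does not specify.

```python
def length_within_tolerance(length, ref_length, tol=0.10):
    """True if length is within ±10% of the reference length (assumed basis)."""
    return abs(length - ref_length) <= tol * ref_length


def plasmid_status(contig, ref_length, ref_circular=True, max_mash=0.025):
    """Classify a matched contig as 'present', 'misassembled' or 'absent'.

    contig is None when no contig could be matched, otherwise a dict with
    'length' (bp), 'circular' (bool) and 'mash' (distance to the reference).
    """
    if contig is None:
        return "absent"
    circ_ok = contig["circular"] == ref_circular
    len_ok = length_within_tolerance(contig["length"], ref_length)
    mash_ok = contig["mash"] < max_mash
    if circ_ok and len_ok and mash_ok:
        return "present"
    if not len_ok and not mash_ok:
        # No criteria met, or only circularity matched
        return "absent"
    return "misassembled"


# Hypothetical contigs compared with a 50,000bp circular reference plasmid
print(plasmid_status({"length": 50_500, "circular": True, "mash": 0.001}, 50_000))
print(plasmid_status({"length": 50_500, "circular": False, "mash": 0.001}, 50_000))
print(plasmid_status({"length": 20_000, "circular": True, "mash": 0.2}, 50_000))
```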
Nucleotide-level accuracy
Nucleotide-level accuracy was assessed in a reference-free manner by aligning Illumina short-reads to the 12 assembler-polisher combinations using the Pypolca(17) in-built read aligner and variant caller (BWA(47) 0.7.18 and Freebayes(48) v1.3.6). Single nucleotide substitutions (SNPs), short insertions/deletions (indels) and quality value (QV) were extracted from the .vcf output file. QV, like Phred score, is a measure of accuracy where higher QV signifies a more accurate consensus (QV = -10 * log10(probability of error), where a 0-error probability takes the value of Q100). Mean gene length was extracted from CheckM2(37) as a further measure of accuracy. Errors may introduce premature stop codons and are thus expected to reduce the length of coding sequences(38).
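As a worked illustration of the QV definition, the Python sketch below converts an error count into a quality value and a per-megabase rate. It assumes the error probability can be approximated as errors divided by assembly length; this is an assumption for illustration and not necessarily how Pypolca derives QV.

```python
import math


def quality_value(errors: int, assembly_length: int, cap: float = 100.0) -> float:
    """QV = -10 * log10(error probability), with zero errors reported as Q100.

    The error probability is approximated as errors / assembly_length
    (an assumption made for this example).
    """
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    if errors < 0:
        raise ValueError("errors must be non-negative")
    if errors == 0:
        return cap
    return min(cap, -10.0 * math.log10(errors / assembly_length))


def errors_per_mb(errors: int, assembly_length: int) -> float:
    """Errors (SNPs or indels) per megabase of assembly."""
    return errors / (assembly_length / 1_000_000)


# Hypothetical 5 Mb genome: 5 errors is one error per million bases
print(quality_value(5, 5_000_000), quality_value(0, 5_000_000))
print(errors_per_mb(1, 5_000_000))
```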
Statistical analyses and visualisations
Statistical analysis and visualisation were done in R(49) v4.4.1 using ggplot2(50) v3.5.1 and other tidyverse(51) v1.3.1 functions, and the gridExtra(52) v2.3, cowplot(53) v1.1.3, psych(54) v2.5.6 and irr(55) v0.84.1 packages. Global tests for uneven proportions in categorical variables were done using the multiple-group Fleiss’ Kappa test, and for continuous variables, with a Friedman test to account for non-independence between different assemblers’ ‘observations’ on the same isolate. Pairwise tests between assemblers were done using McNemar’s χ²-test with continuity correction for differences in proportions, and Wilcoxon signed-rank tests for differences in counts. A Bonferroni correction was applied to all pairwise testing to account for multiple testing. An exact binomial test was used to test whether the proportion of plasmids reconstructed differed significantly from 1 when compared to the Hybracter (hybrid) reference set. Clinker(56) v0.0.31 was used to visualise plasmid alignments using the Bakta(38) v1.10.4 annotated .gbff files.
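The pairwise comparison of proportions can be sketched in Python as follows. The study ran these tests in R; this standalone version is an illustration only, with hypothetical function names and counts, and it clamps the continuity-corrected numerator at zero, which is a choice made here rather than something stated in the text.

```python
import math


def mcnemar_cc(b: int, c: int):
    """McNemar's chi-squared test with continuity correction.

    b and c are the discordant pair counts, e.g. isolates circularised by
    assembler A but not B (b) and by B but not A (c). Returns (statistic, p).
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0
    # Numerator clamped at zero so that b == c gives a statistic of 0
    statistic = max(0.0, abs(b - c) - 1.0) ** 2 / n
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical discordant counts for one assembler pair, adjusted for 5 tests
statistic, p_value = mcnemar_cc(10, 2)
print(round(statistic, 3), round(p_value, 4), round(bonferroni(p_value, 5), 4))
```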
Figure 1: Schematic diagram of assembly, polishing and downstream analysis pipeline.
[Figure not reproduced. Stages shown: Raw QC (seqkit stats); QC and subsampling (Nanopore long-reads subsampled x4 with the Autocycler scripts into subsampled long-read sets 01-04; Illumina short-reads subsampled x1 with Rasusa); Assembly (Canu, Flye, Raven, Miniasm and Hybracter (long) assemblies 01-04 combined by Autocycler; Hybracter (hybrid), Unicycler (normal) and Unicycler (bold)); Polishing (Flye and Autocycler assemblies unpolished, + Medaka (subsampled), + Medaka (un-subsampled), and + Polypolish + Pypolca); Structure evaluation (chromosome, plasmids, MOB-suite, Mash); Accuracy evaluation (SNPs, indels, CheckM2, MLST, AMRFinder, Bakta).]
Results
Raw sequences
High sequencing depth and quality were achieved for both Illumina short- and Nanopore long-reads
Over 200x sequencing depth was achieved for both Illumina and Nanopore reads (Table S2). Median long-read length was 5814bp (IQR: 5366-6338), and median estimated Phred quality score was 16.6 (IQR: 16.4-16.8). Subsampling did not affect median read length or read quality (Table S2; Fig. S1).
Structural completeness
Chromosome reconstruction was optimal using the consensus long-read only assembler, Autocycler
Autocycler circularised the most chromosomal sequences, 95% (87/92), significantly more than Unicycler (80% [74/92], pairwise McNemar’s p=0.006), Unicycler (bold) (85% [78/92], p=0.039), Flye (85% [78/92], p=0.027) and Hybracter (hybrid) (86% [79/92], p=0.043), while there was no statistical evidence of a difference to Hybracter (long) (87% [80/92], p=0.070; Table 1; Fig. 2a). Notably, for two isolates that were correctly assembled by all other assemblers, Autocycler failed to generate a circular consensus chromosome (Fig. 2a), producing highly fragmented draft assemblies instead.
Plasmid reconstruction was improved by Autocycler or Hybracter compared with Flye
Given the absence of a ‘ground truth’ for plasmids in the sequenced isolates, we considered two ‘reference’ plasmid sets generated from the assembly data. The first was the Hybracter (hybrid) reference set, and the second, a manually-curated reference set considering potential plasmids across all assemblers. All plasmids from the Hybracter (hybrid) reference set (n=278) were present in the manually-curated set. However, the manually-curated set included an additional 25 plasmids (total 303 vs 278 plasmids), which were missing from the Hybracter (hybrid) reference set, mostly due to being non-circular (17/25, 68%), or non-circular and of different length (3/25, 12%), while 5/25 (20%) plasmid sets could not be matched to any Hybracter (hybrid) contigs not already in another match set (all pairwise mash distances >0.2; Table S3).
Compared with the Hybracter (hybrid) reference set, Flye reconstructed significantly fewer plasmids than all the other assemblers (56% [156/278]; exact binomial test p<0.0001 vs 100% reconstructed by Hybracter (hybrid) and McNemar’s p<0.0001 vs Autocycler, Hybracter (long), Unicycler, and Unicycler (bold)). Among the remaining assemblers, 93-96% of plasmids were reconstructed, which was significantly fewer than 100% of the Hybracter (hybrid) reference set (all exact binomial test p<0.0001; Table 1; Fig. 2b). There was no evidence of a difference between the 96% (267/278) of plasmids reconstructed by Autocycler compared to the other assemblers besides Flye (Hybracter (long) 96% [268/278], McNemar’s p=1 vs Autocycler; Unicycler 96% [266/278], p=1; and Unicycler (bold) 93% [258/278], p=0.095).
Similarly, compared with the manually-curated reference set, Flye reconstructed significantly fewer plasmids than all other assemblers (55% [166/303]; pairwise McNemar’s p<0.0001 vs each of the five other assemblers). Flye more frequently missed or misassembled small, <10,000bp, plasmids (Fig. 2c; S2b), and incorrect length was the most common reason for Flye plasmid misassembly (Table 1; S2). Among the remaining assemblers, 90-94% of plasmids were reconstructed compared to the manually-curated reference set. Autocycler reconstructed 94% (285/303) of plasmids, significantly more than Hybracter (long) (90% [272/303]; McNemar’s p=0.014). However, there was no evidence of a difference between the number of plasmids reconstructed by Autocycler compared to the other assemblers: Hybracter (hybrid) (91% [276/303]; McNemar’s p=0.066 vs Autocycler), Unicycler (93% [282/303]; p=1), or Unicycler (bold) (90% [274/303]; p=0.296; Table S3; Fig. S2a).
Of the 10 Autocycler plasmids with a mash distance of 0 to the corresponding Hybracter (hybrid) plasmid, 2/10 had a missing MOB-suite IncFIC replicon annotation despite identical sequence (Fig. S3). In both cases, the Autocycler plasmid was reversed (i.e. the reverse complement strand was represented in the fasta file) compared with the other plasmids. The Flye plasmid sequence was also missing an IncFIC annotation in one of these two plasmids; however, this difference was not observed in the other 232 plasmids across other assemblers with a mash distance of 0 to the Hybracter (hybrid) reference.
Table 1: Chromosomal sequence circularisation and accuracy of plasmid sequence reconstruction for different assemblers using Dorado v5.0.0 super-high accuracy basecalled Nanopore long-reads. Plasmid sequence reconstruction was compared with the Hybracter (hybrid) plasmid reference dataset, defined as circular contigs ≤400,000bp and ≥1,000bp assembled by Hybracter (hybrid) (n=278) across 92 Enterobacterales isolates analysed; the denominator for plasmids was therefore 278 throughout.

| Assembler | Autocycler n (%) | Flye n (%) | Hybracter (long) n (%) | Hybracter (hybrid) n (%) | Unicycler n (%) | Unicycler (bold) n (%) | p-value† |
|---|---|---|---|---|---|---|---|
| Chromosomes circularised (N=92) | 87 (94.6%) | 78 (84.8%) | 80 (87.0%) | 79 (85.9%) | 74 (80.4%) | 78 (84.8%) | <0.0001 |
| Present* plasmids (N=278) | 267 (96%) | 156 (56.1%) | 268 (96.4%) | 278 (100%) | 266 (95.7%) | 258 (92.8%) | 0.002 |
| Misassembled** plasmids (N=278) | | | | | | | |
| Non-circular | 0 (0%) | 16 (5.8%) | 2 (0.7%) | 0 (0%) | 4 (1.4%) | 2 (0.7%) | |
| Length mismatch | 1 (0.4%) | 41 (14.7%) | 1 (0.4%) | 0 (0%) | 1 (0.4%) | 1 (0.4%) | |
| Mash distance >0.025 | 0 (0%) | 1 (0.4%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| Non-circular & length mismatch | 0 (0%) | 17 (6.1%) | 2 (0.7%) | 0 (0%) | 3 (1.1%) | 6 (2.2%) | |
| Absent plasmids (N=278) | 10 (3.6%) | 47 (16.9%) | 5 (1.8%) | 0 (0%) | 4 (1.4%) | 11 (4%) | |

*‘Present’ plasmids are defined as contigs meeting all three match criteria: circular, length within 10% and mash distance <0.025 of a Hybracter hybrid reference plasmid.
**Misassembled plasmids are defined as contigs that failed to meet at least one of the matching criteria, or were non-circular and a different length (>10% difference).
†p-value for Fleiss’ Kappa test for uneven proportions of circularised chromosomes or ‘present’ plasmids across all assemblers.
Figure 2: Structural completeness of 92 pure culture Enterobacterales genome sequences assembled by different long-read only and hybrid assemblers. Genome sequences were assembled using Dorado v5.0.0 super-high accuracy basecalled Nanopore long-reads, plus Illumina short-reads for hybrid assembly. a) Number and percentage of isolates with a fully circularised chromosome (dark-coloured tiles) or an incompletely circularised chromosome (light cream tiles) by assembler. b) Upset plot of plasmid assembly status combinations across assemblers. Plasmid sequence reconstruction (assembly status) is compared to a Hybracter (hybrid) plasmid reference dataset, defined as circular contigs ≤400,000bp and ≥1,000bp assembled by Hybracter (hybrid) (n=278) across the 92 Enterobacterales isolates analysed. Dark circles represent ‘present’ plasmids where length (±10%), mash distance (<0.025) and circularity matched the reference; lighter shades indicate misassembled plasmids where length differed by >10%, mash distance was >0.025, or the contig was non-circular; and the palest shades indicate absent plasmids, where no contig was found matching other plasmids in the reference plasmid set. c) Frequency polygon of length distribution of ‘present’ plasmids by assembler.
Assembly accuracy
Unpolished Autocycler assemblies are more accurate than non-consensus long-read
assemblers, while differences compared with hybrid assemblers are small
Autocycler was the most accurate long-read only assembler, with 37% of unpolished
assemblies (34/92) having 0 SNPs or indels, compared with 11% (10/92) for unpolished
Flye and 7% (6/92) for Hybracter (long). For unpolished Autocycler, this equated to a median of 0
SNPs/Mb (IQR: 0-0.17) and 0.18 indels/Mb (IQR: 0-0.39), and a median quality value (QV) of Q67
(IQR: 63-100; Fig. 3a-c; Table S4). The differences in accuracy between unpolished Autocycler and
unpolished Flye or Hybracter (long) were significant (pairwise Wilcoxon signed rank p<0.0001
for SNPs, indels and QV), while there was no evidence of a difference in accuracy between
unpolished Autocycler and Unicycler (normal or bold mode; p=1 for all metrics). There was no
evidence of a difference between Flye and Hybracter (long) assemblies (Fig. 3a-c; Table S4).
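As a rough illustration of how the per-megabase error rates and quality values reported here relate to one another, the sketch below converts an error count into errors/Mb and a Phred-scaled QV. It assumes QV = -10·log10(errors / assembly length), with error-free assemblies capped at Q100 (consistent with ‘QV100’ being reported for assemblies with 0 errors). This is an assumption for illustration and not taken from the study code; the function names and the 5 Mb genome size are hypothetical.

```python
import math

QV_CAP = 100.0  # assumption: error-free assemblies are reported as Q100


def errors_per_mb(n_errors: int, assembly_length_bp: int) -> float:
    """Errors (SNPs or indels) per megabase of assembled sequence."""
    return n_errors / (assembly_length_bp / 1_000_000)


def phred_qv(n_errors: int, assembly_length_bp: int, cap: float = QV_CAP) -> float:
    """Phred-scaled quality value: -10 * log10(errors / length), capped at `cap`."""
    if n_errors == 0:
        return cap
    return min(cap, -10.0 * math.log10(n_errors / assembly_length_bp))


if __name__ == "__main__":
    genome_bp = 5_000_000  # hypothetical genome size
    for n in (0, 1, 5):
        print(n, round(errors_per_mb(n, genome_bp), 2), round(phred_qv(n, genome_bp), 1))
```

Under these assumptions, a single residual error in a 5 Mb assembly corresponds to roughly Q67, which is the median QV reported for unpolished Autocycler.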
Medaka long-read polishing offers small improvements in accuracy for long-read assemblies,
although short-read polishing is still marginally more accurate
Medaka long-read polishing (with un-subsampled reads) improved accuracy for
Autocycler and Flye by improving QV and reducing indels (Autocycler: median Q67 to Q100 [Wilcoxon
signed rank p=0.007] and 0.18 indels/Mb to 0 [p=0.006]; Flye: median Q61 to Q67 [p<0.0001] and 0.57
indels/Mb to 0.17 [p<0.001]), but there was no evidence that it reduced SNPs (p=1 for
both Autocycler and Flye). There was some statistical evidence that Medaka long-read polishing
using un-subsampled long-reads was marginally better at reducing indels for Autocycler
assemblies than using subsampled reads (change vs unpolished Autocycler of median 0 indels/Mb
[IQR: -0.19 to 0; range: -1.64 to 3.61] for un-subsampled reads, compared to a change of 0
[IQR: -0.18 to 0; range: -1.09 to 7.60] indels/Mb for subsampled reads, Wilcoxon signed rank
p=0.019; Fig. 3; Table S4). However, this very small difference is not reflected in the
medians/IQR of indels/Mb, as most isolates had 0 indels
(57% [52/92] for Autocycler + Medaka [subsampled] and 65% [60/92] for Autocycler + Medaka
[un-subsampled]).
Short-read polished Autocycler assemblies were more accurate than the best long-read
polished Autocycler assemblies (Autocycler + Medaka [un-subsampled]) (change vs unpolished
Autocycler of median 0 [IQR: -0.16 to 0] SNPs/Mb, -0.18 [-0.39 to 0] indels/Mb, and Q32.6
[Q0-Q35.9] for short-read polishing vs median change 0 [0-0] SNPs/Mb, 0 [-0.19 to 0] indels/Mb, and
Q0 [Q0-Q6.15] for Medaka [un-subsampled] polishing; pairwise Wilcoxon signed rank p=0.0002,
p<0.0001 and p<0.0001, respectively; Fig. 3; Table S4). However, the absolute difference was
small, and affected only the worst-performing quartile of isolates. The majority, 55% (51/92), of
Autocycler + Medaka (un-subsampled reads) polished assemblies had 0 errors (QV100), and
only 4% (4/92) of genome sequences had >10 SNPs or indels in the entire assembly, compared
with 95% (87/92) of short-read polished Autocycler assemblies having 0 errors and two genome
sequences with >10 SNPs or indels (Fig. 3a-c; Table S4).
Mean gene length is slightly shorter for Flye assemblies, and is not corrected by long or
short-read polishing
Mean gene length was assessed as a further measure of accuracy, as small errors can
result in coding sequence truncation and a shorter average gene length. There was some
statistical evidence of a difference in mean gene length between assembler/polisher
combinations, with unpolished and long-read polished Flye assemblies having a slightly shorter
mean gene length than other assemblers (Friedman’s p<0.0001; pairwise Wilcoxon
signed rank p<0.0001 to p=0.01 compared to all other assemblers). However, the difference was small in
magnitude (median of the mean gene length across all isolates of 312bp [IQR: 308-315bp] for
Flye + Medaka (subsampled) polishing, vs 312bp [309-316bp] for all non-Flye assemblers;
Fig. 3d).
Gene annotation for MLST loci, resistance, virulence and stress genes is equivalent for long-read
and hybrid assemblies
There was no evidence of a difference in the numbers of key resistance, virulence and
stress genes identified by AMRFinder Plus in assemblies generated by any assembler/polisher
combination (Friedman’s p=0.209 for resistance, p=0.736 for virulence, and p=0.687 for stress
genes; all pairwise Wilcoxon signed-rank p=1; Table S4). There was high concordance between
assemblers on the presence/absence of specific gene variants (all pairwise McNemar’s
p>0.209). There was also no evidence of a difference in the proportion of isolates with correctly
assigned multi-locus sequence type (MLST; all pairwise McNemar’s p=1, Table S4). Hybracter
(long; hybrid), Unicycler (normal; bold), and polished Flye assemblies were annotated with
identical MLST-types for all 91 isolates belonging to a species with available MLST-typing
schemes (i.e. all isolates except one Serratia marcescens). A single locus in one isolate was
‘uncertain’ for the unpolished Flye assembly (gapA(~2)), and another locus (gyrB(10)) was
duplicated in a different isolate amongst Autocycler assemblies. Polishing did not correct this
duplicated annotation, although the allele was correctly identified.
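To make the pairwise McNemar comparisons above more concrete, the following is a minimal sketch of an exact (binomial) McNemar test on paired correct/incorrect calls for two assemblers. Whether the study used the exact or the chi-squared form of the test is not stated here, so the exact form is an assumption, and the function name is hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: isolates called correctly by assembler A only
    c: isolates called correctly by assembler B only
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as extreme as (k, n - k) in one tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # One isolate discordant between two assemblers, as for a single MLST call
    print(mcnemar_exact(1, 0))
```

With a single discordant isolate between two assemblers the exact test returns p=1, which is in line with the pairwise p-values reported for MLST assignment.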
Figure 3: Assembly accuracy for different assembler/polisher combinations. a) Single nucleotide
substitution errors (SNPs) and b) insertion/deletions (indels) identified by re-aligning Illumina short-reads, c)
quality value as annotated by Freebayes(48) from Pypolca(17) and d) mean gene length from CheckM2(37) of
12 different assembler/polisher combinations. The y-axes in a), b) and c) are transformed using a pseudo-log
scale to facilitate plotting zero values given log(0) is undefined.
Discussion
We evaluated three long-read only bacterial genome assemblers, three hybrid
assemblers, and three polishers on 92 clinical Enterobacterales isolates. The consensus
long-read assembler, Autocycler, produced the most structurally complete assemblies, circularising
95% of chromosomes. Plasmid reconstruction was comparable between all assemblers except
Flye, which underperformed compared with other assemblers for most metrics. Autocycler with
Medaka polishing was the most accurate long-read only assembler/polisher combination, with
a median of 0 SNPs/indels compared to what we consider the ‘gold-standard’ hybrid assembly
(i.e. short-read polished Autocycler assemblies). Long-read polishing of Autocycler and Flye
assemblies offered small improvements in accuracy compared to unpolished assemblies,
although short-read polishing still corrected marginally more errors. There was strong
agreement in the annotation of seven-locus MLSTs, resistance, virulence and stress genes, and
mean gene length across all assemblers.
It is not surprising that long-read assemblers circularise more chromosomes, as long-reads
can resolve repetitive regions that short-reads may not. This explains why the long-read first
hybrid assembler, Hybracter (hybrid), performed more similarly to other long-read assemblers
than Unicycler, which uses short-reads first to reconstruct overall structure. The ability of
Autocycler to circularise eight chromosomes where non-consensus assemblers failed supports
the utility of this software(57). Combining 20 input assemblies in Autocycler may reduce the
effects of stochastic variation in individual assemblers. The 2/92 isolates where Autocycler
produced fragmented assemblies, while some of its input assemblies were complete, are
noteworthy. This result is perhaps attributable to regions of input assemblies that are too
divergent to resolve, and highlights the need for an iterative approach, where a ‘fallback’ option
is available in case of a highly fragmented Autocycler consensus assembly. This also
emphasises the importance of quality controls (e.g. CheckM2) to flag highly fragmented
assemblies, so that for these cases, manual curation of input assemblies, optimising
parameters in the consensus process, or reversion to complete input assemblies may improve
assembly.
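The ‘fallback’ idea described above could be sketched along the following lines. This is only an illustration under stated assumptions: the `Assembly` record, the contig-count threshold used to call an assembly fragmented, and the preference for the complete input assembly with the fewest contigs are all hypothetical choices, not part of Autocycler or of the pipeline used in this study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assembly:
    name: str                  # hypothetical label, e.g. "autocycler" or "flye_01"
    n_contigs: int
    chromosome_circular: bool


def is_fragmented(asm: Assembly, max_contigs: int = 20) -> bool:
    """Flag an assembly as fragmented (the threshold is an illustrative assumption)."""
    return (not asm.chromosome_circular) or asm.n_contigs > max_contigs


def choose_assembly(consensus: Assembly, inputs: List[Assembly],
                    max_contigs: int = 20) -> Optional[Assembly]:
    """Keep the consensus unless it is fragmented; otherwise fall back to an input."""
    if not is_fragmented(consensus, max_contigs):
        return consensus
    complete = [a for a in inputs if not is_fragmented(a, max_contigs)]
    if complete:
        # Illustrative tie-break: the complete input assembly with the fewest contigs
        return min(complete, key=lambda a: a.n_contigs)
    return None  # nothing complete: flag the isolate for manual curation


if __name__ == "__main__":
    consensus = Assembly("autocycler", 35, False)
    inputs = [Assembly("flye_01", 4, True), Assembly("hybracter_01", 3, True)]
    print(choose_assembly(consensus, inputs))
```

In practice the tie-break would need care, since preferring fewer contigs could favour an input assembly that has missed small plasmids.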
Evaluation of chromosomal and plasmid sequence reconstruction is challenging due to the
absence of a ‘ground truth’. For plasmids specifically, there is a risk of mislabelling plasmids by
methods reliant on reference databases, which may be incomplete or contain misassembled
plasmids. We therefore considered two reference plasmid sets generated from the study data.
Compared with both reference sets, none of the six assemblers had ‘perfect’ concordance. Flye
performed poorly compared to all other assemblers, missing or misassembling ~45% of
plasmids compared with 4-10% for other assemblers. Flye struggled particularly with small
(<10,000bp) plasmids, as reported previously(16, 58). This emphasises the necessity of
consensus methods like Autocycler(57), and separate plasmid recovery tools like
Plassembler(31), to optimise plasmid reconstruction. The fact that Autocycler (including four
Hybracter (long) input assemblies) reconstructed a slightly different set of plasmids to a single
Hybracter (long/hybrid) assembly suggests complementarity between these methods, where
Autocycler can overcome potential issues related to stochastic variation in individual
assemblies. The replicon annotation differences between identical plasmids highlight the risks
of relying on plasmid-annotation tools like MOB-suite for plasmid identification(59), and
support the use of network-based tools like PLING(60).
The small differences in nucleotide-level accuracy between long- and short-read polished
Autocycler assemblies are likely not in coding regions that are key for downstream analyses.
This is evidenced by the strong agreement in MLST profile, resistance, virulence and stress gene
annotations, and mean gene length between assemblers.
The advantage of our study is that we consider a relatively large sample of real-world,
clinically-relevant isolates. Specifically, our sample included predominantly E. coli and K.
pneumoniae, which are the two most important Gram-negative species in England in terms of
number of bloodstream infections and burden of AMR(61), and therefore our findings are
relevant to public health surveillance in this setting. However, a trade-off with this is the
absence of ‘ground truth’ sequences against which to evaluate our assemblies. Other
limitations include the empirical assessment of nucleotide-level accuracy, through aligning
short-reads to assemblies. Both SNPs and indels were still present in a small number of
short-read polished assemblies, potentially representing a baseline level of errors in either Illumina
reads or read mapping, and leading to possible overestimation of the error rate of long-read only
assemblies. A further limitation is that the performance of Autocycler as a consensus method
depends on its input assemblies. Twenty input assemblies were used here, requiring substantial
computational time (13,428 CPUh), mostly due to generating assemblies, and resulting in a high
carbon footprint, equivalent to driving 164 km (see Environmental Impact Statement).
Furthermore, a closed consensus chromosome was not achieved for 5% of isolates using
default settings. Optimisation of Autocycler input assemblies and parameters, such as
weighting contigs from certain ‘more reliable’ assemblers, as done in more recent automated
Autocycler v5 pipelines(20), could thus reduce computational load and improve performance.
Incorporating a ‘fallback’ option in Autocycler pipelines, for example to revert to one of the
complete input assemblies in cases of a highly fragmented Autocycler consensus, may also be
of benefit. Finally, generalisability to other bacterial species is limited. Other species may be
less well represented than E. coli and Klebsiella spp. in the machine-learning training datasets
for basecalling (Dorado) and polishing (Medaka) software, producing potentially different error
rates.
Conclusions
This assembly comparison is the first benchmarking study to demonstrate structural
completeness and accuracy of Nanopore super-high accuracy long-read only bacterial genome
assemblies on 92 clinical Enterobacterales isolates, compared with hybrid assembly. The
automated consensus long-read assembler, Autocycler, accurately reconstructed assemblies,
including plasmids, for these isolates, and is a promising tool for integrating Nanopore
long-read only assemblies into an automatable computational pipeline for public health genomics.
Ongoing innovation in Nanopore sequencing technology and bioinformatic software may enable
further improvements and should continue to be evaluated by the bioinformatics community.
Environmental Impact Statement
The Nextflow assembly pipeline used for this work ran in 72h on two AMD EPYC 9J14
96-Core Processors (188 total CPUs; 13,428 CPUh), and drew 124.46 kWh. Using Cloud
infrastructure based in the United Kingdom, this had a carbon footprint of 28.76 kgCO2e,
equivalent to 2.61 tree-years, or 164 km in a car (calculated using green-algorithms.org
v3.0(62)). This is a lower bound estimate of the carbon footprint of this work, as it does not
account for compute used in pipeline development, downstream statistical analyses, or the
energy required to power display screens. The carbon footprint and wider environmental impact
of sample processing and shipping have also not been accounted for.
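The figures in this statement can be cross-checked with some simple arithmetic. The sketch below back-calculates the conversion factors implied by the reported totals (wall-clock hours, carbon intensity per kWh, and the per-tree-year and per-km equivalences). The constants are the values reported above; the function name is hypothetical, and the derived factors are implied by those values rather than taken from the green-algorithms.org documentation.

```python
# Values reported in the Environmental Impact Statement
CPU_HOURS = 13_428
N_CPUS = 188
ENERGY_KWH = 124.46
CARBON_KG = 28.76
TREE_YEARS = 2.61
CAR_KM = 164


def implied_factors() -> dict:
    """Back-calculate the conversion factors implied by the reported totals."""
    return {
        "wall_clock_h": CPU_HOURS / N_CPUS,
        "kgCO2e_per_kWh": CARBON_KG / ENERGY_KWH,
        "kgCO2e_per_tree_year": CARBON_KG / TREE_YEARS,
        "kgCO2e_per_car_km": CARBON_KG / CAR_KM,
    }


if __name__ == "__main__":
    for name, value in implied_factors().items():
        print(f"{name}: {value:.3f}")
```

The implied wall-clock time of about 71.4 h is consistent with the stated run time of 72h.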
Conflict of interest
The authors have no conflicts of interest to declare.
Funding information
This study is funded by the National Institute for Health Research
(NIHR) Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial
Resistance (NIHR207397), a partnership between the UK Health Security Agency (UKHSA) and
the University of Oxford. This work was also supported by the UKHSA, the NIHR Oxford
Biomedical Research Centre (BRC) and the UKHSA PhD Funding Competition. The cloud
compute infrastructure for this work was donated by Oracle Corporation Infrastructure. The
views expressed are those of the authors and not necessarily those of the NIHR, UKHSA or the
Department of Health and Social Care.
Ethical approval and consent to participate
This work has been reviewed and approved by the UKHSA Research Ethics &
Governance Group (reference NR0429).
Consent for publication
All authors give consent for publication of this work. No further consent for publication
was required as this work does not include patient identifiable information.
Author contributions
NS, SL, SH, DC, ASW, JR, KLH, AL, DW, RH and CSB were involved in conceptualisation,
funding acquisition, project administration, provision of resources and supervision. VP, GR, KH,
CRJ and NEKSUS consortium members were involved in isolate collection and processing.
Methodological development and validation of bioinformatic methods and software was done
by DN under the supervision of SL and NS. DN, SL and NS were involved with data curation,
analysis, investigation, visualisation and writing/editing. All authors approved the final draft.
Acknowledgments
The authors would also like to acknowledge all participating laboratories in the NEKSUS
consortium who were responsible for isolate collection, Zeynab Yusuf from UKHSA for her role
in sample transportation from UKHSA to Oxford, laboratory and bioinformatician colleagues at
the Modernising Medical Microbiology Unit at the University of Oxford for support in
methodological development and execution, as well as GENEWIZ Germany GmbH (Leipzig,
Germany) for performing long- and short-read sequencing.
Individuals within the NEKSUS consortium group authorship are (listed alphabetically):
- Alan McNally (University Hospitals Birmingham NHS Foundation Trust)
- Caroline Cullerton (The Newcastle-upon-Tyne Hospitals NHS Foundation Trust)
- Gabriella Shanks (Barts Health NHS Trust)
- James Price (University Hospital Sussex NHS Foundation Trust)
- Jasvir Nahl (Leeds Teaching Hospitals NHS Trust)
- Jenny Bradbury (UKHSA)
- Jonathan Lambourne (Barts Health NHS Trust)
- Julie Samuel (The Newcastle-upon-Tyne Hospitals NHS Foundation Trust)
- Jumoke Sule (UKHSA/Cambridge University Hospitals NHS Foundation Trust)
- Ian Butler (Barts Health NHS Trust)
- Kavita Sethi (Leeds Teaching Hospitals NHS Trust)
- Mark Garvey (University Hospitals Birmingham NHS Foundation Trust)
- Martin Williams (University Hospitals Bristol and Weston NHS Foundation Trust)
- Nicholas Brown (Cambridge University Hospitals NHS Foundation Trust)
- Nicola Childs (North Bristol NHS Trust)
- Paul Randell (University Hospital Sussex NHS Foundation Trust)
- Poorvi Patel (Cambridge University Hospitals NHS Foundation Trust)
- Samuel Stafford (North Bristol NHS Trust)
- Samuel Tetley (University Hospital Sussex NHS Foundation Trust)
- Simon Eccles (Manchester University Hospitals NHS Foundation Trust)
Supplementary Figures
Supplementary Figure S1: Quality control metrics of raw and subsampled Illumina
short-reads and Dorado v5.0.0 super accurate basecalled Nanopore long-reads. Showing
long-read subsampled set 1 (of 4) for the 92 pure culture isolates. N50 and N50_num (or L50) are
both measures of sequence contiguity(63). N50 is the sequence length of the shortest contig at
50% of the total assembly length. N50_num is defined as the count of the smallest number of
contigs whose added length makes up at least half of genome size.
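The N50 and N50_num (L50) definitions in this caption can be illustrated with a short sketch. It applies equally to contig or read lengths, and it uses the summed length of the input sequences as the denominator (an assumption where the caption refers to genome size); the function name is hypothetical.

```python
from typing import List, Tuple


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of sequence lengths.

    N50: length of the shortest sequence in the smallest set of longest
         sequences that together cover at least half of the total length.
    L50: the number of sequences in that set (N50_num).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches the total")


if __name__ == "__main__":
    print(n50_l50([2, 3, 4, 5, 6]))
```

For the lengths 2, 3, 4, 5 and 6 (total 20), the two longest sequences sum to 11, which covers at least half of the total, so N50 is 5 and L50 is 2.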
Supplementary Figure S2: Plasmid sequence reconstruction for 92 Enterobacterales
isolates by different long-read only and hybrid assemblers, using the manually-curated
consensus ‘reference’ plasmid set (n=303 plasmids). Reference plasmids in the manually
curated set are circular contigs between 1,000-400,000bp in length that are present in at least 2
assemblers with a matching length (±10%) and mash distance (<0.025). a) Upset plot showing
assembly status combinations of plasmids across assemblers. Dark circles/bars indicate
‘present’ plasmids where length (±10%), mash distance (<0.025) and circularity matched the
reference; lighter shades indicate misassembled plasmids where length differed by >10%, mash
distance was >0.025, or the contig was non-circular; and the palest
shades indicate absent plasmids, where no contig was found matching other plasmids in the
reference plasmid set. b) Frequency polygon of length distribution of ‘present’ plasmids by
assembler.
Supplementary Figure S3: Clinker plots of highly similar plasmids with different MOB-suite
annotations. Replicon annotations are shown in bright red and labelled. Other mobility- and
replication-associated plasmid machinery are shown in pale red and labelled. a) An 85,796bp
IncFIA, IncFIB, IncFIC, rep_cluster_2131 plasmid sequence (isolate AF14) with a missing IncFIC
annotation in the Autocycler and Flye assemblies (top 2), despite a mash distance of 0 between
Autocycler and Hybracter (hybrid) assemblies. b) A 133,309bp IncFIA, IncFIB, IncFIC plasmid
sequence (isolate AHB7) with the IncFIC replicon annotation missing from the Autocycler
plasmid sequence, despite a mash distance of 0 between the Autocycler and Hybracter (hybrid)
plasmid sequences. Note the Autocycler plasmid sequence is reversed and the Flye plasmid
has a different starting point for both plasmids. The Flye plasmid is also reversed in a) compared
to the bottom 4 assemblers’ plasmids.
Supplementary Table S1: Species of the 92 pure culture Enterobacterales isolates, as
assigned by Kraken2(40).

| Species | Count (percentage) |
|---|---|
| Escherichia coli | 58 (63%) |
| Klebsiella pneumoniae | 21 (23%) |
| Klebsiella oxytoca | 6 (7%) |
| Klebsiella aerogenes | 2 (2%) |
| Enterobacter hormaechei | 2 (2%) |
| Citrobacter freundii | 1 (1%) |
| Citrobacter portucalensis | 1 (1%) |
| Serratia marcescens | 1 (1%) |
Supplementary Table S2: Raw and subsampled sequencing read metrics for Illumina
short-read and Nanopore long-read sequences for 92 pure culture Enterobacterales isolates.

| Metric | Read type | Raw reads, median (IQR) | Subsampled reads, median (IQR) |
|---|---|---|---|
| Read depth (x genome) | Short-read | 290 (232-340) | 104 (100-108) |
| Read depth (x genome) | Long-read | 217 (158-313) | 64 (59-70) |
| Read length (bp) | Short-read | 150 (150-150) | 150 (150-150) |
| Read length (bp) | Long-read | 5858 (5366-6338) | 5849 (5398-6370) |
| Read quality (Q score) | Short-read | 23.6 (23.3-23.8) | 23.6 (23.3-23.8) |
| Read quality (Q score) | Long-read | 16.6 (16.4-16.8) | 16.6 (16.4-16.8) |

Supplementary Table S3: Plasmid reconstruction accuracy of different long-read only and
hybrid assemblers for Dorado v5.0.0 super accurate basecalled Nanopore long-reads.
Plasmid reconstruction is compared to a manually-curated reference set of ‘consensus’
plasmids (n=303), where ‘consensus’ plasmids were circular contigs 1,000-400,000bp in length
present across at least 2 assemblers with a similar length (±10%) and close mash distance
(<0.025).

| Plasmid status | Autocycler, n (%) | Flye, n (%) | Hybracter (long), n (%) | Hybracter (hybrid), n (%) | Unicycler, n (%) | Unicycler (bold), n (%) | p-value† |
|---|---|---|---|---|---|---|---|
| Present* plasmids | 285 (94.1%) | 166 (54.8%) | 272 (89.8%) | 276 (91.1%) | 282 (93.1%) | 274 (90.4%) | <0.0001 |
| Misassembled** plasmids: non-circular | 0 (0%) | 18 (5.9%) | 12 (4.0%) | 13 (4.3%) | 5 (1.7%) | 3 (1%) | |
| Misassembled** plasmids: length mismatch | 7 (2.3%) | 50 (16.5%) | 1 (0.3%) | 2 (0.7%) | 6 (2.0%) | 7 (2.3%) | |
| Misassembled** plasmids: non-circular and length mismatch | 0 (0%) | 30 (9.9%) | 7 (2.3%) | 7 (2.3%) | 4 (1.3%) | 8 (2.6%) | |
| Absent*** plasmids | 11 (3.6%) | 39 (12.9%) | 11 (3.6%) | 5 (1.7%) | 6 (2.0%) | 11 (3.6%) | |

*‘Present’ plasmids are defined as contigs 1,000-400,000bp in length meeting all three match criteria: circular, length
(±10%) and mash distance (<0.025) of the manually curated reference set of plasmids.
**Misassembled plasmids are defined as contigs that failed to meet at least 1 of the matching criteria, but could still
be matched to the reference set based on a more distant mash distance.
***Absent plasmids were cases where only the circularity matched, or where, for an assembler, no contig could be
matched to the rest of the reference plasmids match set based on mash distance.
†p-value for Fleiss’ Kappa test for uneven proportions of ‘present’ plasmids across all assemblers.
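The ‘present’, ‘misassembled’ and ‘absent’ categories defined in the footnotes to Supplementary Table S3 can be expressed as a small decision rule. The sketch below is one reading of those footnotes and not the study’s implementation: it assumes a candidate contig has already been paired with a reference plasmid (for example by mash distance), and the `Contig` record and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    length_bp: int
    circular: bool


def classify_plasmid(ref_length_bp: int,
                     contig: Optional[Contig],
                     mash_dist: Optional[float],
                     length_tol: float = 0.10,
                     mash_max: float = 0.025) -> str:
    """Classify one reference plasmid for one assembler.

    present:      circular, length within ±10% and mash distance <0.025
    absent:       no candidate contig, or neither length nor mash criterion met
                  (i.e. at most the circularity matched)
    misassembled: a candidate exists but fails at least one criterion
    """
    if contig is None or mash_dist is None:
        return "absent"
    length_ok = abs(contig.length_bp - ref_length_bp) <= length_tol * ref_length_bp
    mash_ok = mash_dist < mash_max
    if contig.circular and length_ok and mash_ok:
        return "present"
    if not length_ok and not mash_ok:
        return "absent"
    return "misassembled"


if __name__ == "__main__":
    print(classify_plasmid(100_000, Contig(100_500, True), 0.001))
    print(classify_plasmid(100_000, Contig(120_000, False), 0.010))
```

How a candidate contig is paired with a reference plasmid at a "more distant" mash distance is not specified in the footnotes, so that step is deliberately left to the caller.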
Supplementary Table S4: Nucleotide-level accuracy of 12 assembler-polisher
combinations (7 long-read only, 5 hybrid). Read-alignment metrics were derived by aligning
Illumina short-reads to each assembler-polisher combination and variant calling with
Freebayes(48) from Pypolca. Mean gene length is derived from CheckM2(37) output files.
7-locus MLST is annotated by mlst(39), and key resistance, virulence and stress genes by
AMRFinder Plus(64).

| Metric | Autocycler (none) | Autocycler + Medaka (subsampled) | Autocycler + Medaka (un-subsampled) | Autocycler + Polypolish + Pypolca | Flye (none) | Flye + Medaka (subsampled) | Flye + Medaka (un-subsampled) | Flye + Polypolish + Pypolca | Hybracter (long) | Hybracter (hybrid) | Unicycler (normal) | Unicycler (bold) | p-value* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLST (N=91)†, n (%) | 90 (99%)†† | 90 (99%)†† | 90 (99%)†† | 90 (99%)†† | 90 (99%)†† | 91 (100%) | 91 (100%) | 91 (100%) | 91 (100%) | 91 (100%) | 91 (100%) | 91 (100%) | <0.0001 |
| SNPs/Mb, median (IQR) | 0 (0-0.17) | 0 (0-0) | 0 (0-0) | 0 (0-0) | 0.18 (0-1.17) | 0 (0-0.52) | 0 (0-0.7) | 0 (0-0) | 0.2 (0-1.26) | 0 (0-0) | 0 (0-0.37) | 0 (0-0.37) | <0.0001 |
| SNPs/Mb, range | 0-6.54 | 0-7.45 | 0-5.27 | 0-3.09 | 0-10.81 | 0-35.38 | 0-41.79 | 0-4.08 | 0-35.37 | 0-10.41 | 0-4 | 0-4 | |
| Indels/Mb, median (IQR) | 0.18 (0-0.39) | 0 (0-0.2) | 0 (0-0.19) | 0 (0-0) | 0.57 (0.19-1.13) | 0.18 (0-0.51) | 0.17 (0-0.36) | 0 (0-0) | 0.39 (0.19-0.75) | 0 (0-0) | 0 (0-0.2) | 0 (0-0.34) | <0.0001 |
| Indels/Mb, range | 0-9.5 | 0-17.11 | 0-13.12 | 0-5.45 | 0-34.11 | 0-18.66 | 0-22.35 | 0-4.47 | 0-16.71 | 0-12.42 | 0-16.21 | 0-16.21 | |
| QV, median (IQR) | 67 (63-100) | 100 (64-100) | 100 (64-100) | 100 (100-100) | 61 (57-67) | 67 (60-100) | 67 (61-100) | 100 (100-100) | 60 (58-64) | 100 (100-100) | 67 (62-100) | 67 (62-100) | <0.0001 |
| QV, range | 48.8-100 | 47.3-100 | 48.4-100 | 50.7-100 | 43.48-100 | 42.7-100 | 41.9-100 | 51.7-100 | 42.8-100 | 46.4-100 | 46.9-100 | 46.9-100 | |
| CheckM2 mean gene length (bp), median (IQR) | 312 (309-316) | 312 (309-316) | 312 (309-316) | 312 (309-316) | 312 (309-316) | 312 (308-315) | 312 (308-316) | 312 (309-315) | 312 (309-316) | 312 (309-316) | 312 (309-316) | 312 (309-316) | <0.0001 |
| CheckM2 mean gene length (bp), range | 300-323 | 300-323 | 300-323 | 300-323 | 299-323 | 299-323 | 299-323 | 299-323 | 300-323 | 300-323 | 298-324 | 300-324 | |
| AMRFinder Plus AMR genes, median (IQR) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 3 (1-7) | 4 (1-7) | 0.209 |
| AMRFinder Plus AMR genes, range | 0-18 | 0-18 | 0-18 | 0-18 | 0-18 | 0-18 | 0-18 | 0-18 | 0-17 | 0-18 | 0-17 | 0-17 | |
| AMRFinder Plus stress genes, median (IQR) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-2) | 1 (0-2) | 0.687 |
| AMRFinder Plus stress genes, range | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | |
| AMRFinder Plus virulence genes, median (IQR) | 1 (0-7) | 1 (0-7) | 1 (0-7) | 1 (0-7) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 0.736 |
| AMRFinder Plus virulence genes, range | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | |

*p-value for Fleiss’ Kappa test for uneven proportions of isolates with correct MLST profiles annotated across all
assemblers, or Friedman’s test for global differences in continuous variables across all assemblers.
†MLST typing schemes were only available for 91/92 pure culture isolates. The excluded sample was identified as
Serratia marcescens.
††The incorrectly assigned MLST in one isolate by Autocycler consensus assemblies, with or without polishing, was
due to duplication of one of the seven housekeeping genes (gyrB(10,10)).