Nanopore long-read only genome assembly of clinical Enterobacterales isolates is complete and accurate

2025 · doi:10.1101/2025.09.15.676237

preprint OA: gold CC-BY-4.0

Reference Unit, Public Health Microbiology - Reference Microbiology Division, Chief Scientific Officer’s Group, UKHSA, UK
6. NIHR Oxford Biomedical Research Centre, Oxford, UK
7. Oxford University Hospitals NHS Foundation Trust, Oxford, UK

Corresponding author: Dorottya Nagy ([email protected])

Keywords

Bacterial genomics, Escherichia coli, Klebsiella spp., long-read sequencing, genome assembly

Repositories: Long and short-read sequencing data has been deposited in ENA (BioProject accession: PRJEB93885). Code used for bioinformatic and statistical analyses has been uploaded to GitHub (https://github.com/oxfordmmm/NEKSUS_ont_hybrid_assembly_comparison). Summary data files have been uploaded to FigShare (https://figshare.com/account/home#/projects/253775).

bioRxiv preprint, doi: https://doi.org/10.1101/2025.09.15.676237; this version posted September 17, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Abstract

With recent improvements in Nanopore sequencing accuracy, whole bacterial genome sequence reconstruction using Oxford Nanopore Technologies (“Nanopore”) long-read only sequencing may offer a lower-cost, higher-throughput alternative to ‘hybrid’ assembly for pathogen surveillance. We evaluated the accuracy, including plasmid reconstruction, of Nanopore long-read only genome assemblies of Enterobacterales.

We sequenced 92 genomes from clinical Enterobacterales isolates, collected in England under a national surveillance program, with long-read Nanopore (R10.4.1, Dorado v5.0.0 super-high-accuracy basecalled) and short-read Illumina (NovaSeq) sequencing approaches. Genomes were assembled using three long-read only assemblers (Flye; Hybracter long; Autocycler) and three hybrid assemblers (Hybracter hybrid; Unicycler normal; Unicycler bold). Three polishing modalities (Medaka v2 with subsampled or un-subsampled long-reads; Polypolish + Pypolca with short-reads) were investigated.

Autocycler circularised the most chromosomes (87/92 [95%]). Plasmid sequence reconstruction was comparable between all assemblers except Flye, all recovering 90-96% of plasmids, although the ‘ground truth’ was uncertain. Flye performed worse than other assemblers on almost all metrics. Autocycler + Medaka (un-subsampled long-reads) was the most accurate long-read only assembler/polisher combination, comparable to hybrid assemblies (median 0 [IQR: 0-0] SNPs and 0 [IQR: 0-1] indels per genome; quality value/Q score, 100 [IQR: 64-100]), with only 4/92 genome sequences having >10 SNPs/indels. Medaka polishing with un-subsampled long-reads resulted in small improvements in indels but not SNPs for both Flye and Autocycler assemblies. Seven-locus MLST, antimicrobial resistance, virulence, and stress gene annotations were equivalent across assembler/polisher combinations.
Nanopore long-read only bacterial genome assembly with Autocycler combined with Medaka polishing (using un-subsampled reads) is similarly accurate to, and possibly more complete than, hybrid assembly, representing a viable alternative for incorporating high-quality genomic data, including plasmids, into Enterobacterales surveillance.

Data Summary

Nanopore long-reads and Illumina short-reads from the 92 Enterobacterales isolates from this study have been uploaded to ENA (BioProject accession: PRJEB93885). Code for the Nextflow assembly pipeline, downstream analysis scripts, and R statistical analysis scripts is available on GitHub (https://github.com/oxfordmmm/NEKSUS_ont_hybrid_assembly_comparison).
The following supplementary data tables are available on FigShare (https://figshare.com/account/home#/projects/253775):
• ENA Sample accessions and sample metadata (accessions_and_metadata.csv)
• Seqkit stats summaries of the Illumina and Nanopore reads (raw_qc_sup.cav)
• Summary of assembly contig features (contigs_summary_sup_cleaned.csv)
• Pairwise mash distances between contigs (mash_cleaned.csv)
• Plasmids matching across different assemblers compared to the Hybracter (hybrid) and manually-curated reference sets (plasmids_match_hybracter_mash.csv; plasmids_match_manual_mash.csv, respectively)
• Seven-locus multi-locus sequence type annotation (mlst_cleaned.csv)
• CheckM2 summaries of assemblies (checkm2_cleaned.csv)
• Nucleotide-level accuracy of assemblies (SNPs, indels, and quality value compared to short-read mapping; assembly_nucleotide_accuracy_cleaned.csv)
• Bakta annotation (bakta_by_contig_cleaned.csv)
• AMRFinderPlus annotations of contigs (amrfinder_plus_cleaned.csv)
• MOB-suite annotation summaries of contigs (mobsuite_cleaned.csv)

Impact Statement

Nanopore long-reads have historically been too error-prone to use alone for accurate bacterial genome assembly, necessitating additional Illumina short-reads to achieve structurally complete and accurate ‘hybrid’ genome assemblies for public health surveillance. This increases cost and complexity. Previous studies have shown that recent improvements in Nanopore chemistry (R10.4.1 flowcell) and basecalling (super-high accuracy) allow high-quality long-read only assemblies on a small number of laboratory reference strains. To our knowledge, this is the first evaluation of Nanopore long-read only genome assembly compared with hybrid assembly on a large number of clinical isolates. In addition, this is the first large-scale evaluation of the recently released automated consensus long-read assembly tool, Autocycler.
We show that Autocycler long-read only assemblies are more structurally complete for chromosomal sequences, while reconstructing a similar number of plasmids to other long-read and hybrid assemblers. Most long-read polished, Autocycler-assembled genome sequences have 0 errors (median: 0 SNPs/indels) relative to short-read polished (hybrid) Autocycler assemblies, enabling accurate annotation of key genes.

Introduction

Hybrid assembly combining short- and long-read genomic sequencing is widely used in research to assemble complete and accurate bacterial genome sequences. Incremental improvements in Nanopore flowcells/chemistry (R10.4.1 flowcell/kit 14) and basecalling accuracy (Dorado v5.0.0 super-high accuracy DNA model)(1-5) have been shown in small-scale evaluations to facilitate long-read only assemblies that may now be comparable in accuracy to hybrid assembly(6, 7). Nanopore-only sequencing may also offer advantages over hybrid sequencing, including cost effectiveness, real-time data generation and decentralised implementation(8, 9).

Highly accurate bacterial genome reconstruction, with minimal noise from sequencing artefact, is key for identifying closely-related clusters of isolates and plasmids for outbreak detection(10). Accurate reconstruction of mobile genetic elements (MGEs), plasmids in particular, is clinically and epidemiologically important, as plasmids are common transmission vectors for antimicrobial resistance (AMR) genes in clinically-relevant Enterobacterales(11, 12). Long-read or hybrid assembly approaches can facilitate plasmid sequence reconstruction, and therefore analysis of AMR gene epidemiology, compared to short-reads, which may not be able to resolve the highly repetitive sequences often associated with MGEs(13, 14). Nevertheless, Nanopore-only genome assembly accuracy has only been validated for a small number of reference bacterial isolates(15, 16), and has not yet been assessed on a large collection of clinical isolates, including for plasmids as well as chromosomes. This may be important because long-read basecalling models rely on training datasets of unknown size and diversity, so their performance may generalise poorly to clades not included in these training datasets. Similarly, although best-practice assembly guidelines have been proposed(6, 17, 18), multiple long-read assembly pipelines implement these guidelines with slight variations(16, 19-22), and no robust consensus exists, particularly regarding the optimal strategy for plasmid assembly.

In this study, we comprehensively evaluated the completeness and accuracy of 92 Nanopore long-read only assemblies (with and without polishing) compared to hybrid assembly in reconstructing both chromosomes and plasmids, using isolates collected in The National Escherichia coli and KlebSiella spp. bloodstream infection (BSI) and Carbapenemase-producing Enterobacterales (CPE) UK Surveillance (NEKSUS) study.

Methods

Isolate collection

Nine English NHS Trusts (groups of hospitals under the same administration), representing the largest by number of emergency admissions across all seven NHS England regions, were recruited to the NEKSUS consortium. Consecutive, unselected BSI and CPE-positive rectal screening isolates were collected between October 2023 and March 2024 as part of routine clinical practice. A convenience sample of the first 96 Enterobacterales isolates collected and sequenced, from three regions and mostly E. coli and Klebsiella spp. (Table S1), was included in this analysis, as our isolates were sequenced in batches of 96. Isolates were stored in brain-heart infusion (BHI) broth with 10% glycerol at -70°C, then grown on blood agar for 24h at 37°C, following which a colony sweep of the pure bacterial culture was suspended in 1 ml phosphate-buffered saline, pelleted, and cold-packed. Where there was insufficient growth after 24h, bacteria were subcultured for a further 24h at 37°C.

DNA extraction and sequencing

DNA extraction, library preparation and sequencing were conducted at GENEWIZ Germany GmbH (Leipzig, Germany). DNA was extracted using the MagMAX Microbiome Ultra Nucleic Acid Isolation Kit with bead plate (Life Technologies, Carlsbad, CA, USA). Genomic DNA was quantified using the Qubit 4.0 Fluorometer and qualified using the Agilent 5600 Fragment Analyzer. The same DNA extract was sequenced by both methods.
For Nanopore sequencing, the Rapid Barcoding Kit 96 V14 (Oxford Nanopore Technologies, Oxford, UK) was used according to the manufacturer's recommendations. Briefly, sequencing libraries were generated using a transposase, which simultaneously cleaves template molecules and attaches barcoded tags to the cleaved ends. The barcoded samples were then pooled (96-plexed) before solid phase reversible immobilisation (SPRI) cleaning and addition of Rapid Adapters to the tagged ends. The library pools were loaded onto ONT PromethION flow cells (R10 [M Version]), with one 96-plex pool per flow cell, and sequenced on a PromethION P2 Solo for 72 hours according to the manufacturer's instructions.

For Illumina sequencing, the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA), including clustering and sequencing reagents, was used according to the manufacturer's recommendations. Briefly, the genomic DNA was fragmented by acoustic shearing with a Covaris LE220 instrument. Fragmented DNA was cleaned up and end repaired. Adapters were ligated after adenylation of the 3’ ends, followed by enrichment by limited-cycle PCR. DNA libraries were validated using the Agilent TapeStation (Agilent Technologies, Palo Alto, CA, USA), and were quantified using a Qubit 4.0 Fluorometer. The libraries were multiplexed on a flowcell and loaded on the Illumina NovaSeq X Plus instrument according to the manufacturer's instructions. The samples were sequenced using a 2x150bp paired-end (PE) configuration. Raw sequencing data (.bcl files) generated from Illumina NovaSeq were converted into fastq files and de-multiplexed using Illumina's bcl2fastq(23) v2.20 software.

Bioinformatic analysis

Computational analysis was performed on a virtual machine in the Oracle Cloud Infrastructure.
POD5 files were basecalled and demultiplexed using Dorado(24) v5.0.0 (super-high accuracy 5mCG, 5hmCG and 6mA methylation-aware simplex DNA model). All bioinformatic tools were run using default settings unless otherwise specified. Raw-read quality was evaluated with SeqKit(25) v2.9.0. Long-reads were randomly subsampled to 60x using the built-in subsampling and genome size estimation scripts from Autocycler(20) v0.2.1, and short-reads were randomly subsampled to 100x (50x for each paired-end read) with Rasusa(26) v2.1.0. Genome sequences were assembled using three long-read only assemblers (Flye(27) v2.9.5, Hybracter(19) (long) v0.11.2, the consensus assembler Autocycler(20) v0.2.1), and three hybrid assemblers (Hybracter(19) (hybrid), Unicycler(7) v0.5.1 (normal and bold modes)). The input long-read assemblies used for Autocycler were four assemblies each of Canu(28) v2.2, Flye(27), Raven(29) v1.8.3, Miniasm(30) v0.3, and Hybracter(19) (long) (which incorporates the plasmid assembly tool Plassembler(31)), where each of the four assemblies was derived from a separate randomly subsampled set of reads. The Flye and Hybracter (long) assemblies from the first subsampled read set were used in downstream analyses. Three polishing modalities were investigated: 1) long-read polishing with one round of Medaka(32) v2.0.1 using subsampled long-reads, 2) the same using un-subsampled long-reads, and 3) short-read polishing with Polypolish(33) v0.6.0 and Pypolca(17, 34) v0.3.1 (‘--careful’ flag; Fig. 1).
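As a rough illustration of the depth-based subsampling described above, the sketch below converts a target depth into a base budget and randomly draws reads until that budget is reached. It is not the algorithm implemented by the Autocycler scripts or Rasusa; the function names, the 5 Mb genome size and the read lengths are hypothetical and used only for illustration.

```python
import random


def target_bases(genome_size_bp, depth):
    """Total number of bases needed to cover a genome at the requested depth."""
    return int(genome_size_bp * depth)


def subsample_reads(read_lengths, genome_size_bp, depth, seed=0):
    """Randomly pick read indices until their summed length reaches the budget.

    Illustrative only (assumption): real tools may select reads differently.
    Returns the sorted indices of the chosen reads and their total length.
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)  # random order, reproducible through the seed
    budget = target_bases(genome_size_bp, depth)
    chosen, total = [], 0
    for idx in order:
        if total >= budget:
            break
        chosen.append(idx)
        total += read_lengths[idx]
    return sorted(chosen), total


# Hypothetical 5 Mb genome: bases required for 60x long-read depth
print(target_bases(5_000_000, 60))  # 300000000
```

If fewer bases are available than the budget requires, the sketch simply returns every read, so the achieved depth should be checked against the target.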
Assembly quality control

Quality control of assemblies was done using SeqKit(25) stats and CheckM2(35-37) v1.0.2, excluding isolates where any assembly for that isolate had >5% contamination or failed the completeness threshold. 4/96 (4%) isolates had >5% ‘contamination’ based on the CheckM2 output, likely corresponding to mixed isolate sequences (i.e. not pure cultures), so were excluded from subsequent analyses. The remaining 92/96 (96%) pure-culture isolates passed the completeness threshold.

Assembly annotation

Assemblies from all 12 assembler/polisher combinations were annotated using Bakta(38) v1.10.4, 7-locus MLST (mlst(39) v2.23.0), AMRFinderPlus v4.0.3 (species flag inferred from Kraken2(40) v2.1.3) and MOB-suite(41, 42) v3.1.9 (mob_recon and mob_typer).

Chromosome evaluation

Assemblies from the six different assemblers (without polishing) were evaluated for structural completeness of chromosomes and plasmids, as polishing is not expected to alter structure. Chromosomes were considered 'fully reconstructed' if the chromosomal contig was >4Mb and circularised.

Plasmid evaluation

Contigs ≥1,000bp and ≤400,000bp in length were considered potential plasmids. Mash distances between all pairwise combinations of potential plasmids were calculated using Mash(43, 44) v2.3 (k-mer size = 21, sketch size 10,000,000).

Plasmid reconstruction was assessed by comparing with two alternative ‘reference’ plasmid sets generated from the assembly data in this study, due to the absence of a ‘ground truth’ for these isolates. The first ‘reference’ plasmid set included all circular potential plasmids recovered by Hybracter (hybrid), which incorporates the plasmid assembly tool Plassembler(31), recommended in best-practice assembly guidance(6). The second ‘reference’ plasmid set was created using a manually-curated consensus approach considering all six assemblies for each isolate.
This latter manually-curated reference set was constructed by matching each potential plasmid contig from the six assembly methods to its most similar contig from each other assembler based on mash distance, forming a network with all pairwise assembler combinations. The R package igraph(45, 46) v2.1.4 was used to extract connected components (sub-networks within each sample with at least one mash-distance connection between nodes). Each connected component was assigned a ‘match-set’ ID. Three (out of 303) match-sets (connected components) contained more than one contig per assembler, and were corrected manually (two were likely partial plasmids and one was likely a chimeric Unicycler (bold) plasmid that joined two separate plasmid match-sets together; data not shown). ‘True’ match sets were retained in the manually-curated reference set where at least two assemblers’ contigs were present, circular, of similar length (±10%) and had a low mash distance (<0.025). The 0.025 mash distance threshold reflects the highest possible mash distance between draft and complete plasmid assemblies of the same plasmid from the original MOB-suite publication(42).

For the Hybracter (hybrid) reference set, plasmid reconstruction for each assembler was then evaluated by matching potential plasmid contigs to each reference plasmid based on circularity (i.e. circular or linear), length (±10%), and mash distance (<0.025). Plasmids were ‘present’ if all three match criteria were met, or ‘misassembled’ if at least one of the criteria was not met.
Plasmids were ‘absent’ if none of the criteria were met, if only the circularity matched (but not length or mash distance), or if no contig from an assembler could be matched to that set. For the manually-curated reference set, where no single reference plasmid was available, mash distance and length similarity criteria were fulfilled if an assembler’s plasmid matched more than half of the other plasmids in a match set (see supplementary data file plasmids_match_manual_mash.csv).

Nucleotide-level accuracy

Nucleotide-level accuracy was assessed in a reference-free manner by aligning Illumina short-reads to the 12 assembler-polisher combinations using the Pypolca(17) in-built read aligner and variant caller (BWA(47) 0.7.18 and Freebayes(48) v1.3.6). Single nucleotide substitutions (SNPs), short insertions/deletions (indels) and quality value (QV) were extracted from the .vcf output file. QV, like the Phred score, is a measure of accuracy where a higher QV signifies a more accurate consensus (QV = -10 × log10(probability of error), where a 0-error probability takes the value of Q100). Mean gene length was extracted from CheckM2(37) as a further measure of accuracy. Errors may introduce premature stop codons and are thus expected to reduce the length of coding sequences(38).

Statistical analyses and visualisations

Statistical analysis and visualisation were done in R(49) v4.4.1 using ggplot2(50) v3.5.1 and other tidyverse(51) v1.3.1 functions, gridExtra(52) v2.3, cowplot(53) v1.1.3, psych(54) v2.5.6 and irr(55) v0.84.1 packages. Global tests for uneven proportions in categorical variables were done using the multiple-group Fleiss’ Kappa test, and for continuous variables with a Friedman test, to account for non-independence between different assemblers’ ‘observations’ on the same isolate.
Pairwise tests between assemblers for differences in proportions were done using McNemar’s χ²-test with continuity correction and, for differences in counts, with Wilcoxon signed-rank tests. A Bonferroni correction was applied to all pairwise tests to account for multiple testing. An exact binomial test was used to test whether the proportion of plasmids reconstructed, compared to the Hybracter (hybrid) reference set, differed significantly from 1. Clinker(56) v0.0.31 was used to visualise plasmid alignments using the Bakta(38) v1.10.4 annotated .gbff files.

Figure 1: Schematic diagram of assembly, polishing and downstream analysis pipeline.
[Figure 1 content: flow diagram. Raw QC (seqkit stats) and subsampling (Nanopore long-reads ×4 with Autocycler scripts, giving subsampled long-read sets 01-04; Illumina short-reads ×1 with Rasusa) → assembly (Flye, Raven, Miniasm, Hybracter (long) and Canu on each of the four subsampled long-read sets, combined by Autocycler; Hybracter (hybrid); Unicycler (normal); Unicycler (bold)) → polishing of Flye 01 and Autocycler assemblies (unpolished; Medaka with subsampled long-reads; Medaka with un-subsampled long-reads; Polypolish + Pypolca) → structure evaluation (chromosome, plasmids, MOB-suite, Mash) and accuracy evaluation (SNPs, indels, CheckM2, MLST, AMRFinder, Bakta).]
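The pairwise testing described in the statistical methods can be sketched as follows. This is a minimal Python illustration of McNemar's χ²-test with continuity correction followed by a Bonferroni adjustment, not the R code used in the study; the discordant counts (b, c) and the number of tests are hypothetical, as the per-isolate discordant counts are not reported in the text.

```python
import math


def mcnemar_cc(b, c):
    """McNemar's chi-squared test with continuity correction for paired outcomes.

    b: isolates where assembler A succeeded and assembler B failed
    c: isolates where assembler A failed and assembler B succeeded
    Returns (statistic, p_value) using a chi-squared distribution with 1 df.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    # The correction is not allowed to push |b - c| - 1 below zero
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical discordant counts for one pair of assemblers
stat, p = mcnemar_cc(b=10, c=1)
print(round(stat, 3), bonferroni(p, n_tests=5))
```

Only the discordant pairs enter the statistic, which is why the test suits comparisons where every assembler is run on the same isolates.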

Results

Raw sequences

High sequencing depth and quality was achieved for both Illumina short- and Nanopore long-reads

Over 200x sequencing depth was achieved for both Illumina and Nanopore reads (Table S2). Median long-read length was 5814bp (IQR: 5366-6338), and median estimated Phred quality score was 16.6 (IQR: 16.4-16.8). Subsampling did not affect median read length or read quality (Table S2; Fig. S1).

Structural completeness

Chromosome reconstruction was optimal using the consensus long-read only assembler, Autocycler

Autocycler circularised the most chromosomal sequences, 95% (87/92), significantly more than Unicycler (80% [74/92], pairwise McNemar’s p=0.006), Unicycler bold (85% [78/92], p=0.039), Flye (85% [78/92], p=0.027) and Hybracter (hybrid) (86% [79/92], p=0.043), while there was no statistical evidence of a difference to Hybracter (long) (87% [80/92], p=0.070; Table 1; Fig. 2a). Notably, for two isolates that were correctly assembled by all other assemblers, Autocycler failed to generate a circular consensus chromosome (Fig. 2a), producing highly fragmented draft assemblies instead.

Plasmid reconstruction was improved by Autocycler or Hybracter compared with Flye

Given the absence of a ‘ground truth’ for plasmids in the sequenced isolates, we considered two ‘reference’ plasmid sets generated from the assembly data. The first was the Hybracter (hybrid) reference set, and the second, a manually-curated reference set considering potential plasmids across all assemblers. All plasmids from the Hybracter (hybrid) reference set (n=278) were present in the manually-curated set.
However, the manually-curated set included an additional 25 plasmids (total 303 vs 278 plasmids), which were missing from the Hybracter (hybrid) reference set, mostly due to being non-circular (17/25, 68%), or non-circular and of different length (3/25, 12%), while 5/25 (20%) plasmid sets could not be matched to any Hybracter (hybrid) contigs not already in another match set (all pairwise mash distances >0.2; Table S3).

Compared with the Hybracter (hybrid) reference set, Flye reconstructed significantly fewer plasmids than all the other assemblers (56% [156/278]; exact binomial test p<0.0001 vs 100% reconstructed by Hybracter (hybrid) and McNemar’s p<0.0001 vs Autocycler, Hybracter (long), Unicycler, and Unicycler (bold)). Among the remaining assemblers, 93-96% of plasmids were reconstructed, which was significantly fewer than 100% of the Hybracter (hybrid) reference set (all exact binomial test p<0.0001; Table 1; Fig. 2b). There was no evidence of a difference between Autocycler (96% [267/278]) and the other assemblers besides Flye (Hybracter (long) 96% [268/278], McNemar’s p=1 vs Autocycler; Unicycler 96% [266/278], p=1; Unicycler (bold) 93% [258/278], p=0.095).

Similarly, compared with the manually-curated reference set, Flye reconstructed significantly fewer plasmids than all other assemblers (55% [166/303]; pairwise McNemar’s p<0.0001 vs each of the five other assemblers). Flye more frequently missed or misassembled small (<10,000bp) plasmids (Fig. 2c; S2b), and incorrect length was the most common reason for Flye plasmid misassembly (Table 1; S2). Among the remaining assemblers, 90-94% of plasmids were reconstructed compared to the manually-curated reference set. Autocycler reconstructed 94% (285/303) of plasmids, significantly more than Hybracter (long) (90% [272/303]; McNemar’s p=0.014). However, there was no evidence of a difference between the number of plasmids reconstructed by Autocycler and by the other assemblers: Hybracter (hybrid) (91% [276/303]; McNemar’s p=0.066 vs Autocycler), Unicycler (93% [282/303]; p=1), or Unicycler (bold) (90% [274/303]; p=0.296; Table S3; Fig. S2a).

Of the 10 Autocycler plasmids with a mash distance of 0 to the corresponding Hybracter (hybrid) plasmid, 2/10 had a missing MOB-suite IncFIC replicon annotation despite identical sequence (Fig. S3). In both cases, the Autocycler plasmid was reversed (i.e. the reverse complement strand was represented in the fasta file) compared with the other plasmids. The Flye plasmid sequence was also missing an IncFIC annotation in one of these two plasmids; however, this difference was not observed in the other 232 plasmids across other assemblers with a mash distance of 0 to the Hybracter (hybrid) reference.

Table 1: Chromosomal sequence circularisation and accuracy of plasmid sequence reconstruction for different assemblers using Dorado v5.0.0 super-high accuracy basecalled Nanopore long-reads. Plasmid sequence reconstruction was compared with the Hybracter (hybrid) plasmid reference dataset, defined as circular contigs ≤400,000bp and ≥1,000bp assembled by Hybracter (hybrid) (n=278) across 92 Enterobacterales isolates analysed; the denominator for plasmids was therefore 278 throughout.

| Assembler | Autocycler n (%) | Flye n (%) | Hybracter (long) n (%) | Hybracter (hybrid) n (%) | Unicycler n (%) | Unicycler (bold) n (%) | p-value† |
|---|---|---|---|---|---|---|---|
| Chromosomes circularised (N=92) | 87 (94.6%) | 78 (84.8%) | 80 (87.0%) | 79 (85.9%) | 74 (80.4%) | 78 (84.8%) | <0.0001 |
| Present* plasmids (N=278) | 267 (96%) | 156 (56.1%) | 268 (96.4%) | 278 (100%) | 266 (95.7%) | 258 (92.8%) | 0.002 |
| Misassembled** plasmids (N=278): | | | | | | | |
| Non-circular | 0 (0%) | 16 (5.8%) | 2 (0.7%) | 0 (0%) | 4 (1.4%) | 2 (0.7%) | |
| Length mismatch | 1 (0.4%) | 41 (14.7%) | 1 (0.4%) | 0 (0%) | 1 (0.4%) | 1 (0.4%) | |
| Mash distance >0.025 | 0 (0%) | 1 (0.4%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| Non-circular & length mismatch | 0 (0%) | 17 (6.1%) | 2 (0.7%) | 0 (0%) | 3 (1.1%) | 6 (2.2%) | |
| Absent plasmids (N=278) | 10 (3.6%) | 47 (16.9%) | 5 (1.8%) | 0 (0%) | 4 (1.4%) | 11 (4%) | |

*‘Present’ plasmids are defined as contigs meeting all three match criteria: circular, length within 10% and mash distance <0.025 of a Hybracter hybrid reference plasmid.
**Misassembled plasmids are defined as contigs that failed to meet at least one of the matching criteria, or were non-circular and a different length (>10% difference).
†p-value for Fleiss’ Kappa test for uneven proportions of circularised chromosomes or ‘present’ plasmids across all assemblers.

Figure 2: Structural completeness of 92 pure culture Enterobacterales genome sequences assembled by different long-read only and hybrid assemblers. Genome sequences were assembled using Dorado v5.0.0 super-high accuracy basecalled Nanopore long-reads, plus Illumina short-reads for hybrid assembly. a) Number and percentage of isolates with a fully circularised chromosome (dark-coloured tiles) or an incompletely circularised chromosome (light cream tiles) by assembler. b) Upset plot of plasmid assembly status combinations across assemblers. Plasmid sequence reconstruction (assembly status) is compared to a Hybracter (hybrid) plasmid reference dataset, defined as circular contigs ≤400,000bp and ≥1,000bp assembled by Hybracter (hybrid) (n=278) across the 92 Enterobacterales isolates analysed. Dark circles represent ‘present’ plasmids, where length (±10%), mash distance (<0.025) and circularity matched; misassembled plasmids are those where length differed by >10%, mash distance was >0.025, or the contig was non-circular; and the palest shades indicate absent plasmids, where no contig was found matching other plasmids in the reference plasmid set. c) Frequency polygon of length distribution of ‘present’ plasmids by assembler.
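The ‘present’, ‘misassembled’ and ‘absent’ categories used in Table 1 and Fig. 2b can be expressed as a small decision rule. The sketch below is one reading of the criteria stated in the Methods (circularity, length within ±10%, mash distance <0.025) and is not the authors' code; the `Contig` class, the function name, and the choice to measure the ±10% tolerance relative to the reference length are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    length: int      # contig length in bp
    circular: bool   # True if the assembler reported a circular contig


def classify_plasmid(ref: Contig, cand: Optional[Contig], mash_dist: Optional[float],
                     max_mash: float = 0.025, length_tol: float = 0.10) -> str:
    """Classify an assembler's best-matching contig against a reference plasmid."""
    if cand is None or mash_dist is None:
        return "absent"  # no contig could be matched to this reference plasmid
    circ_ok = cand.circular == ref.circular
    # Assumption: the 10% tolerance is taken relative to the reference length
    len_ok = abs(cand.length - ref.length) <= length_tol * ref.length
    mash_ok = mash_dist < max_mash
    if circ_ok and len_ok and mash_ok:
        return "present"
    if not len_ok and not mash_ok:
        # Covers both "no criteria met" and "only circularity matched"
        return "absent"
    return "misassembled"


# Hypothetical 100 kb circular reference plasmid
ref = Contig(length=100_000, circular=True)
print(classify_plasmid(ref, Contig(98_000, True), 0.001))   # present
print(classify_plasmid(ref, Contig(98_000, False), 0.001))  # misassembled
print(classify_plasmid(ref, Contig(50_000, True), 0.3))     # absent
```

Under this rule, a contig that is non-circular and of mismatched length but still has a low mash distance is counted as misassembled, in line with the "non-circular & length mismatch" row of Table 1.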
Assembly accuracy

Unpolished Autocycler assemblies are more accurate than non-consensus long-read assemblers, while differences compared with hybrid assemblers are small

Autocycler was the most accurate long-read only assembler, with 37% of unpolished assemblies (34/92) having 0 SNPs or indels, compared with 11% (10/92) for unpolished Flye and 7% (6/92) for Hybracter (long). For unpolished Autocycler, this equated to a median of 0 SNPs/Mb (IQR: 0-0.17) and 0.18 indels/Mb (IQR: 0-0.39), and a median quality value (QV) of Q67 (IQR: 63-100; Fig. 3a-c; Table S4). The differences in accuracy between unpolished Autocycler and unpolished Flye or Hybracter (long) were significant (pairwise Wilcoxon signed rank p<0.0001 for SNPs, indels and QV), while there was no evidence of a difference in accuracy between unpolished Autocycler and Unicycler (normal or bold mode; p=1 for all metrics). There was no evidence of a difference between Flye and Hybracter (long) assemblies (Fig. 3a-c; Table S4).

Medaka long-read polishing offers small improvements in accuracy for long-read assemblies, although short-read polishing is still marginally more accurate

Medaka long-read polishing (with un-subsampled reads) improved accuracy for Autocycler and Flye by improving QV (from median Q67 to Q100 [Wilcoxon signed rank p=0.007] and from Q61 to Q67 [p<0.0001], respectively) and reducing indels (from 0.18 indels/Mb to 0 [p=0.006] and from 0.57 indels/Mb to 0.17 [p<0.001], respectively), but there was no evidence that it reduced SNPs (p=1 for both Autocycler and Flye).
There was some statistical evidence that Medaka long-read polishing using un-subsampled long-reads was marginally better at reducing indels for Autocycler assemblies than using subsampled reads (change vs unpolished Autocycler of median 0 indels/Mb [IQR: -0.19 to 0; range: -1.64 to 3.61] for un-subsampled reads, compared with a change of 0 indels/Mb [IQR: -0.18 to 0; range: -1.09 to 7.60] for subsampled reads; Wilcoxon signed rank p=0.019; Fig. 3; Table S4). However, this very small difference is not reflected in the medians/IQR of indels/Mb, as most isolates had 0 indels (57% [52/92] for Autocycler + Medaka [subsampled] and 65% [60/92] for Autocycler + Medaka [un-subsampled]).

Short-read polished Autocycler assemblies were more accurate than the best long-read polished Autocycler assemblies (Autocycler + Medaka [un-subsampled]). The median change vs unpolished Autocycler was 0 SNPs/Mb (IQR: -0.16 to 0), -0.18 indels/Mb (IQR: -0.39 to 0) and Q32.6 (IQR: Q0-Q35.9) for short-read polishing, vs 0 SNPs/Mb (IQR: 0 to 0), 0 indels/Mb (IQR: -0.19 to 0) and Q0 (IQR: Q0-Q6.15) for Medaka (un-subsampled) polishing (pairwise Wilcoxon signed rank p=0.0002, p<0.0001 and p<0.0001, respectively; Fig. 3; Table S4). However, the absolute difference was small, and affected only the worst-performing quartile of isolates. The majority, 55% (51/92), of Autocycler + Medaka (un-subsampled reads) polished assemblies had 0 errors (QV100), and only 4% (4/92) of genome sequences had >10 SNPs or indels in the entire assembly, compared with 95% (87/92) of short-read polished Autocycler assemblies having 0 errors and two genome sequences with >10 SNPs or indels (Figs. 3a-c; Table S4).
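To make the QV figures above easier to interpret, the sketch below converts an error count into a Phred-scaled quality value and a per-Mb rate. It assumes the usual Phred definition (QV = -10·log10 of the per-base error rate) and a cap of Q100 for error-free assemblies, which matches the text's equating of 0 errors with QV100; the function names and the 5 Mb genome length are illustrative assumptions, not values taken from Pypolca or the study pipeline.

```python
import math

QV_CAP = 100.0  # assumption: an assembly with 0 detected errors is reported as Q100


def phred_qv(errors: int, assembly_length: int, cap: float = QV_CAP) -> float:
    """Phred-scaled quality value for `errors` SNPs/indels in `assembly_length` bases."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    if errors < 0:
        raise ValueError("errors must be non-negative")
    if errors == 0:
        return cap  # log10(0) is undefined, so error-free assemblies take the cap
    return min(cap, -10.0 * math.log10(errors / assembly_length))


def per_mb(count: int, assembly_length: int) -> float:
    """Express an error count as errors per megabase."""
    return count / (assembly_length / 1_000_000)


# Illustrative 5 Mb genome (an assumed, typical Enterobacterales genome size).
one_error_qv = phred_qv(1, 5_000_000)
one_error_rate = per_mb(1, 5_000_000)
```

On this definition, a single residual error in a 5 Mb assembly corresponds to roughly Q67 and 0.2 errors/Mb, which is in line with the median Q67 and 0.18 indels/Mb reported for unpolished Autocycler assemblies.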
Mean gene length is slightly shorter for Flye assemblies, and is not corrected by long or short-read polishing

Mean gene length was assessed as a further measure of accuracy, as small errors can result in coding sequence truncation and shorter average gene length. While there was some statistical evidence of a difference in mean gene length between different assembler/polisher combinations, with unpolished and long-read polished Flye assemblies having a slightly shorter mean gene length compared with other assemblers (Friedman’s p<0.0001; all pairwise Wilcoxon signed rank p<0.0001-p=0.01 compared with all other assemblers), the difference was small in magnitude (median of the mean gene length across all isolates of 312bp [IQR: 308-315bp] for Flye + Medaka (subsampled) polishing, vs 312bp [309-316bp] for all other non-Flye assemblers; Fig. 3d).

Gene annotation for MLST loci, resistance, virulence and stress genes is equivalent for long-read and hybrid assemblies

There was no evidence of a difference in the numbers of key resistance, virulence and stress genes identified by AMRFinder Plus in assemblies generated by any assembler/polisher combination (Friedman’s p=0.209 for resistance, p=0.736 for virulence, and p=0.687 for stress genes; all pairwise Wilcoxon signed-rank p=1; Table S4). There was high concordance between assemblers on the presence/absence of specific gene variants (all pairwise McNemar’s p>0.209). There was also no evidence of a difference in the proportion of isolates with correctly assigned multi-locus sequence type (MLST; all pairwise McNemar’s p=1, Table S4). Hybracter (long; hybrid), Unicycler (normal; bold), and polished Flye assemblies were annotated with identical MLST-types for all 91 isolates belonging to a species with available MLST-typing schemes (i.e. all isolates except one Serratia marcescens). A single locus in one isolate was ‘uncertain’ for the unpolished Flye assembly (gapA(~2)), and another locus (gyrB(10)) was duplicated in a different isolate amongst Autocycler assemblies.
Polishing did not correct this duplicated annotation, although the allele was correctly identified.

Figure 3: Assembly accuracy for different assembler/polisher combinations. a) Single nucleotide substitution errors (SNPs) and b) insertion/deletions (indels) identified by re-aligning Illumina short-reads, c) quality value as annotated by Freebayes(48) from Pypolca(17) and d) mean gene length from CheckM2(37) of 12 different assembler/polisher combinations. The y-axes in a), b) and c) are transformed using a pseudo-log scale to facilitate plotting zero values given log(0) is undefined.
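The pairwise McNemar comparisons reported for MLST assignment can be illustrated with a short sketch. This assumes the exact (binomial) form of McNemar's test; the text does not state which variant the authors used, and the function name is hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: isolates typed correctly by assembler A but not by assembler B
    c: isolates typed correctly by assembler B but not by assembler A
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of k or fewer discordant pairs falling on one side under H0 (p = 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# One discordant isolate out of 91, as with the single duplicated gyrB locus.
p_single_discordant = mcnemar_exact(1, 0)
```

With a single discordant isolate the exact test returns p=1, which is consistent with the pairwise p=1 reported for MLST even though Autocycler assemblies differed from the others in one isolate.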

Discussion

We evaluated three long-read only bacterial genome assemblers, three hybrid assemblers, and three polishers on 92 clinical Enterobacterales isolates. The consensus long-read assembler, Autocycler, produced the most structurally complete assemblies, circularising 95% of chromosomes. Plasmid reconstruction was comparable between all assemblers except Flye, which underperformed compared with other assemblers for most metrics. Autocycler with Medaka polishing was the most accurate long-read only assembler/polisher combination, with a median of 0 SNPs/indels compared to what we consider the ‘gold-standard’ hybrid assembly (i.e. short-read polished Autocycler assemblies). Long-read polishing of Autocycler and Flye assemblies offered small improvements in accuracy compared to unpolished assemblies, although short-read polishing still corrected marginally more errors. There was strong agreement in the annotation of seven-locus MLSTs, resistance, virulence and stress genes, and mean gene length across all assemblers.

It is not surprising that long-read assemblers circularise more chromosomes, as long-reads can resolve repetitive regions that short-reads may not. This explains why the long-read first hybrid assembler, Hybracter (hybrid), performed more similarly to other long-read assemblers than Unicycler, which uses short-reads first to reconstruct overall structure. The ability of Autocycler to circularise eight chromosomes where non-consensus assemblers failed supports the utility of this software(57). Combining 20 input assemblies in Autocycler may reduce the effects of stochastic variation in individual assemblers. The 2/92 isolates where Autocycler produced fragmented assemblies, while some of its input assemblies were complete, are noteworthy.
This result is perhaps attributable to regions of input assemblies that are too divergent to resolve, and highlights the need for an iterative approach, where a ‘fallback’ option is available in case of a highly fragmented Autocycler consensus assembly. This also emphasises the importance of quality controls (e.g. CheckM2) to flag highly fragmented assemblies, so that for these cases, manual curation of input assemblies, optimising parameters in the consensus process, or reversion to complete input assemblies may improve assembly.

Evaluation of chromosomal and plasmid sequence reconstruction is challenging due to the absence of a ‘ground truth’. For plasmids specifically, there is a risk of mislabelling plasmids by methods reliant on reference databases, which may be incomplete or contain misassembled plasmids. We therefore considered two reference plasmid sets generated from the study data. Compared with both reference sets, none of the six assemblers had ‘perfect’ concordance. Flye performed poorly compared to all other assemblers, missing or misassembling ~45% of plasmids compared with 4-10% for other assemblers. Flye struggled particularly with small (<10,000bp) plasmids, as reported previously(16, 58). This emphasises the necessity of consensus methods like Autocycler(57), and separate plasmid recovery tools like Plassembler(31) to optimise plasmid reconstruction. The fact that Autocycler (including four Hybracter (long) input assemblies) reconstructed a slightly different set of plasmids to a single Hybracter (long/hybrid) assembly suggests complementarity between these methods, where Autocycler can overcome potential issues related to stochastic variation in individual assemblies. The replicon annotation differences between identical plasmids highlight the risks of relying on plasmid-annotation tools like MOB-suite for plasmid identification(59), and support the use of network-based tools like PLING(60).

The small differences in nucleotide-level accuracy between long- and short-read polished Autocycler assemblies are likely not in coding regions that are key for downstream analyses. This is evidenced by the strong agreement in MLST profile, resistance, virulence and stress gene annotations, and mean gene length between assemblers.
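The ‘~45%’ and ‘4-10%’ figures quoted above can be traced back to the ‘present’ plasmid counts in Table 1 (Hybracter hybrid reference, n=278) and Supplementary Table S3 (manually curated reference, n=303). The sketch below simply recomputes the share of reference plasmids that were not ‘present’ (misassembled or absent) for each assembler; the dictionary and function names are illustrative.

```python
from typing import Dict

# 'Present' plasmid counts copied from Table 1 (N=278) and Table S3 (N=303).
TABLE_1_PRESENT = {
    "Autocycler": 267, "Flye": 156, "Hybracter (long)": 268,
    "Hybracter (hybrid)": 278, "Unicycler": 266, "Unicycler (bold)": 258,
}
TABLE_S3_PRESENT = {
    "Autocycler": 285, "Flye": 166, "Hybracter (long)": 272,
    "Hybracter (hybrid)": 276, "Unicycler": 282, "Unicycler (bold)": 274,
}


def percent_not_present(present: Dict[str, int], total: int) -> Dict[str, float]:
    """Percentage of reference plasmids that were misassembled or absent."""
    return {name: round(100 * (total - n) / total, 1) for name, n in present.items()}


vs_hybracter_reference = percent_not_present(TABLE_1_PRESENT, 278)
vs_curated_reference = percent_not_present(TABLE_S3_PRESENT, 303)
```

Flye comes out at 43.9% and 45.2% against the two reference sets. Hybracter (hybrid) is 0% against the first set by construction, because that set was built from its own assemblies.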
The advantage of our study is that we consider a relatively large sample of real-world, clinically-relevant isolates. Specifically, our sample included predominantly E. coli and K. pneumoniae, which are the two most important Gram-negative species in England in terms of number of bloodstream infections and burden of AMR(61), and therefore our findings are relevant to public health surveillance in this setting. However, a trade-off with this is the absence of ‘ground truth’ sequences against which to evaluate our assemblies. Other limitations include the empirical assessment of nucleotide-level accuracy, through aligning short-reads to assemblies. Both SNPs and indels were still present in a small number of short-read polished assemblies, potentially representing a baseline level of errors in either Illumina reads or read mapping, and leading to possible overestimation of the error rate of long-read only assemblies. A further limitation is that the performance of Autocycler as a consensus method depends on its input assemblies. Twenty input assemblies were used here, requiring substantial computational time (13,428 CPUh), mostly due to generating assemblies, and resulted in a high carbon footprint, equivalent to driving 164 km (see Environmental Impact Statement). Furthermore, a closed consensus chromosome was not achieved for 5% of isolates using default settings. Optimisation of Autocycler input assemblies and parameters, such as weighting contigs from certain ‘more reliable’ assemblers, as done in more recent automated Autocycler v5 pipelines(20), could thus reduce computational load and improve performance. Incorporating a ‘fallback’ option in Autocycler pipelines, for example to revert to one of the complete input assemblies in cases of a highly fragmented Autocycler consensus, may also be of benefit. Finally, generalisability to other bacterial species is limited. Other species may be less-well represented than E. coli and Klebsiella spp. in the machine-learning training datasets for basecalling (Dorado) and polishing (Medaka) software, producing potentially different error rates.
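The ‘fallback’ idea described above could be sketched as follows. This is an illustration only and is not part of Autocycler or the study pipeline: the `Assembly` record, the `MAX_CONTIGS` threshold, the rule of preferring the complete input assembly with the fewest contigs, and the assembly names are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

MAX_CONTIGS = 20  # hypothetical cut-off for calling an assembly 'highly fragmented'


@dataclass(frozen=True)
class Assembly:
    """Hypothetical QC summary of one assembly."""
    name: str
    n_contigs: int
    chromosome_circular: bool


def is_complete(assembly: Assembly) -> bool:
    """Treat an assembly as complete if its chromosome closed and it is not fragmented."""
    return assembly.chromosome_circular and assembly.n_contigs <= MAX_CONTIGS


def choose_assembly(consensus: Assembly,
                    inputs: Sequence[Assembly]) -> Tuple[Assembly, str]:
    """Prefer the consensus; otherwise revert to a complete input assembly."""
    if is_complete(consensus):
        return consensus, "consensus"
    complete_inputs = [a for a in inputs if is_complete(a)]
    if complete_inputs:
        # Assumed tie-break: the complete input assembly with the fewest contigs.
        return min(complete_inputs, key=lambda a: a.n_contigs), "fallback"
    # Nothing complete: keep the consensus but flag it for manual curation.
    return consensus, "manual-review"
```

In practice the completeness check would draw on whatever QC output the pipeline already produces (the text mentions CheckM2 for flagging fragmented assemblies); the threshold and tie-break rule would need tuning on real data.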

Conclusions

This assembly comparison is the first benchmarking study to demonstrate structural completeness and accuracy of Nanopore super-high accuracy long-read only bacterial genome assemblies on 92 clinical Enterobacterales isolates, compared with hybrid assembly. The automated consensus long-read assembler, Autocycler, accurately reconstructed assemblies, including plasmids, for these isolates, and is a promising tool for integrating Nanopore long-read only assemblies into an automatable computational pipeline for public health genomics. Ongoing innovation in Nanopore sequencing technology and bioinformatic software may enable further improvements and should continue to be evaluated by the bioinformatics community.

Environmental Impact Statement

The Nextflow assembly pipeline used for this work ran in 72h on two AMD EPYC 9J14 96-Core Processors (188 total CPUs; 13,428 CPUh), and drew 124.46 kWh. Using Cloud infrastructure based in the United Kingdom, this had a carbon footprint of 28.76 kgCO2e, equivalent to 2.61 tree-years, or 164 km in a car (calculated using green-algorithms.org v3.0(62)). This is a lower bound estimate of the carbon footprint of this work, as it does not account for compute used in pipeline development, downstream statistical analyses, or the energy required to power display screens. The carbon footprint and wider environmental impact of sample processing and shipping has also not been accounted for.

Conflict of interest

The authors have no conflicts of interest to declare.
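As a quick consistency check on the Environmental Impact Statement, the conversion factors implied by the reported figures can be back-calculated. The values below are copied from the statement; the derived factors are arithmetic consequences of those figures, not numbers quoted by the authors or by green-algorithms.org, and the function name is illustrative.

```python
from typing import Dict

# Figures reported in the Environmental Impact Statement.
ENERGY_KWH = 124.46
FOOTPRINT_KG_CO2E = 28.76
TREE_YEARS = 2.61
CAR_KM = 164
TOTAL_CPUS = 188
CPU_HOURS = 13_428


def implied_factors() -> Dict[str, float]:
    """Back-calculate the conversion factors implied by the reported figures."""
    return {
        "kgCO2e_per_kWh": FOOTPRINT_KG_CO2E / ENERGY_KWH,
        "kgCO2e_per_tree_year": FOOTPRINT_KG_CO2E / TREE_YEARS,
        "kgCO2e_per_car_km": FOOTPRINT_KG_CO2E / CAR_KM,
        "wall_clock_hours": CPU_HOURS / TOTAL_CPUS,
    }


factors = implied_factors()
```

The figures imply about 0.23 kgCO2e per kWh, about 11 kgCO2e per tree-year, about 0.175 kgCO2e per km driven, and a wall-clock time of about 71.4 h, which agrees with the reported 72h runtime.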
Funding information

This study/research is supported/funded by the National Institute for Health Research (NIHR) Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance (NIHR207397), a partnership between the UK Health Security Agency (UKHSA) and the University of Oxford. This work was also supported by the UKHSA and the NIHR Oxford Biomedical Research Centre (BRC) and the UKHSA PhD Funding Competition. The cloud compute infrastructure for this work was donated by Oracle Corporation Infrastructure. The views expressed are those of the authors and not necessarily those of the NIHR, UKHSA or the Department of Health and Social Care.

Ethical approval and consent to participate

This work has been reviewed and approved by the UKHSA Research Ethics & Governance Group (reference NR0429).

Consent for publication

All authors give consent for publication of this work. No further consent for publication was required as this work does not include patient identifiable information.

Author contributions

NS, SL, SH, DC, ASW, JR, KLH, AL, DW, RH and CSB were involved in conceptualisation, funding acquisition, project administration, provision of resources and supervision. VP, GR, KH, CRJ and NEKSUS consortium members were involved in isolate collection and processing. Methodological development and validation of bioinformatic methods and software was done by DN under the supervision of SL and NS. DN, SL and NS were involved with data curation, analysis, investigation, visualisation and writing/editing. All authors approved the final draft.

Acknowledgments

The authors would also like to acknowledge all participating laboratories in the NEKSUS consortium who were responsible for isolate collection, Zeynab Yusuf from UKHSA for her role in sample transportation from UKHSA to Oxford, laboratory and bioinformatician colleagues at the Modernising Medical Microbiology Unit at the University of Oxford for support in methodological development and execution, as well as GENEWIZ Germany GmbH (Leipzig, Germany) for performing long- and short-read sequencing.

Individuals within the NEKSUS consortium group authorship are (listed alphabetically):
- Alan McNally (University Hospitals Birmingham NHS Foundation Trust)
- Caroline Cullerton (The Newcastle-upon-Tyne Hospitals NHS Foundation Trust)
- Gabriella Shanks (Barts Health NHS Trust)
- James Price (University Hospital Sussex NHS Foundation Trust)
- Jasvir Nahl (Leeds Teaching Hospitals NHS Trust)
- Jenny Bradbury (UKHSA)
- Jonathan Lambourne (Barts Health NHS Trust)
- Julie Samuel (The Newcastle-upon-Tyne Hospitals NHS Foundation Trust)
- Jumoke Sule (UKHSA/ Cambridge University Hospitals NHS Foundation Trust)
- Ian Butler (Barts Health NHS Trust)
- Kavita Sethi (Leeds Teaching Hospitals NHS Trust)
- Mark Garvey (University Hospitals Birmingham NHS Foundation Trust)
- Martin Williams (University Hospitals Bristol and Weston NHS Foundation Trust)
- Nicholas Brown (Cambridge University Hospitals NHS Foundation Trust)
- Nicola Childs (North Bristol NHS Trust)
- Paul Randell (University Hospital Sussex NHS Foundation Trust)
- Poorvi Patel (Cambridge University Hospitals NHS Foundation Trust)
- Samuel Stafford (North Bristol NHS Trust)
- Samuel Tetley (University Hospital Sussex NHS Foundation Trust)
- Simon Eccles (Manchester University Hospitals NHS Foundation Trust)
Supplementary Figures

Supplementary Figure S1: Quality control metrics of raw and subsampled Illumina short-reads and Dorado v5.0.0 super accurate basecalled Nanopore long-reads. Showing long-read subsampled set 1 (of 4) for the 92 pure culture isolates. N50 and N50_num (or L50) are both measures of sequence contiguity(63). N50 is the sequence length of the shortest contig at 50% of the total assembly length. N50_num is defined as the count of the smallest number of contigs whose added length makes up at least half of genome size.

Supplementary Figure S2: Plasmid sequence reconstruction for 92 Enterobacterales isolates by different long-read only and hybrid assemblers, using the manually-curated consensus ‘reference’ plasmid set (n=303 plasmids). Reference plasmids in the manually curated set are circular contigs between 1,000-400,000bp in length that are present in at least 2 assemblers with a matching length (±10%) and mash distance (<0.025). a) Upset plot showing assembly status combinations of plasmids across assemblers. Dark circles/bars indicate ‘present’ plasmids, where length (±10%), mash distance (<0.025) and circularity matched the reference; misassembled plasmids are those where length differed by >10%, mash distance was >0.025, or the contig was non-circular; and the palest shades indicate absent plasmids, where no contig was found matching other plasmids in the reference plasmid set. b) Frequency polygon of length distribution of ‘present’ plasmids by assembler.

Supplementary Figure S3: Clinker plots of highly similar plasmids with different MOB-suite annotations. Replicon annotations are shown in bright red and labelled. Other mobility- and replication-associated plasmid machinery are shown in pale red and labelled. a) An 85,796bp IncFIA, IncFIB, IncFIC, rep_cluster_2131 plasmid sequence (isolate AF14) with a missing IncFIC annotation in the Autocycler and Flye assemblies (top 2), despite a mash distance of 0 between Autocycler and Hybracter (hybrid) assemblies. b) A 133,309bp IncFIA, IncFIB, IncFIC plasmid sequence (isolate AHB7) with the IncFIC replicon annotation missing from the Autocycler plasmid sequence, despite a mash distance of 0 between the Autocycler and Hybracter (hybrid) plasmid sequences. Note the Autocycler plasmid sequence is reversed and the Flye plasmid has a different starting point for both plasmids. The Flye plasmid is also reversed in a) compared to the bottom 4 assemblers’ plasmids.
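The N50 and N50_num (L50) definitions given in the caption of Supplementary Figure S1 can be made concrete with a short sketch. It works on any list of sequence lengths (reads or contigs) and, as an assumption, uses the total length of the sequences supplied as the denominator, not an external genome-size estimate; the function name is illustrative.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of longest
    sequences that together cover at least half of the total length; L50
    (N50_num) is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half of the sum")


# A closed chromosome plus two small plasmids: the chromosome alone exceeds half.
example = n50_l50([5_000_000, 100_000, 50_000])
```

For a complete bacterial assembly the chromosome dominates, so N50 equals the chromosome length and L50 is 1; fragmented assemblies give a smaller N50 and a larger L50.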
Supplementary Table S1: Species of the 92 pure culture Enterobacterales isolates, as assigned by Kraken2(40).

| Species | Count (percentage) |
|---|---|
| Escherichia coli | 58 (63%) |
| Klebsiella pneumoniae | 21 (23%) |
| Klebsiella oxytoca | 6 (7%) |
| Klebsiella aerogenes | 2 (2%) |
| Enterobacter hormaechei | 2 (2%) |
| Citrobacter freundii | 1 (1%) |
| Citrobacter portucalensis | 1 (1%) |
| Serratia marcescens | 1 (1%) |

Supplementary Table S2: Raw and subsampled sequencing read metrics for Illumina short-read and Nanopore long-read sequences for 92 pure culture Enterobacterales isolates.

| Metric | Read type | Raw reads, median (IQR) | Subsampled reads, median (IQR) |
|---|---|---|---|
| Read depth (x genome) | Short-read | 290 (232-340) | 104 (100-108) |
| Read depth (x genome) | Long-read | 217 (158-313) | 64 (59-70) |
| Read length | Short-read | 150 (150-150) | 150 (150-150) |
| Read length | Long-read | 5858 (5366-6338) | 5849 (5398-6370) |
| Read quality (Q score) | Short-read | 23.6 (23.3-23.8) | 23.6 (23.3-23.8) |
| Read quality (Q score) | Long-read | 16.6 (16.4-16.8) | 16.6 (16.4-16.8) |

Supplementary Table S3: Plasmid reconstruction accuracy of different long-read only and hybrid assemblers for Dorado v5.0.0 super accurate basecalled Nanopore long-reads. Plasmid reconstruction is compared to a manually-curated reference set of ‘consensus’ plasmids (n=303), where ‘consensus’ plasmids were circular contigs 1,000-400,000bp in length present across at least 2 assemblers with a similar length (±10%) and close mash distance (<0.025).

| | Autocycler n (%) | Flye n (%) | Hybracter (long) n (%) | Hybracter (hybrid) n (%) | Unicycler n (%) | Unicycler (bold) n (%) | p-value† |
|---|---|---|---|---|---|---|---|
| Present* plasmids | 285 (94.1%) | 166 (54.8%) | 272 (89.8%) | 276 (91.1%) | 282 (93.1%) | 274 (90.4%) | <0.0001 |
| Misassembled** plasmids | | | | | | | |
| Non-circular | 0 (0%) | 18 (5.9%) | 12 (4.0%) | 13 (4.3%) | 5 (1.7%) | 3 (1%) | |
| Length mismatch | 7 (2.3%) | 50 (16.5%) | 1 (0.3%) | 2 (0.7%) | 6 (2.0%) | 7 (2.3%) | |
| Non-circular and length mismatch | 0 (0%) | 30 (9.9%) | 7 (2.3%) | 7 (2.3%) | 4 (1.3%) | 8 (2.6%) | |
| Absent*** plasmids | 11 (3.6%) | 39 (12.9%) | 11 (3.6%) | 5 (1.7%) | 6 (2.0%) | 11 (3.6%) | |

*‘Present’ plasmids are defined as contigs 1,000-400,000bp in length meeting all three match criteria: circular, length (±10%) and mash distance (<0.025) of the manually curated reference set of plasmids.
**Misassembled plasmids are defined as contigs that failed to meet at least 1 of the matching criteria, but could still be matched to the reference set based on a more distant mash distance.
***Absent plasmids were cases where only the circularity matched, or where, for an assembler, no contig could be matched to the rest of the reference plasmids match set based on mash distance.
†p-value for Fleiss’ Kappa test for uneven proportions of ‘present’ plasmids across all assemblers.

Supplementary Table S4: Nucleotide-level accuracy of 12 assembler-polisher combinations (7 long-read only, 5 hybrid). Read-alignment metrics were derived by aligning Illumina short-reads to each assembler-polisher combination and variant calling with Freebayes(48) from Pypolca. Mean gene length is derived from CheckM2(37) output files. 7-locus MLST is annotated by mlst(39), and key resistance, virulence and stress genes by AMRFinder Plus(64).

| Metric | Autocycler, none | Autocycler, Medaka (subsampled) | Autocycler, Medaka (un-subsampled) | Autocycler, Polypolish + Pypolca | Flye, none | Flye, Medaka (subsampled) | Flye, Medaka (un-subsampled) | Flye, Polypolish + Pypolca | Hybracter (long) | Hybracter (hybrid) | Unicycler (normal) | Unicycler (bold) | p-value* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLST (N=91)† | 90 (99%)†† | 90 (99%)†† | 90 (99%)†† | 90 (99%)†† | 90 (99%)†† | 91 (100%) | 91 (100%) | 91 (100%) | 91 (100%) | 91 (100%) | 91 (100%) | 91 (100%) | <0.0001 |
| SNPs/Mb, median (IQR) | 0 (0-0.17) | 0 (0-0) | 0 (0-0) | 0 (0-0) | 0.18 (0-1.17) | 0 (0-0.52) | 0 (0-0.7) | 0 (0-0) | 0.2 (0-1.26) | 0 (0-0) | 0 (0-0.37) | 0 (0-0.37) | <0.0001 |
| SNPs/Mb, range | 0-6.54 | 0-7.45 | 0-5.27 | 0-3.09 | 0-10.81 | 0-35.38 | 0-41.79 | 0-4.08 | 0-35.37 | 0-10.41 | 0-4 | 0-4 | |
| Indels/Mb, median (IQR) | 0.18 (0-0.39) | 0 (0-0.2) | 0 (0-0.19) | 0 (0-0) | 0.57 (0.19-1.13) | 0.18 (0-0.51) | 0.17 (0-0.36) | 0 (0-0) | 0.39 (0.19-0.75) | 0 (0-0) | 0 (0-0.2) | 0 (0-0.34) | <0.0001 |
| Indels/Mb, range | 0-9.5 | 0-17.11 | 0-13.12 | 0-5.45 | 0-34.11 | 0-18.66 | 0-22.35 | 0-4.47 | 0-16.71 | 0-12.42 | 0-16.21 | 0-16.21 | |
| QV, median (IQR) | 67 (63-100) | 100 (64-100) | 100 (64-100) | 100 (100-100) | 61 (57-67) | 67 (60-100) | 67 (61-100) | 100 (100-100) | 60 (58-64) | 100 (100-100) | 67 (62-100) | 67 (62-100) | <0.0001 |
| QV, range | 48.8-100 | 47.3-100 | 48.4-100 | 50.7-100 | 43.48-100 | 42.7-100 | 41.9-100 | 51.7-100 | 42.8-100 | 46.4-100 | 46.9-100 | 46.9-100 | |
| CheckM2 mean gene length, median (IQR) | 312 (309-316) | 312 (309-316) | 312 (309-316) | 312 (309-316) | 312 (309-316) | 312 (308-315) | 312 (308-316) | 312 (309-315) | 312 (309-316) | 312 (309-316) | 312 (309-316) | 312 (309-316) | <0.0001 |
| CheckM2 mean gene length, range | 300-323 | 300-323 | 300-323 | 300-323 | 299-323 | 299-323 | 299-323 | 299-323 | 300-323 | 300-323 | 298-324 | 300-324 | |
| AMRFinder Plus AMR genes, median (IQR) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 4 (1-7) | 3 (1-7) | 4 (1-7) | 0.209 |
| AMRFinder Plus AMR genes, range | 0-18 | 0-18 | 0-18 | 0-18 | 0-18 | 0-18 | 0-18 | 0-18 | 0-17 | 0-18 | 0-17 | 0-17 | |
| AMRFinder Plus stress genes, median (IQR) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-3) | 1 (0-2) | 1 (0-2) | 0.687 |
| AMRFinder Plus stress genes, range | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | 0-26 | |
| AMRFinder Plus virulence genes, median (IQR) | 1 (0-7) | 1 (0-7) | 1 (0-7) | 1 (0-7) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 1 (0-6) | 0.736 |
| AMRFinder Plus virulence genes, range | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | 0-35 | |

*p-value for Fleiss’ Kappa test for uneven proportions of isolates with correct MLST profiles annotated across all assemblers, or Friedman’s test for global differences in continuous variables across all assemblers.
†MLST typing schemes were only available for 91/92 pure culture isolates. The excluded sample was identified as Serratia marcescens.
††The incorrectly assigned MLST in one isolate by Autocycler consensus assemblies, with or without polishing, was due to duplication of one of the seven housekeeping genes (gyrB(10,10)).



License: CC-BY-4.0