Simultaneous detection of pathogens and antimicrobial resistance genes with the open source, cloud-based, CZ ID pipeline

doi:10.21203/rs.3.rs-4271356/v1

2024 · doi:10.21203/rs.3.rs-4271356/v1

preprint OA: closed CC-BY-4.0

Charles Langelier, Dan Lu, Katrina Kalantar, Victoria Chu, Abigail Glascock, and 19 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4271356/v1

This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted.

Abstract

Antimicrobial resistant (AMR) pathogens represent urgent threats to human health, and their surveillance is of paramount importance. Metagenomic next generation sequencing (mNGS) has revolutionized such efforts, but remains challenging due to the lack of open-access bioinformatics tools capable of simultaneously analyzing both microbial and AMR gene sequences.
To address this need, we developed the CZ ID AMR module, an open-access, cloud-based workflow designed to integrate detection of both microbes and AMR genes in mNGS and whole-genome sequencing (WGS) data. It leverages the Comprehensive Antibiotic Resistance Database and associated Resistance Gene Identifier software, and works synergistically with the CZ ID short-read mNGS module to enable broad detection of both microbes and AMR genes. We highlight diverse applications of the AMR module through analysis of both publicly available and newly generated mNGS and WGS data from four clinical cohort studies and an environmental surveillance project. Through genomic investigations of bacterial sepsis and pneumonia cases, hospital outbreaks, and wastewater surveillance data, we gain a deeper understanding of infectious agents and their resistomes, highlighting the value of integrating microbial identification and AMR profiling for both research and public health. We leverage additional functionalities of the CZ ID mNGS platform to couple resistome profiling with the assessment of phylogenetic relationships between nosocomial pathogens, and further demonstrate the potential to capture the longitudinal dynamics of pathogen and AMR genes in hospital acquired bacterial infections. In sum, the new AMR module advances the capabilities of the open-access CZ ID microbial bioinformatics platform by integrating pathogen detection and AMR profiling from mNGS and WGS data. Its development represents a critical step toward democratizing pathogen genomic analysis and supporting collaborative efforts to combat the growing threat of AMR. 

Subject areas: Biological sciences/Computational biology and bioinformatics/Classification and taxonomy; Biological sciences/Computational biology and bioinformatics/Computational platforms and environments; Biological sciences/Microbiology/Microbial communities/Metagenomics; Biological sciences/Microbiology/Infectious-disease diagnostics; Biological sciences/Biotechnology/Genomics/Metagenomics

Introduction

Antimicrobial resistance (AMR) is responsible for an estimated 1.27 million global deaths annually [1], and is on track to cause 10 million deaths a year by 2050, becoming a leading cause of global mortality [2]. Furthermore, the World Health Organization has declared AMR to be one of the top ten global public health threats facing humanity [3]. A critical step in combating AMR is the development and implementation of new methods and analysis tools for genomic detection and surveillance of AMR microbes with high resolution and throughput [4]. Whole genome sequencing (WGS) of cultured bacterial isolates and direct metagenomic next-generation sequencing (mNGS) of biological and environmental samples have emerged at the forefront of technological advances for AMR surveillance [5]. Several tools and databases have been developed over the past decade to enable the detection of AMR genes from both WGS and mNGS data. These include ResFinder [6], the Comprehensive Antibiotic Resistance Database (CARD) [7, 8], ARG-ANNOT [9], SRST2 [10], AMRFinderPlus, the Reference Gene Catalog by NCBI [11], and others. Effective surveillance for resistant pathogens requires not only detecting AMR genes, but also detecting their associated microbes. Despite this, each task has traditionally been approached separately in bioinformatics pipelines, with few available tools enabling simultaneous evaluation of both.
The CZ ID mNGS pipeline, for instance, was developed in 2017 to democratize access to metagenomic data analysis through a free, no-code, cloud-based workflow, but has had limited AMR assessment capabilities [12]. Realizing the unmet need for, and potential impact of, a single bioinformatics tool integrating the detection of both AMR genes and microbes, we sought to add AMR analysis capabilities to the open-access CZ ID mNGS pipeline. Here we report the development of a new AMR module within the CZ ID web platform, which leverages CARD to support openly-accessible AMR detection and analysis. We demonstrate its utility across both WGS and mNGS data, and in clinical and environmental samples, and demonstrate the value of enriching AMR findings through simultaneous unbiased profiling of microbes.

Implementation

AMR gene and variant detection using the CZ ID AMR module

The AMR module is incorporated into the CZ ID web application (https://czid.org) [12] and allows researchers to upload FASTQ files from both mNGS and WGS short-read data. Once uploaded, the module automatically processes samples in the cloud using Amazon Web Services infrastructure, eliminating the need for users to download and install software or maintain high-performance computing resources. The web-based application makes analysis of AMR datasets accessible even to researchers with limited bioinformatics or computational expertise. Underlying the AMR module is CARD (https://card.mcmaster.ca), a comprehensive, continually curated database of AMR genes and their variants, linked to gene family, resistance mechanism, and drug class information [7, 8]. The AMR module specifically leverages the CARD Resistance Gene Identifier (RGI) tool (https://github.com/arpcard/rgi) [7, 13] to match short reads or contigs to AMR gene reference sequences in the CARD database, returning metrics such as gene coverage and percent identity.
CARD also contains a Resistomes & Variants database of in silico predictions of allelic variants and AMR gene homologs in pathogens of public health significance. This database provides information linking AMR genes to specific species, and can be used for k-mer-based pathogen-of-origin prediction, a beta feature implemented in RGI [13].

The CZ ID AMR module automates the running of a containerized WDL workflow that strings together multiple steps and informatics tools to enable efficient data processing and accurate resistome profiling. The workflow shares the same preprocessing steps as the existing CZ ID mNGS module. Briefly, it accepts raw FASTQ files from short-read mNGS or WGS samples as input (DNA or RNA) (Fig. 1, Fig. S1). Low quality and low complexity reads are first removed with fastp [14], host reads are removed with Bowtie2 [15] and HISAT2 [16], and then duplicate reads are filtered out using CZID-dedup (https://github.com/chanzuckerberg/czid-dedup). The resulting quality- and host-filtered reads are subsampled to 1 million single-end reads or 2 million paired-end reads to limit the resources required for compute-intensive downstream alignment steps. In the AMR workflow, to accommodate targeted mNGS protocols designed to amplify many copies of low abundance AMR genes, duplicate reads are then added back prior to further processing.

There are two parallel approaches for AMR gene detection (Fig. 1, Fig. S1). In the ‘contig’ approach, the short reads are assembled into contiguous sequences (contigs) using SPAdes [17], and the contigs are subsequently sent to RGI (main) for AMR gene detection based on sequence similarity and mutation mapping. In the ‘read’ approach, the short reads are directly sent to RGI (bwt) for read mapping by KMA [18] to CARD reference sequences (Fig. 1). In both approaches, the assembled contigs or reads containing AMR genes are also sent to RGI (kmer_query) for pathogen-of-origin detection.
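
The subsampling step described above can be illustrated with a short sketch. The source does not say how CZ ID draws the subsample, so the reservoir-sampling approach, the function name `subsample`, and the fixed seed below are all assumptions made for illustration only. A "record" is one read for single-end data or one read pair for paired-end data, so that mates stay together; reading "2 million paired-end reads" as 1 million pairs is also an assumption.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def subsample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample up to k records from a stream (reservoir sampling).

    Hypothetical helper, not CZ ID code. The stream is read once, so the
    full FASTQ file never has to be held in memory.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir
```

With `k = 1_000_000`, single-end input yields at most 1 million reads, and paired-end input (records given as `(read1, read2)` tuples) yields at most 1 million pairs. Inputs smaller than `k` are returned unchanged.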

AMR module result output

The AMR module displays results in an interactive table, facilitating viewing, sorting, and filtering. The table is organized in three collapsible vertical sections: 1) general information, 2) contigs, and 3) reads (Fig. 2A). The general information section includes “Gene” and “Gene Family” as well as information on the antibiotic(s) against which the gene confers resistance (“Drug Class” and “High-level Drug Class”), resistance mechanism (“Mechanism”), and model used to identify resistance (“Model”). With respect to the latter, several models are used to identify resistance, such as the CARD protein homolog model, which identifies the presence of AMR genes, and the protein variant model, which identifies specific mutations that confer resistance. Clicking on the AMR gene name will reveal a description and web hyperlinks to CARD, NCBI and PubMed entries. The “Contigs” section includes the number of contigs that map to each AMR gene (“Contigs”), cutoff based on BLAST bit-score (“Cutoff”), percentage of the AMR gene covered by all contigs (“%Cov”), percent identity of the covered region (“%Id”), and pathogen-of-origin prediction based on contigs (“Contig Species”). The “Reads” section includes metrics corresponding to the number of reads mapping to the AMR gene (“Reads”), relative abundance of the AMR gene in reads per million reads sequenced (“rpM”), percentage of AMR gene covered by sequencing reads (“%Cov”), average depth of reads aligned across the gene (“Cov. Depth”), average depth of reads aligned across the gene per million reads sequenced (“dpM”), and a pathogen-of-origin prediction based on reads (“Read Species”). All columns can be sorted, and numerical metrics can be further filtered using user-specified thresholds. Results files at each stage of the pipeline can be downloaded for inspection or additional downstream analysis.
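
The two normalized read metrics in the table, rpM and dpM, follow directly from their definitions above. The sketch below is illustrative rather than CZ ID's implementation: the function names are hypothetical, and the choice of denominator (here, the total number of reads sequenced for the sample) is an assumption based on the wording "per million reads sequenced".

```python
def per_million(value: float, total_reads: int) -> float:
    """Scale a per-sample quantity to 'per million reads sequenced'."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return value * 1_000_000 / total_reads


def amr_read_metrics(reads_mapped: int, mean_depth: float, total_reads: int) -> dict:
    """Return rpM and dpM for one AMR gene in one sample (hypothetical helper).

    reads_mapped: number of reads mapping to the gene ("Reads")
    mean_depth:   average depth across the gene ("Cov. Depth")
    total_reads:  assumed denominator, the reads sequenced for the sample
    """
    return {
        "rpM": per_million(reads_mapped, total_reads),
        "dpM": per_million(mean_depth, total_reads),
    }
```

For example, 50 mapped reads at an average depth of 4.0 in a sample of 2,000,000 reads gives an rpM of 25.0 and a dpM of 2.0. Normalizing this way lets abundance be compared across samples sequenced to different depths.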
The downloadable files include quality- and host-filtered reads, assembled contigs, AMR annotations and corresponding metrics in tabular format, and all output files from CARD RGI. The contigs as well as short reads mapped to each AMR gene can also be downloaded.

Quality filtering for AMR gene predictions

One challenge with mNGS-based AMR surveillance is interpretation of results. The CZ ID AMR module provides key quantitative metrics including rpM, percent coverage of the AMR gene, and dpM to enable assessments of relative abundance and the confidence of AMR gene assignments. Additionally, for AMR detection using contigs, the “Cutoff” column, which reports RGI’s stringency thresholds based on CARD’s curated bit-score cut-offs, can provide valuable insight into AMR gene alignment confidence. Here, “Perfect” indicates perfect or identical matches to the curated reference sequences and mutations in CARD, while “Strict” indicates matches to variants of known AMR genes, including a secondary screen for key mutations. Finally, the term “Nudged” is adopted by the CZ ID module to indicate more distant homologs (matched via RGI’s “Loose” paradigm) with at least 95% identity to known AMR genes; this setting is ideal for discovery but is more likely to return false-positive hits. Given that a consensus approach has yet to be developed for quantifying and interpreting AMR genes from mNGS and WGS data, the CZ ID AMR module provides comprehensive information that can be subsequently filtered or otherwise optimized based on the goals of a given analysis.

Microbial profiling using the CZ ID mNGS module

The CZ ID mNGS module, which has undergone several updates since first described [12], preprocesses the uploaded reads and then proceeds to assembly-based alignment to produce taxonomic relative abundance profiles for each sample.
Briefly, the non-host reads output by the quality- and host-filtering steps (as described above) are aligned to the NCBI nucleotide (NT) and protein (NR) databases using minimap2 [19] and DIAMOND [20], respectively, to identify putative short-read alignments (Fig. 1, Fig. S1). Then, reads are assembled into contigs using SPAdes [17] and contigs are re-aligned to the set of putative accessions using BLAST [21] to improve specificity. Finally, alignments are used to identify taxa of origin, which are tallied into relative abundance estimates [12]. The web interface provides various reports with metrics including reads per million (“rpM”), number of reads (“r”), number of contigs (“contig”), number of reads in the contigs (“contig r”), percent identity (“%id”), and average length of alignment (“L”), alongside visualizations and download options to support the analysis and exploration of results (Fig. 2B).

Connecting the pathogens and AMR genes

The CZ ID platform enables simultaneous analysis of microbes and AMR genes from a single data upload via the mNGS and AMR modules. This provides complementary, but distinct, microbial and AMR gene profiles from a given sample or dataset. The mNGS module does not provide any direct link between species calls and AMR genes from the AMR module, although in cases where a single bacterial pathogen comprises the majority of reads in a metagenomic sample, this may be inferred. Conversely, the AMR module provides two ways to help connect AMR genes to their source microbes. First, each AMR gene returned in the report table is hyperlinked to its corresponding CARD webpage, where the Resistomes section reports all species in which the gene and its variants have been identified as predicted by RGI. Second, the AMR module returns results from a pathogen-of-origin analysis conducted by RGI [13], which maps k-mers derived from reads or contigs containing the AMR gene of interest against AMR alleles in the CARD Resistomes & Variants database.
This second approach is particularly useful for identifying the source species in cases when the first CARD Resistomes section lists multiple species or genera. However, because only AMR gene sequences present in CARD are considered in the pathogen-of-origin analysis, as opposed to species identification using complete reference genome sequences in the mNGS module, species predictions from the AMR module are best interpreted in the context of all outputs from the CZ ID AMR and mNGS modules.

Sharing results for collaboration

Projects on CZ ID can be shared with specific users or made public to all users. Everyone with access to the project can view or download the results, and perform data filtering or other analyses. All data and results for this paper can be accessed by searching for a project named “AMR example applications” among public projects at https://czid.org.

Results

Application 1: Identification of AMR genes from WGS and mNGS data.

To demonstrate the CZ ID AMR module’s utility for detecting bacterial pathogens and their AMR genes in both WGS and mNGS data, we leveraged data from a recent investigation of transfusion-related sepsis [22]. In this study, two immunocompromised patients received platelet units originating from a single donor. Both developed septic shock within hours after the transfusion, with blood cultures from Patient 1, who did not survive, returning positive for Klebsiella pneumoniae. Patient 2, who was receiving prophylactic antibiotic therapy at the time of the transfusion, survived, but had negative blood cultures. Direct mNGS of post-transfusion blood samples from both patients revealed a large increase in reads mapping to Klebsiella pneumoniae, a pathogen which was later also identified from culture of residual material from the transfused platelet bag (Fig. 3A) [22]. While blood mNGS data yielded less coverage of the K. pneumoniae genome compared to WGS of the cultured isolates, mNGS of Patient 1’s post-transfusion plasma sample recovered all the AMR genes found by WGS of cultured isolates (Fig. 3B). Even in Patient 2, whose blood sample had fewer reads mapping to K. pneumoniae, most AMR genes found in the cultured isolates could still be identified using the RGI “Nudged” threshold.

Application 2: Comprehensive metagenomic and WGS profiling of pathogens and AMR genes in the setting of a hospital outbreak.

To demonstrate how the CZ ID AMR module can facilitate deeper insights into pathogen and AMR transmission in hospitals, we evaluated WGS and mNGS data from surveillance skin swabs collected from 40 babies in a neonatal intensive care unit (NICU). The swabs were collected to evaluate for suspected transmission of methicillin-susceptible Staphylococcus aureus (MSSA) between patients. WGS of the MSSA isolates followed by implementation of the AMR module demonstrated many shared AMR genes, and revealed a cluster of nine samples with identical AMR profiles (Fig. 4A). Subsequent phylogenetic assessment using split k-mer analysis with SKA2 [23] revealed that samples within this cluster differed by fewer than 11 single nucleotide polymorphisms (SNPs) across their genomes, consistent with an outbreak involving S. aureus transmission between patients (Fig. 4B). Within this cluster of patients, we considered whether other bacterial species in the microbiome were also being exchanged in addition to S. aureus. Intriguingly, mNGS analysis of the direct swab samples from which the S. aureus isolates were selectively cultured revealed a diversity of bacterial taxa, many of which were more abundant than S. aureus. These included several established healthcare-associated pathogens that were never identified using the selective culture-based approach, such as Enterobacter, Citrobacter, Klebsiella and Enterococcus species.
mNGS also demonstrated that each sample had a distinct microbial community composition even among samples from the cluster, indicating that only S. aureus and potentially a subset of other species were actually exchanged between babies, rather than the entire skin microbiome (Fig. 5A). Further analysis of mNGS data using the AMR module also revealed a diversity of AMR genes conferring resistance to several drug classes, and commonly associated with nosocomial pathogens. These included genes encoding ampC-type inducible beta-lactamases (e.g., CKO, CMY, SST), extended spectrum beta-lactamases (e.g., SHV), and the recently emerged MCR class of AMR genes, which confer plasmid-transmissible colistin resistance [24]. The AMR gene profiles varied greatly across the samples, both within the cluster and outside of the cluster, consistent with the observed taxonomic diversity (Fig. 5B). Together, these results revealed both inter-patient MSSA transmission in the NICU, and the acquisition of AMR genes associated with nosocomial pathogens within the first months of life.

Application 3: Correlating pathogen identification with AMR gene detection.

Next, we aimed to integrate results from the CZ ID mNGS and AMR modules by analyzing mNGS data from critically ill patients with bacterial infections. In Patient 350 [25], who was hospitalized for Serratia marcescens pneumonia, metagenomic RNA sequencing (RNA-seq) of a lower respiratory tract sample identified Serratia marcescens as the single most dominant species within the lung microbiome (Fig. 6A) [25]. Among the detected AMR genes, based on the Resistomes & Variants information from CARD, SRT-2 and SST-1 are found exclusively in Serratia marcescens (Fig. 6B in blue). Further analysis by the pathogen-of-origin feature in the AMR module matched the k-mers from reads and contigs containing rsmA, AAC(6’)-Ic, and CRP to Serratia marcescens (Fig. 6B in purple).
In Patient 11827 [26], who was hospitalized for sepsis due to a methicillin-resistant Staphylococcus aureus (MRSA) blood stream infection, analysis of plasma mNGS data demonstrated that Staphylococcus aureus was the dominant species present in the blood sample (Fig. 6C) [26]. Among the detected AMR genes, based on Resistomes & Variants information from CARD, Staphylococcus aureus norA, Staphylococcus aureus LmrS, arlS, mepA, tet(38), mecR1, and mecA are found exclusively in Staphylococcus species (Fig. 6D in blue). Pathogen-of-origin analysis further matched k-mers from the reads containing sdrM to S. aureus (Fig. 6D in purple).

Application 4: Profiling the longitudinal dynamics of pathogens and AMR genes.

To demonstrate the utility of the CZ ID mNGS and AMR modules for studying the longitudinal dynamics of infection, we analyzed serially-collected lower respiratory RNA-seq data from a critically ill patient with respiratory syncytial virus (RSV) infection who subsequently developed ventilator-associated pneumonia (VAP) due to Pseudomonas aeruginosa [27, 28]. Analysis of microbial mNGS data using the CZ ID pipeline highlighted the temporal dynamics of RSV abundance, which decreased over time. Following viral clearance, we noted an increase in reads mapping to P. aeruginosa on day 9, correlating with a subsequent clinical diagnosis of VAP and bacterial culture positivity (Fig. 7A) [27, 28]. Analysis using the CZ ID AMR module demonstrated that P. aeruginosa-associated AMR genes were also detected, and their prevalence tracked with the relative abundance of the nosocomial bacterial pathogen (Fig. 7B).

Application 5: AMR gene detection from environmental surveillance samples.

Lastly, to highlight the application of the CZ ID AMR module for environmental surveillance of AMR pathogens, we analyzed publicly-available short-read mNGS data from a wastewater surveillance study comparing Boston, USA to Vellore, India [29].
In this study, municipal wastewater, hospital wastewater, and surface water samples were collected from each city and underwent DNA mNGS. From AMR gene alignments at the contig level, we observed a total of 22 AMR gene families in Boston samples versus 30 from Vellore (Fig. 8). Several AMR genes of high public health concern such as the KPC and NDM plasmid-transmissible carbapenemase genes were only present in hospital effluent, reflecting the fact that hospitals frequently serve as reservoirs of AMR pathogens [30].

Discussion

Metagenomics has emerged as a powerful tool for studying and tracking AMR pathogens in a range of research and public health contexts. Both surveillance and research applications of mNGS benefit from simultaneous assessment of AMR genes and their associated microbes, yet traditionally separate bioinformatics workflows and resource-intense computational infrastructure have been required for each. Here, we address these challenges with the CZ ID AMR module, a fast and openly accessible platform for combined analysis of AMR genes and microbial genomes that couples the expansive database and advanced RGI software of CARD with the unbiased microbial detection capacity of CZ ID. We demonstrate the AMR module’s diverse applications from infectious disease research to environmental monitoring through a series of case studies leveraging four observational patient cohorts and a wastewater surveillance study. The CZ ID AMR module is designed to enable rapid and accessible data processing without a need for coding expertise, and to return a comprehensive set of AMR gene alignment metrics to aid in data interpretation. Researchers can then apply stringency threshold filters to maximize sensitivity or specificity depending on the use case.
For instance, when seeking to detect established AMR genes from data types with high coverage of microbial genomes (e.g., WGS data of cultured isolates), “Perfect” or “Strict” stringency thresholds maximize the accuracy of assignments. In contrast, from mNGS data with sparse microbial genome coverage (e.g., from blood or wastewater), using “Nudged” to increase sensitivity of mapping reads at the expense of specificity may be the only way to detect biologically important AMR genes. The “Nudged” threshold also enables more alignment permissiveness to sequence variations, which can be helpful for detecting novel alleles. The CZ ID AMR module provides various metrics to support optimization of cutoffs based on specific sample types and applications by the users. Depending on the number of reads, breadth of coverage, and whether reads originate from conserved versus variable gene regions, the confidence of AMR gene assignment can vary. Generally, the confidence of contig-based AMR gene assignments is greater than read-based AMR gene matches due to the increased length of assembled fragments. When it comes to AMR gene alleles with high sequence similarity, such as those from within the same gene family, the AMR module can only distinguish between them if sufficient gene coverage is achieved. In most of our analyses, if genes within the same family were identified at both the individual read and contig level, we preferentially evaluated the contig annotation to maximize allele specificity. As our understanding of AMR gene biology increases over time, annotations may change in the CARD reference database that underpins the CZ ID AMR gene module. This was evident, for instance, in the Klebsiella transfusion-related sepsis case (Application 1, Fig. 2B), where mdfA was annotated as conferring resistance to tetracycline antibiotics based on CARD version 3.2.6, used for our analysis. This will be updated as a multiple drug resistance gene [31] in the next CARD release.
To mitigate database limitations and ensure traceability of results over time, CZ ID periodically updates the database versions and highlights the specific versions of the underlying databases used for each analysis. CZ ID enables simultaneous detection of pathogens and AMR genes, and our results emphasize the importance of integrating taxonomic abundance from the CZ ID mNGS module with several data outputs within the AMR module. Each AMR gene is directly linked to its CARD webpage where the Resistomes section provides information on the species predicted to harbor the gene of interest and its variants. The pathogen-of-origin predictions, while still a beta feature, can further help identify the source species of detected AMR genes. These assignments are predictions based on matching AMR sequences in each sample to the CARD Resistomes & Variants database, and should be interpreted in the context of the microbes found to exist in the sample from the CZ ID mNGS module output. Connecting AMR genes to their originating microbes thus necessitates integrating all available results from both the CZ ID AMR and mNGS modules. In sum, we describe the novel AMR analysis module within the CZ ID bioinformatics web platform designed to facilitate integrated analyses of AMR genes and microbes. This open-access, cloud-based pipeline permits studying AMR genes and microbes together across a broad range of applications, ranging from infectious diseases to environmental surveillance. By overcoming the significant computing infrastructure and technical expertise typically required for mNGS data processing, this tool aims to democratize the analysis of microbial genomes and metagenomes across humans, animals, and the environment.

Methods

Patient enrollment, sample collection and ethics

Skin swabs and cultured isolates analyzed for Application 2 (hospital outbreak) were collected under the University of California San Francisco Institutional Review Board (IRB) protocol no. 17-24056, which granted a waiver of consent for their collection, as part of a larger ongoing surveillance study of patients with healthcare-associated infections. Samples analyzed for Application 4 (longitudinal profiling) were collected from patients enrolled in a prospective cohort study of mechanically ventilated children admitted to eight intensive care units in the National Institute of Child Health and Human Development’s Collaborative Pediatric Critical Care Research Network (CPCCRN) from February 2015 to December 2017. The original cohort study was approved by the Collaborative Pediatric Critical Care Research IRB at the University of Utah (protocol no. 00088656). Details regarding enrollment and consent have previously been described [27, 28]. Briefly, children aged 31 days to 18 years who were expected to require mechanical ventilation via endotracheal tube for at least 72 hours were enrolled. Parents or other legal guardians of eligible patients were approached for consent by study-trained staff as soon as possible after intubation. Waiver of consent was granted for tracheal aspirate (TA) samples to be obtained from standard-of-care suctioning of the endotracheal tube until the parents or guardians could be approached for informed consent. For all other applications and analyses, previously published datasets were used as described in the data and code availability section.

Nucleic acid extraction and Illumina sequencing

For the skin swab samples and cultured isolates described in Application 2, DNA was extracted using the Zymo pathogen magbead kit (Zymo Research) according to manufacturer’s instructions. Sequencing libraries were then prepared from 20 ng of input DNA using the NEBNext Ultra-II DNA kit (New England Biolabs) following manufacturer’s instructions [22]. For the tracheal aspirate samples described in Application 4, RNA was extracted using the Qiagen Allprep kit (Qiagen) following manufacturer’s instructions.
Sequencing libraries were prepared using the NEBNext Ultra-II RNA kit (New England Biolabs) according to a previously described protocol [27]. Paired-end 150 base pair Illumina sequencing was performed on all samples using Illumina NextSeq 550 or NovaSeq 6000.

AMR gene identification

We downloaded the tabular results from the AMR module and applied quality filters to ensure robust AMR gene identification. Specifically, for mNGS data, we required all AMR genes (from contig and read approaches) to have coverage breadth > 10%, and for read mappings we additionally required > 5 reads mapping to the AMR gene. For WGS data, we required all AMR genes (from contig and read approaches) to have coverage breadth > 50%, and additionally required > 5 reads mapping to the AMR gene for read results. Across all analyses, Nudged results were treated the same way as contig results. For studies with corresponding water controls, we applied the above filters to the water controls, and then removed AMR genes or gene families (depending on what was plotted) also found in water controls from experimental samples.

AMR gene heatmaps

All plots were generated in R using Tidyverse [32], patchwork [33] and ComplexHeatmap [34]. While making the plots, we applied additional filtering to focus the analysis within the context of the use case and limit the size of the plots for the paper. In particular, we included only CARD’s protein homolog and protein variation models (see https://github.com/arpcard/rgi), and included only medically relevant antibiotic drug classes by removing disinfecting agents and antiseptics, antibacterial free fatty acids, and aminocoumarin, diaminopyrimidine, elfamycin, fusidane, phosphonic acid, nucleoside, and pleuromutilin antibiotics. In Fig. 5B and Fig. 8, we also excluded efflux pumps to reduce plot size, as efflux pumps tend to have ubiquitous functions in cellular processes.
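
The quality filters described under AMR gene identification can be sketched as follows. This is an illustrative Python rendering of the stated thresholds, not the authors' code (their R code is in the manuscript repository); the `AmrHit` record, its field names, and the function names are hypothetical.

```python
from typing import List, NamedTuple

# Thresholds as stated in the text: coverage breadth must exceed 10% (mNGS)
# or 50% (WGS); read-based hits additionally need more than 5 mapped reads.
MIN_COVERAGE_PCT = {"mNGS": 10.0, "WGS": 50.0}
MIN_READS = 5


class AmrHit(NamedTuple):
    """Hypothetical record for one AMR gene call in one sample."""
    gene: str
    approach: str        # "contig" or "read"
    coverage_pct: float  # coverage breadth, in percent
    reads: int = 0       # reads mapped (only used for read-based hits)


def passes_filter(hit: AmrHit, data_type: str) -> bool:
    """Apply the coverage-breadth and read-count thresholds to one hit."""
    if hit.coverage_pct <= MIN_COVERAGE_PCT[data_type]:
        return False
    if hit.approach == "read" and hit.reads <= MIN_READS:
        return False
    return True


def remove_control_genes(sample_hits: List[AmrHit],
                         control_hits: List[AmrHit],
                         data_type: str) -> List[AmrHit]:
    """Filter sample hits, then drop genes that also pass in water controls."""
    control_genes = {h.gene for h in control_hits if passes_filter(h, data_type)}
    return [h for h in sample_hits
            if passes_filter(h, data_type) and h.gene not in control_genes]
```

The comparisons are strict because the text states "> 10%", "> 50%" and "> 5 reads". To subtract at the gene family level rather than the gene level, the same logic would key on a family field instead of `gene`.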
Then, we applied a series of heuristics to make this structured data amenable to heatmap visualization. Because each AMR annotation in each sample can be represented by only one tile in a heatmap, we plotted the result with the highest confidence. We considered AMR genes identified through the contig approach with Perfect or Strict cutoffs to be of higher confidence than those with the Nudged cutoff, which were in turn of higher confidence than AMR genes found by reads alone. Finally, given the challenges for gene attribution presented by homology between genes in the same gene family, we developed a systematic approach for collapsing the visualization to a single candidate per sample. For all figures except Fig. 6, if in the same sample one AMR gene was found by the read approach and a different AMR gene from the same gene family was found by the contig approach, the read-derived gene was omitted and only the contig-derived gene was plotted. The rationale for this prioritization is that short reads alone sometimes cannot sufficiently distinguish between highly similar alleles or genes from the same gene family. Contigs, which typically provide greater sequence length, are often of higher confidence. This approach should be considered on a per-gene or per-gene-family basis, due to variability in the extent of sequence similarity within genes and gene families, and should be modified for specific use cases. For example, in Fig. 6B, even though mecR1 and mecA are from the same gene family, they do not have highly similar sequences, so we did not apply this step.

Species identification

For results from the CZ ID mNGS module, filters were again applied to ensure high-quality results. Specifically, for Fig. 3 and Fig. 7, which each focused on a single species, the NT rpM calculated by the mNGS module was used with no extra filtering. For Fig. 5 and Fig. 6A, which focused on species composition, the species detected by the mNGS module were filtered with: NT rpM > 10 and NR rpM > 10, to implement a minimal abundance requirement for taxonomic identification; NT alignment length > 50, to ensure alignment specificity; and NT Z-score > 2, using a background model calculated with the corresponding study-specific water samples, to ensure significance of taxa above levels of possible background contamination. Finally, for Fig. 6B, which had low read coverage, abundance filters were omitted and only the significance filter of NT Z-score > 2 was applied, using a background model calculated with the corresponding water samples.

SNP distance analysis

Host-filtered reads were downloaded from the CZ ID mNGS module. SNP distances were calculated with SKA2 0.3.2 23 using ska build --min-count 4 --threads 4 --min-qual 20 -k 31 --qual-filter strict and ska distance --filter-ambiguous. The heatmap plot was generated with ComplexHeatmap 34.

Declarations

Data and code availability

All raw microbial sequencing data supporting the conclusions of this article are available via NCBI's Sequence Read Archive (SRA) under BioProjects PRJNA544865, PRJNA1086943, PRJNA450137 and PRJNA672704. For previously unpublished datasets, non-host FASTQ files generated by the CZ ID mNGS module were submitted to SRA under NCBI BioProject accession PRJNA1086943. We obtained raw FASTQ files from previous studies 22,25–29, either from the authors or from public repositories, and uploaded them to the CZ ID pipeline (https://czid.org/) under an openly accessible manuscript-specific project called "AMR example applications" to be processed through both the AMR module and the mNGS module (the project can be accessed at https://czid.org/home?project_id=5929 after logging in). CZ ID workflow code can be found at https://github.com/chanzuckerberg/czid-workflows/. Additional code for data filtering and plotting can be found at https://github.com/chanzuckerberg/czid-amr-manuscript-2024.
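The authors' own filtering and plotting code is in the repository named above and is written in R. As a rough illustration of the heatmap heuristics described under "AMR gene heatmaps", the Python sketch below ranks hits by confidence and then drops a read-only gene when a different gene from the same family has contig support in the same sample. The dictionary keys, the rank table, and the `exempt_families` parameter are hypothetical names introduced for this sketch and are not taken from the authors' code.

```python
from collections import defaultdict

# Lower rank means higher confidence, following the ordering given in the text.
# The labels are assumptions for this sketch, not AMR module column values.
CONFIDENCE_RANK = {
    ("contig", "Perfect"): 0,
    ("contig", "Strict"): 0,
    ("contig", "Nudged"): 1,
    ("read", None): 2,
}


def best_per_gene(hits):
    """Keep the highest-confidence hit for each (sample, gene) pair."""
    best = {}
    for hit in hits:
        key = (hit["sample"], hit["gene"])
        rank = CONFIDENCE_RANK[(hit["approach"], hit.get("cutoff"))]
        if key not in best or rank < best[key][0]:
            best[key] = (rank, hit)
    return [hit for _, hit in best.values()]


def collapse_gene_families(hits, exempt_families=frozenset()):
    """Drop read-only genes when another gene of the same family has contig support."""
    contig_genes = defaultdict(set)
    for hit in hits:
        if hit["approach"] == "contig":
            contig_genes[(hit["sample"], hit["family"])].add(hit["gene"])
    kept = []
    for hit in hits:
        others = contig_genes[(hit["sample"], hit["family"])] - {hit["gene"]}
        if hit["approach"] == "read" and others and hit["family"] not in exempt_families:
            continue  # a contig-supported relative is plotted instead
        kept.append(hit)
    return kept


# Toy records with placeholder gene and family names, for illustration only.
hits = [
    {"sample": "s1", "gene": "geneA1", "family": "familyA", "approach": "contig", "cutoff": "Strict"},
    {"sample": "s1", "gene": "geneA1", "family": "familyA", "approach": "read"},
    {"sample": "s1", "gene": "geneA2", "family": "familyA", "approach": "read"},
    {"sample": "s1", "gene": "geneB1", "family": "familyB", "approach": "read"},
]
tiles = collapse_gene_families(best_per_gene(hits))
print([(t["gene"], t["approach"]) for t in tiles])
```

The `exempt_families` argument reflects the caveat in the text that the family-level collapse should be decided per gene family; passing a family name there leaves its read-only genes in place.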
The following software versions were used for this manuscript: CZ ID mNGS workflow version 8.2.5; CZ ID AMR workflow version 1.4.2, based on CARD RGI version 6.0.3; CARD database version 3.2.6; the CARD Resistomes & Variants database version 4.0.0; and SKA2 version 0.3.2.

Competing interests

The authors declare that they have no competing interests.

Funding

Chan Zuckerberg Initiative (DL, KK, NB, XB, KR, KE, EF, OH, EH, AEJ, RL, SM, LR, JT, OV). Chan Zuckerberg Biohub (CL, VC, AG, AJP). NIH/NHLBI 5R01HL155418 (CL, PMM) and 1R01HL124103 (PMM). Canadian Institutes of Health Research PJT-156214 and David Braley Chair in Computational Biology (ARP, BPA, AGM).

Authors' contributions

KK and CL conceived of and designed the work. DL carried out data analysis with valuable input and guidance from KK, CL, VC and AG. ESG collected and sequenced all samples in Application 2. The CZ ID team (NB, XB, KR, KE, EF, OH, EH, AEJ, RL, SM, LR, JT, OV) built the AMR module. PMM collected and sequenced all samples in Application 4. AJP provided the data for Application 5. ARR, BPA, AGM provided expert input on the project. CL supervised the work. DL, KK and CL drafted the manuscript with input from all coauthors.

Acknowledgements

We acknowledge the contributions of the whole CZI Infectious Disease development team: Robert Aboukhalil, Kami Bankston, Neha Chourasia, Jerry Fu, Julie Han, Francisco Loo, Todd Morse, Juan Caballero Perez, David Ruiz, Vincent Selhorst-Jones and Kevin Wang.

References

1. Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet 399, 629–655 (2022).
2. Review on Antimicrobial Resistance. Tackling Drug-Resistant Infections Globally: Final Report and Recommendations. (2016).
3. 10 global health issues to track in 2021. https://www.who.int/news-room/spotlight/10-global-health-issues-to-track-in-2021.
4. Baker, K. S. et al. Evidence review and recommendations for the implementation of genomics for antimicrobial resistance surveillance: reports from an international expert group. Lancet Microbe 4, e1035–e1039 (2023).
5. Anjum, M. F., Zankari, E. & Hasman, H. Molecular Methods for Detection of Antimicrobial Resistance. Microbiol Spectr 5, (2017).
6. Zankari, E. et al. Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 67, 2640–2644 (2012).
7. Jia, B. et al. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res. 45, D566–D573 (2017).
8. McArthur, A. G. et al. The comprehensive antibiotic resistance database. Antimicrob. Agents Chemother. 57, 3348–3357 (2013).
9. Gupta, S. K. et al. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob. Agents Chemother. 58, 212–220 (2014).
10. Inouye, M. et al. SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome Med. 6, 90 (2014).
11. Feldgarden, M. et al. AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci. Rep. 11, 12728 (2021).
12. Kalantar, K. L. et al. IDseq-An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring. Gigascience 9, (2020).
13. Alcock, B. P. et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48, D517–D525 (2020).
14. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
15. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
16. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
17. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
18. Clausen, P. T. L. C., Aarestrup, F. M. & Lund, O. Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics 19, 307 (2018).
19. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
20. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
21. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
22. Crawford, E. et al. Investigating Transfusion-related Sepsis Using Culture-Independent Metagenomic Sequencing. Clin. Infect. Dis. 71, 1179–1185 (2020).
23. GitHub - bacpop/ska.rust: Split k-mer analysis – version 2. GitHub https://github.com/bacpop/ska.rust.
24. Hussein, N. H., Al-Kadmy, I. M. S., Taha, B. M. & Hussein, J. D. Mobilized colistin resistance (mcr) genes from 1 to 10: a comprehensive review. Mol. Biol. Rep. 48, 2897–2907 (2021).
25. Langelier, C. et al. Integrating host response and unbiased microbe detection for lower respiratory tract infection diagnosis in critically ill adults. Proc. Natl. Acad. Sci. U. S. A. 115, E12353–E12362 (2018).
26. Kalantar, K. L. et al. Integrated host-microbe plasma metagenomics for sepsis diagnosis in a prospective cohort of critically ill adults. Nat Microbiol 7, 1805–1816 (2022).
27. Tsitsiklis, A. et al. Lower respiratory tract infections in children requiring mechanical ventilation: a multicentre prospective surveillance study incorporating airway metagenomics. Lancet Microbe 3, e284–e293 (2022).
28. Mick, E. et al. Integrated host/microbe metagenomics enables accurate lower respiratory tract infection diagnosis in critically ill children. J. Clin. Invest. 133, (2023).
29. Fuhrmeister, E. R. et al. Surveillance of potential pathogens and antibiotic resistance in wastewater and surface water from Boston, USA and Vellore, India using long-read metagenomic sequencing. medRxiv 2021.04.22.21255864 (2021) doi:10.1101/2021.04.22.21255864.
30. Struelens, M. J. The epidemiology of antimicrobial resistance in hospital acquired infections: problems and possible solutions. BMJ 317, 652–654 (1998).
31. Lewinson, O. et al. The Escherichia coli multidrug transporter MdfA catalyzes both electrogenic and electroneutral transport reactions. Proc. Natl. Acad. Sci. U. S. A. 100, 1667–1672 (2003).
32. Wickham, H. et al. Welcome to the tidyverse. J. Open Source Softw. 4, 1686 (2019).
33. Pedersen, T. L. patchwork: The Composer of Plots. Preprint at https://patchwork.data-imaginist.com (2024).
34. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).

Additional Declarations

There is NO Competing Interest.

Supplementary Files

SuppCZIDAMRmodule041424.docx (Supplementary Materials)
In the AMR workflow, to accommodate targeted mNGS protocols designed to amplify many copies of low-abundance AMR genes, duplicate reads are then added back prior to further processing.\u003c/p\u003e\n\u003cp\u003eThere are two parallel approaches for AMR gene detection (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cstrong\u003eFig. \u003cspan class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/strong\u003e). In the \u0026lsquo;contig\u0026rsquo; approach, the short reads are assembled into contiguous sequences (contigs) using SPAdes\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e, and the contigs are subsequently sent to RGI (main) for AMR gene detection based on sequence similarity and mutation mapping. In the \u0026lsquo;read\u0026rsquo; approach, the short reads are directly sent to RGI (bwt) for read mapping by KMA\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e to CARD reference sequences (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e). In both approaches, the assembled contigs or reads containing AMR genes are also sent to RGI (kmer_query) for pathogen-of-origin detection.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch2\u003eAMR module result output\u003c/h2\u003e\n\u003cp\u003eThe AMR module displays results in an interactive table, facilitating viewing, sorting, and filtering. The table is organized in three collapsible vertical sections: 1) general information, 2) contigs, and 3) reads (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eA). 
The general information section includes \u0026ldquo;Gene\u0026rdquo; and \u0026ldquo;Gene Family\u0026rdquo; as well as information on the antibiotic(s) against which the gene confers resistance (\u0026ldquo;Drug Class\u0026rdquo; and \u0026ldquo;High-level Drug Class\u0026rdquo;), resistance mechanism (\u0026ldquo;Mechanism\u0026rdquo;), and model used to identify resistance (\u0026ldquo;Model\u0026rdquo;). With respect to the latter, several models are used to identify resistance such as the CARD \u003cem\u003eprotein homolog model\u003c/em\u003e which identifies the presence of AMR genes, and the \u003cem\u003eprotein variant model\u003c/em\u003e which identifies specific mutations that confer resistance. Clicking on the AMR gene name will reveal a description and web hyperlinks to CARD, NCBI and PubMed entries.\u003c/p\u003e\n\u003cp\u003eThe \u0026ldquo;Contigs\u0026rdquo; section includes the number of contigs that map to each AMR gene (\u0026ldquo;Contigs\u0026rdquo;), cutoff based on BLAST bit-score (\u0026ldquo;Cutoff\u0026rdquo;), percentage of the AMR gene covered by all contigs (\u0026ldquo;%Cov\u0026rdquo;), percent identity of the covered region (\u0026ldquo;%Id\u0026rdquo;), and pathogen-of-origin prediction based on contigs (\u0026ldquo;Contig Species\u0026rdquo;). The \u0026ldquo;Reads\u0026rdquo; section includes metrics corresponding to the number of reads mapping to the AMR gene (\u0026ldquo;Reads\u0026rdquo;), relative abundance of the AMR gene in reads per million reads sequenced (\u0026ldquo;rpM\u0026rdquo;), percentage of AMR gene covered by sequencing reads (\u0026ldquo;%Cov\u0026rdquo;), average depth of reads aligned across the gene (\u0026ldquo;Cov. Depth\u0026rdquo;), average depth of reads aligned across the gene per million reads sequenced (\u0026ldquo;dpM\u0026rdquo;), and a pathogen-of-origin prediction based on reads (\u0026ldquo;Read Species\u0026rdquo;). 
All columns can be sorted and numerical metrics can be further filtered using user-specified thresholds.\u003c/p\u003e\n\u003cp\u003eResults files at each stage of the pipeline can be downloaded for inspection or additional downstream analysis. These files include quality- and host-filtered reads, assembled contigs, AMR annotations and corresponding metrics in tabular format, and all output files from CARD RGI. The contigs as well as short reads mapped to each AMR gene can also be downloaded.\u003c/p\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n\u003ch2\u003eQuality filtering for AMR gene predictions\u003c/h2\u003e\n\u003cp\u003eOne challenge with mNGS-based AMR surveillance is interpretation of results. The CZ ID AMR module provides key quantitative metrics including rpM, percent coverage of the AMR gene, and dpM to enable assessments of relative abundance and the confidence of AMR gene assignments. Additionally, for AMR detection using contigs, the \u0026ldquo;Cutoff\u0026rdquo; column, which reports RGI\u0026rsquo;s stringency thresholds based on CARD\u0026rsquo;s curated bit-score cut-offs, can provide valuable insight into AMR gene alignment confidence. Here, \u0026ldquo;Perfect\u0026rdquo; indicates perfect or identical matches to the curated reference sequences and mutations in CARD, while \u0026ldquo;Strict\u0026rdquo; indicates matches to variants of known AMR genes, including a secondary screen for key mutations. Finally, the term \u0026ldquo;Nudged\u0026rdquo; is adopted by the CZ ID module to indicate more distant homologs (matched via RGI\u0026rsquo;s \u0026ldquo;Loose\u0026rdquo; paradigm) with at least 95% identity to known AMR genes, which is ideal for discovery but is more likely to return false-positive hits. 
Given that a consensus approach has yet to be developed for quantifying and interpreting AMR genes from mNGS and WGS data, the CZ ID AMR module provides comprehensive information that can be subsequently filtered or otherwise optimized based on the goals of a given analysis.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n\u003ch2\u003eMicrobial profiling using the CZ ID mNGS module\u003c/h2\u003e\n\u003cp\u003eThe CZ ID mNGS module, which has undergone several updates since first described\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, preprocesses the uploaded reads and then proceeds to assembly-based alignment to produce taxonomic relative abundance profiles for each sample. Briefly, the non-host reads output by the quality- and host-filtering steps (as described above) are aligned to the NCBI nucleotide (NT) and protein (NR) databases using minimap2\u003csup\u003e19\u003c/sup\u003e and DIAMOND\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e, respectively, to identify putative short-read alignments (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cstrong\u003eFig. \u003cspan class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/strong\u003e). Then, reads are assembled into contigs using SPAdes\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e and contigs are re-aligned to the set of putative accessions using BLAST\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e to improve specificity. Finally, alignments are used to identify taxa of origin, which are tallied into relative abundance estimates\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. 
The web interface provides various reports with metrics including reads per million (\u0026ldquo;rpM\u0026rdquo;), number of reads (\u0026ldquo;r\u0026rdquo;), number of contigs (\u0026ldquo;contig\u0026rdquo;), number of reads in the contigs (\u0026ldquo;contig r\u0026rdquo;), percent identity (\u0026ldquo;%id\u0026rdquo;), and average length of alignment (\u0026ldquo;L\u0026rdquo;), alongside visualizations and download options to support the analysis and exploration of results (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eB).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n\u003ch2\u003eConnecting the pathogens and AMR genes\u003c/h2\u003e\n\u003cp\u003eThe CZ ID platform enables simultaneous analysis of microbes and AMR genes from a single data upload via the mNGS and AMR modules. This provides complementary, but distinct, microbial and AMR gene profiles from a given sample or dataset. The mNGS module does not provide any direct link between species calls and AMR genes from the AMR module, although in cases where a single bacterial pathogen comprises the majority of reads in a metagenomic sample, this may be inferred.\u003c/p\u003e\n\u003cp\u003eConversely, the AMR module provides two ways to help connect AMR genes to their source microbes. First, each AMR gene returned in the report table is hyperlinked to its corresponding CARD webpage, where the Resistomes section reports all species in which the gene and its variants have been identified as predicted by RGI. Second, the AMR module returns results from a pathogen-of-origin analysis conducted by RGI\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e, which maps k-mers derived from reads or contigs containing the AMR gene of interest against AMR alleles in the CARD Resistomes \u0026amp; Variants database. 
This second approach is particularly useful for identifying the source species in cases when the first CARD Resistomes section lists multiple species or genera. However, because only AMR gene sequences present in CARD are considered in the pathogen-of-origin analysis, as opposed to species identification using complete reference genome sequences in the mNGS module, species predictions from the AMR module are best interpreted in the context of all outputs from the CZ ID AMR and mNGS modules.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n\u003ch2\u003eSharing results for collaboration\u003c/h2\u003e\n\u003cp\u003eProjects on CZ ID can be shared with specific users or made public to all users. Everyone with access to the project can view or download the results, and perform data filtering or other analyses. All data and results for this paper can be accessed by searching for a project named \u0026ldquo;AMR example applications\u0026rdquo; among public projects at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://czid.org\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eApplication 1: Identification of AMR genes from WGS and mNGS data.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo demonstrate the CZ ID AMR module\u0026rsquo;s utility for detecting bacterial pathogens and their AMR genes in both WGS and mNGS data, we leveraged data from a recent investigation of transfusion-related sepsis\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. In this study, two immunocompromised patients received platelet units originating from a single donor. 
Both developed septic shock within hours after the transfusion, with blood cultures from Patient 1, who did not survive, returning positive for \u003cem\u003eKlebsiella pneumoniae.\u003c/em\u003e Patient 2, who was receiving prophylactic antibiotic therapy at the time of the transfusion, survived, but had negative blood cultures. Direct mNGS of post-transfusion blood samples from both patients revealed a large increase in reads mapping to \u003cem\u003eKlebsiella pneumoniae\u003c/em\u003e, a pathogen which was later also identified from culture of residual material from the transfused platelet bag (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eA)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. While blood mNGS data yielded less coverage of the \u003cem\u003eK. pneumoniae\u003c/em\u003e genome compared to WGS of the cultured isolates, mNGS of patient 1\u0026rsquo;s post-transfusion plasma sample recovered all the AMR genes found by WGS of cultured isolates (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eB). Even in patient 2, whose blood sample had fewer reads mapping to \u003cem\u003eK. pneumoniae\u003c/em\u003e, most AMR genes found in the cultured isolates were still able to be identified using the RGI \u0026ldquo;Nudged\u0026rdquo; threshold.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eApplication 2: Comprehensive metagenomic and WGS profiling of pathogens and AMR genes in the setting of a hospital outbreak.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo demonstrate how the CZ ID AMR module can facilitate deeper insights into pathogen and AMR transmission in hospitals, we evaluated WGS and mNGS data from surveillance skin swabs collected from 40 babies in a neonatal intensive care unit (NICU). The swabs were collected to evaluate for suspected transmission of methicillin-susceptible \u003cem\u003eStaphylococcus aureus\u003c/em\u003e (MSSA) between patients. 
WGS of the MSSA isolates followed by implementation of the AMR module demonstrated many shared AMR genes, and revealed a cluster of nine samples with identical AMR profiles (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003eA). Subsequent phylogenetic assessment using split k-mer analysis with SKA2\u003csup\u003e23\u003c/sup\u003e revealed that samples within this cluster differed by fewer than 11 single nucleotide polymorphisms (SNPs) across their genomes, consistent with an outbreak involving \u003cem\u003eS. aureus\u003c/em\u003e transmission between patients (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003eB).\u003c/p\u003e\n\u003cp\u003eWithin this cluster of patients, we considered whether other bacterial species in the microbiome were also being exchanged in addition to the \u003cem\u003eS. aureus\u003c/em\u003e. Intriguingly, mNGS analysis of the direct swab samples from which the \u003cem\u003eS. aureus\u003c/em\u003e isolates were selectively cultured revealed a diversity of bacterial taxa, many of which were more abundant than \u003cem\u003eS. aureus\u003c/em\u003e. These included several established healthcare-associated pathogens that were never identified using the selective culture-based approach, such as \u003cem\u003eEnterobacter\u003c/em\u003e, \u003cem\u003eCitrobacter\u003c/em\u003e, \u003cem\u003eKlebsiella\u003c/em\u003e and \u003cem\u003eEnterococcus\u003c/em\u003e species. mNGS also demonstrated that each sample had a distinct microbial community composition even among samples from the cluster, indicating that only \u003cem\u003eS. 
aureus\u003c/em\u003e and potentially a subset of other species were actually exchanged between babies, rather than the entire skin microbiome (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003eA).\u003c/p\u003e\n\u003cp\u003eFurther analysis of mNGS data using the AMR module also revealed a diversity of AMR genes conferring resistance to several drug classes, and commonly associated with nosocomial pathogens. These included genes encoding ampC-type inducible beta-lactamases (e.g., \u003cem\u003eCKO, CMY, SST\u003c/em\u003e), extended spectrum beta-lactamases (e.g., \u003cem\u003eSHV\u003c/em\u003e), and the recently emerged \u003cem\u003eMCR\u003c/em\u003e class of AMR genes, which confer plasmid-transmissible colistin resistance\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eThe AMR gene profiles varied greatly across the samples, both within the cluster and outside of the cluster, consistent with the observed taxonomic diversity (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003eB). Together, these results revealed both inter-patient MSSA transmission in the NICU, and the acquisition of AMR genes associated with nosocomial pathogens within the first months of life.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eApplication 3: Correlating pathogen identification with AMR gene detection.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNext, we aimed to integrate results from the CZ ID mNGS and AMR modules by analyzing mNGS data from critically ill patients with bacterial infections. 
In Patient 350\u003csup\u003e25\u003c/sup\u003e, who was hospitalized for \u003cem\u003eSerratia marcescens\u003c/em\u003e pneumonia, metagenomic RNA sequencing (RNA-seq) of a lower respiratory tract sample identified \u003cem\u003eSerratia marcescens\u003c/em\u003e as the single most dominant species within the lung microbiome (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eA)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. Among the detected AMR genes, based on the Resistomes \u0026amp; Variants information from CARD, \u003cem\u003eSRT-2\u003c/em\u003e and \u003cem\u003eSST-1\u003c/em\u003e are found exclusively in \u003cem\u003eSerratia marcescens\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eB in blue). Further analysis by the pathogen-of-origin feature in the AMR module matched the k-mers from reads and contigs containing \u003cem\u003ersmA, AAC(6\u0026rsquo;)-Ic\u003c/em\u003e, and \u003cem\u003eCRP\u003c/em\u003e to \u003cem\u003eSerratia marcescens\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eB in purple).\u003c/p\u003e\n\u003cp\u003eIn Patient 11827\u003csup\u003e26\u003c/sup\u003e, who was hospitalized for sepsis due to a methicillin-resistant \u003cem\u003eStaphylococcus aureus\u003c/em\u003e (MRSA) blood stream infection, analysis of plasma mNGS data demonstrated that \u003cem\u003eStaphylococcus aureus\u003c/em\u003e was the dominant species present in the blood sample (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eC)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. 
Among the detected AMR genes, based on Resistomes \u0026amp; Variants information from CARD, \u003cem\u003eStaphylococcus aureus norA, Staphylococcus aureus LmrS, arlS, mepA, tet(38), mecR1, mecA\u003c/em\u003e are found exclusively in \u003cem\u003eStaphylococcus\u003c/em\u003e species (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eD in blue). Pathogen-of-origin analysis further matched k-mers from the reads containing \u003cem\u003esdrM\u003c/em\u003e to \u003cem\u003eS. aureus\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eD in purple).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eApplication 4: Profiling the longitudinal dynamics of pathogens and AMR genes.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo demonstrate the utility of the CZ ID mNGS and AMR modules for studying the longitudinal dynamics of infection, we analyzed serially-collected lower respiratory RNA-seq data from a critically ill patient with respiratory syncytial virus (RSV) infection who subsequently developed ventilator-associated pneumonia (VAP) due to \u003cem\u003ePseudomonas aeruginosa\u003c/em\u003e\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e,\u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. Analysis of microbial mNGS data using the CZ ID pipeline highlighted the temporal dynamics of RSV abundance, which decreased over time. Following viral clearance, we noted an increase in reads mapping to \u003cem\u003eP. aeruginosa\u003c/em\u003e on day 9, correlating with a subsequent clinical diagnosis of VAP and bacterial culture positivity (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003eA)\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e,\u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. Analysis using the CZ ID AMR module demonstrated that \u003cem\u003eP. 
aeruginosa\u003c/em\u003e-associated AMR genes were also detected, and their prevalence tracked with the relative abundance of the nosocomial bacterial pathogen (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003eB).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eApplication 5: AMR gene detection from environmental surveillance samples.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLastly, to highlight the application of the CZ ID AMR module for environmental surveillance of AMR pathogens, we analyzed publicly-available short-read mNGS data from a wastewater surveillance study comparing Boston, USA to Vellore, India\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. In this study, municipal wastewater, hospital wastewater, and surface water samples were collected from each city and underwent DNA mNGS. From AMR gene alignments at the contig level, we observed a total of 22 AMR gene families in Boston samples versus 30 from Vellore (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e). Several AMR genes of high public health concern such as the \u003cem\u003eKPC\u003c/em\u003e and \u003cem\u003eNDM\u003c/em\u003e plasmid-transmissible carbapenemase genes were only present in hospital effluent, reflecting the fact that hospitals frequently serve as reservoirs of AMR pathogens\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eMetagenomics has emerged as a powerful tool for studying and tracking AMR pathogens in a range of research and public health contexts. Both surveillance and research applications of mNGS benefit from simultaneous assessment of AMR genes and their associated microbes, yet traditionally separate bioinformatics workflows and resource-intense computational infrastructure have been required for each. 
Here, we address these challenges with the CZ ID AMR module, a fast and openly accessible platform for combined analysis of AMR genes and microbial genomes that couples the expansive database and advanced RGI software of CARD with the unbiased microbial detection capacity of CZ ID. We demonstrate the AMR module\u0026rsquo;s diverse applications from infectious disease research to environmental monitoring through a series of case studies leveraging four observational patient cohorts and a wastewater surveillance study.\u003c/p\u003e\n\u003cp\u003eThe CZ ID AMR module is designed to enable rapid and accessible data processing without a need for coding expertise, and return a comprehensive set of AMR gene alignment metrics to aid in data interpretation. Researchers can then apply stringency threshold filters to maximize sensitivity or specificity depending on the use case. For instance, when seeking to detect established AMR genes from data types with high coverage of microbial genomes (e.g., WGS data of cultured isolates), \u0026ldquo;Perfect\u0026rdquo; or \u0026ldquo;Strict\u0026rdquo; stringency thresholds maximize the accuracy of assignments. In contrast, from mNGS data with sparse microbial genome coverage (e.g., from blood or wastewater), using \u0026ldquo;Nudged\u0026rdquo; to increase sensitivity of mapping reads at the expense of specificity may be the only way to detect biologically important AMR genes. The \u0026ldquo;Nudged\u0026rdquo; threshold also enables more alignment permissiveness to sequence variations, which can be helpful for detecting novel alleles. The CZ ID AMR module provides various metrics to support optimization of cutoffs based on specific sample types and applications by the users.\u003c/p\u003e\n\u003cp\u003eDepending on the number of reads, breadth of coverage, and whether reads originate from conserved versus variable gene regions, the confidence of AMR gene assignment can vary. 
Generally, the confidence of contig-based AMR gene assignments is greater than read-based AMR gene matches due to the increased length of assembled fragments. When it comes to AMR gene alleles with high sequence similarity, such as those from within the same gene family, the AMR module can only distinguish between them if sufficient gene coverage is achieved. In most of our analyses, if genes within the same family were identified at both the individual read and contig level, we preferentially evaluated the contig annotation to maximize allele specificity.\u003c/p\u003e\n\u003cp\u003eAs our understanding of AMR gene biology increases over time, annotations may change in the CARD reference database that underpins the CZ ID AMR gene module. This was evident, for instance, in the \u003cem\u003eKlebsiella\u003c/em\u003e transfusion-related sepsis case (Application 1, Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eB), where \u003cem\u003emdfA\u003c/em\u003e was annotated as conferring resistance to tetracycline antibiotics based on CARD version 3.2.6, used for our analysis. This will be updated as a multiple drug resistance gene\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e in the next CARD release. To mitigate database limitations and ensure traceability of results over time, CZ ID periodically updates the database versions and highlights the specific versions of the underlying databases used for each analysis.\u003c/p\u003e\n\u003cp\u003eCZ ID enables simultaneous detection of pathogens and AMR genes, and our results emphasize the importance of integrating taxonomic abundance from the CZ ID mNGS module with several data outputs within the AMR module. Each AMR gene is directly linked to its CARD webpage where the Resistomes section provides information on the species predicted to harbor the gene of interest and its variants. 
The pathogen-of-origin predictions, while still a beta feature, can further help identify the source species of detected AMR genes. These assignments are predictions based on matching AMR sequences in each sample to CARD Resistomes \u0026amp; Variants database, and should be interpreted in the context of the microbes found to exist in the sample from the CZ ID mNGS module output. Connecting AMR genes to their originating microbes thus necessitates integrating all available results from both the CZ ID AMR and mNGS modules.\u003c/p\u003e\n\u003cp\u003eIn sum, we describe the novel AMR analysis module within the CZ ID bioinformatics web platform designed to facilitate integrated analyses of AMR genes and microbes. This open-access, cloud-based pipeline permits studying AMR genes and microbes together across a broad range of applications, ranging from infectious diseases to environmental surveillance. By overcoming the significant computing infrastructure and technical expertise typically required for mNGS data processing, this tool aims to democratize the analysis of microbial genomes and metagenomes across humans, animals, and the environment.\u003c/p\u003e"},{"header":"Methods ","content":"\u003cdiv id=\"Sec12\" class=\"Section3\"\u003e\n\u003ch2\u003ePatient enrollment, sample collection and ethics\u003c/h2\u003e\n\u003cp\u003eSkin swabs and cultured isolates analyzed for Application 2 (hospital outbreak) were collected under the University of California San Francisco Institutional Review Board (IRB) protocol no. 
17-24056, which granted a waiver of consent for their collection, as part of a larger ongoing surveillance study of patients with healthcare-associated infections.\u003c/p\u003e\n\u003cp\u003eSamples analyzed for Application 4 (longitudinal profiling) were collected from patients enrolled in a prospective cohort study of mechanically ventilated children admitted to eight intensive care units in the National Institute of Child Health and Human Development\u0026rsquo;s Collaborative Pediatric Critical Care Research Network (CPCCRN) from February 2015 to December 2017. The original cohort study was approved by the Collaborative Pediatric Critical Care Research IRB at the University of Utah (protocol no. 00088656). Details regarding enrollment and consent have previously been described \u003csup\u003e\u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e,\u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. Briefly, children aged 31 days to 18 years who were expected to require mechanical ventilation via endotracheal tube for at least 72 hours were enrolled. Parents or other legal guardians of eligible patients were approached for consent by study-trained staff as soon as possible after intubation. 
Waiver of consent was granted for tracheal aspirate (TA) samples to be obtained from standard-of-care suctioning of the endotracheal tube until the parents or guardians could be approached for informed consent.\u003c/p\u003e\n\u003cp\u003eFor all other applications and analyses, previously published datasets were used as described in the \u003cspan class=\"InternalRef\"\u003edata and code availability\u003c/span\u003e section.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n\u003ch2\u003eNucleic acid extraction and Illumina sequencing\u003c/h2\u003e\n\u003cp\u003eFor the skin swab samples and cultured isolates described in Application 2, DNA was extracted using the Zymo pathogen magbead kit (Zymo Research) according to manufacturer\u0026rsquo;s instructions. Sequencing libraries were then prepared from 20\u0026nbsp;ng of input DNA using the NEBNext Ultra-II DNA kit (New England Biolabs) following manufacturer\u0026rsquo;s instructions\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. For the tracheal aspirate samples described in Application 4, RNA was extracted using the Qiagen Allprep kit (Qiagen) following manufacturer\u0026rsquo;s instructions. Sequencing libraries were prepared using the NEBNext Ultra-II RNA kit (New England Biolabs) according to a previously described protocol\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. Paired-end 150 base pair Illumina sequencing was performed on all samples using Illumina NextSeq 550 or NovaSeq 6000.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n\u003ch2\u003eAMR gene identification\u003c/h2\u003e\n\u003cp\u003eWe downloaded the tabular results from the AMR module and applied quality filters to ensure robust AMR gene identification. 
Specifically, for mNGS data, we required all AMR genes (from contig and read approaches) to have coverage breadth\u0026thinsp;\u0026gt;\u0026thinsp;10%, and for read mappings we additionally required\u0026thinsp;\u0026gt;\u0026thinsp;5 reads mapping to the AMR gene. For WGS data, we required all AMR genes (from contig and read approaches) to have coverage breadth\u0026thinsp;\u0026gt;\u0026thinsp;50%, and for read results we additionally required\u0026thinsp;\u0026gt;\u0026thinsp;5 reads mapping to the AMR gene. Across all analyses, Nudged results were treated the same way as contig results. For studies with corresponding water controls, we applied the above filters to the water controls, and then removed from experimental samples any AMR genes or gene families (depending on what was plotted) that were also found in water controls.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n\u003ch2\u003eAMR gene heatmaps\u003c/h2\u003e\n\u003cp\u003eAll plots were generated in R using Tidyverse\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e, patchwork\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e and ComplexHeatmap\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. While making the plots, we applied additional filtering to focus the analysis on the use case at hand and to limit the size of the plots for the paper. In particular, we included only CARD\u0026rsquo;s protein homolog and protein variant models (see \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/arpcard/rgi\u003c/span\u003e\u003c/span\u003e), and included only medically relevant antibiotic drug classes by removing disinfecting agents and antiseptics, antibacterial free fatty acids, and aminocoumarin, diaminopyrimidine, elfamycin, fusidane, phosphonic acid, nucleoside, and pleuromutilin antibiotics. 
In Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003eB and Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e, we also excluded efflux pumps to reduce plot size, as efflux pumps tend to have ubiquitous functions in cellular processes.\u003c/p\u003e\n\u003cp\u003eThen, we applied a series of heuristics to make this structured data amenable to heatmap visualization. Because each AMR annotation in each sample can be represented by only one tile in a heatmap, we plotted the result with the highest confidence. We considered AMR genes identified through the contig approach with Perfect or Strict cutoffs as higher confidence than those with the Nudged cutoff, which in turn were considered higher confidence than AMR genes found by reads alone. Finally, given the challenges for gene attribution presented by homology between genes in the same gene family, we developed a systematic approach for collapsing the visualization to a single candidate per sample. For all figures except for Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e, if in the same sample one AMR gene was found by the read approach and a different AMR gene from the same gene family was found by the contig approach, the read-based gene was omitted and only the contig-based gene was plotted. The rationale for this prioritization is that short reads alone sometimes cannot sufficiently distinguish between highly similar alleles or genes from the same gene family. Contigs, which typically provide greater sequence length, are often of higher confidence. This approach should be considered on a per gene or per gene family basis, due to variability in the extent of sequence similarity within genes and gene families, and should also be modified for specific use cases. 
For example, in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eB, even though \u003cem\u003emecR1\u003c/em\u003e and \u003cem\u003emecA\u003c/em\u003e are from the same gene family, they do not have highly similar sequences and we did not apply this step.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\n\u003ch2\u003eSpecies identification\u003c/h2\u003e\n\u003cp\u003eFor results from the CZ ID mNGS module, filters were again applied to ensure high-quality results. Specifically, for Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e, which each focused on a single species, the NT rpM calculated by the mNGS module was used with no extra filtering. For Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eA, which focused on species composition, the species detected by the mNGS module were filtered with: NT rpM\u0026thinsp;\u0026gt;\u0026thinsp;10 and NR rpM\u0026thinsp;\u0026gt;\u0026thinsp;10 to implement a minimal abundance requirement for taxonomic identification, NT alignment length\u0026thinsp;\u0026gt;\u0026thinsp;50 to ensure alignment specificity, and NT Z-score\u0026thinsp;\u0026gt;\u0026thinsp;2 using a background model calculated with the corresponding study-specific water samples to ensure significance of taxa above levels of possible background contamination. 
Finally, for Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003eB, which had low read coverage, abundance filters were omitted and only the significance filter of NT Z-score\u0026thinsp;\u0026gt;\u0026thinsp;2 was applied, using a background model calculated with the corresponding water samples.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\n\u003ch2\u003eSNP distance analysis\u003c/h2\u003e\n\u003cp\u003eHost-filtered reads were downloaded from the CZ ID mNGS module. SNP distances were calculated with SKA2 0.3.2\u003csup\u003e23\u003c/sup\u003e using ska build --min-count 4 --threads 4 --min-qual 20 -k 31 --qual-filter strict and ska distance --filter-ambiguous. The heatmap was generated with ComplexHeatmap\u003csup\u003e\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData and code availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll raw microbial sequencing data supporting the conclusions of this article are available via NCBI\u0026rsquo;s Sequence Read Archive (SRA) under BioProjects PRJNA544865, PRJNA1086943, PRJNA450137 and PRJNA672704. For previously unpublished datasets, non-host FASTQ files generated by the CZ ID mNGS module were submitted to SRA under NCBI BioProject accession PRJNA1086943. We obtained raw FASTQ files from previous studies\u003csup\u003e22,25\u0026ndash;29\u003c/sup\u003e, either from the authors or public repositories, and uploaded them to the CZ ID pipeline (https://czid.org/) under an openly accessible manuscript-specific project called \u0026ldquo;AMR example applications\u0026rdquo; to be processed through both the AMR module and the mNGS module (the project can be accessed at https://czid.org/home?project_id=5929 after logging in). CZ ID workflow code can be found at https://github.com/chanzuckerberg/czid-workflows/. 
Additional code for data filtering and plotting can be found at https://github.com/chanzuckerberg/czid-amr-manuscript-2024. The following software versions were used for this manuscript: CZ ID mNGS workflow version 8.2.5, CZ ID AMR workflow version 1.4.2 based on CARD RGI version 6.0.3, CARD database version 3.2.6, the CARD Resistomes \u0026amp; Variants database version 4.0.0, and SKA2 version 0.3.2.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eChan Zuckerberg Initiative (DL, KK, NB, XB, KR, KE, EF, OH, EH, AEJ, RL, SM, LR, JT, OV). Chan Zuckerberg Biohub (CL, VC, AG, AJP). NIH/NHLBI 5R01HL155418 (CL, PMM) and 1R01HL124103 (PMM). Canadian Institutes of Health Research PJT-156214 and David Braley Chair in Computational Biology (ARP, BPA, AGM).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eKK and CL conceived of and designed the work. DL carried out data analysis with valuable input and guidance from KK, CL, VC and AG. ESG collected and sequenced all samples in Application 2. The CZ ID team (NB, XB, KR, KE, EF, OH, EH, AEJ, RL, SM, LR, JT, OV) built the AMR module. PMM collected and sequenced all samples in Application 4. AJP provided the data for Application 5. ARR, BPA, AGM provided expert input on the project. CL supervised the work. 
DL, KK and CL drafted the manuscript with input from all coauthors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe acknowledge the contributions of the whole CZI Infectious Disease development team: Robert Aboukhalil, Kami Bankston, Neha Chourasia, Jerry Fu, Julie Han, Francisco Loo, Todd Morse, Juan Caballero Perez, David Ruiz, Vincent Selhorst-Jones and Kevin Wang.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAntimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. \u003cem\u003eLancet\u003c/em\u003e \u003cstrong\u003e399\u003c/strong\u003e, 629\u0026ndash;655 (2022).\u003c/li\u003e\n\u003cli\u003eReview on Antimicrobial Resistance. \u003cem\u003eTackling Drug-Resistant Infections Globally: Final Report and Recommendations\u003c/em\u003e. (2016).\u003c/li\u003e\n\u003cli\u003e10 global health issues to track in 2021. https://www.who.int/news-room/spotlight/10-global-health-issues-to-track-in-2021.\u003c/li\u003e\n\u003cli\u003eBaker, K. S. \u003cem\u003eet al.\u003c/em\u003e Evidence review and recommendations for the implementation of genomics for antimicrobial resistance surveillance: reports from an international expert group. \u003cem\u003eLancet Microbe\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, e1035\u0026ndash;e1039 (2023).\u003c/li\u003e\n\u003cli\u003eAnjum, M. F., Zankari, E. \u0026amp; Hasman, H. Molecular Methods for Detection of Antimicrobial Resistance. \u003cem\u003eMicrobiol Spectr\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, (2017).\u003c/li\u003e\n\u003cli\u003eZankari, E. \u003cem\u003eet al.\u003c/em\u003e Identification of acquired antimicrobial resistance genes. \u003cem\u003eJ. Antimicrob. Chemother.\u003c/em\u003e \u003cstrong\u003e67\u003c/strong\u003e, 2640\u0026ndash;2644 (2012).\u003c/li\u003e\n\u003cli\u003eJia, B. 
\u003cem\u003eet al.\u003c/em\u003e CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e45\u003c/strong\u003e, D566\u0026ndash;D573 (2017).\u003c/li\u003e\n\u003cli\u003eMcArthur, A. G. \u003cem\u003eet al.\u003c/em\u003e The comprehensive antibiotic resistance database. \u003cem\u003eAntimicrob. Agents Chemother.\u003c/em\u003e \u003cstrong\u003e57\u003c/strong\u003e, 3348\u0026ndash;3357 (2013).\u003c/li\u003e\n\u003cli\u003eGupta, S. K. \u003cem\u003eet al.\u003c/em\u003e ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. \u003cem\u003eAntimicrob. Agents Chemother.\u003c/em\u003e \u003cstrong\u003e58\u003c/strong\u003e, 212\u0026ndash;220 (2014).\u003c/li\u003e\n\u003cli\u003eInouye, M. \u003cem\u003eet al.\u003c/em\u003e SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. \u003cem\u003eGenome Med.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 90 (2014).\u003c/li\u003e\n\u003cli\u003eFeldgarden, M. \u003cem\u003eet al.\u003c/em\u003e AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 12728 (2021).\u003c/li\u003e\n\u003cli\u003eKalantar, K. L. \u003cem\u003eet al.\u003c/em\u003e IDseq-An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring. \u003cem\u003eGigascience\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, (2020).\u003c/li\u003e\n\u003cli\u003eAlcock, B. P. \u003cem\u003eet al.\u003c/em\u003e CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. 
\u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, D517\u0026ndash;D525 (2020).\u003c/li\u003e\n\u003cli\u003eChen, S., Zhou, Y., Chen, Y. \u0026amp; Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, i884\u0026ndash;i890 (2018).\u003c/li\u003e\n\u003cli\u003eLangmead, B. \u0026amp; Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. \u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 357\u0026ndash;359 (2012).\u003c/li\u003e\n\u003cli\u003eKim, D., Paggi, J. M., Park, C., Bennett, C. \u0026amp; Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. \u003cem\u003eNat. Biotechnol.\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 907\u0026ndash;915 (2019).\u003c/li\u003e\n\u003cli\u003eBankevich, A. \u003cem\u003eet al.\u003c/em\u003e SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. \u003cem\u003eJ. Comput. Biol.\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 455\u0026ndash;477 (2012).\u003c/li\u003e\n\u003cli\u003eClausen, P. T. L. C., Aarestrup, F. M. \u0026amp; Lund, O. Rapid and precise alignment of raw reads against redundant databases with KMA. \u003cem\u003eBMC Bioinformatics\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 307 (2018).\u003c/li\u003e\n\u003cli\u003eLi, H. Minimap2: pairwise alignment for nucleotide sequences. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, 3094\u0026ndash;3100 (2018).\u003c/li\u003e\n\u003cli\u003eBuchfink, B., Xie, C. \u0026amp; Huson, D. H. Fast and sensitive protein alignment using DIAMOND. \u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 59\u0026ndash;60 (2015).\u003c/li\u003e\n\u003cli\u003eAltschul, S. F., Gish, W., Miller, W., Myers, E. W. \u0026amp; Lipman, D. J. 
Basic local alignment search tool. \u003cem\u003eJ. Mol. Biol.\u003c/em\u003e \u003cstrong\u003e215\u003c/strong\u003e, 403\u0026ndash;410 (1990).\u003c/li\u003e\n\u003cli\u003eCrawford, E. \u003cem\u003eet al.\u003c/em\u003e Investigating Transfusion-related Sepsis Using Culture-Independent Metagenomic Sequencing. \u003cem\u003eClin. Infect. Dis.\u003c/em\u003e \u003cstrong\u003e71\u003c/strong\u003e, 1179\u0026ndash;1185 (2020).\u003c/li\u003e\n\u003cli\u003eGitHub - bacpop/ska.rust: Split k-mer analysis \u0026ndash; version 2. \u003cem\u003eGitHub\u003c/em\u003e https://github.com/bacpop/ska.rust.\u003c/li\u003e\n\u003cli\u003eHussein, N. H., Al-Kadmy, I. M. S., Taha, B. M. \u0026amp; Hussein, J. D. Mobilized colistin resistance (mcr) genes from 1 to 10: a comprehensive review. \u003cem\u003eMol. Biol. Rep.\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 2897\u0026ndash;2907 (2021).\u003c/li\u003e\n\u003cli\u003eLangelier, C. \u003cem\u003eet al.\u003c/em\u003e Integrating host response and unbiased microbe detection for lower respiratory tract infection diagnosis in critically ill adults. \u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e \u003cstrong\u003e115\u003c/strong\u003e, E12353\u0026ndash;E12362 (2018).\u003c/li\u003e\n\u003cli\u003eKalantar, K. L. \u003cem\u003eet al.\u003c/em\u003e Integrated host-microbe plasma metagenomics for sepsis diagnosis in a prospective cohort of critically ill adults. \u003cem\u003eNat Microbiol\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 1805\u0026ndash;1816 (2022).\u003c/li\u003e\n\u003cli\u003eTsitsiklis, A. \u003cem\u003eet al.\u003c/em\u003e Lower respiratory tract infections in children requiring mechanical ventilation: a multicentre prospective surveillance study incorporating airway metagenomics. \u003cem\u003eLancet Microbe\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, e284\u0026ndash;e293 (2022).\u003c/li\u003e\n\u003cli\u003eMick, E. 
\u003cem\u003eet al.\u003c/em\u003e Integrated host/microbe metagenomics enables accurate lower respiratory tract infection diagnosis in critically ill children. \u003cem\u003eJ. Clin. Invest.\u003c/em\u003e \u003cstrong\u003e133\u003c/strong\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eFuhrmeister, E. R. \u003cem\u003eet al.\u003c/em\u003e Surveillance of potential pathogens and antibiotic resistance in wastewater and surface water from Boston, USA and Vellore, India using long-read metagenomic sequencing. \u003cem\u003emedRxiv\u003c/em\u003e 2021.04.22.21255864 (2021) doi:10.1101/2021.04.22.21255864.\u003c/li\u003e\n\u003cli\u003eStruelens, M. J. The epidemiology of antimicrobial resistance in hospital acquired infections: problems and possible solutions. \u003cem\u003eBMJ\u003c/em\u003e \u003cstrong\u003e317\u003c/strong\u003e, 652\u0026ndash;654 (1998).\u003c/li\u003e\n\u003cli\u003eLewinson, O. \u003cem\u003eet al.\u003c/em\u003e The Escherichia coli multidrug transporter MdfA catalyzes both electrogenic and electroneutral transport reactions. \u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e \u003cstrong\u003e100\u003c/strong\u003e, 1667\u0026ndash;1672 (2003).\u003c/li\u003e\n\u003cli\u003eWickham, H. \u003cem\u003eet al.\u003c/em\u003e Welcome to the tidyverse. \u003cem\u003eJ. Open Source Softw.\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 1686 (2019).\u003c/li\u003e\n\u003cli\u003ePedersen, T. L. patchwork: The Composer of Plots. Preprint at https://patchwork.data-imaginist.com (2024).\u003c/li\u003e\n\u003cli\u003eGu, Z., Eils, R. \u0026amp; Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. 
\u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 2847\u0026ndash;2849 (2016).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-4271356/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4271356/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAntimicrobial resistant (AMR) pathogens represent urgent threats to human health, and their surveillance is of paramount importance. Metagenomic next generation sequencing (mNGS) has revolutionized such efforts, but remains challenging due to the lack of open-access bioinformatics tools capable of simultaneously analyzing both microbial and AMR gene sequences. To address this need, we developed the CZ ID AMR module, an open-access, cloud-based workflow designed to integrate detection of both microbes and AMR genes in mNGS and whole-genome sequencing (WGS) data. 
It leverages the Comprehensive Antibiotic Resistance Database and associated Resistance Gene Identifier software, and works synergistically with the CZ ID short-read mNGS module to enable broad detection of both microbes and AMR genes. We highlight diverse applications of the AMR module through analysis of both publicly available and newly generated mNGS and WGS data from four clinical cohort studies and an environmental surveillance project. Through genomic investigations of bacterial sepsis and pneumonia cases, hospital outbreaks, and wastewater surveillance data, we gain a deeper understanding of infectious agents and their resistomes, highlighting the value of integrating microbial identification and AMR profiling for both research and public health. We leverage additional functionalities of the CZ ID mNGS platform to couple resistome profiling with the assessment of phylogenetic relationships between nosocomial pathogens, and further demonstrate the potential to capture the longitudinal dynamics of pathogen and AMR genes in hospital acquired bacterial infections. In sum, the new AMR module advances the capabilities of the open-access CZ ID microbial bioinformatics platform by integrating pathogen detection and AMR profiling from mNGS and WGS data. 
Its development represents a critical step toward democratizing pathogen genomic analysis and supporting collaborative efforts to combat the growing threat of AMR.\u003c/p\u003e","manuscriptTitle":"Simultaneous detection of pathogens and antimicrobial resistance genes with the open source, cloud-based, CZ ID pipeline","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-05-02 20:22:54","doi":"10.21203/rs.3.rs-4271356/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"9a384e93-a3aa-480f-8459-d225ebb01c3b","owner":[],"postedDate":"May 2nd, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":30902861,"name":"Biological sciences/Computational biology and bioinformatics/Classification and taxonomy"},{"id":30902862,"name":"Biological sciences/Computational biology and bioinformatics/Computational platforms and environments"},{"id":30902863,"name":"Biological sciences/Microbiology/Microbial communities/Metagenomics"},{"id":30902864,"name":"Biological sciences/Microbiology/Infectious-disease diagnostics"},{"id":30902865,"name":"Biological sciences/Biotechnology/Genomics/Metagenomics"}],"tags":[],"updatedAt":"2024-05-02T20:22:54+00:00","versionOfRecord":[],"versionCreatedAt":"2024-05-02 
20:22:54","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4271356","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4271356","identity":"rs-4271356","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
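
The quality filters described under "AMR gene identification" can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (their filtering and plotting code is in the repository named in the data and code availability statement). The thresholds come from the Methods text; the `AmrHit` class, its field names, and the `filter_hits` helper are hypothetical and do not correspond to actual CZ ID report columns.

```python
from dataclasses import dataclass

# Thresholds from the Methods text; all names below are illustrative assumptions.
MIN_BREADTH = {"mNGS": 10.0, "WGS": 50.0}  # coverage breadth in percent, strict ">"
MIN_READS = 5  # read-based hits need strictly more than this many reads


@dataclass(frozen=True)
class AmrHit:
    """One AMR gene call (hypothetical fields, not CZ ID column names)."""
    sample: str
    gene: str
    approach: str            # "contig" (Nudged is treated as contig) or "read"
    coverage_breadth: float  # percent of the reference gene covered
    num_reads: int = 0       # only used for read-based hits


def passes_filters(hit, data_type):
    """Return True if the hit meets the breadth (and, for reads, depth) cutoffs."""
    if hit.coverage_breadth <= MIN_BREADTH[data_type]:
        return False
    if hit.approach == "read" and hit.num_reads <= MIN_READS:
        return False
    return True


def filter_hits(hits, data_type, water_controls=frozenset()):
    """Apply the quality filters, then drop genes also seen in filtered water controls."""
    controls = set(water_controls)
    kept = [h for h in hits if passes_filters(h, data_type)]
    background = {h.gene for h in kept if h.sample in controls}
    return [h for h in kept if h.sample not in controls and h.gene not in background]
```

The water controls are filtered with the same cutoffs before their genes are subtracted, which mirrors the order of operations given in the text. Subtracting at the gene-family level instead would only require swapping `h.gene` for a family label.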

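The heatmap heuristics described under "AMR gene heatmaps" (one tile per gene per sample, then collapsing read-only calls within a gene family) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the numeric ranks, the tuple layout, and the `family_of` mapping are hypothetical, and only the ordering (Perfect or Strict contig hits above Nudged, Nudged above read-only) is taken from the text.

```python
# Confidence ordering from the Methods text; the numeric ranks are an assumption.
RANK = {"Perfect": 3, "Strict": 3, "Nudged": 2, "read": 1}


def best_call_per_gene(calls):
    """Keep the highest-confidence evidence for each (sample, gene) pair.

    `calls` is an iterable of (sample, gene, evidence) tuples, where evidence
    is one of the keys of RANK.
    """
    best = {}
    for sample, gene, evidence in calls:
        key = (sample, gene)
        if key not in best or RANK[evidence] > RANK[best[key]]:
            best[key] = evidence
    return best


def collapse_families(best, family_of):
    """Drop read-only genes when a contig-based gene of the same family is present.

    `family_of` maps gene name to gene family (hypothetical lookup table).
    """
    contig_families = {
        (sample, family_of[gene])
        for (sample, gene), evidence in best.items()
        if evidence != "read"
    }
    return {
        (sample, gene): evidence
        for (sample, gene), evidence in best.items()
        if not (evidence == "read" and (sample, family_of[gene]) in contig_families)
    }
```

As the text notes, the family-level collapse is not appropriate for every family (it was skipped for Fig. 6), so in practice `collapse_families` would be applied selectively.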
Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-05-23T02:00:01.238055+00:00

License: CC-BY-4.0