Host Whole Genome Sequence data represent an untapped resource for characterising affiliated parasite diversity

doi:10.22541/au.174405286.65008416/v1

2025 · doi:10.22541/au.174405286.65008416/v1

preprint OA: closed

This is a preprint and has not been peer reviewed. Data may be preliminary.

7 April 2025, V1 (latest version)

Authors: Sarah Nichols (ORCID 0000-0001-5053-3858), Andrea Estandía, Catherine Young, Lucy Knowles, Vaidas Palinauskas, Beth Okamura (ORCID 0000-0001-7279-715X), and Sonya Clegg

https://doi.org/10.22541/au.174405286.65008416/v1

Published: International Journal for Parasitology (version of record)

Abstract

Parasites are ubiquitous and exert varied ecological and evolutionary pressures on their hosts. However, characterising parasite diversity and distributions can be both challenging and costly. Leveraging existing data to identify parasites is thus an attractive alternative.
High-throughput sequencing (HTS) generates Whole Genome Sequence (WGS) data which are increasingly freely available in public repositories and represent an untapped resource for characterising parasites affiliated with hosts. In this study, we examine WGS data generated for the silvereye (Zosterops lateralis) to identify endogenous eukaryotic parasites that were inadvertently captured during host sequencing. We compared detection of parasite genera by this approach with detection via 18S metabarcoding. Results were verified by traditional microscopy of blood slides and by a targeted multiplex Polymerase Chain Reaction (PCR) for haemosporidian parasites. Mining WGS data for parasite DNA revealed the broadest range of genera. Further, detection of haemosporidians was largely consistent across microscopy, multiplex PCR and WGS data, while 18S metabarcoding entirely failed to detect this group of parasites. Our results demonstrate that existing WGS datasets can be used to estimate endoparasite diversity and provide greater insights on diversity than metabarcoding whilst also avoiding the costs and challenges of direct sampling. We provide a framework outlining opportunities and constraints to consider when mining WGS data to identify parasite sequences. The framework particularly stresses the influences of sequencing depth, database completeness, and methodological biases. Our findings demonstrate how repurposing existing WGS data can provide a cost-effective and informative means of unravelling complex host-parasite interactions in future disease ecology studies.

Introduction

Parasites are ubiquitous and diverse (Dobson et al., 2008), imposing significant and varied ecological and evolutionary pressures on their hosts (Penczykowski et al., 2016). However, climate change may alter parasite distributions (Pickles et al., 2013; Carlson et al., 2017; Byers, 2021), with some parasites benefitting (Harvell et al., 2002) while others become vulnerable to extinction (Carlson et al., 2017). These changes could lead to the redistribution of parasites, drastically impacting the landscape of host-parasite interactions and, consequently, altering species interactions throughout ecosystems (Poulin and Mouritsen, 2006; Dougherty et al., 2016; Carlson et al., 2017). Yet, studies that simultaneously quantify a broad range of parasites are relatively uncommon (Hoarau et al., 2020; Huang et al., 2021). In reality, co-infections occur frequently in the wild (Petney and Andrews, 1998; Bordes and Morand, 2009) and parasite interactions may impact host fitness, parasite transmission and ultimately parasite distributions (Cox, 2001; Pedersen and Fenton, 2007; Handel and Rohani, 2015). Nevertheless, most studies focus on single-host, single-parasite systems (Hellard et al., 2015).

Our currently poor understanding of parasite distributions and patterns of co-occurrence likely reflects the difficulty of detecting and characterising parasites. These difficulties arise because parasite diversity is largely undocumented, particularly for certain groups, and because the hidden nature and patchy distributions of parasites, together with the expertise required for identification, hinder detection (Dobson et al., 2008; Okamura et al., 2018; de Buron et al., 2025). In view of these various challenges, it is of considerable interest to explore methodological approaches to simultaneously characterise a broad range of parasites.

Conventional surveys for parasites variously use post-mortem examination, culturing of microbial faecal communities, and microscopic examination of faecal or blood samples (Fallon et al., 2003; Mes et al., 2007; Gouba et al., 2013). However, these often highly invasive and time-intensive methods frequently fail to detect the diversity of parasites present (Durso et al., 2010).
This is because levels of parasitaemia (the quantitative content of parasites in the blood, or parasite load) are often low, the appearance of observable life cycle stages is ephemeral, and parasites themselves are often inconspicuous. Furthermore, many protists cannot easily be cultured and their diversity is thus underestimated (Edgcomb et al., 2002). Conventional methods also require specialist taxonomic knowledge, which makes identifying a range of taxa challenging.

Adopting molecular methods to identify parasites could be highly beneficial. Such methods can be non-destructive, do not require expertise in morphological parasite identification, and can reveal a diversity of parasites within a single host (Nadler and de León, 2011). Polymerase Chain Reaction (PCR)-based detection followed by Sanger sequencing may be used to target specific parasite groups (Valkiūnas et al., 2014). The simultaneous detection of multiple parasites can be achieved by High-Throughput Sequencing (HTS) methods. For example, metabarcoding involves the amplification of a region of DNA and has the potential to reveal information about mixed infections. Commonly used metabarcoding regions are also relatively well represented in reference databases, making it possible to match sequences to taxa (Wylezich et al., 2020). However, this approach requires selection of a primer set to amplify the specific target region by PCR, and while many primers claim to be ‘universal’, none reliably amplify all diversity present in a sample (Pallen, 2014). This is mainly due to amplification bias introduced by PCR (Krehenwinkel et al., 2017; Fonseca, 2018).

Untargeted (amplification-free) methods, like shotgun metagenomics, attempt to sequence all the organisms present in a sample, whether the sample is environmental or derived from an individual organism. Shotgun metagenomics allows the detection of a broad range of parasites and disease agents in a single test (Vijayvargiya et al., 2019).
This approach has been used to identify eukaryotic pathogens (Briscoe et al., 2022) and antibiotic resistant bacteria (Cao et al., 2020) from avian faecal samples and for diagnosis of specific parasites, such as Malayan filariasis (Gao et al., 2016). However, metagenomics approaches require sufficient sequencing depth to capture the diversity in a sample. Because parasites will likely comprise a small proportion of the sample, sequencing reads will be dominated by the host or environmental contaminants (Wylezich et al., 2019). This is particularly relevant for organisms that have nucleated erythrocytes, such as birds, fish and reptiles (Hartenstein, 2006). Therefore, generating sequencing data for the primary purpose of parasite identification could be prohibitively expensive.

However, HTS data are increasingly generated at scale (Hudson, 2008; Kahn, 2011; Koepfli et al., 2015; Parks et al., 2017) for a range of biological questions (e.g. Lemos et al., 2011; Goldfeder et al., 2017; Posada-Cespedes et al., 2017; Luikart et al., 2019; Nilsson et al., 2019; Baldrian et al., 2022) and are routinely deposited in repositories, making them publicly and freely available. These data provide a resource which could be harnessed for new applications. For example, Franssen et al. (2021) surveyed environmental shotgun metagenomics data in public databases for specific pathogen groups. It may be similarly possible to harness other types of genomic data to identify parasites and pathogens.

Whole Genome Sequencing (WGS) is another HTS method that offers the opportunity to identify parasites and pathogens. The aim of WGS is to generate sufficient sequencing reads to reconstruct the entire genome of the focal organism. A somewhat overlooked feature of WGS data is that, like shotgun metagenomic data, they are produced using untargeted methods.
Therefore, any DNA present in a sample is liable to be sequenced, including affiliated parasites that may infect the focal individual. Indeed, WGS data from humans have been harnessed to identify the causative agents of previously idiopathic diseases (Kostic et al., 2011) and to identify pathogens that facilitate some cancers, such as Human papillomavirus (HPV) and Helicobacter pylori (Gihawi et al., 2019). Similarly, myxozoan genome sequences have been detected in WGS data of their yellowfin tuna host (Weber et al., 2024). It is therefore increasingly evident that WGS data present an untapped opportunity to investigate the endoparasites of host organisms.

The silvereye (Zosterops lateralis) is a small passerine bird distributed throughout the southwest Pacific. As is common for passerine birds (Parsa et al., 2023), silvereyes harbour a diversity of protozoan parasites. Previous research has predominantly focussed on characterising blood parasites from these birds, specifically haemosporidians, which are the causative agents of avian malaria (Clark et al., 2014; Gudex-Cross et al., 2015; Olsson-Pons et al., 2015), although Atoxoplasma (Laird, 1959) and filarial nematodes (Clark et al., 2016) have also been documented. Some research also highlights the presence of gastrointestinal parasites such as coccidia like Isospora (Yang et al., 2018).

Our aim here is to determine whether WGS data generated from the blood of the host can be used to simultaneously identify a range of affiliated endoparasitic eukaryotes. We then evaluate the performance of WGS and metabarcoding using 18S primers in terms of their ability to characterise a diversity of parasites. In addition, we conducted microscopic examination and targeted multiplex Polymerase Chain Reaction (PCR) to identify a subset of parasites (haemosporidians) to verify and evaluate the consistency and power of detection by WGS and metabarcoding.
We summarise insights gained through our analyses by providing a framework of issues to consider when re-purposing WGS data to identify affiliated parasites. The framework highlights potential constraints, including sequencing depth, read length and how the availability of reference sequences can introduce biases in characterising some parasites. While we expect that HTS-based methods, like metabarcoding and mining WGS data, will reveal a greater diversity of parasites than targeted methods, we predict that they may not identify parasites with the same accuracy. Nevertheless, the ability to characterise parasites from publicly available WGS data could facilitate insights on fundamental and applied questions regarding patterns and distributions of a diverse range of parasites. For example, distributions in space and time or patterns of host use could be revealed. In turn, such insights could provide the impetus for more fine-tuned and targeted studies such as investigating specific host-endoparasite interactions or environmental drivers of parasite abundance.

Materials and Methods

Sample collection

Mist nets were used to capture birds of the subspecies Zosterops lateralis lateralis in Queensland, New South Wales and Tasmania, Australia from May to July 2022. Blood samples were obtained from 68 individuals by pricking the brachial vein with a sterile needle and collecting blood in a glass capillary tube. A small droplet (~1-2 μl) of blood was used to prepare a microscopy slide (see below) (Valkiunas, 2004). Capillary tubes were snapped to size then placed in 1.5 ml microfuge tubes with Queen’s lysis buffer. The buffer was modified from Seutin et al. (1991), with final concentrations of 0.01 M Tris, 0.01 M NaCl, 0.01 M EDTA and 0.03 M n-lauroylsarcosine (10% w/v), and a pH of 7.5. Samples were refrigerated until they could be frozen at -20 °C.

Identification of blood parasites via microscopy of blood smears

Microscope slide preparation and screening were carried out according to an adapted protocol (Valkiunas, 2004). A small droplet of blood (~1-2 μl) was smeared onto a glass slide and air-dried for up to 30 seconds, then fixed in absolute methanol for 1-2 min, and air-dried again. Slides were stained one year after collection by submersion in 10% Giemsa stain for 1 hr (concentrated stock solution, pH 7.0–7.2). Despite the delay between fixing and staining, slides were sufficiently informative to identify haemosporidian genera. Initially, each slide was screened using an Olympus BX51 microscope for 10 minutes at x40. This was followed by examining 100 fields of view at x100 oil immersion. The lower magnification for the initial screening enabled a greater area of the slide to be inspected, equating to a minimum of 2 cm², thus increasing the likelihood of parasite detection. However, parasites vary in size and while the larger Leucocytozoon spp. are detectable at x40, some Plasmodium spp. life stages may not be identified unless a higher magnification is used. Each field was inspected for 30 seconds. We estimated parasitaemia by counting the number of infected and uninfected erythrocytes in 10 fields of view (Godfrey Jr et al., 1987).

Identification of parasites via molecular methods

DNA extractions were carried out using a modified standard phenol-chloroform protocol (Sambrook and Russell, 2006). Approximately 100 μl of blood was incubated overnight at 55 °C with rotation with 250 μl DIGSOL extraction buffer (0.02 M EDTA, 0.05 M Tris-HCl (pH 8.0), 0.4 M NaCl, 0.5% sodium dodecyl sulphate (SDS)) and 10 μl Proteinase K (20 mg/mL). Following incubation, 250 μl phenol:chloroform:isoamyl alcohol (25:24:1) was added and samples were rotated for 10 minutes and then centrifuged at 10,000 rpm for 10 minutes.
At each of the following steps, the aqueous phase was transferred to a new microfuge tube after rotation and centrifugation with addition of: i) phenol:chloroform:isoamyl alcohol as above; ii) 250 μl chloroform:isoamyl alcohol (24:1); and iii) 2 volumes of cold 100% ethanol, 1 volume of 2.5 M ammonium acetate and 1 μl glycogen to precipitate the DNA. The samples were left at -20 °C for a minimum of 12 hours, then centrifuged for 10 min at 15,000 rpm at 4 °C. The supernatant was discarded, and the precipitate washed with 500 μl cold 70% ethanol. Centrifugation and washing were repeated twice. The precipitated DNA was left to dry at room temperature. Once dried, the DNA was resuspended in 50 μl of TE (Tris-EDTA) buffer (0.01 M Tris-HCl (pH 8.0), 0.0001 M EDTA). A negative extraction control was included per extraction batch. The DNA extracts were used to conduct a multiplex PCR and 18S metabarcoding, and to generate WGS data.

i. Haemosporidian-specific multiplex PCR

To screen the samples for haemosporidians (Haemoproteus, Plasmodium and Leucocytozoon) we used a multiplex PCR assay for avian haemosporidian parasites (Ciloglu et al., 2019). The PCR requires equimolar concentrations of three sets of primers in each reaction (PMF, PMR, HMF, HMR, LMF and LMR) which target and amplify different regions of the parasite genomes. Each reaction included Qiagen 5X Multiplex PCR Master Mix (Qiagen, Hilden, Germany), 1 μl of each primer at 10 μM (final conc. 0.5 μM), 1 μl DNA (ca. 10 ng/μl) and sterile ddH2O to make a final volume of 10 μl. The thermal cycling conditions were as follows: 95 °C for 15 min; followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 59 °C for 90 s, and extension at 72 °C for 30 s; with a final extension at 72 °C for 10 min. All PCR experiments contained one negative (ddH2O) and one positive control for every 16 samples.
For the positive control, we alternated between using a Haemoproteus single infection and a Plasmodium and Haemoproteus mixed infection. The amplified products were resolved using electrophoresis on a 2% agarose gel containing GelRed™ gel stain (Biotium, Inc., Hayward, CA, USA) for 1 h at 90 V. The product size was used to infer which haemosporidian parasite was present in each sample (Haemoproteus: 533 bp, Plasmodium: 378 bp, Leucocytozoon: 218 bp).

ii. 18S anti-metazoan metabarcoding

Metabarcoding was conducted using a set of 18S anti-metazoan primers, 3NDF (Cavalier-Smith et al., 2009) and the “universal non-metazoan” reverse primer, 18s-EUK1134-R (Bower et al., 2004), which target the V4 region of the 18S SSU rDNA. These primers were chosen for three reasons. First, the amplified region is conserved across taxa but shows substantial sequence variation between taxonomic groups (Van de Peer et al., 1997; Wuyts et al., 2000; Stoeck et al., 2010). Second, the region is often used for characterising parasites (Kounosu et al., 2019) and therefore has substantial representation in databases (e.g. NCBI, PR2), enabling recognition of a broad range of parasites. Finally, the primers have been designed to avoid amplifying metazoans; without this adjustment it is likely that the amplicons would be dominated by reads from the host. The primers were also modified to include annealing sites to be used for indexing: the forward primer sequence was 5’- ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNN GCAAGTCTGGTGCCAG-3’ and the reverse was 5’- GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT CTTTAARTTTCASYCTTGCG-3’. PCR1 reactions included Qiagen 5X Multiplex PCR Master Mix, 2 μl of primer at 5 μM (final conc. 0.5 μM), 1 μl DNA (ca. 10 ng/μl) and sterile ddH2O to make 20 μl final volume. The thermal cycling conditions were as follows: 95 °C for 15 min, 35 cycles of 95 °C for 30 sec, 53 °C for 45 sec and 72 °C for 1 min with a final extension at 72 °C for 10 min.
Two negative controls were included for every 30 amplifications: a negative PCR control (substituting 1 μl ddH2O for DNA) and a negative extraction control. Positive PCR reactions produced a ~540 bp product.

Library preparation was carried out by cleaning successfully amplified samples with Promega ProNex Magnetic Beads using a ratio of 1.15X (v/v) beads to product, according to the manufacturer’s instructions (Promega, Inc., United States). Dual indexing of each sample was carried out using Fi5 and Ri7 indexing primers. Reactions were conducted in a total volume of 20 μl that included 5X Meridian Bioscience MyTaq HS Mix, Fi5 and Ri7 primers (at 10 μM), sterile ddH2O and product from PCR1. The PCR cycling conditions were 95 °C for 15 min; 8 cycles of 98 °C for 10 sec, 65 °C for 30 sec and 72 °C for 30 sec; and a final extension at 72 °C for 5 min. Successful ligation of the index primers was confirmed using the Agilent TapeStation (Agilent Technologies Inc., United States). This was followed by a second clean-up using a 1X (v/v) ratio of Promega ProNex Magnetic Beads. Amplifications were quantified using the Promega QuantiFluor. Pools were created by combining equal ratios of amplicons from each sample according to their concentration (ng/μl) and repeatedly quantifying by qPCR as follows. Three independent 100-, 1,000- and 10,000-fold dilutions were prepared for each pool using sample dilution buffer (10 mM Tris, pH 8.0 and 0.05% Tween 20X). Reactions were prepared with SYBR FAST mix and primers. The thermal cycling conditions were 5 min at 95 °C, then 35 cycles of 30 sec at 95 °C and 45 sec at 60 °C. The pools were subsequently combined and analysed by further qPCR until all the samples were represented in a single pool. The size and quantity of the final product were checked using a High Sensitivity D1000 ScreenTape with the Agilent TapeStation.
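
The pooling step above, in which amplicons are combined in equal ratios according to their concentration, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the sample names, the concentrations and the 20 ng target mass are hypothetical values chosen for illustration, and the function name `pooling_volumes` is likewise an assumption.

```python
def pooling_volumes(conc_ng_per_ul, target_ng):
    """Volume (in ul) of each library needed to contribute target_ng of DNA."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        # mass = concentration x volume, so volume = mass / concentration
        volumes[sample] = target_ng / conc
    return volumes


# Hypothetical fluorometric readings in ng/ul (illustrative only).
concentrations = {"S01": 4.0, "S02": 10.0, "S03": 2.5}
volumes = pooling_volumes(concentrations, target_ng=20.0)

for sample, vol in volumes.items():
    print(f"{sample}: pipette {vol:.1f} ul")
```

More dilute libraries contribute a larger volume, so each sample is represented by the same mass of amplicon in the pool.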

The final pool was sent to the NERC Environmental Omics Facility, University of Liverpool, UK for sequencing on a PacBio Sequel II instrument using SMRT cell 8 M, Sequel II Primer V3.1.1.5, Sequel® II Binding Kit V3.1. Library loading was achieved by diffusion, with all the samples loaded onto one SMRT cell. PacBio CCS reads (mean passes = 20, median read quality = Q37) were demultiplexed and Illumina primers were removed from reads using the cutadapt software (Martin, 2011). The DADA2 package (Callahan et al., 2016) was used for Amplicon Sequence Variant (ASV) calling. Reads below Q30 were removed and reads with minimum and maximum lengths of 495 bp and 675 bp, respectively, were retained. Following the filtering step, reads were dereplicated to reduce computation time. The error rates were learned, and the samples were subsequently denoised to remove likely sequencing errors. Finally, chimeras were removed.

We downloaded the NCBI nt database version 5 (downloaded on 07/08/2024) (Sayers et al., 2022) and aligned ASVs to this database using the BLASTn algorithm, with the identity threshold set at 97%. For BLAST hits with sequence similarity of 97% or higher, the Lowest Common Ancestor (LCA) was identified using the BASTA package (Kahlke and Ralph, 2019). This LCA approach determines the taxonomic assignment for an ASV by examining the hierarchical structure of the taxonomic tree and finding the LCA of all identified taxonomic groups associated with that ASV. Following taxonomic assignment, we gathered life history information for each detected parasite genus. We searched for information on previously recorded primary hosts according to the categories: bird, other vertebrate, invertebrate, or plant. We also assigned each parasite a dominant transmission strategy: flying insect vector, tick-borne, two-host (i.e. any other complex life cycle with two hosts) or direct transmission.

iii. Generating WGS data and extracting parasite sequences

DNA extractions were quantified with a Qubit 2.0 (Thermofisher Scientific Inc., United States) and were sent to Novogene UK for library preparation and whole genome sequencing, aiming for 10X coverage on the Illumina NovaSeq 6000 platform (Figure S1, Illumina, San Diego) and generating 150 bp paired-end reads.

Construction of a pseudochromosome assembly: A pseudochromosome assembly was constructed for use as a reference genome for aligning the raw sequencing data and subtracting host reads. The Chromosembler tool available in Satsuma2 (Grabherr et al., 2010) was used to order and orient scaffolds of the Z. lateralis melanops genome assembly (Cornetti et al. ) (GCA_001281735.1), producing 33 pseudochromosomes assembled according to synteny with the Vertebrate Genome Project’s Zebra finch (Taeniopygia guttata) genome assembly (GCA_009859065.2) (Rhie et al., 2021). The PathSeq pipeline from the Genome Analysis Toolkit (GATK) (Walker et al., 2018) was used to generate a host k-mer library: k-mers generated from each consecutive position in the reference genome sequence were stored in a hash table. A Burrows-Wheeler Aligner (BWA) image of the reference genome was also generated using the BWA-MEM index image file creator in GATK (Li, 2013).

Quality control and removal of host sequences: Raw whole genome sequences were trimmed to remove adapter content (10 bp) and low-quality base calls using fastp (Chen et al., 2018). The reads from each individual were then aligned to the pseudochromosome assembly using the BWA-MEM algorithm (Li, 2013). Host sequence removal uses a fast k-mer search in which consecutive k-mers from each read are checked against the host k-mer library (generated above). Reads were removed if there was at least one k-mer match. Next, the remaining reads were aligned to the host-reference image using the BWA-MEM algorithm. Reads with an identity score above 30 were removed.
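
The k-mer stage of host removal described above (discard a read if any of its k-mers occurs in the host library) can be illustrated with a toy Python sketch. This is not the GATK PathSeq implementation: the sequences are invented, `k = 5` is used only so the example fits on a few lines (real pipelines use much longer k-mers), and the helper names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers at consecutive positions of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_host_library(reference, k):
    """Collect every k-mer of the host reference in a hash-based set."""
    return kmers(reference.upper(), k)


def remove_host_reads(reads, host_library, k):
    """Keep only reads that share no k-mer with the host library."""
    kept = []
    for read in reads:
        # A single shared k-mer is enough to flag the read as host-derived.
        if kmers(read.upper(), k).isdisjoint(host_library):
            kept.append(read)
    return kept


K = 5  # toy value for illustration only
host_reference = "ACGTACGTTAGCCGATAGGCTA"   # invented host sequence
reads = [
    "TTAGCCGA",          # exact substring of the host reference
    "GGGGGAAAAACCCCC",   # shares no 5-mer with the host
    "GGGGGTAGCCGGGGG",   # mostly non-host, but contains host 5-mer TAGCC
]

library = build_host_library(host_reference, K)
non_host = remove_host_reads(reads, library, K)
print(non_host)
```

The third read shows how strict the one-match rule is: a single shared k-mer removes an otherwise non-host read, which favours clean host subtraction at the cost of discarding some parasite reads that resemble host sequence.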

Further quality control was carried out in parallel to host sequence removal using the PathSeq pipeline in GATK (Walker et al., 2018). Reads with an excess of A/T or G/C content (29 out of 30 bases per window; Benjamini and Speed, 2012) were removed. Low-complexity sequence bases and low-quality bases were masked using the sDUST algorithm (Morgulis et al., 2006). Read ends were further trimmed according to base quality (Q = 15), and duplicate sequences and those shorter than 60 bp were removed.

Contiguous sequence assembly: Following host sequence removal, longer contigs were assembled from the remaining sequences using metaSPAdes (Nurk et al., 2017). Two datasets were generated: one containing assembled reads, the other containing the quality-filtered, paired-end reads prior to the assembly step. We included the latter dataset because some parasite DNA is likely to be present at extremely low levels in the host DNA and therefore will not have sufficient coverage for assembly, and because reference availability is likely to be limited for longer sequences.

Building a custom reference database of eukaryotic parasite sequences: The accuracy of taxonomic assignment is reliant on reference database availability and completeness (Hleap et al., 2021; Jeunen et al., 2023). To maximise parasite taxon coverage, we compiled a custom database using the pre-built bacteria, Protozoa, UniVec and human databases in Kraken 2 (Wood et al., 2019). We then added additional reference sequences from a variety of sources (Table 1). A complete list of NCBI taxonomic IDs was generated for apicomplexans, fornicata, cercozoans, parabasalids and euglenozoans using TaxonKit (Shen and Ren, 2021). The list was filtered to remove any IDs labelled as uncultured or environmental. The sequences were subsequently downloaded using Biopython (Cock et al., 2009; version released 05-08-2024). We also downloaded all available genome references for the same taxonomic groups from RefSeq (Pruitt et al., 2007; O’Leary et al., 2016). The PR2 database was filtered to remove sequences labelled uncultured, environmental, fungal and metazoan. All available genomic references from WormBase ParaSite (Howe et al., 2017) and Eukaryotic Pathogen Genomics Database Resource (EuPathDB) (Warrenfeltz et al., 2018) were also incorporated. Kraken 2 produces a compact hash table of k-mers derived from these reference sequences to be queried against when assigning sequences.

Taxonomic identification: Both the assembled and paired-end sequences derived for each sample were assigned using the custom parasite reference database with Kraken 2 (Wood et al., 2019). We assigned parasites to recorded hosts and dominant transmission mode as described above for 18S metabarcoding data. Confidence scores for each taxonomic assignment were calculated according to the fraction of k-mers assigned to the final taxonomic identity, its descendants and ascendants, i.e. the RTL score, using Conifer (Silamiķelism, 2020). We applied confidence scores of 0.2 for the assembled dataset and 0.7 for the paired-end dataset. The former score (0.2) reduces the propensity for Kraken 2 to overclassify reads that are not represented in the reference database (Peabody et al., 2015). The latter score of 0.7 was considered highly conservative and hence more suitable for dealing with uncertainty introduced by the relatively short paired-end sequences. Our processing resulted in two datasets to be used for analysis: 1) passing filtered paired-end reads directly to Kraken 2 and filtering for 70% confidence (WGS Paired-end 70%), and 2) assembling longer contigs from paired-end reads prior to assignment with Kraken 2 and applying a 20% confidence filter (WGS Assembled 20%).
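
The confidence score described above, the fraction of k-mers assigned to the final taxon, its descendants and its ascendants, can be illustrated with a toy Python sketch. This is not Conifer's code: the miniature taxonomy, the k-mer counts, the function names and the choice of denominator (all k-mers that hit any taxon) are assumptions made for illustration only.

```python
# Toy taxonomy as child -> parent links (hypothetical, heavily simplified).
PARENT = {
    "root": None,
    "Apicomplexa": "root",
    "Haemosporida": "Apicomplexa",
    "Plasmodium": "Haemosporida",
    "Haemoproteus": "Haemosporida",
    "Eimeria": "Apicomplexa",
}


def ancestors(taxon, parent):
    """Return every taxon on the path from taxon's parent up to the root."""
    path = []
    node = parent[taxon]
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def descendants(taxon, parent):
    """Return every taxon that has `taxon` somewhere among its ancestors."""
    return [t for t in parent if taxon in ancestors(t, parent)]


def path_support_score(kmer_hits, taxon, parent):
    """Fraction of hit k-mers that fall on the lineage through `taxon`."""
    lineage = {taxon, *ancestors(taxon, parent), *descendants(taxon, parent)}
    supported = sum(n for t, n in kmer_hits.items() if t in lineage)
    total = sum(kmer_hits.values())
    return supported / total if total else 0.0


# Hypothetical k-mer hit counts for a single read.
hits = {"Plasmodium": 6, "Haemosporida": 2, "Eimeria": 2}
score = path_support_score(hits, "Plasmodium", PARENT)
print(f"score = {score:.2f}; passes 0.7 filter: {score >= 0.7}")
```

In this example, 8 of the 10 hit k-mers support the Plasmodium lineage, so the read would pass the 0.7 filter applied to the paired-end dataset. The two Eimeria k-mers lie off that lineage and count against it.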

Assessing parasite identifications from WGS data

Species accumulation curves: These were produced to assess how the sequencing depth of whole genome sequence data could impact the number of parasites detected and the confidence scores for taxonomic assignments. The majority of our WGS data achieved 10X genome coverage although there was some variation. In order to explore the impact of greater sequencing depths on parasite identification, we downloaded raw WGS data for the Eurasian blackcap (Sylvia atricapilla) generated in two studies that aimed to achieve 60X coverage (Ishigohoka et al., 2023; Sigeman et al., 2020). The data were downloaded from the BioProject IDs PRJEB66075 and PRJNA578893 in NCBI. The samples were processed using the pipeline described above, except that a Eurasian blackcap host reference sequence (generated by the Vertebrate Genomes Project; GCA_009819655.1) was used for host removal. Prior to host filtering, we conducted random sub-sampling to achieve sequence depths of approximately 1X, 2.5X, 5X, 10X, 15X, 20X, 30X, 40X and 50X using SAMtools (Li et al., 2009). We used the vegan package in R (Oksanen, 2019) to plot species accumulation curves using the exact method for our own WGS data and the blackcap data.

We used the multiplex PCR data as a reference to ascertain the occurrence of true positive, false positive, true negative and false negative identifications made by mining WGS data for haemosporidian parasites (Haemoproteus, Plasmodium and Leucocytozoon). We then generated confusion matrices and Area Under the Curve, Receiver Operating Characteristic (AUC ROC) scores using the caret package in R (Kuhn, 2011). AUC ROC scores indicate how well a model can distinguish between positive and negative classes according to multiplex PCR data. We calculated AUC ROC scores rather than accuracy scores because, with imbalanced datasets (e.g. most outcomes belong to a single class), a model may appear to achieve high accuracy by solely predicting the majority class, even if it fails to identify the minority class (Ling et al., 2003). It should be noted that determining infection status via targeted multiplex PCR can also generate false positive and negative errors (Zehtindjiev et al., 2012; Valkiūnas et al., 2014; Ciloglu et al., 2019). Indeed, previous studies have found that PCR assays can have an accuracy ranging from 57 to 100% (Freed and Cann, 2006). Nevertheless, multiplex PCR results provide some insight into the true infection status of a sample.

Comparative assessment of different approaches

Methods are variously compared to highlight differences in the detection of parasitic genera. In particular, we assess the number of distinct parasite genera identified per host individual by each method, the total number of genera detected by each method, the frequency with which genera are detected by multiple methods, and among those, the extent to which they are detected in the same samples.

Results

Identification of blood parasites via microscopy of blood smears

Across the 68 individual silvereye samples, microscopic examination of blood smears revealed two genera of blood parasites: Haemoproteus in 3.5% of samples and Leucocytozoon in 7%. Parasitaemia ranged from 2-9% for Haemoproteus (Table S1).

Identification of parasites via molecular methods

i. Haemosporidian-specific multiplex PCR

Multiplex PCR detected Haemoproteus, Leucocytozoon and Plasmodium in 13.6%, 15.2% and 3% of samples, respectively.

ii. 18S anti-metazoan metabarcoding

Metabarcoding with anti-metazoan 18S primers detected a total of six parasite genera. Four of the six genera have previously been reported in birds (Babesia, Blastocystis, Eimeria and Isospora). Babesia are characterised by tick-borne transmission, Blastocystis and Eimeria by direct modes of transmission, and Isospora utilises either a direct or two-host life cycle.
We also detected Theileria, which is a tick-borne parasite reported in vertebrates, predominantly cattle (Mans et al., 2015), and Albugo, which is a directly transmitted parasite of plants (Thines and Voglmayr, 2009).

Assessing parasite identifications from WGS data

Across the 68 samples, a mean of 82,914,641 raw reads was reduced to 368,977 paired-end reads per sample after host read removal, quality filtering and removal of single-end reads (Figure 1A). These paired-end reads were assigned directly using Kraken 2 to produce the paired-end dataset. Approximately 5% of reads could be identified per sample (Figure 1B). Following assembly, a mean of 6335.2 sequences per sample remained, 7% of which were identified per sample (Figure 1B). We detected a total of 73 and 35 parasitic genera for the paired-end and assembled datasets, respectively. The number of parasitic taxa identified across both datasets decreased following filtering based on confidence scores. The majority of parasites detected had previously been recorded as parasites of birds (Figure 2). However, several parasites of invertebrates, plants and other vertebrates were recorded in both datasets. After applying a confidence filter of 60%, the parasites recorded in the assembled WGS mining data were all known bird parasites. A variety of transmission strategies were detected for parasites identified in both paired-end and assembled datasets (Figure 2). The diversity of transmission strategies detected in the assembled dataset decreased rapidly following confidence filtering, with only parasites transmitted by flying insect vectors being detected with over 70% confidence. Species accumulation curves showing the effect of sequencing depth on detection of parasite genera reached an asymptote by 10X sequencing depth in both blackcaps and silvereyes, even under the highest confidence thresholds (Figure 3). Increasing the confidence threshold lowered the overall number of genera at the asymptote.
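As an illustration of how a species accumulation curve of this kind can be computed, the sketch below gives the closed-form expected richness for a random subset of n samples drawn from presence/absence data. It is written in Python rather than the R vegan package used in the study, and it assumes that the exact method corresponds to the standard sample-based rarefaction formula. How sequencing depth was mapped onto the accumulation axis is not specified here, so the sketch simply accumulates over samples. The function name and the toy matrix are hypothetical.

```python
from math import comb


def expected_richness(presence, n):
    """Expected number of taxa seen in n samples drawn without replacement.

    presence: list of rows (one per sample); each row is a list of 0/1
    flags, one per taxon (hypothetical input layout).
    """
    total = len(presence)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of samples")
    n_taxa = len(presence[0]) if presence else 0
    richness = 0.0
    for j in range(n_taxa):
        occupied = sum(row[j] for row in presence)  # samples containing taxon j
        # Probability that none of the n chosen samples contains taxon j
        p_missed = comb(total - occupied, n) / comb(total, n)
        richness += 1.0 - p_missed
    return richness


# Hypothetical toy data: 4 samples (rows) x 3 genera (columns)
toy = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]
curve = [expected_richness(toy, n) for n in range(1, len(toy) + 1)]
print(curve)
```

The curve is non-decreasing by construction and ends at the total number of taxa observed, so a plateau well before the last point indicates that additional samples add few new genera.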
When comparing the detection of haemosporidian parasites, the assembled WGS data resulted in fewer false positives than the paired-end data, and the paired-end WGS data detected more true positives than the assembled data. The AUC ROC score, indicating how well a model can distinguish between positive and negative classes, ranged from 0.5 to 0.7 and was generally higher in the paired-end 70% dataset than in the assembled 20% dataset, but scores were similar for detection of Haemoproteus (Table 2).

Comparative assessment of different approaches

The number of parasite taxa detected per individual varied among methods. Metabarcoding did not detect any parasites in the majority of cases (Figure 4). One or more parasites were detected for most individuals by the WGS paired-end dataset with a confidence threshold of 70%. Two parasites were detected in the majority of cases by the WGS assembled dataset with a 20% confidence threshold. The paired-end WGS data detected the highest number of parasite genera (Figure 5). Some genera of parasites were detected by multiple methods (Figure 5), but none were detected by all five methods (Figure 5). It should be noted that some methods, such as the multiplex PCR and microscopy, do not aim to detect a broad range of parasite genera. Figure 6 illustrates cases in which parasites were detected in the same sample by more than one method. Haemoproteus was most consistently detected in the same samples by multiple methods. There was substantial agreement between the detection of Leucocytozoon by multiplex PCR and by the paired-end WGS data with a confidence filter of 70%. Leucocytozoon was no longer detected in WGS data following the assembly step (Figure 6). Notably, Babesia and Theileria were both detected by three methods but were usually detected in different samples.
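To make the AUC ROC scores above easier to interpret, the following sketch assumes that each method yields only a presence or absence call per sample. In that case the ROC curve has a single operating point, and the area under it reduces to the mean of sensitivity and specificity. The code is Python rather than the caret package used in the study, the function names are hypothetical, and the counts are invented for illustration rather than taken from Table 2.

```python
def auc_from_confusion(tp, fp, tn, fn):
    """AUC ROC for a presence/absence classifier with one operating point.

    With ROC points (0, 0), (FPR, TPR) and (1, 1), the trapezoidal area
    simplifies to (sensitivity + specificity) / 2.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp, fp, tn, fn):
    """Proportion of samples classified in agreement with the reference."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical case: 68 samples, 2 positive by the reference assay,
# and a method that never reports the parasite.
never_detects = dict(tp=0, fp=0, tn=66, fn=2)
print(accuracy(**never_detects))            # high, driven by the majority class
print(auc_from_confusion(**never_detects))  # 0.5, i.e. no better than chance
```

This is why a rare parasite that a method fails to detect can coincide with high accuracy but an AUC ROC of 0.5.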
Discussion

Eukaryotic endoparasites are detectable in WGS data

Whole genome sequence data offer a useful resource for effective and broad-scale characterisation of parasite infection via bioinformatic mining for parasite sequences. Our study demonstrates that interrogating WGS data for inadvertently sequenced taxa reveals a greater diversity of parasites (by an order of magnitude) than is revealed by more commonly used methods for broad-scale identification, such as metabarcoding. Furthermore, the approach is effective at detecting a taxonomically wide range of parasite genera with a variety of transmission modes, as well as identifying cases of co-infections within individuals. The latter may be missed by more targeted approaches, such as parasite-specific PCR.

Endoparasite diversity revealed by WGS and metabarcoding data

Of the broad-scale detection methods employed in this study, mining WGS data for parasites was more powerful than metabarcoding in terms of the number of parasite genera detected. The assembled and paired-end WGS datasets detected 35 and 73 parasite genera, respectively, compared to the six genera detected by the metabarcoding approach. There are several factors that may explain this discrepancy in the variety of parasites detected. First, the 18S primer design deliberately avoided amplification of metazoans to reduce the representation of host DNA in the sequencing data (Bower et al., 2004). Therefore, we did not expect to capture helminth or other metazoan parasites using metabarcoding, and indeed none were recovered. In contrast, the untargeted WGS approach had no such constraints, enabling detection of a much broader range of parasites. Indeed, helminths such as Schistosoma and Dibothriocephalus were detected in the WGS data. However, this explanation does not account for the plethora of non-metazoan parasites detected in the WGS data but not in the metabarcoding data.
The WGS approach may, additionally, have captured a greater range of parasites because it is amplification-free. It is well-established that mixed-template approaches like metabarcoding can suffer from PCR-bias, whereby the polymerase preferentially amplifies sequences with certain characteristics, particularly GC content (Nichols et al. , 2018). In particular, GC-rich sequences are expected to amplify preferentially (Polz and Cavanaugh, 1998). In our study, the metabarcoding approach failed to identify haemosporidian parasites, despite the availability of reference sequences for this region (Harl et al. , 2019; Harl et al. , 2023). Notably, the GC content of avian haemosporidian parasites is generally low (Nikbakht et al. , 2014; Huang et al. , 2020). Other parasitic taxa have also been noted to have AT-rich genomes (e.g. Plasmodium falciparum [Gardner et al. 2002]; myxozoans [Yang et al. 2014; Faber et al. 2021]; Strongyloides spp. [Hunt et al. 2016]). Therefore, it is possible that haemosporidians are not represented in the metabarcoding data because they were outcompeted at the amplification step. Indeed, Plasmodium has previously been shown to amplify poorly using a range of 18S primers (Kounosu et al. , 2019). The large difference in the numbers of parasitic genera recovered from WGS compared to metabarcoding is somewhat surprising considering that reference sequences for the 18S ribosomal region are readily available, and they are routinely used for identification (Moon-van der Staay et al. , 2001; Guillou et al. , 2012; Banos et al. , 2018; Harl et al. , 2019; Kounosu et al. , 2019; Wylezich et al. , 2019; De Jonge et al. , 2021; Vaulot et al. , 2022; Harl et al. , 2023). Despite the relatively limited reference sequence availability for DNA fragments mined from WGS, our approach appears to circumvent some of the issues introduced by metabarcoding, allowing for a broad range of parasite diversity to be captured. 
Reference sequences for WGS will no doubt increase in availability, and this will further improve the utility of the method.

Interestingly, both metabarcoding and WGS data identified several parasites that reside in the intestine, despite being generated from blood samples. For metabarcoding, these ‘gut parasites’ included Blastocystis and the coccidians Eimeria and Isospora (Box, 1981; Allen, 1987; Boreham and Stenzel, 1993), which collectively represented the vast majority of parasites identified via metabarcoding. Equally, a number of gut parasites were identified from the WGS data, including Blastocystis, the coccidians Eimeria, Cyclospora and Cryptosporidium, as well as Entamoeba. There is increasing evidence that DNA from ingested material or components of the gut microbiome can be detected in the blood. For example, DNA from microbes in the gut can be detected in human blood samples, where it is referred to as circulating microbial DNA (cmDNA) (Damgaard et al., 2015; Zhai et al., 2024). DNA fragments from food items can also be detected in human blood (Spisak et al., 2013). Furthermore, eukaryotic micro-organisms could conceivably enter blood through translocation from other sites in the body (Tan et al., 2023), or via shedding of parasite DNA into the blood stream. Indeed, it has been suggested that characterising microbiomes from blood rather than faecal samples could help to avoid contamination issues (Zhai et al., 2024). These insights and our results suggest that mining WGS data for parasites could be similarly fruitful for identifying a broad range of affiliated parasites. Further studies to characterise and compare the eukaryotic parasite communities from blood and faecal samples of the same individuals would help to elucidate how reliably gut parasite communities are revealed by analysis of blood.
Consistency of detection by metabarcoding and WGS data

Surprisingly, metabarcoding and WGS seldom provided consistent results in terms of detecting the same parasite genera in the same samples. Sequencing depth provides one possible explanation for this disparity. Low sequence coverage in shotgun metagenomics data has been suggested to result in inconsistent taxonomic identifications of fungal sequences from faecal samples when compared to metabarcoding data (Usyk et al., 2023). Such results may arise if informative regions for distinguishing taxa are not captured at low sequencing depths. However, in the present study (which retained approximately 1% of raw read data compared to the 0.1% recovered by Usyk et al. (2023)), species accumulation curves indicated that the parasite community did not change significantly with greater sequencing depth. Similarly, shallow metagenomic sequencing has been shown to be as effective as deep sequencing (Hillmann et al., 2018) and more effective than metabarcoding (La Reau et al., 2023) in capturing community composition in microbiome analyses. Together, these considerations suggest that sequencing depth is unlikely to account for the incongruence between the parasites identified in the metabarcoding and WGS data.

The discrepancy in parasites identified in the same samples by the WGS and metabarcoding approaches could also reflect erroneous identifications. In similar shotgun metagenomics studies, misalignments with the reference database have been suggested to result in misidentification of sequences, leading to different taxa being identified in shotgun metagenomics and metabarcoding data (Clooney et al., 2016). However, we found a moderate level of consistency in identifying known haemosporidian parasite infections using the paired-end WGS approach, with AUC ROC scores of 0.7 for each genus.
It should be noted, however, that utilising the assembled WGS data to identify haemosporidians produced AUC ROC scores of 0.5 for both Leucocytozoon spp. and Plasmodium, indicating that it performed no better than chance in identifying these infections. The metabarcoding approach, in turn, was unable to identify any haemosporidian parasites. The paired-end WGS approach may thus provide more power in identifying parasites than metabarcoding, which further supports its consideration in future parasite studies.

Considerations for re-purposing host WGS data to characterise affiliated parasites

Our collective results show that mining parasite sequences from WGS data of hosts can reveal a greater diversity of affiliated parasites, with potentially more robust estimates, than is gained by commonly used approaches like metabarcoding. Overall, our results highlight how repurposing pre-existing WGS data could allow broad surveys of parasite community composition. However, as the WGS data were originally generated to investigate host biology, the data may not be optimised for characterising parasites. Therefore, there are several caveats and trade-offs to consider on a per-study basis if repurposing such data. These are elaborated below and summarised in Box 1.

The availability and completeness of references for taxonomic assignment are well-recognised limitations of shotgun metagenomics and metabarcoding approaches and have been shown to influence inferences of metagenomics studies (Tessler et al., 2017; Hotaling et al., 2021; Leray et al., 2022; Jeunen et al., 2023; Usyk et al., 2023). The availability of reference sequences may limit or introduce biases in sequence identification. This limitation is exacerbated in parasite studies for several reasons. Parasites are generally challenging to sequence, making them under-represented in databases.
This is partially due to the close association between intracellular parasites and host tissues, which can make it difficult to isolate substantial ratios of parasite DNA for sequencing (Palinauskas et al. , 2013; Videvall, 2019). In addition, the presence of multiple parasites within a single host can complicate assigning sequences to specific parasite taxa (Galen et al. , 2020). Intraspecific and geographic variation of sequences can also be limited in public databases (Hestetun et al. , 2020). Furthermore, the sampling effort for parasites is geographically patchy (Poulin and Jorge, 2019) and data for parasites in many regions are simply unavailable. Finally, research effort and therefore reference sequence availability is heavily influenced by the economic importance of taxa (Vallée et al. , 2016). As a result many taxa continue to be under-represented in public databases (Galen et al. , 2020). Reliability of sequence information can also be problematic. Taxonomically narrow databases can both increase the rate of false negative identifications and lead to false positives (Keck et al. , 2023). These issues may be avoided by using databases with a broad range of sequences, such as the NCBI nucleotide database (Sayers et al. , 2022). However, it has been found that large, uncurated databases contain frequent errors (Keck et al. , 2023) and next-generation sequencing data uploaded to public databases often contain contaminants, leading to misidentifications (Laurence et al. , 2014; Bensch et al. , 2021). In situations where reference material is lacking, limiting identification to the genus level may be required, as in the present study. Our results also indicate a possible trade-off between maximising the captured diversity and ensuring a high level of accuracy in parasite identifications. We utilised two approaches in the present study to identify parasites from paired-end and assembled reads. 
The short 150 bp sequences provided by paired-end reads may not sufficiently capture regions of the genome that enable differentiation between taxa. Therefore, multiple short sequences of the same parasite could hit multiple parasite references, inflating the estimated number of parasites. To minimise such issues, paired-end reads may be assembled into longer contiguous sequences. Longer sequences confer a better chance of covering informative genomic regions and help to correct sequencing errors by creating consensus sequences between overlapping regions (Ayling et al., 2020). However, improving the accuracy of identifications in this way may also significantly reduce the diversity detected. For example, the lack of genomic references for bird parasites means that the probability of finding a reference of sufficient length to match a longer sequence is low. In addition, some parasites may not have sufficient sequencing coverage to be assembled into longer sequences (Ayling et al., 2020). The sequence coverage can depend on the abundance of the parasite in the host (i.e. parasitaemia) and the physical size of parasites. Low parasitaemia infections by small parasites are less likely to be represented in raw sequencing data. For example, Giardia typically occurs at particularly low parasitaemia, resulting in low representation in faecal shotgun metagenomics data and fewer positive identifications (Wylezich et al., 2020). Likewise, in our study blood smears revealed low parasitaemia for Leucocytozoon spp. compared to Haemoproteus spp. Accordingly, Leucocytozoon spp. were not detected in the assembled WGS dataset, despite their detection in the paired-end read WGS datasets, the multiplex PCR data, and by microscopic examination of blood smears. In addition, the rarity of genomic references (particularly longer sequences) for Leucocytozoon spp. (Omori et al., 2008) could have further hindered detection in WGS data.
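The confidence filtering applied to Kraken 2 assignments in this study can be illustrated with a simplified calculation. As described in the Kraken 2 manual, the confidence of a label is the number of k-mers mapped within the clade rooted at that label divided by the number of k-mers that lack an ambiguous nucleotide. The Python sketch below computes that fraction from a k-mer hit string in the style of Kraken 2's per-sequence output. The taxon IDs, the `children` map and the function names are hypothetical, and the sketch does not reproduce Kraken 2's own step of moving a label up the taxonomy until the threshold is met.

```python
def clade_members(taxid, children):
    """Return taxid plus all descendants from a {parent: [children]} map."""
    members, stack = set(), [taxid]
    while stack:
        node = stack.pop()
        if node not in members:
            members.add(node)
            stack.extend(children.get(node, []))
    return members


def confidence(kmer_hits, taxid, children):
    """Fraction of non-ambiguous k-mers that map inside the clade of taxid."""
    clade = clade_members(taxid, children)
    in_clade = 0
    total = 0
    for token in kmer_hits.split():
        if token == "|:|":  # separator between the mates of a read pair
            continue
        label, count = token.rsplit(":", 1)
        if label == "A":  # k-mers with ambiguous nucleotides are excluded
            continue
        total += int(count)  # unclassified k-mers (label "0") still count here
        if label in clade:
            in_clade += int(count)
    return in_clade / total if total else 0.0


# Hypothetical taxonomy: 100 (family) -> 110 (genus) -> 111 (species)
children = {"100": ["110"], "110": ["111"]}
hits = "111:6 110:2 0:10 A:5 |:| 100:2"

genus_conf = confidence(hits, "110", children)
print(genus_conf)         # 8 of 20 informative k-mers fall in the genus clade
print(genus_conf >= 0.2)  # kept under a 20% filter
print(genus_conf >= 0.7)  # dropped under a 70% filter
```

Because unclassified k-mers remain in the denominator, a long contig whose sequence is only partly represented in the reference database can receive a low score even when its matching k-mers agree with one another.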
A potential solution to avoid the trade-off between maximising captured diversity and the robustness of taxonomic identification is to avoid an assembly step while applying a stringent confidence filter. The developers of Kraken 2 suggest that filtering based on confidence is not necessary to obtain accurate taxonomic identifications (Wood, 2020). However, we found that applying a confidence filter aided the removal of potentially spurious identifications. For example, the number of parasites known to use plant or invertebrate hosts, which we presume to be incorrectly identified, was substantially reduced following confidence filtering. It should be noted that confidence scores will be significantly lower for longer, assembled sequences. This might seem counter-intuitive because taxonomic assignments based on longer sequences (which are less error-prone than short sequences because a consensus sequence is obtained from overlapping fragments) would be expected to confer greater confidence. However, the lack of longer sequences in the reference databases often means that few ‘good matches’ are available.

It is also possible that the detection in the host samples of parasites not known from avian hosts is not spurious, but rather reflects a dearth of knowledge regarding the life cycles and host specificity of many parasites (Dobson et al., 2008). Therefore, it is possible that some of these parasites are yet to be described in avian hosts. However, it is difficult to discern whether detection of DNA belonging to a certain parasite indicates completion of its life cycle in the host. Alternatively, the parasite DNA may have entered the blood from material that had been ingested.

Our framework highlights competing considerations when re-purposing WGS data for parasite identification. For example, while assembling longer contiguous sequences may help to avoid spurious parasite identifications (Ayling et al., 2020), doing so may result in rarer parasites going undetected if sequencing depth or the availability of reference sequences is limited (Tessler et al., 2017; Usyk et al., 2023). In such cases the application of a stringent confidence filter may be preferable. Therefore, it is imperative to consider the aim of the study when designing an analysis pipeline. Indeed, there are some research questions for which the occurrence of false positives or false negatives may be more or less problematic. For example, studies which aim to capture diversity, particularly if much of the taxonomic diversity is unknown, may choose to use a less conservative analysis pipeline. While this may increase the rate of false positives, it will maximise the captured diversity. However, false positives may be problematic in studies that focus on more specific host-parasite interactions and eco-evolutionary dynamics.

Conclusions

We have shown that WGS data, generated with the primary aim of investigating host biology, can be repurposed to reveal information about multiple parasites infecting the host. Utilising such genomic data to quantify affiliated parasites provides an opportunity to conduct broad-scale biogeographic studies, potentially at low additional cost. The capacity to detect co-infections within individuals is a further bonus; together, these features enable investigations beyond single-host, single-parasite systems (Stewart Merrill et al., 2022) and help to unravel complex networks of host-parasite and parasite-parasite interactions. Furthermore, the broad taxonomic range of parasites detected could lend itself to exploring the potential of varied parasite life-histories to mediate parasite dispersal and distribution patterns. Additional opportunities and outcomes may also arise, for example to investigate novel host-parasite interactions, patterns of host use, or parasite distributions over space and time.
At present, untargeted genomic sequencing data are referred to under a huge variety of names, including shotgun metagenomics (Couto et al., 2018), metagenomic shotgun sequencing, genome skimming (Dodsworth, 2015; Vijayvargiya et al., 2019), and WGS (Ng and Kirkness, 2010). Our study highlights that the data resulting from these various approaches can be repurposed for other applications. Although we highlight some limitations to extracting parasite reads from WGS data, it is also important to consider the biases associated with well-established methods for quantifying parasite diversity. Future research that adopts a WGS mining approach to characterising parasites should consider the completeness and availability of reference databases, the sequencing depth and read length of WGS data, and the research question to evaluate how best to re-purpose WGS data to characterise affiliated parasites.

Table 1 Summary of reference sequences included in the custom Kraken 2 database.

Table 2 Occurrence of true positive, false positive, true negative and false negative identifications, and AUC ROC scores, when characterising Haemoproteus, Plasmodium and Leucocytozoon spp. from 68 silvereye WGS samples, either passing filtered paired-end reads directly to Kraken 2 and filtering for 70% confidence (WGS Paired-end 70%) or assembling longer contigs from paired-end reads prior to assignment with Kraken 2 and applying a 20% confidence filter (WGS Assembled 20%), in comparison to multiplex PCR data generated for the same samples.

Box 1. Factors to consider when implementing a WGS mining pipeline to characterise affiliated parasites.
Figure legends

Figure 1 Number of reads per individual WGS sample generated from 68 individual silvereyes for A) Primary reads (raw reads prior to host filtering), After Host Filter (subtraction of host reads), After Quality Filter (quality filtered reads) and Final Paired Reads (final paired-end reads with single-end reads subtracted) and B) the proportion of reads that could be identified using Kraken 2 for the assembled and paired-end reads following host filtering.

Figure 2 Identified reads at each confidence threshold for DNA sequences extracted from 68 WGS samples from individual silvereyes. Paired-end reads were either passed directly to Kraken 2 following host filtering and quality control (paired-end) or assembled into longer contigs prior to assignment with Kraken 2 (assembled). Colours indicate known parasite hosts and transmission routes of the parasitic genera detected.

Figure 3 Species accumulation curves showing number of parasite genera detected with increasing sequencing depth for four datasets: WGS data from 68 silvereyes and 11 blackcaps, either assigning filtered paired-end reads directly with Kraken 2 (paired-end) or assembling longer contigs from paired-end reads prior to assignment with Kraken 2 (assembled). Each plot displays the data after filtering against three confidence thresholds: 0%, 20% and 70%.

Figure 4 Number of parasite genera detected per individual for 68 silvereye samples for three datasets: Metabarcoding – amplification of the 18S region using anti-metazoan primers; WGS Paired-end 70% - assigning parasites from WGS data via Kraken 2 and filtering for 70% confidence; WGS Assembled 20% - assembling longer contigs from the paired-end sequences prior to assignment with Kraken 2 and filtering for 20% confidence.
Figure 5 Total number of parasite genera detected by each method (set size) and incidences where parasite genera were detected by multiple methods (intersection size) for 68 silvereye individuals screened for parasites using five methods: WGS Paired-end 70% - assigning parasites from WGS data via Kraken 2 and filtering for 70% confidence; WGS Assembled 20% - assembling longer contigs from the paired-end sequences prior to assignment with Kraken 2 and filtering for 20% confidence; metabarcoding – amplification of the 18S region using anti-metazoan primers; PCR - multiplex PCR for haemosporidian parasites; slides - inspecting microscopy slides of blood. For example, two parasite genera were detected by WGS paired-end 70%, WGS assembled 20% and Metabarcoding.

Figure 6 Variation in detection of parasite genera by each method, showing the number of times that parasites detected by multiple methods were recorded per method (indicated by size) and the proportion of times that parasites were detected in the same sample across methods (indicated by bar chart) for 68 silvereye individuals screened for parasites using five methods: WGS Paired-end 70% - assigning parasites from WGS data via Kraken 2 and filtering for 70% confidence; WGS Assembled 20% - assembling longer contigs from the paired-end sequences prior to assignment with Kraken 2 and filtering for 20% confidence; metabarcoding – amplification of the 18S region using anti-metazoan primers; PCR - multiplex PCR for haemosporidian parasites; slides - inspecting microscopy slides of blood.

Data Accessibility Statement

Raw sequence reads will be deposited in the SRA (BioProject XXX – to be provided upon acceptance). The associated sample metadata will also be stored in the SRA (BioProject XXX – to be provided upon acceptance). The raw data are available as supplementary material and will be uploaded to DataDryad following acceptance (XXXX).
The code used to generate the results is available here: https://github.com/sarah-nichols/wgs-mining-validation.

Benefits Generated

Benefits from this research accrue from the sharing of our data and approach on public databases as described above.

Author contributions

SN, SMC and BO designed the research; SN, AE, CMY and SMC conducted fieldwork to collect samples for the research; SN, AE, LSK and VP conducted lab work to generate the data; SN and AE analysed the data; SN wrote the paper with BO and SMC. All authors provided feedback on the manuscript during the drafting process.

Animal studies

Ethical approval was issued to SMC (Griffith University ENV/06/20/AEC) and CY (University of Tasmania 27197). We exported our samples from Australia under permit no. PWS2022-AU-001814 and imported samples to the UK under Authorisation No. ITIMP21.1083. We thank the National Parks and Wildlife Service for New South Wales (Scientific Licence SL102647 from Department of Planning, Industry and Environment, NSW, Australia, issued to SMC), Queensland (Research permit WA0042422 from Department of Environment and Science, Queensland, Australia issued to SMC) and Tasmania (Permit No. FA22367 from Department of Natural Resources and Environment, Tasmania, Australia and Scientific Research permit SR-2022-5283, Hobart City Council, Tasmania, Australia issued to CY) for allowing us access to collecting sites. We recognise ABBBS for issuing a project licence to SMC.

Funding

The study was supported by funding from the Natural Environment Research Council (NERC) through a PhD studentship awarded to SN (NE/S007474/1) as well as an American Ornithological Society Student Research Award and The Biology Eurofins Foundation Fund Award granted to SN. The following grants were also awarded to AE: Heredity Fieldwork Grant (The Genetics Society), NERC (NE/S007474/1), Eurofins Foundation Fund, Hesse Research Award (American Ornithological Society), Santander Travel Award (University of Oxford).
Acknowledgements We are grateful to the many people who assisted us in identifying field sites and facilitating sample collection. In particular, we would like to thank Jon Coleman and the Queensland Bird Research and Banding Group; Judith and Gregory Little; and Jaslyn Allnutt. References Allen, P. C. 1987. Physiological responses of chicken gut tissue to coccidial infection: comparative effects of Eimeria acervulina and Eimeria mitis on mucosal mass, carotenoid content, and brush border enzyme activity. Poultry Science, 66 , 1306-1315. Ayling, M., Clark, M. D. & Leggett, R. M. 2020. New approaches for metagenome assembly with short reads. Briefings in bioinformatics, 21 , 584-594. Baldrian, P., Větrovský, T., Lepinay, C. & Kohout, P. 2022. High-throughput sequencing view on the magnitude of global fungal diversity. Fungal Diversity, 114 , 539-547. Banos, S., Lentendu, G., Kopf, A., Wubet, T., Glöckner, F. O. & Reich, M. 2018. A comprehensive fungi-specific 18S rRNA gene sequence primer toolkit suited for diverse research issues and sequencing platforms. BMC microbiology, 18 , 1-15. Benjamini, Y. & Speed, T. P. 2012. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic acids research, 40 , e72-e72. Bordes, F. & Morand, S. 2009. Parasite Diversity: An Overlooked Metric of Parasite Pressures? Oikos, 118 , 801-806. Boreham, P. F. & Stenzel, D. J. 1993. Blastocystis in humans and animals: morphology, biology, and epizootiology. Advances in parasitology, 32 , 1-70. Bower, S. M., Carnegie, R. B., Goh, B., Jones, S. R., Lowe, G. J. & Mak, M. W. 2004. Preferential PCR amplification of parasitic protistan small subunit rDNA from metazoan tissues. Journal of Eukaryotic Microbiology, 51 , 325-332. Box, E. D. 1981. Isospora as an Extraintestinal Parasite of Passerine Birds 1. The Journal of Protozoology, 28 , 244-246. Briscoe, A. G., Nichols, S., Hartikainen, H., Knipe, H., Foster, R., Green, A. J., Okamura, B. & Bass, D. 2022. 
High‐throughput sequencing of faeces provides evidence for dispersal of parasites and pathogens by migratory waterbirds. Molecular Ecology Resources, 22 , 1303-1318. Byers, J. E. 2021. Marine parasites and disease in the era of global climate change. Annual Review of Marine Science, 13 , 397-420. Callahan, B. J., Mcmurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A. & Holmes, S. P. 2016. DADA2: High-resolution sample inference from Illumina amplicon data. Nature methods, 13 , 581-583. Cao, J., Hu, Y., Liu, F., Wang, Y., Bi, Y., Lv, N., Li, J., Zhu, B. & Gao, G. F. 2020. Metagenomic analysis reveals the microbiome and resistome in migratory birds. Microbiome, 8 , 1-18. Carlson, C. J., Burgio, K. R., Dougherty, E. R., Phillips, A. J., Bueno, V. M., Clements, C. F., Castaldo, G., Dallas, T. A., Cizauskas, C. A. & Cumming, G. S. 2017. Parasite biodiversity faces extinction and redistribution in a changing climate. Science advances, 3 , e1602422. Cavalier-Smith, T., Lewis, R., Chao, E. E., Oates, B. & Bass, D. 2009. Helkesimastix marina n. sp.(Cercozoa: Sainouroidea superfam. n.) a gliding zooflagellate of novel ultrastructure and unusual ciliary behaviour. Protist, 160 , 452-479. Chen, S., Zhou, Y., Chen, Y. & Gu, J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34 , i884-i890. Ciloglu, A., Ellis, V. A., Bernotienė, R., Valkiūnas, G. & Bensch, S. 2019. A new one-step multiplex PCR assay for simultaneous detection and identification of avian haemosporidian parasites. Parasitology Research, 118 , 191-201. Clooney, A. G., Fouhy, F., Sleator, R. D., O’ Driscoll, A., Stanton, C., Cotter, P. D. & Claesson, M. J. 2016. Comparing Apples and Oranges?: Next Generation Sequencing and Its Impact on Microbiome Analysis. PLOS ONE, 11 , e0148028. Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F. & Wilczynski, B. 2009. 
Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25 , 1422. Cornetti, L., Savolainen, V., Valente, L., Dunning, L., Quan, X., Black, R. & Hebert, O. The genome of the” great speciator” provides insights into bird diversification. Couto, N., Schuele, L., Raangs, E. C., Machado, M. P., Mendes, C. I., Jesus, T. F., Chlebowicz, M., Rosema, S., Ramirez, M. & Carriço, J. A. 2018. Critical steps in clinical shotgun metagenomics for the concomitant detection and typing of microbial pathogens. Scientific reports, 8 , 13767. Cox, F. 2001. Concomitant infections, parasites and immune responses. Parasitology, 122 , S23-S38. Damgaard, C., Magnussen, K., Enevold, C., Nilsson, M., Tolker-Nielsen, T., Holmstrup, P. & Nielsen, C. H. 2015. Viable bacteria associated with red blood cells and plasma in freshly drawn blood donations. PLoS one, 10 , e0120826. De Buron, I., Hill-Spanik, K., Atkinson, S., Vanhove, M., Kmentová, N., Georgieva, S., Díaz-Morales, D., Kendrick, M., Roumillat, W. & Rothman, G. 2025. ParasiteBlitz: Adaptation of the BioBlitz concept to parasitology. Journal of Helminthology, 99 , e39. De Jonge, D. S., Merten, V., Bayer, T., Puebla, O., Reusch, T. B. & Hoving, H.-J. T. 2021. A novel metabarcoding primer pair for environmental DNA analysis of Cephalopoda (Mollusca) targeting the nuclear 18S rRNA region. Royal Society Open Science, 8 , 201388. Dobson, A., Lafferty, K. D., Kuris, A. M., Hechinger, R. F. & Jetz, W. 2008. Homage to Linnaeus: how many parasites? How many hosts? Proceedings of the National Academy of Sciences, 105 , 11482-11489. Dodsworth, S. 2015. Genome skimming for next-generation biodiversity analysis. Trends in Plant Science, 20 , 525-527. Dougherty, E. R., Carlson, C. J., Bueno, V. M., Burgio, K. R., Cizauskas, C. A., Clements, C. F., Seidel, D. P. & Harris, N. C. 2016. Paradigms for parasite conservation. Conservation biology, 30 , 724-733. Durso, L. M., Harhay, G. P., Smith, T. 
P., Bono, J. L., Desantis, T. Z., Harhay, D. M., Andersen, G. L., Keen, J. E., Laegreid, W. W. & Clawson, M. L. 2010. Animal-to-animal variation in fecal microbial diversity among beef cattle. Applied and environmental microbiology, 76 , 4858-4862. Edgcomb, V. P., Kysela, D. T., Teske, A., De Vera Gomez, A. & Sogin, M. L. 2002. Benthic eukaryotic diversity in the Guaymas Basin hydrothermal vent environment. Proceedings of the National Academy of Sciences, 99 , 7658-7662. Faber, M., Shaw, S., Yoon, S., De Paiva Alves, E., Wang, B., Qi, Z., Okamura, B., Hartikainen, H., Secombes, C. J. & Holland, J. W. 2021. Comparative transcriptomics and host-specific parasite gene expression profiles inform on drivers of proliferative kidney disease. Scientific Reports, 11 , 2149. Fallon, S. M., Ricklefs, R. E., Swanson, B. & Bermingham, E. 2003. Detecting avian malaria: an improved polymerase chain reaction diagnostic. Journal of Parasitology, 89 , 1044-1047. Fonseca, V. G. 2018. Pitfalls in relative abundance estimation using eDNA metabarcoding. Wiley Online Library. Franssen, F. F., Janse, I., Janssen, D., Caccio, S. M., Vatta, P., Van Der Giessen, J. W. & Van Passel, M. W. 2021. Mining public metagenomes for environmental surveillance of parasites: A proof of principle. Frontiers in Microbiology, 12 , 622356. Freed, L. A. & Cann, R. L. 2006. DNA quality and accuracy of avian malaria PCR diagnostics: a review. The Condor, 108 , 459-473. Galen, S. C., Borner, J., Williamson, J. L., Witt, C. C. & Perkins, S. L. 2020. Metatranscriptomics yields new genomic resources and sensitive detection of infections for diverse blood parasites. Molecular Ecology Resources, 20 , 14-28. Gao, D., Yu, Q., Wang, G., Wang, G. & Xiong, F. 2016. Diagnosis of a malayan filariasis case using a shotgun diagnostic metagenomics assay. Parasites & Vectors, 9 , 1-5. Gardner, M. J., Hall, N., Fung, E., White, O., Berriman, M., Hyman, R. W., Carlton, J. M., Pain, A., Nelson, K. E. & Bowman, S. 2002. 
Genome sequence of the human malaria parasite Plasmodium falciparum. Nature, 419 , 498-511. Gihawi, A., Rallapalli, G., Hurst, R., Cooper, C. S., Leggett, R. M. & Brewer, D. S. 2019. SEPATH: benchmarking the search for pathogens in human tissue whole genome sequence data leads to template pipelines. Genome Biology, 20 , 1-15. Goldfeder, R. L., Wall, D. P., Khoury, M. J., Ioannidis, J. P. & Ashley, E. A. 2017. Human genome sequencing at the population scale: a primer on high-throughput DNA sequencing and analysis. American journal of epidemiology, 186 , 1000-1009. Gouba, N., Raoult, D. & Drancourt, M. 2013. Plant and fungal diversity in gut microbiota as revealed by molecular and culture investigations. PloS one, 8 , e59474. Grabherr, M. G., Russell, P., Meyer, M., Mauceli, E., Alföldi, J., Di Palma, F. & Lindblad-Toh, K. 2010. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics, 26 , 1145-1151. Guillou, L., Bachar, D., Audic, S., Bass, D., Berney, C., Bittner, L., Boutte, C., Burgaud, G., De Vargas, C. & Decelle, J. 2012. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy. Nucleic acids research, 41 , D597-D604. Handel, A. & Rohani, P. 2015. Crossing the scale from within-host infection dynamics to between-host transmission fitness: a discussion of current assumptions and knowledge. Philosophical Transactions of the Royal Society B: Biological Sciences, 370 , 20140302. Harl, J., Himmel, T., Ilgūnas, M., Valkiūnas, G. & Weissenböck, H. 2023. The 18S rRNA genes of Haemoproteus (Haemosporida, Apicomplexa) parasites from European songbirds with remarks on improved parasite diagnostics. Malaria Journal, 22 , 232. Harl, J., Himmel, T., Valkiūnas, G. & Weissenböck, H. 2019. The nuclear 18S ribosomal DNAs of avian haemosporidian parasites. Malaria Journal, 18 , 1-19. Harvell, C. D., Mitchell, C. E., Ward, J. R., Altizer, S., Dobson, A. P., Ostfeld, R. S. 
& Samuel, M. D. 2002. Climate warming and disease risks for terrestrial and marine biota. Science, 296 , 2158-2162. Hellard, E., Fouchet, D., Vavre, F. & Pontier, D. 2015. Parasite–parasite interactions in the wild: how to detect them? Trends in Parasitology, 31 , 640-652. Hestetun, J. T., Bye-Ingebrigtsen, E., Nilsson, R. H., Glover, A. G., Johansen, P.-O. & Dahlgren, T. G. 2020. Significant taxon sampling gaps in DNA databases limit the operational use of marine macrofauna metabarcoding. Marine Biodiversity, 50 , 70. Hillmann, B., Al-Ghalith, G. A., Shields-Cutler, R. R., Zhu, Q., Gohl, D. M., Beckman, K. B., Knight, R. & Knights, D. 2018. Evaluating the information content of shallow shotgun metagenomics. Msystems, 3 , 10.1128/msystems. 00069-18. Hleap, J. S., Littlefair, J. E., Steinke, D., Hebert, P. D. & Cristescu, M. E. 2021. Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Molecular Ecology Resources, 21 , 2190-2203. Hoarau, A. O. G., Mavingui, P. & Lebarbenchon, C. 2020. Coinfections in wildlife: Focus on a neglected aspect of infectious disease epidemiology. PLOS Pathogens, 16 , e1008790. Hotaling, S., Kelley, J. L. & Frandsen, P. B. 2021. Toward a genome sequence for every animal: Where are we now? Proceedings of the National Academy of Sciences, 118 , e2109019118. Howe, K. L., Bolt, B. J., Shafie, M., Kersey, P. & Berriman, M. 2017. WormBase ParaSite− a comprehensive resource for helminth genomics. Molecular and biochemical parasitology, 215 , 2-10. Huang, S., Farrell, M. & Stephens, P. R. 2021. Infectious disease macroecology: parasite diversity and dynamics across the globe. The Royal Society. Huang, X., Huang, D., Liang, Y., Zhang, L., Yang, G., Liu, B., Peng, Y., Deng, W. & Dong, L. 2020. A new protocol for absolute quantification of haemosporidian parasites in raptors and comparison with current assays. Parasites & Vectors, 13 , 1-9. Hudson, M. E. 2008. 
Sequencing breakthroughs for genomic ecology and evolutionary biology. Molecular ecology resources, 8 , 3-17. Hunt, V. L., Tsai, I. J., Coghlan, A., Reid, A. J., Holroyd, N., Foth, B. J., Tracey, A., Cotton, J. A., Stanley, E. J. & Beasley, H. 2016. The genomic basis of parasitism in the Strongyloides clade of nematodes. Nature genetics, 48 , 299-307. Ishigohoka, J., Bascón-Cardozo, K., Bours, A., Fuß, J., Rhie, A., Mountcastle, J., Haase, B., Chow, W., Collins, J., Howe, K., Uliano-Silva, M., Fedrigo, O., Jarvis, E. D., Pérez-Tris, J., Illera, J. C. & Liedvogel, M. 2023. Distinct patterns of genetic variation at low-recombining genomic regions represent haplotype structure. Evolution . Jeunen, G. J., Dowle, E., Edgecombe, J., Von Ammon, U., Gemmell, N. J. & Cross, H. 2023. CRABS—a software program to generate curated reference databases for metabarcoding sequencing data. Molecular Ecology Resources, 23 , 725-738. Kahlke, T. & Ralph, P. J. 2019. BASTA–Taxonomic classification of sequences and sequence bins using last common ancestor estimations. Methods in Ecology and Evolution, 10 , 100-103. Kahn, S. D. 2011. On the future of genomic data. science, 331 , 728-729. Keck, F., Couton, M. & Altermatt, F. 2023. Navigating the seven challenges of taxonomic reference databases in metabarcoding analyses. Molecular Ecology Resources, 23 , 742-755. Koepfli, K.-P., Paten, B., Scientists, G. K. C. O. & O’brien, S. J. 2015. The Genome 10K Project: a way forward. Annu. Rev. Anim. Biosci., 3 , 57-111. Kostic, A. D., Ojesina, A. I., Pedamallu, C. S., Jung, J., Verhaak, R. G., Getz, G. & Meyerson, M. 2011. PathSeq: software to identify or discover microbes by deep sequencing of human tissue. Nature biotechnology, 29 , 393-396. Kounosu, A., Murase, K., Yoshida, A., Maruyama, H. & Kikuchi, T. 2019. Improved 18S and 28S rDNA primer sets for NGS-based parasite detection. Scientific reports, 9 , 15789. Krehenwinkel, H., Wolf, M., Lim, J. Y., Rominger, A. J., Simison, W. B. 
& Gillespie, R. G. 2017. Estimating and mitigating amplification bias in qualitative and quantitative arthropod metabarcoding. Scientific Reports, 7 , 17668. Kuhn, M. 2011. The caret package. Vienna, Austria: R Found Stat Comput. https://cran.r-project.org/package=caret. La Reau, A. J., Strom, N. B., Filvaroff, E., Mavrommatis, K., Ward, T. L. & Knights, D. 2023. Shallow shotgun sequencing reduces technical variation in microbiome analysis. Scientific reports, 13 , 7668. Laurence, M., Hatzis, C. & Brash, D. E. 2014. Common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes. PloS one, 9 , e97876. Leray, M., Knowlton, N. & Machida, R. J. 2022. MIDORI2: A collection of quality controlled, preformatted, and regularly updated reference databases for taxonomic assignment of eukaryotic mitochondrial sequences. Environmental DNA, 4 , 894-907. Li, H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R. & Subgroup, G. P. D. P. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25 , 2078-2079. Ling, C. X., Huang, J. & Zhang, H. 2003. AUC: a better measure than accuracy in comparing learning algorithms. Advances in Artificial Intelligence: 16th Conference of the Canadian Society for Computational Studies of Intelligence, AI 2003, Halifax, Canada, June 11–13, 2003, Proceedings 16. Springer, 329-341. Luikart, G., Kardos, M., Hand, B. K., Rajora, O. P., Aitken, S. N. & Hohenlohe, P. A. 2019. Population genomics: advancing understanding of nature. Population genomics: Concepts, approaches and applications , 3-79. Mans, B. J., Pienaar, R. & Latif, A. A. 2015. A review of Theileria diagnostics and epidemiology. International Journal for Parasitology: Parasites and Wildlife, 4 , 104-118. Martin, M. 2011.
Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. Mes, T. H., Eysker, M. & Ploeger, H. W. 2007. A simple, robust and semi-automated parasite egg isolation protocol. Nature protocols, 2 , 486-489. Moon-Van Der Staay, S. Y., De Wachter, R. & Vaulot, D. 2001. Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity. Nature, 409 , 607-610. Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. 2006. A fast and symmetric DUST implementation to mask low-complexity DNA sequences. Journal of Computational Biology, 13 , 1028-1040. Nadler, S. A. & De León, G. P.-P. 2011. Integrating molecular and morphological approaches for characterizing parasite cryptic species: implications for parasitology. Parasitology, 138 , 1688-1709. Ng, P. C. & Kirkness, E. F. 2010. Whole genome sequencing. Genetic variation: Methods and protocols , 215-226. Nichols, R. V., Vollmers, C., Newsom, L. A., Wang, Y., Heintzman, P. D., Leighton, M., Green, R. E. & Shapiro, B. 2018. Minimizing polymerase biases in metabarcoding. Molecular ecology resources, 18 , 927-939. Nikbakht, H., Xia, X. & Hickey, D. A. 2014. The evolution of genomic GC content undergoes a rapid reversal within the genus Plasmodium. Genome, 57 , 507-511. Nilsson, R. H., Anslan, S., Bahram, M., Wurzbacher, C., Baldrian, P. & Tedersoo, L. 2019. Mycobiome diversity: high-throughput sequencing and identification of fungi. Nature Reviews Microbiology, 17 , 95-109. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. 2017. metaSPAdes: a new versatile metagenomic assembler. Genome research, 27 , 824-834. O’leary, N. A., Wright, M. W., Brister, J. R., Ciufo, S., Haddad, D., Mcveigh, R., Rajput, B., Robbertse, B., Smith-White, B. & Ako-Adjei, D. 2016. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic acids research, 44 , D733-D745. Okamura, B., Hartigan, A. & Naldoni, J. 2018. 
Extensive Uncharted Biodiversity: The Parasite Dimension. Integrative and Comparative Biology, 58 , 1132-1145. Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., Mcglinn, D., Minchin, P. R., O’hara, R. B., Simpson, G. L., Solymos, P., Stevens, M. H. H., Szoecs, E. & Wagner, H. 2019. vegan: Community Ecology Package. R package version 2.5-6. Omori, S., Sato, Y., Hirakawa, S., Isobe, T., Yukawa, M. & Murata, K. 2008. Two extra chromosomal genomes of Leucocytozoon caulleryi; complete nucleotide sequences of the mitochondrial genome and existence of the apicoplast genome. Parasitology Research, 103 , 953-957. Palinauskas, V., Križanauskienė, A., Iezhova, T. A., Bolshakov, C. V., Jönsson, J., Bensch, S. & Valkiūnas, G. 2013. A new method for isolation of purified genomic DNA from haemosporidian parasites inhabiting nucleated red blood cells. Experimental Parasitology, 133 , 275-280. Pallen, M. J. 2014. Diagnostic metagenomics: potential applications to bacterial, viral and parasitic infections. Parasitology, 141 , 1856-1862. Parks, D. H., Rinke, C., Chuvochina, M., Chaumeil, P.-A., Woodcroft, B. J., Evans, P. N., Hugenholtz, P. & Tyson, G. W. 2017. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature microbiology, 2 , 1533-1542. Peabody, M. A., Van Rossum, T., Lo, R. & Brinkman, F. S. 2015. Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities. BMC bioinformatics, 16 , 1-19. Pedersen, A. B. & Fenton, A. 2007. Emphasizing the ecology in parasite community ecology. Trends in ecology & evolution, 22 , 133-139. Penczykowski, R. M., Laine, A. L. & Koskella, B. 2016. Understanding the ecology and evolution of host–parasite interactions across scales. Evolutionary applications, 9 , 37-52. Petney, T. N. & Andrews, R. H. 1998. Multiparasite communities in animals and humans: frequency, structure and pathogenic significance.
International journal for parasitology, 28 , 377-393. Pickles, R. S., Thornton, D., Feldman, R., Marques, A. & Murray, D. L. 2013. Predicting shifts in parasite distribution with climate change: a multitrophic level approach. Global change biology, 19 , 2645-2654. Polz, M. F. & Cavanaugh, C. M. 1998. Bias in template-to-product ratios in multitemplate PCR. Applied and environmental Microbiology, 64 , 3724-3730. Posada-Cespedes, S., Seifert, D. & Beerenwinkel, N. 2017. Recent advances in inferring viral diversity from high-throughput sequencing data. Virus research, 239 , 17-32. Poulin, R. & Jorge, F. 2019. The geography of parasite discovery across taxa and over time. Parasitology, 146 , 168-175. Poulin, R. & Mouritsen, K. N. 2006. Climate change, parasitism and the structure of intertidal ecosystems. Journal of helminthology, 80 , 183-191. Pruitt, K. D., Tatusova, T. & Maglott, D. R. 2007. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic acids research, 35 , D61-D65. Rhie, A., Mccarthy, S. A., Fedrigo, O., Damas, J., Formenti, G., Koren, S., Uliano-Silva, M., Chow, W., Fungtammasan, A. & Kim, J. 2021. Towards complete and error-free genome assemblies of all vertebrate species. Nature, 592 , 737-746. Sambrook, J. & Russell, D. W. 2006. Purification of nucleic acids by extraction with phenol: chloroform. Cold Spring Harbor Protocols, 2006 , pdb. prot4455. Sayers, E. W., Bolton, E. E., Brister, J. R., Canese, K., Chan, J., Comeau, D. C., Connor, R., Funk, K., Kelly, C., Kim, S., Madej, T., Marchler-Bauer, A., Lanczycki, C., Lathrop, S., Lu, Z., Thibaud-Nissen, F., Murphy, T., Phan, L., Skripchenko, Y., Tse, T., Wang, J., Williams, R., Trawick, B. W., Pruitt, K. D. & Sherry, S. T. 2022. Database resources of the national center for biotechnology information. Nucleic Acids Res, 50 , D20-d26. Seutin, G., White, B. N. & Boag, P. T. 1991. 
Preservation of avian blood and tissue samples for DNA analyses. Canadian Journal of Zoology, 69 , 82-90. Shen, W. & Ren, H. 2021. TaxonKit: A practical and efficient NCBI taxonomy toolkit. Journal of genetics and genomics, 48 , 844-850. Sigeman, H., Ponnikas, S. & Hansson, B. 2020. Whole-genome analysis across 10 songbird families within Sylvioidea reveals a novel autosome–sex chromosome fusion. Biology Letters, 16 , 20200082. Silamiķelism, I. 2020. Conifer. Spisak, S., Solymosi, N., Ittzes, P., Bodor, A., Kondor, D., Vattay, G., Bartak, B. K., Sipos, F., Galamb, O. & Tulassay, Z. 2013. Complete genes may pass from food to human blood. PLoS One, 8 , e69805. Stewart Merrill, T. E., Calhoun, D. M. & Johnson, P. T. 2022. Beyond single host, single parasite interactions: Quantifying competence for complete multi‐host, multi‐parasite communities. Functional Ecology, 36 , 1845-1857. Stoeck, T., Bass, D., Nebel, M., Christen, R., Jones, M. D. M., Breiner, H.-W. & Richards, T. A. 2010. Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Molecular Ecology, 19 , 21-31. Tan, C. C. S., Ko, K. K. K., Chen, H., Liu, J., Loh, M., Chia, M., Nagarajan, N. & Consortium, S. G. K. H. 2023. No evidence for a common blood microbiome based on a population study of 9,770 healthy humans. Nature Microbiology, 8 , 973-985. Tessler, M., Neumann, J. S., Afshinnekoo, E., Pineda, M., Hersch, R., Velho, L. F. M., Segovia, B. T., Lansac-Toha, F. A., Lemke, M. & Desalle, R. 2017. Large-scale differences in microbial biodiversity discovery between 16S amplicon and shotgun sequencing. Scientific reports, 7 , 6589. Thines, M. & Voglmayr, H. 2009. An introduction to the white blister rusts (Albuginales). Oomycete Genetics and Genomics: Diversity, Interactions, and Research Tools. Weinheim: Wiley-VCH , 77-92. Usyk, M., Peters, B. A., Karthikeyan, S., Mcdonald, D., Sollecito, C. C., Vazquez-Baeza, Y., Shaffer, J. P., Gellman, M. 
D., Talavera, G. A. & Daviglus, M. L. 2023. Comprehensive evaluation of shotgun metagenomics, amplicon sequencing, and harmonization of these platforms for epidemiological studies. Cell reports methods, 3. Valkiunas, G. 2004. Avian malaria parasites and other haemosporidia , CRC press. Valkiūnas, G., Palinauskas, V., Ilgūnas, M., Bukauskaitė, D., Dimitrov, D., Bernotienė, R., Zehtindjiev, P., Ilieva, M. & Iezhova, T. A. 2014. Molecular characterization of five widespread avian haemosporidian parasites (Haemosporida), with perspectives on the PCR-based detection of haemosporidians in wildlife. Parasitology research, 113 , 2251-2263. Vallée, G. C., Muñoz, D. S. & Sankoff, D. 2016. Economic importance, taxonomic representation and scientific priority as drivers of genome sequencing projects. BMC genomics, 17 , 125-133. Van De Peer, Y., Jansen, J., De Rijk, P. & De Wachter, R. 1997. Database on the structure of small ribosomal subunit RNA. Nucleic acids research, 25 , 111-116. Vaulot, D., Geisen, S., Mahé, F. & Bass, D. 2022. pr2‐primers: An 18S rRNA primer database for protists. Molecular Ecology Resources, 22 , 168-179. Videvall, E. 2019. Genomic advances in avian malaria research. Trends in Parasitology, 35 , 254-266. Vijayvargiya, P., Jeraldo, P. R., Thoendel, M. J., Greenwood-Quaintance, K. E., Esquer Garrigos, Z., Sohail, M. R., Chia, N., Pritt, B. S. & Patel, R. 2019. Application of metagenomic shotgun sequencing to detect vector-borne pathogens in clinical blood samples. PLoS One, 14 , e0222915. Walker, M. A., Pedamallu, C. S., Ojesina, A. I., Bullman, S., Sharpe, T., Whelan, C. W. & Meyerson, M. 2018. GATK PathSeq: a customizable computational tool for the discovery and identification of microbial sequences in libraries from eukaryotic hosts. Bioinformatics, 34 , 4287-4289. Warrenfeltz, S., Basenko, E. Y., Crouch, K., Harb, O. S., Kissinger, J. C., Roos, D. S., Shanmugasundram, A. & Silva-Franco, F. 2018. 
EuPathDB: the eukaryotic pathogen genomics database resource. Eukaryotic genomic databases: methods and protocols , 69-113. Weber, C. C., Paulini, M., Wellcome Sanger Institute Tree of Life Management, S., Team, L., Team, W. S. I. T. O. L. C. I. & Blaxter, M. L. 2024. Kudoa genomes from contaminated hosts reveal extensive gene order conservation and rapid sequence evolution. bioRxiv , 2024.11.01.621499. Wood, D. 2020. kraken2 [Online]. GitHub. [Accessed 21 Nov 2024]. Wood, D. E., Lu, J. & Langmead, B. 2019. Improved metagenomic analysis with Kraken 2. Genome biology, 20 , 1-13. Wuyts, J., De Rijk, P., Van De Peer, Y., Pison, G., Rousseeuw, P. & De Wachter, R. 2000. Comparative analysis of more than 3000 sequences reveals the existence of two pseudoknots in area V4 of eukaryotic small subunit ribosomal RNA. Nucleic Acids Research, 28 , 4698-4708. Wylezich, C., Belka, A., Hanke, D., Beer, M., Blome, S. & Höper, D. 2019. Metagenomics for broad and improved parasite detection: a proof-of-concept study using swine faecal samples. International Journal for Parasitology, 49 , 769-777. Wylezich, C., Caccio, S. M., Walochnik, J., Beer, M. & Höper, D. 2020. Untargeted metagenomics shows a reliable performance for synchronous detection of parasites. Parasitology research, 119 , 2623-2629. Yang, Y., Xiong, J., Zhou, Z., Huo, F., Miao, W., Ran, C., Liu, Y., Zhang, J., Feng, J. & Wang, M. 2014. The genome of the myxosporean Thelohanellus kitauei shows adaptations to nutrient acquisition within its fish host. Genome Biology and Evolution, 6 , 3182-3198. Zehtindjiev, P., Križanauskienė, A., Bensch, S., Palinauskas, V., Asghar, M., Dimitrov, D., Scebba, S. & Valkiūnas, G. 2012. A new morphologically distinct avian malaria parasite that fails detection by established polymerase chain reaction–based protocols for amplification of the cytochrome b gene. Journal of Parasitology, 98 , 657-665. Zhai, T., Ren, W., Ji, X., Wang, Y., Chen, H., Jin, Y., Liang, Q., Zhang, N. & Huang, J.
2024. Distinct compositions and functions of circulating microbial DNA in the peripheral blood compared to fecal microbial DNA in healthy individuals. Msystems, 9 , e00008-24. Bensch, S., Inumaru, M., Sato, Y., Lee Cruz, L., Cunningham, A. A., Goodman, S. J., Levin, I. I., Parker, P. G., Casanueva, P. & Hernández, M. A. 2021. Contaminations contaminate common databases. Molecular Ecology Resources, 21 , 355-362. Clark, N. J., Adlard, R. D. & Clegg, S. M. 2014. First evidence of avian malaria in Capricorn Silvereyes (Zosterops lateralis chlorocephalus) on Heron Island. Sunbird: Journal of the Queensland Ornithological Society, The, 44 , 1-11. Clark, N. J., Wells, K., Dimitrov, D. & Clegg, S. M. 2016. Co‐infections and environmental conditions drive the distributions of blood parasites in wild birds. Journal of Animal Ecology, 85 , 1461-1470. Godfrey Jr, R. D., Fedynich, A. M. & Pence, D. B. 1987. Quantification of hematozoa in blood smears. Journal of wildlife diseases, 23 , 558-565. Gudex-Cross, D., Barraclough, R. K., Brunton, D. H. & Derraik, J. G. 2015. Mosquito communities and avian malaria prevalence in silvereyes (Zosterops lateralis) within forest edge and interior habitats in a New Zealand regional park. EcoHealth, 12 , 432-440. Hartenstein, V. 2006. Blood cells and blood cell development in the animal kingdom. Annu. Rev. Cell Dev. Biol., 22 , 677-712. Laird, M. 1959. Atoxoplasma paddae (Aragão) from several South Pacific silvereyes (Zosteropidae) and a New Zealand rail. The Journal of Parasitology , 47-52. Olsson-Pons, S., Clark, N. J., Ishtiaq, F. & Clegg, S. M. 2015. Differences in host species relationships and biogeographic influences produce contrasting patterns of prevalence, community composition and genetic structure in two genera of avian malaria parasites in southern Melanesia. J Anim Ecol, 84 , 985-98. Parsa, F., Bayley, S., Bell, F., Dodd, S., Morris, R., Roberts, J., Wawman, D., Clegg, S. R. & Dunn, J. C. 2023. Epidemiology of protozoan and helminthic parasites in wild passerine birds of Britain and Ireland. Parasitology, 150 , 297-310. Yang, R., Brice, B., Jian, F. & Ryan, U. 2018. Morphological and molecular characterisation of Isospora butcherae n. sp. in a silvereye (Zosterops lateralis) (Latham, 1801). Parasitology Research, 117 , 1381-1388.
Supplementary Material: File (box 1.docx), 16.50 KB; File (table 1.docx), 16.90 KB; File (table 2.docx), 15.12 KB.
Version history: Version 1, 07 April 2025. Published in International Journal for Parasitology; Version of Record 1 Jan 2026.
Copyright: This work is licensed under a Non Exclusive No Reuse License.
Keywords: bioinformatics/phyloinformatics, disease ecology, metabarcoding, metagenomics, parasite diversity, whole genome sequencing.
Authors and affiliations: Sarah Nichols (ORCID 0000-0001-5053-3858), University of Oxford; Andrea Estandía, University of Oxford; Catherine Young, Australian National University, Fenner School of Environment and Society; Lucy Knowles, NERC Environmental Omics Facility, University of Sheffield; Vaidas Palinauskas, Nature Research Centre, Institute of Ecology; Beth Okamura (ORCID 0000-0001-7279-715X), Natural History Museum Library and Archives; Sonya Clegg, Oxford University.
Citation: Sarah Nichols, Andrea Estandía, Catherine Young, et al. Host Whole Genome Sequence data represent an untapped resource for characterising affiliated parasite diversity. Authorea. 07 April 2025. DOI: https://doi.org/10.22541/au.174405286.65008416/v1
