A comprehensive DNA barcode reference library for the macroinvertebrates of Scottish seagrass beds using Oxford Nanopore Flongle Flowcells | Authorea
This is a preprint and has not been peer reviewed. Data may be preliminary.
25 February 2025, V1 (latest version)
Authors: Ethan Ross (0009-0008-9694-4757), Stuart Piertney (0000-0001-6654-0569), Julia Sigwart, Nathaniel Crook, Agathe Moreau, and Kara Layton (0000-0002-4302-3048)
https://doi.org/10.22541/au.174049397.70710513/v1
Published: Ecology and Evolution (version of record)
Ross EG 1*, Piertney SB 1, Sigwart JD 3, Crook NF 1, Moreau A 1, Layton KKS 1,2
1 School of Biological Sciences, University of Aberdeen, UK
2 Department of Biology, University of Toronto Mississauga, Canada
3 Senckenberg Research Institute and Museum, Marine Zoology Department, Frankfurt, Germany
*corresponding author
Corresponding author email: [email protected]
Co-author emails: [email protected] , [email protected] , [email protected] , [email protected] , [email protected]
Keywords: DNA Barcoding; Biodiversity studies; Oxford Nanopore Sequencing; Marine Invertebrates; Seagrass habitats
Data Availability: All sequence data are available on GenBank under accession numbers XXXX-XXXX and on BOLD Systems under project code ERSSI (to be made available upon acceptance of the manuscript). Whole animals and DNA extractions are stored at the University of Aberdeen.
Funding: Funding for this project comes from the New Frontiers in Research Fund - Transformation 2020 (NFRFT-2020-00073) and the University of Aberdeen. Ross EG is supported by the QUADRAT Doctoral Training Program (DTP).
Conflict of Interest: The authors declare no conflict of interest.
Ethical Approval: Ethical permissions do not apply as all animals in this study are non-cephalopod invertebrates as per the UK Animals (Scientific Procedures) Act 1986 Amendment Regulations.
Abstract
DNA barcoding using Sanger sequencing is a popular technique for identifying species on a per-specimen basis. However, for larger projects, sequencing individual voucher specimens can be time- and resource-intensive and, moreover, is associated with high levels of sequencing failure and contamination. Oxford Nanopore Sequencing Technology (ONT) has emerged as a scalable alternative, capable of generating hundreds of DNA barcodes simultaneously using the portable, benchtop MinION sequencing device. In this study we aim to compare the sequencing outcomes of Oxford Nanopore R10 Flongle flowcells versus Sanger sequencing for DNA barcoding and to produce a DNA barcode reference library. We demonstrate that DNA barcodes generated using ONT outperform those produced by Sanger sequencing in terms of recovery and sequence quality, with lower rates of contamination. We then produced DNA barcodes for 146 seagrass-associated marine invertebrate OTUs collected from four seagrass beds in Scotland, targeting the COI and 18S V4 regions. Using both markers, we show that the number of recovered OTUs was higher than if each marker was used in isolation, and we make use of degenerate and group-specific primer pairs to improve recovery. Furthermore, we demonstrate how mapping ONT reads to pre-existing DNA barcodes can be used to reduce ambiguous basecalls and improve recovery of sequences from contaminated specimens. Overall, this study informs prospective users intending to carry out multimarker DNA barcode projects using Oxford Nanopore Sequencing. Furthermore, we generated the first DNA barcode reference library for seagrass beds in Scotland to support future biomonitoring of these priority habitats.
1 | Introduction
DNA barcoding is a method for taxonomically identifying organisms using short, diagnostic sections of their genome (Hebert et al., 2003). It was proposed as an alternative to morphological identification, with advantages including improved objective or quantitative delimitation, a reduction in processing time for large numbers of specimens, and the ability to identify organisms regardless of body size, condition or life stage.
As a result, DNA barcoding and higher-throughput (meta)barcoding have become popular and routine tools for species identification (DeSalle and Goldstein, 2019; Ruppert et al., 2019). However, DNA barcoding can only work when pre-existing reference barcodes exist for the organisms in question. Reference barcodes are DNA barcodes generated from organisms identified through morphological investigation, which link the DNA sequences to the taxonomy. Ideally, the organism should be identified to species level and sourced from the type locality (Letsch et al. forthcoming). Additionally, each species should have multiple DNA barcodes generated from individuals across its geographic range so that intraspecific phylogeographic variation can be assessed in relation to interspecific differences with sister taxa. Specimens from which the reference barcodes are generated should also be photographed live and kept preserved as voucher specimens in a museum collection where possible. These data then serve as a reference against which all future DNA barcodes of that species can be compared. Collections of reference barcodes are referred to as reference libraries. Despite the major advances over the past two decades, many species still do not have reference barcodes and, of those which do, most have only a small number, collected from specimens covering only part of their geographic range (Weigand et al., 2019). Additionally, without comprehensive reference libraries, projects aiming to characterise communities will continue to underestimate diversity and produce inaccurate species inventories (Blackman et al., 2024). Therefore, there is an increasing need to populate DNA barcode reference libraries. One obstacle to this is the cost, ease and success rate of contemporary DNA barcode sequencing at scale. Current barcoding approaches typically use Sanger sequencing on a per-specimen basis (Crossley et al., 2020).
For larger projects, this requires hundreds of sequencing reactions, with each barcode ideally sequenced bidirectionally followed by assembly. This often leads to the subjective resolution of bases. Additionally, Sanger sequencing often struggles with coamplification of non-target DNA from contaminants, diet remnants or symbionts, which can result in an amplicon sequence that does not belong to the target organism and can cause spurious or low-confidence basecalls. In recent years, Oxford Nanopore Technology (ONT) has emerged as an alternative to Sanger sequencing for DNA barcoding (Cuber et al., 2023; Srivathsan et al., 2021). In this process, amplicons for each specimen are tagged during PCR using a unique combination of indexed primers. This enables hundreds of specimens to be multiplexed and sequenced simultaneously. During the sequencing reaction, amplicons are passed through nanopores embedded in a charged membrane (Wang et al., 2021). Base calling occurs as each nucleotide passes through the nanopore and produces a characteristic charge change. Notably, this means sequencing occurs on a per-molecule basis, removing the need for assembly since the full amplicon length can be sequenced at once. Post-sequencing, reads are allocated back to the specimen they were generated from using the index sequences, and a DNA barcode for that specimen is produced by consensus using ONTBarcoder2 (Srivathsan et al., 2021). Using R10 MinION flowcells, upwards of 4 million COI amplicons can be generated per sequencing run; however, as shown by Cuber et al. (2023), the use of the lower-output yet cheaper Flongle flowcells further reduces the cost per specimen, albeit with reduced sequence recovery. With this in mind, we apply this pipeline to reference library creation for seagrass-associated invertebrates in Scotland.
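The allocation of reads back to specimens by their index sequences can be illustrated with a toy example. This is a minimal sketch, not the algorithm used by ONTBarcoder2: it assumes exact tag matches at both read ends, whereas real demultiplexers tolerate sequencing errors. The function names, specimen names and tag sequences are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def demultiplex(reads, tag_pairs):
    """Assign reads to specimens by exact forward/reverse tag match.

    tag_pairs maps a specimen ID to its (forward_tag, reverse_tag).
    Reads are stored in forward orientation; unmatched reads are dropped.
    """
    bins = defaultdict(list)
    for read in reads:
        # Nanopore reads can come from either strand, so try both.
        for candidate in (read, revcomp(read)):
            hit = next(
                (
                    specimen
                    for specimen, (fwd, rev) in tag_pairs.items()
                    if candidate.startswith(fwd)
                    and candidate.endswith(revcomp(rev))
                ),
                None,
            )
            if hit is not None:
                bins[hit].append(candidate)
                break
    return dict(bins)


# Hypothetical tags and reads, for illustration only.
tags = {"spec_A": ("ACGTAC", "CATTGA"), "spec_B": ("GGATCC", "CATTGA")}
read_a = "ACGTAC" + "ATGGCC" + revcomp("CATTGA")
read_b = "GGATCC" + "TTTAAACC" + revcomp("CATTGA")
binned = demultiplex([read_a, revcomp(read_b), "NNNN"], tags)
```

Because each specimen carries a unique tag combination, two specimens can share one tag (here the reverse tag) and still be separated.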
Zostera marina (Linnaeus, 1753) beds provide a range of important ecosystem services (Lilley and Unsworth, 2014; Potouroglou et al., 2021) but have been badly degraded in recent decades (Green et al., 2021). As a result, there is increased interest in applying routine monitoring methods to these habitats, one of which is barcoding-based surveys targeting the macrobenthos (Cowart et al., 2015). The macrobenthos plays an important role in stabilising seagrass meadow ecosystem functions (Duffy et al. 2017) and has historically been used to produce habitat quality indices ( ref ). However, marine invertebrates are among the groups with the lowest barcode coverage (Radulovici et al., 2021; Weigand et al., 2019).
The aims of this study were to compare Sanger sequencing with ONT R10 Flongle sequencing for recovering DNA barcodes and to generate a multimarker DNA barcode reference library for seagrass-associated marine invertebrates. First, we compare sequence recovery, sequence quality, the rate of contamination and the sequence similarity between ONT and Sanger sequences generated from the same specimens. Following this, we use ONT R10 Flongle sequencing to barcode a collection of 316 marine invertebrate specimens targeting COI and 18S. We assess the impact of input amplicon quantity (ng) on read recovery and investigate how this metric affects the quality of downstream consensus sequences. Finally, we examine the impact of a read mapping step aimed at maximising DNA barcode recovery and minimising ambiguous basecalls in cases of contamination.
Glossary
OTU – a molecular and/or morphologically distinct specimen or group of similar specimens (a likely species-level unit).
Specimen – an individual organism. There may be multiple specimens per OTU.
Amplicon – a DNA sequence produced by PCR. There may be multiple amplicons from the same specimen generated using different PCR primers.
Read – a single, continuous amplicon sequence generated by the Flongle Flowcell. Multiple reads are generated for each amplicon.
Consensus sequence – a DNA sequence generated by aligning multiple reads with ONTBarcoder2. The consensus sequence may belong to the target OTU or be from contamination.
Target OTU – an OTU for which users aim to recover a DNA barcode that is consistent with the morphology of the specimen.
Contamination – a sequence from a non-target OTU.
DNA Barcode – a consensus sequence which belongs to the target OTU and can be used to correctly identify a specimen by reverse taxonomy.
Reference library – a collection of DNA barcodes used to identify multiple OTUs. A reference library may be curated to identify specific OTUs (e.g. only fishes or invertebrates), or OTUs from a specific habitat / geography (e.g. seagrass beds in Scotland).
2 | Materials and Methods
2.1 | Sample collection and preliminary identification
Marine macroinvertebrates were collected from four sites in Scotland between December 2021 and August 2023 (Figure 1). Epifauna were collected by passing a 500 µm mesh kick net over seagrass leaves in a sweeping motion while wading, or by swimming ~3 metres in a straight line over the seagrass with a 1 mm mesh, square hand-held net. Infauna were collected using a 10 cm diameter, 20 cm long PVC pipe bored manually into the seabed, and the samples were sieved with a 500 µm stainless steel sieve. Next, specimens were placed in a Petri dish of seawater and photographed live using the macro settings on a Nikon COOLPIX W300 camera or a built-in camera on a stereomicroscope (Zeiss Stemi 305). Following this, specimens were euthanised by placing them in a Petri dish of isotonic MgCl2 (80 g/L) and preserved in absolute ethanol. Additional photos of ethanol-preserved specimens were also collected using the stereomicroscope. Up to 18 specimens of each putative species per collection site were selected for processing, with a greater number of individuals selected in the presence of considerable morphological variation (e.g. rissoid gastropods).
Morphological identification was then carried out by comparing each specimen to a relevant dichotomous key (Hayward and Ryland, 2017; Lincoln, 1979). Identifications were made using live specimens in the field whenever possible. Otherwise, IDs were based on ethanol-preserved specimens and their accompanying photos.
2.2 | DNA extraction
DNA extractions were carried out using the E.Z.N.A Tissue DNA Kit, following the manufacturer's protocol (D3396-01, Omega Bio-Tek). For each specimen, up to 30 mg of soft tissue was collected using sterilised forceps and a scalpel, briefly blotted with paper towel to remove excess ethanol, and placed in an Eppendorf tube with 200 µl of TL buffer. Care was taken to avoid taking tissue from around the gut, where dietary contaminants might be present, or from tissues that could house symbionts (e.g. anemone tentacles, bivalve gills). For the dominant taxonomic groups, tissue was collected as follows: molluscs – mantle, echinoderms – tube feet, polychaetes – parapodia and crustaceans – chela or pereopods. Between specimens, forceps and scalpels were sterilised by soaking them in 1% bleach solution and then washing them in double-distilled water. Following this, 25 μL of Proteinase K Solution was added to each tube, which was then vortexed. All tubes were then placed in an incubator at 55°C overnight. Extractions were carried out in batches of 12 to 36 specimens at a time, and a negative extraction control of 25 μL of Proteinase K Solution and 200 μL of TL buffer with no added tissue was included in each batch. The remaining extraction steps followed the manufacturer's protocol. Rissoid snails extracted with the E.Z.N.A Tissue DNA Kit repeatedly failed PCR. As a result, all rissoids were instead extracted using the E.Z.N.A Mollusc DNA Kit following the manufacturer's protocol (D3373-00S, Omega Bio-Tek), which was effective at removing PCR-inhibitory mucopolysaccharides.
After extraction, DNA was quantified (ng/µl) and the ratio of absorbance at 260 nm and 280 nm was used to assess DNA quality using a NanoDrop spectrophotometer.
2.3 | PCR
2.3.1 | PCR for Sanger Sequencing
DNA from 68 specimens representing 44 morphospecies was selected for Sanger sequencing of the COI region. All PCR reactions and extraction controls were carried out using the Folmer primers (Folmer et al., 1994), except for echinoderms (n=2), for which the modified forward primer LCOech1aF1 (Layton et al., 2016) was used instead of LCO1490 (Folmer et al., 1994) (Table S2). The PCRs were carried out in 25 µl reactions using the Taq PCR Kit (#E5000S, New England Biolabs) with the following composition: 10x Standard Taq Reaction Buffer 2.5 µl, 10 mM dNTPs 0.5 µl, 10 µM forward primer 0.5 µl, 10 µM reverse primer 0.5 µl, Taq DNA Polymerase 0.125 µl, nuclease-free water 18.875 µl and undiluted template DNA 2 µl. Amplification was performed using the following PCR program: 120 seconds at 95°C, followed by 35 cycles of denaturation for 40 seconds at 94°C, primer annealing for 40 seconds at 51°C and extension for 60 seconds at 72°C, followed by a final elongation step for 180 seconds at 72°C. If the above PCR program failed twice, the following modified program was used instead: 120 seconds at 95°C, followed by 5 cycles of 40 seconds at 94°C, 40 seconds at 45°C and 60 seconds at 72°C, followed by 30 cycles of denaturation for 40 seconds at 94°C, primer annealing for 40 seconds at 51°C and extension for 60 seconds at 72°C, followed by a final elongation step for 180 seconds at 72°C.
2.3.2 | PCR for ONT Sequencing
A total of 316 specimens were selected for COI amplification and ONT sequencing using primer indexes from Cuber et al. (2023). This included the 68 specimens sequenced using Sanger, enabling a direct comparison between methods. Additionally, 225 specimens were selected for 18S amplification using tagged primers (Table S3).
Fewer specimens were selected for 18S amplification since 18S barcodes are known to provide poor taxonomic resolution at the species level, and we therefore expected identical sequences to be produced from multiple specimens of the same species (Tang et al., 2012; Wu et al., 2015). If it was clear that two or more specimens had the same putative morphological ID, only one of them was selected for 18S sequencing. All tagged PCRs used the same reaction composition as PCRs for Sanger sequencing. Amplification of the COI region was performed using four combinations of tagged primer pairs (Table S3). Initially, all PCR attempts were made with the tagged Folmer primers (Folmer et al., 1994) with the following PCR program: 120 seconds at 95°C, followed by 3 amplification cycles of denaturation for 40 seconds at 94°C, primer annealing for 40 seconds at 45°C and extension for 60 seconds at 72°C, followed by 30 cycles of denaturation for 40 seconds at 94°C, primer annealing for 40 seconds at 55°C and extension for 60 seconds at 72°C, with a final elongation step at 72°C for 5 minutes. For specimens which failed to amplify twice using the tagged Folmer primers, reamplification was attempted using different primer combinations. Thirty-nine specimens used tagged jgLCO1490/jgHCO2198 (Geller et al., 2013), 9 polychaete specimens used tagged polyLCO/polyHCO (Carr et al., 2011) and 10 echinoderm specimens used tagged LCOech1aF1/HCO2198 (Layton et al., 2016). For 21 specimens which still failed to amplify with jgLCO1490/jgHCO2198 (Geller et al., 2013), PCR was repeated with a lower annealing temperature: 120 seconds at 95°C, followed by 3 amplification cycles of denaturation for 40 seconds at 94°C, primer annealing for 40 seconds at 45°C and extension for 60 seconds at 72°C, followed by 30 cycles of denaturation for 40 seconds at 94°C, primer annealing for 40 seconds at 51°C and extension for 60 seconds at 72°C, with a final elongation step at 72°C for 5 minutes.
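As a quick check of the 25 µl reaction composition given in Section 2.3.1 (and reused for the tagged PCRs), the per-reaction volumes can be tabulated and scaled to a batch. This is a minimal sketch: the function name and the 10% pipetting overage are assumptions, not part of the protocol. In tagged PCRs each well receives its own primer pair, so in practice the primers would be added per well, not to a shared mix.

```python
# Per-reaction volumes in microlitres, as listed in Section 2.3.1.
PER_REACTION_UL = {
    "10x Standard Taq Reaction Buffer": 2.5,
    "10 mM dNTPs": 0.5,
    "10 uM forward primer": 0.5,
    "10 uM reverse primer": 0.5,
    "Taq DNA Polymerase": 0.125,
    "nuclease-free water": 18.875,
}
TEMPLATE_UL = 2.0  # undiluted template DNA, added per tube


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Scale the non-template components to n reactions.

    overage is a hypothetical allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 3) for name, vol in PER_REACTION_UL.items()}


total_per_reaction = sum(PER_REACTION_UL.values()) + TEMPLATE_UL
mix_for_36 = master_mix(36)
```

The listed components sum to the stated 25 µl reaction volume, of which 23 µl is master mix and 2 µl is template.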
All amplification of the 18S region was performed with Uni18S/Uni18R (Zhan et al., 2013) with the following PCR program: 5 minutes at 95°C, followed by 25 amplification cycles of denaturation for 30 seconds at 95°C, primer annealing for 30 seconds at 50°C and extension for 90 seconds at 72°C, followed by a final elongation step at 72°C for 10 minutes. PCRs were repeated once if they failed at the first attempt. For all PCRs, DNA from Steromphala cineraria (Linnaeus, 1758) was used as a positive PCR control, and nuclease-free water in place of template DNA was used as a negative PCR control. All PCR products were visualised on a 2% agarose gel with SYBR Safe DNA stain and a 1 kb DNA ladder (NEB Quick-Load Purple 1 kb DNA Ladder), run for 50 minutes at 6 V/cm.
2.4 | DNA sequencing
2.4.1 | Sanger Sequencing
PCR products for Sanger sequencing were cleaned using the QIAquick PCR Purification Kit (Cat. No. 28104, Qiagen) following the manufacturer's protocol, and amplicons were eluted with 40 µL of elution buffer. Cleaned PCR products were then quantified using a NanoDrop spectrophotometer and diluted to a concentration of 5 ng/µl using nuclease-free water as per sequencing recommendations. Cleaned PCR products for each specimen were sent for Sanger sequencing (ABI 3730xl platform with BigDye v3.1, Eurofins). For 61 specimens both the forward and reverse strands were sequenced, and for seven specimens only the forward strand was sequenced. Forward and reverse primers were sent separately at a concentration of 10 pmol/µl in a volume of 150 µl. Post-sequencing, electropherogram files for each sequence were imported into Geneious Prime (version 2024.0.7). All sequences were trimmed using the trim ends tool with default settings, which removes bases from the ends of the sequences with an error probability greater than 5%.
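For readers more used to quality scores than error probabilities, the 5% trimming threshold can be converted with the standard Phred relationship Q = -10 log10(P). The sketch below only illustrates that conversion; it does not reproduce how Geneious Prime applies the threshold, and the function names are illustrative.

```python
import math


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10.0 ** (-q / 10.0)


trim_threshold_q = error_prob_to_phred(0.05)  # about Q13
```

An error probability of 5% therefore corresponds to a quality score of roughly Q13.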
The reverse strands were then converted to their reverse complement sequence and aligned with the forward strands using the pairwise sequence alignment tool with MAFFT and default settings. Once aligned, each sequence was manually trimmed to remove any remaining poor-quality base calls and primer sequences to produce a contig. Seven specimens were reamplified and resequenced after the first attempt failed.
2.4.2 | ONT Flongle Sequencing and Consensus Sequence Generation
In total, six R10 Flongle Flowcells were used for sequencing (FLO-FLG114, Oxford Nanopore Technologies). To create each sequencing library, amplicons were first pooled by adding 3 µl of each PCR product to a 1.5 ml Eppendorf tube. Next, the combined concentration of the pooled amplicon mixture was measured using a Qubit Fluorometer High Sensitivity Assay kit (Q32851, Thermo Fisher Scientific). A 200 fmol subsample of the pooled amplicon mixture was taken for library preparation. Library preparation was then carried out following the manufacturer's protocol for ligation sequencing of amplicons using the V14 Ligation Sequencing Kit (SQK-LSK114, Oxford Nanopore Technologies) and the NEBNext® Companion Module (E7180S, New England Biolabs). DNA was eluted in 7 µl of elution buffer and its concentration was measured using a Qubit Fluorometer High Sensitivity Assay kit. Next, 20 fmol of DNA was loaded into a Flongle Flow Cell (R10.4.1) and sequenced over a 24-hour period. Raw sequencing reads were basecalled using Guppy (version 23.07.15) high-accuracy basecalling with the configuration file “dna_r10.4.1_e8.2_400bps_5khz_hac.cfg”, a minimum average read Q score of 10 and adapter trimming on. The mean Q score for passed reads was then obtained using FastQC (version 0.11.9) (Andrews, 2010). Passed reads were then demultiplexed and converted into consensus sequences using ONTBarcoder2 mode 1 (Srivathsan et al. 2021).
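Because the library inputs above are given in fmol while a Qubit reports ng/µl, a mass-to-mole conversion is needed at the bench. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The 700 bp amplicon length and the 25 ng/µl pool concentration are hypothetical stand-ins, since the tagged amplicon length and pool concentrations are not stated in this section.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one dsDNA base pair


def fmol_to_ng(fmol: float, length_bp: int) -> float:
    """Mass in ng of `fmol` femtomoles of dsDNA of the given length."""
    # fmol * g/mol = 1e-15 g = 1e-6 ng
    return fmol * length_bp * AVG_BP_MASS_G_PER_MOL * 1e-6


def volume_for_fmol(fmol: float, length_bp: int, conc_ng_per_ul: float) -> float:
    """Volume in µl of a stock needed to deliver `fmol` of amplicon."""
    return fmol_to_ng(fmol, length_bp) / conc_ng_per_ul


# Hypothetical example: 700 bp tagged amplicons, pool measured at 25 ng/µl.
ng_needed = fmol_to_ng(200, 700)
ul_needed = volume_for_fmol(200, 700, 25.0)
```

Under these assumptions, 200 fmol of a 700 bp amplicon is about 92 ng, so a tenfold smaller loading amount of 20 fmol is about 9 ng.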
The ONTBarcoder2 run was carried out with default settings for COI and modified settings for 18S (Table S4). ONTBarcoder2 produced consensus sequences by length and by similarity. One of these was selected per specimen from the output table. Barcodes created by similarity were preferentially selected over those generated by length because the similarity threshold imposed to produce consensus sequences by similarity (default 90%) was deemed useful for removing contamination reads from the consensus sequence generation process. Barcodes generated by length were only selected if there was no associated barcode generated by similarity. If a specimen had no consensus sequence, it was considered not recovered. Barcodes produced by consensus by barcode comparison were ignored in favour of a mapping process discussed below. Furthermore, ONTBarcoder2 flagged each sequence as QC compliant if it was translatable, matched the expected amplicon length and was free from ambiguous bases. This was recorded for COI only since 18S is not a protein-coding gene. The percentage of ambiguous bases for each sequence was recorded to ensure compliance with the Barcode of Life Database upload requirements, which specify fewer than 1% ambiguous bases across the total sequence length (Milton et al., 2013).
2.5 | Taxonomic Assignment
COI Sanger sequences and ONT consensus sequences were individually queried against the NCBI Nucleotide collection (nr/nt) using BLASTN with default settings and against the Barcode of Life Database (BOLD) using the online identification tool (Figure 2). BOLD searches were made against the extended reference library, which includes taxa identified to taxonomic levels above species. No species-level identifications were considered unless COI hits had > 90% query cover and > 97% sequence similarity. 18S ONT sequences were queried against the NCBI Nucleotide collection (nr/nt) only (Figure 2).
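The consensus selection rule and the ambiguity check described in Section 2.4.2 can be sketched as follows. This is a minimal illustration, not ONTBarcoder2 output parsing: the function names are hypothetical, and any character other than A, C, G or T is treated as ambiguous.

```python
from typing import Optional

UNAMBIGUOUS = frozenset("ACGT")


def select_consensus(by_similarity: Optional[str], by_length: Optional[str]) -> Optional[str]:
    """Prefer the similarity-based consensus; fall back to the length-based one.

    Returns None when neither exists (specimen treated as not recovered).
    """
    return by_similarity if by_similarity else by_length


def percent_ambiguous(seq: str) -> float:
    """Percentage of bases in the sequence that are not A, C, G or T."""
    if not seq:
        raise ValueError("empty sequence")
    ambiguous = sum(1 for base in seq.upper() if base not in UNAMBIGUOUS)
    return 100.0 * ambiguous / len(seq)


def under_ambiguity_limit(seq: str, limit_percent: float = 1.0) -> bool:
    """True when ambiguous bases make up less than the limit (1% by default)."""
    return percent_ambiguous(seq) < limit_percent


# A 658 bp barcode may carry at most six ambiguous bases under a 1% limit.
six_n = "A" * 652 + "N" * 6
seven_n = "A" * 651 + "N" * 7
```

For a full-length 658 bp COI barcode, the 1% limit works out to a maximum of six ambiguous positions.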
No species-level identification was considered from any 18S hits, and genus-level identifications were only considered if there was > 90% query cover and 100% sequence similarity. The IDs from morphology, COI and 18S were then assessed for congruence. The methods were considered congruent when the taxonomic rank of one of the IDs was the same as, or sat within, the taxonomic rank of another. For example, if the 18S ID was Nereididae and the COI ID was Platynereis dumerilii (Audouin & Milne Edwards, 1833), then the IDs were considered congruent; otherwise, congruence was rejected.
2.6 | Mapping of reads to improve DNA barcode recovery
When ONTBarcoder2 produced barcodes with ambiguous bases and/or barcodes that belonged to a non-target organism, attempts were made to recover a DNA barcode with fewer ambiguities from the target OTU by mapping target reads to a pre-existing DNA barcode, hereafter referred to as the map sequence (Figure 2). For each specimen, all demultiplexed reads were mapped to a map sequence using the “Map to Reference” tool in Geneious Prime, beginning with the Low Sensitivity / Fastest setting and sequentially increasing the sensitivity until sequences were mapped. Map sequences were selected based on the putative morphological ID of the target specimens. Map sequences were preferentially taken from non-contaminated DNA barcodes recovered from other specimens within our dataset with the same putative morphological ID as the target. If there were no other specimens in our dataset which morphologically matched, a pre-existing DNA barcode from a specimen assigned the same rank as the putative morphological ID was collected from BOLD (COI) or GenBank (18S). If there were no pre-existing entries matching the putative morphological ID, a closely related organism was selected instead (e.g. same genus or same family). Mapped reads were then fed back into ONTBarcoder2 using mode 2 with the same consensus creation parameters as demultiplexing (Table S4).
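The identification thresholds and the congruence rule from Section 2.5 can be written as simple predicates. This is a sketch under one assumption: each ID is represented as a lineage tuple running from a higher rank down to the lowest rank assigned, a representation chosen here for illustration and not taken from the study. Function names are hypothetical.

```python
def coi_species_level_allowed(query_cover: float, similarity: float) -> bool:
    """COI hits support a species-level ID only above both thresholds."""
    return query_cover > 90.0 and similarity > 97.0


def s18_genus_level_allowed(query_cover: float, similarity: float) -> bool:
    """18S hits support at most a genus-level ID, and only at 100% similarity."""
    return query_cover > 90.0 and similarity >= 100.0


def congruent(lineage_a: tuple, lineage_b: tuple) -> bool:
    """True when one ID is the same as, or nested within, the other."""
    shorter, longer = sorted((lineage_a, lineage_b), key=len)
    return tuple(longer[: len(shorter)]) == tuple(shorter)


# The example from the text: an 18S family-level ID and a COI species-level ID.
id_18s = ("Annelida", "Polychaeta", "Phyllodocida", "Nereididae")
id_coi = id_18s + ("Platynereis", "Platynereis dumerilii")
id_other = ("Annelida", "Polychaeta", "Phyllodocida", "Phyllodocidae")
```

With three IDs per specimen (morphology, COI and 18S), the same pairwise check would be applied to each pair.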
Again, barcodes created by similarity were preferentially selected over those generated by length, and if a specimen had no consensus sequence, it was considered not recovered. The new consensus sequences were then re-queried against the public reference libraries as in Figure 2. The new barcode replaced the DNA barcode originally produced by ONTBarcoder2 if it belonged to the target organism and contained fewer ambiguous bases.
2.7 | Assessing the relationship between input DNA and the number of demultiplexed reads
For the second Flongle run, eight specimens, each representing a different species, were selected to test the impact of the amount of input amplicon DNA (ng) added to the Flongle sequencing library on the number of demultiplexed reads recovered by ONTBarcoder2 after sequencing. Barcodes for the eight species had already been sequenced in a preceding Flongle run, and specimens were selected to cover multiple phyla. PCRs for each specimen were carried out in triplicate using different combinations of tagged LCO1490/HCO2198 primers. PCRs for Asterias rubens (Linnaeus, 1758) used primers LCOech1aF1/HCO2198 instead. The concentration of each PCR product was measured using a Qubit Fluorometer (high sensitivity). Each of the triplicate PCR products was then diluted to one of three distinct concentrations with molecular-grade water to represent a high, medium and low input concentration (Table S5). A Nanopore library was then created using the steps outlined above.
3 | Results
3.1 | Sanger sequencing vs Oxford Nanopore sequencing for DNA barcoding
Of the 68 specimens sequenced using Sanger sequencing, contigs combining forward and reverse reads were produced for 59 specimens, while seven had only forward sequences, for a total of 66 (Table 1). Two specimens failed Sanger sequencing: Cochlodesma praetenue (Pulteney, 1799) and Hediste diversicolor (O.F. Müller, 1776) (Table 1). In comparison, sequences for all 68 specimens were recovered using ONT.
Ambiguous base calls were present in Sanger barcodes for five specimens, and four sequences were contamination. In contrast, using ONT, ambiguous bases were only present in two consensus sequences, and three consensus sequences were contamination (Table 1). Using Sanger sequencing, molecular identification to species level was possible for 46 specimens, while the remaining specimens were identified to genus (9), family (4), order (1) and superorder (2). Using ONT, 50 specimens could be identified to species level, with the remaining specimens identified to genus (8), family (4), order (1) and superorder (2) (Table 1). For four specimens, it was possible to recover DNA barcodes for the target taxon using ONT when the Sanger counterparts were returned as contamination; however, this was only possible after mapping (Table 1). For Nicolea zostericola (Örsted, 1844) and Pygospio elegans (Claparède, 1863), a contaminated barcode was recovered by ONT when a non-contaminated barcode was recovered by Sanger sequencing (Table 1). Neither method could recover a non-contaminated DNA barcode for Cochlodesma praetenue. The length of DNA barcodes varied between sequencing methods, with ONT generating more DNA barcodes of the expected 658 bp length. Sanger amplicon lengths were truncated for 14 specimens due to low-quality base calls at the ends of the sequences, and ONT amplicon lengths deviated from 658 bp in six specimens (Table 1). Of the 60 specimens where both ONT and Sanger sequences were recovered, 52 had identical sequences (Table 1). For the remaining eight specimens, sequence similarity varied between 99.9% and 96.5%. Dissimilarity was mainly driven by single insertions or ambiguous base calls towards the ends of the Sanger sequences (Figure S6).
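The pairwise similarity reported above can be illustrated with a simple percent-identity calculation over two aligned sequences. This is a sketch only: it assumes the sequences are already aligned to equal length, with "-" for gaps, and counts gaps and ambiguity codes as mismatches. The text does not state the exact similarity metric used, so values from this function need not match those in Table 1.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base, or any differing character, counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example: one ambiguous Sanger call (N) in an eight-base alignment.
ont_seq = "ACGTACGT"
sanger_seq = "ACGTACNT"
identity = percent_identity(ont_seq, sanger_seq)
```

Under this definition a single ambiguous Sanger base call is enough to break exact identity between the two methods.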
3.2 | Generating an ONT Reference Library

Using a combination of morphology and DNA barcodes we identified 150 OTUs, spanning 12 phyla, 24 classes, 42 orders, 92 families and 101 genera. The four dominant metazoan groups were Polychaeta (n = 45), Malacostraca (n = 38), Gastropoda (n = 20) and Bivalvia (n = 14), collectively accounting for 78% of OTUs in the dataset. For 119 OTUs (80%) both COI and 18S DNA barcodes were recovered, for 14 OTUs only a COI barcode was recovered, for 13 OTUs only an 18S barcode was recovered and for 4 OTUs neither barcode was recovered.

3.2.1 | Tagged PCR success and Flongle Sequencing Recovery

COI amplicons were generated for 139 OTUs (93%) and 18S amplicons were generated for 134 OTUs (90%) (Figure S7). Without the use of the degenerate and group-specific primers, COI barcodes for 23 OTUs would not have been recovered. There was no overlap between the OTUs which failed COI PCRs and those which failed 18S PCRs. In total, 362 COI and 218 18S amplicons were loaded into the MinION across six different Flongles. The number of raw reads produced by each Flongle varied between 290,701 and 426,031 and was not significantly influenced by the number of active pores (F (1,4) = 5.9, p = 0.07). The majority of reads had a mean basecall accuracy between 96.8% and 99.2% (Table 2). COI consensus sequences were recovered for 334 amplicons (92.3%) and 18S consensus sequences were recovered for 215 amplicons (98.6%). Across the three runs with both 18S and COI amplicons, a higher percentage of 18S amplicons was recovered than of COI amplicons (Table 2).

3.2.2 | Sequence quality

After consensus sequence creation, the number of ambiguous bases in the consensus sequences decreased when more reads were used to produce them, and the sequences with the highest number of ambiguous bases were mostly contaminated (Figure 3a).
Including the mapping step decreased the number of ambiguous bases per sequence, particularly for COI (Figure 3b). Considering only sequences which belonged to the same OTU before and after mapping, mapping resulted in an average reduction of 2.2 ambiguous bases per sequence. The median number of reads per consensus sequence prior to mapping was 861 for COI and 1,318 for 18S (Figure 3c). After mapping, this dropped to 744 and 1,274 respectively, because non-target reads were removed prior to creating consensus sequences (Figure 3d). In total, 277 COI sequences (83.2%) were translatable and contained no ambiguous bases. A further 56 COI sequences (16.8%) had ambiguous bases, 34 (10.2%) of which had ambiguous bases making up more than 1% of the total amplicon length. After including the mapping step, the number of COI sequences with ambiguous bases decreased by nine, to 47 (14.1%), and the number of these with ambiguous bases making up more than 1% of the total amplicon length dropped by 14, to 20 (6.0%). For 18S, 184 sequences (85.6%) had no ambiguous bases. A further 31 sequences (14.4%) had ambiguous bases, five (2.3%) of which had ambiguous bases making up more than 1% of the total amplicon length. After the mapping step, the number of 18S sequences with ambiguous bases decreased by five, to 26 (12.1%), and the number of these with ambiguous bases making up more than 1% of the total amplicon length dropped by two, to three (1.4%). Diluting amplicons from eight specimens revealed a clear positive relationship between the amount of input amplicon DNA and the number of demultiplexed reads recovered after sequencing (F (1, 22) = 272.6, p < 0.05) (Figure 4).

3.2.3 | Contamination

COI consensus sequences were recovered for 286 specimens. Of these, 237 belonged to the target OTU, while 49 (17%) derived from contamination. Similarly, 18S consensus sequences were recovered for 213 specimens. Of these, 202 belonged to the target OTU, while 11 (5%) derived from contamination.
By including the mapping step, it was possible to recover consensus sequences belonging to the target OTU for 31 of the 49 initially contaminated specimens (63%). This increased the recovery of consensus sequences belonging to the target OTU to 268 out of 286 specimens (94%). For 18S, it was possible to recover consensus sequences belonging to the target OTU for 7 of the 11 initially contaminated specimens (63%). This increased the recovery of non-contaminated consensus sequences to 209 out of 213 specimens (98%). The rate of contamination in Bivalvia was disproportionately high compared to the other dominant classes (χ² (3) = 10.34, p < 0.05), with the contaminating sequences deriving from macroalgae or marine gammaproteobacteria (Figure S8c). Overall, COI barcodes for 14 OTUs (10% of OTUs in the dataset) and 18S barcodes for 6 OTUs (4% of OTUs in the dataset) would not have been recovered without carrying out the mapping step.

3.2.4 | Taxonomic resolution, congruence and new barcodes

Using the COI DNA barcodes and morphology on their own, the most common taxonomic rank was species, although 56% of the species-level IDs from morphology were revoked after taking the molecular data into account (Figure 5). The most common taxonomic rank to which 18S barcodes resolved was order, closely followed by family. Excluding cases of contamination, there was congruence between the COI ID and 18S ID for all OTUs. There was congruence between molecular IDs (COI and 18S combined) and morphological IDs for 78% of OTUs. Finally, COI barcodes were generated for six morphologically identified species which had no previous COI DNA barcodes on BOLD (Calliopaea bellula (d’Orbigny, 1837), Leptocheirus pectinatus (Norman, 1869), Leucothoe spinicarpa (Abildgaard, 1789), Moerella donacina (Linnaeus, 1758), Praunus neglectus (G.O. Sars, 1869) and Travisia forbesii (Johnston, 1840)).
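The contamination and recovery percentages reported in Section 3.2.3 follow directly from the specimen counts given there. As a quick arithmetic check, the sketch below recomputes them; the helper name `recovery_after_mapping` and the dictionary keys are illustrative choices, not part of ONTBarcoder2 or of the authors' analysis.

```python
def recovery_after_mapping(recovered: int, on_target: int, rescued: int) -> dict:
    """Summarise contamination before mapping and on-target recovery after mapping.

    recovered: specimens with any consensus sequence
    on_target: specimens whose consensus belonged to the target OTU before mapping
    rescued:   initially contaminated specimens recovered for the target OTU by mapping
    """
    contaminated = recovered - on_target
    on_target_after = on_target + rescued
    return {
        "contaminated": contaminated,
        "contaminated_pct": round(100 * contaminated / recovered),
        "on_target_after": on_target_after,
        "on_target_after_pct": round(100 * on_target_after / recovered),
    }


# Counts taken from Section 3.2.3.
coi = recovery_after_mapping(recovered=286, on_target=237, rescued=31)
s18 = recovery_after_mapping(recovered=213, on_target=202, rescued=7)
print(coi)
print(s18)
```

With these inputs the COI summary gives 49 contaminated specimens (17%) and 268 on-target sequences after mapping (94%), and the 18S summary gives 11 contaminated specimens (5%) and 209 on-target sequences after mapping (98%), matching the values in the text.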
4 | Discussion

In this study, Oxford Nanopore sequencing compared favourably with traditional Sanger sequencing for generating DNA barcodes, and we successfully generated a reference library of 146 marine invertebrates utilising two DNA markers and a range of PCR primer pairs to improve recovery across taxa.

4.1 | PCR success

For projects aiming to barcode species from across multiple phyla, we show the use of group-specific and degenerate primers can improve PCR success and can be easily incorporated into the ONTBarcoder2 pipeline. Additionally, multimarker approaches are becoming more common in DNA metabarcoding studies to account for the taxonomic PCR biases of different markers (Robinson et al., 2022; Zhang et al., 2018). In this dataset, there are several examples of OTUs amplifying with one but not both markers, and we therefore suggest multiple marker approaches should become commonplace in reference library creation too, especially since different markers can be sequenced simultaneously using ONT. There were only four instances where neither amplicon was recovered: a demosponge and a polychaete which were contaminated, and another polychaete and an amphipod which yielded 0 ng/µl after extraction and unsurprisingly failed to amplify. Poor amplification in Porifera is well known in DNA barcoding (Vargas et al. 2012) and the small body size of the other specimens (< 3 mm) may not have yielded enough target DNA.

4.2 | Sequence Recovery

Sanger sequence failures occurred in only two specimens. Common causes of this include insufficient concentrations of DNA and non-nucleic acid contaminants inhibiting the sequencing reaction, indicated by low 260nm/230nm ratios (Crossley et al., 2020). However, other specimens belonging to the same species, with identical amplicon DNA concentration and similar 260nm/230nm ratios, were successfully sequenced, so the exact cause of failure remains unclear. With respect to ONT, using R9 Flongle flowcells, Srivathsan et al.
(2021) had a recovery rate between 93-95% when sequencing between 191-257 amplicons, while Cuber et al. (2023) had a recovery rate between 51-84% when sequencing 220 amplicons. More recently, Srivathsan et al. (2024) demonstrated a 91% and 93% recovery rate when sequencing 285 amplicons using R10 Flongle flowcells with high accuracy and super-accuracy basecalling respectively, which is comparable to our recovery rate of 91-100% when sequencing between 33-128 amplicons with high accuracy basecalling. The number of specimens we ran per Flongle was certainly below the upper limit, since around 250 specimens can be sequenced per run with no decrease in recovery (Srivathsan et al., 2024). With these previous studies in mind, the change from R9 to R10 reaction chemistry seems to have increased recovery and decreased variability between individual runs. We also saw substantial variation in demultiplexed reads, by up to five orders of magnitude among amplicons. Given the strong positive relationship between input DNA and the number of recovered reads, and that we did not pool amplicons in equimolar amounts prior to sequencing, this variation is likely explained by unequal representation of individual amplicons in the sequencing library. To further improve recovery, an equimolar pooling step could be introduced; however, in agreement with Srivathsan et al. (2024), we recognise the trade-off in cost and time of equimolar pooling for hundreds of specimens given the low cost of resequencing.

4.3 | Sequence quality and Contamination

Ambiguous bases were common in consensus sequences for both markers. The first likely cause of ambiguous bases is sequencing error. Although the basecalling accuracy of ONT has historically been worse than other sequencing technologies, this is not the case with current R10.4.1 reaction chemistry (Bogaerts et al., 2024). Other studies show average read accuracies of >97%, consistent with our data (Cuber et al., 2023; Zhang et al., 2023).
Despite this, with too few reads, some bases cannot be resolved by consensus. Indeed, in our data ambiguities are more pronounced in consensus sequences made from small numbers of reads. The highest number of reads that resulted in a DNA barcode with more than 1% ambiguous bases was 32, while for an untranslatable sequence it was 24. We therefore suggest that a minimum threshold of 30 reads should ensure high quality DNA barcodes. The second possible cause of ambiguous bases is contamination, where target and non-target reads are erroneously merged into a single chimeric consensus. Indeed, the consensus sequences with the most ambiguous bases were also frequently contaminated and mostly associated with COI rather than 18S. This is not surprising given that the high rates of non-specific amplification for “universal” COI primers are well known (Lobo et al., 2013). While users should continue to optimise PCR reactions to minimise coamplification and use stringent sampling and lab precautions, we show mapping is a worthwhile step to improve recovery and quality of DNA barcodes from contaminated specimens. Additionally, poor amplification in Bivalvia using Folmer primers has previously been demonstrated (Barco et al., 2016; Layton et al., 2014) and likely explains the higher rate of contamination in this group. Furthermore, the degenerate primers jgLCO1490/jgHCO2198 (Geller et al., 2013) were required to generate non-contaminated barcodes in half of our bivalve OTUs, further highlighting the benefits of these types of primers for problematic groups.

4.5 | Variable sequence lengths

The length of translatable COI amplicons for many marine invertebrates deviated from the expected 658 bp, suggesting codon variation in our data, from up to two codons below (e.g. Trivia monacha (da Costa, 1778)) to eight codons above the expected length (e.g. Polyophthalmus sp. (Quatrefages, 1850)).
COI amplicon length variation has been acknowledged in previous studies working on marine invertebrates (Barco et al., 2016). The 18S V4 amplicon is a non-coding region with a much higher variation in length, from 421 bp (Cephalothrix rufifrons) to 741 bp (Harpinia sp.), notably higher than the expected upper size limit of 600 bp presented by the designers of the primers (Zhan et al. 2014). Variable lengths among the COI amplicons may confound ONTBarcoder2 because the pipeline was designed for terrestrial arthropods and assumes all amplicons will be of equal length (default 658 bp). Indeed, in the barcode fixing step, ambiguous bases will be introduced and codons deleted to ensure the final consensus sequence conforms with the expected length set by the user (Srivathsan et al., 2021). Because of this caveat, we omitted the barcode fixing step of the pipeline; however, this came at the cost of legitimate errors, such as small insertions or deletions, not being corrected automatically. This was particularly problematic for consensus sequences produced from a small number of reads, since ONT sequencing is known for elevated levels of indels (Chiou et al. 2023). Furthermore, length-variable amplicons will not be flagged as QC compliant by ONTBarcoder2, even if they are translatable and free of ambiguities, which may lead to misleading interpretations of sequence quality. For length-variable non-coding regions, the recommendation is to run the pipeline without the consensus by similarity and barcode fixing steps (https://github.com/asrivathsan/ONTBarcoder2). However, we found the exclusion of reads by the 90% similarity threshold helped in removing non-target reads. Additionally, we saw no major differences in the length of the 18S barcodes produced by length and similarity.
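The quality criteria discussed in Sections 4.3 and 4.5 (a suggested minimum of 30 reads, no more than 1% ambiguous bases, and COI lengths that differ from 658 bp by whole codons) can be combined into a simple screening function. The sketch below is an illustration only: `barcode_qc`, its parameters and its return keys are hypothetical names, it is not ONTBarcoder2's own QC logic, and an in-frame length does not by itself prove that a sequence is translatable, since no stop-codon check is performed.

```python
# IUPAC nucleotide codes other than A, C, G and T are treated as ambiguous.
AMBIGUOUS_BASES = set("NRYSWKMBDHV")


def barcode_qc(consensus: str, n_reads: int, expected_len: int = 658,
               min_reads: int = 30, max_ambiguous_fraction: float = 0.01) -> dict:
    """Screen a consensus barcode against read-depth, ambiguity and length criteria."""
    seq = consensus.upper()
    if not seq:
        raise ValueError("consensus sequence must not be empty")
    ambiguous_fraction = sum(base in AMBIGUOUS_BASES for base in seq) / len(seq)
    offset = len(seq) - expected_len
    in_frame = offset % 3 == 0
    return {
        "enough_reads": n_reads >= min_reads,
        "ambiguous_fraction": ambiguous_fraction,
        "low_ambiguity": ambiguous_fraction <= max_ambiguous_fraction,
        "in_frame": in_frame,
        # Whole codons above (+) or below (-) the expected length, if in frame.
        "codon_offset": offset // 3 if in_frame else None,
    }


# Hypothetical consensus sequences (placeholder bases, not real barcodes).
print(barcode_qc("A" * 682, n_reads=861)["codon_offset"])   # eight codons above 658 bp
print(barcode_qc("A" * 652, n_reads=861)["codon_offset"])   # two codons below 658 bp
```

Treating in-frame length variation as acceptable, rather than forcing every barcode to 658 bp, mirrors the decision described above to omit the barcode fixing step for these taxa.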
4.6 | Taxonomic resolution

Both 18S and COI are common markers for marine invertebrates; however, in line with many previous studies, we see COI is much better at resolving animal specimens to species and genus level (Antil et al., 2023), compared to the 18S V4 region, which is more appropriate for order and family levels (Wu et al., 2015). Species-level IDs by morphology were also common, but frequently incorrect, highlighting the complementarity of molecular and morphological approaches. Identification to species level is preferable because a specimen can be more accurately linked to important aspects of its ecology, such as trophic interactions, functional group, life history, tolerance to environmental stressors and invasive/endangered status, among others. Despite this, major knowledge gaps still exist in these areas for most marine invertebrates (Chen, 2021) and species-level identification may not be necessary for some biomonitoring applications, making 18S identifications still valuable (Bailey et al., 2001; Martin et al., 2016).

4.6 | Implications for the future of DNA barcoding

Ultimately, the intention behind using ONT for DNA barcoding is to rapidly sequence specimens at a minimal cost, with maximum ease. Indeed, ONT-based barcoding has now surpassed the cost effectiveness of other third generation sequencing platforms such as the PacBio Sequel (Hebert et al., 2025). At the time of laboratory analysis, twelve R10 Flongle flowcells and the library preparation reagents cost $2786 in the UK, and the tagged PCR primers cost $210, equating to $5.55 per DNA barcode. The bidirectional Sanger sequencing and PCR clean-up kits resulted in around double the cost, at $11.52 per sequence. We only used six Flongle flowcells in this study, noting that our entire dataset could have been run using only three. However, at the time of writing, Flongle flowcells must be purchased in batches of twelve, so for smaller projects there will likely be wasted sequencing potential.
Additionally, Flongle flowcells must currently be used within four weeks of delivery to be run under warranty; however, their delivery can be staggered in batches to accommodate this. The ONT library preparation is intuitive, making it easy to train new users, and can be carried out in a few hours. The sequencing and bioinformatics can be completed within 24 hours, and in real time too (Srivathsan et al., 2024). Furthermore, given a powerful enough desktop computer, ONT basecalling and ONTBarcoder2 can be run locally using GUIs, without needing an HPC, which has substantial advantages for researchers without these facilities. Irrespective of technological advancement, DNA barcodes should still be linked to correctly identified specimens before they can be used in a reference library. Misidentification is still pervasive in public sequence repositories (Leray et al., 2022; Weigand et al., 2019), and we argue gathering appropriate taxonomic expertise for cross-phyla studies is still one of the greatest challenges in this field. The majority of our identifications were done by reverse taxonomy, with over half of our morphological species IDs revoked in light of the molecular data. We were fortunate to carry out this work in a locality where invertebrate diversity is historically well known, with most of our OTUs having pre-existing DNA barcodes from neighbouring geographic regions. However, many barcoding projects will not have this luxury, and therefore a reliance on reverse taxonomy alone should not motivate reference library generation moving forward.

5 | Conclusions

Flongle sequencing and the ONTBarcoder2 analysis pipeline represent a significant jump in the ease and scale at which in-house DNA barcoding can be carried out, with important applications for monitoring priority habitats. This work agrees with previous authors that ONT sequencing is a suitable alternative to Sanger sequencing in terms of cost, recovery, and sequence quality (Cuber et al., 2023).
Additionally, the ONTBarcoder2 pipeline works efficiently for COI and 18S barcoding of marine invertebrates across eleven phyla. We also highlight the utility of a mapping step which can improve barcode quality and recovery by circumventing the impacts of some contamination. Finally, we generated COI DNA barcodes for six previously un-sequenced invertebrate species inhabiting UK seagrass beds and demonstrate the ONTBarcoder2 pipeline is a useful tool in rapidly generating DNA barcode reference libraries.

Acknowledgements

We would like to thank Fiona Ware and Sankurie Pye from the National Museums Scotland for assisting with the morphological ID of some specimens. We would also like to thank the following NGOs for facilitating sample collection and accommodation throughout fieldwork: Arran COAST, Seawilding, and the Skye Seas Survey Initiative.

Author Contributions

Ross EG, Layton KKS, Piertney SB and Sigwart JD conceived the initial concept. Ross EG, Crook NF and Moreau A carried out the field work, morphological analysis and laboratory work. Ross EG carried out the genetic analysis and wrote the manuscript. Layton KKS, Piertney SB and Sigwart JD guided the laboratory work and genetic analysis. All authors contributed to the final manuscript.

6 | References

Andrews, S., 2010. FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Antil, S., Abraham, J.S., Sripoorna, S., Maurya, S., Dagar, J., Makhija, S., Bhagat, P., Gupta, R., Sood, U., Lal, R., Toteja, R., 2023. DNA barcoding, an effective tool for species identification: a review. Mol Biol Rep 50, 761–775. https://doi.org/10.1007/s11033-022-08015-7
Bailey, R.C., Norris, R.H., Reynoldson, T.B., 2001. Taxonomic resolution of benthic macroinvertebrate communities in bioassessments. Journal of the North American Benthological Society 20, 280–286.
https://doi.org/10.2307/1468322
Barco, A., Raupach, M.J., Laakmann, S., Neumann, H., Knebelsberger, T., 2016. Identification of North Sea molluscs with DNA barcoding. Molecular Ecology Resources 16, 288–297. https://doi.org/10.1111/1755-0998.12440
Blackman, R., Couton, M., Keck, F., Kirschner, D., Carraro, L., Cereghetti, E., Perrelet, K., Bossart, R., Brantschen, J., Zhang, Y., Altermatt, F., 2024. Environmental DNA: The next chapter. Molecular Ecology 33, e17355. https://doi.org/10.1111/mec.17355
Bogaerts, B., Van den Bossche, A., Verhaegen, B., Delbrassinne, L., Mattheus, W., Nouws, S., Godfroid, M., Hoffman, S., Roosens, N.H.C., De Keersmaecker, S.C.J., Vanneste, K., 2024. Closing the gap: Oxford Nanopore Technologies R10 sequencing allows comparable results to Illumina sequencing for SNP-based outbreak investigation of bacterial pathogens. J Clin Microbiol 62, e0157623. https://doi.org/10.1128/jcm.01576-23
Carr, C.M., Hardy, S.M., Brown, T.M., Macdonald, T.A., Hebert, P.D.N., 2011. A Tri-Oceanic Perspective: DNA Barcoding Reveals Geographic Structure and Cryptic Diversity in Canadian Polychaetes. PLOS ONE 6, e22232. https://doi.org/10.1371/journal.pone.0022232
Chen, E.Y.-S., 2021. Often Overlooked: Understanding and Meeting the Current Challenges of Marine Invertebrate Conservation. Front. Mar. Sci. 8. https://doi.org/10.3389/fmars.2021.690704
Cowart, D.A., Pinheiro, M., Mouchel, O., Maguer, M., Grall, J., Miné, J., Arnaud-Haond, S., 2015. Metabarcoding Is Powerful yet Still Blind: A Comparative Analysis of Morphological and Molecular Surveys of Seagrass Communities. PLOS ONE 10, e0117562. https://doi.org/10.1371/journal.pone.0117562
Crossley, B.M., Bai, J., Glaser, A., Maes, R., Porter, E., Killian, M.L., Clement, T., Toohey-Kurth, K., 2020. Guidelines for Sanger sequencing and molecular assay monitoring. J Vet Diagn Invest 32, 767–775.
https://doi.org/10.1177/1040638720905833
Cuber, P., Chooneea, D., Geeves, C., Salatino, S., Creedy, T.J., Griffin, C., Sivess, L., Barnes, I., Price, B., Misra, R., 2023. Comparing the accuracy and efficiency of third generation sequencing technologies, Oxford Nanopore Technologies, and Pacific Biosciences, for DNA barcode sequencing applications. Ecological Genetics and Genomics 28, 100181. https://doi.org/10.1016/j.egg.2023.100181
DeSalle, R., Goldstein, P., 2019. Review and Interpretation of Trends in DNA Barcoding. Front. Ecol. Evol. 7. https://doi.org/10.3389/fevo.2019.00302
Folmer, O., Black, M., Hoeh, W., Lutz, R., Vrijenhoek, R., 1994. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol 3, 294–299.
Geller, J., Meyer, C., Parker, M., Hawk, H., 2013. Redesign of PCR primers for mitochondrial cytochrome c oxidase subunit I for marine invertebrates and application in all-taxa biotic surveys. Mol Ecol Resour 13, 851–861. https://doi.org/10.1111/1755-0998.12138
Green, A.E., Unsworth, R.K.F., Chadwick, M.A., Jones, P.J.S., 2021. Historical Analysis Exposes Catastrophic Seagrass Loss for the United Kingdom. Frontiers in Plant Science 12.
Hayward, P.J., Ryland, J.S., Hayward, P.J., Ryland, J.S. (Eds.), 2017. The Marine Environment of North-West Europe, in: Handbook of the Marine Fauna of North-West Europe. Oxford University Press, p. 0. https://doi.org/10.1093/acprof:oso/9780199549443.003.0001
Hebert, P.D.N., Cywinska, A., Ball, S.L., deWaard, J.R., 2003. Biological identifications through DNA barcodes. Proc Biol Sci 270, 313–321. https://doi.org/10.1098/rspb.2002.2218
Hebert, P.D.N., Floyd, R., Jafarpour, S., Prosser, S.W.J., 2025. Barcode 100K Specimens: In a Single Nanopore Run. Molecular Ecology Resources 25, e14028. https://doi.org/10.1111/1755-0998.14028
Layton, K., Corstorphine, E., Hebert, P., 2016. Exploring Canadian Echinoderm Diversity through DNA Barcodes. PLoS ONE 11.
https://doi.org/10.1371/journal.pone.0166118
Layton, K.K.S., Martel, A.L., Hebert, P.D., 2014. Patterns of DNA Barcode Variation in Canadian Marine Molluscs. PLOS ONE 9, e95003. https://doi.org/10.1371/journal.pone.0095003
Leray, M., Knowlton, N., Machida, R.J., 2022. MIDORI2: A collection of quality controlled, preformatted, and regularly updated reference databases for taxonomic assignment of eukaryotic mitochondrial sequences. Environmental DNA 4, 894–907. https://doi.org/10.1002/edn3.303
Lilley, R.J., Unsworth, R.K.F., 2014. Atlantic Cod (Gadus morhua) benefits from the availability of seagrass (Zostera marina) nursery habitat. Global Ecology and Conservation 2, 367–377. https://doi.org/10.1016/j.gecco.2014.10.002
Lincoln, J.R. 1979. British Marine Amphipoda: Gammaridea [WWW Document], n.d. URL https://www.nhbs.com/british-marine-amphipoda-gammaridea
Martin, G.K., Adamowicz, S.J., Cottenie, K., 2016. Taxonomic resolution based on DNA barcoding affects environmental signal in metacommunity structure. Freshwater Science 35, 701–711. https://doi.org/10.1086/686260
Milton, M., Pierossi, P., Ratnasingham, S., 2013. Barcode of life data systems handbook. Biodiversity Institute of Ontario.
Folmer, O., Black, M., Hoeh, W., Lutz, R., Vrijenhoek, R., 1994. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular marine biology and biotechnology 3.
Potouroglou, M., Whitlock, D., Milatovic, L., MacKinnon, G., Kennedy, H., Diele, K., Huxham, M., 2021. The sediment carbon stocks of intertidal seagrass meadows in Scotland. Estuarine, Coastal and Shelf Science 258, 107442. https://doi.org/10.1016/j.ecss.2021.107442
Radulovici, A.E., Vieira, P.E., Duarte, S., Teixeira, M.A.L., Borges, L.M.S., Deagle, B.E., Majaneva, S., Redmond, N., Schultz, J.A., Costa, F.O., 2021. Revision and annotation of DNA barcode records for marine invertebrates: report of the 8th iBOL conference hackathon.
Metabarcoding and Metagenomics 5, e67862. https://doi.org/10.3897/mbmg.5.67862
Robinson, C.V., Porter, T.M., McGee, K.M., McCusker, M., Wright, M.T.G., Hajibabaei, M., 2022. Multi-marker DNA metabarcoding detects suites of environmental gradients from an urban harbour. Sci Rep 12, 10556. https://doi.org/10.1038/s41598-022-13262-6
Ruppert, K.M., Kline, R.J., Rahman, M.S., 2019. Past, present, and future perspectives of environmental DNA (eDNA) metabarcoding: A systematic review in methods, monitoring, and applications of global eDNA. Global Ecology and Conservation 17, e00547. https://doi.org/10.1016/j.gecco.2019.e00547
Srivathsan, A., Feng, V., Suárez, D., Emerson, B., Meier, R., 2024. ONTBarcoder2 2.0: rapid species discovery and identification with real-time barcoding facilitated by Oxford Nanopore R10.4. Cladistics 40, 192–203. https://doi.org/10.1111/cla.12566
Srivathsan, A., Lee, L., Katoh, K., Hartop, E., Kutty, S.N., Wong, J., Yeo, D., Meier, R., 2021. ONTBarcoder2 and MinION barcodes aid biodiversity discovery and identification by everyone, for everyone. BMC Biol 19, 217. https://doi.org/10.1186/s12915-021-01141-x
Tang, C.Q., Leasi, F., Obertegger, U., Kieneke, A., Barraclough, T.G., Fontaneto, D., 2012. The widely used small subunit 18S rDNA molecule greatly underestimates true diversity in biodiversity surveys of the meiofauna. Proc Natl Acad Sci U S A 109, 16208–16212. https://doi.org/10.1073/pnas.1209160109
Wang, Yunhao, Zhao, Y., Bollas, A., Wang, Yuru, Au, K.F., 2021. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 39, 1348–1365.
https://doi.org/10.1038/s41587-021-01108-x
Weigand, H., Beermann, A.J., Čiampor, F., Costa, F.O., Csabai, Z., Duarte, S., Geiger, M.F., Grabowski, M., Rimet, F., Rulik, B., Strand, M., Szucsich, N., Weigand, A.M., Willassen, E., Wyler, S.A., Bouchez, A., Borja, A., Čiamporová-Zaťovičová, Z., Ferreira, S., Dijkstra, K.-D.B., Eisendle, U., Freyhof, J., Gadawski, P., Graf, W., Haegerbaeumer, A., van der Hoorn, B.B., Japoshvili, B., Keresztes, L., Keskin, E., Leese, F., Macher, J.N., Mamos, T., Paz, G., Pešić, V., Pfannkuchen, D.M., Pfannkuchen, M.A., Price, B.W., Rinkevich, B., Teixeira, M.A.L., Várbíró, G., Ekrem, T., 2019. DNA barcode reference libraries for the monitoring of aquatic biota in Europe: Gap-analysis and recommendations for future work. Science of The Total Environment 678, 499–524. https://doi.org/10.1016/j.scitotenv.2019.04.247
Wu, S., Xiong, J., Yu, Y., 2015. Taxonomic Resolutions Based on 18S rRNA Genes: A Case Study of Subclass Copepoda. PLOS ONE 10, e0131498. https://doi.org/10.1371/journal.pone.0131498
Zhan, A., Hulák, M., Sylvester, F., Huang, X., Adebayo, A.A., Abbott, C.L., Adamowicz, S.J., Heath, D.D., Cristescu, M.E., MacIsaac, H.J., 2013. High sensitivity of 454 pyrosequencing for detection of rare species in aquatic communities. Methods Ecol Evol 4, 558–565. https://doi.org/10.1111/2041-210X.12037
Zhang, G.K., Chain, F.J.J., Abbott, C.L., Cristescu, M.E., 2018. Metabarcoding using multiplexed markers increases species detection in complex zooplankton communities. Evol Appl 11, 1901–1914. https://doi.org/10.1111/eva.12694
Zhang, T., Li, H., Ma, S., Cao, J., Liao, H., Huang, Q., Chen, W., 2023. The newest Oxford Nanopore R10.4.1 full-length 16S rRNA sequencing enables the accurate resolution of species-level microbial community profiling. Applied and Environmental Microbiology 89, e00605-23.
https://doi.org/10.1128/aem.00605-23

Data Accessibility and Benefit-Sharing

Benefits Generated: In addition to the reference library, this work has contributed to establishing a biodiversity baseline for the four seagrass beds we visited, and the results of this study have been shared with the respective NGOs. A community BioBlitz event was also carried out as part of specimen collection from the Isle of Arran with the National Museum of Scotland and Arran COAST to engage with locals about the biodiversity of their seagrass beds.

Data Availability: All sequence data are available on GenBank under the following accession numbers: XXXX-XXXX and on BOLD systems under project code ERSSI (to be made available upon acceptance of the manuscript). Whole animals and DNA extractions are stored at the University of Aberdeen.

Figure 1 – Four seagrass beds from which marine macroinvertebrates were collected: Loch Craignish (56°09’17.8”N 5°34’31.7”W), Loch Eishort (57°08’58.3”N 5°56’37.8”W), Whiting Bay (55°29’22.3”N 5°05’23.9”W), and Montrose Basin (56°43’12.2”N 2°28’46.2”W) in Scotland. N = number of net samples collected, C = number of sediment cores collected.

Figure 2 – Workflow used to generate molecular IDs for marine invertebrates for two DNA barcodes: COI and 18S. Y = Yes, N = No.

Figure 3 – Relationship between the number of reads used to produce consensus sequences and mapping on sequence quality for COI and 18S and the distribution of the data. Parts (a) and (b) show data from before the mapping step and (c) and (d) show after the mapping step. Scatterplots (a) and (c) show the relationship between the number of reads used by ONTBarcoder2 and the resulting number of ambiguous bases in the consensus sequences, with square root transformation applied to the y-axis and log transformation applied to the x-axis. COI and 18S 1% N show the threshold for greater than 1% ambiguous bases.
Parts (b) and (d) show the distribution of read counts used to produce consensus sequences, with the x-axis log transformed.

Figure 4 – Relationship between the amount of input amplicon DNA (ng) added to an ONT R10 Flongle and the number of reads recovered. Both axes have been log transformed.

Figure 5 – Number of OTUs identified to different taxonomic ranks using three methods: two DNA barcodes (COI and 18S) and morphology (Morph).

Tables

Table 1 – Comparison of ONT and Sanger generated COI DNA barcodes for 68 marine invertebrate specimens. Length = sequence length of DNA barcode in bases, % N = percentage of ambiguous bases in DNA barcode, Translatable = whether the sequence can be translated without stop codons, Top BOLD Hit = taxonomic ID based on the top BOLD hit(s), Recovered = whether a forward read (LCO only) or overlapping forward and reverse reads (contig) were used to generate the Sanger DNA barcode, % Sim = percentage similarity between the ONT and Sanger sequences, and OTU = the final taxonomic ID after considering sequence information and morphology. Specimens where no DNA barcode was recovered are highlighted in Red. Specimens where more than 10% of the Sanger sequence contained basecalls with greater than 5% error probability are highlighted in Yellow. Specimens where mapping of reads in Geneious was used to generate the DNA barcode are highlighted in Grey.

ONT Length | ONT % N | ONT Translatable | ONT Top BOLD Hit | ONT Rank | Sanger Recovered | Sanger Length | Sanger % N | Sanger Translatable | Sanger Top BOLD Hit | Sanger Rank | % Sim | OTU | OTU Rank
658 | 0 | Yes | Ampharetidae | Family | Contig | 625 | 0.6 | Yes | Ampharetidae | Family | 99.7 | Ampharetidae | Family
658 | 0 | Yes | Ampharetidae | Family | Contig | 658 | 0 | Yes | Ampharetidae | Family | 100 | Ampharetidae | Family
658 | 0 | Yes | Amphipholis squamata | Species | Contig | 658 | 0 | Yes | Amphipholis squamata | Species | 100 | Amphipholis squamata | Species
658 | 0 | Yes | Amphorina | Genus | Contig | 658 | 0 | Yes | Amphorina | Genus | 100 | Amphorina sp. | Genus
658 | 0 | Yes | Amphorina | Genus | Contig | 658 | 0 | Yes | Amphorina | Genus | 100 | Amphorina sp. | Genus
658 | 0 | Yes | Anemonia viridis | Species | LCO only | 655 | 0 | Yes | Roya anglica | contamination | - | Anemonia viridis | Species
658 | 0 | Yes | Arenicola marina | Species | Contig | 658 | 0 | Yes | Arenicola marina | Species | 100 | Arenicola marina | Species
658 | 0 | Yes | Arenicola marina | Species | Contig | 658 | 0 | Yes | Arenicola marina | Species | 100 | Arenicola marina | Species
658 | 0 | Yes | Asterias rubens | Species | Contig | 600 | 0 | Yes | Asterias rubens | Species | 100 | Asterias rubens | Species
658 | 0 | Yes | Bittium reticulatum | Species | Contig | 658 | 0 | Yes | Bittium reticulatum | Species | 100 | Bittium reticulatum | Species
658 | 0 | Yes | Bittium reticulatum | Species | LCO only | 651 | 0 | No | Bittium reticulatum | Species | 98 | Bittium reticulatum | Species
658 | 0 | Yes | Sacoglossa | Superorder | Contig | 658 | 0 | Yes | Sacoglossa | Superorder | 100 | Calliopaea bellula | Species
658 | 0 | Yes | Sacoglossa | Superorder | Contig | 658 | 0 | Yes | Sacoglossa | Superorder | 100 | Calliopaea bellula | Species
658 | 0 | Yes | Cerastoderma edule | Species | Contig | 658 | 0 | Yes | Cerastoderma edule | Species | 100 | Cerastoderma edule | Species
658 | 0 | Yes | Cerastoderma edule | Species | Contig | 658 | 0 | Yes | Cerastoderma edule | Species | 100 | Cerastoderma edule | Species
658 | 0 | Yes | Clymenura clypeata | Species | Contig | 617 | 0 | Yes | Clymenura clypeata | Species | 100 | Clymenura clypeata | Species
658 | 0 | Yes | Cochlodesma praetenue | Species | Contig | 646 | 0 | Yes | Cochlodesma praetenue | Species | 100 | Cochlodesma praetenue | Species
658 | 0 | Yes | Clymenura clypeata | contamination | - | - | - | Yes | no sequence | no sequence | - | Cochlodesma praetenue | Species
658 | 0 | Yes | Crangon crangon | Species | Contig | 658 | 0 | Yes | Crangon crangon | Species | 100 | Crangon crangon | Species
658 | 0 | Yes | Crangon crangon | Species | Contig | 658 | 0 | Yes | Crangon crangon | Species | 100 | Crangon crangon | Species
658 | 0 | Yes | Facelina bostoniensis | Species | Contig | 658 | 0 | Yes | Facelina bostoniensis | Species | 100 | Facelina bostoniensis | Species
658 | 0 | Yes | Gammarus locusta | Species | Contig | 658 | 0 | Yes | Gammarus locusta | Species | 100 | Gammarus locusta | Species
656 | 0.5 | No | Gammarus locusta | Species | Contig | 783 | 2.2 | No | no match | contamination | - | Gammarus locusta | Species
658 | 0 | Yes | Gammarus salinus | Species | Contig | 658 | 0 | Yes | Gammarus salinus | Species | 100 | Gammarus salinus | Species
658 | 0 | Yes | Hediste diversicolor | Species | Contig | 658 | 0 | Yes | Hediste diversicolor | Species | 100 | Hediste diversicolor | Species
658 | 0 | Yes | Hediste diversicolor | Species | - | - | - | Yes | no sequence | no sequence | - | Hediste diversicolor | Species
658 | 0 | Yes | Hippolyte varians | Species | Contig | 658 | 0 | Yes | Hippolyte varians | Species | 100 | Hippolyte varians | Species
658 | 0 | Yes | Hippolyte varians | Species | Contig | 658 | 0 | Yes | Hippolyte varians | Species | 100 | Hippolyte varians | Species
658 | 0 | Yes | Idotea balthica | Species | LCO only | 644 | 0 | Yes | Idotea balthica | Species | 99.8 | Idotea balthica | Species
658 | 0 | Yes | Lacuna vincta | Species | Contig | 658 | 0 | Yes | Lacuna vincta | Species | 100 | Lacuna vincta | Species
658 | 0 | Yes | Lacuna vincta | Species | Contig | 658 | 0 | Yes | Lacuna vincta | Species | 100 | Lacuna vincta | Species
658 | 0 | Yes | Littorina saxatilis | Species | Contig | 658 | 0 | Yes | Littorina saxatilis | Species | 100 | Littorina saxatilis | Species
656 | 0.6 | No | Littorina saxatilis | Species | LCO only | 662 | 0 | Yes | no match | contamination | - | Littorina saxatilis | Species
658 | 0 | Yes | Lysianassidae | Family | Contig | 658 | 0 | Yes | Lysianassidae | Family | 100 | Lysianassidae | Family
658 | 0 | Yes | Lysianassidae | Family | Contig | 658 | 0 | Yes | Lysianassidae | Family | 100 | Lysianassidae | Family
655 | 0.5 | No | Macoma balthica | Species | Contig | 655 | 5.3 | No | Macoma | Genus | 96.5 | Macoma balthica | Species
655 | 0 | Yes | Macoma balthica | Species | Contig | 656 | 0 | Yes | Macoma balthica | Species | 100 | Macoma balthica | Species
658 | 0 | Yes | Macropodia deflexa | Species | Contig | 639 | 0 | Yes | Macropodia deflexa | Species | 100 | Macropodia deflexa | Species
658 | 0 | Yes | Macropodia rostrata | Species | Contig | 658 | 0 | Yes | Macropodia rostrata | Species | 100 | Macropodia rostrata | Species
658 | 0 | Yes | Macropodia rostrata | Species | Contig | 658 | 0 | Yes | Macropodia rostrata | Species | 100 | Macropodia rostrata | Species
658 | 0 | Yes | Magelona filiformis | Species | Contig | 659 | 0 | Yes | Magelona filiformis | Species | 99.8 | Magelona filiformis | Species
661 | 0 | Yes | Mytilus | Genus | Contig | 661 | 0 | Yes | Mytilus | Genus | 100 | Mytilus sp. | Genus
661 | 0 | Yes | Mytilus | Genus | Contig | 661 | 0 | Yes | Mytilus | Genus | 100 | Mytilus sp. | Genus
658 | 0.3 | No | no match | contamination | LCO only | 617 | 0 | Yes | Nicolea zostericola | Species | - | Nicolea zostericola | Species
658 | 0 | Yes | Pagurus bernhardus | Species | Contig | 658 | 0 | Yes | Pagurus bernhardus | Species | 100 | Pagurus bernhardus | Species
658 | 0 | Yes | Palaemon serratus | Species | Contig | 658 | 0.2 | Yes | Palaemon serratus | Species | 99.9 | Palaemon serratus | Species
658 | 0 | Yes | Palaemon serratus | Species | LCO only | 673 | 0 | No | no match | contamination | - | Palaemon serratus | Species
658 | 0 | Yes | Peringia ulvae | Species | Contig | 625 | 0 | No | Peringia ulvae | Species | 99.8 | Peringia ulvae | Species
658 | 0 | Yes | Peringia ulvae | Species | Contig | 658 | 0 | Yes | Peringia ulvae | Species | 100 | Peringia ulvae | Species
658 | 0 | Yes | Platynereis dumerilii | Species | Contig | 658 | 0 | Yes | Platynereis dumerilii | Species | 100 | Platynereis dumerilii | Species
655 | 0 | Yes | Polycladida | Order | Contig | 632 | 0 | Yes | Polycladida | Order | 100 | Polycladida | Order
658 | 0 | Yes | Praunus flexuosus | Species | Contig | 658 | 0 | Yes | Praunus flexuosus | Species | 100 | Praunus flexuosus | Species
658 | 0 | Yes | Praunus flexuosus | Species | Contig | 658 | 0 | Yes | Praunus flexuosus | Species | 100 | Praunus flexuosus | Species
658 | 0 | Yes | Praunus | Genus | Contig | 643 | 0 | Yes | Praunus | Genus | 100 | Praunus neglectus | Species
658 | 0 | Yes | Praunus | Genus | Contig | 658 | 0 | Yes | Praunus | Genus | 100 | Praunus neglectus | Species
658 | 0 | Yes | Psammechinus miliaris | Species | Contig | 658 | 0.2 | Yes | Psammechinus miliaris | Species | 99.9 | Psammechinus miliaris | Species
657 | 4.9 | No | Peringia ulvae | contamination | Contig | 661 | 0 | Yes | Pygospio elegans | Species | - | Pygospio elegans | Species
661 | 0 | Yes | Pygospio elegans | Species | Contig | 661 | 0 | Yes | Pygospio elegans | Species | 100 | Pygospio elegans | Species
658 | 0 | Yes | Rissoa lilacina | Species | Contig | 658 | 0 | Yes | Rissoa lilacina | Species | 100 | Rissoa lilacina | Species
658 | 0 | Yes | Rissoa membranacea | Species | Contig | 658 | 0 | Yes | Rissoa membranacea | Species | 100 | Rissoa membranacea | Species
655 | 0 | Yes | Scoloplos armiger | Species | Contig | 655 | 0 | Yes | Scoloplos armiger | Species | 100 | Scoloplos armiger | Species
658 | 0 | Yes | Steromphala cineraria | Species | Contig | 658 | 0 | Yes | Steromphala cineraria | Species | 100 | Steromphala cineraria | Species
658 | 0 | Yes | Steromphala cineraria | Species | LCO only | 642 | 0 | Yes | Steromphala cineraria | Species | 100 | Steromphala cineraria | Species
652 | 0 | Yes | Trivia monacha | Species | Contig | 599 | 0 | Yes | Trivia monacha | Species | 100 | Trivia monacha | Species
658 | 0 | Yes | Tubificoides | Genus | Contig | 658 | 0 | Yes | Tubificoides | Genus | 100 | Tubificoides sp. | Genus
658 | 0 | Yes | Tubificoides | Genus | Contig | 658 | 0 | Yes | Tubificoides | Genus | 100 | Tubificoides sp. | Genus
658 | 0 | Yes | Tubificoides benedii | Species | Contig | 656 | 0 | Yes | Tubificoides benedii | Species | 100 | Tubificoides benedii | Species
658 | 0 | Yes | Tubificoides benedii | Species | Contig | 658 | 0 | Yes | Tubificoides benedii | Species | 100 | Tubificoides benedii | Species

Table 2 – Flongle run summaries and sequences produced by ONTBarcoder2 prior to mapping. Pores = number of active nanopores at the beginning of the sequencing run. Basecall Error p = probability of an incorrect base call based on mean read quality (Q score). QC compliant barcodes are translatable, match the target length and are free of ambiguous bases.

RUN1 (pores = 63) | 290701 | 3.2% | COI | 142212 (58%) | 68 | 62 (91.1%) | 51 (82%)
RUN2 (pores = 39) | 321661 | 3.2% | COI | 170639 (56%) | 33 | 33 (100%) | 30 (91%)
RUN3 (pores = 60) | 360194 | 3.2% | COI | 173116 (57%) | 117 | 106 (90.6%) | 71 (67%)
RUN4 (pores = 45) | 264578 | 0.8% | COI | 1112 (7%) | 13 | 12 (92.3%) | 6 (50%)
 |  |  | 18S | 153529 (57%) | 113 | 113 (100%) | -
RUN5 (pores = 50) | 303245 | 2.5% | COI | 71023 (40%) | 79 | 71 (89.9%) | 51 (72%)
 |  |  | 18S | 69006 (24%) | 29 | 28 (96.6%) | -
RUN6 (pores = 85) | 426031 | 2.5% | COI | 84626 (49%) | 52 | 50 (96.2%) | 40 (80%)
 |  |  | 18S | 128468 (31%) | 76 | 74 (97.4%) | -

Information & Authors

Version history: Version 1 (V1), 25 February 2025
Peer review timeline: Published in Ecology and Evolution; Version of Record 28 Sep 2025
Copyright: This work is licensed under a Non Exclusive No Reuse License.
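The "Basecall Error p" column of Table 2 is described as a probability derived from mean read quality (Q score). Under the standard Phred definition, a quality score Q corresponds to an error probability p = 10^(-Q/10). The sketch below illustrates that conversion only; the function names and the example Q scores (15, 16 and 21) are assumptions chosen because they round to the 3.2%, 2.5% and 0.8% values in the table, and they are not values reported in the manuscript. How the mean Q score was computed for each run is likewise not specified here.

```python
import math


def phred_to_error_probability(q_score: float) -> float:
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q_score / 10)


def error_probability_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(p), defined for 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Hypothetical mean Q scores, used purely for illustration.
    for q in (15, 16, 21):
        p = phred_to_error_probability(q)
        print(f"Q{q}: {p * 100:.1f}% probability of an incorrect base call")
```

Read this way, a run-level error of 0.8% corresponds to a mean quality of roughly Q21, and 3.2% to roughly Q15.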
Keywords: biodiversity studies; DNA barcoding; marine invertebrates; Oxford Nanopore sequencing; seagrass habitats

Authors and Affiliations

Ethan Ross (ORCID 0009-0008-9694-4757), University of Aberdeen
Stuart Piertney (ORCID 0000-0001-6654-0569), University of Aberdeen
Julia Sigwart, Senckenberg Research Institute and Natural History Museum Frankfurt
Nathaniel Crook, University of Aberdeen
Agathe Moreau, University of Aberdeen
Kara Layton (ORCID 0000-0002-4302-3048), University of Toronto

Citation: Ethan Ross, Stuart Piertney, Julia Sigwart, et al. A comprehensive DNA barcode reference library for the macroinvertebrates of Scottish seagrass beds using Oxford Nanopore Flongle Flowcells. Authorea. 25 February 2025. DOI: https://doi.org/10.22541/au.174049397.70710513/v1