Complete de novo assembly of Wolbachia endosymbiont of Drosophila willistoni using long-read genome sequencing

Jodie Jacobs, Anne Nakamoto, Mira Mastoras, Hailey Loucks, Cade Mirchandani, Lily Karim, Gabriel Penunuri, Ciara Wanket, and Shelbi L Russell (corresponding author)

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4510571/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 01 Aug, 2024 in Scientific Reports (https://doi.org/10.1038/s41598-024-68716-w). Preprint version 1.

Abstract

Wolbachia is an obligate intracellular α-proteobacterium that commonly infects arthropods and filarial nematodes. Different strains of Wolbachia are capable of a wide range of regulatory manipulations in many hosts and modulate host cellular differentiation to influence host reproduction. The genetic basis for the majority of these phenotypes is unknown.
The wWil strain from the neotropical fruit fly, Drosophila willistoni, exhibits a remarkably high affinity for host germline-derived cells relative to the soma. This trait could be leveraged for understanding how Wolbachia influences the host germline and for controlling host populations in the field. To further the use of this strain in biological and biomedical research, we sequenced the genome of the wWil strain isolated from host cell culture cells. Here, we present the first high-quality nanopore assembly of wWil, the Wolbachia endosymbiont of D. willistoni. Our assembly resulted in a circular genome of 1.27 Mb with a BUSCO completeness score of 99.7%. Consistent with other insect-associated Wolbachia strains, comparative genomic analysis revealed that wWil has a highly mosaic genome relative to the closely related wMel strain from Drosophila melanogaster.

Subject areas: Biological sciences/Genetics/Genomics; Biological sciences/Microbiology/Bacteria/Symbiosis
Keywords: Wolbachia, Drosophila, symbiosis, genomics

Introduction

Wolbachia is a gram-negative α-proteobacterium found as an endosymbiont in many arthropods and nematodes, with a diverse range of effects on host phenotypes [1,2]. Wolbachia are maternally transmitted from host oocytes to the developing embryo [1]. Wolbachia strains manipulate host reproduction to promote their transmission to the next generation of hosts [2]. Consequently, Wolbachia strains have strong affinities for host germline tissues [3]. Intriguingly, the Wolbachia strain from the neotropical fruit fly Drosophila willistoni, wWil, selectively infects the host germline [4,5]. This unique tropism could be informative for understanding how Wolbachia localizes to and regulates the host germline, with implications for vectorizing Wolbachia infections for biological control.

The affinity of wWil for host germline cells is unique in comparison to closely related Wolbachia strains.
Phylogenetic comparisons based on PCR amplification of the wsp and ftsZ genes indicate that wWil is closely related to the wAu strain found in Drosophila simulans [4]. However, unlike wAu, which infects both germline and somatic tissues in D. simulans, wWil is found exclusively in the primordial germline cells of D. willistoni embryos [4]. Additionally, wWil exhibits 100% maternal transmission in laboratory lines, which is attributed to wWil's tropism towards pole cells and selective infection of only the germline [4]. Understanding the mechanism underlying the germline-specific tropism of wWil could inform how other strains of Wolbachia localize to host tissue types.

This distinctive example of host cell specificity is crucial for understanding Wolbachia's ability to colonize new hosts, with significant implications for biological pest control strategies. In D. simulans infected with non-native Wolbachia strains, the host genetic background has been shown to regulate the tissue tropism of the infection [5]. In native infections, D. melanogaster hosts wMel in a broad range of cell types, including both somatic and germline tissues, whereas in D. willistoni, wWil shows a restrictive infection pattern, targeting germline-derived cells [4,5].

Despite the availability of numerous Wolbachia genomes, a complete wWil genome is particularly important because of the strain's unique germline-specific tropism. Here we present the first high-quality de novo assembly of wWil, obtained from nanopore sequencing of wWil-infected in vitro D. melanogaster cultures. In providing this genome, we seek to identify the genetic differences between the wWil and wMel genomes and to ask whether those differences can provide insights into the mechanisms underlying wWil's germline-specific distribution.
Results and discussion

wWil genome assembly

We collected Wolbachia wWil from wWil-infected Drosophila willistoni embryos [6] and introduced wWil to immortalized Drosophila melanogaster JW18 cell culture cells with the shell vial technique [7]. We allowed the infection to stabilize by maintaining the culture for several weeks at 23°C, then collected the wWil-infected cells from confluent cultures [7]. For each sample, 1.2 mL of cells (at ~2e6 cells/mL) was pelleted by centrifugation at 16,000 × g for 10 minutes at 4°C. Following supernatant removal, DNA was extracted using the Wizard HMW DNA Extraction kit (Promega #A2920, Lot: 0000575812). Libraries were prepared with the Native Barcoding Kit V14 for Nanopore MinION R10 (Oxford Nanopore Technologies Cat #SQK-NBD114-24, Lot: NDP1424.10.0010) and sequenced on the Nanopore MinION Mk1B with a MinION R10 Version flow cell (FLO-MIN-114, Lot: 11003064). We ran Oxford Nanopore's MinKNOW v23.07.8 software with live basecalling by Guppy v7.0.8 (Fast model, read splitting ON) and a minimum read length of 200 bp, and stopped sequencing after 36 hours. This resulted in 3.65 M reads with an estimated N50 of 1.11 kb and 2.6 Gb called with a minQ of 8.

Prior to genome assembly, we preprocessed the raw nanopore reads to remove host-derived sequences. Reads were aligned to the D. melanogaster reference genome (dmell-all-r6.46, retrieved from FlyBase) [8] with bwa mem [9] v0.7.17. We used samtools [10] v1.6 to sort and index the resulting file and to retain only the reads that did not align to the host genome (samtools view -b -f 4). The retained reads were written out with bedtools [11] v2.31.1 (bamtofastq). We removed sequencing duplicates with seqkit [12] rmdup v2.7.0 and performed a de novo assembly of the wWil genome with Flye [13] v2.9 (preset --nano-hq). We screened the assemblies for foreign genomic and adapter contamination using the NCBI Foreign Contamination Screen (FCS) toolkit version 0.5.0.
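As a rough illustration of the host-read removal step described above: samtools view -f 4 keeps only records whose SAM FLAG has bit 0x4 (read unmapped) set, so the reads carried forward are the ones that failed to align to the host reference. The Python sketch below applies the same bit test to a few toy SAM records. The read names, the contig name host_chr, and the helper keep_unmapped are hypothetical and are not part of the authors' pipeline.

```python
UNMAPPED = 0x4  # SAM FLAG bit that is set when a read has no alignment


def keep_unmapped(sam_lines):
    """Yield header lines and records whose FLAG has the unmapped bit set.

    This mirrors the bit test behind `samtools view -f 4` on plain-text SAM.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no FLAG; pass them through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & UNMAPPED:
            yield line


# Toy records with hypothetical names; only the FLAG column matters here.
toy_sam = [
    "@SQ\tSN:host_chr\tLN:1000",
    "read_host\t0\thost_chr\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read_wolbachia\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "read_host_rev\t16\thost_chr\t500\t60\t4M\t*\t0\t0\tGGCC\tIIII",
]

kept_names = [
    line.split("\t")[0]
    for line in keep_unmapped(toy_sam)
    if not line.startswith("@")
]
print(kept_names)
```

Only the record with FLAG 4 survives; the forward (FLAG 0) and reverse-strand (FLAG 16) host alignments are dropped.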
We ran FCS-GX [14] (taxa ID 953) and FCS-adaptor (run with the --prok flag), both of which found no evidence of contamination.

Genome polishing and quality assessment

We generated Illumina short-read whole genome sequence (WGS) data from JW18 cell culture cells stably infected with wWil to polish the nanopore assembly. Reads were aligned to the wWil assembly and the D. melanogaster reference [8] (dmel6) simultaneously using bwa mem with default settings. Optical duplicates were marked with sambamba [15]. The reads aligning to dmel6 were discarded. The remaining reads were converted back to fastq format using samtools fastq, then re-aligned to the wWil genome using minimap2 v2.26 with the settings -ax sr --cs --eqx. Reads with de (gap-compressed mismatch ratio) exceeding 0.04 were filtered out to remove mismapped reads and excess noise prior to polishing. Pilon (v1.24) was run on these filtered alignments using default settings, producing the final polished assembly.

We assessed the quality of the polished assembly with BUSCO [16] and annotated the genome with a standard workflow. BUSCO scores were calculated using BUSCO v5.7.0 with the rickettsiales_odb10 database. Polishing improved the BUSCO score from 98.6% to 99.7%. Default parameters were used for all software unless otherwise specified. We annotated the wWil genome with Prokka [17] v1.1.1 (kingdom: Bacteria) to identify coding sequences (CDS), tRNAs, rRNAs, and tmRNA. GC content and GC skew were calculated with Proksee [18] v1.1.2. We then aligned the wWil genome against the wMel reference genome (CP046925.1) with BLASTn using an E-value cutoff of 0.0001. We plotted these annotations with Proksee [18] v1.1.2 to visualize the annotated genome (Fig. 1).
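The mismatch-ratio filter applied before polishing can be sketched as follows. minimap2 writes the gap-compressed per-base divergence of each alignment as an optional SAM tag, de:f:, and the text above states that reads with de above 0.04 were discarded. The Python below is a minimal sketch of that threshold on toy SAM records, not the authors' actual script; the helper names, the toy records, and the choice to drop records that lack a de tag are assumptions made for illustration.

```python
MAX_DE = 0.04  # threshold stated in the text


def divergence(sam_line):
    """Return the value of the de:f: tag, or None if the record has none."""
    # Optional tags start at the 12th tab-separated SAM column.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("de:f:"):
            return float(field[len("de:f:"):])
    return None


def passes_filter(sam_line, max_de=MAX_DE):
    """Keep a record only if it carries a de tag at or below the threshold.

    Assumption: records without a de tag (e.g. unmapped reads) are dropped.
    """
    de = divergence(sam_line)
    return de is not None and de <= max_de


# Toy records with hypothetical read names and a hypothetical contig name.
toy_records = [
    "r1\t0\twWil\t10\t60\t4=\t*\t0\t0\tACGT\tIIII\tNM:i:0\tde:f:0.0000",
    "r2\t0\twWil\t50\t60\t4=\t*\t0\t0\tACGT\tIIII\tNM:i:1\tde:f:0.0625",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

passing = [rec.split("\t")[0] for rec in toy_records if passes_filter(rec)]
print(passing)
```

Here r1 is kept, r2 is removed because its divergence exceeds 0.04, and r3 is removed because it has no alignment and therefore no de tag.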
Genome annotations and assessments

To place our wWil genome within the Wolbachia species phylogeny, we gathered a set of 27 circular, chromosome-level genome assemblies from many Wolbachia supergroups with a broad host range [19], and used Ehrlichia chaffeensis as an outgroup. Genes were annotated using the NCBI Prokaryotic Genome Annotation Pipeline [20], and groups of orthologous genes (orthogroups) were identified across species with OrthoFinder2 [21]. This produced a phylogeny based on single-copy orthologs, rooted on E. chaffeensis. Additionally, we used BUSCO [16] analysis to characterize gene presence-absence variation across orthogroups. Our wWil assembly had a high BUSCO completeness score of 98.6% before polishing, which was comparable to the other circular, chromosome-level Wolbachia genomes. We found that the wWil genome resides in Wolbachia supergroup A, alongside wMel and many other fly-infecting strains (Fig. 2A). Despite the two strains being closely related, alignment of the wWil genome to the wMel CP046925.1 [22] reference genome with Mauve [23] (snapshot 2015-02-25.1) revealed many breaks in synteny between the genomes (Fig. 3). In general, our analysis showed a supergroup-specific pattern of gene presence-absence variation (Fig. 2B).

We also performed a brief assessment of putative secreted and membrane-bound proteins that could play a role in the Wolbachia-host interaction. Proteins with a signal peptide were identified by SignalP [24], and proteins with a transmembrane domain were identified by TMHMM [25]. Those with a signal peptide and a transmembrane domain were classified as membrane-bound proteins, while those with a signal peptide but without a transmembrane domain were classified as secreted proteins. We then characterized presence-absence variation of putative secreted and membrane proteins within groups of orthologous genes across species.
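The classification rule above reduces to two boolean calls per protein, which the following Python sketch makes explicit. It assumes the SignalP and TMHMM results have already been reduced to a pair of True/False values per protein; the protein identifiers, the classify helper, and the "other" label for proteins without a signal peptide are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classify(has_signal_peptide, has_tm_domain):
    """Apply the rule stated in the text to one protein.

    signal peptide + transmembrane domain     -> membrane-bound
    signal peptide, no transmembrane domain   -> secreted
    no signal peptide (assumption)            -> other
    """
    if has_signal_peptide and has_tm_domain:
        return "membrane-bound"
    if has_signal_peptide:
        return "secreted"
    return "other"


# Hypothetical per-protein calls: (signal peptide?, transmembrane domain?)
predictions = {
    "prot_A": (True, True),
    "prot_B": (True, False),
    "prot_C": (False, True),
    "prot_D": (False, False),
}

labels = {name: classify(sp, tm) for name, (sp, tm) in predictions.items()}
counts = Counter(labels.values())
print(labels)
print(counts)
```

Note that under this rule a protein with a transmembrane domain but no signal peptide falls outside both categories, which is why the sketch needs a third label.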
Finally, we identified variable sites in all proteins by calculating the Shannon entropy metric [26,27], and compared the number of high-entropy sites in membrane and secreted proteins versus all proteins in general. Just as for all genes, there was a supergroup-specific pattern in presence-absence variation for both membrane-bound and secreted proteins across Wolbachia species (Fig. 4). Additionally, membrane and secreted protein groups had many variable sites compared to all proteins in general. The median number of variable sites in an orthogroup across all Wolbachia genes was one, while the medians for secreted and membrane proteins were 14 and 13.5 variable sites, respectively (Fig. 5). This analysis revealed proteins with many sites that vary across diverse Wolbachia species with a wide host range, and thus provides candidates for further interrogating Wolbachia-host interactions at the molecular level.

Declarations

Competing interests: The authors declare no competing interests.

Author Contribution

JJ and SLR designed the study. JJ and AN wrote the main manuscript text, performed genome assembly, analyzed data, and prepared Figs. 1–2 (JJ) and 3–5 (AN). JJ, CW, and GP collected the sequencing data. HL performed the decontamination screening. MM performed genome polishing with Illumina data from CM. LK managed data. All authors reviewed the manuscript.

Acknowledgement

The authors acknowledge the University of California Santa Cruz Genomics Institute for providing computational resources and support for this project. Funding for this project was provided by the NIH (T32HG012344) awarded to JJ, CW, HL, LK, MM, and GP, the NIH awarded to SLR (R00GM135583), and the NSF-GRFP awarded to AN.

Data Availability

The assembled genome and the raw long and short reads are available in BioProject PRJNA1107195.

References

1. Russell, S. L. & Castillo, J. R. Trends in Symbiont-Induced Host Cellular Differentiation. Results Probl. Cell Differ. 69, 137–176 (2020).
2. Werren, J. H., Baldo, L. & Clark, M. E. Wolbachia: master manipulators of invertebrate biology. Nat. Rev. Microbiol. 6, 741–751 (2008).
3. Toomey, M. E., Panaram, K., Fast, E. M., Beatty, C. & Frydman, H. M. Evolutionarily conserved Wolbachia-encoded factors control pattern of stem-cell niche tropism in Drosophila ovaries and favor infection. Proc. Natl. Acad. Sci. 110, 10788–10793 (2013).
4. Miller, W. J. & Riegler, M. Evolutionary Dynamics of wAu-Like Wolbachia Variants in Neotropical Drosophila spp. Appl. Environ. Microbiol. 72, 826–835 (2006).
5. Strunov, A., Schmidt, K., Kapun, M. & Miller, W. J. Restriction of Wolbachia Bacteria in Early Embryogenesis of Neotropical Drosophila Species via Endoplasmic Reticulum-Mediated Autophagy. mBio 13, e03863-21 (2022).
6. Müller, M. J. et al. Reevaluating the infection status by the Wolbachia endosymbiont in Drosophila Neotropical species from the willistoni subgroup. Infect. Genet. Evol. 19, 232–239 (2013).
7. Dobson, S. L., Marsland, E. J., Veneti, Z., Bourtzis, K. & O'Neill, S. L. Characterization of Wolbachia Host Cell Range via the In Vitro Establishment of Infections. Appl. Environ. Microbiol. 68, 656–660 (2002).
8. Hoskins, R. A. et al. The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 25, 445–458 (2015).
9. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).
10. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
11. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
12. Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11, e0163962 (2016).
13. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
14. Astashyn, A. et al. Rapid and sensitive detection of genome contamination at scale with FCS-GX. Genome Biol. 25, 60 (2024).
15. Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
16. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
17. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinforma. Oxf. Engl. 30, 2068–2069 (2014).
18. Grant, J. R. et al. Proksee: in-depth characterization and visualization of bacterial genomes. Nucleic Acids Res. 51, W484–W492 (2023).
19. Kaur, R. et al. Living in the endosymbiotic world of Wolbachia: A centennial review. Cell Host Microbe 29, 879–893 (2021).
20. Li, W. et al. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Res. 49, D1020–D1028 (2021).
21. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
22. Duarte, E. H., Carvalho, A., López-Madrigal, S., Costa, J. & Teixeira, L. Forward genetics in Wolbachia: Regulation of Wolbachia proliferation by the amplification and deletion of an addictive genomic island. PLoS Genet. 17, e1009612 (2021).
23. Darling, A. C. E., Mau, B., Blattner, F. R. & Perna, N. T. Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Genome Res. 14, 1394–1403 (2004).
24. Almagro Armenteros, J. J. et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423 (2019).
25. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001).
26. Magliery, T. J. & Regan, L. Sequence variation in ligand binding sites in proteins. BMC Bioinformatics 6, 240 (2005).
27. Prigozhin, D. M. & Krasileva, K. V. Analysis of intraspecies diversity reveals a subset of highly variable plant immune receptors and predicts their binding sites. Plant Cell 33, 998–1015 (2021).
28. Cabanettes, F. & Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958 (2018).

Editorial history at journal

Editorial decision: Revision requested, 25 Jun, 2024
Reviews received at journal, 21 Jun, 2024
Reviews received at journal, 18 Jun, 2024
Reviewers agreed at journal, 17 Jun, 2024
Reviewers agreed at journal, 10 Jun, 2024
Reviewers invited by journal, 10 Jun, 2024
Editor assigned by journal, 10 Jun, 2024
Editor invited by journal, 04 Jun, 2024
Submission checks completed at journal, 01 Jun, 2024
First submitted to journal, 31 May, 2024
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4510571","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":313608343,"identity":"6b9d274d-1c19-4016-b1e1-3ec61aa12357","order_by":0,"name":"Jodie Jacobs","email":"","orcid":"","institution":"Department of Biomolecular Engineering, University of California Santa Cruz","correspondingAuthor":false,"prefix":"","firstName":"Jodie","middleName":"","lastName":"Jacobs","suffix":""},{"id":313608345,"identity":"d29336e9-e4c9-4686-a7dd-74446daff345","order_by":1,"name":"Anne Nakamoto","email":"","orcid":"","institution":"Department of Biomolecular Engineering, University of California Santa Cruz","correspondingAuthor":false,"prefix":"","firstName":"Anne","middleName":"","lastName":"Nakamoto","suffix":""},{"id":313608349,"identity":"c82f95e1-a3cd-4e78-992e-9e836cfbfd5b","order_by":2,"name":"Mira Mastoras","email":"","orcid":"","institution":"Department of Biomolecular Engineering, University of California Santa Cruz","correspondingAuthor":false,"prefix":"","firstName":"Mira","middleName":"","lastName":"Mastoras","suffix":""},{"id":313608350,"identity":"6d0dbb29-4cbf-4040-a197-d21e4bbcb04d","order_by":3,"name":"Hailey Loucks","email":"","orcid":"","institution":"Department of Biomolecular Engineering, University of California Santa Cruz","correspondingAuthor":false,"prefix":"","firstName":"Hailey","middleName":"","lastName":"Loucks","suffix":""},{"id":313608351,"identity":"90e0511a-149b-411d-a2d5-fcf3123f3547","order_by":4,"name":"Cade Mirchandani","email":"","orcid":"","institution":"Department of Biomolecular Engineering, 
University of California Santa Cruz","correspondingAuthor":false,"prefix":"","firstName":"Cade","middleName":"","lastName":"Mirchandani","suffix":""},{"id":313608352,"identity":"be942bcd-4856-4fc5-8ea8-34c67ba94791","order_by":5,"name":"Lily Karim","email":"","orcid":"","institution":"Department of Biomolecular Engineering, University of California Santa Cruz","correspondingAuthor":false,"prefix":"","firstName":"Lily","middleName":"","lastName":"Karim","suffix":""},{"id":313608356,"identity":"52d3bb3f-fa4a-48d6-913e-3f5cac656bc7","order_by":6,"name":"Gabriel Penunuri","email":"","orcid":"","institution":"Department of Biomolecular Engineering, University of California Santa Cruz","correspondingAuthor":false,"prefix":"","firstName":"Gabriel","middleName":"","lastName":"Penunuri","suffix":""},{"id":313608357,"identity":"b2c61872-0225-45c7-849a-429d11967607","order_by":7,"name":"Ciara Wanket","email":"","orcid":"","institution":"Department of Ecology and Evolutionary Biology, University of California, Santa Cruz","correspondingAuthor":false,"prefix":"","firstName":"Ciara","middleName":"","lastName":"Wanket","suffix":""},{"id":313608358,"identity":"6a08111b-a007-46df-86aa-cd1109436356","order_by":8,"name":"Shelbi L Russell","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/UlEQVRIiWNgGAWjYDACCQbGwwwGFhDOgwIbIMnYeICAFgagFgkIJ8EgDaSlgQgtDHAth8E0Xi3ys5sPHC4okMg3OH/84YMEg/N2a9sPA22psYnGpcXgzrGEwzMMJCw33MgxNkgwuJ287UwiUMuxtNwGXFokcgwO8xhIGBjc4GGTAGkxOwDUwthwGKcW+Rn5HyBazh9//iPB4Fyy2fmH+LUw3MhhgGg5kGAG9P4BO7MbBGwxuJFmAPKLgSTQL0CHJSeY3QDakoDHL/Izkh8+LvhjY8AHDLEPHyrs7M3Opz988KHGBrfD0EEiWGUCscpBwJ4UxaNgFIyCUTAyAACHRmVpB8t5oQAAAABJRU5ErkJggg==","orcid":"","institution":"Department of Biomolecular Engineering, University of California Santa Cruz","correspondingAuthor":true,"prefix":"","firstName":"Shelbi","middleName":"L","lastName":"Russell","suffix":""}],"badges":[],"createdAt":"2024-05-31 
18:08:14","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4510571/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4510571/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-024-68716-w","type":"published","date":"2024-08-01T15:58:09+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":58389481,"identity":"39e26557-b763-4763-81d7-e41852aeae05","added_by":"auto","created_at":"2024-06-14 19:54:45","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":376525,"visible":true,"origin":"","legend":"\u003cp\u003eMap of the \u003cem\u003eWolbachia\u003c/em\u003e\u003ca href=\"https://www.ncbi.nlm.nih.gov/nuccore/CP048819\"\u003e \u003c/a\u003e\u003cem\u003ew\u003c/em\u003eWil genome prepared using Proksee\u003ca href=\"https://www.zotero.org/google-docs/?obzbND\"\u003e\u003csup\u003e18\u003c/sup\u003e\u003c/a\u003e. Circles in order from outer to inner show the following features: the position of coding sequences (CDS), open reading frames (ORF), tmRNA, tRNA, and rRNA genes (circle 1). GC content (circle 2) and GC skew plotted as the deviation from the average for the entire sequence (circle 3). 
The positions of BLAST hits detected through BLASTn comparisons of \u003cem\u003ew\u003c/em\u003eMel \u003ca href=\"https://www.ncbi.nlm.nih.gov/nuccore/CP046925.1/\"\u003eCP046925.1\u003c/a\u003e\u003ca href=\"https://www.zotero.org/google-docs/?ZmOTod\"\u003e\u003csup\u003e22\u003c/sup\u003e\u003c/a\u003e are shown in transparent blue, darker blue indicates regions which map to multiple regions in the \u003cem\u003ew\u003c/em\u003eMel genome (circle 4).\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4510571/v1/3f9fe268373043f0422d4a55.jpeg"},{"id":58389154,"identity":"5bc10bdf-cc65-4ce1-96e3-698eb4b0e1ad","added_by":"auto","created_at":"2024-06-14 19:46:45","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1348975,"visible":true,"origin":"","legend":"\u003cp\u003eComparative genomics of \u003cem\u003ew\u003c/em\u003eWil among other \u003cem\u003eWolbachia\u003c/em\u003especies. \u003cstrong\u003e(a)\u003c/strong\u003e Phylogeny of \u003cem\u003eWolbachia\u003c/em\u003e genomes based on 470 single-copy orthologous genes (SCOs), with \u003cem\u003ew\u003c/em\u003eWil in supergroup A, along with genome metadata: host species and common name, genome size (Mb), BUSCO completeness score (%), total number of proteins, number of putative transmembrane proteins, and number of putative secreted proteins. \u003cstrong\u003e(b)\u003c/strong\u003e The same phylogeny as in A, with the presence-absence variation of all orthogroups shown. 
Whitespace indicates the absence of a gene in a particular \u003cem\u003eWolbachia\u003c/em\u003e genome.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-4510571/v1/baee053e2c0e6f3afd4f897f.png"},{"id":58389152,"identity":"a3eac9dd-86dc-400b-9f5c-4960140fdfd3","added_by":"auto","created_at":"2024-06-14 19:46:45","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":303682,"visible":true,"origin":"","legend":"\u003cp\u003eThe comparison of \u003cem\u003ew\u003c/em\u003eMel \u003ca href=\"https://www.ncbi.nlm.nih.gov/nuccore/CP046925.1/\"\u003eCP046925.1\u003c/a\u003e\u003ca href=\"https://www.zotero.org/google-docs/?3ynQUa\"\u003e\u003csup\u003e22\u003c/sup\u003e\u003c/a\u003e with \u003cem\u003ew\u003c/em\u003eWil. (\u003cstrong\u003ea\u003c/strong\u003e) Dotplot generated with D-GENIES\u003ca href=\"https://www.zotero.org/google-docs/?twnJ1L\"\u003e\u003csup\u003e28\u003c/sup\u003e\u003c/a\u003e (\u003cstrong\u003eb\u003c/strong\u003e) Mauve alignment showing local collinear blocks (LCBs) identified along the circular genomes and joined with vertical lines.\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4510571/v1/f6a330a047e24acf501a9946.jpeg"},{"id":58389155,"identity":"77f9a02a-0f72-4b5f-89e9-4b9f115f215d","added_by":"auto","created_at":"2024-06-14 19:46:45","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1303223,"visible":true,"origin":"","legend":"\u003cp\u003ePresence-absence variation of putative \u003cstrong\u003e(a)\u003c/strong\u003e membrane protein and \u003cstrong\u003e(b)\u003c/strong\u003e secreted protein genes across orthogroups in \u003cem\u003eWolbachia\u003c/em\u003e species. 
As in Figure 2, the absence of a tile indicates the absence of a gene in a particular \u003cem\u003eWolbachia\u003c/em\u003e species.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-4510571/v1/8835e620b95c41995e6d18b2.png"},{"id":58389482,"identity":"05b58330-da1c-42bf-b8a9-63920c914b63","added_by":"auto","created_at":"2024-06-14 19:54:45","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":340315,"visible":true,"origin":"","legend":"\u003cp\u003eVariability of membrane proteins and secreted proteins compared to all proteins. Shown is a histogram of the distribution of orthogroups across the number of high-entropy (variable) sites in their protein sequence alignment. Orthogroup counts are plotted separately for all proteins (gray), secreted proteins (pink), and membrane proteins (blue), with median number of variable sites represented by dashed lines of the respective colors. 
There were 1,003 orthogroups that did not contain any variable sites, which are not included in the plot.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-4510571/v1/9ae1b483693a8434a0a61e93.png"},{"id":61793725,"identity":"e7e8c81d-3678-4e29-844a-0ebdd7f801a8","added_by":"auto","created_at":"2024-08-05 16:14:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3160849,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4510571/v1/94362eec-e43e-45c9-a004-38d03a29013b.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Complete de novo assembly of Wolbachia endosymbiont of Drosophila willistoni using long-read genome sequencing","fulltext":[{"header":"Introduction","content":"\u003cp\u003e \u003cem\u003eWolbachia\u003c/em\u003e is a gram-negative \u0026#120572;-proteobacterium and is found as an endosymbiont in many arthropods and nematodes with a diverse range of effects on host phenotypes\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e,\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. \u003cem\u003eWolbachia\u003c/em\u003e are maternally transmitted from host oocytes to the developing embryo\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. \u003cem\u003eWolbachia\u003c/em\u003e strains manipulate host reproduction to promote their transmission to the next generation of hosts\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. Subsequently, \u003cem\u003eWolbachia\u003c/em\u003e strains have strong affinities for host germline tissues\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. 
Intriguingly, the\u003c/p\u003e \u003cp\u003e \u003cem\u003eWolbachia\u003c/em\u003e strain from the neotropical fruit fly \u003cem\u003eDrosophila willistoni, w\u003c/em\u003eWil, selectively infects the host germline\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. This unique tropism could be informative for understanding how \u003cem\u003eWolbachia\u003c/em\u003e localizes to and regulates the host germline with implications for vectorizing \u003cem\u003eWolbachia\u003c/em\u003e infections for biological control mechanisms.\u003c/p\u003e \u003cp\u003eThe affinity of \u003cem\u003ew\u003c/em\u003eWil for host germ line cells is unique in comparison to closely related \u003cem\u003eWolbachia\u003c/em\u003e strains. Phylogenetic comparisons based on amplification of the wsp and ftsZ genes by PCR indicate that \u003cem\u003ew\u003c/em\u003eWil is closely related to the \u003cem\u003ew\u003c/em\u003eAu strain found in \u003cem\u003eDrosophila simulans\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. However, unlike \u003cem\u003ew\u003c/em\u003eAu, which infects both germline and somatic tissues in \u003cem\u003eD. simulans\u003c/em\u003e, \u003cem\u003ew\u003c/em\u003eWil is exclusively found in the primordial germline cells of \u003cem\u003eD. willistoni\u003c/em\u003e embryos\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Additionally, \u003cem\u003ew\u003c/em\u003eWil exhibits 100% maternal transmission in laboratory lines, which is attributed to \u003cem\u003ew\u003c/em\u003eWil\u0026rsquo;s tropism towards pole cells and selective infection of only the germ line\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. 
Understanding the mechanism underlying the germline-specific tropism of \u003cem\u003ew\u003c/em\u003eWil could inform how other strains of \u003cem\u003eWolbachia\u003c/em\u003e localize to host tissue types.\u003c/p\u003e \u003cp\u003eThis distinctive example of host cell specificity is crucial for understanding the ability of \u003cem\u003eWolbachia\u003c/em\u003e to colonize new hosts, with significant implications for biological pest control strategies. In \u003cem\u003eD. simulans\u003c/em\u003e infected with non-native \u003cem\u003eWolbachia\u003c/em\u003e strains, the host genetic background has been shown to regulate the tissue tropism of the infection\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. In native infections, \u003cem\u003eD. melanogaster\u003c/em\u003e hosts \u003cem\u003ew\u003c/em\u003eMel \u003cem\u003eWolbachia\u003c/em\u003e in a broad range of cell types, including both somatic and germline tissues, whereas in \u003cem\u003eD. willistoni\u003c/em\u003e, \u003cem\u003ew\u003c/em\u003eWil shows a restricted infection pattern, targeting germline-derived cells\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAlthough numerous \u003cem\u003eWolbachia\u003c/em\u003e genomes are available, a complete \u003cem\u003ew\u003c/em\u003eWil genome is particularly important because of the strain\u0026rsquo;s unique germline-specific tropism. Here we present the first high-quality \u003cem\u003ede novo\u003c/em\u003e assembly of \u003cem\u003ew\u003c/em\u003eWil, obtained from nanopore sequencing of \u003cem\u003ew\u003c/em\u003eWil-infected \u003cem\u003eD. melanogaster\u003c/em\u003e cell cultures maintained \u003cem\u003ein vitro\u003c/em\u003e. 
In providing this genome, we seek to identify the genetic differences which exist between the \u003cem\u003ew\u003c/em\u003eWil and \u003cem\u003ew\u003c/em\u003eMel genomes and if those differences can provide insights into the mechanisms underlying \u003cem\u003ew\u003c/em\u003eWil\u0026rsquo;s germline-specific distribution.\u003c/p\u003e"},{"header":"Results and discussion","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ewWil genome assembly\u003c/h2\u003e \u003cp\u003eWe collected \u003cem\u003eWolbachia w\u003c/em\u003eWil from \u003cem\u003ew\u003c/em\u003eWil-infected \u003cem\u003eDrosophila willistoni\u003c/em\u003e embryos\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e and introduced \u003cem\u003ew\u003c/em\u003eWil to immortalized \u003cem\u003eDrosophila melanogaster\u003c/em\u003e JW18 cell culture cells with the shell vial technique\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e. We allowed the infection to stabilize by maintaining the culture for several weeks at 23\u0026deg;C, then collected the \u003cem\u003ew\u003c/em\u003eWil-infected cells from confluent cultures\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e. For each sample, 1.2 mL (at ~\u0026thinsp;2e6 cells/mL) of cells were pelleted by centrifugation at 16,000xg for 10 minutes at 4\u0026deg;C. Following supernatant removal, DNA was extracted using the Wizard HMW DNA Extraction kit (Promega #A2920, Lot: 0000575812). Libraries were prepared with the Native Barcoding Kit V14 for Nanopore MinION R10 (Oxford Nanopore Technologies Cat #SQK-NBD114-24, Lot: NDP1424.10.0010) and sequenced on the Nanopore MinION Mk1B with a MinION R10 Version flow cell (FLO-MIN-114, Lot:11003064). 
We ran the sequencer with Oxford Nanopore\u0026rsquo;s MinKNOW v23.07.8 software, using live basecalling with Guppy v7.0.8 (Fast model, read splitting ON) and a minimum read length of 200 bp, and stopped sequencing after 36 hours. This resulted in 3.65 M reads with an estimated N50 of 1.11 kb and 2.6 Gb called with a minQ of 8.\u003c/p\u003e \u003cp\u003ePrior to genome assembly, we preprocessed the raw nanopore reads to remove host-derived sequences. Reads were aligned to the \u003cem\u003eD. melanogaster\u003c/em\u003e reference genome (dmel-all-r6.46, retrieved from FlyBase)\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e with bwa mem\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e v0.7.17. We used samtools\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e v1.6 to sort and index the resultant file and to retain only the reads that did not align to the host genome (samtools view -b -f 4). The retained reads were converted back to FASTQ format with bedtools\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e v2.31.1 (bamtofastq). We removed sequencing duplicates with seqkit\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e rmdup v2.7.0 and performed a \u003cem\u003ede novo\u003c/em\u003e assembly of the \u003cem\u003ew\u003c/em\u003eWil genome with Flye\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e v2.9 (preset: --nano-hq). We screened the assemblies for foreign genomic and adapter contamination using the NCBI Foreign Contamination Screen (FCS) toolkit version 0.5.0. 
We ran FCS-GX\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e (NCBI taxonomy ID 953) and FCS-adaptor (with the --prok flag), neither of which found evidence of contamination.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eGenome polishing and quality assessment\u003c/h2\u003e \u003cp\u003eWe generated Illumina short-read whole genome sequence (WGS) data from JW18 cell culture cells stably infected with \u003cem\u003ew\u003c/em\u003eWil to polish the Nanopore assembly. Reads were aligned to the \u003cem\u003ew\u003c/em\u003eWil assembly and \u003cem\u003eD. melanogaster\u003c/em\u003e reference\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e (dmel6) simultaneously using \u003cem\u003ebwa mem\u003c/em\u003e with default settings. Optical duplicates were marked with \u003cem\u003esambamba\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. The reads aligning to dmel6 were discarded. The remaining reads were converted back to FASTQ format using samtools fastq, then re-aligned to the \u003cem\u003ew\u003c/em\u003eWil genome using minimap2 v2.26 with the settings -ax sr --cs --eqx. Reads whose de tag (gap-compressed per-base sequence divergence) exceeded 0.04 were filtered out to remove mismapping and excess noise prior to polishing. Pilon (v1.24) was run on these filtered alignments using default settings, producing the final polished assembly.\u003c/p\u003e \u003cp\u003eWe assessed the quality of the polished assembly with BUSCO\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e and annotated the genome with a standard workflow. BUSCO scores were calculated using BUSCO v5.7.0 with the rickettsiales_odb10 database. Polishing improved the BUSCO completeness score from 98.6% to 99.7%. 
Default parameters were used for all software unless otherwise specified. We annotated the \u003cem\u003ew\u003c/em\u003eWil genome with Prokka\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e v1.1.1 (kingdom:bacteria) to identify coding sequences (CDS), tRNAs, rRNAs, and tmRNA. GC Content and GC Skew were calculated with Proksee\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e v1.1.2. We then aligned the \u003cem\u003ew\u003c/em\u003eWil genome against the \u003cem\u003ew\u003c/em\u003eMel reference genome (CP046925.1) with BLASTn with an expected value cut-off of 0.0001. We plotted these annotations with Proksee\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e v1.1.2 to visualize the annotated genome (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eGenome annotations and assessments\u003c/h2\u003e \u003cp\u003eTo place our \u003cem\u003ew\u003c/em\u003eWil genome within the \u003cem\u003eWolbachia\u003c/em\u003e species phylogeny, we gathered a set of 27 circular, chromosome-level genome assemblies from many \u003cem\u003eWolbachia\u003c/em\u003e supergroups with broad host range\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e, and used \u003cem\u003eEhrlichia chaffeensis\u003c/em\u003e as an outgroup. Genes were annotated using the NCBI Prokaryotic Genome Annotation Pipeline\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e, and groups of orthologous genes (orthogroups) were identified across species with OrthoFinder2\u003csup\u003e21\u003c/sup\u003e. This produced a phylogeny based on single-copy orthologs, rooted on \u003cem\u003eE. 
chaffeensis\u003c/em\u003e. Additionally, we utilized BUSCO\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e analysis to characterize gene presence-absence variation across orthogroups. Our wWil assembly had a high BUSCO completeness score of 98.6% before polishing, which was comparable to the other circular, chromosome-level \u003cem\u003eWolbachia\u003c/em\u003e genomes. We found that the \u003cem\u003ew\u003c/em\u003eWil genome resides in \u003cem\u003eWolbachia\u003c/em\u003e supergroup A, alongside \u003cem\u003ew\u003c/em\u003eMel and many other fly-infecting species (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). Despite being closely related, alignment of the \u003cem\u003ew\u003c/em\u003eWil genome to the \u003cem\u003ew\u003c/em\u003eMel \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eCP046925.1\u003c/span\u003e\u003csup\u003e22\u003c/sup\u003e reference genome with Mauve\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e (snapshot 2015-02-25.1) revealed many breaks in synteny between the genomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). In general, our analysis showed a supergroup-specific pattern of gene presence-absence variation (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB).\u003c/p\u003e \u003cp\u003eWe also performed a brief assessment of putative secreted and membrane-bound proteins that could play a role in the \u003cem\u003eWolbachia\u003c/em\u003e-host interaction. 
Proteins with a signal peptide were identified by SignalP\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e, and proteins with a transmembrane domain were identified by TMHMM\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. Those with a signal peptide and a transmembrane domain were classified as membrane-bound proteins, while those with a signal peptide but without a transmembrane domain were classified as secreted proteins. We then characterized presence-absence variation of putative secreted and membrane proteins within groups of orthologous genes across species. Finally, we identified variable sites in all proteins by calculating the Shannon entropy metric\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, and compared the number of high-entropy sites in membrane and secreted proteins versus all proteins in general.\u003c/p\u003e \u003cp\u003eJust as for all genes, there was a supergroup-specific pattern in presence-absence variation for both membrane-bound and secreted proteins across \u003cem\u003eWolbachia\u003c/em\u003e species (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Additionally, membrane and secreted protein groups had many variable sites compared to all proteins in general. The median number of variable sites in an orthogroup across all \u003cem\u003eWolbachia\u003c/em\u003e genes was one, while the medians for secreted and membrane proteins were 14 and 13.5 variable sites respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). 
This analysis revealed proteins with many sites that vary across diverse \u003cem\u003eWolbachia\u003c/em\u003e species with a wide host range, and thus provides candidates for further interrogating \u003cem\u003eWolbachia\u003c/em\u003e-host interactions at the molecular level.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting interests:\u003c/h2\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eJJ, and SLR designed the study. JJ and AN wrote the main manuscript text, performed genome assembly, analyzed data and prepared Figs 1-2 (JJ) 3-5 (AN). JJ, CW, and GP collected the sequencing data. HL performed the decontamination screening. MM performed genome polishing with Illumina data from CM. LK managed data. All authors reviewed the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors acknowledge the University of California Santa Cruz Genomics Institute for providing computational resources and support for this project. Funding for this project was provided by NIH (T32HG012344) awarded to JJ, CW, HL, LK, MM, and GP, the NIH awarded to SLR (R00GM135583), and the NSF-GRFP awarded to AN.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe assembled genome and the raw long and short reads are available in BioProject PRJNA1107195.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eRussell, S. L. \u0026amp; Castillo, J. R. Trends in Symbiont-Induced Host Cellular Differentiation. Results Probl. Cell Differ. 69, 137\u0026ndash;176 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWerren, J. H., Baldo, L. \u0026amp; Clark, M. E. 
Wolbachia: master manipulators of invertebrate biology. Nat. Rev. Microbiol. 6, 741\u0026ndash;751 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eToomey, M. E., Panaram, K., Fast, E. M., Beatty, C. \u0026amp; Frydman, H. M. Evolutionarily conserved Wolbachia-encoded factors control pattern of stem-cell niche tropism in Drosophila ovaries and favor infection. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e 110, 10788\u0026ndash;10793 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiller, W. J. \u0026amp; Riegler, M. Evolutionary Dynamics of wAu-Like Wolbachia Variants in Neotropical Drosophila spp. Appl. Environ. Microbiol. 72, 826\u0026ndash;835 (2006).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStrunov, A., Schmidt, K., Kapun, M. \u0026amp; Miller, W. J. Restriction of Wolbachia Bacteria in Early Embryogenesis of Neotropical Drosophila Species via Endoplasmic Reticulum-Mediated Autophagy. mBio 13, e03863-21 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eM\u0026uuml;ller, M. J. \u003cem\u003eet al.\u003c/em\u003e Reevaluating the infection status by the \u003cem\u003eWolbachia\u003c/em\u003e endosymbiont in \u003cem\u003eDrosophila\u003c/em\u003e Neotropical species from the \u003cem\u003ewillistoni\u003c/em\u003e subgroup. Infect. Genet. Evol. 19, 232\u0026ndash;239 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDobson, S. L., Marsland, E. J., Veneti, Z., Bourtzis, K. \u0026amp; O\u0026rsquo;Neill, S. L. Characterization of Wolbachia Host Cell Range via the In Vitro Establishment of Infections. Appl. Environ. Microbiol. 68, 656\u0026ndash;660 (2002).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoskins, R. A. \u003cem\u003eet al.\u003c/em\u003e The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 25, 445\u0026ndash;458 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, H. 
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.1303.3997\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.1303.3997\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, H. \u003cem\u003eet al.\u003c/em\u003e The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078\u0026ndash;2079 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQuinlan, A. R. \u0026amp; Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841\u0026ndash;842 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShen, W., Le, S., Li, Y. \u0026amp; Hu, F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11, e0163962 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKolmogorov, M., Yuan, J., Lin, Y. \u0026amp; Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540\u0026ndash;546 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAstashyn, A. \u003cem\u003eet al.\u003c/em\u003e Rapid and sensitive detection of genome contamination at scale with FCS-GX. Genome Biol. 25, 60 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. \u0026amp; Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032\u0026ndash;2034 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eManni, M., Berkeley, M. R., Seppey, M., Sim\u0026atilde;o, F. A. \u0026amp; Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 
38, 4647\u0026ndash;4654 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSeemann, T. Prokka: rapid prokaryotic genome annotation. Bioinforma. Oxf. Engl. 30, 2068\u0026ndash;2069 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrant, J. R. \u003cem\u003eet al.\u003c/em\u003e Proksee: in-depth characterization and visualization of bacterial genomes. Nucleic Acids Res. 51, W484\u0026ndash;W492 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaur, R. \u003cem\u003eet al.\u003c/em\u003e Living in the endosymbiotic world of Wolbachia: A centennial review. Cell Host Microbe 29, 879\u0026ndash;893 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, W. \u003cem\u003eet al.\u003c/em\u003e RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Res. 49, D1020\u0026ndash;D1028 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEmms, D. M. \u0026amp; Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDuarte, E. H., Carvalho, A., L\u0026oacute;pez-Madrigal, S., Costa, J. \u0026amp; Teixeira, L. Forward genetics in Wolbachia: Regulation of Wolbachia proliferation by the amplification and deletion of an addictive genomic island. PLoS Genet. 17, e1009612 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDarling, A. C. E., Mau, B., Blattner, F. R. \u0026amp; Perna, N. T. Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Genome Res. 14, 1394\u0026ndash;1403 (2004).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlmagro Armenteros, J. J. \u003cem\u003eet al.\u003c/em\u003e SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 
37, 420\u0026ndash;423 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKrogh, A., Larsson, B., von Heijne, G. \u0026amp; Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567\u0026ndash;580 (2001).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMagliery, T. J. \u0026amp; Regan, L. Sequence variation in ligand binding sites in proteins. BMC Bioinformatics 6, 240 (2005).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePrigozhin, D. M. \u0026amp; Krasileva, K. V. Analysis of intraspecies diversity reveals a subset of highly variable plant immune receptors and predicts their binding sites. Plant Cell 33, 998\u0026ndash;1015 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCabanettes, F. \u0026amp; Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958 (2018).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Wolbachia, Drosophila, 
symbiosis, genomics","lastPublishedDoi":"10.21203/rs.3.rs-4510571/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4510571/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e \u003cem\u003eWolbachia\u003c/em\u003e is an obligate intracellular \u0026#120572;-proteobacterium which commonly infects arthropods and filarial nematodes. Different strains of \u003cem\u003eWolbachia\u003c/em\u003e are capable of a wide range of regulatory manipulations in many hosts and modulate host cellular differentiation to influence host reproduction. The genetic basis for the majority of these phenotypes is unknown. The \u003cem\u003ew\u003c/em\u003eWil strain from the neotropical fruit fly, \u003cem\u003eDrosophila willistoni\u003c/em\u003e, exhibits a remarkably high affinity for host germline-derived cells relative to the soma. This trait could be leveraged for understanding how \u003cem\u003eWolbachia\u003c/em\u003e influences the host germline and for controlling host populations in the field. To further the use of this strain in biological and biomedical research, we sequenced the genome of the \u003cem\u003ew\u003c/em\u003eWil strain isolated from host cell culture cells. Here, we present the first high quality nanopore assembly of \u003cem\u003ew\u003c/em\u003eWil, the \u003cem\u003eWolbachia\u003c/em\u003e endosymbiont of \u003cem\u003eD. willistoni\u003c/em\u003e. Our assembly resulted in a circular genome of 1.27 Mb with a BUSCO completeness score of 99.7%. 
Consistent with other insect-associated \u003cem\u003eWolbachia\u003c/em\u003e strains, comparative genomic analysis revealed that wWil has a highly mosaic genome relative to the closely related wMel strain from \u003cem\u003eDrosophila melanogaster\u003c/em\u003e.\u003c/p\u003e","manuscriptTitle":"Complete de novo assembly of Wolbachia endosymbiont of Drosophila willistoni using long-read genome sequencing","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-06-14 19:46:40","doi":"10.21203/rs.3.rs-4510571/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2024-06-25T04:41:20+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-06-21T08:44:21+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-06-18T13:52:52+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"292092502506644142761825411309042425389","date":"2024-06-18T00:10:57+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"307348815031541655201589566745332398463","date":"2024-06-10T12:50:40+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-06-10T12:47:32+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-06-10T12:30:43+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2024-06-04T11:31:54+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-06-01T10:08:02+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2024-05-31T18:00:24+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific 
Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"8519d383-0b70-4311-8036-378cf288ce2d","owner":[],"postedDate":"June 14th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":33159177,"name":"Biological sciences/Genetics/Genomics"},{"id":33159178,"name":"Biological sciences/Microbiology/Bacteria/Symbiosis"}],"tags":[],"updatedAt":"2024-08-05T16:06:51+00:00","versionOfRecord":{"articleIdentity":"rs-4510571","link":"https://doi.org/10.1038/s41598-024-68716-w","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2024-08-01 15:58:09","publishedOnDateReadable":"August 1st, 2024"},"versionCreatedAt":"2024-06-14 19:46:40","video":"","vorDoi":"10.1038/s41598-024-68716-w","vorDoiUrl":"https://doi.org/10.1038/s41598-024-68716-w","workflowStages":[]},"version":"v1","identity":"rs-4510571","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4510571","identity":"rs-4510571","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
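
The short-read filtering step described under "Genome polishing and quality assessment" (dropping alignments whose minimap2 `de` tag exceeds 0.04 before running Pilon) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' script: the function names `de_value`, `keep_alignment`, and `filter_sam` are hypothetical, the input is assumed to be plain-text SAM carrying the `de:f:` optional tag, and dropping records that lack the tag is an assumption the text does not address.

```python
from typing import Iterable, Iterator, Optional

MAX_DE = 0.04  # divergence threshold quoted in the text


def de_value(sam_line: str) -> Optional[float]:
    """Return the value of the de:f: tag of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags follow the 11 mandatory columns
        if tag.startswith("de:f:"):
            return float(tag[len("de:f:"):])
    return None


def keep_alignment(sam_line: str, max_de: float = MAX_DE) -> bool:
    """Keep header lines and records whose divergence is at or below max_de."""
    if sam_line.startswith("@"):
        return True
    de = de_value(sam_line)
    # Assumption: records without a de tag (e.g. unmapped reads) are dropped.
    return de is not None and de <= max_de


def filter_sam(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the SAM lines that pass keep_alignment."""
    return (line for line in lines if keep_alignment(line))
```

In practice such a filter would sit between the minimap2 alignment and the Pilon run, reading SAM text from a pipe or file and writing the surviving records back out.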

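The variable-site analysis in "Genome annotations and assessments" relies on a Shannon entropy metric computed over protein alignments. A minimal sketch of that idea is given below, scoring each alignment column and reporting the columns at or above a cutoff. The function names, the 1.5-bit default cutoff, and the decision to ignore gap characters are assumptions made for illustration; the text does not state the parameters the authors used.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str, ignore: str = "-") -> float:
    """Shannon entropy, in bits, of the residues in one alignment column."""
    residues = [c for c in column if c not in ignore]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def variable_sites(alignment: List[str], cutoff: float = 1.5) -> List[int]:
    """Return 0-based indices of columns whose entropy is >= cutoff.

    The cutoff value is an illustrative assumption, not taken from the text.
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return [
        i
        for i, col in enumerate(zip(*alignment))
        if column_entropy("".join(col)) >= cutoff
    ]
```

Counting the indices returned for each orthogroup alignment would give a per-orthogroup number of variable sites, comparable in spirit to the medians reported for secreted, membrane, and all proteins.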
