PLOS Pathogens
Identification of a viral gene essential for the genome replication of a domesticated endogenous
virus in ichneumonid parasitoid wasps.
Short title (70 characters): A viral gene essential for ichneumonid DEV local DNA amplification

Ange LORENZI 1,2¶, Fabrice LEGEAI 3,4¶, Véronique JOUAN 1, Pierre-Alain GIRARD 1, Michael R.
STRAND 2, Marc RAVALLEC 1, Magali EYCHENNE 1, Anthony BRETAUDEAU 3,4, Stéphanie ROBIN 3,4,
Jeanne ROCHEFORT 1, Mathilde VILLEGAS 1, Denis TAGU 3, Gaelen R. BURKE 2, Rita REBOLLO 5,
Nicolas NÈGRE 1*, Anne-Nathalie VOLKOFF 1*.

1 DGIMI, Montpellier University, INRAE, Montpellier, France
2 Department of Entomology, University of Georgia, Athens, Georgia, 30602, United States
3 INRAE, UMR Institut de Génétique, Environnement et Protection des Plantes (IGEPP), BioInformatics
Platform for Agroecosystems Arthropods (BIPAA), Campus Beaulieu, 35042 Rennes, France
4 INRIA, IRISA, GenOuest Core Facility, Campus de Beaulieu, Rennes 35042, France
5 Univ Lyon, INRAE, INSA Lyon, BF2I, UMR 203, 69621 Villeurbanne, France

* Corresponding authors:
Anne-Nathalie VOLKOFF,
[email protected]
Nicolas NÈGRE,
[email protected]

¶ These authors contributed equally to this work.

bioRxiv preprint doi: https://doi.org/10.1101/2024.01.18.576166; this version posted January 18, 2024.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder,
who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a
CC-BY 4.0 International license.

Abstract (300 words)
Thousands of endoparasitoid wasp species in the families Braconidae and Ichneumonidae harbor
"domesticated endogenous viruses" (DEVs) in their genomes. This study focuses on ichneumonid
DEVs, named ichnoviruses (IVs), which derive from an unknown virus and produce virions in ovary calyx
cells during the pupal and adult stages of female wasps. Females inject IV virions into host insects when
laying eggs. Virions infect host cells, which then express IV genes with functions required for wasp
progeny development. IVs have a dispersed genome consisting of two genetic components: proviral segment
loci that serve as templates for circular dsDNAs that are packaged into capsids, and genes from an
ancestral virus controlling virion production. Because of the lack of homology with known viral genes,
the molecular mechanisms that control the IV genome are largely uncharacterized. We generated a
chromosome-scale genome assembly for Hyposoter didymator and identified a total of 67 H. didymator
ichnovirus (HdIV) loci distributed across the 12 wasp chromosomes. By analyzing genomic DNA levels,
we found that all HdIV loci were locally amplified in calyx cells during the wasp pupal stage, suggesting
the involvement of viral proteins in DNA replication. We tested a candidate HdIV gene, U16, which encodes
a protein with a conserved domain found in primases and is transcribed in calyx cells during the
initial stages of replication. Knockdown of U16 by RNA interference inhibited amplification of all HdIV
loci, as well as HdIV gene transcription, circular molecule production and virion morphogenesis in calyx
cells. Altogether, our results show that viral DNA amplification is an early step of IV replication
essential for virion production, and demonstrate the involvement of the viral gene U16 in this process.

Author Summary (150-200 words)
Parasitoid "domesticated endogenous viruses" (DEVs) provide a fascinating example of eukaryotes
acquiring new functions through integration of a virus genome. DEVs consist of multiple loci in the
genomes of wasps. Upon activation, these elements collectively orchestrate the production of virions or
virus-like particles that are crucial for successful parasitism of host insects. Despite the significance of
DEVs for parasitoid biology, the mechanisms regulating key steps in virion morphogenesis are largely
unknown. In this study, we focused on the ichneumonid parasitoid Hyposoter didymator, which harbors
an ichnovirus consisting of 67 proviral loci. Our findings reveal that all proviral loci are simultaneously
amplified in ovary calyx cells of female wasps during the early pupal stage, suggesting a hijacking of
cellular replication complexes by viral proteins. We tested the involvement of one such candidate, U16,
which encodes a protein with a weakly conserved primase C-terminal domain. Silencing U16
inhibited viral DNA amplification and virion production, underscoring the key role of this gene in
ichnovirus replication. This study provides evidence that genes involved in viral DNA replication have
been conserved during the domestication of viruses in the genomes of ichneumonid wasps.

Introduction
Endogenous viral elements (EVEs) refer to viral sequences in eukaryotic genomes that originate from
complete or partial integration of a viral genome into the germline [1]. While retroviruses are the
best-known sources of EVEs, bioinformatic studies have also identified non-retroviral EVEs across a
diverse range of organisms [2]. Although many EVEs become non-functional and decay through neutral
evolution [3], some have been preserved and repurposed by their hosts for new functions, often as short
regulatory sequences or individual genes [4,5]. A notable exception to this pattern is observed in
domesticated endogenous viruses (DEVs) that have been identified in four lineages of endoparasitoid
wasps, insects that lay eggs and develop within the bodies of other insects [6]. Parasitoid DEVs consist
of numerous genes conserved within the wasp genome that originate from the integration of complete
viral genomes. Unlike other EVEs, these genes remain functional and actively interact to produce virus
particles in calyx cells, which are located in the apical part of the oviducts of female wasps [7]. Viral
particles are produced in the pupal and adult stages, and accumulate in the oviducts of the wasp. Adult
female wasps inject these particles along with eggs into insect hosts where they have essential functions
in the successful development of wasp offspring [8].
Parasitoid DEVs are prevalent among species in two wasp families named the Braconidae and
Ichneumonidae. The DEVs identified in these families have evolved from different virus ancestors but
through convergence have been similarly repurposed to produce either virions containing circular
double-stranded (ds) DNAs or virus-like particles (VLPs) lacking nucleic acid. The hyperdiverse
Microgastroid complex in the family Braconidae harbors DEVs named bracoviruses (BVs). BVs evolved
from a virus ancestor in the family Nudiviridae [9]. Wasps harboring BVs produce virions containing
circular dsDNAs. Other braconids in the subfamily Opiinae and ichneumonids in the subfamily
Campopleginae independently acquired two other distinct nudiviruses that wasps have coopted to
produce VLPs [10, 11]. The fourth identified DEV lineage, named ichnoviruses (IVs), is present in two
ichneumonid subfamilies (Campopleginae and Banchinae), which produce virions containing circular
dsDNAs. Unlike the other three DEVs, IVs likely originated from a Nucleocytoplasmic Large DNA Virus
(NCLDV), but the precise ancestor remains unknown [12, 13].
BVs have been studied more extensively than IVs, but the latter are intriguing because of their
uncertain origins. Despite differences in ancestry and gene content, BV and IV genomes are similarly
organized into two components that have distinct functions [14]. Insights into the genome components
of IVs primarily derive from sequencing two campoplegine wasps named Hyposoter didymator and Campoletis
sonorensis [15], along with calyx transcriptome studies [12, 13, 16, 17] and proteomic analyses of
purified virions [12, 13]. The first genome component of IVs consists of domains in the wasp genome that
show evidence of deriving from the virus ancestor and having essential functions in virion formation. These
domains, named "Ichnovirus Structural Protein Encoding Regions" (IVSPERs), contain intronless genes
that are specifically transcribed in calyx cells [12, 13, 17]. Most IVSPER genes are transcribed at the
onset of pupation in hyaline stage 1 pupae [16], and some genes in IVSPERs encode proteins
associated with IV virions [12, 13]. Six genes have been knocked down by RNA interference (RNAi) in
H. didymator, demonstrating that they have functions in virion assembly or cell trafficking [16]. Five
IVSPERs have been identified in the H. didymator and C. sonorensis genomes [15], while three have
been identified in the genome of the more distantly related banchine Glypta fumiferanae [13]. The content
of IVSPER genes is notably similar between ichneumonid wasp species [12, 13, 17], and their gene
order is well-conserved among campoplegine species [15]. Additionally, one intronless gene (U37) was
identified in the H. didymator and C. sonorensis genomes outside of any IVSPER with features
suggesting it also derives from the virus ancestor [15]. Together, these genes, whether found within or
outside IVSPERs, represent the fingerprints of the ancestral viral machinery essential for virion
production and are designated as IV core replication genes. Notably, none of these genes are packaged
in virions, indicating that IV core genes can only be transmitted vertically through the germline of
associated parasitoids.
The second component of IV genomes consists of domains referred to as "proviral segments," which are
amplified in calyx cells and produce the circular dsDNAs that are packaged into capsids [18, 19].
Proviral segments, which typically number more than 50, are widely dispersed in wasp genomes, and
their number varies considerably between wasp species [15]. Each proviral segment is characterized by flanking
direct repeats (DRs) of variable length (1 kb) and homology that identify where homologous
recombination processes occur to produce circularized DNAs [18, 19]. Some IV proviral segments also
contain internal repeats that facilitate additional homologous recombination events, and produce
multiple overlapping or nested circularized DNAs per proviral segment [15, 18]. Proviral segments
encode genes with and without introns that are predominantly expressed in the hosts of wasps after
virion infection [20, 21, 22, 23]. While IV core replication genes represent the conserved viral machinery
that produces virions in calyx cells, proviral segments constitute the IV genome components that virions
transfer to the hosts wasps parasitize. These segments also play a major role in the virulence of IVs,
which contributes to the successful development of parasitoid progeny.
The replication of IVs, encompassing the processes leading to the production of virions containing IV
segments, occurs within the nuclei of calyx cells during pupal and adult developmental stages [7, 24].
Electron microscopy studies of H. didymator ichnovirus (HdIV) show that fusiform-shaped capsids are
individually enveloped in the nuclei of calyx cells during the late pupal stage (pigmented pupae, stage
3) [16]. These enveloped "subvirions" exit the nucleus, traverse the cytoplasm, and exit calyx cells by
budding, resulting in mature virions with two envelopes that accumulate in the calyx lumen of the ovaries
[7, 24]. Earlier findings indicated that IVSPERs and proviral segments undergo amplification in newly
emerged adult wasps [12]. However, these data focused on only a subset of IVSPER genes and one
proviral segment, leaving our knowledge of whether all IV genome components are amplified in calyx
cells incomplete. Similarly, the initiation time of amplification during pupal development and IV virion
production remains unknown. The specific role of IV core genes in virion production is also poorly
documented when compared to BVs [25, 26]. The limited sequence homology of IVSPER genes with
genes in other viruses provides minimal insights into potential functions. To date, only the six genes
mentioned above that are involved in subvirion assembly or cell trafficking have been studied [16].
In this work, we explored IV replication using the campoplegine wasp H. didymator. We first generated
a chromosome-level assembly for the H. didymator genome. Through this assembly, we determined
that all genome components undergo local amplification in calyx cells which initiates between pupal
stages 1 and 2. Notably, IVSPERs, isolated IV core genes, and proviral segments were amplified in
large regions with non-discrete boundaries. Next, we studied the function of U16 which is located on H.
didymator IVSPER-3. U16 is one of the most transcribed IVSPER genes during the initial pupal stage
and contains a weakly conserved domain found in the C-terminus of primases. RNAi knockdown of U16
inhibited virion formation. Knockdown also significantly reduced DNA amplification of all HdIV genome
components, which decreased transcript abundance of IV core genes and the abundance of circular
dsDNA viral molecules. We conclude U16 is an essential gene for amplification of the HdIV genome and
virion production, demonstrating that genes from the IV ancestor regulating IV replication have been
conserved during virus domestication. Additionally, our results show that viral DNA amplification is
essential for IV virion production.
Results
Genomic localization of Hyposoter didymator IV components in a novel chromosome-level
assembly.
The genome assembly for H. didymator we previously generated [15] consisted of 2,591 scaffolds with
an N50 of 4 Mbp. We concluded this assembly was overly fragmented to evaluate DNA amplification in
calyx cells during virion morphogenesis. We therefore used proximity ligation technology to produce a
new chromosome-level assembly consisting of twelve large scaffolds that correspond to the haploid
karyotype for H. didymator [27]. The sizes of these scaffolds ranged from 6.7 Mbp to 29.3 Mbp (S1
Dataset A, B).
The five IVSPERs (IVSPER-1 to IVSPER-5), the predicted IV core gene (U37) located outside of an
IVSPER, and 53 of the 54 previously identified proviral segment loci (Hd1 to Hd54) [15] were identified
in the new assembly. The new assembly did not include the scaffold containing Hd51, possibly due to
low-quality sequencing data (S1 Dataset, B). Our chromosome-level assembly revealed that each
scaffold contained at least one HdIV locus, but notably, all IVSPERs and 40% of the proviral segment
loci resided on two scaffolds (Scaffold-7 and Scaffold-11) (S1 Dataset, B).
While three IVSPERs and the majority of proviral segments were distantly located from each other in
the H. didymator genome, there were exceptions to this pattern including certain pairs of proviral
segments separated by less than 20 kb (e.g., Hd36 and Hd38; Hd46 and Hd43; Hd44.1 and Hd44.2;
Hd12 and Hd16). In all of these cases, the paired segments exhibited significant homology which
suggested they derive from recent duplication events (S1 Dataset, C). Additionally, several proviral
segments were in proximity to IVSPERs or IV replication genes that resided outside of IVSPERs (e.g.,
Hd46 near U37; Hd29 and Hd24 on each side of IVSPER-2; Hd15 near IVSPER-1; also see below).
Amplification of Hyposoter didymator IV genome components in calyx cells during wasp pupal
development.
To investigate whether all or only specific components of the HdIV genome undergo amplification in
association with virion morphogenesis, we isolated DNA from calyx cells from stage 1 pupae (one day
old, hyaline) and stage 3 pupae (five days old, pigmented abdomen). We then generated paired-end
libraries, which were sequenced using the Illumina platform, followed by read alignment to the new
chromosome-level genome assembly. When analyzing the reads from stage 1 pupae, read coverage
per HdIV locus did not differ significantly from the coverage of randomly selected regions of the same
size from the rest of the wasp genome (Fig 1A). In contrast, read coverage for stage 3 pupae was higher
for all HdIV loci when compared to the rest of the wasp genome or to values obtained for pupal stage 1
(Fig 1A, S1 Table).
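The comparison described above, coverage of a locus against randomly selected regions of the same size, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the number of control regions, and the toy depth values are all assumptions, and the statistical test behind the significance levels in Fig 1A is not reproduced here.

```python
import random
from statistics import fmean


def mean_coverage(depth, start, end):
    """Mean per-base read depth over the half-open interval [start, end)."""
    return fmean(depth[start:end])


def sample_control_regions(genome_len, size, excluded, n, rng):
    """Draw n random intervals of `size` bp that overlap no excluded interval."""
    regions = []
    while len(regions) < n:
        start = rng.randrange(0, genome_len - size + 1)
        end = start + size
        if all(end <= ex_start or start >= ex_end for ex_start, ex_end in excluded):
            regions.append((start, end))
    return regions


def locus_vs_background(depth, locus, n_controls=100, seed=0):
    """Ratio of mean locus coverage to mean coverage of size-matched random regions."""
    rng = random.Random(seed)
    start, end = locus
    controls = sample_control_regions(len(depth), end - start, [locus], n_controls, rng)
    background = fmean(mean_coverage(depth, s, e) for s, e in controls)
    return mean_coverage(depth, start, end) / background


# Toy scaffold (hypothetical numbers): uniform depth 10, one 1 kb locus at depth 200.
depth = [10.0] * 10_000
depth[4_000:5_000] = [200.0] * 1_000
ratio = locus_vs_background(depth, (4_000, 5_000))
```

Matching the control regions to the locus size keeps the comparison fair, because mean coverage over short windows is noisier than over long ones.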
To more precisely investigate the temporal dynamics of amplification, we conducted relative quantitative
(q) PCR assays that measured copy number of genes in IVSPER-1, -2, and -3 in calyx DNA samples
that were collected from stage 1-4 pupae. We compared these samples to DNA samples from hind
legs of stage 1 pupae, where no HdIV replication occurs. We also included a wasp gene (XRCC1) located
in close proximity to IVSPER-1. Results showed that copy number of each tested gene was similar in
calyx and hind legs in stage 1 pupae, indicating none were amplified during the initial pupal stage.
Subsequently, the copy number of each gene increased progressively with each pupal stage (Fig 1B).
While exhibiting lower amplification levels than the IVSPER genes we analyzed, a similar trend was
observed for the wasp gene XRCC1 (Fig 1B). These findings indicated that IVSPER amplification in calyx
cells begins between pupal stages 1 and 2 and increases further in pupal stages 3 and 4.
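As an illustration of how relative copy number can be derived from qPCR data of this kind, the sketch below uses the common 2^-ddCt approximation. The text does not state which quantification model was used, so the formula, the assumption of roughly 100% primer efficiency, and the Ct values are all hypothetical.

```python
def relative_copy_number(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene, normalized to a reference gene and a calibrator sample.

    Uses the 2^-ddCt approximation, which assumes both primer pairs amplify
    with ~100% efficiency (an assumption, not stated in the text).
    """
    delta_sample = ct_target - ct_reference              # target vs. reference in the sample
    delta_calibrator = ct_target_cal - ct_reference_cal  # same difference in the calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: an IVSPER gene and the reference gene in calyx DNA,
# with hind-leg DNA (no HdIV replication) as the calibrator.
fold = relative_copy_number(ct_target=20.0, ct_reference=24.0,
                            ct_target_cal=25.0, ct_reference_cal=24.0)
```

With these invented values the target is 32-fold more abundant in calyx DNA than in the calibrator; a value near 1 would correspond to the stage 1 result reported above.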
Fig 1. DNA amplification of HdIV loci. (A) Coverage of HdIV loci compared to the rest of the wasp
genome. Read coverage values per analyzed region (see Materials and Methods) are presented for
each locus type (proviral segments and IVSPERs) at pupal stage 1 (hyaline pupa) and pupal stage 3
(pigmented pupa). The coverages per HdIV locus are compared to the coverage per random genome
regions outside of HdIV loci (wasp). Note that the coverage value for random wasp regions is lower for
DNA samples collected from stage 3 versus stage 1 pupae. This difference is attributed to the higher
proportion of reads mapping to HdIV regions among the total number of reads in stage 3 compared to
stage 1. The significance levels are indicated as follows: ns = non-significant, **p<0.01, and ***p<0.001.
(B) qPCR analysis of select IVSPER genes in calyx cells during wasp pupal development. Top panel.
A schematic representation of H. didymator IVSPERs-1, -2, and -3 (GenBank GQ923581.1,
GQ923582.1, and GQ923583.1); genes selected for qPCR assays are highlighted in white. U1-24 are
unknown protein-encoding genes, while IVSPs are members of a gene family encoding ichnovirus
structural proteins. Bottom panel. Genomic (g) DNA amplification levels of IVSPER genes and wasp
XRCC1 in calyx cells from pupal stage 1-4. The XRCC1 (X-Ray Repair Cross Complementing 1)
encoding gene is located 1,200 bp from U1 (position 3,270,470 to 3,272,519 in Scaffold-11). Data
corresponds to gDNA amplification relative to amplification of the housekeeping gene elongation factor
1 (ELF1). The Y-axis was transformed using the square root function for better data visualization.
Differential levels of amplification across all components of the HdIV genome
The qPCR results presented in Fig 1 indicated amplification levels varied, with genes in IVSPER-3
exhibiting higher levels of amplification than genes in IVSPER-1 and -2 (Fig 1B). This variability was
corroborated genome-wide by analyzing read coverage per position and the ratio between stage 3 and
stage 1 (Fig 2, S1 Fig). Amplification levels of IVSPER loci, determined at the summit of the coverage
curve, ranged from 10X for IVSPER-5 in Scaffold-7 to over 200X for IVSPER-3 in Scaffold-3 (S1 Table).
This observation aligned with the findings from qPCR analyses, indicating that genes in IVSPER-3 were
more highly amplified than those in IVSPER-1 and -2 (Fig 1B). Read mapping further indicated that the
peak of amplification occurs toward the middle of each IVSPER (Fig 1B, S1 Fig), consistent with qPCR
analyses revealing that within each IVSPER, genes closer to the cluster boundary tended to exhibit
lower levels of amplification compared to genes situated in the middle of the cluster (Fig 1B).
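The per-position amplification ratio used in this analysis (counts per million, CPM, in stage 3 divided by CPM in stage 1, over fixed-width bins) can be sketched as below. The bin counts, library sizes, and the choice to leave bins without stage 1 reads undefined are assumptions made for illustration only.

```python
def counts_per_million(bin_counts, total_mapped_reads):
    """Scale raw per-bin read counts to counts per million mapped reads (CPM)."""
    return [count * 1_000_000 / total_mapped_reads for count in bin_counts]


def amplification_curve(counts_stage3, total_stage3, counts_stage1, total_stage1):
    """Per-bin CPM ratio (stage 3 / stage 1); None where stage 1 has no reads."""
    cpm3 = counts_per_million(counts_stage3, total_stage3)
    cpm1 = counts_per_million(counts_stage1, total_stage1)
    return [c3 / c1 if c1 > 0 else None for c3, c1 in zip(cpm3, cpm1)]


# Hypothetical read counts for three consecutive 10 bp bins and two library sizes.
curve = amplification_curve([200, 20, 5], 2_000_000, [10, 10, 0], 1_000_000)
```

Normalizing each library to CPM before taking the ratio removes the effect of sequencing depth, so the curve reflects a change in relative abundance between the two pupal stages rather than a difference in the number of reads sequenced.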
Fig 2. HdIV DNA amplification. DNA amplification in pupal stage 3 was assessed by mapping genomic
DNA Illumina reads against the 12 large H. didymator genome scaffolds. In each scaffold, red bars
indicate amplified loci, with the intensity of red corresponding to increased values of the CPM ratio
between pupal stage 3 and pupal stage 1. The positions of IVSPERs and isolated IV replication genes
are indicated by purple squares, while proviral segments are indicated by green circles. For selected
HdIV loci, amplification curves (representing the ratio of the CPM values calculated for 10 bp intervals
between pupal stage 3 and pupal stage 1) are shown in boxes. Amplification curves for all of the
annotated HdIV loci are shown in S1 Fig. Each HdIV locus is indicated in red while 10,000 bp of flanking
sequence on each side of the locus is also shown. For proviral segments, loci are defined as the
sequence delimited by two direct repeats; IVSPERs are defined as the region between the start and
stop codon of the first and last coding sequences in the cluster; isolated IV replication genes are defined
by their coding sequence.
Proviral segment loci were relatively more amplified than IV replication gene loci, and also variable in
intensity (Fig 2, S1 Fig). For example, coverage ratio between stages 3 and 1 ranged from 30X for
proviral locus Hd40 in Scaffold-6 to over 1,100X for Hd27 in Scaffold-7 (S1 Table) at the summit of the
coverage curves. Variability in the number of reads mapping to a given proviral locus was consistent
with earlier studies indicating that the circularized DNAs packaged into IV capsids are non-equimolar in
abundance [8, 28].
All proviral segments consistently exhibited a substantial increase in amplification that peaked between
the two DRs (as exemplified by Hd14 or Hd12 in S2 Fig). For numerous proviral loci, the reads mapping
between the flanking DRs displayed uniform coverage. However, in other cases, peaks with varying
read coverage were evident (as exemplified by Hd32 or Hd16 in S2 Fig). This differential coverage
usually applied to proviral segments containing more than one pair of DRs, as illustrated by proviral
locus Hd11 (Fig 3A) or Hd32 and Hd16 (S2 Fig). Previous studies indicated Hd11 contains two pairs of
DRs, enabling the formation of two nested, circularized segments termed Hd11-1 (formed by
recombination between DR1Left (DR1L) and DR1Right (DR1R)) and Hd11-2 (formed by recombination
between DR2L and DR2R) (Fig 3A). Reads mapping to the Hd11 locus (bounded by DR1L and DR2R)
exhibited three relatively uniform plateaus of different values. Two plateaus corresponded to reads
mapping to the predicted locations of Hd11-1 (235X) and Hd11-2 (111X), while the central region with
higher coverage (311X) corresponded to reads mapping to both nested segments (Fig 3A). This
differential coverage would not be expected if reads mapped only to Hd11 chromosomal DNA.
Consequently, the pattern of proviral segment coverage suggested that part of the coverage was
due to reads mapping to amplification intermediates and/or circularized dsDNAs that were also present
in our DNA samples. Some amplified HdIV loci contain both an IVSPER and proviral segments. Two of
these loci resided on Scaffold-11 (Hd29, IVSPER-2, and Hd24; Hd33, Hd15, and IVSPER-1) (Fig 3B). For
these loci, the amplification curves spanned the length of the amplified region (yellow dotted line in Fig
3B) but were interrupted by peaks corresponding to the length of proviral segments. This pattern
suggested amplification levels of the chromosomal form of the proviral segments could correspond to
the IVSPER amplification curves, but were higher because reads additionally mapped to circular
dsDNAs or amplification intermediates.
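One way to make the reasoning about the three Hd11 plateaus concrete is a simple additive model, in which each plateau is the sum of a shared chromosomal signal and the signal from whichever circular (or intermediate) forms cover that interval. This model is an assumption introduced here for illustration; the authors do not report such a decomposition, and CPM ratios need not add linearly in practice.

```python
def decompose_overlapping_segments(only_first, only_second, overlap):
    """Split three plateau values into chromosomal and per-segment contributions.

    Assumed additive model (illustrative only):
        only_first  = chromosomal + segment_1
        only_second = chromosomal + segment_2
        overlap     = chromosomal + segment_1 + segment_2
    """
    chromosomal = only_first + only_second - overlap
    return {
        "chromosomal": chromosomal,
        "segment_1": only_first - chromosomal,
        "segment_2": only_second - chromosomal,
    }


# Plateau values reported for Hd11: Hd11-1 only, Hd11-2 only, and the shared central region.
parts = decompose_overlapping_segments(only_first=235, only_second=111, overlap=311)
```

Under this assumed model the Hd11 values split into a shared component of 35 and segment-specific components of 200 (Hd11-1) and 76 (Hd11-2). These figures are only a worked example of the argument that coverage beyond the chromosomal copy is needed to explain unequal plateaus.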
Fig 3. HdIV amplified regions in Scaffold-11. (A) Detail of the amplified region at the Hd11 locus. (B)
Detail of two other amplified regions containing IVSPERs and HdIV proviral loci. In (A) and (B),
amplification curves represent the ratio of the CPM values (calculated for 10 bp intervals) obtained in
pupal stage 3 compared to pupal stage 1. For each locus, amplification values at the summit of the
peaks and at the start and end positions of HdIV segments are indicated. In (B), amplification curves
of IVSPERs are highlighted in yellow. Each amplification curve figure was generated with the Integrative
Genomics Viewer (IGV) [29].
Amplification of H. didymator IV genome components in extensive wasp genome domains with
undefined boundaries
Since our read coverage data indicated amplified regions were larger than the annotated HdIV loci (Fig
2, S1 Fig), we used the MACS2 peak calling program, originally developed for chromatin
immunoprecipitation sequencing experiments, to identify areas in the H. didymator genome that were
enriched for reads when compared to a control [30]. Amplification peaks were called with MACS2 using
alignments from stage 3 pupae as the treatment and alignments from stage 1 pupae as the control.
MACS2 identified all HdIV genome components that we had annotated in our earlier study [15] plus
several previously unrecognized domains (S2 Table). Manual curation (see Materials and Methods
section) indicated three of these new domains were proviral segment loci that we named Hd52, Hd53,
and Hd54. Five others were intronless genes, suggesting origins from the IV ancestor, that were outside
of IVSPERs. We thus named these genes U38, U39, U40, U41, and U42. The remaining domains
detected by MACS2 either contained predicted wasp genes or lacked any features that identified them
as IV replication genes or proviral segments. Altogether, the MACS2 algorithm predicted a total of 55
domains in the H. didymator genome containing HdIV loci. Two proviral segments (Hd45.1 on Scaffold-4
and Hd2-like on Scaffold-7) escaped MACS2 detection, possibly because they were located too close
to the ends of each scaffold. However, our read mapping data clearly indicated these two segments are
amplified in stage 3 (Table 1) with a profile similar to the other segments (S1 Fig). In total, our read
mapping and MACS2 data indicated the H. didymator genome contains 67 HdIV loci (56 proviral
segments, five IVSPERs, and six predicted IV replication genes that reside outside of IVSPERs) that
are amplified in calyx cells at pupal stage 3 (Fig 2, Table 1).
Table 1. All HdIV loci amplified in calyx cells from stage 3 pupae identified by read mapping
and/or the MACS2 algorithm. For each scaffold, the position and size of the HdIV loci are indicated.
Loci newly identified in the present work are marked with asterisks. Corresponding amplified regions
(i.e., the peak predicted by the MACS2 algorithm) are provided for each locus or groups of loci. Start
and end positions delimiting the HdIV loci and the amplified regions detected by MACS2 are indicated.
The distance between the start or the end of the amplified region and the locus is presented. For each
HdIV locus and amplified region detected by MACS2, coverage values are provided for calyx cell
samples collected from stage 1 or stage 3 pupae. Coverage is based on the length of the HdIV locus or
amplified region. ND indicates amplified regions not detected by MACS2.
Our overall results also indicated that all amplified regions in the H. didymator genome containing HdIV
loci consist of the annotated HdIV locus plus flanking wasp sequence, consistent with our detailed analysis
of the wasp gene XRCC1, which is located in close proximity to IVSPER-1 (Fig 1B). Across all HdIV loci,
we determined that the flanking regions containing wasp sequence that were amplified varied from 7,000
to 15,000 bp (Table 1). The total size of the amplified regions ranged from 10,692 bp (Hd28 on
Scaffold-12) to 54,005 bp (IVSPER-2 on Scaffold-11). Most amplified regions contained a single HdIV locus, but
seven contained a mix of HdIV genome components (Table 1). Three amplified regions contained the
neighboring and closely related proviral segments mentioned above (e.g., Hd36 and Hd38 on Scaffold-1,
Hd44.1 and Hd44.2 on Scaffold-2, Hd12 and Hd16 on Scaffold-11). In addition to the two examples
noted above on Scaffold-11 (see Fig 3B), two other amplified regions also contained both IV replication
genes located outside of IVSPERs and proviral segments (U37, Hd46, and Hd43 on Scaffold-2; U40 and
Hd39 on Scaffold-9). Lastly, we
searched for sequence signatures that potentially identify the amplification boundaries for each HdIV
locus. However, our analysis identified only low-complexity A-tract sequences, which were not specific
to HdIV components as they were also found in random wasp genomic sequences (S3 Fig). Thus, no
motifs were identified that distinguished the amplification boundaries of HdIV loci.
311 RNAi knockdown of U16 inhibits virion morphogenesis.
312 We selected the gene U16 located on H. didymator IVSPER-3 as a factor with potential functions in
313 activating IV replication. U16 is conserved among all IV-producing wasps for which genome or
314 transcriptome data are available (Fig 4A). In H. didymator, U16 is also one of the most
315 transcribed IV genes detected in calyx cells from stage 1 pupae [16]. Sequence analysis using the basic
316 local alignment search tool and DeepLoc2.0 predicted that all U16 family members contain a C-terminal
317 alpha-helical domain (PriCT-2) of unknown function that is present in several primases [31]
318 and a nuclear localization signal (Fig 4A, S2 Dataset). We next assessed the effects of knocking
319 down U16 by RNAi on virion morphogenesis in calyx cells. We injected newly pupated wasps with
320 dsRNAs that specifically targeted U16 using previously established methods [16]. RT-qPCR analysis
321 indicated that U16 transcript abundance in the calyx of newly emerged adult females was reduced more than
322 90% when compared to control wasps that were injected with dsGFP (Fig 4B). Inspection of the ovaries
323 further indicated that the calyx lumen of control wasps contained blue 'calyx fluid' indicative of HdIV
324 virions being present, whereas almost no calyx fluid was seen in dsU16-injected wasps (Fig 4B).
325 Examination of calyx cell nuclei by transmission electron microscopy similarly showed that calyx cells in
326 one day old control females contained an abundance of subvirions, whereas no subvirions were
327 observed in treatment wasps (Fig 4C). We thus concluded that U16 is required for virion morphogenesis.
328 Fig 4. RNAi knockdown of U16. (A) U16 proteins identified in the campoplegine Hyposoter didymator
329 [12], Campoletis sonorensis [15], and Bathyplectes anurus [17], and in two banchine wasps Glypta
330 fumiferanae [13] and Lissonota sp. [32]. For each, protein size, percentage of identity with H. didymator
331 protein and location of the PRiCT_2 domain are indicated. (B) RT-qPCR data showing relative
332 expression of U16 in dsGFP (control) and dsU16-injected females. ** p<0.01. Images of ovaries
333 dissected from newly emerged adult females that were injected with dsGFP (left) or dsU16 (right). Note
334 the blue color in the oviduct of the dsGFP control indicating the presence of HdIV virions. (C) Schematics
335 and electron micrographs showing that (a) calyx cell nuclei (N) from dsGFP-injected females
336 contain subvirions (V) while (b) calyx cell nuclei from dsU16-injected wasps do not. This results in no
337 accumulation of virions in the calyx lumen as illustrated in the schematic images. CL, calyx lumen; Cyt,
338 cytoplasm. Scale bars = 5 μm, zooms = 1 μm.
339 RNAi knockdown of U16 also disables amplification of HdIV loci
340 Since U16 contained a domain found in primases, we investigated whether RNAi knockdown also
341 disabled amplification of HdIV genome components. We injected newly pupated wasps with dsU16 or
342 dsGFP, followed by isolation and deep sequencing of calyx cell DNA from stage 3 pupae in three
343 independent replicates. Mapping the reads from dsGFP-treated calyx samples to the H. didymator
344 genome indicated all HdIV loci were amplified as evidenced by higher coverage values when compared
345 to random regions of the wasp genome (Fig 5A). Conversely, coverage values did not differ between
346 HdIV loci and other regions of the wasp genome in dsU16-treated calyx samples (Fig 5A). When
347 analyzing coverage for each HdIV genome component (IVSPERs, isolated IV replication genes, or HdIV
348 proviral segments), we also determined that values were systematically lower for dsU16 than for
349 dsGFP treatments (Fig 5B and 5C, S3 Table).
350 Fig 5. Impact of U16 RNAi knockdown on DNA proviral amplification. (A) Comparative distribution
351 of read coverages in dsGFP- and dsU16-injected females. For each of the three replicates, coverage
352 values are given per HdIV loci (V) and per random genome regions outside of the HdIV loci (W), both
353 with the same size distribution. IVSPERs and IV replication genes loci are shown in the left panel, while
354 proviral segment loci are shown in the right panel. (B) Coverage values per IVSPERs, and per IV
355 replication genes residing outside an IVSPER, in the three biological replicates of both dsU16- and
356 dsGFP-injected samples. Names of HdIV loci are indicated as well as the scaffold (Scaf-) they are
357 located in. (C) Coverage values for proviral segment loci in the three biological replicates of the dsU16
358 and dsGFP samples. For better visualization, only the scaffold (Scaf-) in which the proviral segments
359 are located is indicated. The list of the proviral segment loci within each scaffold is available in Table 1.
360 The y-axis was transformed by the log function for better data visualization. Statistical analyses are
361 available at https://github.com/flegeai/EVE_amplification.
362 We extended our analysis by injecting dsGFP or dsU16 into newly formed pupae, followed by isolation
363 of DNA from calyx cells and hind legs, where no HdIV replication occurs. We then used specific primers
364 and qPCR assays that measured DNA abundance of three wasp genes, selected HdIV replication genes
365 inside and outside of IVSPERs, and selected HdIV genes in different proviral segments. As anticipated,
366 no genes were amplified in hind legs from either control or treatment wasps (Fig 6). In dsGFP-injected
367 control wasps, all HdIV genes were amplified in calyx cell samples (Fig 6). Among the wasp genes, only
368 XRCC1 exhibited significant amplification, consistent with its location within the IVSPER-1 amplified
369 region (Fig 6). In contrast, when examining calyx cell DNA from wasps injected with dsU16, none of the
370 HdIV genes nor XRCC1 were amplified (Fig 6). Altogether, our results indicated U16 is required for
371 amplification of all HdIV loci.
372 Fig 6. Impact of U16 RNAi knockdown on amplification of select wasp and HdIV genes. Relative
373 genomic amplification of selected HdIV genes in two-day-old females injected with dsGFP or dsU16.
374 The wasp gene XRCC1, located within the amplified region of the IVSPER-1 locus, was incorporated
375 into the analysis. Wasp histone (H1) and ribosomal protein (rpl) genes served as controls. Samples
376 were obtained from calyx cells (where virions are produced) and hind legs (control). Statistical
377 significance levels are denoted as follows: ns = non-significant, *p<0.05, **p<0.01, and ***p<0.001. The
378 y-axis values were transformed using the square root function for better data visualization.
379 Impact of DNA amplification on IV replication gene transcription levels and abundance of
380 circularized HdIV molecules in calyx cells.
381 We hypothesized that amplification of IV replication genes would increase transcript abundance which
382 in turn would be affected by inhibiting HdIV DNA amplification. We thus compared transcript abundance
383 of various genes in IVSPER-1, -2, and -3, in calyx RNA samples that were collected from wasps treated
384 with dsU16 or dsGFP. U16 knockdown reduced expression of every HdIV replication gene we examined
385 (Fig 7A). Finally, we investigated the impact of U16 knockdown on the abundance of the circularized
386 dsDNAs that are processed from amplified proviral segments. For this assay, we used PCR primers that
387 specifically amplified the proviral form, circularized (episomal) form or both forms of Hd29 (Fig 7B).
388 Results showed a significant reduction in both the proviral and circularized forms of Hd29 in calyx cell
389 DNA from wasps injected with dsU16 when compared to DNA from wasps injected with dsGFP (Fig.
390 7B). Our results thus indicated U16 is required for proviral segment amplification which is also required
391 for production of circularized segments.
392 Fig 7. Impact of U16 RNAi knockdown on HdIV replication gene expression and proviral segment
393 amplification. (A) Relative expression of nine IVSPER genes in 2-day-old adult females injected with
394 dsGFP (control) or dsU16. (B) Relative DNA amplification of the integrated linear (proviral) and
395 circularized (episomal) forms of viral segment Hd29 in 2-day-old adult females injected with dsGFP
396 (control) or dsU16. The left panel illustrates the position of primer pairs designed to selectively amplify
397 the proviral form (Proviral Left and Right, indicated by red and black arrows), the circularized form
398 (Episomal, red arrows), or both (Proviral + Episomal, brown arrows). The right panel presents the relative
399 amplification of each form using DNA from dsGFP- and dsU16-injected females. In both (A) and (B),
400 significance levels are indicated as follows: ns = non-significant, *p<0.05, **p<0.01, and ***p<0.001.
401 The y-axis values were transformed using the square root function for better data visualization.
402 Discussion
403 During parasitism, wasps associated with IVs, BVs and other DEVs simultaneously inject virus-derived
404 particles and eggs into their host. The role of DEV-derived particles in the success of wasp parasitism
405 is well documented in the literature [22, 33, 34]. BVs, which evolved from a nudivirus, share a set of
406 genes homologous to nudivirus and baculovirus core genes. Functional studies, guided in part by these
407 similarities, have provided insights into several key processes underlying BV virion production.
408 BV core genes have been identified that regulate the expression of other BV core genes encoding structural
409 proteins [25], are involved in BV virion formation [25, 26, 35], or are required for processing proviral
410 segments into circular DNA molecules packaged into capsids [25, 26]. In
411 contrast, identifying the components of IV genomes and functions of IV genes regulating replication is
412 more challenging because the hypothesized NCLDV ancestor is unknown. In turn, IV genome
413 components with known or hypothesized functions in replication share little or no homology with known
414 viruses. This study significantly advances understanding of IV replication by generating a chromosome
415 level assembly for the H. didymator genome, presenting several lines of evidence showing that all HdIV
416 loci are amplified in calyx cells when virions are being produced, and identifying U16 as an essential
417 gene for amplification of all HdIV loci and virion formation. This study also highlights the critical role of
418 viral DNA amplification for IV virion production.
419 Earlier studies suggested IV proviral segment loci undergo amplification before viral segment processing
420 [18, 19]. Another study indicated amplification of a few IVSPER genes and one proviral segment located
421 in close vicinity of an IVSPER in one-day-old H. didymator adults [12]. However, the question persisted
422 regarding whether all IV genome components were amplified in calyx cells and when amplification
423 initiates during the time-course of virion production. To address these questions, we used our new
424 chromosome-level genome assembly to map domains that undergo amplification in calyx cells during
425 virion morphogenesis. Mapping reads from genomic DNA extracted from H. didymator pupal stages 1 and
426 3 revealed that all HdIV genome components are simultaneously and locally amplified in calyx cells in
427 stage 3 pupae. This analysis further identified five proviral segments and five IV replication genes
428 located outside of IVSPERs that were previously unknown, resulting in a total of 67 HdIV proviral loci
429 dispersed among the 12 H. didymator chromosomes. To elucidate the time-course of HdIV loci
430 replication, the amplification of a subset of IV genome components was analyzed by qPCR. Our results
431 show that HdIV loci amplification initiates between stage 1 and stage 2 pupae and reaches its maximum
432 in stage 4 pupae. The temporal pattern observed in H. didymator is similar to that of BV-associated braconids.
433 In the braconid wasp Chelonus inanitus, where the amplification kinetics of two proviral segments have
434 been studied, local chromosomal amplification does not occur in the initial stages of pupal development
435 [36]. Instead, it is preceded by an increase in DNA content through endoreduplication [37]. The question
436 of whether calyx cell nuclei undergo polyploidization before local DNA amplification occurs in the case
437 of H. didymator has yet to be investigated. Collectively, our results indicate DNA amplification of IV
438 genome components constitutes one of the initial steps of virion morphogenesis.
439 Our data indicate all HdIV loci and genes located outside of IVSPERs are amplified with non-discrete
440 boundaries that extend variable distances into flanking wasp DNA. In contrast to certain integrated
441 viruses, such as polyomaviruses, which can be amplified in an "onion skin" type of replication with
442 replication forks terminating at discrete boundaries [38], IVSPER amplification more closely resembles
443 the local amplification observed in Drosophila follicle cells. In Drosophila, six loci corresponding to
444 chorion genes or genes related to oogenesis are amplified in large regions of about 100 Kbp beyond
445 the genes themselves, without discrete termination sites [39, 40]. Similar to IVSPERs, levels of DNA
446 amplification in Drosophila follicle cells vary among different amplicons [40, 41]. In Drosophila follicle
447 cells, amplification of these loci is associated with repeated firing of origins of replication (ORs)
448 interspersed within each gene cluster. This results in overlapping bidirectional replication forks
449 progressing outward on either side of the ORs [41]. These similarities between the pattern of DNA
450 amplification of Drosophila genes and H. didymator proviral loci suggest that IVSPERs and IV replication
451 genes may also be amplified through repeated firing of ORs present within the loci. However, additional
452 approaches, such as nascent strand sequencing based on λ-exonuclease enrichment [42], will be
453 necessary to identify ORs within IV genome components and validate this hypothesis.
454 Amplification of proviral segment loci is further characterized by a significant increase in read coverage
455 at the Direct Repeat (DR) positions bordering the proviral segments, which serve as sites for
456 homologous recombination and circularization of the segments. This suggests that a portion of the rapid
457 increase in read coverage is due to reads mapping to amplification intermediates and circularized
458 segments. The presence of circular forms in the sequenced genomic DNA samples is supported by our
459 qPCR results for segment Hd29, which indicate the presence of amplicons specific to the circular form
460 of Hd29 (Fig 7B). Accurately quantifying the proportion of reads mapping to the chromosomal form of
461 HdIV segments, and estimating the actual extent of local DNA amplification presents a challenge. This
462 is because paired-end reads that align within HdIV segment loci cannot discriminate between
463 chromosomal HdIV DNA, potential replication intermediates, or circularized DNA. Nevertheless,
464 considering the observed pattern of amplification in regions containing both IVSPERs and segments
465 (Fig 3B), we propose that proviral segment loci may undergo amplification similar to IVSPERs or HdIV
466 replication gene loci. The question persists regarding the subsequent processing of chromosomally
467 amplified DNA and the mechanism behind the generation of a large number of circular molecules. The
468 short-read data generated in this study have several limitations in characterizing whether amplification
469 of proviral segment loci generates concatemeric intermediates and, if so, their orientation. Long-read
470 data will be necessary to address these questions. Nonetheless, our results suggest HdIV proviral
471 segment amplification involves both local chromosomal amplification and amplification of intermediates
472 related to producing the circular dsDNAs that are packaged into capsids.
473 Our interest in U16 stemmed from previous results indicating it is transcriptionally upregulated in calyx
474 cells before the appearance of envelope and capsid components [16]. Sequence analysis during this
475 study revealed a PriCT-2 domain in U16, known from primases in herpesviruses, whose function is
476 unknown but may facilitate the association of the large primase domain (AEP) with DNA [31, 43].
477 Although other known primase domains were not identified in the U16 sequence, the presence of a
478 PriCT-2 domain suggested this protein might play a role in the replication of HdIV genome components.
479 Additionally, our RNAi experiments demonstrate that U16 knockdown resulted in the complete absence
480 of virion production in calyx cell nuclei and calyx fluid. These observations indicated an essential role
481 for U16 in the early stages of viral replication, potentially involved in the amplification of HdIV genome
482 components and/or the transcriptional regulation of IV replication genes. Subsequently, we analyzed
483 the genome-wide impact of RNAi knockdown of U16 on HdIV loci amplification, revealing that this gene
484 is crucial for the amplification of all H. didymator IV genome components. In the case of IV replication
485 genes, reduced amplification was accompanied by a simultaneous significant reduction in transcript
486 abundance, likely resulting in insufficient amounts of HdIV structural proteins. However, amplification
487 and transcript abundance levels did not fully correlate with each other. For instance, U11 and
488 IVSP3-1 (both located on IVSPER-2) exhibit similar amplification patterns (Fig 1), but earlier findings showed
489 that transcript abundances were not the same in calyx cells [15]. Thus, differences in gene expression
490 observed among genes located within the same amplified regions (Fig 1) could also be affected by
491 promoter strength or other factors. On the other hand, inhibition of proviral segment loci amplification
492 had consequences for the abundance of the circularized dsDNAs that are packaged into capsids, which
493 were drastically reduced. Thus, our results identify U16 as an essential protein for virion morphogenesis.
494 However, its precise role in viral replication remains to be understood. Questions to be addressed in the
495 future include whether U16 acts at the initiation or elongation step of HdIV DNA replication, whether it
496 interacts directly with DNA, or with proteins from the replisome complex, which itself could be composed
497 of a mixture of HdIV and wasp proteins.
498 BVs share some features with IVs but also exhibit differences. Notably, in contrast to IVs, where most
499 core genes with functions in virion morphogenesis reside in IVSPERs, many BV core replication genes
500 are widely dispersed in the genomes of wasps [44, 45, 46] and are not amplified in calyx cells during
501 virion morphogenesis [47]. However, the genomes of some BV-producing wasps do contain a ~400 kb
502 DNA domain in which several nudiviral core genes are located, known as the nudivirus-like cluster. This
503 feature potentially identifies a site where the nudivirus ancestor of BVs integrated into the common
504 ancestor of microgastroid braconids [9]. Notably, the nudivirus-like cluster is amplified with non-discrete
505 boundaries [47], similar to what is reported for IV genome components in this study. The observed
506 similarity in the amplification pattern between the BV nudivirus cluster and the proviral components of
507 IVs could suggest they are amplified through a common mechanism, even though the molecules
508 involved differ.
509 BV genomes also contain proviral segment loci with boundaries defined by flanking DRs and amplified
510 in regions that include flanking regions outside of each DR. However, unlike IV proviral segments, the
511 amplified flanking regions in BVs contain very precise nucleotide junctions that identify the boundaries
512 of amplification [47, 48]. It is also known that some BV proviral segments are amplified as head-to-tail
513 concatemers, consistent with a rolling circle amplification mechanism, while others are amplified as
514 head-to-head and tail-to-tail concatemers, suggesting amplification by different mechanisms. However,
515 all of these concatemers are similarly processed into circular DNAs by recombination at a precise site
516 within DRs, which is a tetramer conserved in all BV segments [47, 48]. Nudiviral genes encoding tyrosine
517 recombinases are further known to mediate this homologous recombination event [25, 26]. These types
518 of molecules could also be present in IV genomes but remain to be discovered. Currently, a detailed
519 comparison between BV and IV proviral segment amplification is challenging and will require more
520 information about the machinery involved in the processing of IV proviral segments into circular dsDNAs
521 that are packaged into capsids.
522 Collectively, our results identify U16 as a gene deriving from the IV ancestor that is required for HdIV
523 DNA replication. This suggests that viral regulatory factors required for DNA amplification other than
524 U16 have been preserved in parasitoid genomes. U16 may also interact with wasp cellular machinery
525 in regulating DNA amplification, virion morphogenesis or both. Furthermore, this work emphasizes the
526 value of studying original endogenized viruses, such as those found in parasitoids, to unveil new
527 regulators of DNA processing.
528 Materials and Methods
529 Insects. H. didymator was reared as previously described [49]. Female pupae obtained from cocoons
530 were staged using pigmentation patterns: stage 1, hyaline pupae (approximately 3-day-old);
531 stage 2, pupae with a pigmented thorax (4-day-old); stage 3, pupae with a pigmented thorax and
532 abdomen (5-day-old); stage 4, pharate adults just before emergence.
533 Dovetail Omni-C Library Preparation and Sequencing. DNA from 10 male offspring (i.e., haploid
534 genomes) from a single female H. didymator was sent on dry ice to Dovetail Genomics for Omni-C™
535 library construction. In the process of constructing the Dovetail Omni-C library, chromatin was fixed in
536 place within the nucleus using formaldehyde and subsequently extracted. The fixed chromatin was
537 digested with DNase I followed by repair of chromatin ends and ligation to a biotinylated bridge adapter.
538 Proximity ligation of adapter-containing ends ensued. Post-proximity ligation, crosslinks were reversed,
539 and the DNA was purified. The purified DNA underwent treatment to eliminate biotin not internal to
540 ligated fragments. Sequencing libraries were generated utilizing NEBNext Ultra enzymes and Illumina-
541 compatible adapters. Fragments containing biotin were isolated using streptavidin beads before PCR
542 enrichment of each library. The library was sequenced using the Illumina HiSeqX platform, which
543 generated approximately 30x coverage. Subsequently, HiRise utilized reads with a mapping quality
544 greater than 50 (MQ>50) for scaffolding purposes.
545 Scaffolding the Assembly with HiRise. The de novo assembly from [15] and the Dovetail Omni-C
546 library reads served as input data for HiRise, a software pipeline designed to use
547 proximity ligation data to scaffold genome assemblies [50]. The sequences from the
548 Dovetail Omni-C library were aligned to the initial draft assembly using bwa
549 (https://github.com/lh3/bwa). HiRise then analyzed the separations of Dovetail Omni-C read pairs mapped
550 within the draft scaffolds. This analysis generated a likelihood model for the genomic distance between
551 read pairs. The model was subsequently employed to identify and rectify putative misjoins, score
552 potential joins, and execute joins above a specified threshold. A contact map was generated from a
553 BAM file by utilizing read pairs where both ends were aligned with a mapping quality of 60.
554 Genomic DNA (gDNA) extraction for high throughput sequencing. Comparative analysis of two
555 pupal stages. Genomic DNA (gDNA) was extracted from pooled calyx samples dissected from H.
556 didymator female pupae at stage 1 (~60 females) and stage 3 (~50 females). Since the aim was to
557 compare the two developmental pupal stages, a single replicate was done for each stage. Impact of
558 U16 knockdown. Genomic DNA from calyces was collected from stage 3 female pupae that were
559 injected with dsGFP or dsU16. This experiment involved three biological replicates, each
560 corresponding to 30 to 50 calyx samples. Genomic DNA was extracted using the phenol-chloroform
561 method. Briefly, calyx samples were incubated in proteinase K (Ambion, 0.5 μg/μl) and Sarkosyl
562 detergent (Sigma, 20%), followed by treatment with RNase (Promega, 0.3 μg/μl). Total genomic DNA
563 was then extracted through phenol-chloroform extraction and ethanol precipitation. Following extraction,
564 gDNA was quantified using a Qubit fluorometer (ThermoFisher) and subsequently sent to
565 Genewiz/Azenta for sequencing. Paired-end sequencing (2 x 150 bp) was carried out on an Illumina
566 NovaSeq platform.
567 NGS data analyses. Illumina reads were aligned to the updated version of the H. didymator genome
568 using bwa mem [51], version 0.7.17, with default parameters. Subsequently, the aligned reads were
569 converted to BAM files utilizing samtools view (version 1.15) [52].
570 Prediction of the amplified regions. Amplification peaks were identified using MACS2 [30] with
571 the pupal stage 3 alignment file as treatment and the pupal stage 1 alignment file as control. The
572 specified parameters for this analysis were: --broad --nomodel -g 1.8e8 -q 0.01 --min-length 5000. Out
573 of the 165 predicted peaks (i.e., amplified regions), only those with a fold change (FC) higher than 2
574 were retained for further analyses, resulting in a total of 59 peaks. These 59 peaks encompassed all
575 known proviral loci, except for Hd40, which had a slightly lower value than the specified threshold
576 (FC=1.9), and Hd45.1 and Hd2-like, located too close to the scaffold end and potentially missed. For
577 the predicted peaks with FC>2 that did not correspond to known proviral loci, a manual curation was
578 performed to determine whether these regions corresponded to HdIV loci. Proviral segments were
579 identified by their flanking direct repeats (DRs) and gene contents, specifically the presence of genes
580 belonging to IV segment conserved gene families. To identify putative core IV replication genes, genes
581 present in the MACS2 peak were analyzed. Only those with no similarity to wasp proteins and that were
582 transcribed in calyx cells (based on the available transcriptome from [16]) were retained.
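The fold-change filter applied to the predicted peaks can be illustrated with a short Python sketch. It assumes the peaks have already been parsed into simple records; the Peak type, the retain_amplified function and the example values are hypothetical and are not taken from the MACS2 output format or from the authors' scripts.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    """Hypothetical record for one predicted amplified region."""
    scaffold: str
    start: int
    end: int
    fold_change: float


def retain_amplified(peaks, min_fc=2.0):
    """Keep only peaks whose fold change is strictly higher than min_fc."""
    return [p for p in peaks if p.fold_change > min_fc]


# Illustrative values only (not data from the study).
peaks = [
    Peak("Scaffold-A", 1_000, 21_000, 5.3),
    Peak("Scaffold-B", 50_000, 58_000, 1.9),  # below the threshold, dropped
    Peak("Scaffold-C", 10_000, 30_000, 2.4),
]
kept = retain_amplified(peaks)
```

With a strict threshold, a region with FC=1.9 is excluded, which is the situation described for Hd40.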
583 Read coverage per proviral region (HdIV locus or amplified region). Raw read counts were determined
584 for each proviral region using featureCounts [53] from the Subread package (version 2.0.1) with the
585 parameters (-c -P -s 0 -O). Subsequently, coverage values were computed with a custom script available
586 at https://github.com/flegeai/EVE_amplification. Coverage values for each region were calculated by
587 dividing the number of fragments mapped to the region by the size of the region (expressed in kilobase
588 pairs, kbp), and further normalized by the depth of the library (expressed in million reads). Coverage
589 was computed for each type of genomic region, namely each locus (IVSPERs, IV replication genes
590 outside IVSPERs, proviral segments) and each MACS2-detected amplified region, and this was done
591 separately for each pupal stage (stage 1, St1; stage 3, St3), each experiment (dsGFP and dsU16),
592 and each replicate.
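The normalization described above amounts to fragments per kilobase of region per million reads. The sketch below illustrates the arithmetic only and is not the authors' script (which is available at the repository cited above). The function name and the example numbers are hypothetical, and the library depth is assumed here to be a total read count supplied by the caller.

```python
def normalized_coverage(fragments: int, region_bp: int, library_reads: int) -> float:
    """Fragments mapped to a region, divided by region size in kbp,
    then divided by library depth in millions of reads."""
    if region_bp <= 0 or library_reads <= 0:
        raise ValueError("region size and library depth must be positive")
    region_kbp = region_bp / 1_000
    depth_millions = library_reads / 1_000_000
    return fragments / region_kbp / depth_millions


# Hypothetical example: 5,000 fragments on a 20 kbp region
# in a library of 25 million reads.
print(normalized_coverage(5_000, 20_000, 25_000_000))
```

Normalizing by both region size and library depth makes values comparable between loci of different lengths and between libraries sequenced to different depths.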
593 Genome coverages per position on H. didymator scaffolds (Counts per Million, CPM) and Maximal value
594 of amplification per proviral locus. Genome coverages per position in 10 bp bins were acquired using
595 the bamCoverage tool from the deepTools package [54] with the options --normalizeUsing CPM and
596 -bs 10. Subsequently, for each 10 bp bin, the pupal stage 3 (St3) versus stage 1 (St1) ratio was computed
597 with an in-house script available at https://github.com/flegeai/EVE_amplification. This script used
598 the pyBigWig Python library from deepTools [54]. To determine the maximal counts per million (CPM) at
599 each stage for every proviral locus, an in-house script importing the pyBigWig Python library was
600 employed. The "stage 3 / stage 1" ratio for each locus was then taken from the 10 bp bin bigWig file
601 at the position displaying the highest CPM value at stage 3 (the summit).
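The per-bin ratio and the summit lookup can be sketched on plain Python lists. This is an illustration under stated assumptions rather than the in-house script: the function names and example values are hypothetical, the lists stand in for per-bin CPM values that would normally be read from bigWig files, and returning `None` for bins with zero St1 coverage is an assumption, since the text does not say how such bins were handled.

```python
from typing import List, Optional, Tuple


def bin_ratios(st3_cpm: List[float], st1_cpm: List[float]) -> List[Optional[float]]:
    """St3 / St1 ratio for each bin; None where St1 coverage is zero (assumption)."""
    if len(st3_cpm) != len(st1_cpm):
        raise ValueError("both stages must cover the same bins")
    return [a / b if b > 0 else None for a, b in zip(st3_cpm, st1_cpm)]


def summit_ratio(
    st3_cpm: List[float], st1_cpm: List[float]
) -> Tuple[int, float, Optional[float]]:
    """Return (bin index, St3 CPM, St3/St1 ratio) at the bin with the highest St3 CPM."""
    ratios = bin_ratios(st3_cpm, st1_cpm)
    summit = max(range(len(st3_cpm)), key=lambda i: st3_cpm[i])
    return summit, st3_cpm[summit], ratios[summit]


# Hypothetical CPM values for four consecutive 10 bp bins of one locus.
st3 = [2.0, 40.0, 120.0, 30.0]
st1 = [2.0, 4.0, 3.0, 3.0]
print(bin_ratios(st3, st1))
print(summit_ratio(st3, st1))
```

Reading the ratio at the St3 summit, rather than taking the largest ratio anywhere in the locus, avoids inflated values from bins where St1 coverage happens to be very low.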
602 Comparison of read coverages between HdIV loci and the rest of the wasp genome. One hundred sets
603 of random regions, each mimicking the size distribution of HdIV loci, were generated using the shuffle
604 tool from bedtools version 2.27 [55], with the BED file of HdIV loci (56 for proviral segments and
605 11 for IVSPERs) supplied as input. Raw read counts for these
606 randomly generated regions were computed in the same manner as for proviral regions, employing
607 featureCounts [53] from the Subread package (version 2.0.1) with the parameters (-c -P -s 0 -O).
608 Subsequently, coverage values per region were calculated using the same methodology as described
609 earlier, with an in-house script available at https://github.com/flegeai/EVE_amplification.
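The idea behind the random control sets can be illustrated without bedtools. The sketch below is not the bedtools shuffle algorithm and not the authors' code: it simply relocates each locus to a random position while keeping its length, so that every random set has the same size distribution as the input. Function names, scaffold names, and sizes are hypothetical, and the sketch does not exclude overlaps with HdIV loci or between shuffled regions.

```python
import random
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # scaffold, start, end (0-based, half-open, as in BED)


def shuffle_regions(
    regions: List[Region], scaffold_sizes: Dict[str, int], rng: random.Random
) -> List[Region]:
    """Relocate each region to a random position, preserving its length."""
    shuffled = []
    for _scaffold, start, end in regions:
        length = end - start
        candidates = [n for n, size in scaffold_sizes.items() if size >= length]
        if not candidates:
            raise ValueError(f"no scaffold can hold a region of {length} bp")
        # Weight scaffolds by the number of valid start positions they offer.
        weights = [scaffold_sizes[n] - length + 1 for n in candidates]
        target = rng.choices(candidates, weights=weights, k=1)[0]
        new_start = rng.randint(0, scaffold_sizes[target] - length)
        shuffled.append((target, new_start, new_start + length))
    return shuffled


def make_random_sets(
    regions: List[Region], scaffold_sizes: Dict[str, int], n_sets: int = 100, seed: int = 0
) -> List[List[Region]]:
    """Generate n_sets independent shuffles (one hundred in the text)."""
    rng = random.Random(seed)
    return [shuffle_regions(regions, scaffold_sizes, rng) for _ in range(n_sets)]


# Hypothetical loci and scaffold sizes.
loci = [("scaffold_1", 100, 600), ("scaffold_2", 0, 2_000)]
sizes = {"scaffold_1": 10_000, "scaffold_2": 5_000}
random_sets = make_random_sets(loci, sizes, n_sets=100, seed=1)
print(len(random_sets), random_sets[0])
```

Each random set can then be passed through the same counting and normalization steps as the proviral regions, giving a background distribution of coverage for the rest of the genome.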
610 Search for motifs at the boundaries of the HdIV amplified regions. The MEME suite [56] was employed for
611 analyses using default parameters and a search for six motifs. A dataset comprising a total of 110
612 sequences, each spanning 1,000 nucleotides on both sides of the start and end positions of the 55 HdIV
613 amplified regions predicted by the MACS2 algorithm, was utilized for this analysis. As a control, a parallel
614 analysis was conducted using 110 sequences, each 2,000 nucleotides in length, randomly selected from
615 locations within the H. didymator genome but outside the HdIV loci. This control dataset allowed for the
616 comparison of motif patterns between the HdIV amplified regions and randomly chosen genomic
617 regions.
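The construction of the 110 boundary sequences (two per amplified region, each covering 1,000 nucleotides on either side of a boundary, hence 2,000 nucleotides long like the control sequences) can be sketched as interval arithmetic. This is a minimal illustration: the function name, scaffold name, and coordinates are hypothetical, and clipping windows at scaffold ends is an assumption not described in the text.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[str, int, int]  # scaffold, start, end (0-based, half-open)


def boundary_windows(
    regions: List[Region],
    flank: int = 1_000,
    scaffold_sizes: Optional[Dict[str, int]] = None,
) -> List[Region]:
    """Return one window per region boundary, spanning `flank` nt on each side."""
    windows = []
    for scaffold, start, end in regions:
        for boundary in (start, end):
            w_start = max(0, boundary - flank)  # clip at scaffold start (assumption)
            w_end = boundary + flank
            if scaffold_sizes is not None:
                w_end = min(w_end, scaffold_sizes[scaffold])  # clip at scaffold end
            windows.append((scaffold, w_start, w_end))
    return windows


# 55 hypothetical amplified regions, well away from scaffold ends.
regions = [("scaffold_X", 5_000 + i * 10_000, 8_000 + i * 10_000) for i in range(55)]
windows = boundary_windows(regions)
print(len(windows), windows[:2])
```

The resulting intervals would then be used to extract sequences from the genome FASTA before running the motif search.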
618 Genomic DNA extraction for gDNA amplification analyses by quantitative real-time PCR. To
619 assess the level of DNA amplification, total genomic DNA (gDNA) was extracted using the DNeasy
620 Blood & Tissue Kit (Qiagen) following the manufacturer's protocol. Ovaries (ovarioles removed) and hind
621 legs, representing the negative control, were dissected from ten pupae at four different stages. Three
622 replicates were generated for each pupal stage. Quantification of target gene amplification was
623 conducted through quantitative PCR, utilizing LightCycler® 480 SYBR Green I Master Mix (Roche) in
624 384-well plates (Roche). The total reaction volume per well was 3 µl, comprising 1.75 µl of the reaction
625 mix (1.49 µl SYBR Green I Master Mix, 0.1 µl nuclease-free water, and 0.16 µl diluted primer), and 1.25
626 µl of each gDNA sample diluted to achieve a concentration of 1.2 ng/µl. Primers used are listed in S4
627 Table. The gDNA levels corresponding to the viral genes and the housekeeping wasp gene (elongation
628 factor 1, ELF-1) were determined using the LightCycler® 480 System (Roche). The cycling conditions
629 involved heating at 95°C for 10 min, followed by 45 cycles of 95°C for 10 s, 58°C for 10 s, and 72°C for
630 10 s. Each sample was evaluated in triplicate. The obtained DNA levels were normalized with respect
631 to the wasp gene ELF-1. Raw data are provided in S3 Dataset.
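As a quick consistency check on the reaction set-up described above, the per-well volumes can be summed and scaled. The volumes and gDNA concentration are those given in the text; the function names and the idea of scaling to a whole 384-well plate are illustrative assumptions.

```python
# Per-well volumes in microlitres, as stated in the text.
MASTER_MIX_UL = 1.49
WATER_UL = 0.10
PRIMER_UL = 0.16
GDNA_UL = 1.25
GDNA_NG_PER_UL = 1.2  # concentration of the diluted gDNA sample


def reaction_mix_volume() -> float:
    """Volume of the reaction mix (master mix + water + primer) per well."""
    return MASTER_MIX_UL + WATER_UL + PRIMER_UL


def total_volume() -> float:
    """Total reaction volume per well (reaction mix + gDNA)."""
    return reaction_mix_volume() + GDNA_UL


def gdna_per_well_ng() -> float:
    """Mass of gDNA template per well."""
    return GDNA_UL * GDNA_NG_PER_UL


def scale_mix(n_wells: int) -> dict:
    """Reagent volumes for n_wells, without pipetting overage (hypothetical helper)."""
    return {
        "master_mix_ul": MASTER_MIX_UL * n_wells,
        "water_ul": WATER_UL * n_wells,
        "primer_ul": PRIMER_UL * n_wells,
    }


print(round(reaction_mix_volume(), 2), round(total_volume(), 2), round(gdna_per_well_ng(), 2))
print({k: round(v, 2) for k, v in scale_mix(384).items()})
```

The components add up to the stated 1.75 µl of reaction mix and 3 µl per well, with 1.5 ng of gDNA template in each reaction.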
632 Total RNA extraction. Total RNA was extracted from ovaries (ovarioles removed) dissected from pupae
633 at different stages using the Qiagen RNeasy extraction kit in accordance with the manufacturer's
634 protocol. To control for gene silencing, total RNAs were also extracted from individual adult wasp
635 abdomens (2 to 4 days old). For this, TRIzol reagent (Ambion) was used first, followed by extraction
636 using the NucleoSpin® RNA kit (Macherey-Nagel). Isolated RNA was then subjected to DNase
637 treatment using the TURBO DNA-free Kit (Life Technologies) to ensure removal of any residual genomic
638 DNA from the RNA samples.
639 Protein sequence analyses. Conserved domains of U16 were identified using the CD-search tool
640 available through NCBI's conserved domain database resource [57, 58]. Subcellular localization
641 predictions were made using the DeepLoc 2.0 tool, a deep learning-based approach for predicting the
642 subcellular localization of eukaryotic proteins [59]. For multiple sequence alignment, CLUSTAL Omega
643 (version 1.2.4) was employed [60]. Structure predictions for U16 were carried out using the MPI
644 Bioinformatics Toolkit [61].
645 RNA interference (RNAi). Gene-specific double-stranded RNA (dsRNA) used for RNAi experiments
646 was prepared using the T7 RiboMAX™ Express RNAi System (Promega). Initially, a 350-450 bp
647 fragment corresponding to the U16 sequence was cloned into the double T7 vector L4440 (a gift from
648 Andrew Fire, Addgene plasmid # 1654). Subsequently, an in vitro transcription template DNA was PCR
649 amplified with a T7 primer, and this template was used to synthesize sense and antisense RNA strands
650 with T7 RNA polymerase at 37°C for 5 hours. The primers used for dsRNA production are listed in S4
651 Table. After annealing and DNase treatment using the TURBO DNA-free Kit (Life Technologies), the
652 purified dsRNAs were resuspended in nuclease-free water, quantified using a NanoDrop ND-1000
653 Spectrophotometer (Thermo Scientific), and examined by agarose gel electrophoresis to ensure their
654 integrity. Injections were performed in less than one-day-old female pupae using a microinjector
655 (FemtoJet® Express, Eppendorf®) and a micromanipulator (Narishige®). Approximately 0.3-0.6 μl of 500
656 ng/μl dsRNA was injected into each individual. Control wasps were injected with a non-specific dsRNA
657 homologous to the green fluorescent protein (GFP) gene. Treated pupae were kept in an incubator until
658 adult emergence, which occurred approximately 5 days after injection.
659 Transmission electron microscopy. Ovaries were dissected from adult wasps between 2 and 3 days
660 after emergence, following the procedures outlined in [17]. To ensure consistency of the observed
661 phenotype, at least three females (taken at different microinjection dates) were observed for each tested
662 dsRNA. For transmission electron microscopy (TEM) observations, calyces were fixed in a solution of
663 2% glutaraldehyde in PBS for 2 hours and then post-fixed in 2% osmium tetroxide in the same buffer
664 for 1 hour. Tissues were subsequently bulk-stained for 2 hours in a 5% aqueous uranyl acetate solution,
665 dehydrated in ethanol, and embedded in EM812 resin (EMS). Ultrathin sections were double-stained
666 with Uranyless (DeltaMicroscopy) and lead citrate before examination under a Jeol 1200 EXII electron
667 microscope at 100 kV (MEA Platform, University of Montpellier). Images were captured with an EMSIS
668 Olympus Quemesa 11 Megapixels camera and analyzed using ImageJ software [62].
669 Reverse-transcriptase quantitative real-time PCR (RT-qPCR). For RT-qPCR assays, 400 ng of total
670 RNA was reverse-transcribed using the SuperScript III Reverse Transcriptase kit (Life Technologies)
671 and oligo(dT)15 primer (Promega). The mRNA transcript levels of selected IVSPER genes were
672 measured by RT-qPCR using a LightCycler® 480 System
673 (Roche) and SYBR Green I Master Mix (Roche). Expression levels were normalized relative to a
674 housekeeping wasp gene (elongation factor 1, ELF-1). Each sample was evaluated in triplicate, and the
675 total reaction volume per well was 3 µl, including 0.5 µM of each primer and cDNA corresponding to
676 0.88 ng of total RNA. The amplification program consisted of an initial step at 95°C for 10 min, followed
677 by 45 cycles of 95°C for 10 s, 58°C for 10 s, and 72°C for 10 s. The primers used for this analysis are
678 listed in S4 Table.
679 qPCR data analysis. Data were acquired using LightCycler® 480 software. PCR amplification
680 efficiency (E) for each primer pair was determined by linear regression of a dilution series (5x) of the
681 cDNA pool. Relative expression, using the housekeeping gene ELF-1 as a reference, was calculated
682 with the advanced relative quantification (efficiency method) analysis provided by the LightCycler® 480
683 software. For statistical analyses, Levene’s and Shapiro-Wilk tests were employed to verify homogeneity
684 of variance and normal distribution of data among the tested groups. Differences in gene relative
685 expression between developmental stages and between dsGFP and dsU16-injected females were
686 assessed using a two-tailed unpaired t-test for group comparison. Where homogeneity of
687 variance could not be assumed, Welch's t-test was used to compare gene relative expression between
688 groups. A p-value < 0.05 was considered significant. All statistical analyses were conducted using R
689 [63]. Detailed statistical analyses of qPCR results are provided in S3 Dataset.
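For readers unfamiliar with efficiency-corrected quantification, the sketch below shows the standard relations commonly used for this kind of analysis. It is a generic illustration and not the algorithm of the LightCycler® 480 software, whose implementation is not described in the text. It assumes that "(5x)" denotes five-fold dilution steps, that efficiency follows E = 10^(-1/slope) from a regression of Cq on log10 of relative input, and that the relative level is E_ref^Cq_ref / E_target^Cq_target. All function names and Cq values are hypothetical.

```python
import math
from typing import List


def amplification_efficiency(relative_inputs: List[float], cq_values: List[float]) -> float:
    """Estimate E from a dilution series by least-squares regression of
    Cq on log10(relative input); E = 10 ** (-1 / slope). E = 2 means perfect doubling."""
    xs = [math.log10(q) for q in relative_inputs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    return 10 ** (-1 / slope)


def relative_level(e_target: float, cq_target: float, e_ref: float, cq_ref: float) -> float:
    """Efficiency-corrected level of the target relative to the reference gene."""
    return e_ref ** cq_ref / e_target ** cq_target


# Hypothetical five-fold dilution series with perfect doubling per cycle:
# each dilution step delays Cq by log2(5) cycles.
inputs = [1.0, 1 / 5, 1 / 25, 1 / 125]
cqs = [20.0 + i * math.log2(5) for i in range(4)]
efficiency = amplification_efficiency(inputs, cqs)
print(round(efficiency, 3))

# Hypothetical Cq values for a viral gene and the reference gene (ELF-1).
print(relative_level(e_target=2.0, cq_target=25.0, e_ref=2.0, cq_ref=20.0))
```

Using a measured efficiency for each primer pair, instead of assuming perfect doubling, avoids systematic bias when target and reference primers amplify with different efficiencies.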
690 Data availability. The datasets supporting the conclusions in this article are accessible at the NCBI
691 Sequence Read Archive (SRA) under the Bioproject accession number PRJNA589497. Additionally, the
692 new version of the H. didymator genome, annotation, alignments of reads, and coverage information
693 can be found at BIPAA (https://bipaa.genouest.org/sp/hyposoter_didymator/). Raw data and statistical
694 analyses for all the qPCR analyses are provided in S3 Dataset. Furthermore, sequencing raw data, read
695 coverage analyses, statistical analyses, and in-house scripts are available at
696 https://github.com/flegeai/EVE_amplification.
697
698 Acknowledgments
699 The insects used in the experiments were provided by Raphaël BOUSQUET and Gaétan CLABOTS
700 from the DGIMI insect rearing facility. All RNAi experiments were conducted in the insect quarantine
701 platform (PIQ) of DGIMI lab, which is a member of the Montpellier Vectopole Sud network
702 (https://www.vectopole-sud.fr/). Microscopy observations were facilitated by the Montpellier MEA
703 platform (https://mea.edu.umontpellier.fr/). All qPCR analyses were performed with the assistance of
704 the Montpellier Genomix qPHD platform (http://www.pbs.univ-montp2.fr/).
705
706 References
707 1. Katzourakis A, Gifford RJ. Endogenous viral elements in animal genomes. PLoS Genet. 2010 Nov
708 18;6(11):e1001191. doi: 10.1371/journal.pgen.1001191.
709 2. Kryukov K, Ueda MT, Imanishi T, Nakagawa S. Systematic survey of non-retroviral virus-like
710 elements in eukaryotic genomes. Virus Res. 2019 Mar;262:30-36. doi:
711 10.1016/j.virusres.2018.02.002.
712 3. Frank JA, Feschotte C. Co-option of endogenous viral sequences for host cell function. Curr Opin
713 Virol. 2017 Aug;25:81-89. doi: 10.1016/j.coviro.2017.07.021.
714 4. Feschotte C, Gilbert C. Endogenous viruses: insights into viral evolution and impact on host biology.
715 Nat Rev Genet. 2012;13(4):283-296. doi: 10.1038/nrg3199.
716 5. Gilbert C, Feschotte C. Genomic fossils calibrate the long-term evolution of hepadnaviruses. PLoS
717 Biol. 2010 Sep;8(9):e1000495. doi: 10.1371/journal.pbio.1000495.
718 6. Drezen JM, Bézier A, Burke GR, Strand MR. Bracoviruses, ichnoviruses, and virus-like particles
719 from parasitoid wasps retain many features of their virus ancestors. Curr Opin Insect Sci. 2022
720 Feb;49:93-100. doi: 10.1016/j.cois.2021.12.003.
721 7. Stoltz DB, Vinson SB. Viruses and parasitism in insects. Adv Virus Res. 1979;24:125-71. doi:
722 10.1016/s0065-3527(08)60393-0.
723 8. Webb BA, Strand MR. The biology and genomics of polydnaviruses. In: Comprehensive Molecular
724 Insect Science, Vol. 6, ed. K Iatrou, S Gill, pp. 323–60. Amsterdam: Pergamon. 2005.
725 9. Bézier A, Annaheim M, Herbinière J, Wetterwald C, Gyapay G, Bernard-Samain S, et al.
726 Polydnaviruses of braconid wasps derive from an ancestral nudivirus. Science. 2009 Feb
727 13;323(5916):926-30. doi: 10.1126/science.1166788.
728 10. Pichon A, Bézier A, Urbach S, Aury JM, Jouan V, Ravallec M, et al. Recurrent DNA virus
729 domestication leading to different parasite virulence strategies. Sci Adv. 2015 Nov
730 27;1(10):e1501150. doi: 10.1126/sciadv.1501150.
731 11. Burke GR. Common themes in three independently derived endogenous nudivirus elements in
732 parasitoid wasps. Curr Opin Insect Sci. 2019 Apr;32:28-35. doi: 10.1016/j.cois.2018.10.005. Epub
733 2018 Oct 23. PMID: 31113628.
734 12. Volkoff AN, Jouan V, Urbach S, Samain S, Bergoin M, Wincker P, et al. Analysis of virion structural
735 components reveals vestiges of the ancestral ichnovirus genome. PLoS Pathog. 2010 May
736 27;6(5):e1000923. doi: 10.1371/journal.ppat.1000923.
737 13. Béliveau C, Cohen A, Stewart D, Periquet G, Djoumad A, Kuhn L, et al. Genomic and Proteomic
738 Analyses Indicate that Banchine and Campoplegine Polydnaviruses Have Similar, if Not Identical,
739 Viral Ancestors. J Virol. 2015 Sep;89(17):8909-21. doi: 10.1128/JVI.01001-15.
740 14. Volkoff A-N, Huguet E. Polydnaviruses (Polydnaviridae). In: Bamford DH, Zuckerman M, editors.
741 Encyclopedia of Virology (Fourth Edition). Academic Press, Oxford; 2021. pp. 849-857. DOI:
742 10.1016/B978-0-12-809633-8.21556-2.
743 15. Legeai F, Santos BF, Robin S, Bretaudeau A, Dikow RB, Lemaitre C, et al. Genomic architecture
744 of endogenous ichnoviruses reveals distinct evolutionary pathways leading to virus domestication
745 in parasitic wasps. BMC Biol. 2020 Jul 24;18(1):89. doi: 10.1186/s12915-020-00822-3.
746 16. Lorenzi A, Ravallec M, Eychenne M, Jouan V, Robin S, Darboux I, et al. RNA interference identifies
747 domesticated viral genes involved in assembly and trafficking of virus-derived particles in
748 ichneumonid wasps. PLoS Pathog. 2019 Dec 13;15(12):e1008210. doi:
749 10.1371/journal.ppat.1008210.
750 17. Robin S, Ravallec M, Frayssinet M, Whitfield J, Jouan V, Legeai F, et al. Evidence for an ichnovirus
751 machinery in parasitoids of coleopteran larvae. Virus Res. 2019;263: 189–206. doi:
752 10.1016/j.virusres.2019.02.001.
753 18. Cui L, Webb BA. Homologous sequences in the Campoletis sonorensis polydnavirus genome are
754 implicated in replication and nesting of the W segment family. J Virol. 1997 Nov;71(11):8504-13.
755 doi: 10.1128/JVI.71.11.8504-8513.1997.
756 19. Rattanadechakul W, Webb BA. Characterization of Campoletis sonorensis ichnovirus unique
757 segment B and excision locus structure. J Insect Physiol. 2003 May;49(5):523-32. doi:
758 10.1016/s0022-1910(03)00053-2.
759 20. Blissard GW, Smith OP, Summers MD. Two related viral genes are located on a single superhelical
760 DNA segment of the multipartite Campoletis sonorensis virus genome. Virology. 1987
761 Sep;160(1):120-34. doi: 10.1016/0042-6822(87)90052-3.
762 21. Theilmann DA, Summers MD. Molecular analysis of Campoletis sonorensis virus DNA in the
763 lepidopteran host Heliothis virescens. J Gen Virol. 1986 Sep;67(Pt 9):1961-9. doi:
764 10.1099/0022-1317-67-9-1961.
765 22. Webb BA, Strand MR, Dickey SE, Beck MH, Hilgarth RS, Barney WE, et al. Polydnavirus genomes
766 reflect their dual roles as mutualists and pathogens. Virology. 2006 Mar 30;347(1):160-74. doi:
767 10.1016/j.virol.2005.11.010.
768 23. Dorémus T, Cousserans F, Gyapay G, Jouan V, Milano P, Wajnberg E, et al. Extensive
769 transcription analysis of the Hyposoter didymator ichnovirus genome in permissive and non-
770 permissive lepidopteran host species. PLoS One. 2014 Aug 12;9(8):e104072. doi:
771 10.1371/journal.pone.0104072.
772 24. Volkoff AN, Ravallec M, Bossy JP, Cerutti P, Rocher J, Cerutti M, Devauchelle G. The replication
773 of Hyposoter didymator polydnavirus: Cytopathology of the calyx cells in the parasitoid. Biology of
774 the Cell. 1995;83(1):1-13.
775 25. Burke GR, Thomas SA, Eum JH, Strand MR. Mutualistic polydnaviruses share essential replication
776 gene functions with pathogenic ancestors. PLoS Pathog. 2013;9(5):e1003348. doi:
777 10.1371/journal.ppat.1003348.
778 26. Lorenzi A, Arvin MJ, Burke GR, Strand MR. Functional characterization of Microplitis demolitor
779 bracovirus genes that encode nucleocapsid components. J Virol. 2023 Oct 25:e0081723. doi:
780 10.1128/jvi.00817-23.
781 27. Rocher J, Ravallec M, Barry P, Volkoff AN, Ray D, Devauchelle G, Duonor-Cérutti M. Establishment
782 of cell lines from the wasp Hyposoter didymator (Hym., Ichneumonidae) containing the symbiotic
783 polydnavirus H. didymator ichnovirus. J Gen Virol. 2004 Apr;85(Pt 4):863-868. doi:
784 10.1099/vir.0.19713-0.
785 28. Krell PJ, Summers MD, Vinson SB. Virus with a multipartite superhelical DNA genome from the
786 ichneumonid parasitoid Campoletis sonorensis. J Virol. 1982 Sep;43(3):859-70. doi:
787 10.1128/JVI.43.3.859-870.1982.
788 29. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP.
789 Integrative Genomics Viewer. Nat Biotechnol. 2011;29:24-26. doi:10.1038/nbt.1754.
790 30. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis
791 of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. doi: 10.1186/gb-2008-9-9-r137.
792 31. Iyer LM, Koonin EV, Leipe DD, Aravind L. Origin and evolution of the archaeo-eukaryotic primase
793 superfamily and related palm-domain proteins: structural insights and new members. Nucleic Acids
794 Res. 2005 Jul 15;33(12):3875-96. doi: 10.1093/nar/gki702. PMID: 16027112.
795 32. Burke GR, Hines HM, Sharanowski BJ. The presence of ancient core genes reveals
796 endogenization from diverse viral ancestors in parasitoid wasps. Genome Biol Evol. 2021 Jul
797 6;13(7):evab105. doi: 10.1093/gbe/evab105. PMID: 33988720.
798 33. Beckage NE. Polydnaviruses as Endocrine Regulators. In: Beckage NE, Drezen J-M, eds.
799 Parasitoid Viruses. Academic Press; 2012. pp. 163-168 (Chapter 13). doi:
800 10.1016/b978-0-12-384858-1.00013-8.
801 34. Strand MR. Polydnavirus gene products that interact with the host immune system. In Beckage NE,
802 Drezen J-M (eds.), Parasitoid Viruses. Elsevier. Academic Press, San Diego. 2012. pp. 149-161.
803 doi: 10.1016/B978-0-12-384858-1.00012-6.
804 35. Arvin MJ, Lorenzi A, Burke GR, Strand MR. MdBVe46 is an envelope protein that is required for
805 virion formation by Microplitis demolitor bracovirus. J Gen Virol. 2021 Mar;102(3):001565. doi:
806 10.1099/jgv.0.001565.
807 36. Marti D, Grossniklaus-Bürgin C, Wyder S, Wyler T, Lanzrein B. Ovary development and
808 polydnavirus morphogenesis in the parasitic wasp Chelonus inanitus. I. Ovary morphogenesis,
809 amplification of viral DNA and ecdysteroid titres. J Gen Virol. 2003 May;84(Pt 5):1141-1150. doi:
810 10.1099/vir.0.18832-0.
811 37. Wyler T, Lanzrein B. Ovary development and polydnavirus morphogenesis in the parasitic wasp
812 Chelonus inanitus. II. Ultrastructural analysis of calyx cell development, virion formation and
813 release. J Gen Virol. 2003;84:1151-63. doi: 10.1099/vir.0.18830-0.
814 38. Baran N, Neer A, Manor H. "Onion skin" replication of integrated polyoma virus DNA and flanking
815 sequences in polyoma-transformed rat cells: termination within a specific cellular DNA segment.
816 Proc Natl Acad Sci U S A. 1983 Jan;80(1):105-9. doi: 10.1073/pnas.80.1.105.
817 39. Spradling AC. The organization and amplification of two chromosomal domains containing
818 Drosophila chorion genes. Cell. 1981 Nov;27(1 Pt 2):193-201. doi:
819 10.1016/0092-8674(81)90373-1.
820 40. Kim JC, Nordman J, Xie F, Kashevsky H, Eng T, Li S, et al. Integrative analysis of gene amplification
821 in Drosophila follicle cells: parameters of origin activation and repression. Genes Dev. 2011 Jul
822 1;25(13):1384-98. doi: 10.1101/gad.2043111.
823 41. Tower J. Developmental gene amplification and origin regulation. Annu Rev Genet.
824 2004;38:273-304. doi: 10.1146/annurev.genet.37.110801.143851.
825 42. Foulk MS, Urban JM, Casella C, Gerbi SA. Characterizing and controlling intrinsic biases of lambda
826 exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-
827 quadruplex motifs around a subset of human replication origins. Genome Res. 2015
828 May;25(5):725-35. doi: 10.1101/gr.183848.114.
829 43. Weller SK, Kuchta RD. The DNA helicase-primase complex as a target for herpes viral infection.
830 Expert Opin Ther Targets. 2013 Oct;17(10):1119-32. doi: 10.1517/14728222.2013.827663.
831 44. Burke GR, Walden KK, Whitfield JB, Robertson HM, Strand MR. Widespread genome
832 reorganization of an obligate virus mutualist. PLoS Genet. 2014 Sep;10(9):e1004660. doi:
833 10.1371/journal.pgen.1004660.
834 45. Gauthier J, Boulain H, van Vugt JJFA, Baudry L, Persyn E, Aury JM, et al. Chromosomal scale
835 assembly of parasitic wasp genome reveals symbiotic virus colonization. Commun Biol. 2021 Jan
836 22;4(1):104. doi: 10.1038/s42003-020-01623-8. Erratum in: Commun Biol. 2021 Jul 30;4(1):940.
837 46. Mao M, Strand MR, Burke GR. The complete genome of Chelonus insularis reveals dynamic
838 arrangement of genome components in parasitoid wasps that produce bracoviruses. J Virol. 2022
839 Mar 9;96(5):e0157321. doi: 10.1128/JVI.01573-21.
840 47. Burke GR, Simmonds TJ, Thomas SA, Strand MR. Microplitis demolitor Bracovirus proviral loci and
841 clustered replication genes exhibit distinct DNA amplification patterns during replication. J Virol.
842 2015 Sep;89(18):9511-23. doi: 10.1128/JVI.01388-15.
843 48. Louis F, Bézier A, Periquet G, Ferras C, Drezen JM, Dupuy C. The bracovirus genome of the
844 parasitoid wasp Cotesia congregata is amplified within 13 replication units, including sequences
845 not packaged in the particles. J Virol. 2013 Sep;87(17):9649-60. doi: 10.1128/JVI.00886-13.
846 49. Visconti V, Eychenne M, Darboux I. Modulation of antiviral immunity by the ichnovirus HdIV in
847 Spodoptera frugiperda. Mol Immunol. 2019 Apr;108:89-101. doi: 10.1016/j.molimm.2019.02.011.
848 50. Putnam NH, O'Connell BL, Stites JC, Rice BJ, Blanchette M, Calef R, et al. Chromosome-scale
849 shotgun assembly using an in vitro method for long-range linkage. Genome Res. 2016
850 Mar;26(3):342-50. doi: 10.1101/gr.193474.115.
851 51. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.
852 arXiv:1303.3997v2 [q-bio.GN]. doi: 10.48550/arXiv.1303.3997.
853 52. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools
854 and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008.
855 53. Liao Y, Smyth GK, Shi W. featureCounts: An efficient general-purpose program for assigning
856 sequence reads to genomic features. Bioinformatics. 2014 Apr 1;30(7):923-30. doi:
857 10.1093/bioinformatics/btt656.
858 54. Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, et al. deepTools2: A next-
859 generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016 Jul
860 8;44(W1):W160-5. doi: 10.1093/nar/gkw257.
861 55. Quinlan AR, Hall IM. BEDTools: A flexible suite of utilities for comparing genomic features.
862 Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033.
863 56. Bailey TL, Johnson J, Grant CE, Noble WS. The MEME Suite. Nucleic Acids Res. 2015 Jul
864 1;43(W1):W39-49. doi: 10.1093/nar/gkv416.
865 57. Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al. CDD/SPARCLE: functional
866 classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017 Jan
867 4;45(D1):D200-D203. doi: 10.1093/nar/gkw1129.
868 58. Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, et al. CDD/SPARCLE: the
869 conserved domain database in 2020. Nucleic Acids Res. 2020 Jan 8;48(D1):D265-D268. doi:
870 10.1093/nar/gkz991.
871 59. Thumuluri V, Armenteros JJA, Johansen AR, Nielsen H, Winther O. DeepLoc 2.0: multi-label
872 subcellular localization prediction using protein language models. Nucleic Acids Research. 2022.
873 doi:10.1093/nar/gkac278.
874 60. Madeira F, Pearce M, Tivey ARN, Basutkar P, Lee J, Edbali O, et al. Search and sequence analysis
875 tools services from EMBL-EBI in 2022. Nucleic Acids Res. 2022 Jul 5;50(W1):W276-W279. doi:
876 10.1093/nar/gkac240.
877 61. Gabler F, Nam SZ, Till S, Mirdita M, Steinegger M, Söding J, et al. Protein sequence analysis using
878 the MPI Bioinformatics Toolkit. Curr Protoc Bioinformatics. 2020 Dec;72(1):e108. doi:
879 10.1002/cpbi.108.
880 62. Rasband WS. ImageJ. National Institutes of Health, Bethesda, Maryland, USA. 1997-2018.
881 http://imagej.nih.gov/ij
882 63. R: A language and environment for statistical computing. R Foundation for Statistical Computing,
883 Vienna, Austria. R Core Team. 2023. URL https://www.R-project.org/.
884
885 Supporting information captions
886 S1 Dataset. Hyposoter didymator Hi-C genome assembly. The dataset includes: A. Figure depicting
887 the Hi‐C scaffold contact map; B. Table presenting the Hi-C scaffolds containing HdIV loci; C. Figure
888 displaying the pairwise comparisons of HdIV segments located in close proximity within the H. didymator
889 scaffolds.
890 S2 Dataset. Sequence analysis and alignment of the U16 gene from H. didymator to four other wasp
891 species that harbor IVs. The dataset includes: A. Multiple sequence alignment of U16 proteins from
892 different parasitoid species. B. Detail of the predicted secondary structure of the PricT-2 domain in the
893 H. didymator U16 protein. C. Subcellular localization of U16 predicted by DeepLoc 2.0.
894 S3 Dataset. Raw data and statistical analyses of qPCR analyses. The dataset includes raw data and
895 statistical analyses for: A. Genomic DNA amplification of IVSPER genes at four different H. didymator
896 pupal stages; B. Genomic DNA amplification of IVSPER and HdIV segment genes in dsGFP and dsU16-
897 injected wasps; C. RNA quantification of IVSPER genes in dsGFP and dsU16-injected wasps; D. DNA
898 amplification of Hd29 segment in dsGFP and dsU16-injected wasps.
899 S1 Table. Read coverage of HdIV loci on each scaffold of the H. didymator genome.
900 S2 Table. List of the peaks predicted in H. didymator genome scaffolds using MACS2 algorithm.
901 S3 Table. Read coverage of HdIV amplified regions in calyx cell DNA from dsGFP- and dsU16-injected
902 female pupae.
903 S4 Table. List of primers used in the present work.
904 S1 Fig. DNA amplification patterns of HdIV loci in calyx cells of H. didymator.
905 S2 Fig. HdIV amplified regions in Scaffold-11.
906 S3 Fig. MEME analysis of boundaries of the predicted MACS2 HdIV amplified regions.
907
908
909
910
911 Author contributions
912 A. LORENZI: Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization,
913 Writing – Original Draft Preparation, Writing – Review & Editing
914 F. LEGEAI: Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization,
915 Writing – Original Draft Preparation, Writing – Review & Editing
916 V. JOUAN, P.-A. GIRARD, M. EYCHENNE, M. RAVALLEC: Investigation, Methodology, Validation
917 A. BRETAUDEAU, S. ROBIN: Data Curation
918 J. ROCHEFORT, M. VILLEGAS: Investigation
919 M. R. STRAND, G. R. BURKE: Writing – Review & Editing
920 R. REBOLLO: Funding Acquisition, Validation, Writing – Original Draft Preparation, Writing – Review &
921 Editing
922 N. NÈGRE: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology,
923 Resources, Supervision, Validation, Writing – original draft, Writing – review & editing
924 A.-N. VOLKOFF: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation,
925 Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing –
926 original draft, Writing – review & editing
927
928 Keywords: Endogenous viral element, DNA amplification, Hyposoter didymator, Ichnovirus,
929 polydnavirus, viral replication, RNA interference, co-option, co-evolution
930
931 Funding
932 This work was financially supported by the INRAE SPE department (EPIHYPO project) and the
933 French National Research Agency (ENDOVIRE project, #ANR-22-CE20-0005-01). The Dovetail
934 sequencing of the H. didymator genome received funding from the European Union’s Horizon 2020
935 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 764840 for
936 the ITN IGNITE project, with Denis TAGU from IGEPP as a partner.