Abstract
Arbuscular mycorrhizal fungi (AMF) are widespread plant symbionts that enhance nutrient acquisition and influence ecosystem productivity. Previous chromosome-level assemblies of a model species revealed a two-compartment genome architecture (active A and repressed B chromatin compartments), yet its conservation across evolutionarily distant AMF lineages remains unresolved. Here, we present a chromosome-scale and 3D genome assembly of Gigaspora margarita isolate BEG34, the largest and most repeat-rich AMF genome to date, alongside that of its obligate endobacterium, Candidatus Glomeribacter gigasporarum (CaGg), using PacBio HiFi and Hi-C sequencing. The G. margarita genome comprises 43 chromosomes (792 Mb) organized into stable A/B compartments and Topologically Associating Domain (TAD)-like structures, irrespective of the presence of endobacteria. We uncover 21 divergent rDNA operons distributed across six chromosomes and show that these physically interact, suggesting conserved nucleolar organization. We also reveal that the CaGg genome is tripartite and mobilome-rich, encoding prophages, an orphan CRISPR array, and complete biosynthetic pathways, not previously reported in this endobacterium, for several essential cofactors, including heme, which may enhance host bioenergetics. Finally, we find that the presence of the endobacterium alters the expression of specific transposable element families in G. margarita. These findings reveal conserved principles of chromatin architecture in AMF symbionts and highlight the tight molecular interplay between fungal hosts and their endosymbionts, offering new insights into genome evolution and symbiotic adaptation.
bioRxiv preprint doi: https://doi.org/10.64898/2026.01.14.699541; this version posted January 15, 2026. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Introduction
Arbuscular mycorrhizal fungi (AMF) are plant root symbionts that belong to the subphylum Glomeromycotina 1. As obligate biotrophs, AMF require a living plant host to complete their life cycle; they colonize root cortical cells, where they develop specialized tree-like structures called arbuscules 2. Within these structures, the plant supplies the fungi with lipids and sugars 3,4, while the fungi provide the plant with essential nutrients that limit plant growth 5, primarily phosphate, resulting in improved crop yields 6,7, carbon storage 8, and enhanced defence against pathogens 9. AMF are always multinucleate, with individual spores carrying up to 20,000 haploid nuclei in some species 10. Although sexual reproductive structures have not yet been observed in AMF, genome and single-nucleus analyses have shown that these organisms carry conserved mating-related genes 11–16 and follow homokaryotic/heterokaryotic cycles 17,18, which define sexual processes in fungi 19,20. Genome analyses show that the dependence of AMF on plants is likely linked to the loss of genes involved in fatty acid production, thiamine biosynthesis, sugar utilization, and plant cell wall degradation 12,21. Their genomes are also highly enriched in transposable elements (TEs), and closely related AMF strains exhibit striking variation in gene content 22,23. The abundance of TEs in AMF is the main reason their genome assemblies have long been highly fragmented, which has hampered our understanding of their genetic structure and overall genome biology.
This issue was recently addressed by combining long reads with chromosome conformation capture (Hi-C) datasets, which allowed the generation of chromosome-level assemblies for strains of the model AMF Rhizophagus irregularis (order Glomerales) 18,24. This approach revealed that AMF chromatin separates into two compartments (A/B). Compartment A contains transcriptionally active genes, highly methylated TEs, and most conserved core genes. In contrast, compartment B harbours transcriptionally repressed genes and is rich in genes encoding secreted proteins, candidate effectors, and TEs that are upregulated in planta (vs. extraradical mycelium) 18,21,24–28. This stage-specific upregulation suggests that root colonization relaxes the B sub-compartments that harbour genes involved in the molecular dialogue between mycorrhizal partners 21,24. In support of this, transmission electron microscopy of the AMF Gigaspora margarita shows shifts in chromatin condensation, from a tightly packed state in spores to a looser state in intraradical hyphae during plant root colonization 29.
In addition to being separated into A/B compartments, the available R. irregularis genomes are also organized into finer-scale structural units known as Topologically Associating Domain (TAD)-like structures, within which chromatin interactions occur more frequently than with neighbouring regions 24,30–32. In model eukaryotes, TADs are separated by “boundaries” that act as insulators, limiting epigenetic interactions between adjacent TADs 30,31,33; these boundaries are often enriched for protein-coding genes and depleted of DNA repeats 31,33,34. In contrast, repeats have been shown to dominate TAD boundaries in fungal species 35, suggesting functional divergence and potentially greater malleability of fungal TADs. Hi-C-based identification of A/B compartments in AMF remains confined to the model species R. irregularis, and independent support for their existence and conservation in other AMF lineages is lacking. Similarly, although data from model eukaryotes 36–38 support the hypothesis that AMF A/B sub-compartments may change conformation in response to host and environmental factors, direct chromatin-based evidence that such changes occur in AMF has not yet been provided.
Here, we aim to fill these knowledge gaps by sequencing the genome of G. margarita isolate BEG34 (order Diversisporales) using PacBio HiFi and Hi-C data. This species has the largest and most repeat-rich AMF genome known to date 39. It also carries beneficial obligate Burkholderia-like endobacteria (Candidatus Glomeribacter gigasporarum; CaGg) within its cytoplasm, which have been shown to impact fungal biology 40,41,42. The presence of CaGg in G. margarita spores (+CaGg) and the availability of a cured line (-CaGg), along with its large, repetitive genome and its phylogenetic placement, collectively make G. margarita an ideal species in which to elucidate the diversity and conservation of AMF chromosome biology and the malleability of AMF A/B compartments and TADs.
Results
Chromosome-level assembly, annotation, and phylogenomics of G. margarita
We performed PacBio HiFi and Hi-C sequencing of genomic DNA extracted from G. margarita +CaGg spores. Using Hifiasm in Hi-C integration mode 43, we assembled the HiFi long reads into draft contigs, which were then scaffolded using Hi-C data 18,43. For G. margarita, this approach generated 43 chromosome-level scaffolds, with a genome coverage of 82X, a size of 792.14 Mb, and an N50 of 18.89 Mb (Table 1; Fig. 1a, b). Of these, 20 chromosomes have telomeres at both ends, 20 have telomeres at only one end, and three lack telomeres entirely. Only 23 contigs (0.718 Mb) could not be assigned to any chromosome. This assembly represents a significant improvement over the previously available datasets 39 in assembly size (792.14 Mb vs. 773.10 Mb), fragmentation (43 vs. 6,490 scaffolds), contiguity (N50 of 18.89 Mb vs. 326.79 kb), and gene count (30,211 vs. 26,603).
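For readers less familiar with the contiguity metric, N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The Python sketch below illustrates that definition only; the function name `n50` and the toy lengths are illustrative assumptions, not the study's pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths.

    Sequences are taken from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the assembly.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues with halves.
        if 2 * running >= total:
            return length
    return 0  # not reached: the loop always returns for non-empty input


# Hypothetical toy scaffold lengths in Mb (not the G. margarita assembly).
print(n50([30, 25, 20, 15, 10]))  # 25
```

Sorting longest-first and stopping at the half-way point is what makes N50 insensitive to the many tiny unplaced contigs that inflate scaffold counts in fragmented assemblies.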
Table 1. Summary statistics for the genome assembly of G. margarita

Feature                                  G. margarita
Genome size (Mbp)                        792.14
No. of scaffolds                         43
No. of genes                             30,211
No. of rDNA clusters                     21
Repeat content (%, after curation)       78.28
BUSCO completeness (%)                   95.4
GC content (%)                           27.77
Genome annotation identified 30,211 protein-coding genes, with a BUSCO completeness of 95.4% (Table 1). Of these, 2,438 (8.1%) were annotated as putative secreted proteins and effectors, more than reported in R. irregularis (Table S2). We confirm that G. margarita lacks the hallmark “Missing Glomeromycota Core Genes (MGCGs)” 44 (Table S3) and has a reduced set of carbohydrate-active enzymes (CAZymes), but shows a significant expansion of CAZyme families involved in chitin metabolism (GH18, GT1, GT2, AA7, and CE4) 39,45 (Table S4). The annotation also uncovered cobalamin-dependent enzymes in G. margarita (Table S5), supporting the hypothesis that the fungus uses cobalamin supplied by CaGg 46. The chromosome-level assembly confirms that G. margarita, like all AMF with sequenced genomes, carries all known meiosis-specific genes. It also supports previous findings that, unlike in most AMF with sequenced genomes 21, including representatives of early-branching lineages 47,48, the genes composing the putative AMF mating-type locus 17,18 (a choline transporter, two homeodomain proteins (HD1 and HD2), and a phosphoglycerate mutase) are not adjacent in members of the Gigasporaceae 16.
Figure 1. Gigaspora margarita chromosomes and genome content. (A) Karyoplots of the 43 chromosomes, showing rDNA genes in lime green and A and B compartments in violet-red and turquoise, respectively, within the ideograms. (B) Genome-wide Hi-C contact map of G. margarita. Black squares delimit individual chromosomes, sorted by size. Colour intensity corresponds to interaction frequencies between loci, with darker red indicating high contact probability and white indicating low or no interactions.
Alongside the nuclear genome, we recovered the complete mitochondrial genome as a single contig. This 96,986-bp genome encodes the core set of mitochondrial genes, including 19 protein-coding genes, 2 trans-spliced rRNA genes, and 24 tRNA genes, consistent with previous descriptions of the G. margarita mitochondrial genome 53.
We took advantage of this new chromosome-level dataset to determine the evolutionary placement of G. margarita relative to other AMF species via phylogenomic analyses (Fig. S1). These analyses support the sister relationship between Glomeromycotina and Mortierellomycotina within the phylum Mucoromycota 1,49. They also support the close evolutionary relationship between Glomerales and Diversisporales, the paraphyletic placement of Entrophospora species 45,48,50,51, and the early branching of Paraglomerales and Archaeosporales within the AMF phylogeny. Notably, consistent with phylogenetic analyses based on ribosomal genes 52 and available genome datasets 47, our results place Paraglomerales at the earliest-diverging node of the AMF phylogeny. However, we were unable to fully reject an alternative placement of P. occultum with members of the Archaeosporales.
The divergent rDNA operons are physically linked in G. margarita
The chromosome-level genome annotation also confirms that, unlike most eukaryotes, AMF carry few copies of highly divergent ribosomal DNA (rDNA) operons within their genomes. The G. margarita genome has 21 rDNA copies, approximately twice as many as R. irregularis strains 24; these are located on chromosomes 3, 12, 17, 25, 34, and 40 and vary in copy number and sequence both within and among chromosomes (Fig. 1a, b; Table S6). The high rDNA sequence paralogy within members of the Glomerales has been reported to cause significant taxonomic challenges, as rDNA paralogs can sometimes cluster across species boundaries 54,55. We find that this issue is exacerbated in Gigaspora species, reinforcing concerns about the utility of these genes alone for AMF taxonomy 54–56. For instance, the rDNA paralogs from the G. margarita genome scatter across three clades and are shared with sequences from G. decipiens, G. albida, and G. gigantea. Almost all Gigaspora species show similar paraphyletic clustering that reflects rDNA paralogy rather than speciation (Fig. S2).
Figure 2. Model of the 3D organization of rDNA within the AMF nucleolus, based on current Hi-C data and knowledge of rRNA transcription in model eukaryotes. Transcriptionally active rDNA repeats from different scaffolds are hypothesized to interact physically in the nucleolus, whereas the methylated rDNA copy lies outside the nucleolus.
In addition to enabling chromosome-level assembly, the Hi-C data revealed strong physical interactions among the rDNA regions on chromosomes 3, 12, 17, 25, 34, and 40, as evidenced by bright off-diagonal signals in the Hi-C maps (Fig. 1b; Table S6). In model eukaryotes, active rDNA copies physically cluster in the nucleolus, the site of ribosome biogenesis, while inactive copies are found at its edges or outside it 57–59. A very similar pattern is seen in G. margarita: most rDNA units are hypomethylated (i.e., likely actively transcribed), with only one unit, located on chromosome 12, highly methylated and thus possibly representing a pseudogene. Notably, Hi-C data from R. irregularis strains revealed similar rDNA interactions (Fig. S3) between regions on chromosomes 9, 18, 23, and 28. Taken together, our results indicate that divergent AMF rDNAs follow a cellular mechanism found in model eukaryotes, in which active rDNA units are physically close to each other within the nucleus, presumably residing within the nucleolus, while inactive units are positioned at the periphery of the nucleolus or outside it (Fig. 2).
Endobacteria-driven regulation of transposable elements in G. margarita
The genome of G. margarita is known to be highly repetitive, but the true extent and diversity of its repeats were likely obscured by the fragmented nature of available genome datasets 60 and a lack of curated repeat analyses 25. Curating our chromosome-level assembly revealed that 78.34% of the G. margarita genome consists of repeats, of which 56.1% belong to known TE families. Among these, 19.2% correspond to DNA/TIR elements (i.e., DNA transposons), while 17.1% and 13.1% are LTR and LINE elements, respectively. Overall, 22.7% of the repeats remain unknown and could not be classified after manual curation (Fig. 3), compared with 41% in previous work 39. These unknown repeats may represent expanded gene families commonly found in AMF 25,61, as well as TE families that have not yet been formally characterized in model species, including fungi. We assessed the expansion and degeneration of TEs in G. margarita by constructing repeat landscapes based on Kimura substitution levels (Fig. 3), where lower Kimura divergence indicates recent TE insertions and higher divergence indicates older insertions 61. Our analyses extend previous findings that TEs in Diversisporales are enriched in recent and active expansions 25, and identify DNA/TIR elements, LINEs, and LTRs as the primary contributors to recent TE bursts in G. margarita.
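Repeat landscapes such as the one in Fig. 3 are typically built by computing, for each TE copy, its divergence from the family consensus and summing the genomic coverage in each divergence bin. The Python sketch below shows the Kimura two-parameter correction, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, followed by a simple 1% binning. It is an illustration under stated assumptions: the function names are hypothetical, the CpG-site adjustments applied by some landscape tools are omitted, and it is not necessarily the exact procedure used in this study.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(copy_seq, consensus_seq):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing gaps or ambiguity codes are skipped.
    """
    if len(copy_seq) != len(consensus_seq):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = sites = 0
    for a, b in zip(copy_seq.upper(), consensus_seq.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("divergence too high for the Kimura correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def repeat_landscape(copies, genome_size, bin_width=1.0):
    """Percentage of the genome in each Kimura divergence bin.

    copies: iterable of (divergence_percent, length_bp) pairs.
    """
    landscape = Counter()
    for divergence, length in copies:
        bin_start = math.floor(divergence / bin_width) * bin_width
        landscape[bin_start] += 100.0 * length / genome_size
    return dict(sorted(landscape.items()))


# One transition over ten aligned sites: about 11.2% Kimura divergence.
print(round(100 * kimura_2p("GAAAAAAAAA", "AAAAAAAAAA"), 1))
```

Copies piling up in the lowest bins correspond to the recent bursts described above, while a long tail at high divergence reflects older, degenerated insertions.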
In AMF, including G. margarita, TEs are preferentially located close to promoters and genes involved in molecular communication with the host (e.g., effectors) 25, suggesting a potential role in regulating gene expression and maintaining successful mycorrhizal relationships. Reports of significant TE upregulation in the model AMF R. irregularis following root colonization support this view 27. Our more comprehensive TE annotation allowed a detailed analysis of the differential regulation of fungal TEs in Lotus japonicus roots colonized by G. margarita with and without endobacteria. Remarkably, the presence of the bacterial endosymbiont alters the regulation of specific TE families during root colonization (Fig. 3b, c). Among these, some members of the Helitron, Maverick, and LINE families were upregulated only in the presence of CaGg, while members of the PLE family were upregulated exclusively in its absence. Furthermore, DIRS families were downregulated in the absence of CaGg (Fig. 3c).
Figure 3. Composition and activity of transposable elements in the G. margarita genome. (A) Transposable element distribution and repeat landscape. The pie chart depicts the relative abundance of each type of sequence in the G. margarita genome. The histogram represents the repeat landscape, grouped by Kimura divergence level (x-axis) relative to the genome (y-axis). (B) Heatmap of differentially expressed TE families during Lotus japonicus root colonization, comparing samples lacking the CaGg endobacterium (-CaGg) with CaGg-containing samples (+CaGg) relative to spores (control). Rows represent TE families, annotated by TE order and clustered by expression profile. (C) Number of differentially expressed TE families by order and direction of regulation (up- or downregulated) in colonized roots, shown separately for samples without and with CaGg.
Identification of A/B compartments and Topologically Associating Domains in G. margarita
Hi-C data also revealed that the G. margarita genome exhibits a checkered pattern delineating A and B compartments (Fig. 4a, b; Fig. S4), as reported in all R. irregularis strains with available Hi-C data 18,24. Some chromosomes carry large blocks of A or B compartments (Fig. 4a), while others harbour a finely interleaved, plaid-like pattern of A/B compartmentalization (Fig. 4b).
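A/B compartments are commonly called from the first eigenvector of the correlation matrix of a distance-normalized (observed/expected) Hi-C map, as in the eigenvector track shown in Fig. 4. The pure-Python sketch below illustrates that idea on a toy matrix; the helper names are hypothetical, the input is assumed to be an already balanced matrix, and a real analysis would rely on dedicated Hi-C software rather than this code. Because the sign of an eigenvector is arbitrary, the sketch orients it so that positive values track gene density, in line with the A compartment being gene-rich.

```python
import math


def observed_over_expected(matrix):
    """Divide each contact count by the mean count at the same genomic distance."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def pearson_matrix(rows):
    """Pearson correlation between every pair of rows."""
    centred = []
    for row in rows:
        mean = sum(row) / len(row)
        centred.append([x - mean for x in row])
    norms = [math.sqrt(sum(x * x for x in row)) for row in centred]
    n = len(rows)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(centred[i], centred[j]))
                corr[i][j] = dot / (norms[i] * norms[j])
    return corr


def leading_eigenvector(sym, iterations=500):
    """Dominant eigenvector of a symmetric positive semidefinite matrix (power iteration)."""
    n = len(sym)
    vec = [1.0 + 0.01 * i for i in range(n)]  # non-uniform starting vector
    for _ in range(iterations):
        nxt = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec


def call_compartments(matrix, gene_density):
    """Label each bin 'A' or 'B' from the sign of the first eigenvector."""
    ev = leading_eigenvector(pearson_matrix(observed_over_expected(matrix)))
    mean_density = sum(gene_density) / len(gene_density)
    covariance = sum(e * (g - mean_density) for e, g in zip(ev, gene_density))
    if covariance < 0:  # orient so that gene-rich bins are positive (A)
        ev = [-e for e in ev]
    return ["A" if e > 0 else "B" for e in ev]


# Toy 4-bin map: bins of the same (hidden) type contact each other twice as often.
types = ["A", "A", "B", "B"]
toy = [[2.0 if types[i] == types[j] else 1.0 for j in range(4)] for i in range(4)]
print(call_compartments(toy, gene_density=[10, 10, 2, 2]))
```

The observed/expected step removes the distance decay that would otherwise dominate the correlations, so the leading eigenvector captures the plaid-like checkerboard rather than proximity to the diagonal.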
Figure 4. Genome compartmentalization in G. margarita. Examples of Hi-C contact maps showing A/B compartments in G. margarita chromosomes (A) 3 and (B) 36. The bottom track shows the first eigenvector values used to assign compartments at 50-kb resolution. Regions that interact more frequently appear as brighter squares on the contact map. (C–F) Gene and repeat densities, methylation frequency, and gene expression in compartments A and B of G. margarita. Boxplots show (C) gene density (genes per 50 kb), (D) repeat density (repeats per 50 kb), (E) CpG methylation frequency, and (F) gene expression levels (log(TPM + 1)) in three conditions (spores, extraradical mycelium (ERM), and intraradical mycelium (IRM; L. japonicus)) in A and B compartments. Boxes show the first quartile (25%), the median (black line, 50%), and the third quartile (75%). Whiskers extend to 1.5× the box length; outliers are shown as dots. Asterisks above the boxplots indicate significant differences between the A and B compartments (p < 0.05, Wilcoxon rank-sum test).
The G. margarita A compartment has significantly higher gene density and average gene expression than the B compartment and contains most core genes and all rDNA operons (Fig. 4a), while the B compartment is significantly enriched for secreted proteins and candidate effector genes (Table S7). Although both compartments have similar TE densities (Fig. 4d), the TEs in the A compartment are significantly more methylated (Fig. 4e). Like R. irregularis homokaryons 24,61, G. margarita shows a strong bimodal distribution of methylation levels, with most CpG sites either highly methylated (>8 = 58.8%) or weakly methylated (<2 = 38.6%). Notably, A/B compartments remain remarkably stable between the +CaGg and -CaGg conditions; that is, no significant change in the checkered patterns was observed either within or among chromosomes. In G. margarita, A compartments are preferentially located in chromosome cores (Fig. 1), while B compartments are generally found at chromosome ends. Such a clear positional distinction between A and B compartments is not observed in R. irregularis isolates 24.
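The compartment comparisons above, and the boxplots in Figs. 4 and 5, rely on the two-sided Wilcoxon rank-sum test. To show its mechanics, here is a minimal pure-Python sketch using the large-sample normal approximation, with average ranks for ties and a tie-corrected variance; it omits the continuity correction and exact small-sample p-values. In practice one would call an established implementation such as `scipy.stats.mannwhitneyu`; the function name `rank_sum_test` and the toy values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided p-value). Ties receive average
    ranks and the variance is tie-corrected; no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p


# Hypothetical genes-per-50-kb values for a handful of A and B bins.
u, p = rank_sum_test([12, 15, 11, 14, 13], [4, 6, 5, 7, 3])
print(f"U = {u}, two-sided p = {p:.3f}")
```

A rank-based test suits these comparisons because gene densities, methylation frequencies, and TPM values are skewed and far from normally distributed.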
Hi-C data analysis also revealed 1,407 TAD-like structures in G. margarita (Fig. 5a). These TADs cover 92.16% of the genome and have a median size of 420 kb (Fig. 5b), in line with reports from other eukaryotes, including non-AMF fungi 30,32,62.
Figure 5. Topologically associating domain (TAD)-like structures in the G. margarita genome. (A) A representative section of chromosome 2 showing examples of TADs as black triangles. The bottom green line corresponds to the insulation score (TAD score), and the vertical black dashed lines mark the predicted TADs. (B) Density plot of TAD size distribution. The vertical dashed line represents the mean value. (C–E) Line plots of (C) genes, (D) repeats, and (E) DNA methylation (CpG) at domain boundaries and ±50 kb from boundaries. (F) Gene expression (log(TPM + 1)) in boundary and non-boundary regions across three conditions: spores, ERM, and IRM (L. japonicus). Boxes show the first quartile (25%), the median (black line, 50%), and the third quartile (75%). Whiskers extend to 1.5× the box length; outliers are shown as dots. Asterisks above the boxplots indicate significant differences between the boundary and non-boundary regions (p < 0.05, Wilcoxon rank-sum test).
We identified TAD boundaries based on low insulation scores and, surprisingly given the repeat-dominated boundaries reported in other fungi, found that they are gene-rich and depleted of repeats (Fig. 5c, d). The gene enrichment at boundary regions coincides with low methylation levels (Fig. 5e), and, accordingly, boundary-associated genes have significantly higher expression levels across multiple life stages, including germinating spores, extraradical mycelium, and intraradical mycelium (L. japonicus), than genes in non-boundary regions (Fig. 5f). Together, these observations suggest that TAD boundaries function as transcriptional hotspots in AMF. Moreover, genes within the same TAD are co-expressed across life stages (Fig. S5).
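TAD boundaries in this study were called from low insulation scores. The sketch below illustrates one common formulation: for each bin, average the contacts between a window of bins on one side of a junction and a window on the other, normalise as log2 of the ratio to the mean score, and report strict local minima as candidate boundaries. The window size, normalisation, and strict-minimum rule are assumptions made for illustration; the function names are hypothetical, and production tools additionally filter minima by boundary strength.

```python
import math


def insulation_scores(matrix, window):
    """Mean contact frequency across the junction between bin i and bin i + 1.

    Averages contacts between the `window` bins ending at i and the `window`
    bins starting at i + 1. Bins too close to either edge get None.
    """
    n = len(matrix)
    scores = [None] * n
    for i in range(window - 1, n - window):
        upstream = range(i - window + 1, i + 1)
        downstream = range(i + 1, i + window + 1)
        total = sum(matrix[a][b] for a in upstream for b in downstream)
        scores[i] = total / (window * window)
    return scores


def normalise(scores):
    """log2(score / mean score); None where the score is missing or zero."""
    valid = [s for s in scores if s]
    mean = sum(valid) / len(valid)
    return [math.log2(s / mean) if s else None for s in scores]


def boundaries(norm_scores):
    """Indices of strict local minima (last bin before each putative boundary)."""
    found = []
    for i in range(1, len(norm_scores) - 1):
        left, mid, right = norm_scores[i - 1], norm_scores[i], norm_scores[i + 1]
        if None in (left, mid, right):
            continue
        if mid < left and mid < right:
            found.append(i)
    return found


# Toy map with two 5-bin domains: contacts are 10 within a domain, 1 between.
domain = [0] * 5 + [1] * 5
toy = [[10.0 if domain[i] == domain[j] else 1.0 for j in range(10)] for i in range(10)]
print(boundaries(normalise(insulation_scores(toy, window=2))))
```

Few contacts cross a boundary, so the score dips there; the strict-minimum rule then places the boundary between the last bin of one domain and the first bin of the next.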
The tripartite CaGg endosymbiotic genome supports genomic plasticity and reveals novel pathways for enhanced stress defence and cofactor biosynthesis
Alongside the AMF genome, we obtained the complete genome of the CaGg endosymbiont, comprising a single circular chromosome of 1,998,997 bp and two circular plasmids, referred to herein as pCaGg01 (99,883 bp) and pCaGg02 (22,198 bp), each with a read depth approximately double that of the chromosome (Fig. 6; Table 2). The circular chromosome and plasmid assemblies contain the expected origin of replication (ORI) motifs and were all corroborated by uniform read coverage, Hi-C contact mapping (Fig. S6), and independent assemblers 63,64.
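The observation that both plasmids have roughly twice the chromosomal read depth can be turned into an approximate copy number by dividing each replicon's mean depth by that of the chromosome. The short Python sketch below applies this ratio to the coverage values in Table 2. It assumes that read depth scales linearly with copy number, which GC or library biases can distort (the replicons differ in GC content), so the result is an estimate rather than a measured copy number; the function name is hypothetical.

```python
def relative_copy_number(replicon_depth, chromosome_depth):
    """Estimated copies of a replicon per chromosome copy, from mean read depth."""
    if chromosome_depth <= 0:
        raise ValueError("chromosome depth must be positive")
    return replicon_depth / chromosome_depth


# Mean read depths reported in Table 2.
depths = {"CaGg chromosome": 27, "pCaGg01": 55, "pCaGg02": 56}
for name in ("pCaGg01", "pCaGg02"):
    ratio = relative_copy_number(depths[name], depths["CaGg chromosome"])
    print(f"{name}: ~{ratio:.1f} copies per chromosome")
```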
Figure 6. Circular genome map of CaGg. From outside to inside: clusters of orthologous genes (COGs) on the + strand; coding sequences (CDSs) on the + strand; rRNA and tRNA on the + strand; CDSs, rRNA, and tRNA on the - strand; COGs on the - strand; GC content; GC skew.
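The two innermost tracks of the circular map are GC content and GC skew, (G - C)/(G + C), computed in windows along the replicon. In many bacteria the cumulative skew reaches a minimum near the replication origin and a maximum near the terminus, which complements motif-based ORI detection. A minimal Python sketch follows; the window and step sizes and the function names are assumptions, not the settings used for the figure.

```python
def gc_profile(sequence, window, step):
    """GC content and GC skew, (G - C) / (G + C), in sliding windows.

    Returns a list of (window_start, gc_fraction, gc_skew) tuples.
    """
    seq = sequence.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_content = (g + c) / window
        skew = (g - c) / (g + c) if g + c else 0.0  # windows without G/C get 0
        rows.append((start, gc_content, skew))
    return rows


def cumulative_skew(rows):
    """Running sum of window skews; its minimum often lies near the origin."""
    total, out = 0.0, []
    for _, _, skew in rows:
        total += skew
        out.append(total)
    return out


profile = gc_profile("GGGGCCCCAATT", window=4, step=4)
print(profile)
print(cumulative_skew(profile))
```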
Table 2. Genome statistics and mobile genetic elements (MGEs) of the CaGg chromosome and associated plasmids.

Feature                                      CaGg chromosome   pCaGg01   pCaGg02
Size (bp)                                    1,998,997         99,883    22,198
GC (%)                                       54.80             53.06     47.04
CDSs                                         2,292             130       30
Coverage (read depth)                        27                55        56
MGEs
  Integration/Excision (IE)                  102               2         0
  Replication/Recombination/Repair (RRR)     21                0         0
  Phages (P)                                 22                0         0
  Stability/Transfer/Defence (STD)           47                5         0
  Transfer (T)                               15                8         0

This new assembly recovers substantially more coding capacity and non-coding features than the previous fragmented version. Specifically, our genome assembly, totalling 2,121,078 bp, is 20% larger than the previous fragmented assemblies obtained with pyrosequencing and fosmid libraries, which reported 3 ORIs in 4 genomes at 15x coverage across 125 contigs 46. It is also complete, fully resolving previously undetected plasmid sequences. The CaGg chromosome contains 2,292 protein-coding genes, 44 tRNAs, 21 non-coding RNAs (ncRNAs), one transfer-messenger RNA, and a single complete rRNA operon. The plasmid pCaGg01 has 130 coding genes and one ncRNA, while pCaGg02 has 30 coding genes. A Clusters of Orthologous Genes (COG) analysis reveals that the CaGg chromosome contains a high proportion of genes associated with the mobilome (category X), while pCaGg01 is dominated by genes classified as defence (V), and pCaGg02 mainly contains genes involved in signal transduction (T) (Fig. 6).

The metabolic reconstruction of the CaGg chromosome confirms previously reported features of its core metabolism, including the absence of phosphofructokinase (pfk) and fatty acid degradation genes, a reduced set of amino acid biosynthesis enzymes, and a complete cobalamin (B12) biosynthesis pathway. The endosymbiont also encodes multiple membrane transport systems, including ABC transporters and members of the major facilitator superfamily (MFS), as well as type II, III, and IV secretion systems and complete Sec and Tat pathways 46.
Our improved assembly also uncovered previously unreported and biologically relevant metabolic capabilities of CaGg. These include complete pathways for coenzyme A, NAD, lipoic acid, PreQ1, and heme metabolism, revealing a much broader capacity for cofactor production than previously assumed. We also found a CRISPR array lacking a cas motif. This orphan array may be linked to the phage elements identified in our assembly, including two intact prophages and several prophage remnants on the CaGg chromosome (Table S8), which contribute to the remarkable enrichment of MGEs (Table 2). Some of the MGEs are classified as plasmid-origin gene families, and upon further examination we identified several plasmid-related genes in the CaGg genome, including genes involved in plasmid replication, partitioning, stabilization, and conjugation, together suggesting several distinct plasmid integrations into the CaGg genome (Table S9). None of these putative insertions coincide with coverage drops or changes in Hi-C signal, indicating that they are not assembly artefacts.
Overall, MGEs cover 6.97% of the CaGg genome and are classified into five functional categories: Integration/Excision (IE, 102), Replication/Recombination/Repair (RRR, 21), Phage (P, 22), Stability/Transfer/Defence (STD, 47), and Transfer (T, 15). As observed in other Glomeromycotina endosymbionts 65, IE elements are enriched in the genome, suggesting that they are major contributors to genome plasticity in these endosymbionts. The high mobilome content prompted us to analyze type II toxin-antitoxin (TA) modules, which are thought to stabilize MGEs 66 and mediate stress-induced persistence in CaGg 67. The present assembly uncovered 41 new TAs (39 chromosomal and 8 on pCaGg01), up from 9 in earlier predictions 67 (Table S10).
Discussion
In this work, we report the first chromosome-level assembly and 3D analysis of an AMF genome outside the Glomerales. This provided an unprecedented view into the biology of one of the largest and most repeat-rich genomes known in the fungal kingdom and uncovered novel insights into the biology and evolution of its bacterial endosymbiont.
A chromosomal view of a large and highly repetitive non-model AMF
Combining long reads with chromatin-capture methods enabled a complete, chromosome-level assembly of the large, highly repetitive G. margarita genome and that of its endosymbiont. This allowed us to show that AMF have more chromosomes than most of their fungal relatives and that, with 43 chromosome-level scaffolds, G. margarita likely has the highest chromosome count reported to date in the fungal kingdom. Our chromosome annotation reveals that G. margarita encodes 30,211 predicted protein-coding genes and retains all the hallmarks of obligate biotrophy typical of AMF 48. These include a reduced set of cell wall-degrading enzymes, the lack of fatty acid biosynthesis genes, the loss of thiamine synthesis genes, and reliance on the uptake of soluble sugars, such as glucose, which they obtain directly from their hosts 12,16. We also report a larger-than-expected set of proteins potentially involved in the dialogue with the host 24,26,68,69, which may reflect an enhanced ability to establish symbioses with multiple plants, and we found initial evidence that G. margarita utilizes cobalamin produced by its endobacterium, CaGg.
Beyond gene content, the chromosome-level assembly provides a curated and comprehensive view of TE diversity in a genome long recognized for its repetitiveness and fragmentation 39. While previous work linked the large genome size of G. margarita primarily to LINE expansions 39, our refined TE annotation uncovers a substantial contribution of Class II elements, including DNA/TIR transposons and Mavericks, to the G. margarita genome. These DNA transposons have been implicated in gene duplication, genome restructuring, and regulatory innovation in fungi and other eukaryotes 60, suggesting that multiple TE classes, not only retrotransposons, continue to contribute to genome complexity in G. margarita. Our analyses also support the view that, in addition to shaping genome evolution, AMF TEs are major players in regulating gene expression during colonization, presumably owing to their localization within regulatory regions and in proximity to genes encoding candidate secreted proteins and effectors. Notably, different TE families are regulated in G. margarita and in R. irregularis strains during root colonization. This could reflect lineage-specific regulatory adaptations in AMF and/or host plant-driven TE regulation 27. It will be important to determine how these elements are regulated across additional hosts and conditions in future studies, particularly in the presence or absence of the endobacterium.
The analysis of a large AMF genome also reinforces the growing evidence that rDNA sequence diversity within individual AMF genomes is substantial. This highlights the challenges of using this locus alone for taxonomic resolution, as divergent rDNA copies cluster by paralogy rather than by speciation, and supports the need for population genetics-based approaches in AMF taxonomy 54,70. Our evidence that some rDNA copies are likely pseudogenes further compounds these challenges.
Remarkable conservation in genome biology among AMF lineages
In addition to sharing losses of key genes associated with obligate biotrophy and high rDNA paralogy, the genome biology of G. margarita follows patterns conserved among distinct AMF lineages. All species investigated to date with Hi-C have genomes partitioned into an A compartment with high gene density, expression, and repeat methylation, and a B compartment with low repeat methylation and an enrichment in genes encoding secreted and effector proteins. This spatial organization of AMF chromatin likely maintains genome stability by silencing repeats in the A compartment, which is rich in housekeeping genes 24, while allowing the B compartment to serve as a hotspot for diversifying genes that encode secreted proteins and effectors. Altogether, these features continue to underpin striking analogies in the genome biology and evolution of AMF and filamentous plant pathogens 71,72.
Despite these shared features, AMF genomes exhibit some notable epigenetic distinctions. For example, while R. irregularis heterokaryons show a tripartite genome-wide CpG methylation distribution 18, homokaryons and G. margarita display a bimodal methylation pattern 24,73. The reasons for these differences remain unknown, although it is tempting to speculate that nuclear type (homokaryon vs. heterokaryon) influences AMF methylation distribution patterns. Our work also identified notable distinctions between TADs in AMF and those of fungal relatives. For example, in the fungal species Epichloë festucae, AT-rich repeat blocks contribute to the formation of TADs 35. In contrast, the repeat depletion we observed at TAD boundaries suggests an alternative mechanism of TAD establishment in AMF, more consistent with reports from other eukaryotic lineages 33,34,62,74, whereby boundary genes are co-expressed across different conditions. This suggests that TAD organization facilitates transcriptional coordination in these prominent symbionts.
A model that explains the co-transcription of divergent rDNA operons in AMF
In most eukaryotes, rDNA genes exist as hundreds to thousands of identical copies scattered across different chromosomes, yet these copies are all transcribed in a spatially coordinated manner within the nucleolus 57,58,75. The localization of rDNA genes within the nucleolus has been studied primarily with microscopy techniques, such as fluorescence in situ hybridization (FISH) and immunofluorescence 57,76. Recently, 3C-based methods, such as Hi-C, have been employed to identify these interactions at a genome-wide scale, providing an additional, independent line of evidence for rDNA co-localization 77. In our work, the presence of fewer, divergent rDNA copies in G. margarita likely facilitated their capture and separation by Hi-C, enabling the detection of strong physical interactions among them. This indicates that these genes co-localize, presumably within the nucleolus, to ensure their co-transcription and enhance ribosome biogenesis 57,58. In addition, the identification of similar signals in R. irregularis 24 suggests that rDNA co-localization is a conserved mechanism of rDNA organization and co-regulation in AMF.
An improved view of endosymbiotic contribution to G. margarita
Our complete genome assembly and comparative analysis revealed that the CaGg genome comprises a large circular chromosome and two smaller, complete circular plasmids, each with distinct ORI motifs, as supported by HiFi and Hi-C data and by independent assemblers. This work thus clarifies previous assumptions, which suggested that the CaGg genome is organized into 2 to 4 genomic units ranging between 1.4 and 2.5 in size 46,78. Like other AMF endosymbionts 40,79,80, CaGg has a smaller genome than its free-living Burkholderia relatives 81, resulting in limited metabolic capabilities, including the inability to use glycolysis as an energy source and a limited capacity to biosynthesize essential amino acids, which underscores its strong dependence on the AMF host.
The close-knit relationship between CaGg and G. margarita is further emphasized by our finding that some fungal TEs are upregulated only in the presence of the endobacterium, and by the identification of novel pathways through which CaGg may influence the physiology of its fungal host. In particular, the presence of a complete heme biosynthesis pathway suggests an unsuspected role in supporting AMF host bioenergetics, consistent with observations in other host-endosymbiont systems 82. Indeed, heme is an essential cofactor for many proteins, including the cytochromes of the mitochondrial electron transport chain, which are highly abundant in the G. margarita genome. It is therefore tempting to speculate that CaGg-derived heme enhances the activity of respiratory complexes in the AMF host, which would provide a possible explanation for the higher ATP production and respiratory activity reported in endobacteria-containing G. margarita compared to the cured line 83. In other filamentous fungi 84, heme also plays roles in growth, development, and stress adaptation, suggesting that CaGg-encoded heme production could affect multiple aspects of AMF physiology beyond respiration. Future analyses will hopefully reveal how these novel endobacterial genes are regulated in response to environmental changes.
Another surprising discovery was that CaGg carries several MGEs, including prophages, expanding the viral pool within the G. margarita symbiotic system 41. While most bacteria rely on CRISPR-Cas systems to defend against foreign DNA 85,86, CaGg, like many endosymbionts, lacks a functional CRISPR-Cas system 87,88. The lack of cas genes accompanying its orphan CRISPR array likely explains the high MGE content, including prophages, in the CaGg genome, while the array itself suggests that CRISPR-based defence was active in the BRE endobacterial ancestors before some lineages became obligate endosymbionts. In turn, the enrichment of toxin-antitoxin systems in CaGg might represent an alternative defence strategy in the absence of CRISPR-Cas 86,89.
Conclusions
This study presents the first chromosome-scale and 3D genome assembly of a non-model AMF, revealing conserved principles of chromatin architecture among AMF despite dramatic differences in genome size and repeat content. Our findings confirm that A/B compartmentalization and TAD-like structures are fundamental features of AMF genome organization, supporting coordinated gene expression and genome stability. We also uncover a novel mechanism for rDNA co-localization within the nucleolus, which likely facilitates ribosome biogenesis despite extensive rDNA sequence divergence, a hallmark of AMF genomes.
Beyond fungal chromatin biology, our work sheds light on new functional contributions of the obligate endobacterium CaGg by revealing a multipartite architecture enriched in mobile genetic elements, underscoring a genomic plasticity that its incomplete CRISPR-Cas system apparently cannot restrain. Additionally, we identify novel pathways for essential cofactors, including heme and cobalamin, suggesting a possible role in enhancing host bioenergetics and stress resilience. The observation that CaGg presence modulates transposable element activity in G. margarita further highlights a complex interplay between host and symbiont at the levels of epigenetics and genome stability.
Together, these findings point to a model in which chromatin architecture, rDNA organization, and endosymbiont-derived metabolic capabilities collectively shape the biology and ecological success of AMF. Future work should explore how these interactions respond to environmental cues and influence plant fitness, including in early-branching lineages, to provide a more complete understanding of the molecular foundations of one of the most impactful plant symbioses.
Methods
Sample preparation for HiFi sequencing
Spores of Gigaspora margarita Becker and Hall (BEG 34, deposited at the European Bank of Glomeromycota) containing (+CaGg) or lacking (-CaGg) the obligate endobacterium Candidatus Glomeribacter gigasporarum were used in this study. The -CaGg spores were obtained from +CaGg spores as previously described 90. The +CaGg and -CaGg spores were isolated from clover trap plants by wet sieving 91 and collected individually under a dissecting microscope. Only the healthiest spores were selected and pooled into groups of 100, then surface-sterilized with a solution of Chloramine T (3% w/v) and streptomycin sulphate (0.03% w/v) in sterile distilled water. A batch of +CaGg spores was immediately frozen in liquid nitrogen, while the remaining batches (+CaGg and -CaGg) were incubated to allow germination.
DNA extraction and purification
DNA was extracted from pools of 100 sterilized and frozen +CaGg spores using the DNeasy Plant Pro Kit (Qiagen). The spores were ground in the extraction buffer using a TissueLyser homogenizer (2 min at 24 Hz, repeated twice), and DNA extraction was performed according to the manufacturer’s instructions, except that the PS solution was omitted and the samples were not heated. DNA from two independent pools of 100 spores each was combined, and a cleanup step was performed with the DNeasy PowerClean Pro Kit (Qiagen) according to the manufacturer’s instructions.
Sample preparation and Hi-C sequencing
For Hi-C analysis, 2,900 +CaGg and 2,900 -CaGg sterilized spores were placed in multiwell plates containing 1 mL of sterile distilled water and incubated in the dark at 30 °C for 7 days to allow germination. Germinated spores and pre-symbiotic mycelium were fixed at room temperature under gentle shaking for 20 min in 1% (v/v) formaldehyde in 0.01 M PBS (pH 7.2), followed by an additional 15-min incubation in 0.125 M glycine in 1% formaldehyde in 0.01 M PBS (pH 7.2). Spores and mycelium were then frozen in liquid nitrogen and ground with sterile micropestles to a fine powder. Samples were stored at −80 °C until library preparation.
Hi-C libraries were prepared using the Proximo Hi-C Kit (Fungal) from Phase Genomics (Seattle, WA, USA), employing four restriction enzymes: DpnII, MseI, DdeI, and HinfI. The Hi-C libraries were sequenced on the Illumina NovaSeq X platform (PE150), generating 194 and 202 million reads for the -CaGg and +CaGg samples, respectively (Table S1).
Genome assembly and Hi-C scaffolding
A total of 9.88 million PacBio HiFi reads were generated on the Revio platform at Mount Sinai Hospital (Toronto) and assembled into contigs using Hifiasm v0.16.1 with the parameters -l0 (no purging) and --h1 and --h2 for Hi-C data integration 93. The raw assembly was queried against the NCBI nr database using Diamond blastx (v0.9.14.115; perc_identity 75, -evalue 1e-5) to detect and remove contaminants and to identify mitochondrial and CaGg contigs, which were annotated using MitoHifi v3.2.3 94 and Bakta v1.8.2 95, respectively.
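The contaminant screen above can be illustrated with a minimal filtering sketch. It assumes Diamond was run with the default 12-column tabular output (`--outfmt 6`), which the text does not state, and keeps, for each contig, the highest-scoring hit that passes the 75% identity and 1e-5 e-value thresholds; `best_hits` is a hypothetical helper. Deciding whether a contig is fungal, mitochondrial, bacterial, or a contaminant would additionally require taxonomy for each subject sequence, which is not shown.

```python
import csv
import io

# Default 12 columns of BLAST/Diamond tabular output (--outfmt 6), assumed here.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle, min_identity=75.0, max_evalue=1e-5):
    """Return {contig: best subject} using the highest-bitscore hit that passes both thresholds."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        hit = dict(zip(COLUMNS, row))
        pident, evalue = float(hit["pident"]), float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if pident < min_identity or evalue > max_evalue:
            continue  # hit too divergent or not significant
        contig = hit["qseqid"]
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (hit["sseqid"], bitscore)
    return {contig: subject for contig, (subject, _) in best.items()}

table = io.StringIO(
    "ctg1\tWP_001\t98.0\t300\t6\t0\t1\t900\t1\t300\t1e-50\t600\n"
    "ctg1\tXP_002\t80.0\t300\t60\t0\t1\t900\t1\t300\t1e-30\t400\n"
    "ctg2\tXP_003\t60.0\t200\t80\t0\t1\t600\t1\t200\t1e-20\t200\n"
)
print(best_hits(table))  # {'ctg1': 'WP_001'}
```

Here ctg2 is dropped because its only hit falls below the identity threshold, so it would remain unassigned rather than being called a contaminant.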
The remaining contigs were scaffolded using the Hi-C reads, which were first processed with the Arima Genomics pipeline 96. The resulting BAM alignments were used as input to the YaHS scaffolding tool 97. The assembled scaffolds were manually curated using PretextView 98 to generate chromosome-scale scaffolds and identify unplaced contigs. Telomeres were identified with a Python script (https://github.com/JanaSperschneider/FindTelomeres). The final chromosome-scale scaffolds were visualized as a genome-wide Hi-C heatmap generated with Juicebox Assembly Tools v1.11.08 99 and as a chromosome ideogram generated with RIdeogram 100.
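The general idea of the telomere search can be sketched as a scan of both scaffold ends for telomeric repeats. This is not the FindTelomeres script itself: the repeat motif (TTAGGG, common in fungi), the 100-bp window, and the four-copy threshold are illustrative assumptions, and `telomere_ends` is a hypothetical helper.

```python
# Assumed telomeric repeat. On a correctly oriented scaffold the 3' end carries
# TTAGGG repeats and the 5' end carries the reverse complement, CCCTAA.
FORWARD_REPEAT = "TTAGGG"
REVERSE_REPEAT = "CCCTAA"

def telomere_ends(seq, window=100, min_copies=4):
    """Return (5' telomere found, 3' telomere found) for one scaffold sequence.

    A telomere is called when the first/last `window` bases contain at least
    `min_copies` non-overlapping copies of the relevant repeat.
    """
    seq = seq.upper()
    five_prime = seq[:window].count(REVERSE_REPEAT) >= min_copies
    three_prime = seq[-window:].count(FORWARD_REPEAT) >= min_copies
    return five_prime, three_prime

scaffold = "CCCTAA" * 5 + "ACGT" * 50 + "TTAGGG" * 5
print(telomere_ends(scaffold))  # (True, True)
```

A scaffold with telomeres at both ends is a candidate telomere-to-telomere chromosome; one with a single telomere suggests the other end is incomplete or still joined to unplaced sequence.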
Repeat masking and transposable elements analysis
The assembled genome was submitted to RepeatModeler2 101 to generate a consensus library of repetitive sequences. To better classify repeats and address the high number of unclassified sequences (“Unknown”), we curated the consensus library using TEtrimmer 102 and manual curation, as previously done for other AMF species 25,27. During this curation, sequences known to be expanded genes were removed. Genome masking and TE annotation were then performed with RepeatMasker 103 using the final curated consensus library.
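Summaries such as the genome fraction occupied by LINEs or DNA/TIR elements are typically derived from such an annotation by merging overlapping hits before summing their lengths. The sketch below assumes the RepeatMasker output has already been parsed into hypothetical `(chrom, start, end, te_class)` tuples with 0-based, half-open coordinates (RepeatMasker's `.out` file is 1-based and inclusive, so `start - 1` would be needed); it does not resolve bases claimed by two different classes.

```python
from collections import defaultdict

def merged_length(intervals):
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the running block: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the running block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def class_fractions(annotations, genome_size):
    """Fraction of the genome covered by each TE class.

    annotations: iterable of (chrom, start, end, te_class), 0-based half-open.
    """
    by_class = defaultdict(lambda: defaultdict(list))
    for chrom, start, end, te_class in annotations:
        by_class[te_class][chrom].append((start, end))
    return {
        te_class: sum(merged_length(ivs) for ivs in per_chrom.values()) / genome_size
        for te_class, per_chrom in by_class.items()
    }

hits = [("chr1", 0, 100, "LINE"), ("chr1", 50, 150, "LINE"), ("chr2", 0, 100, "DNA/TIR")]
print(class_fractions(hits, genome_size=1_000))  # {'LINE': 0.15, 'DNA/TIR': 0.1}
```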
RNA-seq data from intraradical mycelium with and without the endobacterium, from colonizing Lotus japonicus roots, and from extraradical mycelium of G. margarita (PRJNA267628) 104 were used to assess the differential expression of TEs. The reads were first aligned to the reference genome assembled in this study using HISAT2. TE expression was then quantified from the aligned reads with TEtranscripts 105, guided by the gene and TE annotations generated in this study.
Genome annotation
Gene prediction was performed on a soft-masked genome assembly using the Funannotate pipeline v1.8.15 106. First, the command "funannotate train" was executed to train ab initio gene predictors with previously published RNA-seq data (PRJNA751155) 104. Next, the command "funannotate predict" was run with the parameters --optimize_augustus and --ploidy 1. Multiple sources of evidence were used as input during the prediction of protein-coding sequences: (1) transcript assemblies (--transcript_evidence) and alignments (--rna_bam); (2) gene models generated by PASA (--pasa_gff); and (3) protein sequences from UniProtKB and the G. margarita proteome. The quality of the annotated genome was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.2.2 107. SignalP v6.0 108 was employed to predict secreted proteins with the parameters --organism eukarya and --mode slow. EffectorP v3.0 109 was used to predict effector genes, and carbohydrate-active enzymes (CAZymes) were annotated using dbCAN 110,111. To identify B12-dependent enzymes in G. margarita, homologs previously identified in R. irregularis 112 were used as BLASTP queries.
Phylogenetic analyses
To infer the phylogenetic relationships among the SSU-ITS-LSU rDNA paralogs found in G. margarita and rDNA copies from other Gigaspora species, a dataset was created including sequences of Dentiscutata savannicola and Fuscutata heterogama as outgroups. Members of Paradentiscutata and Intraornatospora, the closest genera to Gigaspora, were not chosen as outgroups because only partial LSU sequences are available for them. Of the 21 rDNA operons of G. margarita, 19 were used because they overlapped the partial SSU-ITS-LSU region used for phylogenetic inference. Overall, the dataset comprised 96 sequences representing eight Gigaspora species and 10 outgroup sequences. The dataset was aligned with the online version of MAFFT v.7 113 using the E-INS-i iterative refinement method and edited in MEGA v.5.2.2 by manually trimming overhanging and misaligned ends. Maximum likelihood phylogenetic inference was carried out in RAxML-NG via the CIPRES Science Gateway 3.1 114, with 1000 bootstrap replicates, and in IQ-TREE v.2.2.5 115, with 1000 ultrafast bootstrap replicates and 1000 SH-aLRT tests. Additionally, a Bayesian analysis was performed in MrBayes v3.2.6 with 40 million generations and a stop rule at a standard deviation of split frequencies of 0.01. All analyses were conducted with partitions and nucleotide substitution models as previously described 116. Notably, we found that most ingroup branches were either unsupported or poorly supported, particularly with respect to bootstrap and posterior probability values. In the Bayesian analysis, after 40 million generations, the average standard deviation of split frequencies did not approach 0.01 but instead fluctuated around 0.055, suggesting a potential lack of convergence in some bipartitions.
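To make the convergence criterion concrete, the sketch below computes an average standard deviation of split frequencies (ASDSF) from per-run split frequencies. MrBayes reports this diagnostic itself; the sketch only illustrates what a value of 0.055 versus the 0.01 target measures. The 0.10 cutoff is assumed to mirror MrBayes's default minimum partition frequency (`minpartfreq`), and the use of the population standard deviation is an assumption; MrBayes's internal conventions may differ. `asdsf` and the split labels are hypothetical.

```python
from statistics import pstdev

def asdsf(run_frequencies, min_freq=0.10):
    """Average standard deviation of split frequencies across independent runs.

    run_frequencies: one dict per run, mapping a split (any hashable label,
    e.g. a frozenset of taxa on one side) to its sampled frequency in that run.
    Splits below `min_freq` in every run are ignored; a split absent from a run
    counts as frequency 0 there.
    """
    all_splits = set().union(*run_frequencies)
    deviations = []
    for split in all_splits:
        freqs = [run.get(split, 0.0) for run in run_frequencies]
        if max(freqs) >= min_freq:
            deviations.append(pstdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0

run1 = {"AB|CD": 0.90, "AC|BD": 0.05}
run2 = {"AB|CD": 0.80, "AD|BC": 0.15}
print(asdsf([run1, run2]))
```

In this toy case the rare split AC|BD is ignored, and the two retained splits disagree between runs by 0.10 and 0.15, giving an ASDSF of about 0.06; values of this magnitude, like the 0.055 observed here, indicate that the runs have not yet sampled the same posterior.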
To determine the placement of G. margarita in the AMF phylogeny, a phylogenomic analysis was performed using the “fungi_odb10” dataset from BUSCO v4.0 107. Profile hidden Markov models corresponding to these markers were used to identify homologous sequences in G. margarita and 50 additional fungal genomes, using HMMER3 v3.1b2 as implemented in the PHYling pipeline (https://zenodo.org/records/10129968). A total of 702 conserved, single-copy orthologous proteins were identified, aligned, and concatenated for downstream phylogenetic inference. The phylogenetic tree was reconstructed by maximum likelihood, with the best-fit substitution model selected for each gene partition in IQ-TREE v.1.6.12. The final analysis was based on 604,857 aligned sites, with 588,555 distinct patterns contributing to the tree topology.
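The distinction between aligned sites and distinct patterns can be illustrated with a minimal supermatrix sketch: per-gene alignments are concatenated (taxa missing from a gene are padded with gaps), partition ranges are recorded, and identical alignment columns are counted once. PHYling and IQ-TREE perform these steps themselves, and IQ-TREE's reported counts may differ in detail (for instance, in how patterns are tallied across partitions); `concatenate`, `distinct_patterns`, and the taxon labels are hypothetical.

```python
def concatenate(alignments):
    """Build a supermatrix from per-gene alignments.

    alignments: list of dicts mapping taxon -> aligned sequence (all sequences
    within one dict have equal length). Taxa missing from a gene are padded
    with gaps. Returns (taxon -> concatenated row, list of 1-based inclusive
    partition ranges, one per gene).
    """
    taxa = sorted(set().union(*alignments))
    rows = {taxon: [] for taxon in taxa}
    partitions, offset = [], 0
    for aln in alignments:
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((offset + 1, offset + length))
        offset += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions

def distinct_patterns(matrix):
    """Count unique alignment columns (site patterns) in a supermatrix."""
    rows = [matrix[taxon] for taxon in sorted(matrix)]
    return len(set(zip(*rows)))

genes = [{"Gmar": "AC", "Rirr": "AC"}, {"Gmar": "GG", "Scer": "GT"}]
matrix, parts = concatenate(genes)
print(matrix)                     # {'Gmar': 'ACGG', 'Rirr': 'AC--', 'Scer': '--GT'}
print(parts)                      # [(1, 2), (3, 4)]
print(distinct_patterns(matrix))  # 4
```

With hundreds of thousands of sites and 51 taxa, almost every column is unique, which is consistent with the number of patterns being close to the number of sites.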
Methylation and RNA-seq analyses
PacBio HiFi reads with kinetic data were processed using Jasmine v3.0.0 117. The output, which identified 5mC sites, was stored as ML and MM tags in the unaligned BAM file. The reads were then aligned to the G. margarita genome using pbmm2 1.10.0 118 with the --preset HiFi parameter. PacBio CpG tools 119 was then used to calculate CpG methylation frequency, using the pileup_calling_model.v1.tflite model and the --min-mapq 0 option. The resulting CpG sites were then intersected with genome-wide compartments, genes, and repeat regions using bedtools v2.30.0 120 to determine median methylation frequencies within 50-kb windows.
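The windowing step can be sketched in a few lines. This is a pure-Python stand-in for the bedtools-based workflow described above, covering only the fixed 50-kb binning and median, not the intersection with compartments, genes, or repeats. It assumes the pb-CpG-tools output has already been parsed into hypothetical `(chrom, position, methylation fraction)` tuples with 0-based positions.

```python
from collections import defaultdict
from statistics import median

def windowed_median(cpg_sites, window=50_000):
    """Median CpG methylation per fixed, non-overlapping genomic window.

    cpg_sites: iterable of (chrom, 0-based position, methylation fraction).
    Returns {(chrom, window_start): median} for windows with at least one CpG.
    """
    bins = defaultdict(list)
    for chrom, pos, frac in cpg_sites:
        # Integer division snaps each CpG to the start of its window.
        bins[(chrom, pos // window * window)].append(frac)
    return {key: median(values) for key, values in sorted(bins.items())}

sites = [("chr1", 10, 0.9), ("chr1", 20, 0.7), ("chr1", 49_999, 0.1), ("chr1", 50_000, 0.5)]
print(windowed_median(sites))  # {('chr1', 0): 0.7, ('chr1', 50000): 0.5}
```

The median rather than the mean keeps a window's value robust to a few extreme CpG calls, which matters in repeat-rich regions where mapping quality varies.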
For gene expression analysis, RNA-seq datasets from G. margarita germinating spores, ERM, and intraradical mycelium colonizing L. japonicus roots were mapped to the transcriptome using Salmon v.1.3.0 121. The transcriptome was first indexed using the salmon index module with the --keepDuplicates option, and reads were quantified with salmon quant using the --validateMappings parameter. The transcripts per million (TPM) values from the Salmon output were log-transformed and compared between A/B compartments across the three RNA-seq datasets.
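A minimal sketch of the log transformation and compartment split follows. The text does not state the log base, pseudocount, or statistical test, so log2(TPM + 1) is an assumption and only per-compartment medians are shown; `log_tpm_by_compartment` and the gene IDs are hypothetical.

```python
import math
from statistics import median

def log_tpm_by_compartment(tpm, compartment, pseudocount=1.0):
    """Group log2(TPM + pseudocount) values by compartment.

    tpm: gene -> TPM; compartment: gene -> "A" or "B".
    Genes without a compartment label are skipped.
    """
    out = {"A": [], "B": []}
    for gene, value in tpm.items():
        label = compartment.get(gene)
        if label in out:
            out[label].append(math.log2(value + pseudocount))
    return out

tpm = {"g1": 15.0, "g2": 7.0, "g3": 0.0, "g4": 1.0}
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
grouped = log_tpm_by_compartment(tpm, labels)
print({k: median(v) for k, v in grouped.items()})  # {'A': 3.5, 'B': 0.5}
```

The pseudocount keeps unexpressed genes (TPM = 0) in the comparison instead of producing an undefined logarithm.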
Compartment and topologically associating domains (TADs) analysis
HiC-Pro v2.11.1 122 and HiCExplorer were used to process, analyze, and visualize Hi-C data. First, HiC-Pro was used to map and filter +CaGg and -CaGg Hi-C reads, retaining only alignments with a MAPQ score greater than 10. Contact matrices were then generated at multiple resolutions, from 20 kb to 100 kb in 10-kb increments. The resulting contact matrices were converted to HDF5 (.h5) format using the hicConvertFormat module of HiCExplorer. A/B compartments were called from the contact maps using the eigenvector decomposition implemented in the hicPCA command of HiCExplorer. The first eigenvector (PC1) corresponded to the A and B compartments, and the sign of the PC1 values (positive or negative) was used to determine compartment identity. Specifically, the contact matrices were manually examined per chromosomal scaffold, and regions with positive values were labelled “1”, while those with negative values were labelled “2”. Finally, to assign A/B compartments, the label associated with higher average RNA-seq gene expression and gene density was assigned to compartment A, and the other label to compartment B.
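Because the sign of an eigenvector is arbitrary, the labelling rule above is what anchors PC1 to biology. The sketch applies that rule to one chromosome, assuming a single per-bin activity score (for example, mean expression); the text combines expression and gene density, which would need a combined score. Bins with PC1 exactly 0 are placed in label "2" here, an arbitrary choice; `assign_compartments` is hypothetical.

```python
from statistics import mean

def assign_compartments(pc1, activity):
    """Label bins of one chromosome as A or B.

    pc1, activity: equal-length per-bin lists. Bins are first grouped by PC1
    sign ("1" positive, "2" otherwise); the group with the higher mean
    activity becomes compartment A and the other becomes B.
    """
    labels = ["1" if v > 0 else "2" for v in pc1]
    groups = {lab: [a for l, a in zip(labels, activity) if l == lab] for lab in ("1", "2")}
    # An empty group can never win the comparison.
    mean_activity = {lab: mean(vals) if vals else float("-inf") for lab, vals in groups.items()}
    a_label = max(mean_activity, key=mean_activity.get)
    return ["A" if lab == a_label else "B" for lab in labels]

print(assign_compartments([0.5, 0.3, -0.2, -0.4], [1.0, 2.0, 10.0, 12.0]))  # ['B', 'B', 'A', 'A']
```

In the example the negative-PC1 bins are the more active ones, so they become compartment A, showing why the sign alone cannot be read as A or B.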
To assess whether the presence of the endobacterium primes compartment switches, eigenvalues from contact matrices of G. margarita spores with and without the endobacterium were compared. Regions whose PC1 values changed from positive to negative, or vice versa, were considered compartment switches. Additionally, the correlation of eigenvalues between the two conditions was calculated to determine the degree of change in compartmentalization.
We predicted TAD bodies and boundaries using hicFindTADs from HiCExplorer at 30-kb resolution, with the parameters --correctForMultipleTesting bonferroni and --thresholdComparisons 0.01. We then assessed the distribution of repetitive elements, protein-coding genes, and methylation levels at TAD boundaries and within ±50 kb of them using BEDTools. The resulting metaplots were generated using deepTools 123. Next, we compared the gene expression (TPM) patterns of boundary-associated and non-boundary genes across three conditions: germinating spores, ERM, and in planta (L. japonicus). Additionally, to explore whether genes within the same TAD show coordinated expression, we calculated the coefficient of variation for TAD and non-TAD regions.
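A minimal sketch of the coefficient-of-variation comparison follows. It assumes genes are assigned to a domain by their midpoint, uses the population standard deviation, and only scores domains containing at least two genes. The text does not specify these details or how per-domain values were aggregated. Non-TAD regions could be scored the same way by passing the inter-TAD intervals as domains. `per_domain_cv` and its input layout are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV = standard deviation / mean; None when the mean is zero."""
    m = mean(values)
    return pstdev(values) / m if m else None

def per_domain_cv(genes, domains):
    """CV of gene expression within each domain.

    genes: list of (chrom, midpoint, tpm); domains: list of (chrom, start, end),
    half-open. Domains with fewer than two genes are skipped.
    """
    result = {}
    for chrom, start, end in domains:
        tpms = [t for c, mid, t in genes if c == chrom and start <= mid < end]
        if len(tpms) >= 2:
            result[(chrom, start, end)] = coefficient_of_variation(tpms)
    return result

genes = [("chr1", 5_000, 10.0), ("chr1", 15_000, 10.0),
         ("chr1", 25_000, 2.0), ("chr1", 45_000, 18.0)]
domains = [("chr1", 0, 20_000), ("chr1", 20_000, 60_000)]
print(per_domain_cv(genes, domains))  # {('chr1', 0, 20000): 0.0, ('chr1', 20000, 60000): 0.8}
```

Lower CVs inside TADs than in comparable non-TAD intervals would indicate more uniform, and possibly coordinated, expression within domains.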
CaGg genomic analyses
To confirm the number and circularity of the CaGg contigs, we reassembled the HiFi reads using the Flye v2.9.6 63 and Unicycler v0.5.1 64 genome assemblers. This process resulted in three CaGg contigs, all of which were verified as circular. One contig, containing core bacterial genes, was identified as the chromosome, while two smaller contigs containing plasmid-associated genes were classified as plasmids. The CaGg circular genome map was generated using GenoVi v0.2.16 124. To identify Type IV Secretion System (T4SS) components and mobilization (MOB) enzymes, SecReT4.0 (https://bioinfo-mml.sjtu.edu.cn/SecReT4/) and MOBscan (https://castillo.dicom.unican.es/mobscan/) were employed. Prophage regions were located using PHASTER (https://phaster.ca/) with default settings. PHASTER categorizes predicted prophage sequences as “intact” (score ≥ 90), “questionable” (score 70–90), or “incomplete” (score < 70), based on size, the presence of phage-related genes, and similarity to known phages. Predicted protein sequences were assigned KEGG Orthologs (KOs) and mapped to KEGG pathways with KofamKOALA (www.genome.jp/tools/kofamkoala/). Type II TA systems were predicted using TAfinder 2.0 (https://bioinfo-mml.sjtu.edu.cn/TADB3/TAfinder.php) 125. CRISPRCasFinder v4.2.20, implemented in Proksee, was used to identify CRISPR arrays and cas genes.
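The PHASTER categories above amount to a simple score threshold, sketched below together with a tally over predicted regions. The ranges as written meet at 90; this sketch follows the "≥ 90" wording and treats exactly 90 as intact and exactly 70 as questionable, but PHASTER's own documentation may place the boundaries slightly differently. The region names and scores are hypothetical.

```python
from collections import Counter

def phaster_category(score):
    """Map a PHASTER completeness score to the categories described in the text."""
    if score >= 90:
        return "intact"
    if score >= 70:
        return "questionable"
    return "incomplete"

def summarize(regions):
    """Count predicted prophage regions per category; regions are (name, score) pairs."""
    return Counter(phaster_category(score) for _, score in regions)

regions = [("region_1", 150), ("region_2", 90), ("region_3", 80), ("region_4", 40)]
print(summarize(regions))  # Counter({'intact': 2, 'questionable': 1, 'incomplete': 1})
```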
Acknowledgements
Our research is funded by the Discovery Program of the Natural Sciences and Engineering Research Council (RGPIN-2020-05643) and by a Discovery Accelerator Supplements Program (RGPAS-2020-00033). N.C. is a University of Ottawa Research Chair in Microbial Genomics. J.O. and G.K. were supported by MITACS projects (IT30302). At the University of Torino, research was funded under the National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.4 - Call for tender No. 3138 of 16 December 2021, rectified by Decree n.3175 of 18 December 2021 of the Italian Ministry of University and Research, funded by the European Union – NextGenerationEU; Project code CN_00000033, Concession Decree No. 1034 of 17 June 2022, adopted by the Italian Ministry of University and Research, CUP D13C22001350001, Project title “National Biodiversity Future Center - NBFC”.
Data Availability
The genome data used in our study are available in GenBank under BioProject PRJNA1364746. Chromosome annotations are available at Zenodo: https://doi.org/10.5281/zenodo.18236849. All the scripts used can be accessed here: https://github.com/kenmurithi/Mugambi-2026-AMF-endosymbiont-genomics
References
1. Spatafora, J. W. et al. A phylum-level phylogenetic classification of zygomycete fungi based on genome-scale data. Mycologia 108, 1028–1046 (2016).
2. Smith, S. E. & Read, D. J. Mycorrhizal Symbiosis. (Academic, 2008).
3. Luginbuehl, L. H. et al. Fatty acids in arbuscular mycorrhizal fungi are synthesized by the host plant. Science 356, 1175–1178 (2017).
4. Keymer, A. et al. Lipid transfer from plants to arbuscular mycorrhiza fungi. Elife 6, (2017).
5. Bonfante, P. The future has roots in the past: the ideas and scientists that shaped mycorrhizal research. New Phytol. 220, 982–995 (2018).
6. Terry, V. et al. Mycorrhizal response of Solanum tuberosum to homokaryotic versus dikaryotic arbuscular mycorrhizal fungi. Mycorrhiza (2023) doi:10.1007/s00572-023-01123-7.
7. MacColl, K. A. & Maherali, H. The effect of ecological restoration on mutualistic services provided by arbuscular mycorrhizal fungi depends on site location and host identity. Plant Soil 512, 347–360 (2025).
8. Ferguson, R. et al. Arbuscular mycorrhizal fungal genotype and nuclear organization as driving factors in host plant nutrient acquisition and stable carbon storage. Plants People Planet (2025) doi:10.1002/ppp3.10645.
9. Pozo, M. J. & Azcón-Aguilar, C. Unraveling mycorrhiza-induced resistance. Curr. Opin. Plant Biol. 10, 393–398 (2007).
10. Kokkoris, V., Stefani, F., Dalpé, Y., Dettman, J. & Corradi, N. Nuclear dynamics in the arbuscular mycorrhizal fungi. Trends Plant Sci. 25, 765–778 (2020).
11. Halary, S. et al. Conserved meiotic machinery in Glomus spp., a putatively ancient asexual fungal lineage. Genome Biol. Evol. 3, 950–958 (2011).
12. Tisserant, E. et al. Genome of an arbuscular mycorrhizal fungus provides insight into the oldest plant symbiosis. PNAS 110, 20117–20122 (2013).
13. Riley, R. & Corradi, N. Searching for clues of sexual reproduction in the genomes of arbuscular mycorrhizal fungi. Fungal Ecol. 6, 44–49 (2013).
14. Halary, S. et al. Mating Type Gene Homologues and Putative Sex Pheromone-Sensing Pathway in Arbuscular Mycorrhizal Fungi, a Presumably Asexual Plant Root Symbiont. PLoS One 8, e80729 (2013).
15. Riley, R. et al. Extreme diversification of the mating type-high-mobility group (MATA-HMG) gene family in a plant-associated arbuscular mycorrhizal fungus. New Phytol. 201, 254–268 (2014).
16. Morin, E. et al. Comparative genomics of Rhizophagus irregularis, R. cerebriforme, R. diaphanus and Gigaspora rosea highlights specific genetic features in Glomeromycotina. New Phytol. 222, 1584–1598 (2019).
17. Ropars, J. et al. Evidence for the sexual origin of heterokaryosis in arbuscular mycorrhizal fungi. Nat. Microbiol. 1, 16033 (2016).
18. Sperschneider, J. et al. Arbuscular mycorrhizal fungi heterokaryons have two nuclear populations with distinct roles in host-plant interactions. Nat. Microbiol. 8, 2142–2153 (2023).
19. Wallen, R. M. & Perlin, M. H. An overview of the function and maintenance of sexual reproduction in dikaryotic fungi. Front. Microbiol. 9, 503 (2018).
20. Ropars, J. et al. Sex in cheese: evidence for sexuality in the fungus Penicillium roqueforti. PLoS One 7, e49665 (2012).
21. Oliveira, J., Yildirir, G. & Corradi, N. From chaos comes order: Genetics and genome biology of arbuscular mycorrhizal fungi. Annu. Rev. Microbiol. 78, 147–168 (2024).
22. Sahraei, S. E. et al. Whole genome analyses based on single, field collected spores of the arbuscular mycorrhizal fungus Funneliformis geosporum. Mycorrhiza 32, 361–371 (2022).
23. Chen, E. C. H. et al. High intraspecific genome diversity in the model arbuscular mycorrhizal symbiont Rhizophagus irregularis. New Phytol. 220, 1161–1171 (2018).
24. Yildirir, G. et al. Long reads and Hi-C sequencing illuminate the two-compartment genome of the model arbuscular mycorrhizal symbiont Rhizophagus irregularis. New Phytol. 233, 1097–1107 (2022).
25. Oliveira, J. I. N. et al. Analyses of transposable elements in arbuscular mycorrhizal fungi support evolutionary parallels with filamentous plant pathogens. Genome Biol. Evol. 17, evaf038 (2025).
26. Kloppholz, S., Kuhn, H. & Requena, N. A secreted fungal effector of Glomus intraradices promotes symbiotic biotrophy. Curr. Biol. 21, 1204–1209 (2011).
27. Oliveira, J. I. N. & Corradi, N. Strain-specific evolution and host-specific regulation of transposable elements in the model plant symbiont Rhizophagus irregularis. G3 (Bethesda) 14, jkae055 (2024).
28. Teulet, A. et al. A pathogen effector FOLD diversified in symbiotic fungi. New Phytol. 239, 1127–1139 (2023).
29. Lanfranco, L. & Bonfante, P. Lessons from arbuscular mycorrhizal fungal genomes. Curr. Opin. Microbiol. 75, 102357 (2023).
30. Torres, D. E., Reckard, A. T., Klocko, A. D. & Seidl, M. F. Nuclear genome organization in fungi: from gene folding to Rabl chromosomes. FEMS Microbiol. Rev. 47, fuad021 (2023).
31. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
32. Dekker, J. & Heard, E. Structural and functional diversity of Topologically Associating Domains. FEBS Lett. 589, 2877–2884 (2015).
33. Glavincheska, I. & Lorrain, C. Three-dimensional genome architecture connects chromatin structure and function in a major wheat pathogen. bioRxiv 2025.05.13.653796 (2025) doi:10.1101/2025.05.13.653796.
34. Kurbidaeva, A. et al. Topologically associating domains and the evolution of three-dimensional genome architecture in rice. Plant J. 122, e70139 (2025).
35. Winter, D. J. et al. Repeat elements organise 3D genome structure and mediate transcription in the filamentous fungus Epichloë festucae. PLoS Genet. 14, e1007467 (2018).
36. Zhang, G., Li, Y. & Wei, G. Multi-omic analysis reveals dynamic changes of three-dimensional chromatin architecture during T cell differentiation. Commun. Biol. 6, 773 (2023).
37. Hansen, A. S., Cattoglio, C., Darzacq, X. & Tjian, R. Recent evidence that TADs and chromatin loops are dynamic structures. Nucleus 9, 20–32 (2018).
38. Li, H., Playter, C., Das, P. & McCord, R. P. Chromosome compartmentalization: causes, changes, consequences, and conundrums. Trends Cell Biol. 34, 707–727 (2024).
39. Venice, F. et al. At the nexus of three kingdoms: the genome of the mycorrhizal fungus Gigaspora margarita provides insights into plant, endobacterial and fungal interactions. Environ. Microbiol. 22, 122–141 (2020).
40. Bonfante, P. & Desirò, A. Who lives in a fungus? The diversity, origins and functions of fungal endobacteria living in Mucoromycota. ISME J. 11, 1727–1735 (2017).
41. Turina, M. et al. The virome of the arbuscular mycorrhizal fungus Gigaspora margarita reveals the first report of DNA fragments corresponding to replicating non-retroviral RNA viruses in fungi. Environ. Microbiol. 20, 2012–2025 (2018).
42. Salvioli, A. et al. Symbiosis with an endobacterium increases the fitness of a mycorrhizal fungus, raising its bioenergetic potential. ISME J. 10, 130–144 (2016).
43. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
44. Tang, N. et al. A survey of the gene repertoire of Gigaspora rosea unravels conserved features among Glomeromycota for obligate biotrophy. Front. Microbiol. 7, 233 (2016).
45. Rosling, A. et al. Evolutionary history of arbuscular mycorrhizal fungi and genomic signatures of obligate symbiosis. BMC Genomics 25, 529 (2024).
46. Ghignone, S. et al. The genome of the obligate endobacterium of an AM fungus reveals an interphylum network of nutritional interactions. ISME J. 6, 136–145 (2012).
47. Malar C, M. et al. Early branching arbuscular mycorrhizal fungus Paraglomus occultum carries a small and repeat-poor genome compared to relatives in the Glomeromycotina. Microb. Genom. 8, 000810 (2022).
48. Malar C, M. et al. The genome of Geosiphon pyriformis reveals ancestral traits linked to the emergence of the arbuscular mycorrhizal symbiosis. Curr. Biol. 31, 1578–1580 (2021).
49. Pelin, A. et al. The mitochondrial genome of the arbuscular mycorrhizal fungus Gigaspora margarita reveals two unsuspected trans-splicing events of group I introns. New Phytol. 194, 836–845 (2012).
50. Wijayawardene, N. N. et al. Outline of fungi and fungus-like taxa – 2021. Mycosphere 13, 53–453 (2022).
51. Beaudet, D. et al. Ultra-low input transcriptomics reveal the spore functional content and phylogenetic affiliations of poorly studied arbuscular mycorrhizal fungi. DNA Res. 0, 1–11 (2017).
52. Błaszkowski, J. et al. A new order, Entrophosporales, and three new Entrophospora species in Glomeromycota. Front. Microbiol. 13, 962856 (2022).
53. Redecker, D. et al. An evidence-based consensus for the classification of arbuscular mycorrhizal fungi (Glomeromycota). Mycorrhiza (2013) doi:10.1007/s00572-013-0486-y.
54. Corradi, N., Antunes, P. M. & Magurno, F. A call for reform: implementing genome-based approaches for species classification in Glomeromycotina. New Phytol. (2025) doi:10.1111/nph.70148.
55. Maeda, T. et al. Evidence of non-tandemly repeated rDNAs and their intragenomic heterogeneity in Rhizophagus irregularis. Commun. Biol. 1, 87 (2018).
56. Stefani, F. et al. The pitfalls of rDNA-based AMF identification: a comparative analysis of rDNA and protein-coding genes. New Phytol. 248, 1501–1515 (2025).
57. Boisvert, F.-M., van Koningsbruggen, S., Navascués, J. & Lamond, A. I. The multifunctional nucleolus. Nat. Rev. Mol. Cell Biol. 8, 574–585 (2007).
58. Pederson, T. The nucleolus. Cold Spring Harb. Perspect. Biol. 3, a000638 (2011).
59. Schöfer, C. & Weipoltshammer, K. Nucleolus and chromatin. Histochem. Cell Biol. 150, 209–225 (2018).
60. Muszewska, A., Steczkiewicz, K., Stepniewska-Dziubinska, M. & Ginalski, K. Transposable elements contribute to fungal genes and impact fungal lifestyle. Sci. Rep. (2019) doi:10.1038/s41598-019-40965-0.
61. Dallaire, A. et al. Transcriptional activity and epigenetic regulation of transposable elements in the symbiotic fungus Rhizophagus irregularis. Genome Res. 31, 2290–2302 (2021).
62. Li, D. et al. Comparative 3D genome architecture in vertebrates. BMC Biol. 20, 99 (2022).
63. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
64. Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 13, e1005595 (2017).
65. Sorwar, E., Oliveira, J. I. N., Malar C, M., Krüger, M. & Corradi, N. Assembly and comparative analyses of the Geosiphon pyriformis metagenome. Environ. Microbiol. 26, e16681 (2024).
66. Chan, W. T., Garcillán-Barcia, M. P., Yeo, C. C. & Espinosa, M. Type II bacterial toxin-antitoxins: hypotheses, facts, and the newfound plethora of the PezAT system. FEMS Microbiol. Rev. 47, fuad052 (2023).
67. Salvioli di Fossalunga, A., Lipuma, J., Venice, F., Dupont, L. & Bonfante, P. The endobacterium of an arbuscular mycorrhizal fungus modulates the expression of its toxin-antitoxin systems during the life cycle of its host. ISME J. 11, 2394–2398 (2017).
68. Teulet, A. et al. A pathogen effector FOLD diversified in symbiotic fungi. bioRxiv (2022) doi:10.1101/2022.12.16.520752.
69. Voß, S., Betz, R., Heidt, S., Corradi, N. & Requena, N. RiCRN1, a crinkler effector from the arbuscular mycorrhizal fungus Rhizophagus irregularis, functions in arbuscule development. Front. Microbiol. 9, 2068 (2018).
70. Bruns, T. D., Corradi, N., Redecker, D., Taylor, J. W. & Öpik, M. Glomeromycotina: What is a species and why should we care? New Phytol. (2017) doi:10.1111/nph.14913.
71. Reinhardt, D., Roux, C., Corradi, N. & Di Pietro, A. Lineage-Specific Genes and Cryptic Sex: Parallels and Differences between Arbuscular Mycorrhizal Fungi and Fungal Pathogens. Trends Plant Sci. (2020) doi:10.1016/j.tplants.2020.09.006.
72. Dong, S., Raffaele, S. & Kamoun, S. The two-speed genomes of filamentous pathogens: waltz with plants. Curr. Opin. Genet. Dev. 35, 57–65 (2015).
73. Manley, B. F. et al. A highly contiguous genome assembly reveals sources of genomic novelty in the symbiotic fungus Rhizophagus irregularis. G3 (Bethesda) 13, jkad077 (2023).
74. Dixon, J. R., Gorkin, D. U. & Ren, B. Chromatin domains: The unit of chromosome organization. Mol. Cell 62, 668–680 (2016).
75. Grummt, I. Life on a planet of its own: regulation of RNA polymerase I transcription in the nucleolus. Genes Dev. 17, 1691–1702 (2003).
76. Mancio-Silva, L., Zhang, Q., Scheidig-Benatar, C. & Scherf, A. Clustering of dispersed ribosomal DNA and its role in gene regulation and chromosome-end associations in malaria parasites. Proc. Natl. Acad. Sci. U. S. A. 107, 15117–15122 (2010).
77. Rabuffo, C. et al. Inter-chromosomal transcription hubs shape the 3D genome architecture of African trypanosomes. Nat. Commun. 15, 10716 (2024).
78. Jargeat, P. et al. Isolation, free-living capacities, and genome structure of “Candidatus Glomeribacter gigasporarum,” the endocellular bacterium of the mycorrhizal fungus Gigaspora margarita. J. Bacteriol. 186, 6876–6884 (2004).
79. Pawlowska, T. E. et al. Biology of fungi and their bacterial endosymbionts. Annu. Rev. Phytopathol. 56, 289–309 (2018).
80. Uehling, J. K. et al. Bacterial endosymbionts of Mucoromycota fungi: Diversity and function of their interactions. in The Mycota 177–205 (Springer International Publishing, Cham, 2023).
81. Winsor, G. L. et al. The Burkholderia Genome Database: facilitating flexible queries and comparative analyses. Bioinformatics 24, 2803–2804 (2008).
82. Strübing, U., Lucius, R., Hoerauf, A. & Pfarr, K. M. Mitochondrial genes for heme-dependent respiratory chain complexes are up-regulated after depletion of Wolbachia from filarial nematodes. Int. J. Parasitol. 40, 1193–1202 (2010).
83. Venice, F. et al. Gigaspora margarita with and without its endobacterium shows adaptive responses to oxidative stress. Mycorrhiza 27, 747–759 (2017).
84. Wang, J. et al. Crucial involvement of heme biosynthesis in vegetative growth, development, stress response, and fungicide sensitivity of Fusarium graminearum. Int. J. Mol. Sci. 25, 5268 (2024).
85. Horvath, P. & Barrangou, R. CRISPR/Cas, the immune system of bacteria and archaea. Science 327, 167–170 (2010).
86. Rostøl, J. T. & Marraffini, L. (ph)ighting phages: How bacteria resist their parasites. Cell Host Microbe 25, 184–194 (2019).
87. Burstein, D. et al. Major bacterial lineages are essentially devoid of CRISPR-Cas viral defence systems. Nat. Commun. 7, 10613 (2016).
88. Siozios, S. et al. Genome dynamics across the evolutionary transition to endosymbiosis. Curr. Biol. 34, 5659–5670.e7 (2024).
89. Song, S. & Wood, T. K. A primary physiological role of toxin/antitoxin systems is phage inhibition. Front. Microbiol. 11, 1895 (2020).
90. Lumini, E. et al. Presymbiotic growth and sporal morphology are affected in the arbuscular mycorrhizal fungus Gigaspora margarita cured of its endobacteria. Cell. Microbiol. 9, 1716–1729 (2007).
91. Gerdemann, J. W. & Nicolson, T. H. Spores of mycorrhizal Endogone species extracted from soil by wet sieving and decanting. Trans. Br. Mycol. Soc. 46, 235–244 (1963).
92. Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 48, W177–W184 (2020).
93. Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 40, 1332–1335 (2022).
94. Uliano-Silva, M. et al. MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics 24, 288 (2023).
95. Schwengers, O. et al. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb. Genom. 7, (2021).
96. Arima Genomics. Mapping_pipeline. (2019).
97. Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39, (2023).
98. Harry, E. PretextView (Paired REad TEXTure Viewer): A Desktop Application for Viewing Pretext Contact Maps. (GitHub, 2020).
99. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
100. Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6, e251 (2020).
101. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. U. S. A. 117, 9451–9457 (2020).
102. Qian, J. et al. TEtrimmer: a tool to automate the manual curation of transposable elements. Nat. Commun. 16, 8429 (2025).
103. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker. Published on the web at http://www.repeatmasker.org (1996).
104. Venice, F. et al. Symbiotic responses of Lotus japonicus to two isogenic lines of a mycorrhizal fungus differing in the presence/absence of an endobacterium. Plant J. 108, 1547–1564 (2021).
105. Jin, Y., Tam, O. H., Paniagua, E. & Hammell, M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics 31, 3593–3599 (2015).
106. Palmer, J. M. & Stajich, J. Funannotate v1.8.1: Eukaryotic Genome Annotation. (Zenodo, 2020). doi:10.5281/ZENODO.4054262.
107. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
108. Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).
109. Sperschneider, J. & Dodds, P. N. EffectorP 3.0: Prediction of apoplastic and cytoplasmic effectors in fungi and Oomycetes. Mol. Plant Microbe Interact. 35, 146–156 (2022).
110. Huang, L. et al. dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation. Nucleic Acids Res. 46, D516–D521 (2018).
111. Zheng, J. et al. dbCAN3: automated carbohydrate-active enzyme and substrate annotation. Nucleic Acids Res. 51, W115–W121 (2023).
112. Orłowska, M., Steczkiewicz, K. & Muszewska, A. Utilization of cobalamin is ubiquitous in early-branching fungal phyla. Genome Biol. Evol. 13, evab043 (2021).
113. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
114. Miller, M. A., Pfeiffer, W. & Schwartz, T. The CIPRES science gateway: a community resource for phylogenetic analyses. in Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery (ACM, New York, NY, USA, 2011). doi:10.1145/2016741.2016785.
115. Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
116. Magurno, F. et al. Glomus mongioiense, a new species of arbuscular mycorrhizal fungi from Italian Alps and the phylogeny-spoiling issue of ribosomal variants in the Glomus genus. Agronomy (Basel) 14, 1350 (2024).
117. jasmine. Jasmine: Call Select Base Modifications in PacBio HiFi Reads. (GitHub, 2023).
118. pbmm2. Pbmm2: A Minimap2 Frontend for PacBio Native Data Formats. (GitHub, 2017).
119. pb-CpG-tools. Pb-CpG-Tools: Collection of Tools for the Analysis of CpG Data. (GitHub, 2022).
120. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
121. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
122. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
123. Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
124. Cumsille, A. et al. GenoVi, an open-source automated circular genome visualizer for bacteria and archaea. PLoS Comput. Biol. 19, e1010998 (2023).
125. Guan, J. et al. TADB 3.0: an updated database of bacterial toxin-antitoxin loci and associated mobile genetic elements. Nucleic Acids Res. 52, D784–D790 (2024).