The 3D Genome of Gigaspora margarita Unveils Stable Chromatin and Nucleolar Organization and Symbiont-Dependent Genome Dynamics

preprint OA: closed CC-BY-ND-4.0

Abstract

Arbuscular mycorrhizal fungi (AMF) are widespread plant symbionts that enhance nutrient acquisition and influence ecosystem productivity. Previous chromosome-level assemblies of a model species revealed a two-compartment genome architecture (active A and repressed B chromatin compartments), yet its conservation across evolutionarily distant AMF lineages remains unresolved. Here, we present a chromosome-scale and 3D genome assembly of Gigaspora margarita isolate BEG34, the largest and most repeat-rich AMF genome to date, alongside that of its obligate endobacterium, Candidatus Glomeribacter gigasporarum (CaGg), using PacBio HiFi and Hi-C sequencing. The G. margarita genome comprises 43 chromosomes (792 Mb) organized into stable A/B compartments and Topologically Associating Domain (TAD) structures, irrespective of the presence of endobacteria. We uncover 21 divergent rDNA operons distributed across six chromosomes and show that these physically interact, suggesting conserved nucleolar organization. We also reveal that the CaGg genome is tripartite and mobilome-rich, encoding prophages, an orphan CRISPR array, and complete pathways for many novel and essential cofactors, including heme, which may enhance host bioenergetics. We also find that the endobacterium's presence regulates transposable elements in G. margarita. These findings reveal conserved principles of chromatin architecture in AMF symbionts and highlight the tight molecular interplay between fungal hosts and their endosymbionts, offering new insights into genome evolution and symbiotic adaptation.

bioRxiv preprint, doi: https://doi.org/10.64898/2026.01.14.699541; this version posted January 15, 2026. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Introduction

Arbuscular mycorrhizal fungi (AMF) are plant root symbionts that belong to the subphylum Glomeromycotina 1. As obligate biotrophs, AMF require a living plant host to complete their life cycle, leading them to colonize the cortical root cells and develop specialized tree-like structures called arbuscules 2. Within these structures, the plant supplies the fungi with lipids and sugars 3,4, while the AMF provide the plant with essential nutrients that are limiting factors for plant growth 5, primarily phosphate, resulting in improved crop yields 6,7, carbon storage 8, and enhanced defence against pathogens 9. AMF are always multinucleated, with individual spores carrying up to 20,000 haploid nuclei in some species 10. Although sexual reproductive structures have not yet been observed in AMF, genome and single-nucleus analyses have shown that these organisms carry conserved mating-related genes 11–16 and follow homokaryotic/heterokaryotic cycles 17,18, which define sexual processes in fungi 19,20. Genome analyses show that AMF plant dependence is likely linked to the loss of genes involved in fatty acid production, thiamine biosynthesis, sugar utilization, and plant cell wall degradation 12,21. Their genomes are also highly enriched in transposable elements (TEs), and closely related AMF strains exhibit striking variation in gene content 22,23. The abundance of TEs in AMF is the main reason their genome assemblies have long been highly fragmented, thereby hampering our understanding of their genetic structure and overall genome biology.

This issue was recently addressed by combining long reads with chromatin capture (Hi-C) datasets, which allowed the generation of chromosome-level assemblies for strains of the model AMF Rhizophagus irregularis (order Glomerales) 18,24. This approach revealed that AMF chromatin separates into two compartments (A/B).
Compartment A contains transcriptionally active genes, high methylation of transposable elements (TEs), and most conserved core genes. In contrast, compartment B harbours transcriptionally repressed genes and is rich in genes encoding secreted proteins, candidate effectors, and TEs that are upregulated in planta (vs. extra-radical mycelium) 18,21,24–28. This stage-specific upregulation suggests that root colonization leads to the relaxation of the B sub-compartments involved in the molecular dialogues between partners of the mycorrhizal symbiosis 21,24. In support of this, transmission electron microscopy of the AMF Gigaspora margarita shows shifts in chromatin condensation, transitioning from a tightly packed state in spores to a looser state in intra-radical hyphae during plant root colonization 29.

In addition to being separated into A/B compartments, the available R. irregularis genomes are also organized into finer-scale structural units known as Topologically Associating Domain (TAD)-like structures, within which chromatin interactions occur more frequently than with neighbouring regions 24,30–32. In model eukaryotes, TADs are separated by “boundaries” that act as insulators, limiting epigenetic interactions between adjacent TADs 30,31,33, and are often enriched for protein-coding genes and depleted of DNA repeats 31,33,34. In contrast, in fungal species, repeats were shown to dominate TAD boundaries 35, highlighting their functional divergence and potentially higher malleability. Hi-C-based identification of A/B compartments in AMF remains confined to the model AMF species R. irregularis, and independent support for their existence and conservation in other AMF lineages is lacking.
Similarly, although data from model eukaryotes 36–38 support the hypothesis that the AMF A/B sub-compartments may change conformation in response to host and environmental factors, direct evidence based on chromatin analyses that such changes occur in AMF has not yet been provided.

Here, we aim to fill these knowledge gaps by sequencing the genome of G. margarita isolate BEG34 (order Diversisporales) using PacBio HiFi and Hi-C data. This species contains the largest and most repeat-rich AMF genome known to date 39. It also carries beneficial obligate Burkholderia-like endobacteria (Candidatus Glomeribacter gigasporarum; CaGg) within its cytoplasm, which have been shown to impact fungal biology 40,41,42. The presence of CaGg in G. margarita spores (+CaGg) and the availability of a cured line (-CaGg), along with its large and repeat-rich genome and phylogenetic placement, collectively position G. margarita as an ideal species to elucidate the diversity and conservation of AMF chromosome biology and the malleability of their A/B compartments and TADs.

Results

Chromosome-level assembly, annotation, and phylogenomics of G. margarita

We performed PacBio and Hi-C sequencing of genomic DNA extracted from G. margarita +CaGg spores. Using Hifiasm in Hi-C integration mode 43, we assembled the HiFi long reads into draft contigs, which were then scaffolded using Hi-C data 18,43. For G. margarita, this approach generated 43 chromosome-level scaffolds, with a genome coverage of 82X, a size of 792.14 Mb and an N50 of 18.89 Mb (Table 1; Fig. 1a, b). Of these, 20 chromosomes have telomeres at both ends, 20 have telomeres at only one end, and three lack telomeres entirely. Only 23 contigs (0.718 Mb) could not be assigned to any chromosome. This assembly represents a significant improvement over the previously available datasets 39 in assembly size (792.14 Mb vs. 773.10 Mb), fragmentation (43 vs. 6,490 scaffolds), contiguity (N50 of 18.89 Mb vs. 326.79 kb), and gene count (30,211 vs. 26,603).

Table 1. Summary statistics for the genome assembly of G. margarita

Feature                                 G. margarita
Genome size (Mbp)                       792.14
No. of scaffolds                        43
No. of genes                            30,211
No. of rDNA clusters                    21
Repeat content (%) (after curation)     78.28
BUSCO completeness (%)                  95.4
GC (%)                                  27.77

Genome annotation identified 30,211 protein-coding genes, resulting in a BUSCO completeness of 95.4% (Table 1). Of these, 2,438 (8.1%) were annotated as putative secreted proteins and effectors, numbers higher than those reported in R. irregularis (Table S2). We confirm that G. margarita lacks the hallmark “Missing Glomeromycota Core Genes (MGCGs)” 44 (Table S3) and has a reduced set of carbohydrate-active enzymes (CAZymes), but shows a significant expansion of CAZyme families involved in chitin metabolism (GH18, GT1, GT2, AA7, and CE4) 39,45 (Table S4). The annotation also uncovered cobalamin-dependent enzymes in G. margarita (Table S5), supporting the hypothesis that the fungus uses cobalamin supplied by CaGg 46.
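The contiguity figure quoted for the assembly (an N50 of 18.89 Mb) is a length-weighted median of scaffold sizes: the scaffold length at which the cumulative sum of scaffolds, sorted from longest to shortest, first reaches half of the total assembly size. A minimal Python sketch of that standard definition follows. The function name `n50` and the toy scaffold lengths are illustrative assumptions; this is not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units):
# total = 290, half = 145; 80 + 70 = 150 >= 145, so N50 = 70.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))  # 70
```

Because N50 depends only on the sorted length distribution, a drop from 6,490 scaffolds to 43 raises it sharply even when the total assembly size changes little.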
The chromosome-level assembly confirms that all AMF with sequenced genomes carry all known meiosis-specific genes, and supports that, as opposed to most AMF with sequenced genomes 21, including representatives of early branching lineages 47,48, the genes composing the putative AMF mating-type locus 17,18, namely a choline transporter, two homeodomain proteins (HD1-2), and a phosphate glycerate mutase, are not adjacent within members of the Gigasporaceae 16.

Figure 1. Gigaspora margarita chromosomes and genome content. (A) Karyoplots of 43 chromosomes, illustrating rDNA genes in lime green and A/B compartments in violet-red and turquoise, respectively, within the ideograms. (B) Genome-wide Hi-C contact map of G. margarita. The black squares represent chromosomes sorted by size. The colour intensity corresponds to interaction frequencies between loci, with darker red indicating high contact probability and white signifying low or no interactions.

Alongside the nuclear genome, we recovered the complete mitochondrial genome as a single contig. The genome size is 96,986 bp, and it encodes core mitochondrial genes, including 19 protein-coding genes, 2 trans-spliced rRNA genes, and 24 tRNA genes, supporting previous findings on the G. margarita mitochondrion 53.

We took advantage of this new chromosome-level dataset to determine the evolutionary placement of G. margarita relative to other AMF species via phylogenomic analyses (Fig. S1). These analyses support the sister relationship between Glomeromycotina and Mortierellomycotina within the Mucoromycota phylum 1,49. They also back the close evolutionary relationship between Glomerales and Diversisporales, the paraphyletic placement of Entrophospora species 45,48,50,51, as well as the early branching of Paraglomerales and Archaeosporales within the AMF phylogeny. Notably, in support of phylogenetic analyses based on ribosomal genes 52 and available genome datasets 47, our findings highlight Paraglomerales as representing the earliest AMF phylogenetic node. However, we were unable to fully reject an alternative placement of P. occultum with members of the Archaeosporales.

The divergent rDNA operons are physically linked in G. margarita

The chromosome-level genome annotation also confirms that, unlike most eukaryotes, AMF carry few copies of highly divergent ribosomal DNA (rDNA) operons within their genomes. The G. margarita genome has 21 rDNA copies, approximately twice as many as R. irregularis strains 24, and these vary in copy number and sequence both within and across chromosomes 3, 12, 17, 25, 34 and 40 (Fig. 1a, b; Table S6). The high rDNA sequence paralogy within members of Glomerales was reported to cause significant taxonomic challenges, as rDNA paralogs can sometimes cluster across species boundaries 54,55. We find that this issue is exacerbated in Gigaspora species, further raising concerns about the utility of these genes alone for AMF taxonomy 54–56. For instance, the rDNA paralogs from the G. margarita genome scatter across three clades and are shared with sequences from G. decipiens, G. albida, and G. gigantea.
Almost all Gigaspora species show similar paraphyletic clustering based on rDNA paralogy rather than speciation (Fig. S2).

Figure 2. Model of the rDNA 3D organization within the AMF nucleolus. Based on current Hi-C data and knowledge of rRNA transcription in model eukaryotes. Transcriptionally active rDNA repeats from different scaffolds hypothetically interact physically in the nucleolus. The methylated rDNA copy sits outside the nucleolus.

In addition to enabling chromosome-level assemblies, the Hi-C data also revealed strong physical interactions among rDNA copy regions on chromosomes 3, 12, 17, 25, 34, and 40, as evidenced by visually bright signals off the diagonal in Hi-C maps (Fig. 1b, Table S6). In model eukaryotes, active rDNA copies physically cluster in the nucleolus to allow rRNA biogenesis, while inactive copies are found at its edges or outside 57–59. A very similar pattern is seen in G. margarita: most rDNA units are hypomethylated (i.e., actively transcribed), with only one unit, located on chromosome 12, highly methylated, and thus possibly representing a pseudogene. Notably, investigations of Hi-C data from R. irregularis strains uncovered similar rDNA interactions (Fig. S3) between regions on chromosomes 9, 18, 23, and 28. Taken together, our results indicate that AMF divergent rDNAs follow a cellular mechanism found in model eukaryotes, in which active rDNA units are physically close to each other within the nucleus, presumably residing within the nucleolus, while inactive units are positioned at the periphery of the nucleolus or outside it (Fig. 2).

Endobacteria-driven regulation of transposable elements in G. margarita

The genome of G. margarita is known to be highly repetitive, but the true extent and diversity of these repeats were likely obscured by the fragmented nature of available genome datasets 60 and a lack of curated analysis of such repeats 25. Curating our chromosome-level assembly revealed that 78.34% of the G. margarita genome consists of repeats, of which 56.1% belong to known TE families. Among these, 19.2% correspond to DNA/TIR elements (i.e., DNA transposons), while 17.1% and 13.1% are LTR and LINE elements, respectively. Overall, 22.7% of the repeats remain unknown and cannot be classified after manual curation (Fig. 3), which compares to 41% in previous work 39. These unknown repeats may represent expanded gene families commonly found in AMF 25,61, as well as TE families that have not yet been formally characterized in model species, including fungi. We assessed the expansion and degeneration of TEs in G. margarita by constructing their repeat landscapes based on Kimura substitution calculations (Fig. 3), where lower Kimura substitution rates indicate recent TE insertions, while higher rates suggest older insertions 61. Our analyses build on previous findings showing that TEs in Diversisporales are enriched in recent and active expansions 25, with DNA/TIR elements, LINEs, and LTRs being the primary contributors to recent TE bursts in G. margarita.

In AMF, including G. margarita, TEs are preferentially located close to promoters and genes involved in molecular communication with the host (e.g., effectors) 25, which indicates their potential role in regulating gene expression and maintaining successful mycorrhizal relationships. Reports of significant TE upregulation in the model AMF R. irregularis following root colonization supported this view 27. A more comprehensive annotation of TEs allowed a detailed analysis of the differential regulation of fungal TEs in Lotus japonicus roots colonized by G. margarita, with and without endobacteria. Remarkably, the presence of the bacterial endosymbiont alters the regulation of specific TE families during root colonization (Fig. 3b, c). Among these, some members of the Helitron, Maverick, and LINE families were upregulated only in the presence of CaGg, while members of the PLE family were upregulated exclusively in the absence of CaGg. Furthermore, DIRS families were downregulated in the absence of CaGg (Fig. 3c).

Figure 3. Composition and activity of transposable elements in the G. margarita genome. (A) Transposable element distribution and repeat landscape. The pie chart depicts the relative abundance of each type of sequence in the G. margarita genome. The histogram represents the repeat landscape grouped by Kimura divergence levels (x-axis) in relation to the genome (y-axis).
(B) Heatmap of differentially expressed TE families during Lotus japonicus root colonization, comparing samples lacking the CaGg endobacterium (-CaGg) with CaGg-containing samples (+CaGg) in relation to spores (control). Rows represent TE families, annotated by TE order and clustered by expression profile. (C) Number of differentially expressed TE families by order and direction of regulation (up- or down-regulated) in colonized roots, shown separately for samples without and with CaGg.

Identification of A/B compartments and Topologically Associating Domains in G. margarita

Hi-C data also revealed that the G. margarita genome exhibits a checkered pattern delineating two compartments (A/B) (Fig. 4a, b; Fig. S4), as reported in all R. irregularis strains with available Hi-C data 18,24. Some chromosomes carry large blocks of A or B compartments (Fig. 4a), while others harbour a finely interleaved, plaid-like pattern of A/B compartmentalization (Fig. 4b).

Figure 4. Genome compartmentalization in G. margarita. Examples of Hi-C contact maps showing A/B compartments in G. margarita chromosomes (A) 3 and (B) 36. The bottom track shows the first eigenvector values, identifying the compartment interaction at 50-kb resolution. Regions that interact more frequently are visualized as brighter squares on the contact map. Gene/repeat densities, methylation frequency and gene expression in compartments A and B of G. margarita.
Boxplots showing (C) gene density (genes per 50 kb), (D) repeat density (repeats per 50 kb), (E) CpG methylation frequency, and (F) gene expression levels (logTPM + 1) in three conditions: spores, ERM, and IRM (L. japonicus) in A and B compartments. Boxes show the first quartile (25%), the median (middle black line, 50%), and the third quartile (75%). The whiskers extend to 1.5× the box length, with outliers shown as dots. Asterisks above the boxplots indicate significant differences between the A/B compartments (p < 0.05, Wilcoxon rank-sum test).

The G. margarita A compartment has significantly higher gene density and average gene expression compared to the B compartment and contains most core genes and all rDNA operons (Fig. 4a), while the B compartment is significantly enriched for secreted proteins and candidate effector genes (Table S7). Although both compartments have similar TE densities (Fig. 4d), the TEs in the A compartment are significantly more methylated (Fig. 4e). Like R. irregularis homokaryons 24,61, G. margarita shows a strong bimodal distribution of methylation levels, with a larger percentage of CpG sites either highly methylated (>8 = 58.8%) or weakly methylated (<2 = 38.6%). Notably, A/B compartments remain remarkably stable between the +CaGg and -CaGg conditions, i.e., no significant change in checkered patterns was observed either within or among chromosomes. In G. margarita, the A compartments are preferentially located within chromosome cores (Fig. 1), while B compartments are generally found at the chromosome ends. This clear distinction in A/B localization within chromosomes is not present in R. irregularis isolates 24.

Hi-C data analysis also revealed 1,407 TAD-like structures in G. margarita (Fig. 5a). The TADs cover 92.16% of the genome, with a median size of 420 kb (Fig. 5b), in line with reports from other eukaryotes, including non-AMF fungi 30,32,62.

Figure 5. Topologically associating domain (TAD)-like structures in the G. margarita genome. (A) A representative section of chromosome 2 showing examples of TADs as black triangles. The bottom green line corresponds to the insulation score (TAD score), and the vertical black dashed lines highlight the predicted TADs. (B) Density plot showing TAD size distribution. The vertical dashed line represents the mean value. (C-E) Line plots of (C) genes, (D) repeats, and (E) DNA methylation (CpG) at domain boundaries and ±50 kb from boundaries. (F) Gene expression (log TPMs + 1) in boundary and non-boundary regions across three conditions: spores, ERM, and IRM (L. japonicus), respectively. Boxes show the first quartile (25%), the median (middle black line, 50%), and the third quartile (75%). The whiskers extend to 1.5× the box length, with outliers shown as dots. Asterisks above the boxplots indicate significant differences between the boundary and non-boundary regions (p < 0.05, Wilcoxon rank-sum test).

We identified TAD boundaries based on low insulation scores and, surprisingly, found that they are gene-rich and depleted of repeats (Fig. 5c, d). The gene enrichment at boundary regions is linked to low methylation levels (Fig. 5e) and, accordingly, boundary-associated genes have significantly higher expression levels across multiple life stages, including germinating spores, extraradical, and intraradical mycelium (L. japonicus), compared to genes in non-boundary regions (Fig. 5f). Moreover, genes within the same TAD are co-expressed across life stages (Fig. S5), indicating that TAD boundaries function as transcriptional hotspots in AMF.

The tripartite CaGg endosymbiotic genome supports genomic plasticity and reveals novel pathways for enhanced stress defence and cofactor biosynthesis

Alongside the AMF genome, we obtained the complete genome of the CaGg endosymbiont, comprising a single circular chromosome of 1,998,997 bp and two circular plasmids, referred to herein as pCaGg01 (99,883 bp) and pCaGg02 (22,198 bp), both with a read depth approximately double that of the chromosome (Fig. 6; Table 2). The endobacterial circular chromosome and plasmid assemblies contain the expected ORI motifs and were all corroborated by uniform read coverage, as well as Hi-C contact mapping (Fig. S6) and independent assemblers 63,64.

Figure 6. Circular genome map of CaGg. From outside to inside: + strand clusters of orthologous genes (COGs); coding sequences (CDS) on the + strand; rRNA and tRNA on the + strand; CDSs, rRNA, and tRNA on the - strand; COGs on the - strand; GC content; GC skew.

Table 2. Genome statistics and mobile genetic elements (MGEs) of the CaGg chromosome and associated plasmids.
Feature                                        CaGg chromosome   pCaGg01   pCaGg02
Size (bp)                                      1,998,997         99,883    22,198
GC (%)                                         54.80             53.06     47.04
CDSs                                           2,292             130       30
Coverage                                       27                55        56
MGEs
  Integration/Excision (IE)                    102               2         0
  Replication/Recombination/Repair (RRR)       21                0         0
  Phages (P)                                   22                0         0
  Stability/Transfer/Defence (STD)             47                5         0
  Transfer (T)                                 15                8         0

This new assembly recovers substantially more coding capacity and non-coding features than the previous fragmented version did. Specifically, our genome assembly, totalling 2,121,078 bp, is 20% larger than the previous fragmented assemblies obtained with pyrosequencing and fosmid libraries, which reported 3 ORIs in 4 genomes at 15x coverage across 125 contigs 46. It is also complete, fully resolving previously undetected plasmid sequences. The CaGg chromosome contains 2,292 protein-coding genes, 44 tRNAs, 21 non-coding RNAs, one transfer-messenger RNA, and a single complete rRNA operon. The plasmid pCaGg01 has 130 coding genes and one ncRNA, while pCaGg02 has 30 coding genes. A Clusters of Orthologous Genes (COG) analysis reveals that the CaGg chromosome contains a high proportion of genes associated with the mobilome (X), while pCaGg01 is dominated by genes classified as defence (V), and pCaGg02 mainly contains genes involved in signal transduction (T) (Fig. 6).

The metabolic reconstruction of the CaGg chromosome confirms previously reported core metabolic modules, including the absence of phosphofructokinase (pfk), fatty acid degradation genes, reduced amino acid biosynthesis enzymes, and a complete cobalamin (B12) biosynthesis pathway. The endosymbiont also encodes multiple membrane transport systems, including ABC transporters and members of the major facilitator superfamily (MFS), as well as Type II, III, and IV secretion systems and complete Sec and Tat pathways 46.

Our improved assembly also uncovered novel and biologically relevant metabolic capabilities of CaGg. These include complete pathways for coenzyme A, NAD, lipoic acid, PreQ1, and heme metabolism, revealing a much broader capacity for cofactor production than previously assumed. We also found a CRISPR array lacking a cas motif. This orphan array may be linked to the identification of phage elements in our assembly, including two intact prophages and several remnants on the CaGg chromosome (Table S8), thereby explaining the remarkable enrichment of Mobile Genetic Elements (MGEs) (Table 2). Notably, some of the MGEs are classified as plasmid-origin gene families, and upon further examination, we identified several plasmid-related genes in the CaGg genome, including some involved in plasmid replication, partitioning, stabilization, and conjugation, altogether suggesting possible distinct plasmid integrations into the CaGg genome (Table S9). None of these putative insertions results in coverage drops or changes in Hi-C signals, indicating that they are not artefactual.

Overall, the MGEs cover 6.97% of the CaGg genome and are classified into five functional categories: Integration/Excision (IE, 102), Replication/Recombination/Repair (RRR, 21), Phage (P, 22), Stability/Transfer/Defence (STD, 47), and Transfer (T, 15). As observed in other Glomeromycotina endosymbionts 65, IE elements are enriched in the genome, suggesting that they are major contributors to genome plasticity in Glomeromycotina endosymbionts. The high mobilome content prompted us to analyze type II toxin-antitoxin (TA) modules, which are thought to stabilize MGEs 66 and mediate stress-induced persistence in CaGg 67. The present assembly uncovered 41 new TAs (39 chromosomal and 8 on pCaGg01), up from 9 in earlier predictions 67 (Table S10).

Discussion

329 In this work, we reported the first chromosome-level assembly and 3D analysis of an AMF genome 330 outside the Glomerales. This provided an unprecedented view into the biology of one of the largest 331 and most repeat-rich genomes known within the fungal kingdom and uncovered novel insights into 332 the biology and evolution of its bacterial endosymbiont. 333 A chromosomal view of a large and highly repetitive non-model AMF 334 Combining long reads with chromatin-capture methods has successfully helped assemble the large, 335 highly repetitive G. margarita genome and its endosymbiont into a complete chromosome-level 336 assembly. This allowed us to demonstrate that AMF have more chromosomes than most other 337 fungal relatives and that, with 43 chromosome-level scaffolds, G. margarita likely has the highest 338 reported chromosome count to date in the fungal kingdom. Our chromosome annotation reveals 339 that G. margarita encodes approximately 30,211 genes and retains all the hallmarks of obligate 340 biotrophy typical of AMF 48. These include a reduced set of cell wall-degrading enzymes, the lack 341 of fatty acid biosynthesis genes, the loss of thiamine synthesis genes, and the uptake of soluble 342 sugars, such as glucose, which are essential compounds they obtain directly from their hosts 12,16. 343 We also report a larger-than-expected set of proteins potentially involved in the dialogue with the 344 host 24,26,68,69, suggesting an improved ability to establish symbioses with multiple plants, and we 345 found primary evidence that G. margarita utilizes cobalamin produced by its endobacterial host, 346 CaGg. 347 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 15, 2026. 
Beyond gene content, the chromosome-level assembly provides a curated and comprehensive view of TE diversity in a genome long recognized for its repetitiveness and fragmentation 39. While previous work linked the large genome size of G. margarita primarily to LINE expansions 39, our refined TE annotation uncovers a substantial contribution of Class II elements, including DNA/TIR transposons and Mavericks, to G. margarita genome biology. These DNA transposons have been implicated in gene duplication, genome restructuring, and regulatory innovation in fungi and other eukaryotes 60, suggesting that multiple TE classes, not only retrotransposons, continue to contribute to genome complexity in G. margarita. Our analyses also continue to support the view that, in addition to dictating genome evolution, AMF TEs are major players in regulating gene expression during colonization, presumably due to their localization within regulatory regions and in proximity to candidate secreted proteins and effectors. It is noteworthy that different TE families are regulated in G. margarita and in R. irregularis strains during root colonization. This could mirror lineage-specific regulatory adaptations in AMF and/or host plant-driven TE regulation 27. It will be important to see how these elements are regulated across additional hosts and conditions in future studies, particularly in the presence or absence of the endobacterium.

The genome analyses of a large AMF genome also reinforce the growing evidence that rDNA sequence diversity is significant within individual AMF genomes. This continues to highlight the challenges of using this locus alone for taxonomic resolution, as divergent rDNA copies cluster by paralogy rather than by speciation, and supports the need for population genetics-based approaches in AMF taxonomy 54,70.
Our evidence that some rDNA copies are likely pseudogenes further exacerbates these challenges.

Remarkable conservation in genome biology among AMF lineages

In addition to sharing losses in key genes associated with obligate biotrophy and high rDNA gene paralogy, the genome biology of G. margarita follows patterns conserved among distinct AMF lineages. All species investigated to date with Hi-C have genomes partitioned into an A compartment with high gene density, expression, and repeat methylation, and a B compartment with low repeat methylation and an enrichment in secreted and effector proteins. This spatial organization of AMF chromatin likely maintains genome stability by silencing repeats in the A compartment, which is rich in housekeeping genes 24, while allowing the B compartment to serve as a hotspot for diversifying secreted proteins and effector-encoding genes. Altogether, these features continue to underpin striking analogies in the genome biology and evolution of AMF and filamentous plant pathogens 71,72.

Despite the shared similarities, AMF genomes exhibit some notable epigenetic distinctions. For example, while the R. irregularis heterokaryons show a tripartite genome-wide CpG methylation distribution 18, the homokaryons and G. margarita display a bimodal methylation pattern 24,73. The reasons for these differences remain unknown, although it is tempting to speculate that the nuclear type (homokaryon vs. heterokaryon) influences AMF methylation distribution patterns. Our work also identified notable distinctions between TADs in AMF and those of fungal relatives.
For example, in the fungal species Epichloë festucae, AT-rich, repeated blocks contribute to the formation of TADs 35, yet the repeat depletion at the boundaries we observed suggests an alternative mechanism for TAD establishment in AMF that is more consistent with reports from other eukaryotic lineages 33,34,62,74, whereby boundary genes are co-expressed across different conditions, indicating that TAD organization facilitates transcriptional coordination in these prominent symbionts.

A model that explains the co-transcription of divergent rDNA operons in AMF

In most eukaryotes, rDNA genes exist as hundreds to thousands of identical copies scattered across different chromosomes, yet these copies are all transcribed in a spatially coordinated manner within the nucleolus 57,58,75. The localization of rDNA genes within the nucleolus has been primarily studied using microscopy techniques, such as fluorescence in situ hybridization (FISH) and immunofluorescence 57,76. Recently, 3C-based methods, such as Hi-C, have been employed to identify these interactions at a genome-wide scale, providing an additional independent line of evidence for rDNA co-localization 77. In our work, the presence of fewer, divergent rDNA copies in G. margarita has likely facilitated their capture and separation by Hi-C, thereby enabling the detection of their strong physical interactions, indicating that these genes co-localize, presumably within the nucleolus, to ensure their co-transcription and enhance ribosome biogenesis 57,58.
In addition, the identification of identical signals in R. irregularis 24 shows that the co-localization of rDNA is a conserved mechanism of rDNA organization and co-regulation in AMF.

An improved view of endosymbiotic contribution to G. margarita

Our complete genome assembly and comparative analysis revealed that the CaGg genome comprises a large circular chromosome and two smaller complete circular plasmids, each with distinct ORI motifs, supported by HiFi and Hi-C data and independent assemblers. As such, this work clarifies previous assumptions, which suggested that the CaGg genome is organized into 2 to 4 genomic units ranging from 1.4 to 2.5 in size 46,78. Like other AMF endosymbionts 40,79,80, the CaGg genome is smaller than that of free-living Burkholderia relatives 81, resulting in limited metabolic capabilities, including the inability to utilize glycolysis as an energy source and a limited capacity to biosynthesize essential amino acids, underscoring its strong dependence on the AMF host. The close-knit relationship between CaGg and G. margarita is further emphasized by our finding that some fungal TEs are upregulated only in the presence of the endobacterium, and by the identification of novel pathways through which CaGg may influence the fungal host's physiology. In particular, the presence of a complete heme biosynthesis pathway suggests an unsuspected role in supporting AMF host bioenergetics, consistent with observations in other host-endosymbiont systems 82. Indeed, heme is an essential cofactor for many proteins, including cytochromes of the mitochondrial electron transport chain, which are highly abundant in the G. margarita genome.
As such, it is intriguing to speculate that CaGg-derived heme enhances the activity of respiratory complexes in the AMF host, providing a possible explanation for the higher ATP production and respiratory activity reported in endobacteria-containing G. margarita compared to the cured line 83. In other filamentous fungi 84, heme also plays roles in growth, development, and stress adaptation, suggesting that CaGg-encoded heme production could affect multiple aspects of AMF physiology beyond respiration. Future analyses will hopefully reveal how these novel endobacterial genes are regulated in response to environmental changes.

Another surprising discovery was that CaGg carries several MGEs, including prophages, expanding the viral pool within the G. margarita symbiotic system 41. While most bacteria rely on CRISPR-Cas systems to defend against foreign DNA 85,86, CaGg, like many endosymbionts, lacks a functional CRISPR-Cas system 87,88. The presence of only an orphan CRISPR array likely explains the high MGE content, including prophages, in the CaGg genome, and it indicates that CRISPR-based defence was active in the BRE endobacterial ancestors prior to some lineages becoming obligate endosymbionts. Conversely, the enrichment of toxin-antitoxin systems in CaGg might represent an alternative defence strategy in the absence of CRISPR-Cas 86,89.

Conclusions

This study presents the first chromosome-scale and 3D genome assembly of a non-model AMF, revealing conserved principles of chromatin architecture among AMF despite dramatic differences in genome size and repeat content. Our findings confirm that A/B compartmentalization and TAD-like structures are fundamental features of AMF genome organization, supporting coordinated gene expression and genome stability. We also uncover a novel mechanism for rDNA co-localization within the nucleolus, which likely facilitates ribosome biogenesis despite extensive rDNA sequence divergence, a hallmark of AMF genomes.

Beyond fungal chromatin biology, our work sheds light on new functional contributions of the obligate endobacterium CaGg by revealing a multipartite architecture enriched in mobile genetic elements, underscoring its genomic plasticity that cannot be offset by an incomplete CRISPR-Cas system. Additionally, we identify novel pathways for essential cofactors, including heme and cobalamin, suggesting a direct role in enhancing host bioenergetics and stress resilience. The observation that CaGg presence modulates transposable element activity in G. margarita further highlights a complex interplay between host and symbiont at the epigenetic and genome stability levels.

Together, these findings point to a model in which chromatin architecture, rDNA organization, and endosymbiont-derived metabolic capabilities collectively shape the biology and ecological success of AMF. Future work should explore how these interactions respond to environmental cues and influence plant fitness, including in early-branching lineages, to provide complete insights into the molecular foundations of one of the most impactful plant symbioses.

Methods

Sample preparation for HiFi sequencing

Spores of Gigaspora margarita Becker and Hall (BEG 34, deposited at the European Bank of Glomeromycota) containing (+CaGg) or lacking (-CaGg) the obligate endobacterium Candidatus Glomeribacter gigasporarum were used in this study. The -CaGg spores were obtained from +CaGg spores as described in 90. The +CaGg and -CaGg spores were isolated from clover trap plants by wet sieving 91 and collected individually under a dissecting microscope. Only the healthiest spores were selected and pooled into groups of 100, then surface-sterilized with a solution of Chloramine T (3% w/v) and streptomycin sulphate (0.03% w/v) in sterile distilled water. A batch of +CaGg spores was immediately frozen in liquid nitrogen, while the remaining batches (+CaGg and -CaGg) were incubated to allow germination.

DNA extraction and purification

DNA was extracted from pools of 100 sterilized and frozen +CaGg spores using the DNeasy Plant Pro Kit (Qiagen). The spores were ground in the extraction buffer using a TissueLyser homogenizer (2 min at 24 Hz, repeated twice), and DNA extraction was performed according to the manufacturer's instructions, except that the PS solution was omitted and the samples were not heated. DNA from two independent pools of 100 spores each was combined, and a cleanup step was performed with the DNeasy PowerClean Pro Kit (Qiagen) according to the manufacturer's instructions.
Sample preparation and Hi-C sequencing

For Hi-C analysis, 2,900 +CaGg and 2,900 -CaGg sterilized spores were placed in multiwell plates containing 1 mL of sterile distilled water and incubated in the dark at 30 °C for 7 days to allow germination. Germinated spores and pre-symbiotic mycelium were fixed at room temperature under gentle shaking for 20 min in 1% (v/v) formaldehyde in 0.01 M PBS (pH 7.2), followed by an additional 15 min incubation in 0.125 M glycine in 1% formaldehyde in 0.01 M PBS (pH 7.2). Spores and mycelium were then frozen in liquid nitrogen and ground with sterile micropestles to a fine powder. Samples were stored at −80 °C until library preparation was performed.

Hi-C libraries were prepared using the Proximo Hi-C Kit (Fungal) from Phase Genomics (Seattle, WA, USA), employing four restriction enzymes: DpnII, MseI, DdeI, and HinfI. The Hi-C libraries were sequenced on the Illumina NovaSeq X PE150 platform, generating 194 and 202 million reads for the -CaGg and +CaGg samples, respectively (Table S1).

Genome assembly and Hi-C scaffolding

A total of 9.88 million PacBio HiFi reads were obtained on the Revio Platform at Mount Sinai Hospital (Toronto) and assembled into contigs using Hifiasm v0.16.1 with the parameters -l0 (no purging) and --h1 and --h2 for Hi-C data integration 93. The raw assembly was queried against the NCBI nr database using Diamond blastx (v0.9.14.115; perc_identity 75, -evalue 1e-5) to detect and remove contaminants and to identify mitochondrial and CaGg contigs, which were annotated using MitoHiFi v3.2.3 94 and Bakta v1.8.2 95, respectively.

The remaining contigs were scaffolded using the Hi-C reads, which were first processed using the Arima Genomics pipeline 96. The resulting BAM file alignments were used as input to the YaHS scaffolding tool 97.
The assembled scaffolds were manually curated using PretextView 98 to generate chromosome-scale scaffolds and identify the unplaced contigs. A Python script was used to identify telomeres (https://github.com/JanaSperschneider/FindTelomeres). The final chromosome-scale scaffolds were visualized using the genome-wide Hi-C heatmap generated by Juicebox Assembly Tools v1.11.08 99 and through a chromosome ideogram layout using RIdeogram 100.

Repeat masking and transposable elements analysis

The assembled genome was submitted to RepeatModeler2 101 to generate a consensus library of repetitive sequences. To better classify repeats and address the high number of unclassified sequences (“Unknown”), we curated the consensus library using TEtrimmer 102 and manual curation, as has been previously done for other AMF species 25,27. Using this method, sequences known to be expanded genes were removed. The genome masking and TE annotation were performed on the final library, which contains a curated consensus of the assembled genome, using RepeatMasker 103.

RNA-seq data from intraradical mycelium with and without the endobacterium, from colonizing Lotus japonicus roots, and from extraradical mycelium of G. margarita (PRJNA267628) 104 were used to assess the differential expression of TEs. The reads were first aligned to the reference genome assembled in this study using HISAT2. TE expression was then evaluated using TEtranscripts 105, applied to the aligned reads, guided by the gene and TE annotations carried out in this study.
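For the telomere identification step mentioned in the scaffolding workflow above, the general idea of checking scaffold ends for tandem telomeric repeats can be illustrated with a short sketch. This is not the FindTelomeres script cited above and does not reproduce its logic; the motif (TTAGGG), the 200-bp end window, the minimum of three tandem copies, and the function names are all assumptions chosen only for illustration.

```python
import re

# Translation table used to build the reverse complement of a motif.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_telomeric_ends(seq, motif="TTAGGG", window=200, min_repeats=3):
    """Report whether each scaffold end carries tandem telomeric repeats.

    Hypothetical helper: `motif`, `window`, and `min_repeats` are
    illustrative defaults, not values taken from the study.
    Returns a (start_has_telomere, end_has_telomere) tuple.
    """
    seq = seq.upper()
    rc = reverse_complement(motif)
    # Match at least `min_repeats` consecutive copies of the motif
    # or of its reverse complement.
    pattern = re.compile(f"(?:{motif}|{rc}){{{min_repeats},}}")
    start_hit = bool(pattern.search(seq[:window]))
    end_hit = bool(pattern.search(seq[-window:]))
    return start_hit, end_hit


# Toy scaffold: telomeric repeats at the start only.
toy_scaffold = "CCCTAA" * 5 + "ACGT" * 100 + "GATTACA" * 10
print(find_telomeric_ends(toy_scaffold))
```

A scaffold with hits at both ends would be a candidate telomere-to-telomere chromosome under these assumed settings; real data would need a motif and thresholds appropriate to the organism.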
Genome annotation

Gene prediction was performed on a soft-masked genome assembly using the Funannotate pipeline v1.8.15 106. First, the command "funannotate train" was executed to train ab initio gene predictors with previously published RNA-seq data (PRJNA751155) 104. Next, the command "funannotate predict" was run with the parameters --optimize_augustus and --ploidy 1. Multiple sources of evidence were utilized as input during the prediction of protein-coding sequences: (1) transcript assemblies (--transcript_evidence) and alignments (--rna_bam); (2) gene models generated by PASA (--pasa_gff); and (3) protein sequences from UniProtKB and the G. margarita proteome. The quality of the annotated genome was assessed using Benchmarking Universal Single-Copy Orthologs v5.2.2 (BUSCO) 107. SignalP v6.0 108 was employed to predict secreted proteins with parameters --organism eukarya and --mode slow. EffectorP v3.0 109 was utilized to predict effector genes, and carbohydrate-active enzymes (CAZymes) were annotated using dbCAN 110,111. To identify B12-dependent enzymes in G. margarita, homologs previously identified in R. irregularis 112 were used as BLASTP queries.

Phylogenetic analyses

To infer the phylogenetic relationships among SSU-ITS-LSU rDNA paralogs found in G. margarita and rDNA copies from other Gigaspora species, a dataset was created including sequences of Dentiscutata savannicola and Fuscutata heterogama as outgroups. Members of Paradentiscutata and Intraornatospora, the closest genera to Gigaspora, were not chosen as outgroups because they only have partial LSU sequences. Among the 21 rDNA operons of G. margarita, 19 were used because they overlapped the partial SSU-ITS-LSU region used for phylogenetic inference.
Overall, the dataset comprised 96 sequences representing eight Gigaspora species and 10 outgroup sequences. The dataset was aligned with the online version of MAFFT v.7 113, using the E-INS-i iterative refinement method, and edited in MEGA v.5.2.2 by manual trimming of overhanging and misaligned ends. Maximum likelihood phylogenetic inference was carried out in RAxML-NG via CIPRES Science Gateway 3.1 114 with 1000 bootstrap replicates, and in IQ-TREE v.2.2.5 115 with 1000 ultrafast bootstrap replicates and 1000 SH-aLRT tests. Additionally, a Bayesian analysis was performed in MrBayes v3.2.6 with 40 million generations and a stop rule at split frequency standard deviation = 0.01. All analyses were conducted with partitions and nucleotide substitution models as previously described 116. Notably, we found that most ingroup branches were either unsupported or poorly supported, particularly with respect to bootstrap and posterior probability values. In the Bayesian analysis, after 40 million generations, the average standard deviation of split frequencies did not approach 0.01 but instead fluctuated around 0.055, suggesting a potential lack of convergence in some bipartitions.

To determine the placement of G. margarita in the AMF phylogeny, phylogenomic analysis was performed using the “fungi_odb10” dataset from BUSCO v4.0 107. Profile hidden Markov models corresponding to these markers were used to identify homologous sequences in G. margarita and 50 additional fungal genomes, using HMMER3 v3.1b2 as implemented in the PHYling pipeline (https://zenodo.org/records/10129968). A total of 702 conserved, single-copy orthologous proteins were identified, aligned, and concatenated for downstream phylogenetic inference. The phylogenetic tree was reconstructed using a maximum likelihood approach, with the best-fit substitution model selected for each gene partition in IQ-TREE v.1.6.12.
The final analysis was based on 604,857 aligned sites, with 588,555 distinct patterns contributing to the tree topology.

Methylation and RNA-seq analyses

PacBio HiFi reads with kinetic data were processed using Jasmine v3.0.0 117. The output, which identified 5mC sites, was saved as ML and MM tags in the unaligned BAM file. The reads were then aligned to the G. margarita genome using pbmm2 1.10.0 118 with the --preset HiFi parameter. PacBio CpG tools 119 was then used to calculate CpG methylation frequency, using the options pileup_calling_model.v1, tflite model, and --min-mapq 0. The resulting CpG locations were then intersected with genome-wide compartments, genes, and repeat regions using bedtools v2.30.0 120 to determine median methylation frequencies within 50-kb windows.

For gene expression analysis, RNA-seq datasets from G. margarita germinating spores, ERM, and intra-radical mycelium colonizing L. japonicus roots were mapped to the transcriptome using Salmon v.1.3.0 121. The transcriptome was first indexed using the salmon index module with the --keepDuplicates option, and reads were quantified with salmon quant, specifying the --validateMappings parameter. The Transcripts Per Million values from the salmon output were log-transformed and compared between A/B compartments across the three RNA-seq datasets.

Compartment and topologically associating domains (TADs) analysis

HiC-Pro v2.11.1 122 and HiCExplorer were used to process, analyze, and visualize Hi-C data.
First, HiC-Pro was used to map and filter +CaGg and -CaGg Hi-C reads, retaining only alignments with a MAPQ score greater than 10. Contact matrices were then generated at multiple resolutions, ranging from 20 kb to 100 kb in 10 kb increments. The resulting contact matrices were converted to HDF5 (.h5) format using the hicConvertFormat module in HiCExplorer. The eigenvector decomposition implemented in the hicPCA command of HiCExplorer was used to call A/B compartments from the contact maps. The first eigenvector (PC1) corresponded to the A and B compartments, and the sign of the PC1 values (positive or negative) was used to determine the compartment identity. Specifically, the contact matrices were manually examined per chromosomal scaffold, and regions with positive values were labelled “1”, while those with negative values were labelled “2”. Finally, to assign A/B compartments, the label associated with higher average RNA-seq gene expression and gene density was assigned to compartment A, while the opposite label was assigned to compartment B.

To assess whether the presence of endobacteria primes compartment switches, PC1 eigenvector values from contact matrices of G. margarita spores with and without endobacteria were compared. Regions that showed a change in PC1 values from positive to negative, or vice versa, were considered compartment switches. Additionally, the correlation of PC1 values between the two conditions was calculated to determine the degree of shifts in compartmentalization.

We predicted TAD bodies and boundaries using hicFindTADs from HiCExplorer at 30 kb resolution, with the following parameters: --correctForMultipleTesting bonferroni and --thresholdComparisons 0.01. We then assessed the distribution of repetitive elements, protein-coding genes, and methylation levels around TAD boundaries and within ±50 kb from these boundaries using BEDTools.
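The compartment labelling and switch detection described above can be sketched in a few lines. This is a minimal illustration only, assuming that per-bin PC1 values and a per-bin activity score are already available as plain Python lists. The function names are hypothetical, a single activity score stands in for the combination of expression and gene density used in the study, and the sketch does not reproduce hicPCA. Comparing signs between conditions also assumes that the PC1 orientation is consistent in both samples.

```python
from math import sqrt


def label_bins(pc1):
    """Label bins by PC1 sign: 1 for positive, 2 for negative, None otherwise."""
    return [1 if v > 0 else 2 if v < 0 else None for v in pc1]


def assign_compartments(pc1, activity):
    """Assign 'A' to the sign label with the higher mean activity, 'B' to the other.

    `activity` is a hypothetical per-bin score (e.g. mean gene expression).
    """
    labels = label_bins(pc1)
    means = {}
    for lab in (1, 2):
        vals = [a for l, a in zip(labels, activity) if l == lab]
        means[lab] = sum(vals) / len(vals) if vals else float("-inf")
    a_label = 1 if means[1] >= means[2] else 2

    def to_compartment(label):
        if label is None:
            return None
        return "A" if label == a_label else "B"

    return [to_compartment(l) for l in labels]


def compartment_switches(pc1_cond1, pc1_cond2):
    """Return indices of bins whose PC1 sign differs between two conditions."""
    return [i for i, (p, q) in enumerate(zip(pc1_cond1, pc1_cond2)) if p * q < 0]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy values for four bins in two conditions (illustrative only).
pc1_plus = [0.5, 0.2, -0.3, -0.1]
pc1_minus = [0.4, -0.1, -0.2, -0.3]
activity = [1.0, 2.0, 8.0, 9.0]

print(assign_compartments(pc1_plus, activity))
print(compartment_switches(pc1_plus, pc1_minus))
print(round(pearson(pc1_plus, pc1_minus), 3))
```

In the toy data the negative-PC1 bins carry the higher activity, so they receive the A label, which shows why the sign of PC1 alone cannot name a compartment without an external activity measure.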
The resulting metaplots were generated using deepTools 123. Next, we compared gene expression (TPMs) patterns of boundary-associated genes and non-boundary genes across three conditions: germinating spores, ERM, and in planta (Lotus). Additionally, to explore whether genes within the same TAD show coordinated expression, we calculated the coefficient of variation for TAD and non-TAD regions.

CaGg genomic analyses

To confirm the number and circularity of the CaGg contigs, we reassembled HiFi reads using Flye v2.9.6 63 and Unicycler v0.5.1 64 genome assemblers. This process resulted in three CaGg contigs, all of which were verified as circular. One contig, containing core bacterial genes, was identified as the chromosome, while two smaller contigs containing plasmid-associated genes were classified as plasmids. The CaGg circular genome map was generated using GenoVi v0.2.16 124. To identify Type IV Secretion System (T4SS) and Mobilization enzymes (MOB), SecReT4.0 (https://bioinfo-mml.sjtu.edu.cn/SecReT4/) and MOBscan (https://castillo.dicom.unican.es/mobscan/) were employed. Prophage regions were located using Phaster (https://phaster.ca/) with default settings. Phaster categorizes predicted prophage sequences as “intact” (score ≥ 90), “questionable” (score 70-90), and “incomplete” (score < 70), based on size, the presence of phage-related genes, and similarity to known phages. Predicted protein sequences were assigned KEGG Orthologs (KOs) and mapped to KEGG pathways with KofamKOALA (www.genome.jp/tools/kofamkoala/).
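The Phaster score categories quoted above can be written as a small helper for tallying predicted prophage regions. This is a sketch under stated assumptions: the thresholds follow the wording in the text (a score of exactly 90 is treated as intact), while the function names, the (name, score) input format, and the example scores are hypothetical and not taken from the study's results.

```python
from collections import Counter


def phaster_category(score):
    """Map a Phaster completeness score to the category named in the text."""
    if score >= 90:
        return "intact"
    if score >= 70:
        return "questionable"
    return "incomplete"


def summarize_prophages(regions):
    """Count prophage regions per category.

    `regions` is an iterable of (region_name, score) pairs; this input
    format is an assumption made for the example.
    """
    return Counter(phaster_category(score) for _, score in regions)


# Made-up scores for illustration only.
example_regions = [
    ("region_1", 110),
    ("region_2", 80),
    ("region_3", 40),
    ("region_4", 90),
]
print(summarize_prophages(example_regions))
```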
The type II TA system was predicted using TAfinder 2.0 (https://bioinfo-mml.sjtu.edu.cn/TADB3/TAfinder.php) 125. CRISPRCasFinder v4.2.20, implemented in Proksee, was used to identify CRISPR arrays and cas genes.

Acknowledgements

Our research is funded by the Discovery Program of the Natural Sciences and Engineering Research Council (RGPIN-2020-05643) and by a Discovery Accelerator Supplements Program (RGPAS-2020-00033). N.C. is a University of Ottawa Research Chair in Microbial Genomics. J.O. and G.K. were supported by MITACS projects (IT30302). At the University of Torino, research was funded under the National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.4 - Call for tender No. 3138 of 16 December 2021, rectified by Decree n.3175 of 18 December 2021 of the Italian Ministry of University and Research, funded by the European Union – NextGenerationEU; Project code CN_00000033, Concession Decree No. 1034 of 17 June 2022, adopted by the Italian Ministry of University and Research, CUP D13C22001350001, Project title “National Biodiversity Future Center - NBFC”.

Data Availability

The genome data used in our study are available in GenBank under the BioProject PRJNA1364746. Chromosome annotations are available at Zenodo: https://doi.org/10.5281/zenodo.18236849. All the scripts used can be accessed here: https://github.com/kenmurithi/Mugambi-2026-AMF-endosymbiont-genomics

References

1. Spatafora, J. W. et al. A phylum-level phylogenetic classification of zygomycete fungi based on genome-scale data. Mycologia 108, 1028–1046 (2016).
2. Smith, S. E. & Read, D. J. Mycorrhizal Symbiosis. (Academic, 2008).
3. Luginbuehl, L. H. et al. Fatty acids in arbuscular mycorrhizal fungi are synthesized by the host plant. Science 356, 1175–1178 (2017).
4. Keymer, A. et al. Lipid transfer from plants to arbuscular mycorrhiza fungi. Elife 6, (2017).
5. Bonfante, P. The future has roots in the past: the ideas and scientists that shaped mycorrhizal research. New Phytol. 220, 982–995 (2018).
6. Terry, V. et al. Mycorrhizal response of Solanum tuberosum to homokaryotic versus dikaryotic arbuscular mycorrhizal fungi. Mycorrhiza (2023) doi:10.1007/s00572-023-01123-7.
7. MacColl, K. A. & Maherali, H. The effect of ecological restoration on mutualistic services provided by arbuscular mycorrhizal fungi depends on site location and host identity. Plant Soil 512, 347–360 (2025).
8. Ferguson, R. et al. Arbuscular mycorrhizal fungal genotype and nuclear organization as driving factors in host plant nutrient acquisition and stable carbon storage. Plants People Planet (2025) doi:10.1002/ppp3.10645.
9. Pozo, M. J. & Azcón-Aguilar, C. Unraveling mycorrhiza-induced resistance. Curr. Opin. Plant Biol. 10, 393–398 (2007).
10. Kokkoris, V., Stefani, F., Dalpé, Y., Dettman, J. & Corradi, N. Nuclear dynamics in the arbuscular mycorrhizal fungi. Trends Plant Sci. 25, 765–778 (2020).
11. Halary, S. et al. Conserved meiotic machinery in Glomus spp., a putatively ancient asexual fungal lineage. Genome Biol. Evol. 3, 950–958 (2011).
12. Tisserant, E. et al. Genome of an arbuscular mycorrhizal fungus provides insight into the oldest plant symbiosis. PNAS 110, 20117–20122 (2013).
13. Riley, R. & Corradi, N. Searching for clues of sexual reproduction in the genomes of arbuscular mycorrhizal fungi. Fungal Ecol. 6, 44–49 (2013).
14. Halary, S. et al. Mating Type Gene Homologues and Putative Sex Pheromone-Sensing Pathway in Arbuscular Mycorrhizal Fungi, a Presumably Asexual Plant Root Symbiont. PLoS One 8, e80729 (2013).
15. Riley, R. et al. Extreme diversification of the mating type-high-mobility group (MATA-HMG) gene family in a plant-associated arbuscular mycorrhizal fungus. New Phytol. 201, 254–268 (2014).
16. Morin, E. et al. Comparative genomics of Rhizophagus irregularis, R. cerebriforme, R. diaphanus and Gigaspora rosea highlights specific genetic features in Glomeromycotina. New Phytol. 222, 1584–1598 (2019).
17. Ropars, J. et al. Evidence for the sexual origin of heterokaryosis in arbuscular mycorrhizal fungi. Nat. Microbiol. 1, 16033 (2016).
18. Sperschneider, J. et al. Arbuscular mycorrhizal fungi heterokaryons have two nuclear populations with distinct roles in host-plant interactions. Nat. Microbiol. 8, 2142–2153 (2023).
19. Wallen, R. M. & Perlin, M. H. An overview of the function and maintenance of sexual reproduction in dikaryotic fungi. Front. Microbiol. 9, 503 (2018).
20. Ropars, J. et al. Sex in cheese: evidence for sexuality in the fungus Penicillium roqueforti. PLoS One 7, e49665 (2012).
21. Oliveira, J., Yildirir, G. & Corradi, N. From chaos comes order: Genetics and genome biology of arbuscular mycorrhizal fungi. Annu. Rev. Microbiol. 78, 147–168 (2024).
22. Sahraei, S. E. et al. Whole genome analyses based on single, field collected spores of the arbuscular mycorrhizal fungus Funneliformis geosporum. Mycorrhiza 32, 361–371 (2022).
23. Chen, E. C. H. et al. High intraspecific genome diversity in the model arbuscular mycorrhizal symbiont Rhizophagus irregularis. New Phytol. 220, 1161–1171 (2018).
24. Yildirir, G. et al. Long reads and Hi-C sequencing illuminate the two-compartment genome of the model arbuscular mycorrhizal symbiont Rhizophagus irregularis. New Phytol. 233, 1097–1107 (2022).
25. Oliveira, J. I. N. et al. Analyses of transposable elements in arbuscular mycorrhizal fungi support evolutionary parallels with filamentous plant pathogens. Genome Biol. Evol. 17, evaf038 (2025).
26. Kloppholz, S., Kuhn, H. & Requena, N. A secreted fungal effector of Glomus intraradices promotes symbiotic biotrophy. Curr. Biol. 21, 1204–1209 (2011).
27. Oliveira, J. I. N. & Corradi, N. Strain-specific evolution and host-specific regulation of transposable elements in the model plant symbiont Rhizophagus irregularis. G3 (Bethesda) 14, jkae055 (2024).
28. Teulet, A. et al. A pathogen effector FOLD diversified in symbiotic fungi. New Phytol. 239, 1127–1139 (2023).
29. Lanfranco, L. & Bonfante, P. Lessons from arbuscular mycorrhizal fungal genomes. Curr. Opin. Microbiol. 75, 102357 (2023).
30. Torres, D. E., Reckard, A. T., Klocko, A. D. & Seidl, M. F. Nuclear genome organization in fungi: from gene folding to Rabl chromosomes. FEMS Microbiol. Rev. 47, fuad021 (2023).
31. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
32. Dekker, J. & Heard, E. Structural and functional diversity of Topologically Associating Domains. FEBS Lett. 589, 2877–2884 (2015).
33. Glavincheska, I. & Lorrain, C. Three-dimensional genome architecture connects chromatin structure and function in a major wheat pathogen. bioRxiv 2025.05.13.653796 (2025) doi:10.1101/2025.05.13.653796.
34. Kurbidaeva, A. et al. Topologically associating domains and the evolution of three-dimensional genome architecture in rice. Plant J. 122, e70139 (2025).
35. Winter, D. J. et al. Repeat elements organise 3D genome structure and mediate transcription in the filamentous fungus Epichloë festucae. PLoS Genet. 14, e1007467 (2018).
36. Zhang, G., Li, Y. & Wei, G. Multi-omic analysis reveals dynamic changes of three-dimensional chromatin architecture during T cell differentiation. Commun. Biol. 6, 773 (2023).
37. Hansen, A. S., Cattoglio, C., Darzacq, X. & Tjian, R. Recent evidence that TADs and chromatin loops are dynamic structures. Nucleus 9, 20–32 (2018).
38. Li, H., Playter, C., Das, P. & McCord, R. P. Chromosome compartmentalization: causes, changes, consequences, and conundrums. Trends Cell Biol. 34, 707–727 (2024).
39. Venice, F. et al. At the nexus of three kingdoms: the genome of the mycorrhizal fungus Gigaspora margarita provides insights into plant, endobacterial and fungal interactions. Environ. Microbiol. 22, 122–141 (2020).
40. Bonfante, P. & Desirò, A. Who lives in a fungus? The diversity, origins and functions of fungal endobacteria living in Mucoromycota. ISME J. 11, 1727–1735 (2017).
41. Turina, M. et al. The virome of the arbuscular mycorrhizal fungus Gigaspora margarita reveals the first report of DNA fragments corresponding to replicating non-retroviral RNA viruses in fungi. Environ. Microbiol. 20, 2012–2025 (2018).
42. Salvioli, A. et al. Symbiosis with an endobacterium increases the fitness of a mycorrhizal fungus, raising its bioenergetic potential. ISME J. 10, 130–144 (2016).
43. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
44. Tang, N. et al. A survey of the gene repertoire of Gigaspora rosea unravels conserved features among Glomeromycota for obligate biotrophy. Front. Microbiol. 7, 233 (2016).
45. Rosling, A. et al. Evolutionary history of arbuscular mycorrhizal fungi and genomic signatures of obligate symbiosis. BMC Genomics 25, 529 (2024).
46. Ghignone, S. et al. The genome of the obligate endobacterium of an AM fungus reveals an interphylum network of nutritional interactions. ISME J. 6, 136–145 (2012).
47. Malar C, M. et al. Early branching arbuscular mycorrhizal fungus Paraglomus occultum carries a small and repeat-poor genome compared to relatives in the Glomeromycotina. Microb. Genom. 8, 000810 (2022).
48. Malar C, M. et al. The genome of Geosiphon pyriformis reveals ancestral traits linked to the emergence of the arbuscular mycorrhizal symbiosis. Curr. Biol. 31, 1578–1580 (2021).
49. Pelin, A. et al. The mitochondrial genome of the arbuscular mycorrhizal fungus Gigaspora margarita reveals two unsuspected trans-splicing events of group I introns. New Phytol. 194, 836–845 (2012).
50. Wijayawardene, N. N. et al. Outline of fungi and fungus-like taxa – 2021. Mycosphere 13, 53–453 (2022).
51.
Beaudet, D. et al. Ultra-low input transcriptomics reveal the spore functional content and 753 phylogenetic affiliations of poorly studied arbuscular mycorrhizal fungi. DNA Res. 0, 1–11 754 (2017). 755 52. Błaszkowski, J. et al. A new order, Entrophosporales, and three new Entrophospora species 756 in Glomeromycota. Front. Microbiol. 13, 962856 (2022). 757 53. Redecker, D. et al. An evidence-based consensus for the classification of arbuscular 758 mycorrhizal fungi (Glomeromycota). Mycorrhiza (2013) doi:10.1007/s00572-013-0486-y. 759 54. Corradi, N., Antunes, P. M. & Magurno, F. A call for reform: implementing genome-based 760 approaches for species classification in Glomeromycotina. New Phytol. (2025) 761 doi:10.1111/nph.70148. 762 55. Maeda, T. et al. Evidence of non-tandemly repeated rDNAs and their intragenomic 763 heterogeneity in Rhizophagus irregularis. Commun. Biol. 1, 87 (2018). 764 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 15, 2026. ; https://doi.org/10.64898/2026.01.14.699541doi: bioRxiv preprint 23 56. Stefani, F. et al. The pitfalls of rDNA-based AMF identification: a comparative analysis of 765 rDNA and protein-coding genes. New Phytol. 248, 1501–1515 (2025). 766 57. Boisvert, F.-M., van Koningsbruggen, S., Navascués, J. & Lamond, A. I. The 767 multifunctional nucleolus. Nat. Rev. Mol. Cell Biol. 8, 574–585 (2007). 768 58. Pederson, T. The nucleolus. Cold Spring Harb. Perspect. Biol. 3, a000638–a000638 (2011). 769 59. Schöfer, C. & Weipoltshammer, K. Nucleolus and chromatin. Histochem. Cell Biol. 150, 770 209–225 (2018). 771 60. Muszewska, A., Steczkiewicz, K., Stepniewska-Dziubinska, M. & Ginalski, K. 772 Transposable elements contribute to fungal genes and impact fungal lifestyle. Sci. Rep. 
773 (2019) doi:10.1038/s41598-019-40965-0. 774 61. Dallaire, A. et al. Transcriptional activity and epigenetic regulation of transposable 775 elements in the symbiotic fungus Rhizophagus irregularis. Genome Res. 31, 2290–2302 776 (2021). 777 62. Li, D. et al. Comparative 3D genome architecture in vertebrates. BMC Biol. 20, 99 (2022). 778 63. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads 779 using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019). 780 64. Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: Resolving bacterial 781 genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 13, 782 e1005595 (2017). 783 65. Sorwar, E., Oliveira, J. I. N., Malar C, M., Krüger, M. & Corradi, N. Assembly and 784 comparative analyses of the Geosiphon pyriformis metagenome. Environ. Microbiol. 26, 785 e16681 (2024). 786 66. Chan, W. T., Garcillán-Barcia, M. P., Yeo, C. C. & Espinosa, M. Type II bacterial toxin-787 antitoxins: hypotheses, facts, and the newfound plethora of the PezAT system. FEMS 788 Microbiol. Rev. 47, fuad052 (2023). 789 67. Salvioli di Fossalunga, A., Lipuma, J., Venice, F., Dupont, L. & Bonfante, P. The 790 endobacterium of an arbuscular mycorrhizal fungus modulates the expression of its toxin-791 antitoxin systems during the life cycle of its host. ISME J. 11, 2394–2398 (2017). 792 68. Teulet, A. et al. A pathogen effector FOLD diversified in symbiotic fungi. bioRxiv (2022) 793 doi:10.1101/2022.12.16.520752. 794 69. Voß, S., Betz, R., Heidt, S., Corradi, N. & Requena, N. RiCRN1, a crinkler effector from 795 the arbuscular mycorrhizal fungus Rhizophagus irregularis, functions in arbuscule 796 development. Front. Microbiol. 9, 2068 (2018). 797 70. Bruns, T. D., Corradi, N., Redecker, D., Taylor, J. W. & Öpik, M. Glomeromycotina: What 798 is a species and why should we care? New Phytol. (2017) doi:10.1111/nph.14913. 799 71. Reinhardt, D., Roux, C., Corradi, N. & Di Pietro, A. 
Lineage-Specific Genes and Cryptic 800 Sex: Parallels and Differences between Arbuscular Mycorrhizal Fungi and Fungal 801 Pathogens. Trends Plant Sci. (2020) doi:10.1016/j.tplants.2020.09.006. 802 72. Dong, S., Raffaele, S. & Kamoun, S. The two-speed genomes of filamentous pathogens: 803 waltz with plants. Curr. Opin. Genet. Dev. 35, 57–65 (2015). 804 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 15, 2026. ; https://doi.org/10.64898/2026.01.14.699541doi: bioRxiv preprint 24 73. Manley, B. F. et al. A highly contiguous genome assembly reveals sources of genomic 805 novelty in the symbiotic fungus Rhizophagus irregularis. G3 (Bethesda) 13, jkad077 806 (2023). 807 74. Dixon, J. R., Gorkin, D. U. & Ren, B. Chromatin domains: The unit of chromosome 808 organization. Mol. Cell 62, 668–680 (2016). 809 75. Grummt, I. Life on a planet of its own: regulation of RNA polymerase I transcription in the 810 nucleolus. Genes Dev. 17, 1691–1702 (2003). 811 76. Mancio-Silva, L., Zhang, Q., Scheidig-Benatar, C. & Scherf, A. Clustering of dispersed 812 ribosomal DNA and its role in gene regulation and chromosome-end associations in malaria 813 parasites. Proc. Natl. Acad. Sci. U. S. A. 107, 15117–15122 (2010). 814 77. Rabuffo, C. et al. Inter-chromosomal transcription hubs shape the 3D genome architecture 815 of African trypanosomes. Nat. Commun. 15, 10716 (2024). 816 78. Jargeat, P. et al. Isolation, free-living capacities, and genome structure of “Candidatus 817 Glomeribacter gigasporarum,” the endocellular bacterium of the mycorrhizal fungus 818 Gigaspora margarita. J. Bacteriol. 186, 6876–6884 (2004). 819 79. Pawlowska, T. E. et al. Biology of fungi and their bacterial endosymbionts. Annu. Rev. 820 Phytopathol. 56, 289–309 (2018). 821 80. Uehling, J. K. et al. 
Bacterial endosymbionts of Mucoromycota fungi: Diversity and 822 function of their interactions. in The Mycota 177–205 (Springer International Publishing, 823 Cham, 2023). 824 81. Winsor, G. L. et al. The Burkholderia Genome Database: facilitating flexible queries and 825 comparative analyses. Bioinformatics 24, 2803–2804 (2008). 826 82. Strübing, U., Lucius, R., Hoerauf, A. & Pfarr, K. M. Mitochondrial genes for heme-827 dependent respiratory chain complexes are up-regulated after depletion of Wolbachia from 828 filarial nematodes. Int. J. Parasitol. 40, 1193–1202 (2010). 829 83. Venice, F. et al. Gigaspora margarita with and without its endobacterium shows adaptive 830 responses to oxidative stress. Mycorrhiza 27, 747–759 (2017). 831 84. Wang, J. et al. Crucial involvement of heme biosynthesis in vegetative growth, 832 development, stress response, and fungicide sensitivity of Fusarium graminearum. Int. J. 833 Mol. Sci. 25, 5268 (2024). 834 85. Horvath, P. & Barrangou, R. CRISPR/Cas, the immune system of bacteria and archaea. 835 Science 327, 167–170 (2010). 836 86. Rostøl, J. T. & Marraffini, L. (ph)ighting phages: How bacteria resist their parasites. Cell 837 Host Microbe 25, 184–194 (2019). 838 87. Burstein, D. et al. Major bacterial lineages are essentially devoid of CRISPR-Cas viral 839 defence systems. Nat. Commun. 7, 10613 (2016). 840 88. Siozios, S. et al. Genome dynamics across the evolutionary transition to endosymbiosis. 841 Curr. Biol. 34, 5659-5670.e7 (2024). 842 89. Song, S. & Wood, T. K. A primary physiological role of toxin/antitoxin systems is phage 843 inhibition. Front. Microbiol. 11, 1895 (2020). 844 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 15, 2026. ; https://doi.org/10.64898/2026.01.14.699541doi: bioRxiv preprint 25 90. 
Lumini, E. et al. Presymbiotic growth and sporal morphology are affected in the arbuscular 845 mycorrhizal fungus Gigaspora margarita cured of its endobacteria. Cell. Microbiol. 9, 846 1716–1729 (2007). 847 91. Spores of mycorrhizal Endogone species extracted from soil by wet sieving and decanting. 848 Transactions of the British Mycological Society 46, 235–244 (1963). 849 92. Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C 850 and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 48, 851 W177–W184 (2020). 852 93. Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. 853 Nat. Biotechnol. 40, 1332–1335 (2022). 854 94. Uliano-Silva, M. et al. MitoHiFi: a python pipeline for mitochondrial genome assembly 855 from PacBio high fidelity reads. BMC Bioinformatics 24, 288 (2023). 856 95. Schwengers, O. et al. Bakta: rapid and standardized annotation of bacterial genomes via 857 alignment-free sequence identification. Microb. Genom. 7, (2021). 858 96. Arima Genomics. Mapping_pipeline. (2019). 859 97. Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. 860 Bioinformatics 39, (2023). 861 98. Harry, E. PretextView (Paired REad TEXTure Viewer): A Desktop Application for Viewing 862 Pretext Contact Maps. (Github, 2020). 863 99. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C 864 experiments. Cell Syst. 3, 95–98 (2016). 865 100. Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data 866 on the idiograms. PeerJ Comput. Sci. 6, e251 (2020). 867 101. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable 868 element families. Proc. Natl. Acad. Sci. U. S. A. 117, 9451–9457 (2020). 869 102. Qian, J. et al. TEtrimmer: a tool to automate the manual curation of transposable elements. 870 Nat. Commun. 16, 8429 (2025). 871 103. Smit, A. F. A., Hubley, R. 
& Green, P. RepeatMasker. Published on the web at http://www. 872 repeatmasker. org (1996). 873 104. Venice, F. et al. Symbiotic responses of Lotus japonicus to two isogenic lines of a 874 mycorrhizal fungus differing in the presence/absence of an endobacterium. Plant J. 108, 875 1547–1564 (2021). 876 105. Jin, Y., Tam, O. H., Paniagua, E. & Hammell, M. TEtranscripts: a package for including 877 transposable elements in differential expression analysis of RNA-seq datasets. 878 Bioinformatics 31, 3593–3599 (2015). 879 106. Palmer, J. M. & Stajich, J. Funannotate v1.8.1: Eukaryotic Genome Annotation. (Zenodo, 880 2020). doi:10.5281/ZENODO.4054262. 881 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 15, 2026. ; https://doi.org/10.64898/2026.01.14.699541doi: bioRxiv preprint 26 107. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. 882 BUSCO: assessing genome assembly and annotation completeness with single-copy 883 orthologs. Bioinformatics 31, 3210–3212 (2015). 884 108. Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language 885 models. Nat. Biotechnol. 40, 1023–1025 (2022). 886 109. Sperschneider, J. & Dodds, P. N. EffectorP 3.0: Prediction of apoplastic and cytoplasmic 887 effectors in fungi and Oomycetes. Mol. Plant. Microbe. Interact. 35, 146–156 (2022). 888 110. Huang, L. et al. dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) 889 sequence and annotation. Nucleic Acids Res. 46, D516–D521 (2018). 890 111. Zheng, J. et al. dbCAN3: automated carbohydrate-active enzyme and substrate annotation. 891 Nucleic Acids Res. 51, W115–W121 (2023). 892 112. Orłowska, M., Steczkiewicz, K. & Muszewska, A. Utilization of cobalamin is ubiquitous in 893 early-branching fungal phyla. 
Genome Biol. Evol. 13, evab043 (2021). 894 113. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: 895 improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013). 896 114. Miller, M. A., Pfeiffer, W. & Schwartz, T. The CIPRES science gateway: a community 897 resource for phylogenetic analyses. in Proceedings of the 2011 TeraGrid Conference: 898 Extreme Digital Discovery (ACM, New York, NY, USA, 2011). 899 doi:10.1145/2016741.2016785. 900 115. Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic 901 inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020). 902 116. Magurno, F. et al. Glomus mongioiense, a new species of arbuscular mycorrhizal fungi 903 from Italian Alps and the phylogeny-spoiling issue of ribosomal variants in the Glomus 904 genus. Agronomy (Basel) 14, 1350 (2024). 905 117. jasmine. Jasmine: Call Select Base Modifications in PacBio HiFi Reads. (Github, 2023). 906 118. pbmm2. Pbmm2: A Minimap2 Frontend for PacBio Native Data Formats. (Github, 2017). 907 119. pb-CpG-tools. Pb-CpG-Tools: Collection of Tools for the Analysis of CpG Data. (Github, 908 2022). 909 120. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic 910 features. Bioinformatics 26, 841–842 (2010). 911 121. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and 912 bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017). 913 122. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. 914 Genome Biol. 16, 259 (2015). 915 123. Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible 916 platform for exploring deep-sequencing data. Nucleic Acids Res 42, W187–W191 (2014). 917 124. Cumsille, A. et al. GenoVi, an open-source automated circular genome visualizer for 918 bacteria and archaea. PLoS Comput. Biol. 19, e1010998 (2023). 
919 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 15, 2026. ; https://doi.org/10.64898/2026.01.14.699541doi: bioRxiv preprint 27 125. Guan, J. et al. TADB 3.0: an updated database of bacterial toxin-antitoxin loci and 920 associated mobile genetic elements. Nucleic Acids Res. 52, D784–D790 (2024). 921 922 923 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 15, 2026. ; https://doi.org/10.64898/2026.01.14.699541doi: bioRxiv preprint

