Introduction
40
Mutations in DNA sequences accumulate over time and produce the variation that allows 41
populations to adapt to novel or changing environments. In this sense, mutation is the ultimate 42
source of evolutionary innovation. At the same time, mutations are often deleterious (Eyre-43
Walker and Keightley 2007), and somatic mutations can cause disease, setting up an interesting 44
dynamic where selection may favor alleles that lower mutation rates, even though mutational 45
input is required for adaptation and evolution (Zhang 2023). 46
The textbook view of mutation and adaptation is that mutations occur randomly with 47
respect to their environment-specific fitness consequences. This principle was established in 48
early investigations by Max Delbrück and Salvador Luria, who found that mutations in bacteria 49
that confer phage resistance were equally likely to occur regardless of whether bacteria were 50
grown in the presence of phage (Luria and Delbrück 1943). In other words, a phage-containing 51
environment creates selection for genetic variants responsible for resistance but does not 52
induce mutations to specifically occur at those loci. After subsequent decades of study, 53
mutations are still widely considered to be random in this respect even though both the type 54
and location of mutations are now known to have non-uniform distributions across genomes. 55
For example, transition substitutions are far more common than transversions in most 56
organisms across the tree of life. This bias in the mutation spectrum arises through the simple 57
properties of DNA bases and chemical damage, but it has important consequences for the 58
relationship between fitness effects and the probability of mutations. Due to the structure of 59
the genetic code, transversions are more likely than transitions to be nonsynonymous (i.e. result 60
in amino acid changes) and, therefore, have harmful fitness effects. As such, the average fitness 61
effect of mutations is lower than it would be if all types of nucleotide substitutions occurred 62
with equal probability (Eyre-Walker and Keightley 2007). 63
Mutation rates can also vary depending on genomic location. For example, mutational 64
gradients arise in mammalian mitochondrial genomes because regions near replication origins 65
are single-stranded (and more vulnerable to mutation causing damage) for longer periods 66
during DNA replication (Sanchez-Contreras et al. 2021). Variation in intragenomic mutation rates 67
can also occur at smaller scales, such is in Arabidopsis thaliana where mutations are enriched in 68
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
intergenic sequences compared to genes (Ossowski et al. 2010; Belfield et al. 2018; Weng et al. 69
2019) and in introns compared to exons (Monroe et al. 2022, 2023a; Quiroz et al. 2023; 70
Staunton et al. 2023). Because mutations in coding sequences are more likely to have functional 71
consequences, this biased distribution of mutations should again result in lower average fitness 72
effects than if mutations were uniformly distributed across the genome. 73
The probability of a mutation, therefore, cannot be considered independent of the 74
fitness consequences of that mutation. However, to challenge the textbook view that mutations 75
occur randomly with respect to environment-specific fitness effects, gene-specific mutational 76
biases would have to systematically vary with changes in the environment. One potential 77
mechanism that could create such a relationship between environment and mutation bias is the 78
coupling of DNA repair surveillance with transcription machinery, which results in lower 79
mutation rates for highly expressed genes (Supek and Lehner 2017; Oztas et al. 2018; Huang et 80
al. 2018; Huang and Li 2018; Gonzalez-Perez et al. 2019; Monroe et al. 2022). Therefore, 81
environmental changes that increase a gene’s expression level should lower its mutation rate. In 82
addition, highly expressed genes are known to experience stronger selection (Zhang and Yang 83
2015), so genes may be most protected from mutation in environments where they are most 84
functionally important. Alternatively, transcription may be mutagenic, as increased DNA damage 85
associated with exposure of single-stranded DNA to mutagens can potentially overpower the 86
increased protection of actively transcribed genes (Kim et al. 2007; Jinks-Robertson and 87
Bhagwat 2014; Seplyarskiy et al. 2023). 88
A challenge associated with addressing how local mutation rates vary with environment 89
is the difficulty of measuring mutations in experimental settings. Historical estimates of 90
mutation relied on comparisons of synonymous substitutions between populations or species. 91
Because these substitutions do not result in a change in amino acid, they are expected to 92
experience minimal selection and thus approximate mutational input, though in reality 93
synonymous sites do experience selection due to codon usage bias (Grantham et al. 1980; 94
Hershberg and Petrov 2008) and other mechanisms (Bailey et al. 2021). It is inherently difficult 95
to measure mutation rates more directly in large multicellular organisms because their long 96
generations require many individuals and/or large amounts of time for sufficient mutations to 97
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
occur, making methods such as mutation accumulation lines and parent-offspring trio 98
sequencing (Lynch et al. 2016; Tatsumoto et al. 2017) expensive and time-consuming. 99
An alternative and potentially complementary approach to mutation accumulation and 100
trio sequencing studies is to detect the mutations that accumulate in an organism’s somatic 101
tissues (Gundry and Vijg 2012; Moore et al. 2021; Monroe et al. 2022; Quiroz et al. 2023; 102
Schmitt et al. 2023; Staunton et al. 2023; Satake et al. 2023; Goel et al. 2024). This approach 103
benefits from the fact that many more cell lineages can be tracked than just the germline. 104
Inclusion of somatic (vegetative) mutations in recent Arabidopsis studies led to the 105
identification of thousands of mutations, which increased power to test for relationships 106
between local mutation rates and various sequence features, such as GC content, DNA 107
methylation, histone modifications and expression level (Monroe et al. 2022). However, this 108
approach appears to have been inaccurate because low frequency somatic variants can be 109
difficult to distinguish from sequencing errors, and reanalysis of the somatic mutation calls 110
showed that many of the putative mutations arose from technical artefacts (Liu and Zhang 111
2022; Monroe et al. 2023a; Wang et al. 2023; Monroe et al. 2023b). Therefore, the actual 112
frequency of somatic mutations in vegetative plant tissue remains an open question. 113
Measurements of low frequency somatic mutations can be obtained using a high-fidelity 114
sequencing technology to distinguish mutational signal from noise (Sloan et al. 2018). For 115
example, Duplex Sequencing is an Illumina-based method in which unique molecular identifiers 116
(UMIs) are included in adaptors and attached to both ends of DNA fragments before library 117
amplification (Schmitt et al. 2012; Kennedy et al. 2014). After sequencing, the UMIs are used to 118
cluster families of reads that originated from each strand of a given DNA fragment so that a 119
double-stranded consensus sequence can be created that is virtually error free (< 5-8 errors 120
per base pair; Kennedy et al. 2014). 121
Our goal in this study was to test if the pattern of local mutation rate variation across a 122
genome depends on environmental effects on gene expression levels. We also wanted to 123
determine whether low-frequency somatic mutations in plant tissues could provide a robust 124
signal for addressing this type of question. Therefore, we perturbed gene expression by growing 125
Arabidopsis under different temperatures. We identified differentially expressed (DE) genes with 126
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
RNA-seq, which we then targeted for low-frequency somatic mutation detection using Duplex 127
Sequencing coupled with hybrid capture. We included mutant lines msh2 and ung, which 128
respectively lack mismatch repair (MMR) and base excision repair (BER) capabilities, in order to 129
understand how repair mechanisms may drive biased mutation accumulation (Cordoba-Canero 130
et al. 2010; Belfield et al. 2018). We also included hsp70-16 mutant lines, which are deficient for 131
a key heat shock protein, as a means to endogenously manipulate gene expression and 132
potentially interact with our temperature treatment (Ran et al. 2020). As expected, we found 133
significant increases in variant frequencies in the MMR deficient lines. In wild type (WT) lines 134
and other mutant lines, measured mutation frequencies were too low to quantify relationships 135
between mutation rates and environment-specific gene expression levels. Therefore, our results 136
support the conclusion that earlier estimates of somatic variant frequencies were inflated 137
(Monroe et al. 2023a; Wang et al. 2023) and indicate that much larger datasets will be needed 138
to test for environment-specific changes in mutation biases. 139
140
Results
141
To test if environment specific changes in gene expression impact mutation, we performed 142
mutation detection on a targeted set of Arabidopsis genes that were DE in plants grown at 20°C 143
vs. 30°C. We first generated and analyzed RNA-seq data to identify genes in six categories: 1) 144
increased expression at 30°C compared to 20°C in WT plants, 2) increased expression at 20°C 145
compared to 30°C in WT plants, 3) constitutively high expression in WT plants at both 20°C and 146
30°C, 4) constitutively low expression in WT plants at both 20°C and 30°C, 5) genes that had 147
increased expression at 30°C vs. 20°C in WT plants (like category 1) and also had an interaction 148
between WT and hsp70-16, and 6) genes that had increased expression at 30°C vs. 20°C in WT 149
plants (like category 2) and also had an interaction between WT and hsp70-16 (Table S1). The 150
sequences of the DE genes were used to create a custom probe-set for hybrid capture of Duplex 151
Sequencing libraries. 152
Duplex Sequencing coverage of the genes and 250 bp of flanking sequence in the probe-153
set ranged from 74.7 to 109.4 (Figure S1), and the average probe-set coverage across all 154
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
libraries was 193.1-fold higher than the genome background. In total, we obtained 1.89 Gb of 155
Duplex Sequencing coverage of our region of interest across the 24 libraries (Table S2) 156
We then looked for the presence of single nucleotide variants (SNVs) and short indels 157
within the 339 genes covered in the probe-set. Mutant alleles already present in the parents of 158
the assayed sets of full-sib plants have the potential to bias estimates of de novo mutation 159
frequencies but should be readily identifiable. For a homozygous parent, they would be present 160
in all Duplex Sequencing reads of all the replicates of a given genotype. For a heterozygous 161
parent, they would segregate in a 1:2:1 Mendelian ratio and account for roughly 50% of the 162
reads for all replicates of a given genotype (as each replicate represents a pool of five sibling 163
plants). We identified just three apparent fixed SNVs (Table S3), which were removed for 164
downstream analyses. In contrast, we identified 41 fixed indels, over half of which were in the 165
msh2 background (Table S4). One gene (AT5G39190) had five sites that appeared to be 166
segregating SNVs in all 24 replicates. We suspected this might be caused by a cryptic gene 167
duplication which was not captured in the TAIR 10.2 reference genome (Jaegle et al. 2023). 168
Indeed, when we realigned the reads to the improved Col-CC genome (Reiser et al. 2023), the 169
mutation calls in AT5G39190 were absent. As such, reads mapping to AT5G39190 were 170
disregarded in downstream analyses. The rest of the SNVs we identified were unique to each 171
replicate and all were present at a frequency of no more than 17.64% (the average variant 172
frequency across all mutations was 2.27%), suggesting that these are low frequency somatic 173
variants that arose during the experiment and were present in a subset of the sampled 174
vegetative tissue. 175
Among the six WT biological replicates, we detected a single indel and just six SNVs, one 176
in each replicate (Figure 1). As such, there was very limited statistical power to test for the 177
effects of temperature or expression level on mutation frequency in WT plants. Similarly, we 178
detected few or no SNVs and indels in the hsp70-16 and the ung mutant lines (Figure 1; File S1, 179
S2). In contrast, variant frequencies were significantly elevated in the msh2 mutant lines 180
(compared to WT plants), where we detected 271 indels and 180 SNVs (Figure 1; two-way 181
ANOVA with Tukey’s test, p < 0.0001). The mutations in the msh2 lines were distributed 182
relatively evenly across the temperature treatments, as we found that temperature did not 183
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
influence either SNV or indel frequency (Figure 1; two-way ANOVA, p = 0.99). In the msh2 lines, 184
deletions were 8.5-fold more common than insertions (Table S5; two-way ANOVA, p < 0.0001). 185
We observed significant differences among SNV classes in msh2 SNV spectrum (Figure 2; two-186
way ANOVA, p <0.0001), which was dominated by CG→TA transitions. The next most common 187
types of substitutions were AT→GC transitions and CG→AT transversions. We compared the 188
msh2 mutation frequencies in the constitutively lowly expressed (group 3 in Table S1) vs 189
constitutively highly expressed (group 4 in Table S1) genes and found no significant differences 190
(paired t-test; Table S6), though we did observe a trend towards higher indel frequencies in 191
constitutively highly expressed genes at 30°C. We did not analyze the SNV spectra or indel bias 192
in WT , ung, or hsp70-16 lines because the small number of sampled mutations precluded a 193
statistically meaningful comparison. 194
195
Materials and methods
268
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
All plants were grown in environmentally controlled growth chambers (75% humidity) under a 269
long-day photoperiod (16 hrs light, 8 hrs dark) with irradiance of 185 µmol m−2 sec−1 at constant 270
temperatures (either 20°C or 30°C, as specified below). Prior to planting, seeds were stratified 271
for 5 days in sterile ddH20. Arabidopsis thaliana ecotype Col-0 was used as the WT line. Existing 272
mutant lines were obtained from the Arabidopsis Biological Resource Center (Table S7) and 273
seedlings were screened with allele-specific PCR markers to identify plants that were 274
homozygous for the mutant alleles used in this study (msh2, ung, hsp70-16; Table S8). 275
Sibling plants (roughly 35 for each genotype and each temperature treatment) were 276
planted in 2.5-inch pots. Both temperature treatments were initiated in chambers (Convarion 277
models PGR15 (20°C) and PGCFLEX (30°C)) at 20°C because elevated ambient temperatures 278
(30°C) can inhibit seed germination (Silva-Correia et al. 2014). After 5 days, the temperature was 279
turned up for the 30°C treatment and kept at 20°C for the other treatment. When the plants 280
had reached stage 6.5 of development (where ~50 % of flowers have opened) (Boyes et al. 281
2001), we performed DNA and RNA extractions on unopened floral buds from laterally 282
branching florets. The 30°C plants reached developmental stage 6.5 at 31 days while the 20°C 283
plants reached developmental stage 6.5 at 41 days, consistent with faster plant development at 284
elevated ambient temperatures (Silva-Correia et al. 2014). 285
For the RNA extractions, plant material was collected from the unopened floral buds of 3 286
laterally branching florets from 3 WT and 3 hsp70-16 plants in each temperature treatment. The 287
harvested tissues were immediately placed into liquid nitrogen and homogenized for 10 288
seconds at 30 beats/sec with the Qiagen TissueLyser, before being processed with the Qiagen 289
RNeasy Plant Mini Kit, according to manufacturer’s instructions. The RNA samples were then 290
sent to Novogene and RNA-Seq libraries were made using the NEBNext Ultra II Directional RNA 291
Library Prep Kit with the NEBNext Poly(A) mRNA Magnetic Isolation Module. The RNA-Seq 292
libraries were sequenced on a NovaSeq 6000 using the PE150 strategy to generate 29 to 54 293
million read pairs per library (see Table S9). 294
Tissue was harvested for DNA sequencing and mutation detection at the same time as 295
the tissue for RNA extraction, from siblings of the plants used for RNA extraction. For each 296
replicate in the DNA extractions, plant material was pooled from 5 siblings from the unopened 297
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
floral buds of 3 laterally branching florets from 5 plants per each replicate, with 3 replicates per 298
genotype (WT, hsp70-16, msh2, ung) per temperature treatment. The floret tissue was 299
homogenized for 10 seconds at 30 beats/sec with the Qiagen TissueLyser, before being 300
processed with the DNeasy Plant Mini Kit from Qiagen. 301
The RNA-seq reads were analyzed to detect DE genes at 20°C vs. 30°C. First, the adaptors 302
were removed with Cutadapt version 4.0 with Python 3.9.16 (Martin 2011). Then the reads 303
were mapped to the TAIR10.2 reference genome with HISAT2 (version 2.2.1; (Kim et al. 2019). 304
Read counts were generated with HTSeq-count version 2.0.2 (Anders et al. 2014), and DESeq2 305
models (Love et al. 2014) were implemented to identify genes that were differentially expressed 306
or constitutively highly or lowly expressed. 307
We created a custom probe-set to enrich the sequences of DE genes via hybrid capture 308
so that we could perform mutation detection with Duplex Sequencing. We sent the sequences 309
of 400 DE genes (plus 250 nt of flanking sequence on the end of each gene) to the probe design 310
team at Arbor Bioscience, which flagged 61 of the genes as unsuitable for hybrid capture 311
because they were > 25 % soft-masked for repeats in a BLAST search against the Arbor 312
Biosciences eudicot database. The remaining 339 genes (listed in supplementary file 2) and 313
flanking sequences spanned a total length of 855,123 nt. Sets of 80-nt probes were 2 tiled 314
across the target sequence at approximately every 40 nt. The probes were biotinylated so that 315
probe-bound library molecules can be captured with streptavidin-coated magnetic beads. 316
We created Duplex Sequencing libraries from the 24 DNA samples (3 replicates 4 317
genotypes 2 temperature treatments), following our previously described library preparation 318
protocols (Wu et al. 2020; Waneka et al. 2021), except that in this case the amount of input 319
DNA was increased to 500 ng because the target sequence comprises a small fraction (< 1%) of 320
the total-cellular DNA sample. Once DNA samples had been fragmented via ultrasonication, 321
end-repaired, A-tailed, adaptor-ligated, and treated with a cocktail of damage removal enzymes 322
(Wu et al. 2020), we amplified 0.73 ng of DNA (per reaction) for 13 PCR cycles with New England 323
Biolabs Q5 High-Fidelity Polymerase and dual-indexed primers. We then created 3 pools by 324
combining 350 ng of each amplified library as the Arbor Biosciences hybrid-capture reactions 325
have enough capacity for 8 libraries in each pool. We performed the overnight hybrid-capture 326
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
reaction at 65°C, according to the manufacturer’s instructions (Arbor Biosciences MyBaits Kit 327
Manual v. 5.02). We assessed enrichment efficiency and library concentrations through qPCR (as 328
previously described; (Waneka et al. 2021)) before amplifying the enriched pools for an 329
additional 9 cycles to obtain sufficient library amounts for sequencing. 330
Duplex Sequencing libraries were sequenced with PE150 reads on an Illumina NovaSeq 331
6000 S4 Lane (Novogene) to generate 87 to 123 million read pairs per library (Table S10). 332
Processing of the Duplex Sequencing reads to was performed with our previously described 333
pipeline (Wu et al. 2020), which trimmed adaptor sequences, created duplex consensus 334
sequences based on the presence of shared barcodes, mapped the consensus sequences to the 335
entire TAIR10.2 reference genome. Each duplex consensus sequences is composed of at least 6 336
Illumina reads (at least 3 originating from each strand of a DNA fragment). Alignment files were 337
then parsed to identify duplex consensus sequences that contain SNVs and short indels. Since 338
Duplex Sequencing is highly accurate (< 5-8 errors per base pair; Kennedy et al. 2014) we 339
require just a single duplex consensus to support a putative mutation. Comparisons of coverage 340
in the probe-set vs. outside the probe-set were performed with Samtools version 1.6 (Li et al. 341
2009). For variant frequency calculations, we excluded the first or last 10 bps of a read because 342
we have previously identified elevated mutation frequencies at read ends (Wu et al. 2020). 343
344
345
DATA AVAILABILITY 346
The raw reads are available via the NCBI Sequence Read Archive under accessions 347
SRR27564102-SRR27564113 (RNA-seq libraries) and SRR27693810-SRR27693833 (Duplex 348
Sequencing libraries). Duplex Sequencing datasets were processed with a previously published 349
pipeline (https://github.com/dbsloan/duplexseq) (Wu et al. 2020). 350
351
References
409
Anders, S., P . T. Pyl, and W. Huber, 2014 HTSeq—a Python framework to work with high-410
throughput sequencing data. Bioinformatics 31: 166–169. 411
Bailey, S. F., L. A. Alonso Morales, and R. Kassen, 2021 Effects of Synonymous Mutations beyond 412
Codon Bias: The Evidence for Adaptive Synonymous Substitutions from Microbial 413
Evolution Experiments. Genome Biol. Evol. 13:. 414
Belfield, E. J., C. Brown, Z. J. Ding, L. Chapman, M. Luo et al., 2021 Thermal stress accelerates 415
Arabidopsis thaliana mutation rate. Genome Res. 31: 40–50. 416
Belfield, E. J., Z. J. Ding, F. J. C. Jamieson, A. M. Visscher, S. J. Zheng et al., 2018 DNA mismatch 417
repair preferentially protects genes from mutation. Genome Res. 28: 66–74. 418
Boyes, D. C., A. M. Zayed, R. Ascenzi, A. J. McCaskill, N. E. Hoffman et al., 2001 Growth stage-419
based phenotypic analysis of Arabidopsis: a model for high throughput functional 420
genomics in plants. Plant Cell 13: 1499–1510. 421
Cordoba-Canero, D., E. Dubois, R. R. Ariza, M.-P . Doutriaux, and T. Roldán-Arjona, 2010 422
Arabidopsis uracil DNA glycosylase (UNG) is required for base excision repair of uracil 423
and increases plant sensitivity to 5-fluorouracil. J. Biol. Chem. 285: 7475–7483. 424
Eyre-Walker, A., and P . D. Keightley, 2007 The distribution of fitness effects of new mutations. 425
Nat. Rev. Genet. 8: 610–618. 426
Goel, M., J. A. Campoy, K. Krause, L. C. Baus, A. Sahu et al., 2024 The majority of somatic 427
mutations in fruit trees are layer-specific. bioRxiv 2024.01.04.573414. 428
Gonzalez-Perez, A., R. Sabarinathan, and N. Lopez-Bigas, 2019 Local Determinants of the 429
Mutational Landscape of the Human Genome. Cell 177: 101–114. 430
Grantham, R., C. Gautier, M. Gouy, R. Mercier, and A. Pavé, 1980 Codon catalog usage and the 431
genome hypothesis. Nucleic Acids Res. 8: r49–r62. 432
Gundry, M., and J. Vijg, 2012 Direct mutation analysis by high-throughput sequencing: from 433
germline to low-abundant, somatic variants. Mutat. Res. 729: 1–15. 434
Hershberg, R., and D. A. Petrov, 2008 Selection on Codon Bias. 435
Huang, Y ., L. Gu, and G.-M. Li, 2018 H3K36me3-mediated mismatch repair preferentially 436
protects actively transcribed genes from mutation. J. Biol. Chem. 293: 7811–7823. 437
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
Huang, Y ., and G.-M. Li, 2018 DNA mismatch repair preferentially safeguards actively transcribed 438
genes. DNA Repair 71: 82–86. 439
Jaegle, B., R. Pisupati, L. M. Soto-Jiménez, R. Burns, F. A. Rabanal et al., 2023 Extensive sequence 440
duplication in Arabidopsis revealed by pseudo-heterozygosity. Genome Biol. 24: 44. 441
Jiang, C., A. Mithani, E. J. Belfield, R. Mott, L. D. Hurst et al., 2014 Environmentally responsive 442
genome-wide accumulation of de novo Arabidopsis thaliana mutations and 443
epimutations. Genome Res. 24: 1821–1829. 444
Jinks-Robertson, S., and A. S. Bhagwat, 2014 Transcription-associated mutagenesis. Annu. Rev. 445
Genet. 48: 341–359. 446
Kennedy, S. R., M. W. Schmitt, E. J. Fox, B. F. Kohrn, J. J. Salk et al., 2014 Detecting ultralow-447
frequency mutations by Duplex Sequencing. Nat. Protoc. 9: 2586–2606. 448
Kim, N., A. L. Abdulovic, R. Gealy, M. J. Lippert, and S. Jinks-Robertson, 2007 Transcription-449
associated mutagenesis in yeast is directly proportional to the level of gene expression 450
and influenced by the direction of DNA replication. DNA Repair 6: 1285–1296. 451
Kim, D., J. M. Paggi, C. Park, C. Bennett, and S. L. Salzberg, 2019 Graph-based genome alignment 452
and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37: 907–915. 453
Klepikova, A. V., A. S. Kasianov, E. S. Gerasimov, M. D. Logacheva, and A. A. Penin, 2016 A high 454
resolution map of the Arabidopsis thaliana developmental transcriptome based on RNA-455
seq profiling. Plant J. 88: 1058–1070. 456
Lenth, R., H. Singmann, J. Love, P . Buerkner, and M. Herve, 2021 Emmeans: Estimated marginal 457
means, aka least-squares means. R Package Version 1 (2018). Preprint at. 458
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map 459
format and SAMtools. Bioinformatics 25: 2078–2079. 460
Liu, H., and J. Zhang, 2022 Is the Mutation Rate Lower in Genomic Regions of Stronger Selective 461
Constraints? Mol. Biol. Evol. 39:. 462
Love, M. I., W. Huber, and S. Anders, 2014 Moderated estimation of fold change and dispersion 463
for RNA-seq data with DESeq2. Genome Biol. 15: 550. 464
Lu, Z., J. Cui, L. Wang, N. Teng, S. Zhang et al., 2021 Genome-wide DNA mutations in Arabidopsis 465
plants after multigenerational exposure to high temperatures. Genome Biol. 22: 160. 466
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
Luria, S. E., and M. Delbrück, 1943 Mutations of Bacteria from Virus Sensitivity to Virus 467
Resistance. Genetics 28: 491–511. 468
Lynch, M., M. S. Ackerman, J.-F. Gout, H. Long, W. Sung et al., 2016 Genetic drift, selection and 469
the evolution of the mutation rate. Nat. Rev. Genet. 17: 704–714. 470
Martin, M., 2011 Cutadapt removes adapter sequences from high-throughput sequencing 471
reads. EMBnet.journal 17: 10–12. 472
Monroe, J. G., K. D. Murray, W. Xian, T. Srikant, P . Carbonell-Bejerano et al., 2023a Reply to: Re-473
evaluating evidence for adaptive mutation rate variation. Nature 619: E57–E60. 474
Monroe, J. G., T. Srikant, P . Carbonell-Bejerano, C. Becker, M. Lensink et al., 2023b Author 475
Correction: Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 620: 476
E13. 477
Monroe, J. G., T. Srikant, P . Carbonell-Bejerano, C. Becker, M. Lensink et al., 2022 Mutation bias 478
reflects natural selection in Arabidopsis thaliana. Nature 602: 101–105. 479
Moore, L., A. Cagan, T. H. H. Coorens, M. D. C. Neville, R. Sanghvi et al., 2021 The mutational 480
landscape of human somatic and germline cells. Nature 597: 381–386. 481
Ossowski, S., K. Schneeberger, J. I. Lucas-Lledó, N. Warthmann, R. M. Clark et al., 2010 The rate 482
and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 483
92–94. 484
Oztas, O., C. P . Selby, A. Sancar, and O. Adebali, 2018 Genome-wide excision repair in 485
Arabidopsis is coupled to transcription and reflects circadian gene expression patterns. 486
Nat. Commun. 9: 1503. 487
Quiroz, D., M. Lensink, D. J. Kliebenstein, and J. G. Monroe, 2023 Causes of Mutation Rate 488
Variability in Plant Genomes. Annu. Rev. Plant Biol. 74: 751–775. 489
Ran, X., X. Chen, L. Shi, M. Ashraf, F. Yan et al., 2020 Transcriptomic insights into the roles of 490
HSP70-16 in sepal’s responses to developmental and mild heat stress signals. Environ. 491
Exp. Bot. 179: 104225. 492
Reiser, L., E. Bakker, S. Subramaniam, X. Chen, and S. Sawant, 2023 The Arabidopsis Information 493
Resource in 2024. bioRxiv. 494
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
Richards, C. L., U. Rosas, J. Banta, N. Bhambhra, and M. D. Purugganan, 2012 Genome-wide 495
patterns of Arabidopsis gene expression in nature. PLoS Genet. 8: e1002662. 496
Sanchez-Contreras, M., M. T. Sweetwyne, B. F. Kohrn, K. A. Tsantilas, M. J. Hipp et al., 2021 A 497
replication-linked mutational gradient drives somatic mutation accumulation and 498
influences germline polymorphisms and genome composition in mitochondrial DNA. 499
Nucleic Acids Res. 49: 11103–11118. 500
Satake, A., R. Imai, T. Fujino, S. Tomimoto, K. Ohta et al., 2023 Somatic mutation rates scale with 501
time not growth rate in long-lived tropical trees. eLife. 502
Schmitt, S., P . Heuret, V . Troispoux, M. Beraud, J. Cazal et al., 2023 Plant mutations: slaying 503
beautiful hypotheses by surprising evidence. bioRxiv 2023.06.05.543657. 504
Schmitt, M. W., S. R. Kennedy, J. J. Salk, E. J. Fox, J. B. Hiatt et al., 2012 Detection of ultra-rare 505
mutations by next-generation sequencing. Proc. Natl. Acad. Sci. U. S. A. 109: 14508–506
14513. 507
Seplyarskiy, V ., E. M. Koch, D. J. Lee, J. S. Lichtman, H. H. Luan et al., 2023 A mutation rate model 508
at the basepair resolution identifies the mutagenic effect of polymerase III transcription. 509
Nat. Genet. 55: 2235–2242. 510
Silva-Correia, J., S. Freitas, R. M. Tavares, T. Lino-Neto, and H. Azevedo, 2014 Phenotypic analysis 511
of the Arabidopsis heat stress response during germination and early seedling 512
development. Plant Methods 10: 7. 513
Sloan, D. B., A. K. Broz, J. Sharbrough, and Z. Wu, 2018 Detecting Rare Mutations and DNA 514
Damage with Sequencing-Based Methods. Trends Biotechnol. 36: 729–740. 515
Staunton, P . M., A. J. Peters, and C. Seoighe, 2023 Somatic mutations inferred from RNA-seq 516
data highlight the contribution of replication timing to mutation rate variation in a model 517
plant. Genetics 225:. 518
Supek, F., and B. Lehner, 2017 Clustered Mutation Signatures Reveal that Error-Prone DNA 519
Repair Targets Mutations to Active Genes. Cell 170: 534-547.e23. 520
Tatsumoto, S., Y . Go, K. Fukuta, H. Noguchi, T. Hayakawa et al., 2017 Direct estimation of de 521
novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome 522
sequencing. Sci. Rep. 7: 13561. 523
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint
Waneka, G., J. M. Svendsen, J. C. Havird, and D. B. Sloan, 2021 Mitochondrial mutations in 524
Caenorhabditis elegans show signatures of oxidative damage and an AT-bias. Genetics 525
219:. 526
Wang, L., A. T. Ho, L. D. Hurst, and S. Yang, 2023 Re-evaluating evidence for adaptive mutation 527
rate variation. Nature 619: E52–E56. 528
Weng, M.-L., C. Becker, J. Hildebrandt, M. Neumann, M. T. Rutter et al., 2019 Fine-Grained 529
Analysis of Spontaneous Mutation Spectrum and Frequency in Arabidopsis thaliana. 530
Genetics 211: 703–714. 531
Wu, Z., G. Waneka, A. K. Broz, C. R. King, and D. B. Sloan, 2020 MSH1 is required for 532
maintenance of the low mutation rates in plant mitochondrial and plastid genomes. 533
Proc. Natl. Acad. Sci. U. S. A. 117: 16448–16455. 534
Zhang, G., 2023 The mutation rate as an evolving trait. Nat. Rev. Genet. 24: 3. 535
Zhang, J., and J.-R. Yang, 2015 Determinants of the rate of protein sequence evolution. Nat. Rev. 536
Genet. 16: 409–420. 537
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted February 1, 2024. ; https://doi.org/10.1101/2024.01.31.578196doi: bioRxiv preprint