Investigating low frequency somatic mutations in Arabidopsis with Duplex Sequencing

doi:10.1101/2024.01.31.578196

2024 · doi:10.1101/2024.01.31.578196

preprint OA: closed CC-BY-NC-ND-4.0

Abstract

Mutations are the source of novel genetic diversity but can also lead to disease and maladaptation. The conventional view is that mutations occur randomly with respect to their environment-specific fitness consequences. However, intragenomic mutation rates can vary dramatically due to transcription-coupled repair and based on local epigenomic modifications, which are non-uniformly distributed across genomes. One sequence feature associated with decreased mutation is higher expression level, which can vary depending on environmental cues. To understand whether the association between expression level and mutation rate creates a systematic relationship with environment-specific fitness effects, we perturbed expression through a heat treatment in Arabidopsis thaliana. We quantified gene expression to identify differentially expressed genes, which we then targeted for mutation detection using Duplex Sequencing. This approach provided a highly accurate measurement of the frequency of rare somatic mutations in vegetative plant tissues, which has been a recent source of uncertainty in plant mutation research. We included mutant lines lacking mismatch repair (MMR) and base excision repair (BER) capabilities to understand how repair mechanisms may drive biased mutation accumulation. We found wild type (WT) and BER mutant mutation frequencies to be very low (mean variant frequency 1.8 × 10⁻⁸ and 2.6 × 10⁻⁸, respectively), while MMR mutant frequencies were significantly elevated (1.13 × 10⁻⁶). These results show that somatic variant frequencies are extremely low in WT plants, indicating that larger datasets will be needed to address the fundamental evolutionary question as to whether environmental change leads to gene-specific changes in mutation rate.
Significance

Accurately measuring mutations in plants grown under different environments is important for understanding the determinants of mutation rate variation across a genome. Given the low rate of de novo mutation in plant germlines, such measurements can take years to obtain, hindering tests of mutation accumulation under varying environmental conditions. We implemented highly accurate Duplex Sequencing to study somatic mutations in plants grown at two different temperatures. In contrast to plants with deficiencies in DNA mismatch repair machinery, we found extremely low mutation frequencies in wild type plants. These findings help resolve recent uncertainties about the somatic mutation rate in plant tissues and indicate that larger datasets will be necessary to understand the interaction between mutation and environment in plant genomes.

bioRxiv preprint doi: https://doi.org/10.1101/2024.01.31.578196; this version posted February 1, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Introduction

Mutations in DNA sequences accumulate over time and produce the variation that allows populations to adapt to novel or changing environments. In this sense, mutation is the ultimate source of evolutionary innovation. At the same time, mutations are often deleterious (Eyre-Walker and Keightley 2007), and somatic mutations can cause disease, setting up an interesting dynamic where selection may favor alleles that lower mutation rates, even though mutational input is required for adaptation and evolution (Zhang 2023).

The textbook view of mutation and adaptation is that mutations occur randomly with respect to their environment-specific fitness consequences. This principle was established in early investigations by Max Delbrück and Salvador Luria, who found that mutations in bacteria that confer phage resistance were equally likely to occur regardless of whether bacteria were grown in the presence of phage (Luria and Delbrück 1943). In other words, a phage-containing environment creates selection for genetic variants responsible for resistance but does not induce mutations to specifically occur at those loci. After subsequent decades of study, mutations are still widely considered to be random in this respect even though both the type and location of mutations are now known to have non-uniform distributions across genomes. For example, transition substitutions are far more common than transversions in most organisms across the tree of life. This bias in the mutation spectrum arises through the simple properties of DNA bases and chemical damage, but it has important consequences for the relationship between fitness effects and the probability of mutations. Due to the structure of the genetic code, transversions are more likely than transitions to be nonsynonymous (i.e. result in amino acid changes) and, therefore, have harmful fitness effects.
As such, the average fitness effect of mutations is lower than it would be if all types of nucleotide substitutions occurred with equal probability (Eyre-Walker and Keightley 2007).

Mutation rates can also vary depending on genomic location. For example, mutational gradients arise in mammalian mitochondrial genomes because regions near replication origins are single-stranded (and more vulnerable to mutation-causing damage) for longer periods during DNA replication (Sanchez-Contreras et al. 2021). Variation in intragenomic mutation rates can also occur at smaller scales, such as in Arabidopsis thaliana, where mutations are enriched in intergenic sequences compared to genes (Ossowski et al. 2010; Belfield et al. 2018; Weng et al. 2019) and in introns compared to exons (Monroe et al. 2022, 2023a; Quiroz et al. 2023; Staunton et al. 2023). Because mutations in coding sequences are more likely to have functional consequences, this biased distribution of mutations should again result in lower average fitness effects than if mutations were uniformly distributed across the genome.

The probability of a mutation, therefore, cannot be considered independent of the fitness consequences of that mutation. However, to challenge the textbook view that mutations occur randomly with respect to environment-specific fitness effects, gene-specific mutational biases would have to systematically vary with changes in the environment.
One potential mechanism that could create such a relationship between environment and mutation bias is the coupling of DNA repair surveillance with transcription machinery, which results in lower mutation rates for highly expressed genes (Supek and Lehner 2017; Oztas et al. 2018; Huang et al. 2018; Huang and Li 2018; Gonzalez-Perez et al. 2019; Monroe et al. 2022). Therefore, environmental changes that increase a gene’s expression level should lower its mutation rate. In addition, highly expressed genes are known to experience stronger selection (Zhang and Yang 2015), so genes may be most protected from mutation in environments where they are most functionally important. Alternatively, transcription may be mutagenic, as increased DNA damage associated with exposure of single-stranded DNA to mutagens can potentially overpower the increased protection of actively transcribed genes (Kim et al. 2007; Jinks-Robertson and Bhagwat 2014; Seplyarskiy et al. 2023).

A challenge associated with addressing how local mutation rates vary with environment is the difficulty of measuring mutations in experimental settings. Historical estimates of mutation relied on comparisons of synonymous substitutions between populations or species. Because these substitutions do not result in a change in amino acid, they are expected to experience minimal selection and thus approximate mutational input, though in reality synonymous sites do experience selection due to codon usage bias (Grantham et al. 1980; Hershberg and Petrov 2008) and other mechanisms (Bailey et al. 2021).
It is inherently difficult to measure mutation rates more directly in large multicellular organisms because their long generations require many individuals and/or large amounts of time for sufficient mutations to occur, making methods such as mutation accumulation lines and parent-offspring trio sequencing (Lynch et al. 2016; Tatsumoto et al. 2017) expensive and time-consuming.

An alternative and potentially complementary approach to mutation accumulation and trio sequencing studies is to detect the mutations that accumulate in an organism’s somatic tissues (Gundry and Vijg 2012; Moore et al. 2021; Monroe et al. 2022; Quiroz et al. 2023; Schmitt et al. 2023; Staunton et al. 2023; Satake et al. 2023; Goel et al. 2024). This approach benefits from the fact that many more cell lineages can be tracked than just the germline. Inclusion of somatic (vegetative) mutations in recent Arabidopsis studies led to the identification of thousands of mutations, which increased power to test for relationships between local mutation rates and various sequence features, such as GC content, DNA methylation, histone modifications and expression level (Monroe et al. 2022). However, this approach appears to have been inaccurate because low frequency somatic variants can be difficult to distinguish from sequencing errors, and reanalysis of the somatic mutation calls showed that many of the putative mutations arose from technical artefacts (Liu and Zhang 2022; Monroe et al. 2023a; Wang et al. 2023; Monroe et al. 2023b).
Therefore, the actual frequency of somatic mutations in vegetative plant tissue remains an open question.

Measurements of low frequency somatic mutations can be obtained using a high-fidelity sequencing technology to distinguish mutational signal from noise (Sloan et al. 2018). For example, Duplex Sequencing is an Illumina-based method in which unique molecular identifiers (UMIs) are included in adaptors and attached to both ends of DNA fragments before library amplification (Schmitt et al. 2012; Kennedy et al. 2014). After sequencing, the UMIs are used to cluster families of reads that originated from each strand of a given DNA fragment so that a double-stranded consensus sequence can be created that is virtually error free (< 5 × 10⁻⁸ errors per base pair; Kennedy et al. 2014).

Our goal in this study was to test if the pattern of local mutation rate variation across a genome depends on environmental effects on gene expression levels. We also wanted to determine whether low-frequency somatic mutations in plant tissues could provide a robust signal for addressing this type of question. Therefore, we perturbed gene expression by growing Arabidopsis under different temperatures. We identified differentially expressed (DE) genes with RNA-seq, which we then targeted for low-frequency somatic mutation detection using Duplex Sequencing coupled with hybrid capture.
We included mutant lines msh2 and ung, which respectively lack mismatch repair (MMR) and base excision repair (BER) capabilities, in order to understand how repair mechanisms may drive biased mutation accumulation (Cordoba-Canero et al. 2010; Belfield et al. 2018). We also included hsp70-16 mutant lines, which are deficient for a key heat shock protein, as a means to endogenously manipulate gene expression and potentially interact with our temperature treatment (Ran et al. 2020). As expected, we found significant increases in variant frequencies in the MMR-deficient lines. In wild type (WT) lines and other mutant lines, measured mutation frequencies were too low to quantify relationships between mutation rates and environment-specific gene expression levels. Therefore, our results support the conclusion that earlier estimates of somatic variant frequencies were inflated (Monroe et al. 2023a; Wang et al. 2023) and indicate that much larger datasets will be needed to test for environment-specific changes in mutation biases.

Results

To test if environment-specific changes in gene expression impact mutation, we performed mutation detection on a targeted set of Arabidopsis genes that were DE in plants grown at 20°C vs. 30°C. We first generated and analyzed RNA-seq data to identify genes in six categories: 1) increased expression at 30°C compared to 20°C in WT plants, 2) increased expression at 20°C compared to 30°C in WT plants, 3) constitutively high expression in WT plants at both 20°C and 30°C, 4) constitutively low expression in WT plants at both 20°C and 30°C, 5) genes that had increased expression at 30°C vs. 20°C in WT plants (like category 1) and also had an interaction between WT and hsp70-16, and 6) genes that had increased expression at 20°C vs. 30°C in WT plants (like category 2) and also had an interaction between WT and hsp70-16 (Table S1). The sequences of the DE genes were used to create a custom probe-set for hybrid capture of Duplex Sequencing libraries.

Duplex Sequencing coverage of the genes and 250 bp of flanking sequence in the probe-set ranged from 74.7× to 109.4× (Figure S1), and the average probe-set coverage across all libraries was 193.1-fold higher than the genome background. In total, we obtained 1.89 Gb of Duplex Sequencing coverage of our region of interest across the 24 libraries (Table S2).

We then looked for the presence of single nucleotide variants (SNVs) and short indels within the 339 genes covered in the probe-set.
Mutant alleles already present in the parents of the assayed sets of full-sib plants have the potential to bias estimates of de novo mutation frequencies but should be readily identifiable. For a homozygous parent, they would be present in all Duplex Sequencing reads of all the replicates of a given genotype. For a heterozygous parent, they would segregate in a 1:2:1 Mendelian ratio and account for roughly 50% of the reads for all replicates of a given genotype (as each replicate represents a pool of five sibling plants). We identified just three apparent fixed SNVs (Table S3), which were removed for downstream analyses. In contrast, we identified 41 fixed indels, over half of which were in the msh2 background (Table S4). One gene (AT5G39190) had five sites that appeared to be segregating SNVs in all 24 replicates. We suspected this might be caused by a cryptic gene duplication which was not captured in the TAIR10.2 reference genome (Jaegle et al. 2023). Indeed, when we realigned the reads to the improved Col-CC genome (Reiser et al. 2023), the mutation calls in AT5G39190 were absent. As such, reads mapping to AT5G39190 were disregarded in downstream analyses. The rest of the SNVs we identified were unique to each replicate and all were present at a frequency of no more than 17.64% (the average variant frequency across all mutations was 2.27%), suggesting that these are low frequency somatic variants that arose during the experiment and were present in a subset of the sampled vegetative tissue.

Among the six WT biological replicates, we detected a single indel and just six SNVs, one in each replicate (Figure 1). As such, there was very limited statistical power to test for the effects of temperature or expression level on mutation frequency in WT plants. Similarly, we detected few or no SNVs and indels in the hsp70-16 and the ung mutant lines (Figure 1; File S1, S2).
In contrast, variant frequencies were significantly elevated in the msh2 mutant lines (compared to WT plants), where we detected 271 indels and 180 SNVs (Figure 1; two-way ANOVA with Tukey’s test, p < 0.0001). The mutations in the msh2 lines were distributed relatively evenly across the temperature treatments, as we found that temperature did not influence either SNV or indel frequency (Figure 1; two-way ANOVA, p = 0.99). In the msh2 lines, deletions were 8.5-fold more common than insertions (Table S5; two-way ANOVA, p < 0.0001). We observed significant differences among SNV classes in the msh2 SNV spectrum (Figure 2; two-way ANOVA, p < 0.0001), which was dominated by CG→TA transitions. The next most common types of substitutions were AT→GC transitions and CG→AT transversions. We compared the msh2 mutation frequencies in the constitutively lowly expressed (group 4 in Table S1) vs. constitutively highly expressed (group 3 in Table S1) genes and found no significant differences (paired t-test; Table S6), though we did observe a trend towards higher indel frequencies in constitutively highly expressed genes at 30°C. We did not analyze the SNV spectra or indel bias in WT, ung, or hsp70-16 lines because the small number of sampled mutations precluded a statistically meaningful comparison.
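The parental-variant filter described in this section can be sketched in Python. This is a minimal illustration and not the authors' pipeline: the function name, the input layout, and the frequency cutoffs (0.9 for fixed, 0.3 to 0.7 for segregating) are assumptions chosen only to mirror the stated expectations of roughly 100% and roughly 50% of reads across all replicates of a genotype.

```python
def classify_variant(freqs_by_replicate, fixed_min=0.9, seg_range=(0.3, 0.7)):
    """Classify one variant site from its frequency in each replicate of a genotype.

    freqs_by_replicate: variant allele frequencies (0..1), one per replicate;
    0.0 means the variant was not observed in that replicate.
    The cutoffs are illustrative assumptions, not values from the study.
    """
    if not freqs_by_replicate:
        raise ValueError("need at least one replicate")
    # Inherited from a homozygous parent: present in (nearly) all reads everywhere.
    if all(f >= fixed_min for f in freqs_by_replicate):
        return "fixed (homozygous parent)"
    # Inherited from a heterozygous parent: about half the reads in every pool.
    low, high = seg_range
    if all(low <= f <= high for f in freqs_by_replicate):
        return "segregating (heterozygous parent)"
    # Seen in exactly one replicate: candidate low-frequency somatic variant.
    observed = [f for f in freqs_by_replicate if f > 0]
    if len(observed) == 1:
        return "candidate somatic (unique to one replicate)"
    return "unclassified"


print(classify_variant([0.0227, 0.0, 0.0]))
```

Sites that fall into none of the three patterns are left unclassified here, since the text does not say how such cases were handled.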

Discussion

In this study we took a novel approach to studying plant mutation by utilizing high-fidelity Duplex Sequencing to measure low-frequency somatic variants in a targeted region of the A. thaliana nuclear genome. Variants in unopened floral bud tissue of WT plants were present at very low frequencies (Figure 1), which were near the detection threshold of Duplex Sequencing (Kennedy et al. 2014; Wu et al. 2020). Although we did not have enough power to address our prediction that increases in gene expression would correlate with decreases in mutation rates in WT plants, the results are nonetheless of interest given recent debates about the frequency of somatic mutations in plant tissues (Monroe et al. 2022; Liu and Zhang 2022; Monroe et al. 2023a; Wang et al. 2023; Monroe et al. 2023b). Our results support the conclusion that the high error rate of Illumina short-read sequencing makes it difficult to reliably discern sequencing errors from extremely rare WT somatic mutations. That said, we are skeptical of directly comparing the variant frequencies we measured in unopened floral buds with those obtained in differentiated leaves (Monroe et al. 2022, 2023a) given recent evidence showing substantial variation in somatic mutation rates depending on plant tissue (Goel et al. 2024).

We also surveyed variant frequencies in ung mutant plants and did not observe a difference between WT and ung lines. Given that ung plants have previously been shown to accumulate more uracil in DNA (presumably due to the loss of base-excision repair activity on deaminated cytosines) than WT plants (Cordoba-Canero et al. 2010), we interpret the lack of a difference between WT and ung lines as evidence that actual WT mutation frequencies may be below the detection threshold of Duplex Sequencing. However, it is also possible that the similarly low mutation rates in WT and ung reflect the lack of a true biological difference, which may be possible if redundant pathways exist that prevent uracils in DNA from becoming CG→TA transitions.

In contrast, we found significantly elevated variant frequencies in msh2 mutants compared to WT lines (Figure 1). MSH2 is known to function in mismatch repair (MMR), and mutation accumulation experiments with msh2 mutant lines have established that the germline SNV rate is 132- to 204-fold greater than the WT SNV rate (Ossowski et al. 2010; Jiang et al. 2014; Belfield et al.
2018). Here, we found that the average msh2 SNV frequency was 27-fold greater than the average WT SNV frequency (Figure 1). Though somatic variant frequencies measured with Duplex Sequencing are not directly comparable to germline mutation rates assayed with mutation accumulation experiments, the smaller magnitude of the difference between msh2 vs. WT in our dataset may be interpreted as further evidence that the actual WT variant frequency is beneath the detection threshold of Duplex Sequencing. Alternatively, the smaller difference between WT and msh2 reported here could be evidence that MMR is particularly important for buffering against mutation in germline plant tissues, which is supported by elevated expression of MSH2 and other mismatch repair genes in meristematic tissues (Klepikova et al. 2016).

Variant frequencies in the msh2 mutant lines showed no significant difference in plants grown at 20°C vs. 30°C. This finding contrasts with a recent mutation accumulation study that found elevated germline mutation rates in WT plants grown at 29°C compared to those grown at 23°C (Belfield et al. 2021) and another study that documented increases at 28°C and 32°C compared to 23°C (Lu et al. 2021). One potential explanation of this result is that heat stress may be mutagenic in WT plants because it impairs MMR, since in the absence of MMR there is no apparent heat effect.
However, this interpretation would be at odds with the fact that the genome-wide distribution of mutations in the heat-stressed plants mirrors the distribution of WT plants grown at standard temperature, not of mismatch repair mutants (see Figure 3 of Belfield et al. 2021). The Duplex Sequencing variant frequencies in the msh2 mutant lines also did not vary significantly between lowly expressed vs. highly expressed genes at either 20°C or 30°C (Figure 1). This result is consistent with the model that MMR provides special protection to actively transcribed genes (Belfield et al. 2018; Huang et al. 2018; Huang and Li 2018). However, we present this interpretation cautiously in the absence of WT data to test for an impact of expression when MMR is functional.

In summary, we took a novel approach to studying plant mutations by using Duplex Sequencing and hybrid capture to obtain a highly accurate snapshot of somatic variants in targeted regions of the A. thaliana genome. We designed our experiment to test if environmental conditions alter mutation rates in a gene-specific fashion. However, the low rate of mutations in WT plants prevented testing for how expression levels impact mutation rates. Nonetheless, the link between increased expression and decreased mutation in plants is well documented (Oztas et al. 2018; Monroe et al. 2022; Quiroz et al. 2023), as is the fact that gene expression is environmentally determined (Richards et al. 2012), so by logical extension environmental conditions must drive mutation rates and related fitness consequences. However, whether the magnitude of such an effect is biologically meaningful in shaping mutation and evolution remains an important, unanswered question.
Though mutation accumulation and parent-offspring sequencing are time- and resource-intensive experiments, they are both increasingly feasible due to continued declines in the cost of DNA sequencing (Ossowski et al. 2010; Weng et al. 2019; Monroe et al. 2022). Conducting such experiments under contrasting environments (Jiang et al. 2014; Belfield et al. 2021; Lu et al. 2021) to measure the correlation between expression and mutation seems to be the key to understanding how environments impact the types of mutations that organisms accumulate.

Materials and methods

All plants were grown in environmentally controlled growth chambers (75% humidity) under a long-day photoperiod (16 hrs light, 8 hrs dark) with irradiance of 185 µmol m⁻² sec⁻¹ at constant temperatures (either 20°C or 30°C, as specified below). Prior to planting, seeds were stratified for 5 days in sterile ddH2O. Arabidopsis thaliana ecotype Col-0 was used as the WT line. Existing mutant lines were obtained from the Arabidopsis Biological Resource Center (Table S7) and seedlings were screened with allele-specific PCR markers to identify plants that were homozygous for the mutant alleles used in this study (msh2, ung, hsp70-16; Table S8).

Sibling plants (roughly 35 for each genotype and each temperature treatment) were planted in 2.5-inch pots. Both temperature treatments were initiated in chambers (Conviron models PGR15 (20°C) and PGCFLEX (30°C)) at 20°C because elevated ambient temperatures (30°C) can inhibit seed germination (Silva-Correia et al. 2014). After 5 days, the temperature was turned up for the 30°C treatment and kept at 20°C for the other treatment. When the plants had reached stage 6.5 of development (where ~50% of flowers have opened) (Boyes et al. 2001), we performed DNA and RNA extractions on unopened floral buds from laterally branching florets. The 30°C plants reached developmental stage 6.5 at 31 days while the 20°C plants reached developmental stage 6.5 at 41 days, consistent with faster plant development at elevated ambient temperatures (Silva-Correia et al. 2014).
For the RNA extractions, plant material was collected from the unopened floral buds of 3 laterally branching florets from 3 WT and 3 hsp70-16 plants in each temperature treatment. The harvested tissues were immediately placed into liquid nitrogen and homogenized for 10 seconds at 30 beats/sec with the Qiagen TissueLyser, before being processed with the Qiagen RNeasy Plant Mini Kit, according to manufacturer’s instructions. The RNA samples were then sent to Novogene and RNA-Seq libraries were made using the NEBNext Ultra II Directional RNA Library Prep Kit with the NEBNext Poly(A) mRNA Magnetic Isolation Module. The RNA-Seq libraries were sequenced on a NovaSeq 6000 using the PE150 strategy to generate 29 to 54 million read pairs per library (see Table S9).

Tissue was harvested for DNA sequencing and mutation detection at the same time as the tissue for RNA extraction, from siblings of the plants used for RNA extraction. For each replicate in the DNA extractions, plant material was pooled from the unopened floral buds of 3 laterally branching florets from 5 sibling plants, with 3 replicates per genotype (WT, hsp70-16, msh2, ung) per temperature treatment. The floret tissue was homogenized for 10 seconds at 30 beats/sec with the Qiagen TissueLyser, before being processed with the DNeasy Plant Mini Kit from Qiagen.

The RNA-seq reads were analyzed to detect DE genes at 20°C vs. 30°C. First, the adaptors were removed with Cutadapt version 4.0 with Python 3.9.16 (Martin 2011).
Then the reads were mapped to the TAIR10.2 reference genome with HISAT2 (version 2.2.1; Kim et al. 2019). Read counts were generated with HTSeq-count version 2.0.2 (Anders et al. 2014), and DESeq2 models (Love et al. 2014) were implemented to identify genes that were differentially expressed or constitutively highly or lowly expressed.

We created a custom probe-set to enrich the sequences of DE genes via hybrid capture so that we could perform mutation detection with Duplex Sequencing. We sent the sequences of 400 DE genes (plus 250 nt of flanking sequence on the end of each gene) to the probe design team at Arbor Biosciences, which flagged 61 of the genes as unsuitable for hybrid capture because they were > 25% soft-masked for repeats in a BLAST search against the Arbor Biosciences eudicot database. The remaining 339 genes (listed in supplementary file 2) and flanking sequences spanned a total length of 855,123 nt. Sets of 80-nt probes were 2× tiled across the target sequence at approximately every 40 nt. The probes were biotinylated so that probe-bound library molecules can be captured with streptavidin-coated magnetic beads.

We created Duplex Sequencing libraries from the 24 DNA samples (3 replicates × 4 genotypes × 2 temperature treatments), following our previously described library preparation protocols (Wu et al. 2020; Waneka et al. 2021), except that in this case the amount of input DNA was increased to 500 ng because the target sequence comprises a small fraction (< 1%) of the total-cellular DNA sample. Once DNA samples had been fragmented via ultrasonication, end-repaired, A-tailed, adaptor-ligated, and treated with a cocktail of damage removal enzymes (Wu et al. 2020), we amplified 0.73 ng of DNA (per reaction) for 13 PCR cycles with New England Biolabs Q5 High-Fidelity Polymerase and dual-indexed primers.
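As a rough arithmetic check on the probe design described above and on the coverage figures reported in the Results, the numbers in the text can be combined in a few lines of Python. The function names are hypothetical, and the probe count treats the target as one contiguous block, which is a simplifying assumption (the real target is 339 separate gene regions, so the true count will differ somewhat).

```python
TARGET_NT = 855_123          # total target length (genes plus flanks), from the text
PROBE_NT = 80                # probe length, from the text
STEP_NT = 40                 # approximate spacing between probe starts, from the text
TOTAL_DUPLEX_BASES = 1.89e9  # duplex coverage of the target across all libraries
N_LIBRARIES = 24             # 3 replicates x 4 genotypes x 2 temperatures


def tiling_depth(probe_nt, step_nt):
    """Average number of probes overlapping a base away from region edges."""
    return probe_nt / step_nt


def approx_probe_count(target_nt, step_nt):
    """Approximate probe count, assuming one contiguous target (simplification)."""
    return target_nt // step_nt


def mean_duplex_depth(total_bases, n_libraries, target_nt):
    """Mean duplex depth per library over the target region."""
    return total_bases / (n_libraries * target_nt)


print(tiling_depth(PROBE_NT, STEP_NT))
print(approx_probe_count(TARGET_NT, STEP_NT))
print(round(mean_duplex_depth(TOTAL_DUPLEX_BASES, N_LIBRARIES, TARGET_NT), 1))
```

Under these assumptions, 80-nt probes spaced every 40 nt give 2× tiling, and 1.89 Gb spread over 24 libraries and 855,123 nt works out to a mean depth of about 92×, which sits inside the 74.7× to 109.4× per-library range reported in the Results.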
We then created 3 pools by combining 350 ng of each amplified library, as the Arbor Biosciences hybrid-capture reactions have enough capacity for 8 libraries in each pool. We performed the overnight hybrid-capture reaction at 65°C, according to the manufacturer’s instructions (Arbor Biosciences MyBaits Kit Manual v. 5.02). We assessed enrichment efficiency and library concentrations through qPCR (as previously described; Waneka et al. 2021) before amplifying the enriched pools for an additional 9 cycles to obtain sufficient library amounts for sequencing.

Duplex Sequencing libraries were sequenced with PE150 reads on an Illumina NovaSeq 6000 S4 lane (Novogene) to generate 87 to 123 million read pairs per library (Table S10). Processing of the Duplex Sequencing reads was performed with our previously described pipeline (Wu et al. 2020), which trimmed adaptor sequences, created duplex consensus sequences based on the presence of shared barcodes, and mapped the consensus sequences to the entire TAIR10.2 reference genome. Each duplex consensus sequence is composed of at least 6 Illumina reads (at least 3 originating from each strand of a DNA fragment). Alignment files were then parsed to identify duplex consensus sequences that contain SNVs and short indels. Since Duplex Sequencing is highly accurate (< 5 × 10⁻⁸ errors per base pair; Kennedy et al. 2014), we required just a single duplex consensus sequence to support a putative mutation. Comparisons of coverage in the probe-set vs. outside the probe-set were performed with Samtools version 1.6 (Li et al. 2009).
For variant frequency calculations, we excluded the first and last 10 bp of each read because we have previously identified elevated mutation frequencies at read ends (Wu et al. 2020).

DATA AVAILABILITY

The raw reads are available via the NCBI Sequence Read Archive under accessions SRR27564102-SRR27564113 (RNA-seq libraries) and SRR27693810-SRR27693833 (Duplex Sequencing libraries). Duplex Sequencing datasets were processed with a previously published pipeline (https://github.com/dbsloan/duplexseq) (Wu et al. 2020).
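The two filtering rules stated in the methods (a duplex consensus needs at least 3 reads from each strand, and the first and last 10 bp are excluded from variant frequency calculations) can be sketched in Python as follows. This is only an illustration, not the published pipeline at https://github.com/dbsloan/duplexseq. The `DuplexFamily` record, the function names, and the choice to apply the end mask in consensus coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass, field

MIN_READS_PER_STRAND = 3  # from the text: at least 3 reads from each strand
END_MASK = 10             # from the text: first and last 10 bp are excluded


@dataclass
class DuplexFamily:
    """Hypothetical record for one barcode family (not the pipeline's data model)."""
    top_strand_reads: int
    bottom_strand_reads: int
    consensus_length: int
    variant_offsets: list = field(default_factory=list)  # 0-based mismatch offsets


def passes_family_threshold(fam):
    """True if both strands contribute enough reads to build a duplex consensus."""
    return (fam.top_strand_reads >= MIN_READS_PER_STRAND
            and fam.bottom_strand_reads >= MIN_READS_PER_STRAND)


def callable_bases(fam):
    """Number of consensus positions left after masking both ends."""
    return max(0, fam.consensus_length - 2 * END_MASK)


def retained_variants(fam):
    """Variant offsets that fall outside the masked ends."""
    lo, hi = END_MASK, fam.consensus_length - END_MASK
    return [p for p in fam.variant_offsets if lo <= p < hi]


def variant_frequency(families):
    """Events divided by coverage, using only qualifying families and unmasked bases."""
    kept = [f for f in families if passes_family_threshold(f)]
    events = sum(len(retained_variants(f)) for f in kept)
    coverage = sum(callable_bases(f) for f in kept)
    return events / coverage if coverage else float("nan")
```

In this sketch a family with only 2 reads on one strand contributes neither events nor coverage, and a mismatch at offset 5 of a 150-bp consensus is dropped because it lies inside the masked end. The real pipeline works on paired reads and alignments, so its bookkeeping will differ.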

Acknowledgements

This work was supported by a grant from the National Institutes of Health (R35 GM148134).

FIGURES

Figure 1. Mutation frequencies in WT vs. mutant lines at 20°C and 30°C. Log10 mutation frequencies for single nucleotide variants (SNVs) and insertions/deletions (INDELs), calculated as the number of events (SNVs or INDELs) divided by the duplex sequencing coverage of the probe-set. A floor of 2.5 × 10⁻⁸ was applied to the y-axis for data visualization. P-values are from a Tukey’s test on a two-way ANOVA performed in R with the emmeans package (version 1; Lenth et al. 2021).

Figure 2. Mutation spectrum for WT and mutant plants at 20°C and 30°C. Log10 mutation frequencies for different types of single nucleotide variants were calculated as the number of events divided by the nucleotide-specific duplex sequencing coverage of the probe-set. A floor of 2.5 × 10⁻⁸ was applied to the y-axis for data visualization.
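The quantity plotted in Figures 1 and 2 (events divided by duplex coverage, shown on a log10 scale with a floor of 2.5 × 10⁻⁸) can be written as a small Python helper. This is a minimal sketch of the arithmetic in the captions; the function name and the decision to clamp values to the floor before taking the logarithm are assumptions, and the authors produced their figures in R.

```python
import math

FLOOR = 2.5e-8  # plotting floor stated in the figure captions


def log10_frequency(events, coverage_bp, floor=FLOOR):
    """Return log10(events / coverage), clamped from below at the plotting floor."""
    if coverage_bp <= 0:
        raise ValueError("coverage must be positive")
    freq = events / coverage_bp
    # Zero or very small frequencies are raised to the floor so log10 is defined.
    return math.log10(max(freq, floor))
```

With this helper, a replicate with zero observed events is plotted at the floor rather than at negative infinity. At a coverage of roughly 80 million bp, a single event (1.25 × 10⁻⁸) would also fall below the floor under this clamping rule.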
SUPPLEMENTARY FIGURES

Figure S1. Duplex Sequencing coverage of the probe-set (panel 1), the 250 bp flanking the probe-set (panel 2), and the rest of the genome, outside of the probe-set (panel 3).

SUPPLEMENTARY TABLES

Table S1. Differentially expressed genes from the RNA-seq analysis identified with DESeq2

| Category | Genotype | Comparison | p-value | Log fold change | Average normalized coverage of each treatment | Number of genes | Included in probe-set | Genes retained after Arbor repeat filtering |
|---|---|---|---|---|---|---|---|---|
| 1 | WT | Increased exp. at 30°C | 0.05 | > 2 | Minimum coverage > 5 | 683 | 100 with greatest LFC | 84 |
| 2 | WT | Increased exp. at 20°C | 0.05 | | 5 | 350 | 100 with lowest LFC | 80 |
| 3 | WT | Constitutive low exp. | 0.05 | 50 genes with LFC closest to 0 | 50 genes with lowest coverage (ranges from 129 to 400) | 50 | 50 | 44 |
| 4 | WT | Constitutive high exp. | 0.05 | 50 genes with LFC closest to 0 | 50 genes with highest coverage (ranges from 8384 to 68053) | 50 | 50 | 45 |
| 5 | WT vs. HSP70-16 | Interaction between genotype and temp | 0.05 | > 2 | Minimum coverage > 5 | 106 (39 of which are also in group 1) | 92 with highest LFC | 81 |
| 6 | WT vs. HSP70-16 | Interaction between genotype and temp | 0.05 | | 5 | 8 (5 of which are also in group 2) | All 8 | 5 |
| total | | | | | | | 400 | 339 |

Table S2. Duplex Sequencing coverage for each replicate

| Sample | Mean Depth of Coverage | Total Duplex Seq. Data (bp) |
|---|---|---|
| WT 20°C A | 86.86 | 74273348 |
| WT 20°C B | 92.16 | 78809954 |
| WT 20°C C | 82.40 | 70459706 |
| WT 30°C A | 81.46 | 69660673 |
| WT 30°C B | 95.39 | 81571700 |
| WT 30°C C | 93.77 | 80187868 |
| HSP70-16 20°C A | 82.31 | 70384149 |
| HSP70-16 20°C B | 74.75 | 63917524 |
| HSP70-16 20°C C | 93.94 | 80328860 |
| HSP70-16 30°C A | 93.65 | 80085644 |
| HSP70-16 30°C B | 81.50 | 69690981 |
| HSP70-16 30°C C | 98.70 | 84396810 |
| MSH2 20°C A | 105.53 | 90244630 |
| MSH2 20°C B | 95.50 | 81667422 |
| MSH2 20°C C | 107.69 | 92087225 |
| MSH2 30°C A | 95.50 | 81666433 |
| MSH2 30°C B | 87.40 | 74739952 |
| MSH2 30°C C | 93.40 | 79871709 |
| UNG 20°C A | 98.30 | 84059203 |
| UNG 20°C B | 93.33 | 79804898 |
| UNG 20°C C | 75.23 | 64327096 |
| UNG 30°C A | 109.44 | 93588299 |
| UNG 30°C B | 93.79 | 80203757 |
| UNG 30°C C | 106.23 | 90842455 |

Table S3. Putative fixed SNVs removed before downstream analysis of Duplex Sequencing data

| Genotype | Chromosome | Position | Substitution type | Shared among all replicates |
|---|---|---|---|---|
| ung | 2 | 2016156 | AT→GC | yes |
| wild-type | 2 | 14827204 | CG→AT | yes |
| msh2 | 4 | 14827204 | CG→AT | yes |

Table S4. Putative fixed indels removed before downstream analysis of Duplex Sequencing data

| Chrom | Pos | Indel Type | Genotype | Number of Reps (of 6) | Indel Length | Indel Seq |
|---|---|---|---|---|---|---|
| Chrom1 | 2243387 | I | MSH2 | 6 | 1 | G |
| Chrom1 | 2243387 | I | WT | 6 | 1 | G |
| Chrom1 | 2243387 | I | UNG | 6 | 1 | G |
| Chrom1 | 2243387 | I | HSP70 | 6 | 1 | G |
| Chrom1 | 2269740 | D | MSH2 | 6 | 1 | A |
| Chrom1 | 2270545 | D | MSH2 | 5 | 1 | T |
| Chrom1 | 2437835 | D | MSH2 | 5 | 1 | T |
| Chrom1 | 5291180 | D | MSH2 | 6 | 1 | T |
| Chrom1 | 6591532 | I | MSH2 | 6 | 1 | A |
| Chrom1 | 6591532 | I | WT | 6 | 1 | A |
| Chrom1 | 6591532 | I | UNG | 6 | 1 | A |
| Chrom1 | 6591532 | I | HSP70 | 6 | 1 | A |
| Chrom1 | 8551177 | I | MSH2 | 6 | 1 | G |
| Chrom1 | 8551177 | I | WT | 6 | 1 | G |
| Chrom1 | 8551177 | I | UNG | 6 | 1 | G |
| Chrom1 | 8551177 | I | HSP70 | 6 | 1 | G |
| Chrom1 | 11646952 | D | MSH2 | 6 | 1 | T |
| Chrom1 | 13533273 | I | MSH2 | 6 | 3 | AGA |
| Chrom1 | 13533273 | I | WT | 6 | 3 | AGA |
| Chrom1 | 13533273 | I | UNG | 6 | 3 | AGA |
| Chrom1 | 13533273 | I | HSP70 | 6 | 3 | AGA |
| Chrom1 | 17886514 | D | MSH2 | 6 | 1 | A |
| Chrom1 | 23734915 | D | MSH2 | 6 | 1 | A |
| Chrom1 | 26640491 | D | MSH2 | 6 | 1 | A |
| Chrom2 | 11236090 | D | MSH2 | 4 | 1 | A |
| Chrom2 | 11567248 | I | MSH2 | 4 | 1 | T |
| Chrom2 | 11567248 | I | WT | 6 | 1 | T |
| Chrom2 | 11567248 | I | UNG | 6 | 1 | T |
| Chrom2 | 11567248 | I | HSP70 | 6 | 1 | T |
| Chrom2 | 17464171 | D | MSH2 | 6 | 1 | T |
| Chrom3 | 4833763 | D | MSH2 | 6 | 1 | A |
| Chrom3 | 8412456 | D | MSH2 | 4 | 1 | T |
| Chrom3 | 18338647 | D | MSH2 | 6 | 1 | T |
| Chrom4 | 13742764 | D | MSH2 | 6 | 1 | T |
| Chrom4 | 16470637 | I | MSH2 | 6 | 1 | T |
| Chrom4 | 16470637 | I | WT | 6 | 1 | T |
| Chrom4 | 16470637 | I | UNG | 6 | 1 | T |
| Chrom4 | 16470637 | I | HSP70 | 6 | 1 | T |
| Chrom5 | 2974730 | D | MSH2 | 4 | 1 | T |
| Chrom5 | 7718829 | D | MSH2 | 6 | 1 | T |
| Chrom5 | 25010019 | D | MSH2 | 6 | 1 | A |
Table S5. Indel mutations in msh2 mutant lines

| Sample | Deletions | Insertions |
|---|---|---|
| MSH2 20°C A | 44 | 7 |
| MSH2 20°C B | 33 | 2 |
| MSH2 20°C C | 33 | 5 |
| MSH2 30°C A | 47 | 4 |
| MSH2 30°C B | 43 | 5 |
| MSH2 30°C C | 47 | 6 |
| total | 247 | 29 |

Table S6. Paired t-test results of group 3 vs. group 4 mutation rates in msh2 mutant lines (two-tailed)

| Temp | Mutation class | Group 3 ave. variant frequency | Group 4 ave. variant frequency | P value |
|---|---|---|---|---|
| 20°C | SNV | 1.02 × 10⁻⁷ | 1.04 × 10⁻⁷ | 0.9771 |
| 30°C | SNV | 7.25 × 10⁻⁸ | 9.47 × 10⁻⁸ | 0.6815 |
| 20°C | INDEL | 1.19 × 10⁻⁷ | 1.38 × 10⁻⁷ | 0.1615 |
| 30°C | INDEL | 1.17 × 10⁻⁷ | 1.72 × 10⁻⁷ | 0.0695 |

Table S7. Mutant lines used, all sourced from ABRC

| Gene | AGI | Mutant Allele | Ref |
|---|---|---|---|
| HSP70-16 | AT1G11660 | SALK_028829 | (Ran et al. 2020) |
| MSH2 | AT3G18524 | SALK_002708 | (Belfield et al. 2018) |
| UNG | AT3G18630 | CS308297 | (Cordoba-Canero et al. 2010) |

Table S8. PCR primers used to identify mutant alleles in the three mutant lines

| Gene/line | Fwd Primer | Rev Primer |
|---|---|---|
| HSP70-16 WT | TACGCACTCACTTGCATTCAC | TGTGTTATCGCAGTTGCAAAG |
| HSP70-16 Mut | ATTTTGCCGATTTCGGAAC | TGTGTTATCGCAGTTGCAAAG |
| MSH2 WT | TCACCACGATGATGTCAAGAG | AGGAGCTGTCAAAAGGAGCTC |
| MSH2 Mut | ATTTTGCCGATTTCGGAAC | AGGAGCTGTCAAAAGGAGCTC |
| UNG WT | ACTTGGAGAAGGTAAAGCAATTCA | CCATACAAAATATAATACACCACCACTC |
| UNG Mut | ACTTGGAGAAGGTAAAGCAATTCA | ATATTGACCATCATACTCATTGC |

Table S9. Read counts for the 12 RNA-seq libraries

| Sample | Count of read pairs |
|---|---|
| HSP70-16 20°C A | 29689895 |
| HSP70-16 20°C B | 32052311 |
| HSP70-16 20°C C | 33450418 |
| HSP70-16 30°C A | 32567642 |
| HSP70-16 30°C B | 31456737 |
| HSP70-16 30°C C | 29678098 |
| WT 20°C A | 30417658 |
| WT 20°C B | 54410188 |
| WT 20°C C | 42449872 |
| WT 30°C A | 34353207 |
| WT 30°C B | 36605678 |
| WT 30°C C | 37953073 |

Table S10. Read counts for the 24 Duplex Sequencing libraries

| Sample | Count of read pairs |
|---|---|
| HSP70-16 20°C A | 102214316 |
| HSP70-16 20°C B | 88105828 |
| HSP70-16 20°C C | 106355604 |
| HSP70-16 30°C A | 88061502 |
| HSP70-16 30°C B | 99506728 |
| HSP70-16 30°C C | 112263590 |
| MSH2 20°C A | 106838516 |
| MSH2 20°C B | 90724220 |
| MSH2 20°C C | 111544972 |
| MSH2 30°C A | 115206890 |
| MSH2 30°C B | 93741162 |
| MSH2 30°C C | 111444292 |
| UNG 20°C A | 113380236 |
| UNG 20°C B | 110455064 |
| UNG 20°C C | 108883106 |
| UNG 30°C A | 91537708 |
| UNG 30°C B | 87766824 |
| UNG 30°C C | 123532620 |
| WT 20°C A | 100905496 |
| WT 20°C B | 102443086 |
| WT 20°C C | 116973524 |
| WT 30°C A | 97650342 |
| WT 30°C B | 105779540 |
| WT 30°C C | 110474398 |
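To show how the supplementary tables relate to the frequency definition used in the figure captions (events divided by duplex sequencing coverage), the Python sketch below combines the msh2 indel counts from Table S5 with the total duplex coverage from Table S2. Treating the Table S2 totals as the denominator is an assumption made for this example, and the variable and function names are hypothetical. The output is illustrative arithmetic and may not match the frequencies reported in the paper, which depend on the authors' own filtering.

```python
# (deletions, insertions) from Table S5 and total duplex coverage in bp from Table S2.
MSH2_SAMPLES = {
    "MSH2 20°C A": (44, 7, 90_244_630),
    "MSH2 20°C B": (33, 2, 81_667_422),
    "MSH2 20°C C": (33, 5, 92_087_225),
    "MSH2 30°C A": (47, 4, 81_666_433),
    "MSH2 30°C B": (43, 5, 74_739_952),
    "MSH2 30°C C": (47, 6, 79_871_709),
}


def indel_frequency(deletions, insertions, coverage_bp):
    """Indel events per sequenced base pair for one replicate."""
    return (deletions + insertions) / coverage_bp


def summarize(samples):
    """Map each sample name to its indel frequency."""
    return {
        name: indel_frequency(dels, ins, cov)
        for name, (dels, ins, cov) in samples.items()
    }


if __name__ == "__main__":
    for name, freq in summarize(MSH2_SAMPLES).items():
        print(f"{name}: {freq:.2e} indels per bp")
```

The column sums of the dictionary reproduce the totals in Table S5 (247 deletions and 29 insertions), which is a quick way to check that the values were transcribed correctly.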

References

Anders, S., P. T. Pyl, and W. Huber, 2014 HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31: 166–169.
Bailey, S. F., L. A. Alonso Morales, and R. Kassen, 2021 Effects of Synonymous Mutations beyond Codon Bias: The Evidence for Adaptive Synonymous Substitutions from Microbial Evolution Experiments. Genome Biol. Evol. 13:.
Belfield, E. J., C. Brown, Z. J. Ding, L. Chapman, M. Luo et al., 2021 Thermal stress accelerates Arabidopsis thaliana mutation rate. Genome Res. 31: 40–50.
Belfield, E. J., Z. J. Ding, F. J. C. Jamieson, A. M. Visscher, S. J. Zheng et al., 2018 DNA mismatch repair preferentially protects genes from mutation. Genome Res. 28: 66–74.
Boyes, D. C., A. M. Zayed, R. Ascenzi, A. J. McCaskill, N. E. Hoffman et al., 2001 Growth stage-based phenotypic analysis of Arabidopsis: a model for high throughput functional genomics in plants. Plant Cell 13: 1499–1510.
Cordoba-Canero, D., E. Dubois, R. R. Ariza, M.-P. Doutriaux, and T. Roldán-Arjona, 2010 Arabidopsis uracil DNA glycosylase (UNG) is required for base excision repair of uracil and increases plant sensitivity to 5-fluorouracil. J. Biol. Chem. 285: 7475–7483.
Eyre-Walker, A., and P. D. Keightley, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.
Goel, M., J. A. Campoy, K. Krause, L. C. Baus, A. Sahu et al., 2024 The majority of somatic mutations in fruit trees are layer-specific. bioRxiv 2024.01.04.573414.
Gonzalez-Perez, A., R. Sabarinathan, and N. Lopez-Bigas, 2019 Local Determinants of the Mutational Landscape of the Human Genome. Cell 177: 101–114.
Grantham, R., C. Gautier, M. Gouy, R. Mercier, and A. Pavé, 1980 Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8: r49–r62.
Gundry, M., and J. Vijg, 2012 Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants. Mutat. Res. 729: 1–15.
Hershberg, R., and D. A. Petrov, 2008 Selection on Codon Bias.
Huang, Y., L. Gu, and G.-M. Li, 2018 H3K36me3-mediated mismatch repair preferentially protects actively transcribed genes from mutation. J. Biol. Chem. 293: 7811–7823.
Huang, Y., and G.-M. Li, 2018 DNA mismatch repair preferentially safeguards actively transcribed genes. DNA Repair 71: 82–86.
Jaegle, B., R. Pisupati, L. M. Soto-Jiménez, R. Burns, F. A. Rabanal et al., 2023 Extensive sequence duplication in Arabidopsis revealed by pseudo-heterozygosity. Genome Biol. 24: 44.
Jiang, C., A. Mithani, E. J. Belfield, R. Mott, L. D. Hurst et al., 2014 Environmentally responsive genome-wide accumulation of de novo Arabidopsis thaliana mutations and epimutations. Genome Res. 24: 1821–1829.
Jinks-Robertson, S., and A. S. Bhagwat, 2014 Transcription-associated mutagenesis. Annu. Rev. Genet. 48: 341–359.
Kennedy, S. R., M. W. Schmitt, E. J. Fox, B. F. Kohrn, J. J. Salk et al., 2014 Detecting ultralow-frequency mutations by Duplex Sequencing. Nat. Protoc. 9: 2586–2606.
Kim, N., A. L. Abdulovic, R. Gealy, M. J. Lippert, and S. Jinks-Robertson, 2007 Transcription-associated mutagenesis in yeast is directly proportional to the level of gene expression and influenced by the direction of DNA replication. DNA Repair 6: 1285–1296.
Kim, D., J. M. Paggi, C. Park, C. Bennett, and S. L. Salzberg, 2019 Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37: 907–915.
Klepikova, A. V., A. S. Kasianov, E. S. Gerasimov, M. D. Logacheva, and A. A. Penin, 2016 A high resolution map of the Arabidopsis thaliana developmental transcriptome based on RNA-seq profiling. Plant J. 88: 1058–1070.
Lenth, R., H. Singmann, J. Love, P. Buerkner, and M. Herve, 2021 Emmeans: Estimated marginal means, aka least-squares means. R Package Version 1 (2018). Preprint at.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
Liu, H., and J. Zhang, 2022 Is the Mutation Rate Lower in Genomic Regions of Stronger Selective Constraints? Mol. Biol. Evol. 39:.
Love, M. I., W. Huber, and S. Anders, 2014 Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15: 550.
Lu, Z., J. Cui, L. Wang, N. Teng, S. Zhang et al., 2021 Genome-wide DNA mutations in Arabidopsis plants after multigenerational exposure to high temperatures. Genome Biol. 22: 160.
Luria, S. E., and M. Delbrück, 1943 Mutations of Bacteria from Virus Sensitivity to Virus Resistance. Genetics 28: 491–511.
Lynch, M., M. S. Ackerman, J.-F. Gout, H. Long, W. Sung et al., 2016 Genetic drift, selection and the evolution of the mutation rate. Nat. Rev. Genet. 17: 704–714.
Martin, M., 2011 Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17: 10–12.
Monroe, J. G., K. D. Murray, W. Xian, T. Srikant, P. Carbonell-Bejerano et al., 2023a Reply to: Re-evaluating evidence for adaptive mutation rate variation. Nature 619: E57–E60.
Monroe, J. G., T. Srikant, P. Carbonell-Bejerano, C. Becker, M. Lensink et al., 2023b Author Correction: Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 620: E13.
Monroe, J. G., T. Srikant, P. Carbonell-Bejerano, C. Becker, M. Lensink et al., 2022 Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 602: 101–105.
Moore, L., A. Cagan, T. H. H. Coorens, M. D. C. Neville, R. Sanghvi et al., 2021 The mutational landscape of human somatic and germline cells. Nature 597: 381–386.
Ossowski, S., K. Schneeberger, J. I. Lucas-Lledó, N. Warthmann, R. M. Clark et al., 2010 The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.
Oztas, O., C. P. Selby, A. Sancar, and O. Adebali, 2018 Genome-wide excision repair in Arabidopsis is coupled to transcription and reflects circadian gene expression patterns. Nat. Commun. 9: 1503.
Quiroz, D., M. Lensink, D. J. Kliebenstein, and J. G. Monroe, 2023 Causes of Mutation Rate Variability in Plant Genomes. Annu. Rev. Plant Biol. 74: 751–775.
Ran, X., X. Chen, L. Shi, M. Ashraf, F. Yan et al., 2020 Transcriptomic insights into the roles of HSP70-16 in sepal’s responses to developmental and mild heat stress signals. Environ. Exp. Bot. 179: 104225.
Reiser, L., E. Bakker, S. Subramaniam, X. Chen, and S. Sawant, 2023 The Arabidopsis Information Resource in 2024. bioRxiv.
Richards, C. L., U. Rosas, J. Banta, N. Bhambhra, and M. D. Purugganan, 2012 Genome-wide patterns of Arabidopsis gene expression in nature. PLoS Genet. 8: e1002662.
Sanchez-Contreras, M., M. T. Sweetwyne, B. F. Kohrn, K. A. Tsantilas, M. J. Hipp et al., 2021 A replication-linked mutational gradient drives somatic mutation accumulation and influences germline polymorphisms and genome composition in mitochondrial DNA. Nucleic Acids Res. 49: 11103–11118.
Satake, A., R. Imai, T. Fujino, S. Tomimoto, K. Ohta et al., 2023 Somatic mutation rates scale with time not growth rate in long-lived tropical trees. eLife.
Schmitt, S., P. Heuret, V. Troispoux, M. Beraud, J. Cazal et al., 2023 Plant mutations: slaying beautiful hypotheses by surprising evidence. bioRxiv 2023.06.05.543657.
Schmitt, M. W., S. R. Kennedy, J. J. Salk, E. J. Fox, J. B. Hiatt et al., 2012 Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl. Acad. Sci. U. S. A. 109: 14508–14513.
Seplyarskiy, V., E. M. Koch, D. J. Lee, J. S. Lichtman, H. H. Luan et al., 2023 A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat. Genet. 55: 2235–2242.
Silva-Correia, J., S. Freitas, R. M. Tavares, T. Lino-Neto, and H. Azevedo, 2014 Phenotypic analysis of the Arabidopsis heat stress response during germination and early seedling development. Plant Methods 10: 7.
Sloan, D. B., A. K. Broz, J. Sharbrough, and Z. Wu, 2018 Detecting Rare Mutations and DNA Damage with Sequencing-Based Methods. Trends Biotechnol. 36: 729–740.
Staunton, P. M., A. J. Peters, and C. Seoighe, 2023 Somatic mutations inferred from RNA-seq data highlight the contribution of replication timing to mutation rate variation in a model plant. Genetics 225:.
Supek, F., and B. Lehner, 2017 Clustered Mutation Signatures Reveal that Error-Prone DNA Repair Targets Mutations to Active Genes. Cell 170: 534-547.e23.
Tatsumoto, S., Y. Go, K. Fukuta, H. Noguchi, T. Hayakawa et al., 2017 Direct estimation of de novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome sequencing. Sci. Rep. 7: 13561.
Waneka, G., J. M. Svendsen, J. C. Havird, and D. B. Sloan, 2021 Mitochondrial mutations in Caenorhabditis elegans show signatures of oxidative damage and an AT-bias. Genetics 219:.
Wang, L., A. T. Ho, L. D. Hurst, and S. Yang, 2023 Re-evaluating evidence for adaptive mutation rate variation. Nature 619: E52–E56.
Weng, M.-L., C. Becker, J. Hildebrandt, M. Neumann, M. T. Rutter et al., 2019 Fine-Grained Analysis of Spontaneous Mutation Spectrum and Frequency in Arabidopsis thaliana. Genetics 211: 703–714.
Wu, Z., G. Waneka, A. K. Broz, C. R. King, and D. B. Sloan, 2020 MSH1 is required for maintenance of the low mutation rates in plant mitochondrial and plastid genomes. Proc. Natl. Acad. Sci. U. S. A. 117: 16448–16455.
Zhang, G., 2023 The mutation rate as an evolving trait. Nat. Rev. Genet. 24: 3.
Zhang, J., and J.-R. Yang, 2015 Determinants of the rate of protein sequence evolution. Nat. Rev. Genet. 16: 409–420.

License: CC-BY-NC-ND-4.0