Genomic studies in Linum shed light on the evolution of the distyly supergene and the molecular basis of convergent floral evolution

preprint OA: gold CC-BY-NC-ND-4.0

Introduction

Distyly is a floral polymorphism that promotes outcrossing and has evolved repeatedly in flowering plants, making it a prominent example of convergent floral evolution (Barrett, 2019). In distylous species, there are two floral morphs that differ reciprocally in the positions of anthers and stigmas. Long-styled (L-morph, or pin) plants have stigmas in a high position in the flower, and anthers in a low position, whereas short-styled (S-morph, or thrum) individuals have the opposite arrangement. Differences in flower structure are usually accompanied by differences in pollen and stigma traits and by heteromorphic self-incompatibility (SI), which limits self- and intra-morph pollination. Distyly has evolved independently multiple times in flowering plants (Lloyd & Webb, 1992; Naiki, 2012), suggesting that it provides a solution to a common set of selective pressures (Shore et al., 2019; Simón-Porcar et al., 2024). Specifically, reciprocal morph differences increase the precision of pollen transfer by pollinators (Darwin, 1877; Lloyd & Webb, 1992; Barrett, 2019; Simón-Porcar et al., 2024), whereas heteromorphic SI confers inbreeding avoidance (Charlesworth & Charlesworth, 1979). In distylous species where genetic studies have been done, distyly is governed by a single Mendelian locus, the S-locus, with one dominant and one recessive allele (Bateson & Gregory, 1905; Laibach, 1923; reviewed by Ganders, 1979), which controls both floral morphology and heteromorphic SI. In most systems, the L-morph is homozygous for the recessive s-allele (s/s), whereas the S-morph is heterozygous (S/s) (reviewed by Ganders, 1979). To explain how a single Mendelian locus could control this multi-trait balanced polymorphism, Ernst (1936) proposed that in distylous Primula, the S-locus harbored at least three separate and polymorphic genes, present in close linkage and controlling different aspects of distyly.
Under Ernst’s model, the S-locus constitutes a supergene, defined as “a system of closely linked loci controlling a polymorphic phenotype, such that a non-recombining genome region is structured into two or more distinct haplotypes, each carrying a set of alleles that control multiple aspects of one of the phenotypes” (Charlesworth, 2016). Detailed genomic characterization of multiple independently evolved distyly S-loci has shown that they harbor multiple closely linked genes important for trait polymorphism, and can be considered supergenes (e.g., Fagopyrum (Yasui et al., 2012; Fawcett et al., 2023); Primula (Huu et al., 2016; Li et al., 2016); Turnera (Shore et al., 2019); Linum (Gutiérrez-Valencia et al., 2022); Nymphoides (Yang et al., 2023); Gelsemium (Zhao et al., 2023); Oleaceae (Castric et al., 2024; Raimondeau et al., 2024)). However, unlike other types of supergenes, which often harbor inversions, all distyly supergenes studied in detail so far instead harbor large indels (reviewed in Gutiérrez-Valencia et al., 2021b). Because the dominant allele is typically longer and the recessive allele shorter, the supergene is usually hemizygous in S-morph individuals, with S-morph-specific expression of dominant S-linked genes that control floral morph (reviewed in Gutiérrez-Valencia et al., 2021b). Hemizygosity ensures both dominant expression and absence of recombination between the recessive and dominant alleles at the distyly supergene.
bioRxiv preprint, this version posted April 17, 2025; doi: https://doi.org/10.1101/2025.04.11.648331. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Whether repeated evolution of similar genomic architectures at the distyly supergene is accompanied by functional similarities in the mechanisms regulating distyly remains less clear, as independently evolved S-loci do not share orthologous genes. Genomic studies have enabled experimental and functional studies of candidate S-locus genes and mechanisms governing distyly, demonstrating that S-linked genes that inactivate brassinosteroids control style length and female SI in at least two distylous systems (in Primula (Huu et al., 2016, 2022) and in Turnera (Shore et al., 2019; Matzke et al., 2020, 2021)). Studies in additional distylous systems are required to determine whether distyly is generally accompanied by parallel evolution at the biochemical pathway level. Linum (wild flaxseed species) is a classic system for the study of the function, evolution and genetic basis of distyly (Darwin, 1863, 1877; Dulberger, 1992; Armbruster et al., 2006). This system is of particular interest due to its varied stylar polymorphisms, including distyly (Fig. 1b-c) and stigma height dimorphism (Fig. 1d) (Armbruster et al., 2006; McDill et al., 2009; Ruiz-Martin et al., 2018; Maguilla et al., 2021). The presence of varied stylar polymorphisms, as well as recurrent loss of distyly, make Linum a particularly suitable system for dissecting the genetic basis of distyly. Building on a high-quality genome assembly of the distylous and SI Linum tenue (Fig. 1b), we recently characterized the distyly S-locus in Linum, and showed that in L. tenue it constitutes a supergene which harbors an ~260 kb indel, rendering the S-morph predominantly hemizygous (Gutiérrez-Valencia et al., 2022). The L. tenue S-locus harbors nine protein-coding genes, including candidate genes for style length (LtTSS1) and anther height/pollen SI (LtWDR-44). In the closely related selfing species L. 
trigynum, which recently lost distyly and is homostylous, i.e., monomorphic with anthers and stigmas at the same height, LtWDR-44 is present but expressed at a lower level than in SI L. tenue thrums (Gutiérrez-Valencia et al., 2024). Altered expression of LtWDR-44 is associated with a switch in pollen SI function from thrum- to pin-type and self-compatibility (SC), suggesting a role for this gene in pollen SI (Gutiérrez-Valencia et al., 2024). Based on our previous work in L. tenue and L. trigynum (Gutiérrez-Valencia et al., 2022, 2024), two genes thus stand out as strong candidate genes for governing distyly in Linum: LtTSS1 (hereafter called TSS1) and LtWDR-44 (hereafter called WDR-44). However, it is not currently clear whether these two genes are generally important for distyly in Linum, as ancestral state reconstruction suggested that divergent Linum species may have independently evolved distyly (Armbruster et al., 2006; McDill et al., 2009; Ruiz-Martin et al., 2018). To shed further light on the origin(s) of the distyly S-locus in Linum, and to identify conserved candidate genes and mechanisms regulating distyly in Linum, there is a need for further comparative studies of the S-locus, building on high-quality genome assemblies of widely diverged distylous Linum species. Here, we describe new high-quality haplotype-resolved genome assemblies of two Linum species: the distylous and SI L. perenne and the style length dimorphic and SI L. grandiflorum, both of which diverged from L. tenue about 33 Mya (Maguilla et al., 2021) (Fig. 1a). Like L. tenue, both L. perenne and L. grandiflorum have heteromorphic SI (Murray, 1986), but L.
grandiflorum lacks anther height dimorphism. To identify shared S-locus genes that might be involved in the control of floral morph and SI in Linum, we 1) identify and characterize the S-loci of L. grandiflorum and L. perenne and compare their gene content to that of L. tenue. We use the results to 2) infer the origin of the distyly S-locus in Linum, and 3) experimentally test whether downregulation of brassinosteroid-responsive genes by the distyly supergene is a conserved mechanism that controls style length polymorphism in Linum. Our results shed new light on the origin and evolution of the distyly supergene in Linum and the genetic basis of an iconic case of convergent floral evolution.

Materials and methods

Biological material for genome assembly and annotation
For de novo genome assembly of L. perenne L. and L. grandiflorum Desf., we snap-froze leaves from one S-morph individual of L. perenne accession LIN 2003 (here named L96A) and one S-morph individual of L. grandiflorum accession LIN 10 (here named L62.06), both from the IPK Gatersleben Gene Bank (Table S1). For annotation of genome assemblies, we snap-froze at least two replicates each of leaves, stems, early and late flower buds (collected at two stages for L. perenne and at three stages for L. grandiflorum) and open flowers of L. perenne L96A and L. grandiflorum L62.06 for RNA extraction and sequencing.

Plant growth conditions
Seeds were sterilized with 10% bleach or 3% hydrogen peroxide and detergent, followed by washes with 70% ethanol and distilled water, sown on sterile plates with half-strength Murashige-Skoog medium (Sigma-Aldrich, St. Louis, MO, USA), stratified, and moved to standard long-day conditions (16 h light at 20°C, 8 h dark at 18°C, 60% maximum humidity, 120 µE light intensity) until seedlings emerged. Seedlings were transplanted to pots containing a mixture of soil (Hasselfors Garden, Sweden) and gravel (1.5:1), with an addition of perlite and vermiculite. L. perenne plants were vernalized for 9 weeks under short-day conditions: 8 h light at 6°C, 16 h dark at 2°C, 65% maximum humidity, 110 µE light intensity. The following transition conditions were in place for two weeks before and after vernalization: 11 h light at 15°C, 13 h dark at 10°C, 120 µE light intensity, 65% maximum humidity.

High Molecular Weight (HMW) DNA isolation
HMW DNA was extracted and purified using a two-step protocol, from a total of 2 g fresh-frozen leaves from individuals L. perenne L96A and L. grandiflorum L62.06. For HMW DNA extraction we followed a protocol modified from Fulton et al. (1995), with purification using Genomic-Tip/500 (Qiagen, Hilden, Germany). HMW DNA quality was checked spectrophotometrically and through pulsed-field gel electrophoresis using SeaKem Gold agarose (Lonza, Rockland, ME, USA), 0.5X KBB buffer (Sage Science), and a Pippin Pulse Electrophoresis power supply system (Sage Science, Beverly, MA, USA), with post-staining using GelRed (Biotium, Fremont, CA, USA).

PacBio High-Fidelity (HiFi) sequencing
HMW DNA from L. perenne L96A (32 µg) and L. grandiflorum L62.06 (33 µg) was used to generate SMRTBell libraries for HiFi long-read sequencing. Each library was sequenced on two SMRT cells in HiFi mode on a Sequel II (Pacific Biosciences), which resulted in 31 and 50 Gbases of HiFi data for L. perenne and L. grandiflorum, respectively, with an insert size of 15 kbp.

Hi-C data generation
To generate high-quality proximity ligation libraries (Hi-C) for scaffolding of genome assemblies, a total of 300 mg of fresh-frozen leaf tissue was first ground to a fine powder in liquid nitrogen. Hi-C libraries were generated using the Dovetail OmniC kit. Sequencing on an Illumina NovaSeq6000 generated a total of 1.0 × 10⁹ paired-end 150 bp reads for L. perenne, and 2.3 × 10⁹ paired-end 150 bp reads for L. grandiflorum.

RNA extraction and sequencing
For genome annotation, we obtained RNA sequencing data from leaves, stems, flower buds and open flowers of L. perenne L96A and L. grandiflorum L62.06. Total RNA was extracted using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany). Sequencing libraries were prepared using the TruSeq stranded mRNA library preparation kit (Illumina, San Diego, CA, USA) including polyA selection and unique dual indexes (Illumina, San Diego, CA, USA), and were sequenced using paired-end 150 bp reads on a NovaSeq6000 system.

De novo genome assembly
We generated primary and haplotype-resolved genome assemblies based on HiFi and Hi-C data of our outbred S-morph L. perenne L96A and L. grandiflorum L62.06 individuals using integrated Hi-C assembly settings in Hifiasm (Cheng et al., 2021). For each species, we generated two high-quality phased haplotype assemblies, designated as hap1 and hap2, as well as a primary assembly. Assembly completeness was checked using Benchmarking Universal Single-Copy Orthologs (BUSCO) (Waterhouse et al., 2018) against the eudicots_odb10 gene data set. Prior to annotation, assemblies were screened for contamination (Note S1) and presence of chloroplast or mitochondrial sequences as described in Gutiérrez-Valencia et al. (2024).

Genome annotation
Annotation of genes and repeats was performed using open-source pipelines in use at the National Bioinformatics Infrastructure Sweden (NBIS) Annotation and Assembly unit (see Data Availability). We used a combination of evidence-based and ab initio annotation, followed by functional annotation. In addition, repeats were modeled and annotated, after vetting them against annotated genes. We fully annotated the primary and haplotype-resolved assemblies of each species. For evidence-based gene annotation methods we used both proteins and transcriptomes. As protein evidence, we used proteins from sequenced Linum species (L.
tenue, Linum usitatissimum), more distantly related species from the Malpighiales (Manihot esculenta, Populus trichocarpa, Ricinus communis, Salix purpurea), the Vitales (Vitis vinifera) as well as Uniprot data for rosids. We further used transcriptome data from leaves, stems, buds and flowers of L. perenne L96A and L. grandiflorum L62.06. After adapter-trimming with fastp v0.23.2 (Chen et al., 2018), RNAseq reads were aligned to the reference genome using Hisat2 v2.1.0 (Kim et al., 2015). Genome-guided assembly of transcripts was done using StringTie v2.2.1 (Pertea et al., 2015), using MultiQC (Ewels et al., 2016) for quality-checking. Evidence-based annotation was performed using MAKER v3.01.02 (Holt & Yandell, 2011) including aligned transcript sequences and reference proteins as evidence, whereas ab-initio training was conducted using GeneMark v4.3 (Besemer et al., 2001), Augustus v3.3.3 (Stanke et al., 2006), and Snap 2013_11_29 (Korf, 2004). Finally, results from ab-initio and evidence-based annotation were combined to produce final gene builds, which were functionally annotated using Blast (v. 2.9.0) (Altschul et al., 1990) matches against Uniprot/Swissprot and results from InterproScan 5.59-91.0 (Hunter et al., 2012). Species-specific repeat libraries were generated using RepeatModeler (Smit & Hubley, 2008), and candidate repeats were vetted against protein evidence (excluding transposons) to exclude low-complexity coding sequences. Finally, repeat identification was performed using RepeatMasker (Smit et al., 2013) and RepeatRunner (Smith et al., 2007).

Manual curation of annotation in the S-locus region
To describe and compare the gene content of the S-locus region across species, we manually curated gene annotation in genome regions of L. perenne and L. grandiflorum containing their respective S-morph hemizygous S-locus (L. perenne h1tg000002l:1,080,000-4,890,000; L. grandiflorum h1tg000023l:11,240,000-12,420,000; see section “Identification of the S-locus in L. perenne and L. grandiflorum” below for details) by inspecting transcriptome evidence for the original annotation as well as for TransDecoder/StringTie v2.2.1-based gene predictions. In the L. perenne hemizygous S-locus region, manual curation resulted in removal of eight gene models and addition of two gene models. In the L. grandiflorum hemizygous S-locus region, it resulted in removal of 14 gene models that were not supported by transcriptome data, and addition of five new gene models based on TransDecoder (https://github.com/TransDecoder/TransDecoder) output. We performed additional transposable element (TE) and repeat annotation to improve TE classification completeness prior to tests for repeat enrichment at the S-locus. Specifically, we used HiTE v3.2 (Hu et al., 2024) in conjunction with LTR_retriever v2.9.9 (Ou & Jiang, 2018) to build a repeat element library and annotate the genome with full-length TEs, classified using RepeatMasker v4.1.5 (Smit et al., 2013) in sensitive mode. Statistical comparison of S-locus and genome-wide repeat content was done using binomial tests in R v4.3.2.

Whole-genome short-read sequencing
DNA for short-read sequencing was extracted from 157 samples of L.
perenne from three natural populations and two accessions, and for L. grandiflorum we extracted DNA from 22 individuals from three accessions (Table S1), using the Quick-DNA Miniprep Plus Kit (Zymo Research, Irvine, CA, USA). We also acquired short-read sequencing data for five additional distylous species of Linum (Fig. 1; Table S1) following the same procedure but using magnetic beads and the Quick-DNA MagBead Plus kit (Zymo Research, Irvine, CA, USA) for DNA extraction. Sequencing libraries were prepared from 1 µg DNA using the TruSeq PCR-free DNA sample preparation kit (Illumina, San Diego, CA, USA) with unique dual indexes, targeting an insert size of 350 bp. Libraries were sequenced on a NovaSeq 6000 system, yielding paired-end 150 bp reads.

Short-read processing, mapping, variant calling and filtering
Illumina whole-genome resequencing reads were adapter- and quality-trimmed using BBDuk from BBMap v38.61b (Bushnell, 2014), and mapped using BWA-MEM v0.7.18 (Li, 2013). We excluded mapped reads with a mapping quality lower than 20 and duplicates using Picard tools v3.1.1 (http://broadinstitute.github.io/picard). Variants were called using BCFtools mpileup v1.17 (Danecek & McCarthy, 2017) independently for each species. We kept only bi-allelic variants and invariant sites, and applied additional filters for depth, missingness and mapping quality (BCFtools min_depth = 5; max_depth = 200; missingness = 0.9; min_quality = 20). Due to the high repeat content of our assemblies, additional masking of repeats was necessary. Hence, we masked repeats using ‘bedtools intersect’ and by filtering on coverage as in Gutiérrez-Valencia et al. (2021a). Finally, to reduce false heterozygous calls, we applied an allele balance filter with thresholds 0.2 and 0.8, setting heterozygous calls that failed this criterion to missing.

Identification of the S-locus in L. perenne and L. grandiflorum
To identify the S-locus we tested for an association between single nucleotide polymorphism (SNP) genotype and floral morph using genome-wide association mapping (GWAS). Prior to GWAS we removed sites with missing data and rare variants (minor allele frequency < 0.05), and pruned variants with high linkage disequilibrium (LD) (r² > 0.2) in 50 kb windows. We performed association analysis in PLINK v1.90b4.9 (Purcell et al., 2007) using Fisher’s exact test on genotypes, assuming a dominant effect for the minor allele and applying a False Discovery Rate (FDR) P-value adjustment. In L. grandiflorum, this association analysis used 15,927 LD-pruned SNPs in 11 S-morph and 7 L-morph individuals from three accessions (Table S1). In L. perenne, we first analyzed 13,992 LD-pruned SNPs genome-wide from 7 S-morph and 12 L-morph L. perenne individuals from one family. Because family-based analyses can have limited resolution, we validated our findings by GWAS analyses on 53 individuals from one natural population (ger3, Table S1; Note S2). We performed depth of coverage analyses to identify genomic regions with presence-absence variation between morphs in the genomes of L. perenne and L. grandiflorum and narrow down the position of the S-locus. Depth of coverage of reads mapped to the hap1 haplotype-resolved assembly of each species was calculated in 300 kb windows using BEDTools v2.31.1 (Quinlan & Hall, 2010) and normalized by total sample read count.
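The windowed depth normalization described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used BEDTools and R); the function names, the reads-per-million scaling factor, and the toy read counts are all assumptions made for illustration only.

```python
from statistics import median


def normalize_counts(window_counts, total_reads, scale=1_000_000):
    """Scale per-window read counts by the sample's total read count.

    The reads-per-million scale is an assumption; the text only states that
    depth was normalized by total sample read count.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [count * scale / total_reads for count in window_counts]


def median_coverage_by_morph(samples):
    """Return, for each morph, the per-window median of normalized coverage."""
    per_morph = {}
    for sample in samples:
        normalized = normalize_counts(sample["window_counts"], sample["total_reads"])
        per_morph.setdefault(sample["morph"], []).append(normalized)
    # zip(*rows) regroups the values window by window across samples
    return {
        morph: [median(values) for values in zip(*rows)]
        for morph, rows in per_morph.items()
    }


# Hypothetical read counts for two windows; the second window mimics a region
# that is present in S-morph plants and absent from L-morph plants.
samples = [
    {"morph": "S", "total_reads": 2_000_000, "window_counts": [40, 30]},
    {"morph": "S", "total_reads": 1_000_000, "window_counts": [20, 10]},
    {"morph": "L", "total_reads": 1_000_000, "window_counts": [22, 0]},
    {"morph": "L", "total_reads": 4_000_000, "window_counts": [80, 0]},
]
coverage = median_coverage_by_morph(samples)
```

In this toy input, a window with zero median coverage in the L-morph but non-zero coverage in the S-morph is the kind of pattern expected for a region that is hemizygous in the S-morph.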
We identified windows that differed in normalized median coverage between individuals with different floral morph (L- vs S-morph) using a two-sample Fisher-Pitman permutation test in R (v4.3.2, package “coin” v1.4.3), with 1,000,000 permutations, using a significance threshold of P ≤ 0.01 after Bonferroni multiple testing correction.

Stepwise assembly of the S-locus gene set
To test for an impact of stepwise gene movement on S-locus gene content, we estimated dS between each S-locus gene and its closest paralog in L. grandiflorum and L. perenne, with paralogs identified by OrthoFinder v2.5.5 (Emms & Kelly, 2019). We estimated dS in MEGA X (Tamura et al., 2021) under the Nei-Gojobori model. Widely different dS estimates for different genes imply stepwise gene duplication at different times, and very low estimates suggest very recent gene duplication.

Selection on S-locus candidate genes and the age of the S-locus in Linum
To assess purifying selection on the S-locus candidate genes TSS1 and WDR-44, we estimated the ratio of nonsynonymous to synonymous sequence divergence (dN/dS) for both genes across L. perenne, L. grandiflorum, L. tenue and five additional distylous species of Linum (Fig. 1; Table S1). We obtained TSS1 and WDR-44 sequences of the additional distylous Linum species using the HybPiper pipeline (Johnson et al., 2016), based on a target file including available gene sequences of TSS1 and WDR-44 from L. tenue and L. trigynum (Gutiérrez-Valencia et al., 2022, 2024), L. grandiflorum and L. perenne.
We constructed multiple sequence alignments for coding sequences of each gene using T-Coffee (Notredame et al., 2000), inferred gene trees using RAxML with the GTR-Gamma model of nucleotide substitution (Stamatakis, 2014) and estimated dN/dS ratios using codeml in PAML (Yang, 2007), comparing a model allowing different values of dN/dS for each branch to a constrained model with only one dN/dS ratio for the whole tree using a likelihood ratio test (LRT). Next, we took advantage of the presence of WDR-44 paralogs to compare selective pressures on the S-locus copy and its paralogs and to infer the timing of duplication. To compare selective pressures on the S-locus copy vs paralogs of WDR-44 we used the codeml branch model and three phylogenetic tree annotations: (i) only one dN/dS ratio for the whole tree, (ii) one dN/dS ratio for S-locus sequences and one for paralogs, and (iii) one dN/dS ratio per species, with model selection based on LRTs. To estimate the timing of duplication of WDR-44, we ran BEAST2 v2.7.7 (Bouckaert et al., 2019) with a lognormal optimized relaxed molecular clock and a calibrated Yule model. For calibration we used the timing of diversification of Linum species, i.e., ~33 Mya (parameters: lognormal, M = 3.5, S = 0.05) (Maguilla et al., 2021). We set the chain length to 100,000,000, sampling every 10,000th step and obtained the final estimate after excluding the first 10% of trees as burn-in. We also applied a simple molecular clock analysis to estimate the divergence time of L. tenue and L. perenne (t = dS/(2µ)), based on synonymous sequence divergence at TSS1 and WDR-44 estimated under the Nei-Gojobori model in MEGA X and assuming a mutation rate (µ) of 7 × 10⁻⁹ (Ossowski et al., 2010). Finally, we included WDR-44 sequences for three outgroups in our phylogenetic reconstruction. For Tirpitzia sinensis, the WDR-44 sequence was retrieved using the same procedure as for the Linum species. For Manihot esculenta we used Manes.15G085300 and for Populus trichocarpa Potri.011G122500, as in Gutiérrez-Valencia et al. (2022).

Brassinosteroid supplementation experiment
To test whether supplementation with active brassinosteroid hormone affects style length specifically in the S-morph of both L. perenne and L. tenue, we performed a controlled experiment with two treatments: eBL treatment (10 µM 24-epibrassinolide (eBL) dissolved in 0.1% dimethylsulfoxide, DMSO) and control treatment (0.1% DMSO). eBL concentration was chosen based on initial tests with 1 µM, 10 µM, and 20 µM eBL. Young flower buds were injected until saturation on two consecutive days with either eBL or control solution. Fully open flowers were dissected, photographed under a stereo microscope (Leica S APO) and style and stamen lengths were determined using ImageJ 1.53k (Schneider et al., 2012). For L. perenne, one to two flowers of each of eight L-morph and 11 S-morph individuals were subjected to each treatment type (eBL or control) for a total of 90 measurements of style and stamen length. For L. tenue, two flowers of each of 19 L-morph and S-morph individuals were subjected to each treatment type, for a total of 152 measurements of style and stamen length. The experiment was not performed on L. grandiflorum due to growth chamber space and time constraints. Style and stamen lengths were analyzed separately using analysis of variance (ANOVA) in R v4.1.1 using the lm() function, with organ length as the response and floral morph, hormonal treatment, and the interaction between floral morph and treatment as predictors.
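The morph-by-treatment interaction in the model above asks whether the eBL effect differs between morphs. In a two-by-two design with an interaction term, that quantity equals the difference between the two within-morph treatment effects computed from the four group means. The Python sketch below shows only this arithmetic; it is not the authors' R analysis, it performs no significance testing, and the function names and organ lengths are hypothetical.

```python
from statistics import mean


def group_means(records):
    """Mean organ length for each (morph, treatment) combination."""
    groups = {}
    for morph, treatment, length in records:
        groups.setdefault((morph, treatment), []).append(length)
    return {key: mean(values) for key, values in groups.items()}


def interaction_contrast(means):
    """Difference between the eBL effect in the S-morph and in the L-morph."""
    s_effect = means[("S", "eBL")] - means[("S", "control")]
    l_effect = means[("L", "eBL")] - means[("L", "control")]
    return s_effect - l_effect


# Hypothetical style lengths (arbitrary units), not data from the study.
records = [
    ("S", "control", 4.0), ("S", "control", 5.0),
    ("S", "eBL", 6.0), ("S", "eBL", 7.0),
    ("L", "control", 9.0), ("L", "control", 10.0),
    ("L", "eBL", 9.0), ("L", "eBL", 11.0),
]
means = group_means(records)
contrast = interaction_contrast(means)
```

A contrast near zero would indicate a similar treatment response in both morphs, whereas a clearly positive value, as in this made-up example, would correspond to a stronger elongation response in the S-morph.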
For models with significant effects, we conducted a post-hoc test using the Tukey “Honest Significant Difference” (HSD) method and obtained 95% CIs for the difference in mean organ length. To test whether eBL treatment affects style cell length, we quantified cell length in styles subjected to eBL or control treatment in L. perenne. To obtain an image of the epidermal cells in control and treated styles, a thin layer of UV-cured transparent nail polish (Semilac, Poznan, Poland) was applied to a microscope slide and excised styles were carefully placed on the surface of the nail polish (this method was not feasible for fixed L. tenue material and was therefore only performed in L. perenne). After hardening under UV light, the imprint was photographed under a light microscope (Olympus BX60) and cell sizes were measured using ImageJ 1.53k (Schneider et al., 2012). Cell length measurements were obtained separately for three different sections at the bottom, middle and top of the style (10 cells measured per section), following Ushijima et al. (2015) and Foroozani et al. (2023). Measurements were performed on two to four flowers of each of eight L. perenne L-morph individuals and 11 S-morph individuals, resulting in a total of 805 cell length measurements. The 10 cell length measurements for each style section were averaged prior to linear model analysis using lm() in R v4.1.1. We tested for an effect of eBL treatment on mean style cell length using a linear model with mean cell length as the response, and floral morph, style section (bottom, middle, or top) and hormonal treatment as predictor variables. Cell lengths were log-transformed to improve normality of residuals. Post-hoc tests were performed and 95% CIs obtained for significant effects as described above.
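The data preparation step for the cell length model, averaging the measurements within each style section and log-transforming the result, can be sketched in Python as follows. This is an illustration only: the function name and measurements are hypothetical, and both the order of operations (logarithm of the section mean rather than mean of logarithms) and the use of the natural logarithm are assumptions, since the text does not specify them.

```python
import math
from collections import defaultdict
from statistics import mean


def log_mean_cell_lengths(measurements):
    """Average cell lengths per (flower, section), then take the natural log.

    `measurements` is an iterable of (flower_id, section, length) tuples.
    """
    grouped = defaultdict(list)
    for flower_id, section, length in measurements:
        if length <= 0:
            raise ValueError("cell lengths must be positive")
        grouped[(flower_id, section)].append(length)
    return {key: math.log(mean(values)) for key, values in grouped.items()}


# Hypothetical measurements (arbitrary units), not data from the study.
measurements = [
    ("flower_1", "bottom", 10.0),
    ("flower_1", "bottom", 30.0),
    ("flower_1", "top", 50.0),
]
log_means = log_mean_cell_lengths(measurements)
```

Each resulting value would then be one observation in the linear model, with floral morph, style section and treatment as predictors.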

Results

High-quality phased genome assemblies of Linum perenne and Linum grandiflorum
As both L. grandiflorum and L. perenne are SI (Murray, 1986) and outbred, we assembled both a primary assembly and a pair of haplotype-resolved assemblies for each species based on PacBio HiFi and Hi-C data. We sequenced S-morph individuals, which are expected to harbor both the dominant and recessive alleles at their S-loci. Hi-C integrated assembly in Hifiasm (Cheng et al., 2021) resulted in highly complete primary and haplotype-resolved assemblies with BUSCO scores ranging from 94.0% to 95.1% (Table S2) that were highly contiguous, with N50 scores ranging from 10.2 Mb to 69.9 Mb (Table S2). Assembly lengths were similar to genome sizes estimated by flow cytometry (Table S2). We annotated the primary and haplotype-resolved assemblies using a combination of ab-initio and evidence-based methods. We identified a total of ~42,000 protein-coding genes in our L. grandiflorum assemblies, whereas our L. perenne assemblies had ~45,000 protein-coding genes (Data S1A). Compared to L. tenue, where 49.4% of the genome consisted of repeats (Gutiérrez-Valencia et al., 2022), the genomes of L. grandiflorum and L. perenne were richer in repeats, with 78.2% and 69.5% of the respective genome assemblies annotated as repetitive (Data S1A). The relatively high gene numbers of L. perenne and L. grandiflorum likely result from an ancient whole genome duplication in the ancestor of these species (Sveinsson et al., 2014).

Hemizygosity in the S-morph is a common feature of Linum S-locus supergenes
To identify S-loci in L. perenne and L. grandiflorum we searched for single nucleotide polymorphisms (SNPs) whose genotypes were associated with floral morph. Because many distyly S-locus supergenes harbor presence-absence variation, we also tested for presence-absence variation between floral morphs using short-read depth of coverage analyses. In L.
grandiflorum, GWAS identified two SNPs on contig h1tg000023l of haplotype-resolved assembly hap1 as significantly associated with floral morph (Fisher exact test, assuming dominant effect of the S-morph-specific allele, FDR<0.05) (Figs 2a, S1). The associated SNPs define an ~1.2 Mb region on contig h1tg000023l ranging from ~11.2 Mb to ~12.4 Mb. Within this region, coverage analyses showed presence-absence variation between floral morphs, with significantly lower normalized median coverage in L-morph than S-morph individuals (median normalized coverage 0 for L-morph and 14.8 for S-morph, permutation test with 1,000,000 resamples, P<0.01 after Bonferroni correction, Fig. 2a). These results suggest that h1tg000023l on haplotype-resolved assembly hap1 harbors the longer, dominant allele at the L. grandiflorum S-locus. Comparison between the two haplotype-resolved assemblies of L. grandiflorum confirmed the presence of a ~1.2 Mb hemizygous region in S-morph individuals and identified h2tg000012l on the hap2 assembly as harboring the shorter, recessive S-allele. Although the recessive allele was shorter than the dominant allele, it included a unique 70 kb region missing from the dominant allele. Finally, inspection of the L. grandiflorum genome annotation showed that the S-morph specific region on the dominant S-allele (i.e., on h1tg000023l) harbored the L. grandiflorum gene Thrum Style Specific 1 (TSS1), a style length candidate gene previously identified as S-linked (Ushijima et al., 2015) (Figs 2a, S1). Taken together, these results indicate that the S-locus of L.
grandiflorum consists of a ~1.2 Mb genomic region which harbors presence-absence variation and is hemizygous in S-morph individuals. In L. perenne, family-based GWAS analysis resulted in significant associations between floral morph and SNP genotype on four contigs. We identified a total of 124 significantly associated SNPs on contig ht1g000002l, 33 on contig ht1g000009l, 2 on ht1g000026l, and 20 on ht1g000047l of our hap1 haplotype-resolved genome assembly (Fisher exact test, assuming dominant effect of the S-morph-specific allele, FDR<0.01) (Figs 2b, S1). These results were validated using a population-based GWAS which showed that the same four contigs accounted for 97.7% of GWAS hits (Note S2). Both contigs with the highest number of morph-associated SNPs (ht1g000002l and ht1g000009l on hap1) map to the same contig in the alternate haplotype-resolved genome assembly (ht2g000035l on hap2), implying that at least 87.7% of the SNPs that show an association with floral morph map to the same chromosome. In total, the contigs showing an association between floral morph and SNP genotype span more than 30 Mb in L. perenne. The large size of the region associated with floral morph is likely due to elevated linkage disequilibrium in this specific genomic region (Note S3; Fig. S2). Coverage analyses further indicate that hap1 contig ht1g000002l corresponds to the dominant S-allele, as it harbored an ~3.8 Mb region specific to the S-morph (median normalized coverage 14.7 in S-morph, 0 in L-morph) (Fig. 2b), and additionally identified an ~800 kb region specific to the recessive allele (hemizygous in S-morph, diploid in L-morph). Inspection of the annotation of the dominant S-haplotype showed the presence of an ortholog of TSS1 (Figs 2b, S1; Table 1; Data S1B). These results indicate that the L. 
perenne S-locus includes an approximately 3.8 Mb region that is specific to and hemizygous in the S-morph (Figs 2b, S1). Taken together, association mapping, coverage analyses, and comparisons of haplotype-resolved assemblies indicate that, as in L. tenue, the distyly S-loci of L. perenne and L. grandiflorum harbor indel variation, but their S-morph-specific hemizygous regions differ in size (L. perenne ~3.8 Mb, L. grandiflorum ~1.2 Mb). Both S-loci are considerably larger than the previously identified 260 kb S-locus of L. tenue (Gutiérrez-Valencia et al., 2022).

Divergent S-loci share distyly candidate genes despite pervasive differences in gene content

Next, we compared the gene content of the S-loci of L. perenne, L. grandiflorum and L. tenue. Apart from the style length candidate gene TSS1 (Ushijima et al., 2015; also termed LtTSS1, Gutiérrez-Valencia et al., 2022), the S-loci of both L. perenne and L. grandiflorum harbored orthologs of the anther height/pollen SI candidate gene WDR-44 (also termed LtWDR-44), previously identified as S-linked in L. tenue (Gutiérrez-Valencia et al., 2022, 2024). One additional S-linked gene, MPT1 (mitochondrial phosphate transporter; GO:0005315), was shared between L. perenne and L. grandiflorum, but not with the more distantly related L. tenue (Fig. 3a-b). No other gene homology was detected when comparing the gene content of the hemizygous region of L. grandiflorum to the four morph-associated contigs of L. perenne (Figs 3a-b, S3; Table 1; Note S4). The number of annotated genes in the S-linked hemizygous region differed greatly between L. grandiflorum, L.
perenne and L. tenue, with the longer, dominant haplotype having 24 and 32 annotated protein-coding genes in L. grandiflorum and L. perenne, respectively, compared to only nine in L. tenue (Gutiérrez-Valencia et al., 2022) (Table 1, Data S1B). In Primula (Huu et al., 2020) and L. tenue (Gutiérrez-Valencia et al., 2022), the gene set of the S-locus has been assembled stepwise via gene duplication. If this process occurred continuously after the origin of the S-locus in Linum, it might help explain the large differences in gene content between Linum S-loci that we observe (Note S4). To test this hypothesis, we estimated dS between S-locus genes and their closest paralogs in L. grandiflorum and L. perenne, and found evidence for wide differences in the timing of duplication, as well as very recent duplication of S-locus genes (Fig. S3). The closest paralogs of S-locus genes were found on multiple contigs in both L. grandiflorum and L. perenne, as expected under stepwise gene duplication (Table S3). These results suggest that stepwise gene duplication, occurring independently in different Linum lineages, has contributed to the differences in S-locus gene content we observe.

Divergent S-loci are enriched for different classes of repeats

The S-locus is expected to accumulate repeats due to the combined effects of lack of recombination and reduced effective population size (reviewed by Gutiérrez-Valencia et al., 2021b). In line with this expectation, we found that the S-locus was enriched in repetitive elements relative to the genome-wide average in all three species, L. perenne, L. grandiflorum, and L. tenue (Fig. 3c).
Enrichment was driven primarily by retroelements, specifically Ty3-like Long Terminal Repeat (LTR) retroelements (Fig. 3d). However, the content of certain classes of TEs differed between species, with the L. grandiflorum S-locus enriched for rolling circle TEs, in contrast to the S-loci of L. perenne and L. tenue (Fig. 3e).

Functional constraints on distyly candidate genes at the S-locus over 30 Mya

The presence of TSS1 and WDR-44 on the dominant haplotypes of the S-loci of L. grandiflorum, L. perenne and L. tenue (Gutiérrez-Valencia et al., 2022) despite major differences in S-locus gene content (Note S4) suggests that TSS1 and WDR-44 may be ancestrally shared and conserved at the S-locus due to their function in the determination of floral morph and/or SI. A simple molecular clock analysis of synonymous divergence at TSS1 and WDR-44 between L. perenne and L. tenue supported this conclusion, as it placed the split between these species at approximately 31-37 Mya (TSS1: t=36.6 Mya (± S.E. 7.3 Mya), dS=0.513±0.10; WDR-44: t=31.3 Mya (± S.E. 2.1 Mya), dS=0.438±0.03), consistent with these genes having been retained since the diversification of Linum ~33 Mya. While TSS1 is a single-copy gene with homologs in outgroups of Linum, WDR-44 is part of a gene family and harbors non-S-linked paralogs (Gutiérrez-Valencia et al., 2022). To determine when TSS1 and WDR-44 first came together in the same genomic region, and to quantify sequence-level constraint on these distyly candidate genes, we assembled sequences of TSS1, WDR-44 and a set of paralogs of WDR-44 from five additional distylous Linum species (Figs 1a, 4a-b; Table S1).
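The simple molecular clock estimates reported above for TSS1 and WDR-44 follow from the standard pairwise relation t = dS / (2μ), where dS is the synonymous divergence between the two species and μ is the synonymous substitution rate per site per year. The sketch below is illustrative only: the function name is hypothetical, and the rate of 7 × 10^-9 is an assumption, chosen because it approximately reproduces the reported point estimates; the rate actually used is not stated in this section.

```python
def divergence_time_mya(ds, ds_se, rate_per_site_per_year=7e-9):
    """Convert pairwise synonymous divergence into a split time in Mya.

    Uses t = dS / (2 * mu); the factor 2 accounts for substitutions
    accumulating along both lineages since the split.
    NOTE: the default rate is an assumption made for this sketch.
    """
    years_per_unit_ds = 1.0 / (2.0 * rate_per_site_per_year)
    t_mya = ds * years_per_unit_ds / 1e6
    # t is proportional to dS, so the standard error scales by the same factor.
    se_mya = ds_se * years_per_unit_ds / 1e6
    return t_mya, se_mya


# dS values and standard errors as reported for L. perenne vs L. tenue.
for gene, ds, ds_se in [("TSS1", 0.513, 0.10), ("WDR-44", 0.438, 0.03)]:
    t, se = divergence_time_mya(ds, ds_se)
    print(f"{gene}: t = {t:.1f} Mya (SE {se:.1f} Mya)")
```

Under the assumed rate, the point estimates land close to the reported 36.6 and 31.3 Mya. Small differences in the standard errors are expected, because the dS standard errors are given in rounded form in the text.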
Phylogenetic analysis indicated that the S-linked copy of WDR-44 originated by gene duplication approximately 37 Mya (95% highest posterior density interval (HPD): 30.43-48.22 Mya), suggesting that duplication and translocation of WDR-44 into a genomic region already harboring TSS1 occurred at or before the diversification of Linum (estimated to have occurred 33 Mya; 95% HPD: 27.2-38.3 Mya; Maguilla et al., 2021) (Figs 4b, S4). Consistent with this hypothesis, the closely related outgroup T. sinensis only harbored a sequence clustering with the non-S-linked paralogs of WDR-44, while WDR-44 sequences of more distant outgroups fell outside of the Linaceae. Taken together, these results support our inference that WDR-44 duplication occurred at or around the time of diversification of Linum (Fig. S4). If TSS1 and WDR-44 were retained at the distyly S-locus in Linum for >30 Mya, we would expect these genes to be under functional constraint. To test this hypothesis, we analyzed ratios of non-synonymous to synonymous divergence (dN/dS) across our eight Linum species. We found that for both TSS1 and the S-linked copy of WDR-44, a simple model with a single dN/dS across our Linum species was supported (TSS1: LRT, log LRT test statistic: 1.93, 8 df, NS; WDR-44: LRT, log LRT test statistic: 1.18, 8 df, NS), and both dN/dS estimates were well below 1 (dN/dS of 0.29±0.05 for TSS1, and 0.37±0.03 for WDR-44), consistent with both genes being under purifying selection (see also Note S5).
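The likelihood ratio tests above compare nested dN/dS models. As a minimal sketch of how such a statistic maps to a P value, the snippet below assumes that the reported "LRT test statistic" is the usual 2ΔlnL referred to a chi-square distribution with the stated degrees of freedom. This reading is an assumption, although it reproduces the P=0.0107 reported in this section for the 2-df comparison between the S-linked copy of WDR-44 and its paralogs. The helper name is hypothetical, and the closed form used here only covers even degrees of freedom, which suffices for the 2-df and 8-df tests.

```python
import math


def chi2_sf_even_df(stat, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = stat / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Test statistics and degrees of freedom as reported in the text.
for label, stat, df in [
    ("TSS1, single dN/dS", 1.93, 8),
    ("WDR-44, single dN/dS", 1.18, 8),
    ("WDR-44 vs paralogs", 9.08, 2),
]:
    print(f"{label}: P = {chi2_sf_even_df(stat, df):.4f}")
```

A large P value for the 8-df tests means the more complex model with lineage-specific dN/dS does not fit significantly better, which is why the single-ratio model is retained.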
However, elevated dN/dS of the S-locus copy of WDR-44 compared to its paralogs (0.37±0.03 vs 0.27±0.02; LRT, log LRT test statistic=9.08, 2 df, P=0.0107) suggests the possibility of relaxed purifying selection or, alternatively, more frequent positive selection on the S-locus copy of WDR-44. Elevated dN/dS at the S-locus copy might be expected under a model where duplication and neofunctionalization contributed to the formation of the distyly S-locus. These results suggest that the distyly S-locus of Linum formed at or before the diversification of Linum, and that the two S-locus candidate genes TSS1 and WDR-44, which are shared among widely diverged distylous Linum species, are under purifying selection, possibly related to their role in determining floral morph differences and/or SI.

Regulation of style length by brassinosteroids in widely divergent distylous Linum

The style length candidate gene TSS1 was present and conserved at the S-loci of L. tenue, L. perenne and L. grandiflorum, all of which exhibit style length polymorphism. We previously hypothesized that TSS1, which is primarily expressed in styles of S-morph individuals (Ushijima et al., 2015; Gutiérrez-Valencia et al., 2022), might result in shorter style cells and thereby shorter styles by downregulating brassinosteroid-responsive genes in a manner similar to its Arabidopsis homolog VUP1 (Grienenberger & Douglas, 2014). If so, treating floral buds with brassinosteroids should result in longer styles and style cells specifically in S-morph but not in L-morph Linum individuals. If the mechanism of action of TSS1 has remained conserved, we expect the effect of brassinosteroid treatment to be present in widely divergent distylous Linum species, as long as their S-locus harbors functional TSS1. To test this hypothesis, we conducted a brassinosteroid supplementation experiment where L- and S-morph flower buds of L. tenue and L. perenne were treated with brassinosteroid solution (eBL: 10 µM 24-epibrassinolide, dissolved in 0.1% of the solvent dimethylsulfoxide, DMSO) or control treatment (control: 0.1% DMSO only), followed by measurement of style and stamen length. In line with our expectation, eBL treatment resulted in significantly longer styles in both L. perenne (Table 2; Fig. 5a) and L. tenue (Table 2; Fig. 5b) due to the specific effect of eBL treatment on style length in S-morph individuals (L. perenne: Two-way ANOVA, interaction of treatment and morph, Table 2; Fig. 5c; L. tenue: Two-way ANOVA, interaction of treatment and morph, Table 2; Fig. 5d). On average, eBL treatment resulted in 0.82 mm (95% CI: 0.34-1.30 mm; Fig. 5f) longer styles in L. perenne S-morph individuals and 0.94 mm (95% CI: 0.50-1.38 mm) longer styles in L. tenue S-morph individuals. While styles were significantly longer in eBL-treated S-morph individuals, they were still shorter than those of L-morph individuals (Fig. 5a), possibly due to the timing of application and/or concentration of eBL treatment used. The eBL treatment had no effect on style length in L-morph individuals of L. perenne or L. tenue (Fig. 5a, b), and there was no significant interaction effect on stamen length in L. perenne or L. tenue (Table S4). To test whether the effect of brassinosteroid treatment on style length was mediated by style cell length, we measured epidermal style cell length in L. perenne after eBL and control treatment. There was a significant effect of eBL treatment on mean style cell length (F1,75, P<0.0001; Figs 5e, S5a; Table S5), as well as a significant interaction between eBL treatment and morph (F1,75=5.7, P=0.02; Fig.
5e; Table S5). The effect of eBL on mean style cell length in the S-morph was 12.6 µm (95% CI: 4.9-20.3 µm). The brassinosteroid treatment had no significant effect on mean style cell length in L-morph individuals (Fig. 5e; Fig. S5). Taken together, the impact of brassinosteroid treatment on style and style cell length specifically in S-morph individuals suggests that a mechanism relying on the brassinosteroid pathway, likely regulated by TSS1, contributes to style length differences between floral morphs in widely diverged distylous Linum.
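The morph-specific response described above corresponds to a treatment-by-morph interaction. As a minimal sketch of that logic, the snippet below computes the interaction contrast, (S-morph eBL − S-morph control) − (L-morph eBL − L-morph control), from cell means. The measurements are invented toy values in mm and are not data from this study, the function name is hypothetical, and the contrast is only the point estimate behind an interaction term, not the two-way ANOVA itself.

```python
from statistics import mean


def interaction_contrast(cells):
    """Return (S-morph effect, L-morph effect, interaction contrast).

    `cells` maps (morph, treatment) to a list of measurements; each effect
    is the treated mean minus the control mean within that morph.
    """
    s_effect = mean(cells[("S", "eBL")]) - mean(cells[("S", "control")])
    l_effect = mean(cells[("L", "eBL")]) - mean(cells[("L", "control")])
    return s_effect, l_effect, s_effect - l_effect


# Invented toy style lengths in mm, for illustration only.
toy = {
    ("S", "control"): [3.0, 3.2, 3.1],
    ("S", "eBL"): [3.9, 4.0, 4.1],
    ("L", "control"): [6.0, 6.1, 5.9],
    ("L", "eBL"): [6.0, 6.2, 5.8],
}

s_eff, l_eff, contrast = interaction_contrast(toy)
print(f"S-morph effect: {s_eff:.2f} mm")
print(f"L-morph effect: {l_eff:.2f} mm")
print(f"interaction contrast: {contrast:.2f} mm")
```

A contrast near zero would indicate that both morphs respond alike, whereas a clearly positive contrast, as in the toy values, is the pattern expected if the treatment lengthens styles only in the S-morph.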

Discussion

One of the most prominent examples of convergent floral evolution in plants is distyly, yet until recently little was known about the molecular nature of distyly S-loci and the mechanisms underlying this multi-trait balanced polymorphism. Here, we expand our understanding of the nature and evolution of distyly S-loci by leveraging haplotype-resolved genome assemblies of widely diverged Linum species. We show that the S-locus supergenes of three Linum species that diverged as far back as 33 Mya share the same genetic architecture, characterized by an S-morph-specific hemizygous region, although the sizes of the hemizygous regions vary greatly, from ~260 kb to ~3.8 Mb. Furthermore, all three Linum species share only two genes in the S-morph-specific region of their S-loci: the style length candidate gene TSS1 (Ushijima et al., 2015; Gutiérrez-Valencia et al., 2022) and the anther position/pollen SI candidate gene WDR-44 (Gutiérrez-Valencia et al., 2022, 2024). The presence of WDR-44 at the S-locus of L. grandiflorum, which lacks stamen length variation, is especially intriguing, given that WDR-44 has been hypothesized to be an anther height/pollen SI gene (Gutiérrez-Valencia et al., 2022). We cannot currently rule out conservation of WDR-44 in L. grandiflorum due to an effect of this gene on pollen SI, as L. grandiflorum shares heteromorphic SI with both L. tenue and L. perenne (Murray, 1986). Further detailed characterization and functional work will be required to determine the effects of WDR-44 in Linum.
We have previously shown that TSS1 is present in outgroups of Linum (Gutiérrez-Valencia et al., 2022), and here we use paralog dating to show that WDR-44 originated through gene duplication ~37 Mya (95% HPD: 30.4-48.2 Mya), suggesting that these two distyly candidate genes became co-located in one genomic region at or before the diversification of Linum s.l. ~33 Mya (Maguilla et al., 2021). The distyly S-locus therefore probably evolved early during diversification of Linum, through a process involving duplication and likely also neofunctionalization of WDR-44. Duplication and neofunctionalization of the anther height gene GLO2 (GLOT) have previously been documented at the distyly supergene in Primula (Li et al., 2016; Huu et al., 2020). Our results suggest that TSS1 was present and could have evolved presence-absence polymorphism regulating style length before WDR-44 was duplicated and became co-located with TSS1, broadly in line with predictions of the “pollen transfer” model of the evolution of distyly (Lloyd & Webb, 1992). However, we cannot currently rule out other scenarios, including one where both stamen and style length polymorphisms were established at the same time through a large indel generating presence-absence variation for both TSS1 and WDR-44. Therefore, while our results do not allow us to distinguish between major models for the evolution of distyly (Charlesworth & Charlesworth, 1979; Lloyd & Webb, 1992), they suggest the distyly S-locus evolved early during diversification of Linum.
Molecular evolutionary analyses of TSS1 and WDR-44 show that both of these S-locus genes are under purifying selection in Linum. To test whether sequence conservation is related to effects on distyly, we focused on TSS1, which has been hypothesized to regulate style length via its impact on the brassinosteroid pathway (Gutiérrez-Valencia et al., 2022). While functional studies of TSS1 are required to validate our findings, morph-specific effects of exogenous brassinosteroid application on style and style cell length in L. perenne suggest that TSS1 governs style length through its effect on brassinosteroid-regulated genes. Our finding that brassinosteroid supplementation also had a morph-specific effect on style length in L. tenue implies that the mechanism underlying style length polymorphism is conserved across these widely diverged distylous Linum species. Future studies should investigate whether brassinosteroid treatment also affects female SI reaction in Linum. While we cannot fully rule out the involvement of additional hormonal pathways in the regulation of style length in Linum, our results suggest that genes impacting the brassinosteroid pathway have repeatedly been recruited during convergent evolution of style length polymorphism in distylous species, including in Primula (Huu et al., 2022) and Turnera (Matzke et al., 2020, 2021). An early cessation of recombination at the supergene followed by independent evolution for an extended period could possibly explain the extensive variation in size, gene and repeat content that we observe among Linum distyly supergenes. Indeed, structural variants and repeats are expected to accumulate in non-recombining regions, especially over long evolutionary timescales (reviewed in Gutiérrez-Valencia et al., 2021b). Our observations of differences in the accumulation of repeat classes at the distyly S-locus among the studied species are consistent with independent TE accumulation in different lineages.
Recent genomic studies have documented gene movement into the S-locus (termed “stepwise assembly”; Huu et al., 2020; Gutiérrez-Valencia et al., 2022). In L. tenue, we previously found evidence for very recent gene movement into the S-locus (Gutiérrez-Valencia et al., 2022), suggesting that this process could still be ongoing. Our analyses here suggest that stepwise gene duplication, occurring independently and continuously in different Linum lineages, could help explain the marked differences in gene content we observe at Linum S-loci. Our study revealed strong differences in the gene content at the S-locus across divergent Linum species, despite the presence of conserved candidate genes and a conserved hormone-dependent mechanism regulating style length. Similar findings have been reported in other systems that share features with distyly supergenes. For instance, distyly supergenes and sex determining regions share the features of recombination suppression between alleles and morph-specific inheritance of one allele (reviewed in Gutiérrez-Valencia et al., 2021b). Therefore, it is perhaps not surprising that many of the patterns we document at the Linum distyly supergene are reminiscent of those at the sex determining region of Actinidia species (Akagi et al., 2023). Specifically, while we observe only two shared candidate genes for distyly located in an S-morph-specific genomic region across divergent Linum species, Akagi et al. (2023) similarly observed only three shared candidate sex determination genes located in a male-specific genomic region in divergent Actinidia species.
Like Akagi et al. (2023), we also find that molecular mechanisms underlying morphs appear conserved, despite marked sequence-level evolution of the morph-specific region. These similarities suggest that further investigation of the parallels between the evolution of plant mating system supergenes and sex determining regions is a fruitful avenue for future theoretical and empirical work. Taken together, our results shed light on the genetic architecture, origin and evolution of the Linum distyly supergene, revealing the presence of conserved candidate genes and pathways regulating distyly, despite marked differences in supergene size, gene and repeat content. Our results and the genome assemblies produced here provide a foundation for further work on the role of parallel genetic changes for convergent evolution of floral form and function in distylous species.

Acknowledgements

We thank Benjamin Laenen and Aurélie Desamoré for assistance with plant sampling, Alireza Foorozani, P. William Hughes, and Juanita Gutiérrez-Valencia for assistance with plant cultivation and flow cytometry, Jerker Eriksson for technical assistance with plant growth chambers, Tomas Larsson for bioinformatics advice and Matias Wanntorp for bioinformatic assistance. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 and Horizon Europe research and innovation programmes (grant agreement No 757451 and 101132305), from the Swedish Research Council (grant agreements 2019-04452 and 2023-04532), from the Erik Philip-Sörensen foundation to T.S., and from the Nilsson-Ehle foundation to P.I.Z. Z.P. was funded by a Carl Tryggers foundation grant (CTS21:1471) to T.S. The authors acknowledge support from the National Genomics Infrastructure (NGI) in Sweden, funded by Science for Life Laboratory, the Knut and Alice Wallenberg Foundation and the Swedish Research Council. Long-read sequencing was performed at the NGI Uppsala Genome Center, whereas short-read sequencing was performed by the NGI SNP&SEQ Technology Platform in Uppsala. Hi-C sequencing was performed by NGI in Stockholm. The computations were enabled by resources in projects SNIC2022/22-683, SNIC 2022/22-695, NAISS 2023/22-129, NAISS 2024/5-158, and NAISS 2023/4-5 provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at UPPMAX, funded by the Swedish Research Council through grant agreement no. 2022-06725.
Support by the National Bioinformatics Infrastructure Sweden is gratefully acknowledged.

Competing Interests

The authors declare no competing interests.

Author Contributions

T.S. conceived of and designed the study, acquired funding, supervised the work and wrote the original draft. A.L. performed experiments. P.I.Z., Z.P., M.F., L.S., E.P.W., I.B., A.C. and T.S. performed analyses. P.I.Z., Z.P. and A.L. revised and edited the manuscript, with additional comments and input from M.F., L.S., E.P.W., I.B., and A.C. P.I.Z. and Z.P. contributed equally.

Data Availability

All sequencing data generated in this study has been uploaded to the European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena/) and will be released upon acceptance. Original code has been uploaded to Zenodo (10.5281/zenodo.14500764). NBIS open-source pipelines for genome annotation are available at: https://github.com/NBISweden/GAAS; https://github.com/NBISweden/AGAT; https://github.com/NBISweden/pipelines-nextflow.

References

Akagi T, Varkonyi-Gasic E, Shirasawa K, Catanach A, Henry IM, Mertten D, Datson P, Masuda K, Fujita N, Kuwada E, et al. 2023. Recurrent neo-sex chromosome evolution in kiwifruit. Nature Plants 9: 393–402.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic Local Alignment Search Tool. J. Mol. Biol 215: 403–410.
Armbruster WS, Pérez-Barrales R, Arroyo J, Edwards ME, Vargas P. 2006. Three-dimensional reciprocity of floral morphs in wild flax (Linum suffruticosum): A new twist on heterostyly. New Phytologist 171: 581–590.
Barrett SCH. 2019. ‘A most complex marriage arrangement’: recent advances on heterostyly and unresolved questions. New Phytologist 224: 1051–1067.
Bateson W, Gregory R. 1905. On the Inheritance of Heterostylism in Primula. Proc. R. Soc. London B. 76: 581–586.
Besemer J, Lomsadze A, Borodovsky M. 2001. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Research 29: 2607–2618.
Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, et al. 2019. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology 15: e1006650.
Bushnell B. 2014. BBMap: A Fast, Accurate, Splice-Aware Aligner. Lawrence Berkeley National Laboratory. LBNL-7065E.
Castric V, Batista RA, Carré A, Mousavi S, Mazoyer C, Godé C, Gallina S, Ponitzki C, Theron A, Bellec A, et al. 2024.
The homomorphic self-incompatibility system in Oleaceae is controlled by a hemizygous genomic region expressing a gibberellin pathway gene. Current Biology 34: 1–10.
Charlesworth D. 2016. The status of supergenes in the 21st century: Recombination suppression in Batesian mimicry and sex chromosomes and other complex adaptations. Evolutionary Applications 9: 74–90.
Charlesworth D, Charlesworth B. 1979. A Model for the Evolution of Distyly. The American Naturalist 114: 467–498.
Chen S, Zhou Y, Chen Y, Gu J. 2018. Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34: i884–i890.
Cheng H, Concepcion GT, Feng X, Zhang H, Li H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18: 170–175.
Danecek P, McCarthy SA. 2017. BCFtools/csq: Haplotype-aware variant consequences. Bioinformatics 33: 2037–2039.
Darwin C. 1863. On the existence of two forms and on their reciprocal sexual relation in several species of the genus Linum. Bot J Linn Soc 26: 69–83.
Darwin C. 1877. The Different Forms of Flowers on Plants of the Same Species. London: Cambridge University Press.
Dulberger R. 1992. Floral Polymorphism and Their Functional Significance in the Heterostylous Syndrome. In: Barrett SCH, ed. Evolution and Function of Heterostyly. Berlin: Springer Verlag, 41–84.
Emms DM, Kelly S. 2019. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biology 20: 238.
Ernst A. 1936. Heterostylie-Forschung. Zeitschrift Für Induktive Abstammungs- Und Vererbungslehre 71: 156–230.
Ewels P, Magnusson M, Lundin S, Käller M. 2016.
MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32: 3047–3048.
Fawcett JA, Takeshima R, Kikuchi S, Yazaki E, Katsube-Tanaka T, Dong Y, Li M, Hunt HV, Jones MK, Lister DL, et al. 2023. Genome sequencing reveals the genetic architecture of heterostyly and domestication history of common buckwheat. Nature Plants 9: 1236–1251.
Foroozani A, Desmond EL, Gough CA, Pérez-Barrales R, Brennan AC. 2023. Sources of Variation in Reciprocal Herkogamy in the Distyly Floral Syndrome of Linum tenue (Linaceae). International Journal of Plant Science 184: 142–155.
Fulton T, Chunwongse J, Tanksley S. 1995. Microprep protocol for extraction of DNA from tomato and other herbaceous plants. Plant Molecular Biology Reporter 13: 1995.
Ganders FR. 1979. The biology of heterostyly. New Zealand Journal of Botany 17: 607–635.
Grienenberger E, Douglas CJ. 2014. Arabidopsis vascular-related unknown protein1 regulates xylem development and growth by a conserved mechanism that modulates hormone signaling. Plant Physiology 164: 1991–2010.
Gutiérrez-Valencia J, Fracassetti M, Berdan EL, Bunikis I, Soler L, Dainat J, Kutschera VE, Losvik A, Désamoré A, Hughes PW, et al. 2022. Genomic analyses of the Linum distyly supergene reveal convergent evolution at the molecular level. Current Biology 32: 4360-4371.e6.
Gutiérrez-Valencia J, Fracassetti M, Horvath R, Laenen B, Désamore A, Drouzas AD, Friberg M, Kolář F, Slotte T. 2021a. Genomic Signatures of Sexual Selection on Pollen-Expressed Genes in Arabis alpina. Molecular Biology and Evolution 39: msab349.
Gutiérrez-Valencia J, Hughes PW, Berdan EL, Slotte T.
2021b. The Genomic Architecture and Evolutionary Fates of Supergenes. Genome Biology and Evolution 13: 1–19.
Gutiérrez-Valencia J, Zervakis PI, Postel Z, Fracassetti M, Losvik A, Mehrabi S, Bunikis I, Soler L, Hughes PW, Désamoré A, et al. 2024. Genetic Causes and Genomic Consequences of Breakdown of Distyly in Linum trigynum. Molecular Biology and Evolution 41.
Holt C, Yandell M. 2011. MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12: 491.
Hu K, Ni P, Xu M, Zou Y, Chang J, Gao X, Li Y, Ruan J, Hu B, Wang J. 2024. HiTE: a fast and accurate dynamic boundary adjustment approach for full-length transposable element detection and annotation. Nature Communications 15: 5573.
Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, et al. 2012. InterPro in 2011: New developments in the family and domain prediction database. Nucleic Acids Research 40: D306–D312.
Huu CN, Kappel C, Keller B, Sicard A, Takebayashi Y, Breuninger H, Nowak MD, Bäurle I, Himmelbach A, Burkart M, et al. 2016. Presence versus absence of CYP734A50 underlies the style-length dimorphism in primroses. eLife e17956.
Huu CN, Keller B, Conti E, Kappel C, Lenhard M. 2020. Supergene evolution via stepwise duplications and neofunctionalization of a floral-identity gene. Proceedings of the National Academy of Sciences of the USA 117: 23148–23157.
Huu CN, Plaschil S, Himmelbach A, Kappel C, Lenhard M. 2022. Female self-incompatibility type in heterostylous Primula is determined by the brassinosteroid-inactivating cytochrome P450 CYP734A50. Current Biology 32: 671–676.
Johnson MG, Gardner EM, Liu Y, Medina R, Goffinet B, Shaw JA, Zerega NJC, Wickett NJ. 2016. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment. Applications in Plant Sciences 4: 1600016.
Kim D, Langmead B, Salzberg SL. 2015.
HISAT: A fast spliced aligner with low memory requirements. Nature Methods 12: 357–360. Korf I. 2004. Gene finding in novel genomes. BMC Bioinformatics 5: 1–9. Laibach F. 1923. Die abweichungen vom ‘“mechanischen”’ zahlenverhältnis der long- under kurz-griffel bei heterostylen pflanzen. Biol Zentralbl.: 148–157. Li H. 2013. Aligning sequence reads clone sequences and assembly contigs with BWA- MEM. arXiv - preprint: not peer reviewed. .CC-BY-NC-ND 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted April 17, 2025. ; https://doi.org/10.1101/2025.04.11.648331doi: bioRxiv preprint 26 Li J, Cocker JM, Wright J, Webster MA, McMullan M, Dyer S, Swarbreck D, Caccamo M, Van Oosterhout C, Gilmartin PM. 2016. Genetic architecture and evolution of the S locus supergene in Primula vulgaris. Nature Plants 2: 1–7. Lloyd DG, Webb CL. 1992. The evolution of heterostyly. In: Springer, ed. Evolution And Function Of Heterostyly. Monographs on Theoretical and Applied Genetics. S.C.H. Barrett. 151–178. Maguilla E, Escudero M, Ruíz-Martín J, Arroyo J. 2021. Origin and diversification of flax and their relationship with heterostyly across the range. Journal of Biogeography 48: 1994–2007. Matzke CM, Hamam HJ, Henning PM, Dougherty K, Shore JS, Neff MM, McCubbin AG. 2021. Pistil mating type and morphology are mediated by the brassinosteroid inactivating activity of the S-locus gene BAHD in heterostylous Turnera species. International Journal of Molecular Sciences 22: 10603. Matzke CM, Shore JS, Neff MM, McCubbin AG. 2020. The Turnera style S-locus gene TsBAHD possesses brassinosteroid-inactivating activity when expressed in Arabidopsis thaliana. Plants 9: 1–13. McDill J, Repplinger M, Simpson BB, Kadereit JW. 2009. 
The phylogeny of Linum and Linaceae subfamily Linoideae, with implications for their systematics, biogeography, and evolution of heterostyly. Systematic Botany 34: 386–405. Murray BG. 1986. Floral biology and self-incompatibility in Linum. Botanical Gazette 147: 327–333. Naiki A. 2012. Heterostyly and the possibility of its breakdown by polyploidization. Plant Species Biology 27: 3–29. Notredame C, Higgins DG, Heringa J. 2000. T-coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302: 205–217. Ou S, Jiang N. 2018. LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology 176: 1410–1422. Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. 2015. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33: 290–295. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, De Bakker PIW, Daly MJ, et al. 2007. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 81: 559–575. .CC-BY-NC-ND 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted April 17, 2025. ; https://doi.org/10.1101/2025.04.11.648331doi: bioRxiv preprint 27 Quinlan AR, Hall IM. 2010. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842. Raimondeau P, Ksouda S, Marande W, Fuchs AL, Gryta H, Theron A, Puyoou A, Dupin J, Cheptou PO, Vautrin S, et al. 2024. A hemizygous supergene controls homomorphic and heteromorphic self-incompatibility systems in Oleaceae. Current Biology 34: 1–10. Ruiz-Martin J, Santos-Gally R, Escudero M, Midgley JJ, Pérez-Barrales R, Arroyo J. 2018. 
Style polymorphism in Linum (Linaceae): a case of Mediterranean parallel evolution? Plant Biology 20: 100–111. Schneider CA, Rasband WS, Eliceiri KW. 2012. NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9: 671–675. Shore JS, Hamam HJ, Chafe PDJ, Labonne JDJ, Henning PM, McCubbin AG. 2019. The long and short of the S-locus in Turnera (Passifloraceae). New Phytologist 224: 1316– 1329. Simón-Porcar V, Escudero M, Santos-Gally R, Sauquet H, Schönenberger J, Johnson SD, Arroyo J. 2024. Convergent evolutionary patterns of heterostyly across angiosperms support the pollination-precision hypothesis. Nature Communications 15: 1237. Smit A, Hubley R. 2008. RepeatModeler Open-1.0. Retrieved from: http://www.repeatmasker.org. Smit A, Hubley R, Green P. 2013. RepeatMasker Open-4.0. Retrieved from: http://www.repeatmasker.org. Smith CD, Edgar RC, Yandell MD, Smith DR, Celniker SE, Myers EW, Karpen GH. 2007. Improved repeat identification and masking in Dipterans. Gene 389: 1–9. Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312–1313. Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. 2006. AUGUSTUS: Ab initio prediction of alternative transcripts. Nucleic Acids Research 34: W435–W439. Sveinsson S, McDill J, Wong GKS, Li J, Li X, Deyholos MK, Cronk QCB. 2014. Phylogenetic pinpointing of a paleopolyploidy event within the flax genus (Linum) using transcriptomics. Annals of Botany 113: 753–761. Ushijima K, Ikeda K, Nakano R, Matsubara M, Tsuda Y, Kubo Y. 2015. Genetic control of floral morph and petal pigmentation in Linum grandiflorum desf., a heterostylous flax. Horticulture Journal 84: 261–268. .CC-BY-NC-ND 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted April 17, 2025. 
Waterhouse RM, Seppey M, Simao FA, Manni M, Ioannidis P, Klioutchnikov G, Kriventseva EV, Zdobnov EM. 2018. BUSCO applications from quality assessments to gene prediction and phylogenomics. Molecular Biology and Evolution 35: 543–548.
Yang Z. 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24: 1586–1591.
Yang J, Xue H, Li Z, Zhang Y, Shi T, He X, Barrett SCH, Wang Q, Chen J. 2023. Haplotype-resolved genome assembly provides insights into the evolution of S-locus supergene in distylous Nymphoides indica. New Phytologist: 1–14.
Yasui Y, Mori M, Aii J, Abe T, Matsumoto D, Sato S, Hayashi Y, Ohnishi O, Ota T. 2012. S-LOCUS EARLY FLOWERING 3 is exclusively present in the genomes of short-styled buckwheat plants that exhibit heteromorphic self-incompatibility. PLoS ONE 7: e31264.
Zhao Z, Zhang Y, Shi M, Liu Z, Xu Y, Luo Z, Yuan S, Tu T, Sun Z, Zhang D, et al. 2023. Genomic evidence supports the genetic convergence of a supergene controlling the distylous floral syndrome. New Phytologist 237: 601–614.

Tables

Table 1. Total number of protein-coding genes annotated at the dominant and recessive S-hemizygous region of Linum grandiflorum, Linum perenne, and Linum tenue, the number and identity of genes shared between species, and the number of other genes of potential interest for floral morphology/distyly.

| | L. grandiflorum, dominant¹ | L. grandiflorum, recessive¹ | L. perenne, dominant¹ | L. perenne, recessive¹ | L. tenue, dominant¹ | L. tenue, recessive¹ |
|---|---|---|---|---|---|---|
| Total gene count² | 24 | 4 (2)³ | 32 | 11 (3)³ | 9 | 2 (2)³ |
| Shared gene count | 3 (2)⁴ | 0 | 3 (2)⁴ | 0 | 2⁴ | 0 |
| Shared genes | TSS1⁴, WDR-44⁴, MPT1⁵ | - | TSS1⁴, WDR-44⁴, MPT1⁵ | - | TSS1⁴, WDR-44⁴ | - |
| Other genes of interest | NOV | NOV | AGL8, AGL80 | AGL80 | | |

¹ Separate gene counts are given for the dominant and recessive S-haplotypes of each species.
² Gene counts exclude genes with TE-related functional annotation. For detailed annotation information on listed genes, see Data S1B.
³ Counts of annotated genes on the recessive haplotype that were also present on the dominant haplotype are given in parentheses.
⁴ Annotated genes shared between all three species.
⁵ Annotated genes shared between L. grandiflorum and L. perenne only.

Table 2. Brassinosteroid treatment has a morph-specific effect on style length (mm) in both distylous species, Linum perenne and Linum tenue.

| Species | Source of variation¹ | Df² | SS³ | MS⁴ | F⁵ | P-value |
|---|---|---|---|---|---|---|
| L. perenne | Morph | 1 | 60.36 | 60.36 | 244.24 | <0.0001 |
| | Treatment | 1 | 3.71 | 3.71 | 15.03 | <0.0001 |
| | Morph*Treatment | 1 | 1.51 | 1.51 | 6.11 | 0.02 |
| | Residuals | 42 | 10.38 | 0.25 | | |
| L. tenue | Morph | 1 | 142.7 | 142.7 | 537.1 | <0.0001 |
| | Treatment | 1 | 6.48 | 6.48 | 24.4 | <0.0001 |
| | Morph*Treatment | 1 | 2.44 | 2.44 | 9.2 | 0.003 |
| | Residuals | 72 | 19.13 | 0.27 | | |

¹ Analysis of variance (ANOVA) sources of variation.
² Degrees of freedom.
³ Sums of squares.
⁴ Mean squares.
⁵ F-statistic.
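As a reading aid for Table 2, each F-statistic is the mean square of a term divided by the residual mean square, where a mean square is SS divided by Df. The Python sketch below recomputes F from the Df and SS values printed in the table and compares it with the reported F. It is only an illustrative consistency check, not the authors' analysis code; the dictionary layout and the function name `recompute_f` are assumptions introduced here.

```python
# Df, SS and reported F are copied from Table 2. The data layout and the
# function name are illustrative assumptions, not the authors' code.
TABLE_2 = {
    "L. perenne": {
        "residual": (42, 10.38),  # (Df, SS)
        "terms": {  # term -> (Df, SS, reported F)
            "Morph": (1, 60.36, 244.24),
            "Treatment": (1, 3.71, 15.03),
            "Morph*Treatment": (1, 1.51, 6.11),
        },
    },
    "L. tenue": {
        "residual": (72, 19.13),
        "terms": {
            "Morph": (1, 142.7, 537.1),
            "Treatment": (1, 6.48, 24.4),
            "Morph*Treatment": (1, 2.44, 9.2),
        },
    },
}


def recompute_f(species):
    """Return {term: F}, with F = (SS_term / Df_term) / (SS_resid / Df_resid)."""
    entry = TABLE_2[species]
    resid_df, resid_ss = entry["residual"]
    ms_resid = resid_ss / resid_df
    return {
        term: (ss / df) / ms_resid
        for term, (df, ss, _reported) in entry["terms"].items()
    }


if __name__ == "__main__":
    for species, entry in TABLE_2.items():
        for term, f_value in recompute_f(species).items():
            reported = entry["terms"][term][2]
            print(f"{species:11s} {term:16s} F={f_value:7.2f} (reported {reported})")
```

Small differences between recomputed and reported values are expected, because the table rounds SS and MS to two decimals.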
Figure Legends

Fig. 1. Schematic phylogeny and floral morphs of the study species. a. Schematic phylogeny and divergence times of the species used in the study. The three main species of interest (L. grandiflorum, L. perenne and L. tenue) are highlighted with color. b. Floral morph of S-morph (left) and L-morph (right) L. tenue. c. Floral morph of S-morph (left) and L-morph (right) L. perenne. d. Floral morph of S-morph (left) and L-morph (right) L. grandiflorum. In panels b-d, positions of anthers (a) and stigmas (s) are marked in the figure, and magnification is indicated by a scale bar (5 mm). Part of the corolla and sepals were removed for improved visibility of sexual organ location.

Fig. 2. S-loci of Linum grandiflorum and Linum perenne harbor large S-morph hemizygous regions containing distyly candidate genes. Both L. grandiflorum contig h1tg000023l (a) and L. perenne contig ht1g000002l (b) harbor S-morph hemizygous S-linked regions (coverage values, left y-axis) that contain the key candidate distyly genes TSS1 and WDR-44. The size of the hemizygous region and the pattern of SNP association (points showing GWAS significance vs position, right y-axis; significance level α=0.01 indicated by a dashed line) differ between species. In each plot, darker and lighter lines correspond to S-morph and L-morph normalized coverage, respectively, surrounded by shaded regions indicating 95% confidence intervals. The grey areas correspond to regions hemizygous in S-morph individuals, based on coverage analysis and alignment of haplotype-resolved assemblies. The positions of key candidate genes TSS1 and WDR-44 are indicated by dotted lines and arrows, and the x-axis shows position on each contig (in base pairs).

Fig. 3. Gene and repetitive element content at the hemizygous S-locus regions. a-b. Schematic depiction of the haplotype structure and gene content on the dominant and recessive alleles at the hemizygous region of the L. grandiflorum (a) and L. perenne (b) S-locus. Genes are indicated by blue boxes, with arrows indicating orientation. Names of candidate genes (S-linked genes shared between Linum species) and genes present on both the recessive and dominant haplotypes are shown. c-e. The log2-fold enrichment of repetitive elements (c), long terminal repeat (LTR) elements (d) and DNA transposons (e) at the S-morph hemizygous S-locus region of L. grandiflorum, L. perenne and L. tenue. Colours indicate the -log10 p-value from a binomial test of repeat enrichment. Circle sizes denote the (log10-transformed) proportion of the S-locus region made up of a certain type of repeat.

Fig. 4. Phylogenetic trees of TSS1 and WDR-44 for eight style polymorphic species of Linum. a. Phylogenetic tree of the conserved S-locus candidate gene TSS1 for style polymorphic Linum (all distylous except L. grandiflorum, which exhibits stigma height dimorphism) reconstructed using RAxML under a GTR-GAMMA substitution model. Support values based on 100 bootstraps are indicated at each node. b. Phylogenetic tree of the conserved S-locus candidate gene WDR-44 for style length polymorphic Linum reconstructed using RAxML under a GTR-GAMMA substitution model. Support values based on 100 bootstraps are indicated at each node. Estimates of the inferred timing of duplication and age of each clade based on BEAST2 analysis are shown for WDR-44, with 95% confidence intervals indicated in square brackets.

Fig. 5. Brassinosteroid supplementation results in longer styles and style cells in S-morph but not L-morph individuals of widely divergent Linum species. a, b. Boxplots showing significantly longer S-morph but not L-morph styles in L. perenne (a) and L. tenue (b) after epibrassinolide (10 µM eBL in 0.1% DMSO) treatment of flower buds compared to control treatment (0.1% DMSO only). c, d. Interaction plots demonstrating a significant interaction between floral morph and eBL treatment, in both L. perenne (c) and L. tenue (d). e. eBL treatment results in significantly longer epidermal style cells in S-morph but not L-morph individuals of L. perenne. f. Photograph of control and eBL-treated L. perenne S-morph sexual organs, showing the effect of eBL treatment on style length. Stigma positions are indicated by arrows.

[Fig. 1 image. Panel a: schematic phylogeny on a time axis from 30 to 0 Mya (Paleogene and Neogene; Eocene, Oligocene, Miocene, Pliocene, Pleistocene) with tips Tirpitzia sinensis, Linum grandiflorum, Linum narbonense, Linum perenne, Linum tenue, Linum capitatum, Linum suffruticosum, Linum flavum and Linum tauricum, and clade labels sect. Linum, sect. Linopsis B, sect. Syllinum and sect. Linopsis C. Panels b-d: photographs of S-morph (S/s) and L-morph (s/s) flowers with anthers (a) and stigmas (s) marked; scale bars 5 mm.]
[Fig. 2 image. Panels a and b: left y-axis, coverage; right y-axis, −log10 p-value (FDR); x-axis, position (bp). Positions of TSS1 and WDR-44 are marked. Region annotations in the image: 11244948 to 12417818 (1.2 Mbp) and 1084734 to 4887446 (3.8 Mbp).]

[Fig. 3 image. Panels a and b: dominant and recessive haplotype schematics for Linum grandiflorum (gene labels WDR44, NOV, TSS1, MPT1, 4CLL6) and Linum perenne (gene labels AGL80, SDH2-2, SERGT1, WDR44, MPT1, TSS1, AGL8); scale bars 100 kb. Panels c-e (repetitive elements, LTR elements, DNA transposons): x-axis, TE enrichment [log2(fold of S-locus to genome)]; rows, L. tenue, L. perenne and L. grandiflorum per repeat class (retrotransposons, DNA transposons, rolling circle, unclassified, small RNA, simple repeats, low complexity; Ty1/Copia, Ty3-like, DIRS1, retroviral; hobo/Activator, Tc1/IS630/Pogo, Tourist/Harbinger). Legends: −log10 p value (colour) and proportion of S-locus TEs (log10, circle size).]

[Fig. 4 image. Panel a: TSS1 tree with tips L. suffruticosum, L. flavum, L. tauricum, L. capitatum, L. tenue, L. narbonense, L. grandiflorum and L. perenne. Panel b: WDR-44 tree with clades labelled "putative S-locus sequences" and "paralog sequences"; node age labels 37.5, 33.2 and 23.0 Mya with intervals [30.4-48.2], [30.0-36.5] and [9.0-38.5]. Bootstrap support values are shown at nodes.]

[Fig. 5 image. Panels a-d: style length (mm) by morph (S-morph, L-morph) and treatment (eBL, control). Panel e: mean cell length, log(µm), by morph and treatment. Significance marks shown as *** or NS. Panel f: photographs labelled "S-morph - Control" and "S-morph - eBL".]
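The Fig. 3c-e legend describes a log2-fold enrichment of repeats at the S-locus relative to the genome, together with a p-value from a binomial test. The source does not state how the counts were defined, so the Python sketch below only illustrates the general idea under stated assumptions: counts are treated as a number of units (for example windows or elements) in the region that belong to a repeat class, the genome-wide fraction is taken as the null probability, and the test is one-sided for enrichment. The function names and all numbers are hypothetical.

```python
import math


def log2_fold_enrichment(hits_region, total_region, genome_fraction):
    """log2 of (fraction of the region in a repeat class) / (genome-wide fraction)."""
    region_fraction = hits_region / total_region
    return math.log2(region_fraction / genome_fraction)


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p); intended for small n only."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical example: 60 of 100 units in the region fall in a repeat
    # class that makes up 30% of the genome.
    hits, total, genome_fraction = 60, 100, 0.30
    enrichment = log2_fold_enrichment(hits, total, genome_fraction)
    p_value = binomial_upper_tail(hits, total, genome_fraction)
    print(f"log2 fold enrichment: {enrichment:.2f}")
    print(f"-log10 p-value: {-math.log10(p_value):.1f}")
```

A negative log2 value indicates depletion; in that case the lower tail, P(X <= k), would be the natural one-sided test. For counts in the millions (for example base pairs), an exact sum like this is impractical and a statistics library would normally be used instead.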


License: CC-BY-NC-ND-4.0