Genome Evolution and Diversity of Wild and Cultivated Rice Species | Research Square

Biological Sciences - Article

Weixiong Long, Qiang He, Yitao Wang, Yu Wang, Jie Wang, Zhengqing Yuan, and 11 more

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-4350570/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 18 Nov, 2024. Read the published version in Nature Communications.

Version 1 posted. You are reading this latest preprint version.

Abstract

Rice (Oryza sativa L.) is a vital staple food globally, but its genetic diversity has decreased due to extensive breeding. However, research on the genome evolution and diversity of wild rice species, particularly those with BB, CC, BBCC, CCDD, EE, FF, and GG genome types, is limited, which restricts their potential use in rice breeding 1,2. This study presents chromosome-scale genomes of thirteen representative wild rice species from the Oryza genus.
By integrating these genomes with four previously published ones, we identified a total of 101,723 gene families across the genus, including 9,834 (9.67%) core gene families. Additionally, we discovered 63,881 new gene families that are absent from cultivated rice species. Comparative genomic analysis among Oryza genomes reveals potential mechanisms by which transposable elements influence genome size variation, centromere evolution, and gene number and expression. Extensive structural rearrangements, large-scale subgenome exchanges, and widespread allelic and regulatory sequence variations were discovered in wild rice. We observed a pervasive inversion in Oryza rufipogon and Oryza sativa japonica that is tightly linked to a locus that may have contributed to the expansion of their geographical range. Interestingly, cultivated genomes showed a notable expansion but lower diversity of disease resistance genes, likely due to the random loss of some R genes and the extensive amplification of others targeting specific diseases during domestication and artificial selection. This comprehensive study not only makes a previously hidden genetic legacy accessible to genetic studies and breeding but also deepens our understanding of rice evolution and biology.

Subject areas: Biological sciences/Genetics/Genome/Genetic variation; Biological sciences/Plant sciences/Plant evolution

Introduction

Rice (Oryza sativa L.) is a crucial crop globally and serves as a key model species for monocot and crop plant research 3. Rice production will need to double by 2050 to meet the demand of a growing world population 4. Exploiting genes that show presence/absence variation, together with interspecific allelic diversity, is expected to help break the current production bottleneck in which rice yields have stagnated 5,6. The Oryza genus comprises two cultivated rice species, Asian and African rice, along with 20 extant wild rice species.
These wild species, which represent six diploid (AA, BB, CC, EE, FF and GG) and five allotetraploid (BBCC, CCDD, HHJJ, HHKK and KKLL) genome types, are adapted to diverse ecological environments across Asia, Africa, the Americas, and Australia and exhibit great genetic and phenotypic diversity 7,8. Recent advances in obtaining high-quality genome assemblies of cultivated rice have enabled thorough characterization of structural variation through comparative genomic analysis 9–11. Various studies have constructed pangenomes that integrate cultivars and the AA genomes of wild rice 12,13. However, high-quality reference genomes for 11 wild rice relatives (BB, BBCC, CC, EE, FF, and GG genome types) and a comprehensive super pangenome of the rice genus are still lacking. Here, we de novo assembled high-quality genomes of 13 wild rice accessions (Supplementary Table 1) and integrated them with published genomes of the two Asian cultivated rice subspecies (cv. Nipponbare, referred to as NIP, of the japonica subspecies and R498 of the indica subspecies), Oryza glaberrima (cv. CG14, African cultivated rice) and Oryza rufipogon (W1943) 9,14–16. This Oryza super pangenome provides rice researchers and breeders with a wealth of new resources to improve cultivated rice and meet future food demands.

Results

High-quality assemblies of thirteen representative wild rice species

We selected thirteen representative wild rice species, including six allotetraploids and seven diploids, for de novo genome assembly. These accessions vary considerably in geographical distribution and phenotype (Fig. 1a). A total of 331.3 Gb of high-fidelity (HiFi) reads was generated for the 13 wild rice accessions, representing approximately 40-fold coverage relative to the NIP genome size of around 400 Mb (Supplementary Table 1).
By integrating high-throughput chromosome conformation capture (Hi-C) data 17,18, all 13 wild rice accessions were assembled at the chromosome level, with genome sizes ranging from 393.77 Mb to 903.31 Mb and an average contig N50 of 36.80 Mb (Supplementary Table 2, Supplementary Fig. 1, Extended Data Figs. 1, 2). Mapping Illumina short reads and HiFi reads to the 13 wild rice assemblies yielded high alignment rates of more than 99.88% and 99.95%, respectively (Supplementary Fig. 2). On average, 99.44% of the 1,614 conserved single-copy orthologs (BUSCO) were identified in these assemblies 19. Furthermore, the assemblies had an average LTR assembly index (LAI) of 23.64 20, along with high consensus quality values, indicating high quality, continuity and completeness (Supplementary Table 2, Supplementary Fig. 1). Gene structures were annotated on the repeat-masked genomes using a combination of de novo, homology-based, and transcript-based prediction. This yielded 37,711 to 85,846 protein-coding genes per assembly, compared with the 41,096 previously reported for O. brachyantha and NIP 21 (Supplementary Table 2). Transcript data supported a high percentage (71.1–87.7%) of the predicted protein-coding genes, indicating the quality of the gene annotations (Supplementary Table 1). A phylogenetic tree based on 3,555 single-copy orthologous genes grouped the 17 rice accessions into 5 clusters, which contradicted previous findings 22 (Fig. 1b). Phylogenetic inference using 1,000 randomly selected single-copy orthologs showed that 66.42% of the gene trees supported this topology, suggesting potential incomplete lineage sorting or gene flow (Fig. 1b). Additionally, a maximum-likelihood (ML) tree based on chloroplast data was constructed to trace the maternal progenitors of the allopolyploid wild rice species (Supplementary Fig. 3).
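Contig N50, used above to summarize assembly continuity, is the length L such that contigs of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of that definition; the helper names (`n50`, `mean_n50`) are hypothetical, and averaging per-assembly N50 values with an arithmetic mean is an assumption, since the text does not say how the 36.80 Mb average was computed.

```python
from statistics import mean


def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from longest to shortest and their lengths are summed
    until the running total reaches at least half of the assembly size; the
    length of the contig at which that happens is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("at least one contig length is required")


def mean_n50(assemblies):
    """Arithmetic mean of per-assembly N50 values (an assumed averaging scheme).

    assemblies: {accession_name: [contig lengths in bp]}
    """
    return mean(n50(lengths) for lengths in assemblies.values())


# Toy contig lengths, not data from this study
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Sorting longest-first is what makes N50 a length-weighted median: it reports the contig size at which half of the assembled sequence is already covered.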
Transposable elements (TEs) play a significant role in shaping large plant genomes and drive genome evolution through periodic bursts of amplification (Supplementary Fig. 4). The TE content of the 13 wild rice genomes ranged from 35.11% to 76.35%, with long terminal repeat retrotransposons (LTR-RTs) being the most abundant (Fig. 1).

Construction of the Oryza super pangenome reveals untapped genes and hidden genomic variation

The rice pangenome was expanded to include 16 species (17 accessions) of the Oryza genus, incorporating genomes from three AA-genome Oryza species. Using OrthoFinder 23, we constructed a super pangenome by clustering 808,478 predicted gene models from the 13 wild rice species, three previously published AA-genome rice genomes and the Oryza sativa reference genome (NIP), yielding 101,723 gene families (Extended Data Fig. 3a, Extended Data Fig. 4a). A total of 9.67% of the gene families (9,834) were shared among all 17 rice accessions and classified as core gene families. Dispensable gene families, present in 2–15 accessions, made up 56.84% of the Oryza pangenome, while 33.48% were species-specific gene families (Extended Data Fig. 3a, b). Compared with cultivated rice, our super pangenome provides an additional 63,881 new gene families. The 17 rice accessions, comprising 11 diploids and 6 allotetraploids, were categorized into diploid genome types labeled A to G (Extended Data Fig. 4b). Additionally, a syntenic pangenome was constructed to analyze the differentiation among the 7 diploid genome types, revealing 16,849 core gene families shared across these genomes. Dispensable families accounted for 29.73% of the total gene sets (Extended Data Fig. 4, Supplementary Table 3), and the largest proportion, 52.90%, consisted of private gene sets unique to individual genome types (Supplementary Table 3).
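The core/dispensable/private partition above depends only on how many accessions carry each gene family. The Python sketch below assumes a presence map already built from orthogroup membership (for example, parsed from OrthoFinder output elsewhere); the function names are hypothetical, and it treats any family found in more than one but not all accessions as dispensable, which is a simplifying assumption rather than the study's exact rule.

```python
from collections import Counter


def classify_families(presence, n_accessions):
    """Label gene families by how many accessions carry them.

    presence: {family_id: set of accession IDs with >= 1 member gene}
    Returns {family_id: "core" | "dispensable" | "private"}.
    """
    labels = {}
    for family, accessions in presence.items():
        k = len(accessions)
        if not 1 <= k <= n_accessions:
            raise ValueError(f"{family}: {k} accessions is out of range")
        if k == n_accessions:
            labels[family] = "core"
        elif k == 1:
            labels[family] = "private"
        else:
            labels[family] = "dispensable"
    return labels


def category_percentages(labels):
    """Percentage of families in each category, rounded to two decimals."""
    counts = Counter(labels.values())
    total = len(labels)
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


# The reported core fraction follows from the two counts given in the text:
print(round(100 * 9834 / 101723, 2))  # 9.67
```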
Within the Oryza genus, approximately 81.14% of core genes could be assigned to protein domains in the Pfam and InterPro databases, nearly twice the percentage for dispensable genes (41.82%) and more than seven times that for accession-specific genes (10.83%) (Extended Data Fig. 3d). Core genes exhibited 6- to 20-fold higher expression levels than shell and private genes (Extended Data Fig. 3e). A similar tendency was reflected in gene length, and core genes showed significantly lower (0.15-fold on average) pairwise nonsynonymous/synonymous substitution ratios (Ka/Ks) than dispensable genes (Extended Data Figs. 3f, g), indicating functional conservation among core genes in the Oryza genus, whereas variable genes evolved more rapidly to adapt to diverse environments. The average LTR insertion ratio of core genes was notably lower than that of shell and species-specific genes (Extended Data Fig. 3i), possibly related to the lower number of exons per gene in accession-specific genes compared with dispensable and core genes (Supplementary Fig. 5), which suggests that exon shuffling or loss contributes to the species specificity of these genes. Intriguingly, core genes in the Oryza genus are primarily involved in fundamental functions such as transposition, iron ion binding, transport, and electron transport, indicating their role in maintaining essential activities in the genus (Extended Data Fig. 3h, Supplementary Table 3). A further GO analysis of the private genes of the different genome types showed distinct functional differences, except for disease resistance (Supplementary Fig. 6). This Oryza super pangenome opens the door to using non-AA-genome wild rice resources in rice biology and breeding.
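The domain-annotation comparison above reduces to a per-category hit rate. A minimal Python sketch follows, assuming a gene-to-category table and a set of gene IDs with at least one Pfam or InterPro hit (both hypothetical inputs); the last lines only re-derive the fold differences from the percentages given in the text.

```python
from collections import defaultdict


def annotation_rate_by_category(gene_category, annotated_genes):
    """Percentage of genes per pangenome category with >= 1 Pfam/InterPro hit.

    gene_category: {gene_id: "core" | "dispensable" | "private"}
    annotated_genes: set of gene IDs carrying at least one domain hit.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for gene, category in gene_category.items():
        totals[category] += 1
        if gene in annotated_genes:
            hits[category] += 1
    return {category: 100 * hits[category] / totals[category] for category in totals}


# Fold differences re-derived from the percentages reported in the text
core, dispensable, specific = 81.14, 41.82, 10.83
print(f"core vs dispensable: {core / dispensable:.2f}-fold")  # core vs dispensable: 1.94-fold
print(f"core vs specific: {core / specific:.2f}-fold")  # core vs specific: 7.49-fold
```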
Transposon signatures contribute to variation in genome and centromere size in rice

The selective removal and retention of TEs has significantly influenced the genome size, adaptation and evolution of the Oryza genus 24. A long-standing question is which specific TE subfamilies influence genome size 25. To address this, we comprehensively classified TEs within the Oryza genus and analyzed in depth the expansion profiles of long terminal repeat (LTR) subfamilies (the top six, selected based on the largest Oryza genome). Each LTR-RT subfamily displayed a unique pattern of amplification across species, affecting gene numbers and expression (Extended Data Fig. 5b, Supplementary Fig. 7a, Supplementary Table 4). In addition to the amplification of Gypsy subfamilies (Ogre, Retand, and Tekay), the Angela subfamily of the Copia superfamily emerged as a significant contributor to the large genome size of O. australiensis (E genome type), distinguishing it from other rice species (Extended Data Fig. 5b, c, Supplementary Fig. 7a). The genome sizes of the B, C, and D genome types were primarily influenced by three Gypsy subfamilies, in descending order: Ogre, Retand and Tekay (Extended Data Fig. 5c, Supplementary Table 4). The genome size of the G genome type (O. meyeriana) was predominantly influenced by Retand amplification (Extended Data Fig. 5c), and Tekay was more abundant in O. glumaepatula than in O. sativa, resulting in its slightly larger genome. Notably, O. brachyantha (F genome type) exhibited marked elimination of all LTR subfamilies relative to O. sativa (Extended Data Fig. 5c), consistent with the observed LTR density patterns across Oryza genomes (Supplementary Fig. 8). Genome size correlated more strongly with the Retand LTR subfamily than with the other subfamilies within Oryza (Extended Data Fig. 5d).
Although the SIRE and CRM subfamilies made up a small portion of the genome, they influenced the expression of a certain proportion of genes (Extended Data Fig. 5c, Supplementary Fig. 7b, Supplementary Table 4). The genome-wide distribution of intact LTRs indicated that most LTR bursts occurred near centromeric regions (Supplementary Fig. 8). Rice centromeres consist of organized satellite repeats (SRs) interrupted by centromere-specific retrotransposons (CRRs) 26. It remains unclear whether other lines or wild rice species possess unique centromeric satellites and whether centromere repositioning occurred during centromere evolution. In cultivated rice (MH63 and ZS97), 155 bp and 165 bp CentO satellite repeats were categorized into seven distinct subsets across the 12 chromosomes 27. Interestingly, only a few copies of satellite repeats were identified in the wild species O. meyeriana and O. brachyantha. Compared with the cultivated rice (NIP) genome, the C genome contains centromere-specific 126 bp and 366 bp Cent satellite repeats (Supplementary Table 4). Phylogenetic analysis showed that the satellite repeats in Oryza can be classified into four groups (Extended Data Fig. 5e), with chromosomes of the same type tending to cluster together, supporting models of repeated amplification events involving the central domain and local homogenization. Examination of centromeres in the Oryza genus revealed that gene density decreases towards centromeres, whereas transposon density and k-mer frequency increase (Extended Data Fig. 5f). Wild rice centromeres, except those of O. brachyantha, were notably larger than those of cultivated rice, yet they contained fewer genes (Extended Data Fig. 5g, Supplementary Fig. 7c).
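Density profiles around centromeres, like the gene and transposon gradients described above, are typically summarized in fixed-size windows. A minimal Python sketch, assuming feature start positions in bp and a known centromere midpoint; the 1 Mb default window and both function names are hypothetical.

```python
def window_counts(positions, chrom_length, window=1_000_000):
    """Count features (e.g. gene or TE start positions, in bp) per fixed window."""
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[pos // window] += 1
    return counts


def counts_by_centromere_distance(counts, centromere_mid, window=1_000_000):
    """Pair each window's count with its distance, in windows, from the centromere."""
    centromere_window = centromere_mid // window
    return [(abs(i - centromere_window), count) for i, count in enumerate(counts)]
```

Plotting the resulting (distance, count) pairs for genes and for TEs on the same axes would show the opposing gradients the text describes.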
A comparative sequence map of centromere synteny between cultivated and wild rice highlighted extensive structural rearrangements in centromeric and pericentromeric regions across the Oryza genus (Extended Data Fig. 5h, Supplementary Fig. 9, Supplementary Table 4). Several centromere repositioning events were also detected in the synteny analysis (Supplementary Fig. 9).

Large-scale chromosomal rearrangements and the evolutionary history of Oryza genomes

To compare karyotype stability between cultivated rice and wild diploid genomes, we constructed a synteny map and performed whole-genome pairwise alignments to identify large segmental variations such as translocations and inversions 7. Large inversions (spanning more than 5 consecutive genes) shared by at least two consecutive species in the Oryza genus were prevalent in the genome alignments 11 (Extended Data Fig. 6). Reports on segregating inversions in wild rice are scarce and have not yet included natural polyploid wild rice; here, most inversion events were observed in the low-recombination pericentromeric regions of Oryza chromosomes, with a few being species-specific (Extended Data Fig. 6). The AA genome type showed a high level of chromosomal conservation. Furthermore, the gene synteny results support the phylogenetic relationships inferred for the rice genus (Supplementary Fig. 10). Through multiple species/genome comparisons, many large-scale genomic rearrangements were validated, including an inversion of an approximately 2.53 Mb segment comprising 166 genes on Chr6 of the modern cultivated rice NIP (Fig. 2a). This inversion occurred only in common wild rice and in Oryza sativa japonica accessions with high-latitude distributions, suggesting that it may have contributed to the expansion of their geographical range.
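The inversion criterion above (more than five consecutive genes) can be expressed as a scan for runs of syntenic anchors whose order is reversed in the query genome. The Python sketch below assumes anchors are already paired gene ranks from a synteny tool; requiring a strictly decreasing query rank within a run is a simplification (real pipelines tolerate small gaps and local shuffles), and the function name is hypothetical.

```python
def find_inversions(anchors, min_genes=6):
    """Find runs of syntenic genes whose order is reversed in the query genome.

    anchors: (ref_rank, query_rank) pairs for one chromosome pair, where each
    rank is the gene's position along its own chromosome.
    min_genes=6 encodes "more than 5 consecutive genes".
    Returns (first_ref_rank, last_ref_rank) for each qualifying reversed run.
    """
    anchors = sorted(anchors)
    inversions = []
    run_start = 0
    for i in range(1, len(anchors) + 1):
        # The current run continues only while the query rank keeps decreasing.
        continues = i < len(anchors) and anchors[i][1] < anchors[i - 1][1]
        if not continues:
            if i - run_start >= min_genes:
                inversions.append((anchors[run_start][0], anchors[i - 1][0]))
            run_start = i
    return inversions
```

For example, with toy anchors `[(0, 0), (1, 1), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 9)]`, reference genes 2 to 7 form a six-gene reversed run and are reported as `(2, 7)`.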
OsMFT1, which contributes to later flowering in Oryza sativa, is located in the inversion region, 89 kb from the breakpoint in Oryza glumaepatula 28 (Fig. 2b). Chromosomal rearrangements involving homoeologous groups 1, 3, and 6 of allotetraploid wild rice were first identified in Oryza species (Extended Data Fig. 7a). Comparison of syntenic blocks revealed that chromosome 6Dt in O. latifolia was completely collinear with the corresponding chromosome 3Dt in O. alta and O. grandiglumis. However, a reciprocal translocation was observed between 3Dt and 1Ct in O. alta and O. grandiglumis (Extended Data Fig. 7a), owing to fragmented collinearity between 3Ct in O. latifolia and 1Ct in O. alta and O. grandiglumis. Additionally, a translocation between 1Ct and 6Ct was identified, with a 1Ct segment translocated to the end of 6Ct. Furthermore, a translocation was detected in homoeologous groups 4 and 7 in Oryza (Extended Data Fig. 7c-i). By aligning allotetraploid wild rice resequencing data to the corresponding diploid BB and CC or CC and EE genomes, homoeologous exchanges on each chromosome were identified based on coverage depth calculated from uniquely mapped reads. Several translocations of large segments between subgenomes after tetraploidization were discovered (Extended Data Fig. 7b, c). Chromosomes Bt1 and Ct1 showed high synteny, but the read coverage depth on the BB and CC genomes indicated a significant homoeologous exchange between them (Extended Data Fig. 7b). The potential history of homoeologous exchange is depicted in Extended Data Fig. 7d.

Discovery of untapped SVs in Oryza genomes and their influence on agronomic traits

Despite extensive efforts to analyze genetic variants in cultivated rice and its ancestor O. rufipogon 29, the genetic diversity of distantly related wild rice species such as O. punctata, O. rhizomatis, and O. meyeriana remains poorly understood.
We identified 2,781–10,656 insertions, 2,680–10,419 deletions, 4–52 translocations, and 7–22 inversions in the 16 rice accessions, with sizes ranging from 162.49–278.65 Mb, 182.13–705.17 Mb, 8.64–887.29 kb, and 41.51–11.33 Mb, respectively (Extended Data Fig. 8a, Supplementary Table 5). Interestingly, cultivated rice and AA-genome wild rice showed more structural variations (SVs) than non-AA-genome wild rice, although their variations were smaller, likely because more of their sequence aligned to the reference genome (Extended Data Fig. 8a). Wild rice species-specific SVs accounted for a large portion of the total variation, indicating untapped genetic diversity in wild rice compared with cultivated varieties (Extended Data Fig. 8a). Most insertions, deletions, and inversions in cultivars were shorter than 5 kb. The number of SVs decreased markedly with increasing SV length in both cultivated and wild rice, but wild rice had more SVs larger than 250 kb, which accounts for the presence of numerous private genes in wild rice genomes (Extended Data Fig. 8b, Extended Data Fig. 9a). Intergenic regions were the most common locations of SVs within the Oryza genus, followed by regions within ±10 kb of genes (Extended Data Fig. 9b, Supplementary Table 5), in line with previous findings 9. Surprisingly, insertions and deletions were less frequent at chromosome ends (Extended Data Fig. 8c). The private genes of each rice genome type and their corresponding presence-absence variations (PAVs) relative to NIP were also recorded (Extended Data Fig. 8c, d). When the NIP genome sequence was used as the reference, cultivated rice displayed more SVs than wild rice, consistent with earlier observations (Extended Data Fig. 8a). However, wild rice, particularly O. australiensis and O. meyeriana, exhibited substantial variation in SV size, indicating an enrichment of SVs in repetitive DNA regions (Extended Data Fig. 9c). Further examination of transposable elements in PAV sequences revealed that TEs classified as "other" and DNA transposons were the primary components of both deletion and inversion variants (Extended Data Fig. 9d). By analyzing a large number of SVs across different rice genomes within a phylogenetic framework, we were able to uncover evolutionary events that would otherwise have gone undetected with a limited number of genomes. Recent findings suggest that gene loss can be linked to insertion/deletion events. For example, a 500 kb insertion relative to the NIP genome was identified on chromosome 12 at 14.50 Mb (Extended Data Fig. 8e). This insertion occurred only in O. eichingeri and the Ct subgenome of O. punctata. Further investigation revealed that the PAV region contained a gene (OPUW363G084108/OEIW71G043491) specific to C-genome wild rice 30 (Extended Data Fig. 8e). This gene may have arisen de novo and contributed to the ability of wild rice to adapt to poor and problem soils in Sri Lanka. Phylogenetic analysis revealed that the gene originated in O. eichingeri and was then transferred to O. punctata, providing evidence that O. eichingeri was the progenitor of the tetraploid O. punctata, consistent with our chloroplast phylogeny. This result indicates that the insertion arose during the speciation of C-genome wild rice, and that SVs in cultivated rice may have been acquired via introgression from hybridization with O. eichingeri. Multiple QTLs for brown planthopper resistance have been identified in O. officinalis, but the lack of wild rice genome sequences has hindered the cloning of these genes 31. In this study, we identified the Bph4 gene through a combination of comparative genome analysis and gene annotation within the QTL region (Supplementary Fig. 11).
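Two of the SV summaries reported earlier in this section, SV length classes (such as the share of SVs shorter than 5 kb or longer than 250 kb) and the TE composition of PAV sequences, can be sketched in Python as follows. The bin edges other than 5 kb and 250 kb, the half-open interval convention, and the function names are assumptions for illustration.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Bin edges in bp; only the 5 kb and 250 kb cut-offs come from the text.
BIN_EDGES = [50, 5_000, 50_000, 250_000]
BIN_LABELS = ["<50 bp", "50 bp-<5 kb", "5-<50 kb", "50-<250 kb", ">=250 kb"]


def bin_sv_lengths(lengths):
    """Count SVs per length class; each class includes its lower edge."""
    counts = Counter(BIN_LABELS[bisect_right(BIN_EDGES, n)] for n in lengths)
    return {label: counts[label] for label in BIN_LABELS}


def te_class_bp_in_svs(sv_intervals, te_intervals):
    """Sum the bp of each TE class that falls inside SV intervals.

    sv_intervals: [(start, end)] half-open intervals on one sequence.
    te_intervals: [(start, end, te_class)] on the same sequence.
    Overlapping annotations are not merged, so shared bases can be counted
    more than once; a real analysis would merge intervals first.
    """
    totals = defaultdict(int)
    for sv_start, sv_end in sv_intervals:
        for te_start, te_end, te_class in te_intervals:
            overlap = min(sv_end, te_end) - max(sv_start, te_start)
            if overlap > 0:
                totals[te_class] += overlap
    return dict(totals)
```

The nested loop is quadratic and only suitable for small inputs; interval trees or sorted sweeps are the usual choice genome-wide.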
Haplotype analysis indicated that Bph4 is highly conserved in cultivated rice but diverse in wild rice. Among the 13 wild rice accessions studied, only three species retained the functional S28 locus 32, whereas the others lacked either the ribosomal protein S27 gene or the nearby UDPGT gene (Extended Data Fig. 8f). Phylogenetic analysis of the Oryza genus suggested that the HS locus likely originated from O. australiensis and diverged from the C genome of wild rice (Extended Data Fig. 8f).

Allelic and regulatory element variations

Natural allelic variation of genes is essential for phenotypic diversity, environmental adaptation, and domestication 33–35. Because there are very few highly collinear blocks between non-AA-genome wild rice and cultivated rice (Extended Data Fig. 10, Supplementary Table 6), our analysis focused on genome-wide variation in alleles and their regulatory sequences (gene ±10 kb). As divergence from cultivated rice increased, the number of collinear genes shared between diploid wild rice and cultivated rice decreased, ranging from 18,463 to 23,812, with an average of 20,288 (Supplementary Table 6). Including 19 published high-quality chromosome-level cultivated rice genomes (Supplementary Table 1) allowed us to build a comprehensive SV resource for both wild and cultivated rice 9. By mapping collinear genes and their 10 kb flanking regions onto the corresponding regions of Nipponbare, we identified SNPs and InDels, with InDels of 50 bp or larger treated as PAVs. The total number of SVs increased with the number of accessions, with cultivated rice showing a higher percentage of nonredundant SVs than wild rice (Extended Data Fig. 11a). Wild rice genomes exhibited more alleles and gene haplotypes than cultivated rice (Extended Data Fig. 11b, c), indicating a rich source of novel genetic variation.
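Counting gene or CDS haplotypes, as described above, amounts to grouping accessions whose sequences for a given gene are identical. The Python sketch below assumes the per-accession sequences have already been extracted from the collinear gene models; exact-identity grouping (ignoring case) and the function name are illustrative assumptions, and real pipelines usually handle missing data and ambiguous bases explicitly.

```python
from collections import defaultdict


def group_haplotypes(sequences):
    """Group accessions that share an identical sequence for one gene.

    sequences: {accession: sequence}; pass the gene region to count gene
    haplotypes or the concatenated CDS to count CDS haplotypes.
    Returns one sorted list of accessions per haplotype, in sorted order.
    """
    groups = defaultdict(list)
    for accession, seq in sequences.items():
        groups[seq.upper()].append(accession)  # case-insensitive identity
    return sorted(sorted(members) for members in groups.values())
```

The number of haplotypes for a gene is then simply the length of the returned list.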
To further explore the functional impact of SVs on genes or proteins, it is essential to combine the variant alleles detected in each species into haplotypes and to annotate each accession independently. Wild rice (sub)genomes displayed more alleles in collinear core-genome genes than cultivated rice (Extended Data Fig. 11c). The numbers of gene haplotypes (gHap) and gene coding sequence (CDS) haplotypes (gcHap) in wild rice were significantly greater than in cultivated rice (Extended Data Fig. 11c). Analyses of protein diversity in collinear genes between wild and cultivated rice provided insight into their functional differentiation. A genome-wide clustering of proteins based on domain similarity revealed approximately 7 clusters in wild rice, corresponding to the number of wild rice genome types, whereas cultivated rice proteins predominantly formed one cluster, corresponding to the AA genome type (Extended Data Fig. 11d). Furthermore, analysis of gene presence-absence variations (PAVs) distinguished the major species and highlighted significant differences between wild and cultivated rice (Extended Data Fig. 10b-d). Most group-unbalanced genes (87.33%) were more prevalent in wild rice than in cultivated rice, underscoring the substantial legacy of mutations in wild rice (Extended Data Fig. 11e). Notably, selection for grain pericarp color during rice domestication is evident: wild rice species predominantly have black or red grains, whereas most cultivars have white pericarps. Structural variation analyses revealed distinct haplotypes of the Rc protein among cultivated and wild rice accessions 36. Compared with the Rc haplotype in cultivated rice, wild rice exhibited 7 haplotypes corresponding to different genome types, suggesting that genetic divergence in Rc played a role in grain pericarp development during domestication (Extended Data Fig. 11f).
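The text does not state how group-unbalanced genes were defined. One common way to test whether a gene's presence frequency differs between wild and cultivated accessions is Fisher's exact test on a 2x2 presence/absence table. The Python sketch below implements the two-sided test from the hypergeometric distribution using only the standard library; the function names and the 0.05 cut-off are assumptions, and a genome-wide screen would also need multiple-testing correction.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: wild, cultivated. Columns: gene present, gene absent.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 "present" calls fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def is_group_unbalanced(wild_present, n_wild, cult_present, n_cult, alpha=0.05):
    """Flag a gene whose presence frequency differs between the two groups."""
    p = fisher_exact_two_sided(wild_present, n_wild - wild_present,
                               cult_present, n_cult - cult_present)
    return p < alpha
```

The direction of imbalance (more frequent in wild or in cultivated rice) is read from the table itself, for example by comparing `wild_present / n_wild` with `cult_present / n_cult`.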
Gene CNVs and the NLR repertoire in rice

Recent studies have highlighted the significant role of genomic copy number variations (gCNVs) in crop evolution and domestication 37,38. However, accurately identifying gCNVs in the highly repetitive genome sequences of the rice genus poses notable challenges. Leveraging our high-quality assemblies, we systematically investigated gCNVs by aligning collinear blocks of the rice accessions against the Nipponbare reference genome and assessed their potential impact on important agronomic traits. Through whole-genome comparisons, we identified 207 genes with tandem repeats across the 14 wild rice assemblies, potentially influencing yield, grain quality, heading date, and biotic and abiotic stress resistance (Supplementary Table 7). To gain further insight into the functional roles of gCNVs in rice, we analyzed 4,400 genes with known functions from a previous study 39. Among these, 36 genes exhibited tandem repeats in the rice genus and affect agronomic traits related to yield, disease and pest resistance (e.g., blast, bacterial blight, and brown planthopper), biotic stress tolerance, element transport, and other important adaptive traits such as heading date and hybrid sterility (Fig. 3a). Additionally, we assessed the expression levels of selected gCNVs to investigate potential alterations in their expression profiles. Notably, several variations linked to the Pi9 cluster, with gCNVs in the 10.38 Mb region of NIP chromosome 6, were also identified. Pi9 is a well-known rice gene that confers strong and durable resistance to the fungus M. oryzae 40. Interestingly, Pi9 is a typical NLR gene with copy number variation, which contributed to the environmental adaptation of rice species (Supplementary Fig. 12). Genes encoding nucleotide-binding domain and leucine-rich repeat (NLR) proteins play a crucial role in plant immune systems 41.
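Tandem duplicates of the kind counted above are often identified as members of the same gene family sitting next to each other on a chromosome. A minimal Python sketch, assuming genes are given with their chromosome, rank along the chromosome and a family (orthogroup) label; the adjacency threshold `max_gap` and the function name are hypothetical, and the study's actual gCNV pipeline (collinear-block alignment against Nipponbare) is not reproduced here.

```python
from collections import defaultdict


def tandem_arrays(genes, max_gap=1):
    """Group same-family genes that lie close together on one chromosome.

    genes: iterable of (chrom, rank, family), where rank is the gene's order
    along the chromosome and family is an orthogroup or gene-family label.
    Family members whose ranks differ by at most max_gap are chained
    (max_gap=1 means directly adjacent; this threshold is hypothetical).
    Returns {family: [(chrom, [ranks]), ...]} for arrays of two or more genes.
    """
    ranks = defaultdict(list)
    for chrom, rank, family in genes:
        ranks[(family, chrom)].append(rank)

    arrays = defaultdict(list)
    for (family, chrom), positions in sorted(ranks.items()):
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)
            else:
                if len(current) > 1:
                    arrays[family].append((chrom, current))
                current = [pos]
        if len(current) > 1:
            arrays[family].append((chrom, current))
    return dict(arrays)
```

Comparing the array sizes for one family across accessions (for example, against Nipponbare) then gives a simple per-family copy-number difference.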
Because NLR genes are central to plant immunity, a comprehensive and accurate NLR dataset for the rice genus is essential. Plant NLRs often occur in clusters, which makes their identification challenging. To address this issue, we used the RGAugury 42 and DupGen_finder 43 tools and identified a total of 7,048 NLR genes across the rice genus (Supplementary Table 7). The number of NLR genes ranged from 419 in O. glaberrima to 511 in O. sativa indica (R498) among cultivated rice and from 159 in O. australiensis to 669 in O. punctata among wild rice (Fig. 3b, Supplementary Table 7), suggesting that the immune system of wild rice has a more diverse evolutionary history than that of cultivated rice. We identified and categorized NLRs across rice species to establish a comprehensive understanding of NLR diversity within the genus. Interestingly, diploid wild rice genomes contained fewer NLRs than cultivated rice genomes, despite their larger genome sizes (Fig. 3f). For instance, the genomes of O. australiensis and O. meyeriana, although twice the size of NIP, contain only half as many NLRs as cultivated rice (Fig. 3f). Analysis of NLR distribution showed that while the number of R gene singletons was similar between wild and cultivated rice, cultivated rice tended to have more R genes in pairs or clusters (Fig. 3b-d). Redundancy analysis revealed that 55.64% of NLR signatures were shared across all genomes, with 15 signatures unique to the cultivated group and 162 unique to the wild group (Fig. 3g). As the number of cultivated rice accessions increased, the number of core NLR signatures also tended to increase. Redundancy analysis of NLR genes in wild and cultivated rice revealed that 78.8% of the NLR genes in cultivated rice were dispensable, slightly lower than in wild rice (Fig. 3h).
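The singleton/pair/cluster breakdown above can be approximated by chaining NLR genes that lie close together on the same chromosome. The Python sketch below assumes NLR coordinates have already been called (for example by RGAugury); the 200 kb chaining distance and the function name are hypothetical choices, not parameters reported in the study.

```python
def classify_nlr_loci(nlrs, max_distance=200_000):
    """Count NLR singletons, pairs and clusters by physical proximity.

    nlrs: iterable of (gene_id, chrom, start_bp). Consecutive NLRs on the same
    chromosome whose starts are less than max_distance bp apart are chained
    into one locus; the 200 kb default is a hypothetical choice.
    Returns (summary counts, list of loci as gene-ID lists).
    """
    loci = []
    last_chrom, last_start = None, None
    for gene_id, chrom, start in sorted(nlrs, key=lambda g: (g[1], g[2])):
        if loci and chrom == last_chrom and start - last_start < max_distance:
            loci[-1].append(gene_id)  # extend the current locus
        else:
            loci.append([gene_id])  # start a new locus
        last_chrom, last_start = chrom, start

    summary = {"singleton": 0, "pair": 0, "cluster": 0}
    for locus in loci:
        if len(locus) == 1:
            summary["singleton"] += 1
        elif len(locus) == 2:
            summary["pair"] += 1
        else:
            summary["cluster"] += 1
    return summary, loci
```

Because genes are chained to their nearest upstream neighbour, a long run of closely spaced NLRs becomes a single cluster even if its two ends lie far apart; published definitions often add a limit on intervening non-NLR genes.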
More than 90% of NLR genes in the core NLR genome were expressed in both wild and cultivated rice, whereas around 20% of NLR genes in the dispensable genome showed low or no expression under normal conditions, suggesting that they are expressed specifically upon pathogen challenge (Fig. 3i). We classified the NLRs of the rice genus into 369 clusters, 167 of which were sharply expanded in cultivated rice (Supplementary Table 7), including well-studied rice R gene families that provide resistance to bacterial blight caused by Xanthomonas oryzae pv. oryzae (Xoo), such as WRKY61 (Fig. 3j). Additionally, an NLR expansion event was observed in the wild rice pangenome (Fig. 3j), which may enable these plants to adapt to more varied environments than cultivars. By leveraging NLR genes lost from rice during domestication and artificial selection, we can enhance the resistance resources of cultivated rice and enrich the diversity of modern commercial rice. The total number of NLRs in cultivated rice species has increased relative to diploid wild rice species, despite some NLR gene losses, indicating that the expansion of NLRs into clusters may have been driven by breeding for resistance to specific pathogens.

Discussion

We integrated high-quality assemblies of 13 wild rice species, three cultivated rice genomes and one common wild rice accession to construct a comprehensive super pangenome of the rice genus. Compared with cultivated rice, our super pangenome provides an additional 63,881 new gene families. Notably, we reconstructed the phylogenetic tree of Oryza at the genome level and corrected the evolutionary positions of the BB, CC, FF and GG rice species. Our analysis examined the size and distribution of pervasive structural variations in Oryza.
Beyond the Oryza pangenome resources presented here, our study illustrates how these resources can deepen our understanding of the roles of SVs, gCNVs and allelic variation in environmental adaptation, domestication, differentiation and artificial selection in rice. Moreover, we examined why genome sizes varied substantially during Oryza evolution and which repeat components contribute most to rice genome size. This approach can serve as a model for similar analyses in other plant species. We also observed that cultivated rice genomes contained more NLRs than diploid wild rice genomes, yet showed lower disease resistance than wild rice (Fig. 3e–j). The number of clustered NLRs was notably higher in cultivated rice than in wild rice. This suggests that some of the additional NLR copies may act redundantly to ensure resistance in cultivated rice, consistent with the notion that multiple NLRs are required for the broad-spectrum blast resistance of Tetep 44. Future Oryza pan-genomic work will use gene editing to evaluate how private genes and alleles affect yield, resistance to various diseases and adaptation to changing environments.

Declarations
Acknowledgments
We thank all members of the Longan Yan group at the Jiangxi Academy of Agricultural Sciences for collecting and preserving the wild rice resources. We thank Dr. Zhilan Fan at the Guangdong Academy of Agricultural Sciences for providing the GG-genome wild rice O. meyeriana. We also thank Dr. Shengyi Liu at the Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, for constructive suggestions.
This work was supported by the China Agriculture Research System (CARS-01-08), the National Key Research and Development Program of China (2017YFD0100302, 2023YFD1201203), the National Natural Science Foundation of China (31960400), the Major Discipline Academic and Technical Leaders Training Program of Jiangxi Province (20213BCJL22044) and the Jiangxi Technology Innovation Guidance Program (20223AEI91010).

Author contribution
L.Y. and Y.C. supervised the work. L.L., L.L., W.X. and Y.L. collected samples for resequencing, Hi-C and HiFi sequencing. M.W. performed the genome assembly. Q.H. performed the genome annotation. Y.W. and Y.W. constructed the rice super pangenome. Q.H. and Y.W. conducted SV and CNV identification. J.W., Z.Y. and W.C. collected samples for RNA-seq and conducted expression validation. W.L., H.D. and H.X. designed the experiments and wrote the manuscript.

Declaration of interests
The authors declare no competing interests.

References
1. Yu, H. et al. A route to de novo domestication of wild allotetraploid rice. Cell 184, 1156-1170.e14, doi:10.1016/j.cell.2021.01.013 (2021).
2. Huang, C., Chen, Z. & Liang, C. Oryza pan-genomics: A new foundation for future rice research and improvement. The Crop Journal 9, 622-632, doi:10.1016/j.cj.2021.04.003 (2021).
3. Wing, R. A., Purugganan, M. D. & Zhang, Q. The rice genome revolution: from an ancient grain to Green Super Rice. Nat Rev Genet 19, 505-517, doi:10.1038/s41576-018-0024-z (2018).
4. Walkowiak, S. et al. Multiple wheat genomes reveal global variation in modern breeding. Nature 588, 277-283, doi:10.1038/s41586-020-2961-x (2020).
5. Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43-49, doi:10.1038/s41586-018-0063-9 (2018).
6. Khan, A. W. et al. Super-Pangenome by Integrating the Wild Side of a Species for Accelerated Crop Improvement. Trends Plant Sci 25, 148-158, doi:10.1016/j.tplants.2019.10.012 (2020).
7. Stein, J. C. et al. Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. Nat Genet 50, 285-296, doi:10.1038/s41588-018-0040-0 (2018).
8. Ge, S., Sang, T., Lu, B. R. & Hong, D. Y. Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc Natl Acad Sci U S A 96, 14400-14405, doi:10.1073/pnas.96.25.14400 (1999).
9. Qin, P. et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell 184, 3542-3558.e16, doi:10.1016/j.cell.2021.04.046 (2021).
10. Li, N. et al. Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species. Nat Genet, doi:10.1038/s41588-023-01340-y (2023).
11. Zhou, Y. et al. Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice. Nat Commun 14, 1567, doi:10.1038/s41467-023-37004-y (2023).
12. Zhao, Q. et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat Genet 50, 278-284, doi:10.1038/s41588-018-0041-z (2018).
13. Shang, L. G. et al. A super pan-genomic landscape of rice. Cell Research 32, 878-896, doi:10.1038/s41422-022-00685-z (2022).
14. Xie, X. et al. A chromosome-level genome assembly of the wild rice Oryza rufipogon facilitates tracing the origins of Asian cultivated rice. Sci China Life Sci 64, 282-293, doi:10.1007/s11427-020-1738-x (2021).
15. Du, H. et al. Sequencing and de novo assembly of a near complete indica rice genome. Nat Commun 8, 15324, doi:10.1038/ncomms15324 (2017).
16. Wang, M. et al. The genome sequence of African rice (Oryza glaberrima) and evidence for independent domestication. Nat Genet 46, 982-988, doi:10.1038/ng.3044 (2014).
17. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol 31, 1119-1125, doi:10.1038/nbt.2727 (2013).
18. Kaplan, N. & Dekker, J. High-throughput genome scaffolding from in vivo DNA interaction frequency. Nat Biotechnol 31, 1143-1147, doi:10.1038/nbt.2768 (2013).
19. Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210-3212, doi:10.1093/bioinformatics/btv351 (2015).
20. Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res 46, e126, doi:10.1093/nar/gky730 (2018).
21. Chen, J. et al. Whole-genome sequencing of Oryza brachyantha reveals mechanisms underlying Oryza genome evolution. Nat Commun 4, 1595, doi:10.1038/ncomms2596 (2013).
22. Zou, X. H. et al. Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biol 9, R49, doi:10.1186/gb-2008-9-3-r49 (2008).
23. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 238, doi:10.1186/s13059-019-1832-y (2019).
24. Pulido, M. & Casacuberta, J. M. Transposable element evolution in plant genome ecosystems. Curr Opin Plant Biol 75, 102418, doi:10.1016/j.pbi.2023.102418 (2023).
25. Kidwell, M. G. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115, 49-63, doi:10.1023/a:1016072014259 (2002).
26. Comai, L., Maheshwari, S. & Marimuthu, M. P. A. Plant centromeres. Curr Opin Plant Biol 36, 158-167, doi:10.1016/j.pbi.2017.03.003 (2017).
27. Song, J. M. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol Plant 14, 1757-1767, doi:10.1016/j.molp.2021.06.018 (2021).
28. Song, S. et al. OsMFT1 increases spikelets per panicle and delays heading date in rice by suppressing Ehd1, FZP and SEPALLATA-like genes. J Exp Bot 69, 4283-4293, doi:10.1093/jxb/ery232 (2018).
29. Kou, Y. et al. Evolutionary Genomics of Structural Variation in Asian Rice (Oryza sativa) Domestication. Mol Biol Evol 37, 3507-3524, doi:10.1093/molbev/msaa185 (2020).
30. Gamuyao, R. et al. The protein kinase Pstol1 from traditional rice confers tolerance of phosphorus deficiency. Nature 488, 535-539, doi:10.1038/nature11346 (2012).
31. Hu, J. et al. Fine mapping and pyramiding of brown planthopper resistance genes QBph3 and QBph4 in an introgression line from wild rice O. officinalis. Molecular Breeding 35, doi:10.1007/s11032-015-0228-2 (2015).
32. Yamagata, Y. et al. Mitochondrial gene in the nuclear genome induces reproductive barrier in rice. Proc Natl Acad Sci U S A 107, 1494-1499, doi:10.1073/pnas.0908283107 (2010).
33. Bai, F. et al. Natural allelic variation in GRAIN SIZE AND WEIGHT 3 of wild rice regulates the grain size and weight. Plant Physiol 193, 502-518, doi:10.1093/plphys/kiad320 (2023).
34. Sun, X. et al. Natural variation of DROT1 confers drought adaptation in upland rice. Nat Commun 13, 4265, doi:10.1038/s41467-022-31844-w (2022).
35. Huang, X. et al. Natural variation at the DEP1 locus enhances grain yield in rice. Nat Genet 41, 494-497, doi:10.1038/ng.352 (2009).
36. Furukawa, T. et al. The Rc and Rd genes are involved in proanthocyanidin synthesis in rice pericarp. Plant J 49, 91-102, doi:10.1111/j.1365-313X.2006.02958.x (2007).
37. Wang, Y. et al. Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nat Genet 47, 944-948, doi:10.1038/ng.3346 (2015).
38. Deng, Y. et al. Epigenetic regulation of antagonistic receptors confers rice blast resistance with yield balance. Science 355, 962-965, doi:10.1126/science.aai8898 (2017).
39. Huang, F. et al. New Data and New Features of the FunRiceGenes (Functionally Characterized Rice Genes) Database: 2021 Update. Rice (N Y) 15, 23, doi:10.1186/s12284-022-00569-1 (2022).
40. Qu, S. et al. The broad-spectrum blast resistance gene Pi9 encodes a nucleotide-binding site-leucine-rich repeat protein and is a member of a multigene family in rice. Genetics 172, 1901-1914, doi:10.1534/genetics.105.044891 (2006).
41. Feehan, J. M., Castel, B., Bentham, A. R. & Jones, J. D. Plant NLRs get by with a little help from their friends. Curr Opin Plant Biol 56, 99-108, doi:10.1016/j.pbi.2020.04.006 (2020).
42. Li, P. et al. RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants. BMC Genomics 17, 852, doi:10.1186/s12864-016-3197-x (2016).
43. Qiao, X. et al. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol 20, 38, doi:10.1186/s13059-019-1650-2 (2019).
44. Wang, L. et al. Large-scale identification and functional analysis of NLR genes in blast resistance in the Tetep rice genome sequence. Proc Natl Acad Sci U S A 116, 18479-18487, doi:10.1073/pnas.1910229116 (2019).

Additional Declarations
There is no competing interest.

Supplementary Files
Supplementary Fig. 1 (Fig.S1.tif); Supplementary Fig. 2 (Fig.S2.tif); Supplementary Fig. 3 (Fig.S3.tif); Supplementary Fig. 4 (Fig.S4.tif); Supplementary Fig. 6 (Fig.S6.tif); Supplementary Fig. 7 (Fig.S7.tif); Supplementary Fig. 9 (Fig.S9.tif); Supplementary Table 1 (TableS1.xlsx); Supplementary Table 3 (TableS3.xlsx); Supplementary Table 4 (TableS4.xlsx); Supplementary Table 5 (TableS5.xlsx); Supplementary Table 6 (TableS6.xlsx); Supplementary Table 7 (TableS7.xlsx); Extended Data Fig. 1 (ExtendeddataFig.1.jpg); Extended Data Fig. 2 (ExtendedDataFig.2.jpg); Extended Data Fig. 3 (ExtendeddataFig.3.tif); Extended Data Fig. 5 (extendeddataFig.5.tif); Extended Data Fig. 7 (ExtendeddataFig.7.tif); Extended Data Fig. 8 (ExtendeddataFig.8.jpg); Extended Data Fig. 10 (ExtendeddataFig.10.tif).
Figure 1 Geographical distribution and phylogeny of wild and cultivated rice. a, Geographic distribution of the 13 wild rice accessions and their diverse agronomic characteristics, such as plant height, plant architecture and panicle. These 13 accessions cover 13 species in the rice genus. b, Species phylogeny of 18 species (23 subgenomes/genomes) from Oryza and Zea. TE content in each subgenome/genome of the wild and cultivated rice; the order of species corresponds to the phylogenetic tree. Proportions of contrasting gene tree topologies for the 100 genes with regard to the two major conflicting relationships.

Figure 2 Analysis of a pervasive inversion on chromosome 6 between O. sativa japonica and O. glumaepatula. a, Whole-genome alignment showing the large chromosomal inversion on Chr6 within the Oryza genus. b, Schematic of the inversion region. The OsMFT1 gene is closest to the breakpoint that is distal in O. glumaepatula (89 kb) and proximal in O. sativa japonica (1.29 Mb).
Fig.3\u003c/p\u003e","description":"","filename":"Fig.S3.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/d170a1df8c464fced92b7683.tif"},{"id":56160042,"identity":"d0b2971d-e222-4884-8cfc-073802e01ee5","added_by":"auto","created_at":"2024-05-09 09:10:43","extension":"tif","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":1295804,"visible":true,"origin":"","legend":"Supplementary Fig.4","description":"","filename":"Fig.S4.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/8097736443a202f84ecc3e9b.tif"},{"id":56160037,"identity":"46e28b2f-0237-48f6-ae0f-e7769cd7852c","added_by":"auto","created_at":"2024-05-09 09:10:42","extension":"tif","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":845940,"visible":true,"origin":"","legend":"Supplementary Fig.6","description":"","filename":"Fig.S6.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/1142740a3b6d147e8d631729.tif"},{"id":56160043,"identity":"bdd0a42d-8615-4a1d-93f9-ecb3041d9279","added_by":"auto","created_at":"2024-05-09 09:10:44","extension":"tif","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":3277740,"visible":true,"origin":"","legend":"Supplementary Fig.7","description":"","filename":"Fig.S7.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/eafa232c18122406e9a20536.tif"},{"id":56159443,"identity":"8853a68b-3e5f-4c4b-a9b8-44cbb9072500","added_by":"auto","created_at":"2024-05-09 09:02:43","extension":"tif","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":1174760,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Fig.9\u003c/p\u003e","description":"","filename":"Fig.S9.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/17a0d43224897719b6d6b796.tif"},{"id":56160058,"identity":"f594e1e3-c6cf-460b-b33c-be6f3626c1e3","added_by":"auto","created_at":"2024-05-09 
09:10:46","extension":"xlsx","order_by":13,"title":"","display":"","copyAsset":false,"role":"supplement","size":33825,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Table 1\u003c/p\u003e","description":"","filename":"TableS1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/52d6692a5fe527c4ba424d21.xlsx"},{"id":56158912,"identity":"0dd36c96-cac8-49e7-81da-8f768a9601fa","added_by":"auto","created_at":"2024-05-09 08:54:45","extension":"xlsx","order_by":15,"title":"","display":"","copyAsset":false,"role":"supplement","size":413015,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Table 3\u003c/p\u003e","description":"","filename":"TableS3.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/aa285ab43b5d96c1323a813a.xlsx"},{"id":56160463,"identity":"7286db64-8eef-440c-a6a2-6b59fc21875b","added_by":"auto","created_at":"2024-05-09 09:18:42","extension":"xlsx","order_by":16,"title":"","display":"","copyAsset":false,"role":"supplement","size":39838,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Table 4\u003c/p\u003e","description":"","filename":"TableS4.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/0d08e8266a3a1338042d1753.xlsx"},{"id":56159436,"identity":"6f3e5f91-4afc-4ab0-b7d5-cbe73f32f0e9","added_by":"auto","created_at":"2024-05-09 09:02:42","extension":"xlsx","order_by":17,"title":"","display":"","copyAsset":false,"role":"supplement","size":20418,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Table 5\u003c/p\u003e","description":"","filename":"TableS5.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/94a00ad060fbeef5ff08baf0.xlsx"},{"id":56160467,"identity":"f5da753e-c7b8-4ce6-a21f-160f9d6527ad","added_by":"auto","created_at":"2024-05-09 09:18:44","extension":"xlsx","order_by":18,"title":"","display":"","copyAsset":false,"role":"supplement","size":19402,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Table 
6\u003c/p\u003e","description":"","filename":"TableS6.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/013a89e4cb93d13f047261a2.xlsx"},{"id":56160053,"identity":"690f5b19-df99-48c8-87e5-1d63e4a8884d","added_by":"auto","created_at":"2024-05-09 09:10:45","extension":"xlsx","order_by":19,"title":"","display":"","copyAsset":false,"role":"supplement","size":46151,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Table 7\u003c/p\u003e","description":"","filename":"TableS7.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/569d57b5052b149eb4c52a85.xlsx"},{"id":56160480,"identity":"32a5a283-d3f4-49eb-9b0f-39c56f99b759","added_by":"auto","created_at":"2024-05-09 09:18:46","extension":"jpg","order_by":20,"title":"","display":"","copyAsset":false,"role":"supplement","size":684957,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig.1\u003c/p\u003e","description":"","filename":"ExtendeddataFig.1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/8737dfd13abce116a59b5884.jpg"},{"id":56160964,"identity":"6bc674e9-b409-4f1d-9341-332c264bc655","added_by":"auto","created_at":"2024-05-09 09:26:43","extension":"jpg","order_by":21,"title":"","display":"","copyAsset":false,"role":"supplement","size":21805779,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig.2\u003c/p\u003e","description":"","filename":"ExtendedDataFig.2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/3323d7baa211c8205ca0db89.jpg"},{"id":56159451,"identity":"a805089d-ab09-49e3-a983-1525f841c2b7","added_by":"auto","created_at":"2024-05-09 09:02:44","extension":"tif","order_by":22,"title":"","display":"","copyAsset":false,"role":"supplement","size":2475256,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data 
Fig.3\u003c/p\u003e","description":"","filename":"ExtendeddataFig.3.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/67a7669a84b1acbd87a0a518.tif"},{"id":56158897,"identity":"09f83f43-d2bd-4393-9525-b54024a41b1a","added_by":"auto","created_at":"2024-05-09 08:54:42","extension":"tif","order_by":24,"title":"","display":"","copyAsset":false,"role":"supplement","size":3996728,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig.5\u003c/p\u003e","description":"","filename":"extendeddataFig.5.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/2b339354383cf6e7c87bae94.tif"},{"id":56158905,"identity":"b8001d00-2663-41c5-82d1-1258c4fe693c","added_by":"auto","created_at":"2024-05-09 08:54:44","extension":"tif","order_by":26,"title":"","display":"","copyAsset":false,"role":"supplement","size":4388088,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig.7\u003c/p\u003e","description":"","filename":"ExtendeddataFig.7.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/115dc95de302b545108575bf.tif"},{"id":56159458,"identity":"a2f188e3-9b21-4836-85e9-273f27e0fddf","added_by":"auto","created_at":"2024-05-09 09:02:45","extension":"jpg","order_by":27,"title":"","display":"","copyAsset":false,"role":"supplement","size":2210388,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data Fig.8\u003c/p\u003e","description":"","filename":"ExtendeddataFig.8.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/bd3689584693002391db5e84.jpg"},{"id":56160475,"identity":"f9848f7d-c3d3-4e6c-af4b-378a5df73eb6","added_by":"auto","created_at":"2024-05-09 09:18:45","extension":"tif","order_by":29,"title":"","display":"","copyAsset":false,"role":"supplement","size":2216688,"visible":true,"origin":"","legend":"\u003cp\u003eExtended Data 
Fig.10\u003c/p\u003e","description":"","filename":"ExtendeddataFig.10.tif","url":"https://assets-eu.researchsquare.com/files/rs-4350570/v1/27faf45843d1fe39d0e8b1e4.tif"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Genome Evolution and Diversity of Wild and Cultivated Rice Species","fulltext":[{"header":"Introduction","content":"\u003cp\u003eRice (\u003cem\u003eOryza sativa\u003c/em\u003e L.) is a crucial crop globally and serves as a key model species for monocot and crop plant research \u003csup\u003e3\u003c/sup\u003e. Rice production will need to double by 2050 in order to feed the demand of the increasing world population\u003csup\u003e4\u003c/sup\u003e. It is expected to break the current production bottleneck that stagnates rice yields by exploiting the presence and absence genes and interspecies allelic diversity\u003csup\u003e5,6\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe \u003cem\u003eOryza\u003c/em\u003e genus comprises two cultivated rice species: Asian and African rice, along with 20 extant wild rice species. These wild rice comprising six diploids (AA, BB, CC, EE, FF and GG) and five allotetraploids (BBCC, CCDD, HHJJ, HHKK and KKLL), exhibiting significant genetic and phenotypic diversity, adapting to various ecological environments across Asia, Africa, America, and Australia, exhibits great genetic and phenotypic diversity \u003csup\u003e7,8\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eRecent advancements in obtaining high-quality genome assemblies of cultivated rice have allowed for a thorough characterization of structural variation through comparative genomic analysis \u003csup\u003e9\u0026ndash;11\u003c/sup\u003e. Various studies have contributed to the construction of pangenomes that integrate cultivars and the AA genome of wild rice \u003csup\u003e12,13\u003c/sup\u003e. 
Currently, there is a lack of high-quality reference genomes for 11 rice wild relatives (BB, BBCC, CC, EE, FF, and GG) and a comprehensive super pangenome of the rice genus are lacking. Here, we de novo assembled high-quality genomes from 13 wild rice accessions (Supplementary Table\u0026nbsp;1), along with two cultivated rice species (cv. Nipponbare, referred to as NIP of japonica sub-species and R498 of indica sub-species), \u003cem\u003eOryza. glaberrima\u003c/em\u003e (cv. CG14 of African cultivated rice species) and \u003cem\u003eOryza rufipogon\u003c/em\u003e (W1943)\u003csup\u003e9,14\u0026ndash;16\u003c/sup\u003e. The Oryza genus super pangenome usher a new era for rice researcher and breeders with the amount new resource to improve cultivated rice and meet future food demands.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n\u003ch2\u003eHigh-quality assemblies of thirteen representative wild rice species\u003c/h2\u003e\n\u003cp\u003eWe selected thirteen representative wild rice species, including six allotetraploids and seven diploids, for de novo genome assembly. These accessions exhibit significant variation in geographical distribution and phenotype, as illustrated in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003ea. A total of 331.3 Gb of High-fidelity (HiFi) reads were generated for the 13 wild rice accessions, representing approximately 40-fold coverage relative to the NIP genome size of around 400 Mb (Supplementary Table\u0026nbsp;1). 
By integrating high-throughput chromosome conformation capture (Hi-C) data\u003csup\u003e17,18\u003c/sup\u003e, we assembled all 13 wild rice accessions to the chromosome level, with genome sizes ranging from 393.77 Mb to 903.31 Mb and an average contig N50 of 36.80 Mb (Supplementary Table\u0026nbsp;2, Supplementary Fig.\u0026nbsp;1, Extended data Figs.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e). Mapping Illumina short reads and HiFi reads back to the 13 assemblies yielded high alignment rates of more than 99.88% and 99.95%, respectively (Supplementary Fig.\u0026nbsp;2). On average, 99.44% of the 1,614 Benchmarking Universal Single-Copy Orthologs (BUSCOs) were identified in these assemblies\u003csup\u003e19\u003c/sup\u003e. The assemblies also had an average LTR Assembly Index (LAI) of 23.64\u003csup\u003e20\u003c/sup\u003e and high consensus quality values, indicating high quality, continuity and completeness (Supplementary Table\u0026nbsp;2, Supplementary Fig.\u0026nbsp;1).\u003c/p\u003e\n\u003cp\u003eGene structures were annotated on the repeat-masked genomes by combining de novo, homology-based, and transcript-based prediction methods. This yielded higher numbers of protein-coding genes (37,711 to 85,846 per assembly) than previously reported for \u003cem\u003eO. brachyantha\u003c/em\u003e and NIP\u003csup\u003e21\u003c/sup\u003e (41,096) (Supplementary Table\u0026nbsp;2). Transcript data supported 71.1\u0026ndash;87.7% of the predicted protein-coding genes, indicating reliable gene annotations (Supplementary Table\u0026nbsp;1). 
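\u003c/p\u003e\u003cp\u003eThe LTR Assembly Index cited above rests on a simple proportion: the share of LTR retrotransposon sequence that is assembled as intact elements. The sketch below is illustrative only. It assumes the total lengths of intact and of all LTR-RT sequence have already been measured by an LTR annotation tool; the function names and toy numbers are hypothetical, and the published LAI additionally corrects this raw value for whole-genome LTR identity, which is omitted here. The tier cut-offs (draft below 10, reference from 10, gold from 20) follow the original LAI proposal.\u003c/p\u003e

```python
# Illustrative sketch of a raw LTR Assembly Index (LAI).
# Assumption: intact and total LTR-RT lengths (bp) come from a prior
# LTR annotation step; the LTR-identity correction of the published
# LAI is intentionally omitted.


def raw_lai(intact_ltr_bp: int, total_ltr_bp: int) -> float:
    # Percentage of all LTR-RT sequence that is present as intact elements.
    if total_ltr_bp <= 0:
        raise ValueError('total LTR-RT length must be positive')
    if not 0 <= intact_ltr_bp <= total_ltr_bp:
        raise ValueError('intact length must lie between 0 and the total')
    return 100.0 * intact_ltr_bp / total_ltr_bp


def lai_tier(lai: float) -> str:
    # Quality tiers: draft (below 10), reference (10 to below 20), gold (20 and up).
    if lai >= 20:
        return 'gold'
    if lai >= 10:
        return 'reference'
    return 'draft'


# Toy numbers for illustration only (not measurements from this study).
value = raw_lai(intact_ltr_bp=47_280_000, total_ltr_bp=200_000_000)
print(round(value, 2), lai_tier(value))  # 23.64 gold
```

\u003cp\u003eUnder these tiers, an average LAI of 23.64 would fall in the gold category; the sketch does not reproduce how the study itself computed its values.\u003c/p\u003e\u003cp\u003e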
A phylogenetic tree based on 3,555 single-copy orthologous genes grouped the 17 rice accessions into five clusters, a topology that differs from previous findings\u003csup\u003e22\u003c/sup\u003e (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003eb). Among gene trees inferred from 1,000 randomly selected single-copy orthologs, 66.42% supported this topology, suggesting possible incomplete lineage sorting or gene flow (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003eb). Additionally, a maximum-likelihood (ML) tree based on chloroplast data was constructed to trace the maternal progenitors of allopolyploid wild rice (Supplementary Fig.\u0026nbsp;3). Transposable elements (TEs) play a significant role in shaping large plant genomes and drive genome evolution through periodic bursts of amplification (Supplementary Fig.\u0026nbsp;4). TE content in the 13 wild rice genomes ranged from 35.11% to 76.35%, with long terminal repeat retrotransposons (LTR-RTs) being the most abundant (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConstruction of the super pangenome of the\u003c/strong\u003e \u003cstrong\u003eOryza\u003c/strong\u003e \u003cstrong\u003egenus helps to reveal untapped genes and hidden genomic variations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe rice pangenome was expanded to include 16 species (17 accessions) in the \u003cem\u003eOryza\u003c/em\u003e genus, incorporating genomes from three AA-genome \u003cem\u003eOryza\u003c/em\u003e species. 
Using OrthoFinder\u003csup\u003e23\u003c/sup\u003e, we constructed a super pangenome by clustering 808,478 predicted gene models from the 13 wild rice species, three previously published AA-genome rice assemblies and the \u003cem\u003eOryza sativa\u003c/em\u003e reference genome (NIP) into 101,723 gene families (Extended data Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ea, Extended data Fig.\u0026nbsp;4a). In total, 9,834 gene families (9.67%) were shared by all 17 rice accessions and classified as core gene families. Dispensable gene families, present in 2\u0026ndash;15 individuals, made up 56.84% of the \u003cem\u003eOryza\u003c/em\u003e pangenome, while 33.48% were species-specific gene families (Extended data Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ea, b). Compared with cultivated rice, this super pangenome contributes 63,881 additional gene families.\u003c/p\u003e\n\u003cp\u003eThe 17 rice accessions, comprising 11 diploids and 6 allotetraploids, were assigned to diploid genome types labeled A to G (Extended data Fig.\u0026nbsp;4b). 
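\u003c/p\u003e\u003cp\u003eThe core, dispensable and species-specific categories above depend only on how many accessions carry each gene family. A minimal sketch, assuming a precomputed mapping from each gene-family ID to the set of accessions that carry it (for example, derived from an orthogroup gene-count table; parsing is not shown), could classify and summarize families as follows. The function names and the four-accession toy input are hypothetical.\u003c/p\u003e

```python
from collections import Counter
from typing import Dict, Set


def classify_family(n_carriers: int, n_accessions: int) -> str:
    # core: in every accession; private: in exactly one; dispensable: in between.
    if not 1 <= n_carriers <= n_accessions:
        raise ValueError('a family must occur in 1..n_accessions accessions')
    if n_carriers == n_accessions:
        return 'core'
    if n_carriers == 1:
        return 'private'
    return 'dispensable'


def summarize(presence: Dict[str, Set[str]], n_accessions: int) -> Dict[str, float]:
    # Percentage of gene families falling into each category.
    counts = Counter(classify_family(len(accs), n_accessions)
                     for accs in presence.values())
    total = sum(counts.values())
    if total == 0:
        raise ValueError('no gene families supplied')
    return {cat: 100.0 * counts[cat] / total
            for cat in ('core', 'dispensable', 'private')}


# Hypothetical presence sets for four accessions.
toy = {
    'OG1': {'NIP', 'R498', 'CG14', 'W1943'},
    'OG2': {'NIP', 'R498'},
    'OG3': {'CG14'},
    'OG4': {'NIP', 'R498', 'CG14', 'W1943'},
}
print(summarize(toy, n_accessions=4))
# {'core': 50.0, 'dispensable': 25.0, 'private': 25.0}
```

\u003cp\u003eWith 17 accessions the same rule labels a family core only when all 17 carry it, so adding genomes can only shrink or keep the core set.\u003c/p\u003e\u003cp\u003e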
We also constructed a syntenic pangenome to analyze differentiation among the seven diploid genome types, which revealed 16,849 core gene families shared across them. Dispensable families accounted for 29.73% of the total gene sets (Extended data Fig.\u0026nbsp;4, Supplementary Table\u0026nbsp;3), and genome-type private gene sets, unique to individual genome types, made up the largest proportion (52.90%) (Supplementary Table\u0026nbsp;3).\u003c/p\u003e\n\u003cp\u003eWithin the \u003cem\u003eOryza\u003c/em\u003e genus, approximately 81.14% of the core genes could be assigned protein domains from the Pfam and InterPro databases, nearly twice the percentage for dispensable genes (41.82%) and more than seven times that for accession-specific genes (10.83%) (Extended data Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ed). Core genes exhibited 6- to 20-fold higher expression levels than shell and private genes (Extended data Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ee). A similar tendency was reflected in gene length, and core genes showed significantly lower (0.15-fold on average) pairwise ratios of nonsynonymous to synonymous substitution rates (Ka/Ks) than dispensable genes (Extended data Figs.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ef, g). This indicates functional conservation of core genes in the \u003cem\u003eOryza\u003c/em\u003e genus, whereas variable genes have evolved more rapidly, possibly in adaptation to diverse environments. 
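\u003c/p\u003e\u003cp\u003eKa/Ks divides the number of nonsynonymous substitutions per nonsynonymous site (Ka) by the number of synonymous substitutions per synonymous site (Ks); values well below 1 indicate purifying selection. Assuming per-gene Ka and Ks values have already been estimated for aligned gene pairs, a group comparison like the one above could be sketched as follows. The Ks filter (above 0 and at most 2), the use of group means and all names are illustrative assumptions rather than the study's documented procedure.\u003c/p\u003e

```python
from statistics import mean
from typing import List, Tuple

Pair = Tuple[float, float]  # (Ka, Ks) for one aligned gene pair


def ka_ks_ratios(pairs: List[Pair], max_ks: float = 2.0) -> List[float]:
    # Drop pairs with Ks == 0 (ratio undefined) or Ks above max_ks,
    # where synonymous sites are likely saturated.
    return [ka / ks for ka, ks in pairs if 0 < ks <= max_ks]


def fold_difference(core: List[Pair], dispensable: List[Pair]) -> float:
    # Mean Ka/Ks of core genes relative to that of dispensable genes.
    return mean(ka_ks_ratios(core)) / mean(ka_ks_ratios(dispensable))


# Hypothetical values for illustration only.
core_pairs = [(0.01, 0.10), (0.02, 0.20), (0.03, 0.10)]
dispensable_pairs = [(0.10, 0.10), (0.20, 0.25), (0.05, 0.0)]
print(round(fold_difference(core_pairs, dispensable_pairs), 3))  # 0.185
```

\u003cp\u003eA fold value below 1 means core genes are, on average, under stronger purifying selection than dispensable genes.\u003c/p\u003e\u003cp\u003e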
The average LTR insertion ratio of core genes was notably lower than that of shell and species-specific genes (Extended data Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ei), possibly related to the lower number of exons per gene in accession-specific genes compared with dispensable and core genes (Supplementary Fig.\u0026nbsp;5); this suggests that exon shuffling or loss contributes to the species specificity of these genes.\u003c/p\u003e\n\u003cp\u003eIntriguingly, the core genes in the \u003cem\u003eOryza\u003c/em\u003e genus are primarily involved in fundamental functions such as transposition, iron ion binding, transport, and electron transport, indicating a role in maintaining essential activities across the \u003cem\u003eOryza\u003c/em\u003e genus (Extended data Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eh, Supplementary Table\u0026nbsp;3). Gene Ontology (GO) analysis of the private genes of each genome type showed distinct functional differences, except for disease resistance (Supplementary Fig.\u0026nbsp;6). This \u003cem\u003eOryza\u003c/em\u003e super pangenome opens the door to using non-AA-genome wild rice resources in rice biology and breeding.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n\u003ch2\u003eTransposon signatures contribute to variation in genome and centromere size in rice\u003c/h2\u003e\n\u003cp\u003eThe selective removal and retention of TEs have significantly influenced the genome size, adaptation and evolution of the \u003cem\u003eOryza\u003c/em\u003e genus \u003csup\u003e24\u003c/sup\u003e. However, which specific TE subfamilies drive genome size variation has remained an open question\u003csup\u003e25\u003c/sup\u003e. 
To address this, we comprehensively classified TEs across the \u003cem\u003eOryza\u003c/em\u003e genus and analyzed in depth the expansion profiles of long terminal repeat (LTR) subfamilies, focusing on the top six subfamilies selected from the largest \u003cem\u003eOryza\u003c/em\u003e genome. Each LTR-RT subfamily displayed a distinct pattern of amplification across species, affecting gene number and expression (Extended data Fig.\u0026nbsp;5b, Supplementary Fig.\u0026nbsp;7a, Supplementary Table\u0026nbsp;4). In addition to the amplification of Gypsy superfamily members (Ogre, Retand, and Tekay), the Angela subfamily of the Copia superfamily emerged as a significant contributor to the large genome of \u003cem\u003eO. australiensis\u003c/em\u003e (E genome type), distinguishing it from other rice species (Extended data Fig.\u0026nbsp;5b, c, Supplementary Fig.\u0026nbsp;7a). The sizes of B-, C-, and D-type genomes were primarily influenced by three Gypsy subfamilies, in descending order: Ogre, Retand and Tekay (Extended data Fig.\u0026nbsp;5c, Supplementary Table\u0026nbsp;4). The genome size of the G genome type (\u003cem\u003eO. meyeriana\u003c/em\u003e) was predominantly influenced by Retand amplification (Extended data Fig.\u0026nbsp;5c), whereas the higher abundance of Tekay in \u003cem\u003eO. glumaepatula\u003c/em\u003e than in \u003cem\u003eO. sativa\u003c/em\u003e resulted in its slightly larger genome. Notably, \u003cem\u003eO. brachyantha\u003c/em\u003e (F genome type) exhibited substantial elimination of all LTR subfamilies compared with \u003cem\u003eO. sativa\u003c/em\u003e (Extended data Fig.\u0026nbsp;5c), consistent with the LTR density patterns observed across \u003cem\u003eOryza\u003c/em\u003e genomes (Supplementary Fig.\u0026nbsp;8). 
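\u003c/p\u003e\u003cp\u003eComparing how closely each LTR subfamily tracks genome size can be sketched as a Pearson correlation between per-genome subfamily content and assembly size. This is an illustration under stated assumptions: the inputs are per-genome totals (in Mb) from a TE annotation, the numbers below are invented, the function names are hypothetical, and the exact statistic used in this study is not specified here.\u003c/p\u003e

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    # Sample Pearson correlation coefficient between two equal-length series.
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError('need two series of equal length, at least 2')
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError('correlation is undefined for a constant series')
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(genome_mb: List[float],
                        content_mb: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    # Subfamilies ordered from strongest to weakest positive correlation.
    scores = {name: pearson(values, genome_mb) for name, values in content_mb.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented values for four hypothetical genomes (Mb).
genome_sizes = [400.0, 500.0, 650.0, 900.0]
subfamilies = {
    'Retand': [30.0, 80.0, 200.0, 420.0],
    'Tekay': [20.0, 35.0, 25.0, 40.0],
    'Angela': [5.0, 3.0, 6.0, 4.0],
}
for name, r in rank_by_correlation(genome_sizes, subfamilies):
    print(name, round(r, 3))
```

\u003cp\u003eWith only a handful of genomes, such coefficients are sensitive to single outliers, so rankings of this kind are best read alongside the per-genome content profiles.\u003c/p\u003e\u003cp\u003e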
Across \u003cem\u003eOryza\u003c/em\u003e, genome size correlated more strongly with Retand content than with that of the other subfamilies (Extended data Fig.\u0026nbsp;5d). Although the SIRE and CRM subfamilies made up only a small portion of each genome, they influenced the expression of a proportion of genes (Extended data Fig.\u0026nbsp;5c, Supplementary Fig.\u0026nbsp;7b, Supplementary Table\u0026nbsp;4).\u003c/p\u003e\n\u003cp\u003eThe genome-wide distribution of intact LTR retrotransposons indicated that most LTR bursts occurred near centromeric regions (Supplementary Fig.\u0026nbsp;8). Rice centromeres consist of organized satellite repeats (SRs) interrupted by centromere-specific retrotransposons (CRRs) \u003csup\u003e26\u003c/sup\u003e. It remains unclear whether other rice lines or wild rice species possess unique centromeric satellites and whether centromere repositioning occurred during centromere evolution. In cultivated rice (MH63 and ZS97), 155 bp and 165 bp CentO satellite repeats were categorized into seven distinct subsets across the 12 chromosomes \u003csup\u003e27\u003c/sup\u003e. Interestingly, only a few copies of satellite repeats were identified in the wild species \u003cem\u003eO. meyeriana\u003c/em\u003e and \u003cem\u003eO. brachyantha\u003c/em\u003e. Compared with the cultivated rice (NIP) genome, the C genome contains centromere-specific 126 bp and 366 bp Cent satellite repeats (Supplementary Table\u0026nbsp;4). Phylogenetic analysis classified the satellite repeats in \u003cem\u003eOryza\u003c/em\u003e into four groups (Extended data Fig.\u0026nbsp;5e), with repeats from chromosomes of the same type tending to cluster together, supporting models of repeated amplification events involving the central domain and local homogenization. 
Across \u003cem\u003eOryza\u003c/em\u003e centromeres, gene density decreased toward the centromere, whereas transposon density and k-mer frequency increased (Extended data Fig.\u0026nbsp;5f). Wild rice centromeres, except those of \u003cem\u003eOryza brachyantha\u003c/em\u003e, were notably larger than those of cultivated rice, whereas the opposite trend was observed for the number of genes they contain (Extended data Fig.\u0026nbsp;5g, Supplementary Fig.\u0026nbsp;7c). A comparative sequence map of centromere synteny between cultivated and wild rice highlighted extensive structural rearrangements in centromeric and pericentric regions across the \u003cem\u003eOryza\u003c/em\u003e genus (Extended data Fig.\u0026nbsp;5h, Supplementary Fig.\u0026nbsp;9, Supplementary Table\u0026nbsp;4). The synteny analysis also revealed several centromere repositioning events (Supplementary Fig.\u0026nbsp;9).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLarge-scale chromosomal rearrangements and the inferred genome evolutionary history of\u003c/strong\u003e \u003cstrong\u003eOryza\u003c/strong\u003e \u003cstrong\u003elineages\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo compare karyotype stability between cultivated rice and wild diploid genomes, we created a synteny map and conducted whole-genome pairwise alignments to identify large segmental variations such as translocations and inversions\u003csup\u003e7\u003c/sup\u003e. Notably, large inversions (spanning more than five consecutive genes) shared by at least two phylogenetically adjacent species were prevalent in the \u003cem\u003eOryza\u003c/em\u003e genome alignments\u003csup\u003e11\u003c/sup\u003e (Extended data Fig.\u0026nbsp;6). 
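\u003c/p\u003e\u003cp\u003eFinding inversions that span more than five consecutive genes can be reduced to finding runs of collinear genes whose order is reversed in the compared genome. The sketch below assumes collinear gene pairs have already been identified (for example with a synteny tool) and that each gene's rank along the query chromosome is listed in reference order; it requires strictly consecutive, gap-free reversals, which real pipelines relax. The function name and toy input are hypothetical.\u003c/p\u003e

```python
from typing import List, Tuple


def find_inversions(query_rank: List[int], min_genes: int = 6) -> List[Tuple[int, int]]:
    # query_rank[i] is the position, along the query chromosome, of the
    # i-th collinear gene in reference order. A run in which each rank is
    # exactly one less than the previous one marks a reversed (inverted)
    # block. Returns inclusive (start, end) indices of runs with at least
    # min_genes genes; min_genes=6 matches 'more than 5 consecutive genes'.
    blocks = []
    start = 0
    for i in range(1, len(query_rank) + 1):
        if i < len(query_rank) and query_rank[i] == query_rank[i - 1] - 1:
            continue  # still inside a descending run
        if i - start >= min_genes:
            blocks.append((start, i - 1))
        start = i
    return blocks


# Toy example: genes at indices 3..9 appear in reverse order in the query.
ranks = [0, 1, 2, 9, 8, 7, 6, 5, 4, 3, 10, 11]
print(find_inversions(ranks))  # [(3, 9)]
```

\u003cp\u003eShort reversals such as a swap of two neighbouring genes are ignored by design, since they fall below the min_genes cut-off.\u003c/p\u003e\u003cp\u003e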
Reports of segregating inversions in wild rice are scarce and have not yet included natural polyploid wild rice; here, most of these events were observed in the low-recombination pericentromeric regions of \u003cem\u003eOryza\u003c/em\u003e chromosomes, with a few inversions being species-specific (Extended data Fig.\u0026nbsp;6). AA-type genomes displayed a high level of chromosomal conservation. Furthermore, the gene synteny findings support the phylogenetic relationships inferred within the rice genus (Supplementary Fig.\u0026nbsp;10). Multi-species/genome comparisons validated many large-scale genomic rearrangements, including an inversion of an approximately 2.53 Mb segment containing 166 genes on chromosome 6 of the modern cultivar NIP (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003ea). This inversion was found only in common wild rice and high-latitude \u003cem\u003eOryza sativa\u003c/em\u003e japonica, suggesting that it may be associated with the expansion of their geographical range. \u003cem\u003eOsMFT1\u003c/em\u003e, which contributes to later flowering in \u003cem\u003eOryza sativa\u003c/em\u003e, lies within the inverted region, 89 kb from the breakpoint in \u003cem\u003eOryza glumaepatula\u003c/em\u003e \u003csup\u003e\u003cem\u003e28\u003c/em\u003e\u003c/sup\u003e (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003eb).\u003c/p\u003e\n\u003cp\u003eWe first identified chromosomal rearrangements involving homoeologous groups 1, 3, and 6 of allotetraploid wild rice (Extended data Fig.\u0026nbsp;7a). Comparison of syntenic blocks revealed that chromosome 6D\u003csub\u003et\u003c/sub\u003e of \u003cem\u003eO. latifolia\u003c/em\u003e displayed complete collinearity with the corresponding 3D\u003csub\u003et\u003c/sub\u003e chromosomes of \u003cem\u003eO. alta\u003c/em\u003e and \u003cem\u003eO. grandiglumis\u003c/em\u003e. However, a reciprocal translocation was observed between 3D\u003csub\u003et\u003c/sub\u003e and 1C\u003csub\u003et\u003c/sub\u003e in \u003cem\u003eO. alta\u003c/em\u003e and \u003cem\u003eO. grandiglumis\u003c/em\u003e (Extended data Fig.\u0026nbsp;7a), as indicated by partial collinearity between 3C\u003csub\u003et\u003c/sub\u003e of \u003cem\u003eO. latifolia\u003c/em\u003e and 1C\u003csub\u003et\u003c/sub\u003e of \u003cem\u003eO. alta\u003c/em\u003e and \u003cem\u003eO. grandiglumis\u003c/em\u003e. A translocation between 1C\u003csub\u003et\u003c/sub\u003e and 6C\u003csub\u003et\u003c/sub\u003e was also identified, with a 1C\u003csub\u003et\u003c/sub\u003e segment translocated to the end of 6C\u003csub\u003et\u003c/sub\u003e. Furthermore, a translocation was detected in homoeologous groups 4 and 7 (Extended data Fig.\u0026nbsp;7c\u0026ndash;i).\u003c/p\u003e\n\u003cp\u003eBy aligning allotetraploid wild rice resequencing data to the corresponding diploid BB and CC, or CC and EE, genomes, we identified homoeologous exchanges on each chromosome from the coverage depth of uniquely mapped reads. We discovered several translocations of large segments between subgenomes that occurred after tetraploidization (Extended data Fig.\u0026nbsp;7b, c). Chromosomes B\u003csub\u003et\u003c/sub\u003e1 and C\u003csub\u003et\u003c/sub\u003e1 exhibited high synteny, but the coverage depth of reads mapped to the BB and CC genomes indicated a significant homoeologous exchange between them (Extended data Fig.\u0026nbsp;7b). 
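\u003c/p\u003e\u003cp\u003eThe coverage-based detection of homoeologous exchange described above can be sketched as a window-by-window comparison of unique-read depth on the two diploid references. Beyond what is stated here, the sketch assumes that depths have been computed in matched windows along homoeologous chromosomes, that each track is normalized by its median, and that a window is flagged when the log2 ratio of normalized depths reaches 1 in absolute value; the threshold, pseudocount and function name are hypothetical.\u003c/p\u003e

```python
from math import log2
from statistics import median
from typing import List


def call_exchanges(depth_b: List[float], depth_c: List[float],
                   threshold: float = 1.0, pseudo: float = 0.5) -> List[str]:
    # depth_b[i], depth_c[i]: unique-read depth in matched window i on the
    # B and C diploid references. In a balanced allotetraploid both
    # normalized depths are near 1; a homoeologous exchange shifts reads
    # from one subgenome reference to the other.
    if len(depth_b) != len(depth_c) or not depth_b:
        raise ValueError('need two non-empty tracks of equal length')
    mb, mc = median(depth_b), median(depth_c)
    if mb == 0 or mc == 0:
        raise ValueError('median depth must be positive')
    calls = []
    for b, c in zip(depth_b, depth_c):
        # The pseudocount keeps the ratio finite when one track has zero depth.
        ratio = log2((b / mb + pseudo) / (c / mc + pseudo))
        if ratio >= threshold:
            calls.append('B>C')
        elif ratio <= -threshold:
            calls.append('C>B')
        else:
            calls.append('balanced')
    return calls


# Toy depths: windows 3 and 4 carry B-type sequence in place of C-type.
b_track = [30, 31, 29, 60, 58, 30]
c_track = [30, 29, 31, 0, 1, 30]
print(call_exchanges(b_track, c_track))
# ['balanced', 'balanced', 'balanced', 'B>C', 'B>C', 'balanced']
```

\u003cp\u003eMedian normalization assumes that most windows are balanced; runs of adjacent flagged windows, rather than isolated ones, are what would point to a large exchanged segment.\u003c/p\u003e\u003cp\u003e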
The potential history of homoeologous exchange is depicted in Extended data Fig.\u0026nbsp;7d.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDiscovery of untapped SVs in the\u003c/strong\u003e \u003cstrong\u003eOryza\u003c/strong\u003e \u003cstrong\u003egenus and their influence on agronomic traits\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDespite extensive efforts to analyze genetic variants in cultivated rice and its progenitor \u003cem\u003eO. rufipogon\u003c/em\u003e \u003csup\u003e29\u003c/sup\u003e, the genetic diversity of distantly related wild rice species such as \u003cem\u003eO. punctata\u003c/em\u003e, \u003cem\u003eO. rhizomatis\u003c/em\u003e, and \u003cem\u003eO. meyeriana\u003c/em\u003e remains poorly understood. We identified 2,781\u0026ndash;10,656 insertions, 2,680\u0026ndash;10,419 deletions, 4\u0026ndash;52 translocations, and 7\u0026ndash;22 inversions in the 16 rice accessions, with sizes of 162.49\u0026ndash;278.65 Mb, 182.13\u0026ndash;705.17 Mb, 8.64\u0026ndash;887.29 kb, and 41.51\u0026ndash;11.33 Mb, respectively (Extended data Fig.\u0026nbsp;8a, Supplementary Table\u0026nbsp;5). Interestingly, cultivated rice and AA-genome wild rice showed more structural variations (SVs) than non-AA-genome wild rice, although the total size of these variations was smaller, likely because more of their sequence aligns to the reference genome (Extended data Fig.\u0026nbsp;8a). SVs specific to wild rice species accounted for a significant portion of the total variation, indicating untapped genetic diversity in wild rice compared with cultivated varieties (Extended data Fig.\u0026nbsp;8a).\u003c/p\u003e\n\u003cp\u003eMost insertions, deletions, and inversions in cultivated rice were shorter than 5 kb. 
The number of SVs in both cultivated and wild rice decreased markedly with increasing SV length, but wild rice had more SVs larger than 250 kb, which may account for the numerous private genes in wild rice genomes (Extended data Fig.\u0026nbsp;8b, Extended data Fig.\u0026nbsp;9a). Intergenic regions were the most common locations of SVs across the \u003cem\u003eOryza\u003c/em\u003e genus, followed by regions within \u0026plusmn;10\u0026thinsp;kb of genes (Extended data Fig.\u0026nbsp;9b, Supplementary Table\u0026nbsp;5), consistent with previous findings \u003csup\u003e9\u003c/sup\u003e. Surprisingly, insertions and deletions were less frequent at chromosome ends (Extended data Fig.\u0026nbsp;8c). The private genes in each rice genome type and their corresponding presence-absence variations (PAVs) relative to NIP were also recorded (Extended data Fig.\u0026nbsp;8c, d). When the NIP genome sequence was used as the reference, cultivated rice displayed more SVs than wild rice, consistent with the observations above (Extended data Fig.\u0026nbsp;8a). However, wild rice, particularly \u003cem\u003eO. australiensis\u003c/em\u003e and \u003cem\u003eO. meyeriana\u003c/em\u003e, exhibited substantial variation in SV size, indicating an enrichment of SVs in repetitive DNA regions (Extended data Fig.\u0026nbsp;9c). Examination of transposable elements in PAV sequences revealed that DNA transposons and TEs in the 'other' category were the primary components of both deletions and inversions (Extended data Fig.\u0026nbsp;9d).\u003c/p\u003e\n\u003cp\u003eBy analyzing a large number of SVs across different rice genomes within a phylogenetic framework, we were able to uncover evolutionary events that would otherwise have gone undetected with a limited number of genomes. Recent findings suggest that gene loss can be linked to insertion/deletion events. 
For example, we identified a 500 kb insertion relative to the NIP genome on chromosome 12 at 14.50 Mb (Extended data Fig.\u0026nbsp;8e). This insertion occurred only in \u003cem\u003eO. eichingeri\u003c/em\u003e and the C\u003csub\u003et\u003c/sub\u003e subgenome of \u003cem\u003eO. punctata\u003c/em\u003e. Further investigation revealed that the PAV region contained a gene (\u003cem\u003eOPUW363G084108\u003c/em\u003e/\u003cem\u003eOEIW71G043491\u003c/em\u003e) specific to C-genome wild rice\u003csup\u003e30\u003c/sup\u003e (Extended data Fig.\u0026nbsp;8e). This gene may have arisen de novo and may contribute to the ability of these wild rice species to adapt to poor and problem soils in Sri Lanka. Phylogenetic analysis indicated that the gene originated in \u003cem\u003eO. eichingeri\u003c/em\u003e and was then transferred to \u003cem\u003eO. punctata\u003c/em\u003e, providing evidence that \u003cem\u003eO. eichingeri\u003c/em\u003e was a progenitor of tetraploid \u003cem\u003eO. punctata\u003c/em\u003e, consistent with our chloroplast phylogeny. These results suggest that this insertion arose during the speciation of C-genome wild rice and was possibly acquired by \u003cem\u003eO. punctata\u003c/em\u003e through hybridization with \u003cem\u003eO. eichingeri\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003eMultiple QTLs for brown planthopper resistance have been identified in \u003cem\u003eO. officinalis\u003c/em\u003e, but the lack of wild rice genome sequences has hindered cloning of the underlying genes\u003csup\u003e31\u003c/sup\u003e. In this study, we identified the \u003cem\u003eBph4\u003c/em\u003e gene by combining comparative genome analysis with gene annotation within the QTL region (Supplementary Fig.\u0026nbsp;11). 
Haplotype analysis indicated that \u003cem\u003eBph4\u003c/em\u003e is highly conserved in cultivated rice but diverse in wild rice.\u003c/p\u003e\n\u003cp\u003eAmong the 13 wild rice accessions studied, only three species retained the functional S28 locus\u003csup\u003e32\u003c/sup\u003e, while the others lacked either the ribosomal protein S27 gene or the nearby \u003cem\u003eUDPGT\u003c/em\u003e gene (Extended data Fig.\u0026nbsp;8f). Phylogenetic analysis of the \u003cem\u003eOryza\u003c/em\u003e genus suggested that this hybrid sterility (HS) locus likely originated in \u003cem\u003eO. australiensis\u003c/em\u003e and diverged from the C genome of wild rice (Extended data Fig.\u0026nbsp;8f).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n\u003ch2\u003eVariation in alleles and regulatory elements\u003c/h2\u003e\n\u003cp\u003eNatural allelic variation is essential for phenotypic diversity, environmental adaptation, and domestication \u003csup\u003e33\u0026ndash;35\u003c/sup\u003e. Because there are very few highly collinear blocks between non-AA-genome wild rice and cultivated rice (Extended data Fig.\u0026nbsp;10, Supplementary Table\u0026nbsp;6), our analysis focused on genome-wide allelic variation and the associated regulatory sequences (gene\u0026thinsp;\u0026plusmn;\u0026thinsp;10\u0026thinsp;kb). As divergence from cultivated rice increased, the number of collinear genes between diploid wild rice and cultivated rice decreased, ranging from 18,463 to 23,812, with an average of 20,288 (Supplementary Table\u0026nbsp;6). Including 19 published, high-quality, chromosome-level cultivated rice genomes (Supplementary Table\u0026nbsp;1) allowed us to build a comprehensive SV resource for both wild and cultivated rice\u003csup\u003e9\u003c/sup\u003e. 
By mapping collinear genes and their 10 kb flanking regions onto the corresponding regions of Nipponbare, we identified SNPs and InDels, treating variants of 50 bp or greater as PAV targets. The total number of SVs increased with the number of accessions, with cultivated rice showing a higher percentage of nonredundant SVs than wild rice (Extended data Fig.\u0026nbsp;11a). Wild rice genomes exhibited more alleles and gene haplotypes than cultivated rice (Extended data Fig.\u0026nbsp;11b, c), indicating a rich source of novel genetic variation. To examine the functional impact of SVs on genes and proteins, it is essential to combine the variant alleles detected in each species into haplotypes and to annotate each accession independently. Wild rice (sub)genomes displayed more alleles of collinear core genes than cultivated rice (Extended data Fig.\u0026nbsp;11c), and the numbers of gene haplotypes (gHap) and coding-sequence (CDS) haplotypes (gcHap) in wild rice were significantly greater than in cultivated rice (Extended data Fig.\u0026nbsp;11c). Analyses of protein diversity in collinear genes between wild and cultivated rice provided insight into their functional differentiation. Genome-wide clustering of proteins by domain similarity revealed approximately seven clusters in wild rice, matching the number of wild rice genome types, whereas cultivated rice proteins predominantly formed one group, corresponding to the AA genome type (Extended data Fig.\u0026nbsp;11d). Furthermore, analysis of gene PAVs distinguished the major species and highlighted significant differences between wild and cultivated rice (Extended data Fig.\u0026nbsp;10b\u0026ndash;d). Most group-unbalanced genes (87.33%) were more prevalent in wild rice than in cultivated rice, underscoring the substantial legacy of genetic variation in wild rice (Extended data Fig.\u0026nbsp;11e). 
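\u003c/p\u003e\u003cp\u003eCounting gene haplotypes (gHap) and CDS haplotypes (gcHap), as above, amounts to collapsing each accession's alleles at a gene into a tuple and counting distinct tuples, using all variant sites for gHap and only coding sites for gcHap. The sketch below illustrates that idea under simplifying assumptions: the data layout and names are hypothetical, and uncalled sites are treated as reference, which real analyses must handle more carefully.\u003c/p\u003e

```python
from typing import Dict, List, Tuple

Site = Tuple[int, bool]  # (position, lies within the CDS)


def count_haplotypes(genotypes: Dict[str, Dict[int, str]], sites: List[Site],
                     cds_only: bool = False) -> int:
    # genotypes[accession][position] -> allele; positions absent from an
    # accession's dict are assumed to carry the reference allele ('REF').
    used = [pos for pos, in_cds in sites if in_cds or not cds_only]
    haplotypes = {tuple(alleles.get(pos, 'REF') for pos in used)
                  for alleles in genotypes.values()}
    return len(haplotypes)


# Toy gene with one non-coding (100) and two coding (250, 400) variant sites.
sites = [(100, False), (250, True), (400, True)]
genotypes = {
    'acc1': {},
    'acc2': {100: 'T'},
    'acc3': {250: 'G'},
    'acc4': {100: 'T', 250: 'G'},
}
print(count_haplotypes(genotypes, sites))                 # 4 gene haplotypes
print(count_haplotypes(genotypes, sites, cds_only=True))  # 2 CDS haplotypes
```

\u003cp\u003eBecause non-coding variants are ignored for gcHap, the CDS haplotype count can never exceed the gene haplotype count for the same accessions.\u003c/p\u003e\u003cp\u003e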
Notably, selection for grain color during rice domestication is evident: wild rice species predominantly have black or red grains, whereas most cultivars have a white seed coat. Structural variation analyses revealed distinct haplotypes of the Rc protein among cultivated and wild rice accessions\u003csup\u003e36\u003c/sup\u003e. Compared with the Rc haplotype in cultivated rice, wild rice exhibited seven haplotypes corresponding to different genome types, suggesting that genetic divergence in \u003cem\u003eRc\u003c/em\u003e played a role in grain pericarp development during domestication (Extended data Fig.\u0026nbsp;11f).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n\u003ch2\u003eGene CNVs and NLR repertoire in rice\u003c/h2\u003e\n\u003cp\u003eRecent studies have highlighted the significant role of genomic copy number variations (gCNVs) in the evolution and domestication of crops \u003csup\u003e37,38\u003c/sup\u003e. However, accurately identifying gCNVs in the highly repetitive genome sequences of the rice genus poses notable challenges. Leveraging our high-quality assemblies, we systematically investigated gCNVs by aligning collinear blocks of the rice accessions against the Nipponbare reference genome and assessed their potential impact on important agronomic traits. Through whole-genome comparisons, we identified 207 genes with tandem repeats across the 14 wild rice assemblies, including genes potentially influencing yield, grain quality, heading date, and biotic and abiotic stress resistance (Supplementary Table\u0026nbsp;7). To gain further insight into the functional roles of gCNVs in rice, we analyzed 4,400 genes with known functions from a previous study\u003csup\u003e39\u003c/sup\u003e. 
Among these genes, 36 exhibited tandem repeats in the rice genus and affect various agronomic traits related to yield, disease and pest resistance (e.g., blast, bacterial blight, brown planthopper), biotic stress tolerance, element transport, and other important adaptive traits such as heading date and hybrid sterility (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ea). Additionally, we assessed the expression levels of selected gCNV genes to investigate potential changes in their expression profiles. Notably, we also identified several variants in the \u003cem\u003ePi9\u003c/em\u003e cluster, including gCNVs in the region at 10.38 Mb on NIP chromosome 6. \u003cem\u003ePi9\u003c/em\u003e is a well-known rice gene that confers strong and durable resistance to the blast fungus \u003cem\u003eM. oryzae\u003c/em\u003e\u003csup\u003e\u003cem\u003e40\u003c/em\u003e\u003c/sup\u003e. Interestingly, \u003cem\u003ePi9\u003c/em\u003e is a typical NLR gene with copy number variation, which may have contributed to environmental adaptation in rice species (Supplementary Fig.\u0026nbsp;12).\u003c/p\u003e\n\u003cp\u003eGenes encoding nucleotide-binding domain and leucine-rich repeat (NLR) proteins play a crucial role in plant immune systems \u003csup\u003e41\u003c/sup\u003e. Therefore, a comprehensive and accurate NLR dataset for the rice genus is essential. Plant NLRs often occur in clusters, which makes their identification challenging. To address this issue, we used RGAugury \u003csup\u003e42\u003c/sup\u003e and DupGen_finder \u003csup\u003e43\u003c/sup\u003e and identified a total of 7,048 NLR genes across the rice genus (Supplementary Table\u0026nbsp;7). The number of NLR genes ranged from 419 in \u003cem\u003eO. glaberrima\u003c/em\u003e to 511 in \u003cem\u003eO. sativa\u003c/em\u003e indica (R498) among cultivated rice, and from 159 in \u003cem\u003eO. australiensis\u003c/em\u003e to 669 in \u003cem\u003eO. punctata\u003c/em\u003e among wild rice (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eb, Supplementary Table\u0026nbsp;7). This wider range suggests that the immune system of wild rice has a more diverse evolutionary history than that of cultivated rice.\u003c/p\u003e\n\u003cp\u003eOur study focused on identifying and categorizing NLRs in different rice species to establish a comprehensive understanding of NLR diversity within the rice genus. Interestingly, diploid wild rice genomes contained fewer NLRs than cultivated rice genomes, despite their larger genome sizes (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ef). For instance, the genomes of \u003cem\u003eO. australiensis\u003c/em\u003e and \u003cem\u003eO. meyeriana\u003c/em\u003e, although about twice the size of NIP, contain only about half as many NLRs as cultivated rice (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ef). Analysis of NLR distribution showed that while the numbers of singleton R genes were similar in wild and cultivated rice, cultivated rice tended to have more R genes in pairs or clusters (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eb\u0026ndash;d). Redundancy analysis revealed that 55.64% of NLR signatures were shared across all genomes, with 15 unique signatures in the cultivated group and 162 in the wild group (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eg). As the number of cultivated rice accessions increased, the number of core NLR signatures also tended to increase. Redundancy analysis of NLR genes in wild and cultivated rice revealed that 78.8% of the NLR genes in cultivated rice were dispensable, slightly lower than in wild rice (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003eh). 
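\u003c/p\u003e\u003cp\u003eSorting NLR genes into singletons, pairs and clusters, as in the comparison above, depends on a proximity rule. The sketch below assumes a deliberately simple one: consecutive NLRs on a chromosome whose start coordinates lie within a hypothetical max_gap of 200 kb belong to the same locus, and loci with one, two, or three or more NLRs are counted as singleton, pair or cluster genes. Published definitions (for example, limits on intervening non-NLR genes, or head-to-head orientation for pairs) differ and are not reproduced here; the function names are hypothetical.\u003c/p\u003e

```python
from typing import Dict, List


def arrangement(size: int) -> str:
    # Label a locus by how many NLR genes it contains.
    if size == 1:
        return 'singleton'
    if size == 2:
        return 'pair'
    return 'cluster'


def classify_nlr_genes(starts: List[int], max_gap: int = 200_000) -> Dict[str, int]:
    # starts: NLR gene start coordinates (bp) on one chromosome.
    # Returns the number of NLR genes in each arrangement class.
    counts = {'singleton': 0, 'pair': 0, 'cluster': 0}
    ordered = sorted(starts)
    if not ordered:
        return counts
    size = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= max_gap:
            size += 1  # same locus
        else:
            counts[arrangement(size)] += size  # close the current locus
            size = 1
    counts[arrangement(size)] += size  # close the last locus
    return counts


toy_starts = [1_000_000, 5_000_000, 5_050_000, 9_000_000, 9_100_000, 9_250_000]
print(classify_nlr_genes(toy_starts))
# {'singleton': 1, 'pair': 2, 'cluster': 3}
```

\u003cp\u003eCounting genes rather than loci matches comparisons of how many R genes sit in pairs or clusters; changing max_gap can move genes between classes, so any such threshold should be reported alongside the counts.\u003c/p\u003e\u003cp\u003e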
More than 90% of NLR genes in the core NLR genome were expressed in both wild and cultivated rice, whereas around 20% of NLR genes in the dispensable genome showed low or no expression under normal conditions, suggesting that they may be expressed specifically upon pathogen attack (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ei).\u003c/p\u003e\n\u003cp\u003eWe classified the NLRs of the rice genus into 369 clusters, 167 of which were sharply expanded in cultivated rice (Supplementary Table\u0026nbsp;7), including well-studied rice R gene families that provide resistance to bacterial blight caused by \u003cem\u003eXanthomonas oryzae\u003c/em\u003e pv. \u003cem\u003eoryzae\u003c/em\u003e (Xoo), such as WRKY61 (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ej). Additionally, an NLR expansion event was observed in the wild rice pangenome (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003ej), which may have enabled these plants to adapt to a wider range of environments than cultivars. By exploiting NLR genes lost from rice during domestication and artificial selection, we can enhance the resistance resources of cultivated rice and enrich the diversity of modern commercial rice. The total number of NLRs in cultivated rice species has increased relative to diploid wild rice species, despite some NLR gene losses, suggesting that the expansion of NLRs into clusters may have been driven by breeding for resistance to specific pathogens.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eWe integrated high-quality assemblies of 13 wild rice species, three cultivated rice genomes and one common wild rice genome to construct a comprehensive super pangenome of the rice genus. Compared with cultivated rice, this super pangenome provides an additional 63,881 gene families.
Notably, we reconstructed the phylogenetic tree of \u003cem\u003eOryza\u003c/em\u003e at the genome level and corrected the evolutionary positions of the BB, CC, FF and GG rice species. Our analysis also examined the size and distribution of pervasive structural variations across \u003cem\u003eOryza\u003c/em\u003e.\u003c/p\u003e \u003cp\u003eIn addition to presenting \u003cem\u003eOryza\u003c/em\u003e pangenome resources, our study exemplifies how these new resources can enhance our understanding of the roles of SVs, gCNVs and allelic variation in environmental adaptation, domestication, differentiation and artificial selection in rice. Moreover, our examination of why genome sizes have varied so greatly during \u003cem\u003eOryza\u003c/em\u003e evolution, and of which repeat-sequence components contribute most to rice genome size, serves as a model for similar analyses in other plant species. Additionally, we observed that cultivated rice carried more NLRs than diploid wild rice genomes but exhibited lower disease resistance than wild rice (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee-j). The number of clustered NLRs in cultivated rice was notably higher than in wild rice, suggesting that some additional NLR copies may be redundant for resistance in cultivated rice.
This aligns with the notion that multiple NLRs are required for the broad-spectrum blast resistance of Tetep \u003csup\u003e44\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe next step in \u003cem\u003eOryza\u003c/em\u003e pan-genomics will be to use gene editing to evaluate how private genes and alleles affect yield, resistance to various diseases and adaptation to changing environments.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank all members of Longan Yan's group at the Jiangxi Academy of Agricultural Sciences for collecting and preserving the wild rice resources. We thank Dr. Zhilan Fan at the Guangdong Academy of Agricultural Sciences for providing the GG-genome wild rice \u003cem\u003eO. meyeriana\u003c/em\u003e, and Dr. Shengyi Liu at the Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, for constructive suggestions. This work was supported by the China Agriculture Research System (CARS-01-08), the National Key Research and Development Program of China (2017YFD0100302, 2023YFD1201203), the National Natural Science Foundation of China (31960400), the Major Discipline Academic and Technical Leaders Training Program of Jiangxi Province (20213BCJL22044) and the Jiangxi Technology Innovation Guidance Program (20223AEI91010).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eL. Y. and Y. C. supervised the work. L. L., L. L., W. X., and Y. L. collected samples for resequencing, Hi-C and HiFi sequencing. M. W. performed the genome assembly. Q. H. performed the genome annotation. Y. W. and Y. W. constructed the super rice pangenome. Q. H. and Y. W. conducted SV and CNV identification. J. W., Z. Y., and W. C. collected samples for RNA-seq and conducted expression validation. W. L., H. D., and H.
X. designed the experiments and wrote the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eYu, H.\u003cem\u003e et al.\u003c/em\u003e A route to de novo domestication of wild allotetraploid rice. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e184\u003c/strong\u003e, 1156-1170 e1114, doi:10.1016/j.cell.2021.01.013 (2021).\u003c/li\u003e\n\u003cli\u003eHuang, C., Chen, Z. \u0026amp; Liang, C. Oryza pan-genomics: A new foundation for future rice research and improvement. \u003cem\u003eThe Crop Journal\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 622-632, doi:10.1016/j.cj.2021.04.003 (2021).\u003c/li\u003e\n\u003cli\u003eWing, R. A., Purugganan, M. D. \u0026amp; Zhang, Q. The rice genome revolution: from an ancient grain to Green Super Rice. \u003cem\u003eNat Rev Genet\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 505-517, doi:10.1038/s41576-018-0024-z (2018).\u003c/li\u003e\n\u003cli\u003eWalkowiak, S.\u003cem\u003e et al.\u003c/em\u003e Multiple wheat genomes reveal global variation in modern breeding. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e588\u003c/strong\u003e, 277-283, doi:10.1038/s41586-020-2961-x (2020).\u003c/li\u003e\n\u003cli\u003eWang, W.\u003cem\u003e et al.\u003c/em\u003e Genomic variation in 3,010 diverse accessions of Asian cultivated rice. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e557\u003c/strong\u003e, 43-49, doi:10.1038/s41586-018-0063-9 (2018).\u003c/li\u003e\n\u003cli\u003eKhan, A. W.\u003cem\u003e et al.\u003c/em\u003e Super-Pangenome by Integrating the Wild Side of a Species for Accelerated Crop Improvement. \u003cem\u003eTrends Plant Sci\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 148-158, doi:10.1016/j.tplants.2019.10.012 (2020).\u003c/li\u003e\n\u003cli\u003eStein, J.
C.\u003cem\u003e et al.\u003c/em\u003e Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, 285-296, doi:10.1038/s41588-018-0040-0 (2018).\u003c/li\u003e\n\u003cli\u003eGe, S., Sang, T., Lu, B. R. \u0026amp; Hong, D. Y. Phylogeny of rice genomes with emphasis on origins of allotetraploid species. \u003cem\u003eProc Natl Acad Sci U S A\u003c/em\u003e \u003cstrong\u003e96\u003c/strong\u003e, 14400-14405, doi:10.1073/pnas.96.25.14400 (1999).\u003c/li\u003e\n\u003cli\u003eQin, P.\u003cem\u003e et al.\u003c/em\u003e Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e184\u003c/strong\u003e, 3542-3558 e3516, doi:10.1016/j.cell.2021.04.046 (2021).\u003c/li\u003e\n\u003cli\u003eLi, N.\u003cem\u003e et al.\u003c/em\u003e Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species. \u003cem\u003eNat Genet\u003c/em\u003e, doi:10.1038/s41588-023-01340-y (2023).\u003c/li\u003e\n\u003cli\u003eZhou, Y.\u003cem\u003e et al.\u003c/em\u003e Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 1567, doi:10.1038/s41467-023-37004-y (2023).\u003c/li\u003e\n\u003cli\u003eZhao, Q.\u003cem\u003e et al.\u003c/em\u003e Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, 278-284, doi:10.1038/s41588-018-0041-z (2018).\u003c/li\u003e\n\u003cli\u003eShang, L. G.\u003cem\u003e et al.\u003c/em\u003e A super pan-genomic landscape of rice. 
\u003cem\u003eCell Research\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 878-896, doi:10.1038/s41422-022-00685-z (2022).\u003c/li\u003e\n\u003cli\u003eXie, X.\u003cem\u003e et al.\u003c/em\u003e A chromosome-level genome assembly of the wild rice Oryza rufipogon facilitates tracing the origins of Asian cultivated rice. \u003cem\u003eSci China Life Sci\u003c/em\u003e \u003cstrong\u003e64\u003c/strong\u003e, 282-293, doi:10.1007/s11427-020-1738-x (2021).\u003c/li\u003e\n\u003cli\u003eDu, H.\u003cem\u003e et al.\u003c/em\u003e Sequencing and de novo assembly of a near complete indica rice genome. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 15324, doi:10.1038/ncomms15324 (2017).\u003c/li\u003e\n\u003cli\u003eWang, M.\u003cem\u003e et al.\u003c/em\u003e The genome sequence of African rice (Oryza glaberrima) and evidence for independent domestication. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e46\u003c/strong\u003e, 982-988, doi:10.1038/ng.3044 (2014).\u003c/li\u003e\n\u003cli\u003eBurton, J. N.\u003cem\u003e et al.\u003c/em\u003e Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. \u003cem\u003eNat Biotechnol\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 1119-1125, doi:10.1038/nbt.2727 (2013).\u003c/li\u003e\n\u003cli\u003eKaplan, N. \u0026amp; Dekker, J. High-throughput genome scaffolding from in vivo DNA interaction frequency. \u003cem\u003eNat Biotechnol\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 1143-1147, doi:10.1038/nbt.2768 (2013).\u003c/li\u003e\n\u003cli\u003eSimao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. \u0026amp; Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 3210-3212, doi:10.1093/bioinformatics/btv351 (2015).\u003c/li\u003e\n\u003cli\u003eOu, S., Chen, J. \u0026amp; Jiang, N. 
Assessing genome assembly quality using the LTR Assembly Index (LAI). \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e46\u003c/strong\u003e, e126, doi:10.1093/nar/gky730 (2018).\u003c/li\u003e\n\u003cli\u003eChen, J.\u003cem\u003e et al.\u003c/em\u003e Whole-genome sequencing of Oryza brachyantha reveals mechanisms underlying Oryza genome evolution. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 1595, doi:10.1038/ncomms2596 (2013).\u003c/li\u003e\n\u003cli\u003eZou, X. H.\u003cem\u003e et al.\u003c/em\u003e Analysis of 142 genes resolves the rapid diversification of the rice genus. \u003cem\u003eGenome Biol\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, R49, doi:10.1186/gb-2008-9-3-r49 (2008).\u003c/li\u003e\n\u003cli\u003eEmms, D. M. \u0026amp; Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. \u003cem\u003eGenome Biol\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 238, doi:10.1186/s13059-019-1832-y (2019).\u003c/li\u003e\n\u003cli\u003ePulido, M. \u0026amp; Casacuberta, J. M. Transposable element evolution in plant genome ecosystems. \u003cem\u003eCurr Opin Plant Biol\u003c/em\u003e \u003cstrong\u003e75\u003c/strong\u003e, 102418, doi:10.1016/j.pbi.2023.102418 (2023).\u003c/li\u003e\n\u003cli\u003eKidwell, M. G. Transposable elements and the evolution of genome size in eukaryotes. \u003cem\u003eGenetica\u003c/em\u003e \u003cstrong\u003e115\u003c/strong\u003e, 49-63, doi:10.1023/a:1016072014259 (2002).\u003c/li\u003e\n\u003cli\u003eComai, L., Maheshwari, S. \u0026amp; Marimuthu, M. P. A. Plant centromeres. \u003cem\u003eCurr Opin Plant Biol\u003c/em\u003e \u003cstrong\u003e36\u003c/strong\u003e, 158-167, doi:10.1016/j.pbi.2017.03.003 (2017).\u003c/li\u003e\n\u003cli\u003eSong, J. M.\u003cem\u003e et al.\u003c/em\u003e Two gap-free reference genomes and a global view of the centromere architecture in rice. 
\u003cem\u003eMol Plant\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 1757-1767, doi:10.1016/j.molp.2021.06.018 (2021).\u003c/li\u003e\n\u003cli\u003eSong, S.\u003cem\u003e et al.\u003c/em\u003e OsMFT1 increases spikelets per panicle and delays heading date in rice by suppressing Ehd1, FZP and SEPALLATA-like genes. \u003cem\u003eJ Exp Bot\u003c/em\u003e \u003cstrong\u003e69\u003c/strong\u003e, 4283-4293, doi:10.1093/jxb/ery232 (2018).\u003c/li\u003e\n\u003cli\u003eKou, Y.\u003cem\u003e et al.\u003c/em\u003e Evolutionary Genomics of Structural Variation in Asian Rice (Oryza sativa) Domestication. \u003cem\u003eMol Biol Evol\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 3507-3524, doi:10.1093/molbev/msaa185 (2020).\u003c/li\u003e\n\u003cli\u003eGamuyao, R.\u003cem\u003e et al.\u003c/em\u003e The protein kinase Pstol1 from traditional rice confers tolerance of phosphorus deficiency. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e488\u003c/strong\u003e, 535-539, doi:10.1038/nature11346 (2012).\u003c/li\u003e\n\u003cli\u003eHu, J.\u003cem\u003e et al.\u003c/em\u003e Fine mapping and pyramiding of brown planthopper resistance genes QBph3 and QBph4 in an introgression line from wild rice O. officinalis. \u003cem\u003eMolecular Breeding\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, doi:10.1007/s11032-015-0228-2 (2015).\u003c/li\u003e\n\u003cli\u003eYamagata, Y.\u003cem\u003e et al.\u003c/em\u003e Mitochondrial gene in the nuclear genome induces reproductive barrier in rice. \u003cem\u003eProc Natl Acad Sci U S A\u003c/em\u003e \u003cstrong\u003e107\u003c/strong\u003e, 1494-1499, doi:10.1073/pnas.0908283107 (2010).\u003c/li\u003e\n\u003cli\u003eBai, F.\u003cem\u003e et al.\u003c/em\u003e Natural allelic variation in GRAIN SIZE AND WEIGHT 3 of wild rice regulates the grain size and weight. 
\u003cem\u003ePlant Physiol\u003c/em\u003e \u003cstrong\u003e193\u003c/strong\u003e, 502-518, doi:10.1093/plphys/kiad320 (2023).\u003c/li\u003e\n\u003cli\u003eSun, X.\u003cem\u003e et al.\u003c/em\u003e Natural variation of DROT1 confers drought adaptation in upland rice. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 4265, doi:10.1038/s41467-022-31844-w (2022).\u003c/li\u003e\n\u003cli\u003eHuang, X.\u003cem\u003e et al.\u003c/em\u003e Natural variation at the DEP1 locus enhances grain yield in rice. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e41\u003c/strong\u003e, 494-497, doi:10.1038/ng.352 (2009).\u003c/li\u003e\n\u003cli\u003eFurukawa, T.\u003cem\u003e et al.\u003c/em\u003e The Rc and Rd genes are involved in proanthocyanidin synthesis in rice pericarp. \u003cem\u003ePlant J\u003c/em\u003e \u003cstrong\u003e49\u003c/strong\u003e, 91-102, doi:10.1111/j.1365-313X.2006.02958.x (2007).\u003c/li\u003e\n\u003cli\u003eWang, Y.\u003cem\u003e et al.\u003c/em\u003e Copy number variation at the GL7 locus contributes to grain size diversity in rice. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e47\u003c/strong\u003e, 944-948, doi:10.1038/ng.3346 (2015).\u003c/li\u003e\n\u003cli\u003eDeng, Y.\u003cem\u003e et al.\u003c/em\u003e Epigenetic regulation of antagonistic receptors confers rice blast resistance with yield balance. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e355\u003c/strong\u003e, 962-965, doi:10.1126/science.aai8898 (2017).\u003c/li\u003e\n\u003cli\u003eHuang, F.\u003cem\u003e et al.\u003c/em\u003e New Data and New Features of the FunRiceGenes (Functionally Characterized Rice Genes) Database: 2021 Update. 
\u003cem\u003eRice (N Y)\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 23, doi:10.1186/s12284-022-00569-1 (2022).\u003c/li\u003e\n\u003cli\u003eQu, S.\u003cem\u003e et al.\u003c/em\u003e The broad-spectrum blast resistance gene Pi9 encodes a nucleotide-binding site-leucine-rich repeat protein and is a member of a multigene family in rice. \u003cem\u003eGenetics\u003c/em\u003e \u003cstrong\u003e172\u003c/strong\u003e, 1901-1914, doi:10.1534/genetics.105.044891 (2006).\u003c/li\u003e\n\u003cli\u003eFeehan, J. M., Castel, B., Bentham, A. R. \u0026amp; Jones, J. D. Plant NLRs get by with a little help from their friends. \u003cem\u003eCurr Opin Plant Biol\u003c/em\u003e \u003cstrong\u003e56\u003c/strong\u003e, 99-108, doi:10.1016/j.pbi.2020.04.006 (2020).\u003c/li\u003e\n\u003cli\u003eLi, P.\u003cem\u003e et al.\u003c/em\u003e RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants. \u003cem\u003eBMC Genomics\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 852, doi:10.1186/s12864-016-3197-x (2016).\u003c/li\u003e\n\u003cli\u003eQiao, X.\u003cem\u003e et al.\u003c/em\u003e Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. \u003cem\u003eGenome Biol\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 38, doi:10.1186/s13059-019-1650-2 (2019).\u003c/li\u003e\n\u003cli\u003eWang, L.\u003cem\u003e et al.\u003c/em\u003e Large-scale identification and functional analysis of NLR genes in blast resistance in the Tetep rice genome sequence. 
\u003cem\u003eProc Natl Acad Sci U S A\u003c/em\u003e \u003cstrong\u003e116\u003c/strong\u003e, 18479-18487, doi:10.1073/pnas.1910229116 (2019).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-4350570/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4350570/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eRice (\u003cem\u003eOryza sativa\u003c/em\u003e L.) is a vital staple food globally, but its genetic diversity has decreased due to extensive breeding. However, research on the genome evolution and diversity of wild rice species, particularly those with BB, CC, BBCC, CCDD, EE, FF, and GG genome types, is limited, which impedes the use of their potential in rice breeding\u003csup\u003e1,2\u003c/sup\u003e. This study presents chromosome-scale genomes of thirteen representative wild rice species from the \u003cem\u003eOryza\u003c/em\u003e genus. By integrating these genomes with four previously published ones, a total of 101,723 gene families were identified across the genus, including 9,834 (9.67%) core gene families. Additionally, 63,881 new gene families absent in cultivated rice species were discovered. Comparative genomic analysis among \u003cem\u003eOryza\u003c/em\u003e genomes reveals potential mechanisms underlying genome size variation, centromere evolution, and gene number and expression influenced by transposable elements. Extensive structural rearrangements, large-scale subgenome exchanges, and widespread allelic and regulatory sequence variations were discovered in wild rice.
We identified an inversion that occurs pervasively in \u003cem\u003eOryza rufipogon\u003c/em\u003e and \u003cem\u003eOryza sativa\u003c/em\u003e japonica and is tightly linked to a locus that might have contributed to the expansion of their geographical range. Interestingly, cultivated genomes showed a notable expansion but lower diversity of disease resistance genes, likely due to the random loss of some R genes and the extensive amplification of others for specific diseases during domestication and artificial selection. This comprehensive study not only makes a previously hidden genetic legacy accessible to genetic studies and breeding but also deepens our understanding of rice evolution and biology.\u003c/p\u003e","manuscriptTitle":"Genome Evolution and Diversity of Wild and Cultivated Rice Species","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-05-09 08:54:20","doi":"10.21203/rs.3.rs-4350570/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"31e4688a-febe-4a4f-adc5-71f5fc9de62b","owner":[],"postedDate":"May 9th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":31486226,"name":"Biological sciences/Genetics/Genome/Genetic variation"},{"id":31486227,"name":"Biological sciences/Plant sciences/Plant evolution"}],"tags":[],"updatedAt":"2024-11-19T08:07:23+00:00","versionOfRecord":{"articleIdentity":"rs-4350570","link":"https://doi.org/10.1038/s41467-024-54427-3","journal":{"identity":"nature-communications","isVorOnly":false,"title":"Nature Communications"},"publishedOn":"2024-11-18 05:00:00","publishedOnDateReadable":"November 18th, 2024"},"versionCreatedAt":"2024-05-09 08:54:20","video":"","vorDoi":"10.1038/s41467-024-54427-3","vorDoiUrl":"https://doi.org/10.1038/s41467-024-54427-3","workflowStages":[]},"version":"v1","identity":"rs-4350570","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4350570","identity":"rs-4350570","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
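The core/dispensable partition behind the redundancy analyses in the Results (for example, NLR signatures shared by all genomes versus signatures unique to the cultivated or wild group) can be illustrated with a minimal Python sketch. It assumes a simple presence/absence table that maps each family or signature to the set of genomes containing it. The genome labels, family IDs and toy table are hypothetical and are not the study's data, and the sketch does not reproduce the authors' actual pipeline (RGAugury, DupGen_finder and any thresholds they applied).

```python
"""Toy core/dispensable partition of gene families or NLR signatures.

Illustrative sketch only: the genome labels, family IDs and presence
sets below are hypothetical and are not data from the study.
"""
from typing import Dict, Set

# Hypothetical genome labels, split into the two groups compared in the text.
CULTIVATED = {"Nip", "R498", "Oglab"}
WILD = {"Oaus", "Opun"}
ALL_GENOMES = CULTIVATED | WILD

# Family ID -> set of genomes in which the family has at least one member.
PRESENCE: Dict[str, Set[str]] = {
    "fam1": set(ALL_GENOMES),
    "fam2": set(ALL_GENOMES),
    "fam3": {"Nip", "R498"},
    "fam4": {"Oaus", "Opun"},
    "fam5": {"Opun"},
}


def classify(presence: Dict[str, Set[str]], genomes: Set[str]) -> Dict[str, str]:
    """Label a family 'core' if it occurs in every genome, else 'dispensable'."""
    return {
        fam: "core" if genomes <= found else "dispensable"
        for fam, found in presence.items()
    }


def unique_to(presence: Dict[str, Set[str]], group: Set[str]) -> Set[str]:
    """Families found in at least one genome of `group` and in none outside it."""
    return {fam for fam, found in presence.items() if found and found <= group}


def core_percentage(labels: Dict[str, str]) -> float:
    """Percentage of families labelled 'core'."""
    n_core = sum(1 for label in labels.values() if label == "core")
    return 100.0 * n_core / len(labels)


if __name__ == "__main__":
    labels = classify(PRESENCE, ALL_GENOMES)
    print(labels)
    print(f"core: {core_percentage(labels):.2f}%")
    print("cultivated-only:", sorted(unique_to(PRESENCE, CULTIVATED)))
    print("wild-only:", sorted(unique_to(PRESENCE, WILD)))
```

For the toy table, two of five families are present in every genome, so the script reports a 40.00% core fraction. Real pangenome analyses often add intermediate categories (for example, softcore or private families), which this binary sketch deliberately omits.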