Robust identification of orthologous synteny with the Orthology Index and its applications in reconstructing the evolutionary history of plant genomes

doi:10.21203/rs.3.rs-4798240/v1

2024 · doi:10.21203/rs.3.rs-4798240/v1

preprint OA: closed

Kai-Hua Jia, Ren-Gang Zhang, Hong-Yun Shang, Heng Shu, Yongpeng Ma

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4798240/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

We developed a scalable and robust approach, the Orthology Index (OI), to accurately identify orthologous synteny by calculating the proportion of pre-inferred orthologs within syntenic blocks. Our evaluation of a comprehensive dataset comprising nearly 100 known cases with diverse polyploidy events revealed that the approach is highly reliable and robust in the identification of orthologous synteny.
This discovery highlights OI as a potentially universal criterion for the automated identification of orthologous synteny. Additionally, we demonstrate its broad applications in reconstructing plant genome evolutionary histories, including polyploidy and reticulation inference, and phylogenomics. The index is packaged in an all-in-one toolkit (https://github.com/zhangrengang/OrthoIndex).

Subject areas: Biological sciences/Plant sciences/Plant evolution; Biological sciences/Evolution/Evolutionary genetics
Keywords: Orthologous synteny; Polyploidy; Reticulation; Phylogenomics; Orthology Index

Introduction

Reconstruction of the evolutionary histories of organisms, including inference of the tree/network of life, identification of polyploidy events, and placement of the events on the tree/network, usually relies on the orthologous relationships at gene, block, chromosome, subgenome and/or whole genome scales. In general, the more orthologous genes or loci involved in the reconstruction of evolutionary history, the more confidence can be placed in it. For example, phylogenomic reconstructions using genome-scale data have become the gold standard for understanding the evolution of lineages in the tree of life [1]. Orthologous synteny has been established as a proxy to generate the maximum number of orthologs, from the chromosomal to the subgenomic/genomic scales. Orthologous synteny has been used successfully in reconstructing evolutionary histories in, for example, the poppies [2] and the angiosperms as a whole [3]. However, accurate identification of orthologous synteny is still challenging, especially in flowering plants, where recurrent whole-genome duplications (WGDs) or polyploidizations have produced substantial numbers of syntenic paralogs, which can confuse the inference of orthology [1, 4] and may mislead the reconstruction of evolutionary history.
For example, the overlooked orthologous relationships in the synteny between two Papaver species resulted in an incorrect interpretation of the history of polyploidy in these species [2, 5, 6]. To date, two main strategies have been employed to identify orthologous synteny. One strategy is to filter the detected synteny with certain criteria, which are usually specific to each case and are therefore not scalable for large-scale datasets. One widely used criterion is the synonymous substitution rate (Ks). Because whole-genome duplication (producing paralogs) and speciation (producing orthologs) events occurred at different times in the past, Ks between syntenic gene pairs can be used to differentiate the ages of orthologous and paralogous synteny [e.g. 7, 8]. However, Ks-based methods are not always effective for distinguishing syntenic blocks from different events [9], and cannot be universally applied to the large-scale automated identification of orthologous syntenic blocks, because the events occurred at different times in different species, so the Ks values vary case by case, and because species have different substitution rates. A parameter, homo, has been proposed for use with WGDI, to extract the best homology (orthology) of syntenic blocks [10]. This parameter relies on another parameter, multiple, to define the top number of hits as best hits. Unfortunately, the parameter cannot be universally applied to different cases with different syntenic depth ratios. Another tool, QUOTA-ALIGN [9], was developed to screen orthologous syntenic blocks under given constraints on syntenic depths (QUOTA). However, users are required to set the QUOTA parameter in advance for cases with different evolutionary histories. For example, it is necessary to set QUOTA = 1:1 for Arabidopsis thaliana and A. lyrata, but to set QUOTA = 4:2 for A. thaliana and poplar [9].
Prior knowledge of this kind is likewise required when setting the multiple parameter in WGDI (-c option) to extract orthologous syntenic blocks. The second strategy for identifying orthologous synteny is the use of pre-inferred orthologs to call synteny, with tools such as MCScanX_h [11]. This strategy is scalable for large-scale datasets. However, hidden out-paralogs in the pre-inferred orthologs may result in out-paralogous syntenic blocks that need further removal with the first strategy. This problem can be frequently observed in previously published work, including certain studies investigating willow [12] and poplar [7], which used pre-inferred orthologs from OrthoFinder2 [13] and OrthoMCL [14], respectively, to call synteny. Moreover, the hidden orthologs (i.e. false negatives in ortholog inference) can reduce the efficiency of the subsequent detection of orthologous synteny in practice. To address these issues and to screen orthologous synteny robustly for large-scale datasets, we developed a scalable approach called the Orthology Index (OI). We combined the information regarding pre-inferred orthology (provided by OrthoFinder2 or analogs) and synteny (by WGDI or analogs) to calculate the OI, which represents the proportion of orthologous gene pairs within a syntenic block. We evaluated the efficacy of OI using a comprehensive dataset comprising nearly 100 well-documented cases with diverse WGD and speciation events. Our analysis demonstrated the high robustness and accuracy of OI, as well as its usefulness in evolutionary inference including polyploidy, reticulation and phylogenomics. We finally integrated the index into a toolkit (freely available from https://github.com/zhangrengang/OrthoIndex) to facilitate its use.

Results and Discussion

Identification of orthologous synteny using the Orthology Index

We aim to distinguish orthologous synteny from out-paralogy robustly for genomes of two species.
The genomes should have shared a whole genome duplication event (WGD, or polyploidization; producing out-paralogy) and a speciation event (producing orthology), and these events should have occurred within a short time period (denoted as ΔT, Fig. 1A). To evaluate our approach, we selected ninety well-documented plant species pairs that shared at least one polyploidy event (Table S1, Figs. S1-90), along with the typical poplar–willow case (Fig. 1), from the literature. The typical example is shown in Fig. 1 with poplar (Populus trichocarpa as the representative) and willow (Salix dunnii as the representative), which shared two rounds of polyploidization. The first round was a paleohexaploidization event (γ event, whole genome triplication or WGT) common to the core eudicots at ~120 million years ago (Mya). The second was a lineage-specific tetraploidization event (WGD) shared by poplar and willow at ~60 Mya [15–18] (Fig. 1A). Following the recurrent polyploidization events, the two genera speciated later at ~50 Mya [16, 18] (Fig. 1A). With the collinearity detector WGDI [10], syntenic blocks derived from the three evolutionary events (two polyploidization and one speciation) can be identified and distinguished visually using their distinct synonymous substitution rate (Ks) values and fragmentation levels (the more ancient the event, the more fragmented the blocks, the greater the chromosomal rearrangements, and the higher the Ks value) (Fig. 1B). We denoted the three categories of syntenic blocks as WGT-SBs (Ks ≈ 1.5, most fragmented, out-paralogous), WGD-SBs (Ks ≈ 0.27, moderately fragmented, out-paralogous), and S-SBs (Ks ≈ 0.13, least fragmented, orthologous) (Fig. 1B). The two most recent Ks peaks, derived from WGD-SBs (out-paralogy) and S-SBs (orthology), overlap (Fig. 1B) and are difficult to separate with a single cutoff.
The orthology inference method, OrthoFinder2 [13], inferred a substantial number of orthologous genes (~30%) that exhibit synteny in the out-paralogous blocks (Ks ≈ 0.27; Fig. 1C), suggesting that they are hidden paralogs and would introduce out-paralogous synteny when imported to tools such as MCScanX_h [11]. This problem was observed in numerous instances where the ΔT was not substantial (e.g. Figs. S1, 5, 6, 8, 10), which could be attributed to biased gene loss or systematic errors [1]. Nevertheless, in the poplar–willow case, the peak (Ks ≈ 0.27) of out-paralogs from the nearest WGD (WGD-SBs) was significantly reduced, while out-paralogs from the older WGT (WGT-SBs, Ks ≈ 1.5) were almost entirely eliminated (Fig. 1C). This suggests that the level of accuracy of orthology inference by OrthoFinder2 is relatively high, although not entirely sufficient. Consequently, we combined the algorithmic advances of the two methods described above by introducing a straightforward index, referred to as the Orthology Index (OI), to determine the orthology of a syntenic block. Remarkably, the OI polarized orthology from out-paralogy with a very clear boundary (OI ≈ 0.3–0.7) (Fig. 1D), showing a much higher distinguishability than the Ks. The WGT-SBs exhibit an OI of nearly 0 (out-paralogous), and the WGD-SBs have an OI of approximately 0.1 (out-paralogous), whereas the S-SBs display an OI peak of approximately 0.9 (orthologous). We applied the index analysis to ninety study cases with diverse polyploidy and varying ΔT (Table S1, Figs. S1-90). The OI cutoff was then further adjusted to 0.6 to give a universal identification of orthology, according to the ranges of the polarized orthology–out-paralogy boundary calculated for these examined cases (Fig. 1D, Figs. S1-90D).
Interestingly, we observed that when the OI cutoff value was set to 0.6, the index performed exceptionally and consistently well in identifying orthology accurately in nearly all instances (Fig. 1E, Figs. S1-90E). This finding highlights the efficiency and robustness of our index as a potentially universal criterion for identifying orthologous synteny. As for the poplar–willow case, a cutoff value of OI = 0.6 retained 12.2% of the blocks and 47.6% of the gene pairs as orthologs, effectively retrieving a 1:1 orthology relationship (Fig. 1E) as expected from the evolutionary history (Fig. 1A). False positives (hidden paralogy) and false negatives (excessive removal of true orthologous synteny) were rarely observed in the examined cases (Fig. 1E, Figs. S1-90E), demonstrating the high robustness of the approach. Therefore, the index can be applied in automated pipelines for large-scale datasets (ranging from dozens to hundreds of genomes at present) with a unified threshold of OI. We integrated the index into a toolkit (https://github.com/zhangrengang/OrthoIndex) to facilitate downstream application.

Potential applications of OI in polyploidy inference: determining whether WGD events were shared among or specific to lineages

Distinguishing synteny resulting from orthology and out-paralogy is vital for the inference of the evolutionary history of polyploidy. We illustrate a simple model to explain the process. When two genomes exhibit a 2:2 ratio of synteny depth, there are two main hypotheses to consider: the genomes share a tetraploidization event (WGD), or they both underwent a lineage-specific tetraploidization event (WGD) separately (Fig. 2A). The OI offers a straightforward and visual method to test the two hypotheses with separated orthology and out-paralogy. If the genomes display a clear 1:1 orthology + 1:1 out-paralogy, similar to the poplar–willow case (Fig. 1D; ignoring the very ancient out-paralogy from the γ event), the first hypothesis is supported (Fig.
2A, left panel). Conversely, if they exhibit a 2:2 orthology, the second hypothesis is supported (Fig. 2A, right panel). Our experience of studying this in dozens of known cases (Figs. S1-90) suggests that these inferences of polyploidy history from orthologous synteny patterns are reasonable and accurate. Consequently, we applied this strategy to the controversial question of whether the Apiaceae and Araliaceae families (both within Apiales) share a tetraploidization event. There are two major opposing views on this: (i) the two families share a tetraploidization event, as evidenced by synteny and corrected Ks [19]; (ii) the two families do not share the tetraploidization event, as inferred by synteny and Ks-based methods [20–22]. Here, we reexamined the polyploidy of the available genomes from the two families using the OI. Using Centella asiatica (Apiaceae) and Aralia elata (Araliaceae) as representatives, a clear 2:1 orthologous synteny depth with the grape (Vitis vinifera) genome is observed (Fig. S91), suggesting that both species have undergone a tetraploidization event following their divergence from grape. These inferences are in agreement with previous research [21, 22]. However, upon comparison between the two genomes, a clear 1:1 orthology + 1:1 out-paralogy pattern was highlighted by the OI (Fig. 2B), aligning with the first model (Fig. 2A, left panel) and thus suggesting a shared tetraploidization event (ω, WGD; Fig. 2C), which is consistent with the first view above [19]. This inference was corroborated with macro-synteny phylogenies (Fig. S92), supporting its reliability. Similar patterns were also observed in other combinations of genomes from the two families (e.g. Centella asiatica : Eleutherococcus senticosus = 1:2 orthology) (Figs. S93-94), further supporting the inference.
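The two-hypothesis test for a 2:2 synteny depth can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block representation (a dict with `oi` and `pairs`), the function names `modal_depth` and `interpret_2_to_2`, and the use of the most common per-gene block count as the "orthologous depth" are all hypothetical and are not taken from the OrthoIndex toolkit. Only the 0.6 cutoff and the interpretation rule come from the text.

```python
from collections import Counter


def modal_depth(blocks, side, cutoff=0.6):
    """Most common number of orthologous blocks (OI >= cutoff) covering a gene.

    `side` is 0 for the first genome and 1 for the second genome.
    Genes covered only by out-paralogous blocks are not counted.
    """
    per_gene = Counter()
    for block in blocks:
        if block["oi"] >= cutoff:
            # Count each gene once per block, even if it occurs in several pairs.
            for gene in {pair[side] for pair in block["pairs"]}:
                per_gene[gene] += 1
    if not per_gene:
        return 0
    return Counter(per_gene.values()).most_common(1)[0][0]


def interpret_2_to_2(blocks, cutoff=0.6):
    """Map the orthologous depth ratio onto the two hypotheses of the model."""
    depth = (modal_depth(blocks, 0, cutoff), modal_depth(blocks, 1, cutoff))
    if depth == (1, 1):
        return "shared"            # 1:1 orthology + 1:1 out-paralogy
    if depth == (2, 2):
        return "lineage-specific"  # 2:2 orthology
    return "undetermined"


# Toy data (hypothetical gene names): one orthologous and one out-paralogous block.
shared_case = [
    {"oi": 0.9, "pairs": [("a1", "b1"), ("a2", "b2")]},
    {"oi": 0.1, "pairs": [("a1", "b1x"), ("a2", "b2x")]},
]
# Toy data: every copy in one genome is orthologous to both copies in the other.
specific_case = [
    {"oi": 0.90, "pairs": [("a1", "b1"), ("a2", "b2")]},
    {"oi": 0.85, "pairs": [("a1", "b1x"), ("a2", "b2x")]},
    {"oi": 0.90, "pairs": [("a1x", "b1"), ("a2x", "b2")]},
    {"oi": 0.80, "pairs": [("a1x", "b1x"), ("a2x", "b2x")]},
]

print(interpret_2_to_2(shared_case))    # shared
print(interpret_2_to_2(specific_case))  # lineage-specific
```

In practice the pattern is judged from OI-colored dot plots across whole genomes; the sketch only makes the decision rule explicit.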
Additionally, we noted a considerable disparity in branch length (or substitution rate, genetic distance) between the two families from their most recent common ancestor (MRCA), with the mean branch length of Apiaceae (L1) nearly three times that of Araliaceae (L2) (Fig. 2C). This discrepancy can explain the unreasonable inferences drawn from uncorrected Ks-based methods [20–22], which assumed equal substitution rates for different lineages when estimating the timing of polyploidization and speciation events. Furthermore, we deduced additional lineage-specific polyploidy events within the two families using the OI-based approach. We observed that all studied genomes from species in the Apioideae subfamily (including Daucus carota, Angelica sinensis, Apium graveolens and Coriandrum sativum) exhibit a 2:1 orthologous synteny depth when compared to the C. asiatica genome (Fig. S95), suggesting that they each experienced a tetraploidization (WGD) event subsequent to their divergence from C. asiatica. Moreover, any two genomes within this subfamily display distinct 1:1 orthology + 1:1 out-paralogy patterns (Figs. S56, 96), suggesting a shared tetraploidization event (α), which is in line with previous research [19, 22]. Thus the α event was placed between the stem and crown of the Apioideae subfamily (Fig. 2C). In addition, employing the OI, we also inferred one species-specific tetraploidization event in each of the Panax ginseng and Eleutherococcus senticosus genomes (Fig. 2C, Figs. S57-59, 97), consistent with some previous research [21, 22] but in conflict with the results of Song et al. [19]. These inferences demonstrate the ability of the OI to infer the diverse polyploidy events explicitly and position them on the tree of life.

Potential applications of OI in the identification of reticulation

Reticulation, driven by allopolyploidization or hybridization, constitutes a significant factor in eukaryotic evolution, yielding novel phenotypes that facilitate ecological diversification and the occupation of new niches [23]. Numerous genomes originating from recent reticulation events have been documented (summarized in Jia et al. [24]), encompassing critical cereals [25], fruits [26], vegetables [27], trees [7] and fish [28]. We evaluated the application of the Orthology Index in several well-documented cases, ranging from simple to complex evolutionary scenarios (Fig. 3). We conclude that straightforward reticulate speciation can be inferred solely from the OI-colored dot plots. The cases of Arabidopsis [29] and Arachis [30] represent the simple hybridization or allopolyploidization scenarios. From the OI-colored dot plots, the two subgenomes of both the hybrid (A. thaliana × A. arenosa) and the tetraploid (A. hypogaea) show clear and separate orthologous relationships with their diploid progenitors (Fig. 3A and 3C, Figs. S98-99). Based on the orthologous relationships revealed by the OI, the hybridization events can be easily inferred (Fig. 3B and 3D). The orthologous relationships between the genomes in complicated polyploid species complexes (Triticum and Papaver) are also clear from the OI-colored dot plots (Fig. 3E and 3G, Figs. S100-101). The two subgenomes of the tetraploid T. turgidum are orthologous to two of the subgenomes of the hexaploid T. aestivum (Fig. 3E, Fig. S100), leading to the inference that T. turgidum is the intermediate tetraploid progenitor of the hexaploid T. aestivum (Fig. 3F), in line with previous results [25]. Similarly, a reticulate allopolyploidization origin in two Papaver genomes (Fig. 3H) can be inferred from the OI-colored dot plots (Fig. 3G, Fig. S101).
This inference agrees with our previous work [2]; however, the phylogenetic relationships between homoeologous subgenomes cannot be resolved directly from the OI-colored dot plots and require further evidence, such as chromosome- or subgenome-scale phylogenies [2].

Potential applications of OI in phylogenomics

Accurate inference of orthology plays a crucial role in the estimation of species trees, and the pseudo-orthologs (i.e. hidden paralogs) derived from WGD and gene loss can greatly mislead species tree inference under some circumstances [31]. As demonstrated in various cases above (Fig. 1, Figs. S1-90), the Orthology Index exhibited a high level of accuracy in identifying syntenic orthology (or syntelogs) and can therefore minimize this detrimental influence. Consequently, the results can be directly applied to species tree reconstruction. We used the example of the core eudicots to showcase the application. It is accepted that all core eudicots share a paleohexaploidization event (γ event, WGT), whereas no two orders within the core eudicots share an additional polyploidy event [32]. Therefore, it is appropriate to use the Orthology Index to remove the out-paralogy produced from the γ event, and to better resolve the phylogenetic relationships among the orders of the core eudicots, which remain poorly resolved for some groups, such as the Celastrales–Oxalidales–Malpighiales (COM) clade [33]. Here we utilized the genome-scale syntenic orthologs inferred by the Orthology Index to reconstruct a backbone phylogeny of the core eudicots, aiming to minimize the detrimental influence of the γ event. We collected a high-quality genomic dataset that covers 28 (70%) of the 40 orders and 98 (33%) of the 298 families of core eudicots treated in APG IV [33]. We then applied the OI to identify syntenic orthologs, which resulted in the identification of 54,322 syntenic orthogroups (SOGs).
After filtering, 12,277 multi-copy and 5,154 single-copy SOGs were retrieved, allowing for up to 40% taxa missing (Fig. 4A). This imbalance between the numbers of multi-copy and single-copy SOGs is attributable to lineage-specific polyploidy events within the core eudicots. As a result of these polyploidy events, the occupancy of single-copy SOGs showed significant decreases in species with high relative ploidy (i.e. orthologous syntenic depth to the grape genome) (Fig. 4B). Nevertheless, the order-level species tree topologies based on the two gene sets were identical and strongly supported with high posterior probabilities (Fig. 4C), although the tree based on multi-copy SOGs was more robust, with equal or higher posterior probabilities at nearly all nodes (Fig. 4C), and there were slight differences in the positions of a few of the species (Figs. S102-103). We observed large incongruences between our phylogeny and APG IV [33], as nearly half of the orders covered in this study have inconsistent phylogenetic positions (Fig. 4C). For example, the COM clade is not monophyletic and is placed with the malvids, in contrast to its placement in APG IV (Fig. 4C). However, recent phylogenomic/phylotranscriptomic results are consistent with most of our findings [e.g. 34–40]. For example, the phylogenetic positions of the Fagales, Rosales, Celastrales–Oxalidales–Malpighiales, Myrtales, Cornales and Ericales (Fig. 4C) are consistent with the genome-scale phylogenomics based on coalescent-based analysis of 482 single-copy nuclear orthologous sequences [38]. In addition, the phylogenetic positions of the Santalales (sister to superrosids), Aquifoliales (sister to Garryales), and Crossosomatales (sister to fabids + remaining malvids) (Fig. 4C) are consistent with the phylogenetic inference based on coalescent tree analysis of 410 single-copy gene families extracted from transcriptome and genome data [35].
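The split of SOGs into single-copy and multi-copy sets under a missing-taxa threshold can be illustrated with a short Python sketch. The 40% threshold is from the text; everything else is an assumption for illustration: the function name `classify_sog`, the representation of a SOG as a mapping from species to gene lists, the species codes, and the exact rule that a SOG is "single-copy" when every species present contributes exactly one gene. The toolkit's actual criteria may differ.

```python
def classify_sog(sog, all_species, max_missing=0.4):
    """Classify one syntenic orthogroup (SOG).

    sog         -- dict mapping species name to a list of its genes in the SOG
    all_species -- every species in the dataset
    max_missing -- largest tolerated fraction of species absent from the SOG
    """
    present = [sp for sp in all_species if sog.get(sp)]
    missing_fraction = 1 - len(present) / len(all_species)
    if missing_fraction > max_missing:
        return "discarded"
    # Assumed rule: single-copy means one gene per species that is present.
    if all(len(sog[sp]) == 1 for sp in present):
        return "single-copy"
    return "multi-copy"


# Hypothetical species codes and gene identifiers.
species = ["Vvi", "Ptr", "Sdu", "Ath"]

sog_a = {"Vvi": ["v1"], "Ptr": ["p1"], "Sdu": ["s1"]}                # 25% missing
sog_b = {"Vvi": ["v2"], "Ptr": ["p2a", "p2b"], "Sdu": ["s2a", "s2b"], "Ath": ["a2"]}
sog_c = {"Vvi": ["v3"]}                                              # 75% missing

print(classify_sog(sog_a, species))  # single-copy
print(classify_sog(sog_b, species))  # multi-copy
print(classify_sog(sog_c, species))  # discarded
```

Under this rule, a lineage-specific polyploidy in any sampled species moves a SOG from the single-copy to the multi-copy set, which is in line with the imbalance between the two sets reported above.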
Our results further provide interesting phylogenomic insights for the core eudicots. We discovered that Vitales is sister to Saxifragales with high support (Fig. 4C). Additionally, the Vitales + Saxifragales clade is found to be the sister group of the remaining rosids (Fig. 4C), rejecting the hypothesis that Vitales are sister to Saxifragales and all other taxa in the superrosids clade [34–36, 38, 39]. We also found that the Zygophyllales were sister to the malvids clade with high support (Fig. 4C), inconsistent with its placement as sister of the Malvales [35, 39]. Considering that our analyses involved ten thousand syntenic orthologous gene families, and that we have minimized the detrimental effects of the shared γ event, we believe that our results are likely to be more robust than those mentioned above in the depiction of the real tree of life of the core eudicots. Nevertheless, our analyses were limited by taxon sampling (lack of high-quality genomes), a limitation that is expected to be resolved in the near future with the continuing developments and efforts in the field of genome sequencing [41].

Limitations

The Orthology Index may not perform well in extremely complex cases. For instance, when ΔT is notably small (e.g. radiation following polyploidization within a few generations), it may be difficult to distinguish between out-paralogy and orthology using the OI. This method is also limited in scenarios where ortholog inference and/or synteny detection is limited. For example, synteny is known not to be conserved in distantly related lineages [42], suggesting that this method should not be applied in cases involving lineages that are distantly related (e.g. gymnosperms–angiosperms). Additionally, fragmented assemblies, as well as mis-assemblies, can disrupt synteny and subsequently reduce the efficiency of the index. However, this is likely to cease to be a concern in the near future with the fast development of sequencing and assembly techniques.

Conclusions

In summary, we present a human-interpretable and machine-actionable approach to distinguish orthology from out-paralogy for syntenic blocks. The approach can identify orthologous synteny robustly, as validated with nearly 100 representative cases. We have demonstrated the broad and valuable applications of this approach to the reconstruction of evolutionary history in plant genomes, including reconstruction of the tree/network of life, and identification and placement of polyploidy events on the tree/network. This approach will extend our analytical capacity in evolutionary genomics and might reduce the misleading results generated using some traditional methods.

Methods

Data collection and pre-processing

The genomic data were obtained from public databases or from the corresponding authors, as detailed in Table S2. An all-versus-all BLAST search of protein sequences was conducted for each pair of species using DIAMOND v0.9.24 [43]. Orthologous relationships were inferred using OrthoFinder v2.3.1 [13] (parameters: -M msa). Syntenic/collinear blocks were identified with the `-icl` option of WGDI [10] v0.6.2 (default parameters). The synonymous substitution rate (Ks) was calculated for homologous gene pairs using the `-ks` option of WGDI (default parameters).

Distinguishing orthologous and out-paralogous syntenic blocks using the Orthology Index

We propose an index, named the Orthology Index, that can distinguish the syntenic blocks of orthology from out-paralogy. The index (OI) is defined as:

$$OI = \frac{n}{m}$$

where m represents the total number of collinear gene pairs in a pre-inferred collinearity block, and n denotes the number of collinear gene pairs pre-inferred as orthologs. Thus, the index represents the proportion of orthologous syntenic gene pairs within a block. It ranges from 0 to 1 and can differentiate orthology from out-paralogy.
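As a minimal illustration of this definition, the following Python sketch computes the index for a single block as the number of collinear pairs pre-inferred as orthologs (n) divided by the total number of collinear pairs (m). The function name `orthology_index`, the representation of gene pairs as tuples, and the gene identifiers are assumptions made for illustration and are not taken from the OrthoIndex toolkit.

```python
def orthology_index(block_pairs, ortholog_pairs):
    """Return OI = n / m for one syntenic block.

    block_pairs    -- iterable of (gene_a, gene_b) collinear gene pairs in the block
    ortholog_pairs -- set of frozenset({gene_a, gene_b}) pre-inferred as orthologs
    """
    pairs = list(block_pairs)
    m = len(pairs)  # total number of collinear gene pairs in the block
    if m == 0:
        raise ValueError("a syntenic block must contain at least one gene pair")
    # n: collinear pairs that are also pre-inferred orthologs (order-insensitive)
    n = sum(1 for a, b in pairs if frozenset((a, b)) in ortholog_pairs)
    return n / m


# Hypothetical gene identifiers: three of the four collinear pairs are orthologs.
orthologs = {frozenset(p) for p in [("Pt1", "Sd1"), ("Pt2", "Sd2"), ("Pt3", "Sd3")]}
block = [("Pt1", "Sd1"), ("Pt2", "Sd2"), ("Pt3", "Sd3"), ("Pt4", "Sd9")]

print(orthology_index(block, orthologs))  # 0.75
```

Storing ortholog pairs as frozensets makes the lookup independent of the order in which the two genes are listed, which is convenient when the orthology and synteny files list species in different orders.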
A higher index value suggests a higher likelihood of orthology, whereas a lower index value suggests a higher likelihood of out-paralogy. From our experience with many different cases, the peak with the highest index can be generally considered to be orthologous (Figs. S1–90). Using this index as a foundation, we developed user-friendly all-in-one programs for visualization and downstream analyses. The subcommand 'dotplot' enables visualization and evaluation of synteny, with the dots colored by the index or Ks values. The subcommand 'filter' retrieves orthologous blocks by discarding all blocks with an index value below the default cutoff of 0.6. The subcommand 'cluster' groups orthologous syntenic genes into syntenic orthogroups (SOGs), by constructing an orthologous syntenic graph and applying the Markov Cluster (MCL) algorithm [44] to perform graph clustering and break weak links. The subcommand 'outgroup' retrieves syntenic orthologs from outgroups that lack WGDs shared with ingroups. The subcommand 'phylo' reconstructs multi-copy or single-copy gene trees, by aligning protein sequences with MAFFT v7.481 [45], converting protein alignments to codon alignments with PAL2NAL v14 [46], trimming alignments with trimAl v1.2 [47] (parameter: -automated1) and reconstructing maximum-likelihood trees with IQ-TREE v2.2.0.3 [48]. These gene trees serve as input to infer a species tree with the coalescence-based method ASTRAL-Pro v1.10.1.3 [49]. The default threshold for missing taxa is set to 40%, according to a recent study [50]. This tool is implemented in Python3 and supports synteny outputs from state-of-the-art synteny detectors, including MCscan/JCVI [17], MCScanX [11] and WGDI [10], as well as orthology outputs from OrthoFinder2 [13], OrthoMCL [14] and other tools upon request. The tool can be easily installed using the conda environment or the Apptainer/Singularity container system [51], and can be seamlessly integrated into other pipelines.
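The filtering and clustering steps can be sketched as follows. This is not the toolkit's code: the block representation and the function names `filter_blocks` and `cluster_genes` are hypothetical, and the clustering uses plain connected components (union-find) as a simplified stand-in for MCL, which additionally breaks weak links in the graph. Only the default cutoff of 0.6 is taken from the text.

```python
from collections import defaultdict

OI_CUTOFF = 0.6  # default cutoff stated in the text


def filter_blocks(blocks, cutoff=OI_CUTOFF):
    """Keep syntenic blocks whose index is not below the cutoff."""
    return [block for block in blocks if block["oi"] >= cutoff]


def cluster_genes(blocks):
    """Group genes linked by gene pairs of the given blocks.

    Simplification: connected components instead of MCL, so weak links
    between groups are NOT broken here.
    """
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path compression
            gene = parent[gene]
        return gene

    for block in blocks:
        for a, b in block["pairs"]:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for gene in parent:
        groups[find(gene)].add(gene)
    return sorted(sorted(members) for members in groups.values())


# Toy blocks with hypothetical gene names from three genomes (A, B, C).
blocks = [
    {"oi": 0.9, "pairs": [("A1", "B1"), ("A2", "B2")]},  # orthologous
    {"oi": 0.1, "pairs": [("A1", "B7")]},                # out-paralogous, discarded
    {"oi": 0.8, "pairs": [("B1", "C1")]},                # orthologous
]

kept = filter_blocks(blocks)
print(len(kept))            # 2
print(cluster_genes(kept))  # [['A1', 'B1', 'C1'], ['A2', 'B2']]
```

Because the out-paralogous block is removed before clustering, the gene B7 never enters the graph and cannot merge two orthogroups.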
The source code is accessible on GitHub (https://github.com/zhangrengang/OrthoIndex).

Declarations

Code availability

The code and typical examples used in this study can be found on GitHub (https://github.com/zhangrengang/OrthoIndex).

Data availability

The codon alignments and gene trees of the core eudicots are available from Figshare (https://doi.org/10.6084/m9.figshare.24174930).

Competing interests

The authors declare no competing interests.

Author contributions

RGZ and YPM conceived and designed the study; RGZ, HYS and HS programmed and tested the code; RGZ collected and analyzed the data; RGZ and HYS prepared figures; KHJ, RGZ and HYS drafted the manuscript; YPM revised the manuscript; all authors approved the final manuscript.

Acknowledgments

We thank Professors Ying-Xiong Qiu, Xiao-Ming Song, Quan-Jun Hu, Tao Zhou, and Zhao-Ying Liu for generously sharing their genomic data, and the other researchers who have already released their genomic data publicly.

References

1. Steenwyk JL, Li Y, Zhou X, Shen X, Rokas A (2023) Incongruence in the phylogenomics era. Nat Rev Genet
2. Zhang R, Lu C, Li G, Lv J, Wang L, Wang Z, Chen Z, Liu D, Zhao Y, Shi T et al (2023) Subgenome-aware analyses suggest a reticulate allopolyploidization origin in three Papaver genomes. Nat Commun 14:2204
3. Ma J, Sun P, Wang D, Wang Z, Yang J, Li Y, Mu W, Xu R, Wu Y, Dong C et al (2021) The Chloranthus sessilifolius genome provides insight into early diversification of angiosperms. Nat Commun 12:6929
4. Guo C, Luo Y, Gao LM, Yi TS, Li HT, Yang JB, Li DZ (2023) Phylogenomics and the flowering plant tree of life. J Integr Plant Biol 65:299–323
5. Yang X, Gao S, Guo L, Wang B, Jia Y, Zhou J, Che Y, Jia P, Lin J, Xu T et al (2021) Three chromosome-scale Papaver genomes reveal punctuated patchwork evolution of the morphinan and noscapine biosynthesis pathway. Nat Commun 12:6030
6. Yang X, Gao S, Xu T, Wang B, Jia Y, Ye K (2023) Reply to Subgenome-aware analyses suggest a reticulate allopolyploidization origin in three Papaver genomes. Nat Commun 14:2203
7. An X, Gao K, Chen Z, Li J, Yang X, Yang X, Zhou J, Guo T, Zhao T, Huang S et al (2021) High quality haplotype-resolved genome assemblies of Populus tomentosa Carr., a stabilized interspecific hybrid species that is widespread in Asia. Mol Ecol Resour 22:786–802
8. Zhang T, Qiao Q, Du X, Zhang X, Hou Y, Wei X, Sun C, Zhang R, Yun Q, Crabbe MJC et al (2022) Cultivated hawthorn (Crataegus pinnatifida var. major) genome sheds light on the evolution of Maleae (apple tribe). J Integr Plant Biol 64:1487–1501
9. Tang H, Lyons E, Pedersen B, Schnable JC, Paterson AH, Freeling M (2011) Screening synteny blocks in pairwise genome comparisons through integer programming. BMC Bioinformatics 12:102
10. Sun P, Jiao B, Yang Y, Shan L, Li T, Li X, Xi Z, Wang X, Liu J (2022) WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant 15:1841–1851
11. Wang Y, Tang H, DeBarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H et al (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res 40:e49
12. Chen J, Huang Y, Brachi B, Yun Q, Zhang W, Lu W, Li H, Li W, Sun X, Wang G et al (2019) Genome-wide analysis of Cushion willow provides insights into alpine plant divergence in a biodiversity hotspot. Nat Commun 10:5230
13. Emms DM, Kelly S (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20:238
14. Li L, Stoeckert CJ, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178–2189
15. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–1604
16. Dai X, Hu Q, Cai Q, Feng K, Ye N, Tuskan GA, Milne R, Chen Y, Wan Z, Wang Z (2014) The willow genome and divergent evolution from poplar after the common genome duplication. Cell Res 24:1274
17. Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH (2008) Synteny and collinearity in plant genomes. Science 320:486–488
18. He L, Jia KH, Zhang RG, Wang Y, Shi TL, Li ZC, Zeng SW, Cai XJ, Wagner ND, Hörandl E et al (2021) Chromosome-scale assembly of the genome of Salix dunnii reveals a male-heterogametic sex determination system on chromosome 7. Mol Ecol Resour 21:1966–1982
19. Song X, Sun P, Yuan J, Gong K, Li N, Meng F, Zhang Z, Li X, Hu J, Wang J et al (2021) The celery genome sequence reveals sequential paleo-polyploidizations, karyotype evolution and resistance gene reduction in apiales. Plant Biotechnol J 19:731–744
20. Jiang Z, Tu L, Yang W, Zhang Y, Hu T, Ma B, Lu Y, Cui X, Gao J, Wu X et al (2021) The chromosome-level reference genome assembly for Panax notoginseng and insights into ginsenoside biosynthesis. Plant Commun 2:100113
21. Wang Y, Zhang H, Ri HC, An Z, Wang X, Zhou J, Zheng D, Wu H, Wang P, Yang J et al (2022) Deletion and tandem duplications of biosynthetic genes drive the diversity of triterpenoids in Aralia elata. Nat Commun 13:2224
22. Yang Z, Chen S, Wang S, Hu Y, Zhang G, Dong Y, Yang S, Miao J, Chen W, Sheng J (2021) Chromosomal-scale genome assembly of Eleutherococcus senticosus provides insights into chromosome evolution in Araliaceae. Mol Ecol Resour 21:2204–2220
23. Otto SP, Whitton J (2000) Polyploid incidence and evolution. Annu Rev Genet 34:401–437
24. Jia K, Wang Z, Wang L, Li G, Zhang W, Wang X, Xu F, Jiao S, Zhou S, Liu H et al (2022) SubPhaser: a robust allopolyploid subgenome phasing method based on subgenome-specific k-mers. New Phytol 235:801–809
25. Marcussen T, Sandve SR, Heier L, Spannagl M, Pfeifer M, Jakobsen KS, Wulff BBH, Steuernagel B, Mayer KFX, Olsen OA et al (2014) Ancient hybridizations among the ancestral genomes of bread wheat. Science 345:1250092
26. Edger PP, Poorten TJ, VanBuren R, Hardigan MA, Colle M, McKain MR, Smith RD, Teresi SJ, Nelson ADL, Wai CM et al (2019) Origin and evolution of the octoploid strawberry genome. Nat Genet 51:541–547
27. Chalhoub B, Denoeud F, Liu S, Parkin IA, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B et al (2014) Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science 345:950–953
28. Li JT, Wang Q, Huang YM, Li QS, Cui MS, Dong ZJ, Wang HW, Yu JH, Zhao YJ, Yang CR et al (2021) Parallel subgenome structure and divergent expression evolution of allo-tetraploid common carp and goldfish. Nat Genet 53:1493–1503
29. Jiang X, Song Q, Ye W, Chen ZJ (2021) Concerted genomic and epigenomic changes accompany stabilization of Arabidopsis allopolyploids. Nat Ecol Evol 5:1382–1393
30. Zhuang W, Chen H, Yang M, Wang J, Pandey MK, Zhang C, Chang W, Zhang L, Zhang X, Tang R et al (2019) The genome of cultivated peanut provides insight into legume karyotypes, polyploid evolution and crop domestication. Nat Genet 51:865–876
31. Xiong H, Wang D, Shao C, Yang X, Yang J, Ma T, Davis CC, Liu L, Xi Z (2022) Species tree estimation and the impact of gene loss following whole-genome duplication. Syst Biol 71:1348–1361
32. Van de Peer Y, Mizrachi E, Marchal K (2017) The evolutionary significance of polyploidy. Nat Rev Genet 18:411–424
33. The Angiosperm Phylogeny Group (2016) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc 181:1–20
34. Zeng L, Zhang N, Zhang Q, Endress PK, Huang J, Ma H (2017) Resolution of deep eudicot phylogeny and their temporal diversification using nuclear genes from transcriptomic and genomic datasets. New Phytol 214:1338
35. Leebens-Mack JH, Barker MS, Carpenter EJ, Deyholos MK, Gitzendanner MA, Graham SW, Grosse I, Li Z, Melkonian M, Mirarab S et al (2019) One thousand plant transcriptomes and the phylogenomics of green plants.
Nature 574:679–685 Yang L, Su D, Chang X, Foster CSP, Sun L, Huang C, Zhou X, Zeng L, Ma H, Zhong B (2020) Phylogenomic insights into deep phylogeny of angiosperms based on broad nuclear gene sampling. Plant Commun 1:100027 Zhang C, Zhang T, Luebert F, Xiang Y, Huang C, Hu Y, Rees M, Frohlich MW, Qi J, Weigend M et al (2020) Asterid phylogenomics/phylotranscriptomics uncover morphological evolutionary histories and support phylogenetic placement for numerous whole genome duplications. Mol Biol Evol. :msaa160 Hu H, Sun P, Yang Y, Ma J, Liu J (2023) Genome-scale angiosperm phylogenies based on nuclear, plastome, and mitochondrial datasets. J Integr Plant Biol 65:1479–1489 Liu L, Chen M, Folk RA, Wang M, Zhao T, Shang F, Soltis DE, Li P (2023) Phylogenomic and syntenic data demonstrate complex evolutionary processes in early radiation of the rosids. Mol Ecol Resour Baker WJ, Bailey P, Barber V, Barker A, Bellot S, Bishop D, Botigué LR, Brewer G, Carruthers T, Clarkson JJ et al (2022) A comprehensive phylogenomic platform for exploring the angiosperm tree of life. Syst Biol 71:301–319 Lewin HA, Richards S, Lieberman Aiden E, Allende ML, Archibald JM, Bálint M, Barker KB, Baumgartner B, Belov K, Bertorelle G et al (2020) The Earth BioGenome Project. : Starting the clock. Proc Natl Acad Sci U S A. 2022; 119:e2115635118 Kristensen DM, Wolf YI, Mushegian AR, Koonin EV (2011) Computational methods for Gene Orthology inference. Brief Bioinform 12:379–391 Buchfink B, Xie C, Huson DH (2015) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60 van Dongen S (2008) Graph clustering via a discrete uncoupling process. SIAM J Matrix Anal Appl 30:121–141 Standley DM, Katoh K (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780 Suyama M, Torrents D, Bork P (2006) PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. 
Nucleic Acids Res 34:W609–W612 Capella-Gutierrez S, Silla-Martinez J, Gabaldon T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973 Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R (2020) IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37:1530–1534 Zhang C, Mirarab S (2022) ASTRAL-Pro 2: ultrafast species tree reconstruction from multi-copy gene family trees. Bioinformatics 38:4949–4950 Morel B, Williams TA, Stamatakis A, Schwartz R (2023) Asteroid: a new algorithm to infer species trees from gene trees under high proportions of missing data. Bioinformatics 39:btac832 Kurtzer GM, Sochat V, Bauer MW (2017) Singularity: Scientific containers for mobility of compute. PLoS ONE 12:e177459 Additional Declarations There is NO Competing Interest. Supplementary Files orthoindexSI.docx orthoindexsupplement3.pdf SupplementaryFiguresandTables.docx Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. 
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-4798240","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":335229082,"identity":"4480230a-849e-448a-8fe3-2221957ea2cf","order_by":0,"name":"Kai-Hua Jia","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA+klEQVRIiWNgGAWjYBACCQbGNiDFxgPl2/Dw8zeQpiVNRnLGAUJaGNiQ+YdtDBoS8GuRbD/c9pinhk+Gv4H94Ycff87zGDAcYPzwMQe3FmmexHZjnmNsPBIHeIwle9tu85gzNzBLztyGW4scQ2KbNA8b0C8HeBgkeBtu81g2HGBj5sWnhf8hUMs/Nh75A+yPf/75c47H4EACfi3SEkBbeNvYgCoZzIDWHSCsRXLGwzbJuX1sPIYHeMysZduSeSRnHGzG6xeJ8+nPJN58O2YvB3TYzTd/7Oz5+ZsPfviIRwsUHGNgkH8A4zA2EFQPBDXEKBoFo2AUjIKRCgDAIkqAclA6ywAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0000-0002-8134-5830","institution":"Shandong Academy of Agricultural Sciences","correspondingAuthor":true,"prefix":"","firstName":"Kai-Hua","middleName":"","lastName":"Jia","suffix":""},{"id":335229083,"identity":"3b5adfbc-14b7-499a-b32c-0c954c01669c","order_by":1,"name":"Ren-Gang Zhang","email":"","orcid":"https://orcid.org/0000-0002-8028-9208","institution":"Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Ren-Gang","middleName":"","lastName":"Zhang","suffix":""},{"id":335229084,"identity":"c4fdce6b-d022-439b-96c8-88b57f380f63","order_by":2,"name":"Hong-Yun Shang","email":"","orcid":"","institution":"Chinese Academy of 
Sciences","correspondingAuthor":false,"prefix":"","firstName":"Hong-Yun","middleName":"","lastName":"Shang","suffix":""},{"id":335229085,"identity":"e37685a6-769d-4b32-864a-6798f3edae67","order_by":3,"name":"Heng Shu","email":"","orcid":"","institution":"Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Heng","middleName":"","lastName":"Shu","suffix":""},{"id":335229086,"identity":"fcdd6b95-4656-40d4-a153-cd39c6b0b538","order_by":4,"name":"Yongpeng Ma","email":"","orcid":"https://orcid.org/0000-0002-7725-3677","institution":"Kunming Institute of Botany, Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Yongpeng","middleName":"","lastName":"Ma","suffix":""}],"badges":[],"createdAt":"2024-07-25 01:30:11","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4798240/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4798240/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":63125931,"identity":"08a4020d-d60f-47a3-94c0-10db85817456","added_by":"auto","created_at":"2024-08-23 12:18:19","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":551075,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe Illustration of the \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eOrthology Index\u003c/strong\u003e\u003c/em\u003e\u003cem\u003e \u003c/em\u003e\u003cstrong\u003ein identifying orthologous synteny: a typical case. A)\u003c/strong\u003e Schematic evolutionary history of poplar and willow genomes, adapted from the literature [15-18]. 
\u003cstrong\u003eB)\u003c/strong\u003e \u003cem\u003eKs\u003c/em\u003e-colored dot plots showing synteny detected by WGDI, with an observable distinction of three categories of syntenic blocks derived from three evolutionary events (three peaks: \u003cem\u003eKs \u003c/em\u003e≈ 1.5, \u003cem\u003eKs \u003c/em\u003e≈ 0.27, and \u003cem\u003eKs \u003c/em\u003e≈ 0.13). \u003cstrong\u003eC)\u003c/strong\u003e \u003cem\u003eKs\u003c/em\u003e-colored dot plots illustrating orthology inferred by OrthoFinder2, with an observable high proportion (~30 %) of hidden out-paralogs (\u003cem\u003eKs \u003c/em\u003e≈ 0.27). \u003cstrong\u003eD)\u003c/strong\u003e \u003cem\u003eOrthology Index \u003c/em\u003e(\u003cem\u003eOI\u003c/em\u003e)-colored dot plots: integrating synteny (\u003cstrong\u003eB\u003c/strong\u003e) and orthology (\u003cstrong\u003eC\u003c/strong\u003e), with polarized and scalable distinction of three categories of syntenic blocks (three peaks: \u003cem\u003eOI \u003c/em\u003e≈ 0, \u003cem\u003eOI \u003c/em\u003e≈ 0.1, and \u003cem\u003eOI \u003c/em\u003e≈ 0.9). \u003cstrong\u003eE)\u003c/strong\u003e \u003cem\u003eKs\u003c/em\u003e-colored dot plots of synteny after applying an \u003cem\u003eOI\u003c/em\u003e cutoff of 0.6, with clean 1:1 orthology as expected from the evolutionary history. \u003cstrong\u003eB-E\u003c/strong\u003e are plotted using the ‘dotplot’ subcommand with four subplots: a) dot plots colored by \u003cem\u003eKs \u003c/em\u003eor \u003cem\u003eOI \u003c/em\u003e(x-axis and y-axis, chromosomes of the two genomes; a dot indicates a homologous gene pair between the two genomes), b) histogram (with the same color map as the dot plots) of \u003cem\u003eKs \u003c/em\u003eor \u003cem\u003eOI\u003c/em\u003e (x-axis, \u003cem\u003eKs \u003c/em\u003eor \u003cem\u003eOI\u003c/em\u003e; y-axis, number of homologous gene pairs), c-d) synteny depth (relative ploidy) derived from 50-gene windows (x-axis, synteny depth; y-axis, number of windows). 
Examples of the three categories of syntenic blocks [referred to as WGT-SBs (\u003cem\u003eKs \u003c/em\u003e≈ 1.5, \u003cem\u003eOI \u003c/em\u003e≈ 0), WGD-SBs (\u003cem\u003eKs \u003c/em\u003e≈ 0.27, \u003cem\u003eOI \u003c/em\u003e≈ 0.1), and S-SBs (\u003cem\u003eKs \u003c/em\u003e≈ 0.13, \u003cem\u003eOI \u003c/em\u003e≈ 0.9)] are highlighted with dashed squares. These are associated with the evolutionary events and peaks of \u003cem\u003eKs \u003c/em\u003eor \u003cem\u003eOrthology Index\u003c/em\u003e, indicated by arrows, and labeled as ‘Out-paralogy’ or ‘Orthology’. Additional visible cases of other lineages can be found in \u003cstrong\u003eFigs. S1-90\u003c/strong\u003e (summarized in \u003cstrong\u003eTable S1\u003c/strong\u003e).\u003c/p\u003e","description":"","filename":"fig.1.png","url":"https://assets-eu.researchsquare.com/files/rs-4798240/v1/f23d0cd1de51d5e6fc4e1c5f.png"},{"id":63125935,"identity":"a762bc9d-9bf5-4d03-be6e-1d51bd850214","added_by":"auto","created_at":"2024-08-23 12:18:19","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":284604,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eInference of polyploidy in the Apiaceae + Araliaceae genomes using the \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eOrthology Index\u003c/strong\u003e\u003c/em\u003e. \u003cstrong\u003eA)\u003c/strong\u003e Schematic illustration of determination of shared or lineage-specific polyploidy events using the orthologous syntenic relationships identified with the \u003cem\u003eOrthology Index\u003c/em\u003e. Despite a similar 2:2 ratio of synteny depth, the two scenarios have distinct patterns of orthologous synteny (1:1 orthology vs. 2:2 orthology). Labels A and B indicate two species, and A1–2 and B1–2 indicate duplicated chromosomes or blocks from a WGD event. 
\u003cstrong\u003eB)\u003c/strong\u003e \u003cem\u003eOrthology Index\u003c/em\u003e-colored dot plots indicating orthologous and out-paralogous synteny between genomes of \u003cem\u003eCentella asiatica\u003c/em\u003e (Apiaceae) and \u003cem\u003eAralia elata\u003c/em\u003e (Araliaceae). A typical 1:1 orthology + 1:1 out-paralogy synteny pattern is highlighted by the dashed squares. \u003cstrong\u003eC)\u003c/strong\u003e Phylogeny reconstructed from genomes of species in the Apiaceae and the Araliaceae (Apiales) with labels indicating polyploidization events. \u003cem\u003eL1\u003c/em\u003e and \u003cem\u003eL2\u003c/em\u003e represent the average branch length (genetic distance) of the Apiaceae and the Araliaceae from their most recent common ancestor (MRCA), respectively. Numbers at the nodes denote bootstrap values (percentage). The maximum-likelihood phylogenetic tree was reconstructed using IQ-TREE2, based on concatenated codon alignments of 2,363 single-copy genes (with at most 20 % taxa missing). Additional evidence supporting the inferred polyploidization events can be found in \u003cstrong\u003eFigs. S91-97\u003c/strong\u003e.\u003c/p\u003e","description":"","filename":"fig.2.png","url":"https://assets-eu.researchsquare.com/files/rs-4798240/v1/efa1b242d60b2e9001c3b50b.png"},{"id":63125929,"identity":"cb932dc3-8223-4674-aebf-7fdc16dcb4f6","added_by":"auto","created_at":"2024-08-23 12:18:19","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":296595,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExamples of reticulation inferences based on the \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eOrthology Index\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e.\u003c/strong\u003e \u003cstrong\u003eA-B\u003c/strong\u003e) \u003cem\u003eOrthology Index\u003c/em\u003e-colored dot plots (\u003cstrong\u003eA\u003c/strong\u003e) of the genomes of \u003cem\u003eA. 
thaliana\u003c/em\u003e + \u003cem\u003eA. arenosa\u003c/em\u003e and their hybrid \u003cem\u003eA. thaliana\u003c/em\u003e × \u003cem\u003eA. arenosa\u003c/em\u003e, and the inference (\u003cstrong\u003eB\u003c/strong\u003e; in dendrogram form) from the orthology evidence. \u003cstrong\u003eC-D\u003c/strong\u003e) \u003cem\u003eOrthology Index\u003c/em\u003e-colored dot plots (\u003cstrong\u003eC\u003c/strong\u003e) of the genomes of the tetraploid \u003cem\u003eA. hypogaea\u003c/em\u003e and its diploid progenitors, and the inference (\u003cstrong\u003eD\u003c/strong\u003e; in dendrogram form) from the orthology evidence. \u003cstrong\u003eE-F\u003c/strong\u003e) \u003cem\u003eOrthology Index\u003c/em\u003e-colored dot plots (\u003cstrong\u003eE\u003c/strong\u003e) of the genomes of the hexaploid \u003cem\u003eT. aestivum\u003c/em\u003e and its intermediate tetraploid\u003cem\u003e T. turgidum\u003c/em\u003e, and the inference (\u003cstrong\u003eF\u003c/strong\u003e; in dendrogram form) from the orthology evidence. \u003cstrong\u003eG-H\u003c/strong\u003e) \u003cem\u003eOrthology Index\u003c/em\u003e-colored dot plots (\u003cstrong\u003eG\u003c/strong\u003e) of the genomes of the neo-octoploid \u003cem\u003eP. setigerum\u003c/em\u003e and its intermediate tetraploid\u003cem\u003e P. somniferum\u003c/em\u003e, and the inference (\u003cstrong\u003eH\u003c/strong\u003e; in dendrogram form) from the orthology evidence. Only one set of representative homoeologous chromosomes is shown in the dot plots, and the dot plots with the full set of chromosomes can be found in \u003cstrong\u003eFigs. 
S98-101\u003c/strong\u003e.\u003c/p\u003e","description":"","filename":"Fig.3.png","url":"https://assets-eu.researchsquare.com/files/rs-4798240/v1/abbe6b4a881fb4f75fe03901.png"},{"id":63125932,"identity":"056cfc83-ffb7-43db-997c-91e10d984225","added_by":"auto","created_at":"2024-08-23 12:18:19","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":457604,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAn example (core eudicots) of phylogenomics based on the \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eOrthology Index\u003c/strong\u003e\u003c/em\u003e. \u003cstrong\u003eA\u003c/strong\u003e) The number of multi-copy and single-copy syntenic orthogroups (SOGs) with different taxon occupancy. \u003cstrong\u003eB\u003c/strong\u003e) The SOG occupancy in species with different relative ploidy (i.e. orthologous syntenic depth to the grape genome) with at most 40 % taxa missing. Each point represents one species. ns, \u003cem\u003ep\u003c/em\u003e \u0026gt; 0.05; **,\u003cem\u003e p\u003c/em\u003e \u0026lt;= 0.01; ****, \u003cem\u003ep\u003c/em\u003e\u0026lt;= 0.0001; Wilcoxon test. \u003cstrong\u003eC\u003c/strong\u003e) Phylogenetic relationships within the core eudicots reconstructed in this study versus in APG IV. Conflicting positions are marked in red; unresolved relationships in APG IV are marked in green, and orders not covered in this study are marked in blue. The numbers at the nodes are posterior probabilities from ASTRAL, with black representing those from the multi-copy SOGs and orange representing those from the single-copy SOGs (omitted for equal values). Further details of the two trees reconstructed in this study can be found in \u003cstrong\u003eFigs. 
S102-103\u003c/strong\u003e.\u003c/p\u003e","description":"","filename":"fig.4.png","url":"https://assets-eu.researchsquare.com/files/rs-4798240/v1/b6767f26dfa8b341040e1daf.png"},{"id":63127863,"identity":"9108ac1c-c528-4eee-86bb-1b7bbe48e5d9","added_by":"auto","created_at":"2024-08-23 12:34:22","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2220273,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4798240/v1/fbacb3c3-abf3-4085-b5f4-a050382835c8.pdf"},{"id":63126991,"identity":"3751bbdf-6783-4438-8dbf-cfb86dac53c3","added_by":"auto","created_at":"2024-08-23 12:26:20","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":304394,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cbr\u003e\u003c/p\u003e","description":"","filename":"orthoindexSI.docx","url":"https://assets-eu.researchsquare.com/files/rs-4798240/v1/e8b6a88c86bdea83e9496b76.docx"},{"id":63125936,"identity":"40fec268-c1a7-44b0-a707-71d0e65e4c49","added_by":"auto","created_at":"2024-08-23 12:18:21","extension":"pdf","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":85024461,"visible":true,"origin":"","legend":"","description":"","filename":"orthoindexsupplement3.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4798240/v1/7b0ce6508556506d4391d1e5.pdf"},{"id":63125930,"identity":"919926ad-2ffb-4fd2-9b76-f51b1ccfd6e9","added_by":"auto","created_at":"2024-08-23 12:18:19","extension":"docx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":15308,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFiguresandTables.docx","url":"https://assets-eu.researchsquare.com/files/rs-4798240/v1/e09107d986ed035d1dcba2fe.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Robust 
identification of orthologous synteny with the Orthology Index and its applications in reconstructing the evolutionary history of plant genomes","fulltext":[{"header":"Introduction","content":"\u003cp\u003eReconstruction of the evolutionary histories of organisms, including inference of the tree/network of life, identification of polyploidy events, and placement of the events on the tree/network, usually relies on the orthologous relationships at gene, block, chromosome, subgenome and/or whole genome scales. In general, the more orthologous genes or loci involved in the reconstruction of evolutionary history, the more confidence can be placed in it. For example, phylogenomic reconstructions using genome-scale data have become the gold standard for understanding the evolution of lineages in the tree of life [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Orthologous synteny has been established as a proxy to generate the maximum number of orthologs, from the chromosomal to the subgenomic/genomic scales. Orthologous synteny has been used successfully in reconstructing evolutionary histories in, for example, the poppies [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] and the angiosperms as a whole [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. However, accurate identification of orthologous synteny is still challenging, especially in flowering plants, where recurrent whole-genome duplications (WGDs) or polyploidizations have produced substantial numbers of syntenic paralogs, which can confuse the inference of orthology [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] and may mislead the reconstruction of evolutionary history. 
For example, the overlooked orthologous relationships in the synteny between two \u003cem\u003ePapaver\u003c/em\u003e species resulted in an incorrect interpretation of the history of polyploidy in these species [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eTo date, two main strategies have been employed to identify orthologous synteny. One strategy is to filter the detected synteny with certain criteria, which are usually specific to each case and are therefore not scalable for large-scale datasets. One widely-used criterion is the synonymous substitution rate (\u003cem\u003eKs\u003c/em\u003e). Because WGD events (producing paralogs) and speciation events (producing orthologs) occurred at different times in the past, \u003cem\u003eKs\u003c/em\u003e between syntenic gene pairs can be used to differentiate the ages of orthologous and paralogous synteny [e.g. 7, 8]. However, \u003cem\u003eKs\u003c/em\u003e-based methods are not always effective for distinguishing syntenic blocks from different events [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], and cannot be universally applied to the large-scale automated identification of orthologous syntenic blocks, because \u003cem\u003eKs\u003c/em\u003e values vary case by case: the events occurred at different times in different species, and species differ in their substitution rates. A parameter, \u003cem\u003ehomo\u003c/em\u003e, has been proposed for use with WGDI, to extract the best homology (orthology) of syntenic blocks [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. This parameter relies on another parameter, \u003cem\u003emultiple\u003c/em\u003e, to define the top number of hits as best hits. 
Unfortunately, the parameter cannot be universally applied to different cases with different syntenic depth ratios. Another tool, QUOTA-ALIGN [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], was developed to screen orthologous syntenic blocks under given constraints on syntenic depths (\u003cem\u003eQUOTA\u003c/em\u003e). However, users are required to set the \u003cem\u003eQUOTA\u003c/em\u003e parameter a priori for cases with different evolutionary histories. For example, it is necessary to set \u003cem\u003eQUOTA\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1:1 for \u003cem\u003eArabidopsis thaliana\u003c/em\u003e and \u003cem\u003eA. lyrata\u003c/em\u003e, but to set \u003cem\u003eQUOTA\u003c/em\u003e\u0026thinsp;=\u0026thinsp;4:2 for \u003cem\u003eA. thaliana\u003c/em\u003e and poplar [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. The same prior knowledge is required when setting the \u003cem\u003emultiple\u003c/em\u003e parameter in WGDI (-c option) to extract orthologous syntenic blocks.\u003c/p\u003e \u003cp\u003eThe second strategy for identifying orthologous synteny is the use of pre-inferred orthologs to call synteny, with tools such as MCScanX_h [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. This strategy is scalable for large-scale datasets. However, hidden out-paralogs in the pre-inferred orthologs may result in out-paralogous syntenic blocks that need further removal with the first strategy. 
This problem can be frequently observed in previously published work, including certain studies investigating willow [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] and poplar [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] which used pre-inferred orthologs from OrthoFinder2 [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] and OrthoMCL [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], respectively, to call synteny. Moreover, the hidden orthologs (i.e. false negatives in ortholog inference) can reduce the efficiency of the subsequent detection of orthologous synteny in practice.\u003c/p\u003e \u003cp\u003eTo address these issues and to screen orthologous synteny robustly for large-scale datasets, we developed a scalable approach called \u003cem\u003eOrthology Index\u003c/em\u003e (\u003cem\u003eOI\u003c/em\u003e). We combined the information regarding pre-inferred orthology (provided by OrthoFinder2 or analogs) and synteny (by WGDI or analogs) to calculate the \u003cem\u003eOI\u003c/em\u003e, which represents the proportion of orthologous gene pairs within a syntenic block. We evaluated the efficacy of \u003cem\u003eOI\u003c/em\u003e using a comprehensive dataset comprising nearly 100 well-documented cases with diverse WGD and speciation events. Our analysis demonstrated the high robustness and accuracy of \u003cem\u003eOI\u003c/em\u003e, as well as its usefulness in evolutionary inference including polyploidy, reticulation and phylogenomics. 
We finally integrated the index into a toolkit (freely available from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zhangrengang/OrthoIndex\u003c/span\u003e\u003cspan address=\"https://github.com/zhangrengang/OrthoIndex\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) to facilitate its use.\u003c/p\u003e"},{"header":"Results and Discussion","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eIdentification of orthologous synteny using the Orthology Index\u003c/h2\u003e \u003cp\u003eWe aim to distinguish orthologous synteny from out-paralogy robustly for genomes of two species. The genomes should have shared a whole genome duplication event (WGD, or polyploidization; producing out-paralogy) and a speciation event (producing orthology), and these events should have occurred within a short time period (denoted as Δ\u003cem\u003eT\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). To evaluate our approach, we selected ninety well-documented plant species pairs that shared at least one polyploidy event (\u003cb\u003eTable \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e, \u003cb\u003eFigs. S1-90\u003c/b\u003e), along with the typical poplar\u0026ndash;willow case (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), from the literature. The typical example is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e with poplar (\u003cem\u003ePopulus trichocarpa\u003c/em\u003e as the representative) and willow (\u003cem\u003eSalix dunnii\u003c/em\u003e as the representative), which shared two rounds of polyploidization. The first round was a paleohexaploidization event (γ event, whole genome triplication or WGT) common to the core eudicots at ~\u0026thinsp;120\u0026nbsp;million years ago (Mya). 
The second was a lineage-specific tetraploidization event (WGD) shared by poplar and willow at ~\u0026thinsp;60 Mya [\u003cspan additionalcitationids=\"CR16 CR17\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). Following the recurrent polyploidization events, the two genera speciated later at ~\u0026thinsp;50 Mya [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA).\u003c/p\u003e \u003cp\u003eWith the collinearity detector WGDI [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], syntenic blocks derived from the three evolutionary events (two polyploidization and one speciation) can be identified and distinguished visually using their distinct synonymous substitution rate (\u003cem\u003eKs\u003c/em\u003e) values and fragmentation levels (the more ancient the event, the more fragmented the blocks or greater the chromosomal rearrangements, and the higher the \u003cem\u003eKs\u003c/em\u003e value) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). We denoted the three categories of syntenic blocks as WGT-SBs (\u003cem\u003eKs\u003c/em\u003e\u0026thinsp;\u0026asymp;\u0026thinsp;1.5, most fragmented, out-paralogous), WGD-SBs (\u003cem\u003eKs\u003c/em\u003e\u0026thinsp;\u0026asymp;\u0026thinsp;0.27, moderately fragmented, out-paralogous), and S-SBs (\u003cem\u003eKs\u003c/em\u003e\u0026thinsp;\u0026asymp;\u0026thinsp;0.13, least fragmented, orthologous) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). 
The two most recent \u003cem\u003eKs\u003c/em\u003e peaks, derived from WGD-SBs (out-paralogy) and S-SBs (orthology), overlap (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB) and are difficult to separate with a single cutoff.\u003c/p\u003e \u003cp\u003eThe orthology inference method OrthoFinder2 [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] inferred a substantial number of orthologous genes (~\u0026thinsp;30%) that exhibit synteny in the out-paralogous blocks (\u003cem\u003eKs\u003c/em\u003e\u0026thinsp;\u0026asymp;\u0026thinsp;0.27; Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC), suggesting that they are hidden paralogs and would introduce out-paralogous synteny when imported into tools such as MCScanX_h [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. This problem was observed in numerous instances where Δ\u003cem\u003eT\u003c/em\u003e was small (e.g. \u003cb\u003eFigs. S1, 5, 6, 8, 10\u003c/b\u003e), and could be attributed to biased gene loss or systematic errors [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Nevertheless, in the poplar\u0026ndash;willow case, the peak (\u003cem\u003eKs\u003c/em\u003e\u0026thinsp;\u0026asymp;\u0026thinsp;0.27) of out-paralogs from the most recent WGD (WGD-SBs) was substantially reduced, while out-paralogs from the older WGT (WGT-SBs, \u003cem\u003eKs\u003c/em\u003e\u0026thinsp;\u0026asymp;\u0026thinsp;1.5) were almost entirely eliminated (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC). 
This suggests that the level of accuracy of orthology inference by OrthoFinder2 is relatively high, although not entirely sufficient.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eConsequently, we combined the algorithmic advances of the two methods described above by introducing a straightforward index, referred to as the \u003cem\u003eOrthology Index\u003c/em\u003e (\u003cem\u003eOI\u003c/em\u003e), to determine the orthology of a syntenic block. Remarkably, the \u003cem\u003eOI\u003c/em\u003e polarized orthology from out-paralogy with a very clear boundary (\u003cem\u003eOI\u003c/em\u003e\u0026thinsp;=\u0026thinsp;~\u0026thinsp;0.3\u0026ndash;0.7) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD), showing a much higher distinguishability than the \u003cem\u003eKs\u003c/em\u003e. The WGT-SBs exhibit an \u003cem\u003eOI\u003c/em\u003e of nearly 0 (out-paralogous), and the WGD-SBs have an \u003cem\u003eOI\u003c/em\u003e of approximately 0.1 (out-paralogous), whereas the S-SBs display an \u003cem\u003eOI\u003c/em\u003e peak of approximately 0.9 (orthologous). We applied the index analysis to ninety study cases with diverse polyploidy and varying Δ\u003cem\u003eT\u003c/em\u003e (\u003cb\u003eTable \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e, Figs. S1-90\u003c/b\u003e). The \u003cem\u003eOI\u003c/em\u003e cutoff was then further adjusted to 0.6 to give a universal identification of orthology, according to the ranges of the polarized orthology \u0026ndash; out-paralogy boundary calculated for these examined cases (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD, \u003cb\u003eFigs. S1-90D\u003c/b\u003e). 
Interestingly, we observed that when the \u003cem\u003eOI\u003c/em\u003e cutoff value was set to 0.6, the index performed exceptionally and consistently well in identifying orthology accurately in nearly all instances (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eE, \u003cb\u003eFigs. S1-90E\u003c/b\u003e). This finding highlights the efficiency and robustness of our index as a potentially universal criterion for identifying orthologous synteny. As for the poplar\u0026ndash;willow case, a cutoff value of \u003cem\u003eOI\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.6 retained 12.2% of the blocks and 47.6% of the gene pairs as orthologs, effectively retrieving a 1:1 orthology relationship (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eE) as expected from the evolutionary history (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). False positives (hidden paralogy) and false negatives (excessive removal of true orthologous synteny) were rarely observed in the examined cases (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eE, \u003cb\u003eFigs. S1-90E\u003c/b\u003e), demonstrating the high robustness. Therefore, the index can be applied in automated pipelines for large-scale datasets (ranging from dozens to hundreds of genomes at present) with a unified threshold of \u003cem\u003eOI\u003c/em\u003e. 
We integrated the index into a toolkit (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zhangrengang/OrthoIndex\u003c/span\u003e\u003cspan address=\"https://github.com/zhangrengang/OrthoIndex\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) to facilitate downstream application.\u003c/p\u003e \u003cp\u003e \u003cb\u003ePotential applications of\u003c/b\u003e \u003cb\u003eOI\u003c/b\u003e \u003cb\u003ein polyploidy inference: determining whether WGD events were shared among or specific to lineages\u003c/b\u003e\u003c/p\u003e \u003cp\u003eDistinguishing synteny resulting from orthology and out-paralogy is vital for the inference of the evolutionary history of polyploidy. We illustrate a simple model to explain the process. When two genomes exhibit a 2:2 ratio of synteny depth, there are two main hypotheses to consider: the genomes share a tetraploidization event (WGD), or they both underwent a lineage-specific tetraploidization event (WGD) separately (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). The \u003cem\u003eOI\u003c/em\u003e offers a straightforward and visual method to test the two hypotheses with separated orthology and out-paralogy. If the genomes display a clear 1:1 orthology\u0026thinsp;+\u0026thinsp;1:1 out-paralogy, similar to the poplar\u0026ndash;willow case (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD; ignoring the very ancient out-paralogy from the γ event), the first hypothesis is supported (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, \u003cb\u003eleft panel\u003c/b\u003e). On the contrary, if they exhibit a 2:2 orthology, the second hypothesis is supported (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, \u003cb\u003eright panel\u003c/b\u003e). 
Our experience with dozens of known cases (\u003cb\u003eFigs. S1-90\u003c/b\u003e) suggests that these inferences of polyploidy history from orthologous synteny patterns are reasonable and accurate.\u003c/p\u003e \u003cp\u003eConsequently, we applied this strategy to the controversial question of whether the Apiaceae and Araliaceae families (both within Apiales) share a tetraploidization event. There are two major opposing views on this: (i) the two families share a tetraploidization event, as evidenced by synteny and corrected \u003cem\u003eKs\u003c/em\u003e [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]; (ii) conversely, the two families do not share a tetraploidization event, as inferred by synteny and \u003cem\u003eKs\u003c/em\u003e-based methods [\u003cspan additionalcitationids=\"CR21\" citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. Here, we reexamined the polyploidy of the available genomes from the two families using the \u003cem\u003eOI\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eUsing \u003cem\u003eCentella asiatica\u003c/em\u003e (Apiaceae) and \u003cem\u003eAralia elata\u003c/em\u003e (Araliaceae) as representatives, we observed a clear 2:1 orthologous synteny depth with the grape (\u003cem\u003eVitis vinifera\u003c/em\u003e) genome (\u003cb\u003eFig. S91\u003c/b\u003e), suggesting that both species have undergone a tetraploidization event following their divergence from grape. These inferences are in agreement with previous research [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. 
However, upon comparison between the two genomes, a clear 1:1 orthology\u0026thinsp;+\u0026thinsp;1:1 out-paralogy pattern was highlighted by the \u003cem\u003eOI\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB), aligning with the first model (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, \u003cb\u003eleft panel\u003c/b\u003e) and thus suggesting a shared tetraploidization event (ω, WGD; Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC) that is consistent with the first view above [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. This inference was corroborated by macro-synteny phylogenies (\u003cb\u003eFig. S92\u003c/b\u003e), supporting its reliability. Similar patterns were also observed in other combinations of genomes from the two families (e.g. \u003cem\u003eCentella asiatica\u003c/em\u003e : \u003cem\u003eEleutherococcus senticosus\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1:2 orthology) (\u003cb\u003eFigs. S93-94\u003c/b\u003e), further supporting the inference. Additionally, we noted a considerable disparity in branch length (or substitution rate, genetic distance) between the two families from their most recent common ancestor (MRCA), with the mean branch length of Apiaceae (\u003cem\u003eL1\u003c/em\u003e) nearly three times that of Araliaceae (\u003cem\u003eL2\u003c/em\u003e) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). 
This discrepancy can explain the unreasonable inferences drawn from uncorrected \u003cem\u003eKs\u003c/em\u003e-based methods [\u003cspan additionalcitationids=\"CR21\" citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e], which assumed equal substitution rates for different lineages when estimating the timing of polyploidization and speciation events.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFurthermore, we deduced additional lineage-specific polyploidy events within the two families using the \u003cem\u003eOI\u003c/em\u003e-based approach. We observed that all studied genomes from species in the Apioideae subfamily (including \u003cem\u003eDaucus carota\u003c/em\u003e, \u003cem\u003eAngelica sinensis\u003c/em\u003e, \u003cem\u003eApium graveolens\u003c/em\u003e and \u003cem\u003eCoriandrum sativum\u003c/em\u003e) exhibit a 2:1 orthologous synteny depth when compared to the \u003cem\u003eC. asiatica\u003c/em\u003e genome (\u003cb\u003eFig. S95\u003c/b\u003e), suggesting that they each experienced a tetraploidization (WGD) event subsequent to their divergence from \u003cem\u003eC. asiatica\u003c/em\u003e. Moreover, any two genomes within this subfamily display distinct 1:1 orthology\u0026thinsp;+\u0026thinsp;1:1 out-paralogy patterns (\u003cb\u003eFigs. S56, 96\u003c/b\u003e), suggesting a shared tetraploidization event (α), which is in line with previous research [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. Thus the α event was placed between the stem and crown of the Apioideae subfamily (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). 
In addition, employing the \u003cem\u003eOI\u003c/em\u003e, we also inferred one species-specific tetraploidization event in both the \u003cem\u003ePanax ginseng\u003c/em\u003e and \u003cem\u003eEleutherococcus senticosus\u003c/em\u003e genomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC, \u003cb\u003eFigs. S57-59, 97\u003c/b\u003e), consistent with some previous research [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] but in conflict with the results from Song et al [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. These inferences demonstrate the ability of the \u003cem\u003eOI\u003c/em\u003e to infer the diverse polyploidy events explicitly and position them on the tree of life.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ePotential applications of\u003c/b\u003e \u003cb\u003eOI\u003c/b\u003e \u003cb\u003ein the identification of reticulation\u003c/b\u003e\u003c/p\u003e \u003cp\u003eReticulation, driven by allopolyploidization or hybridization, constitutes a significant factor in eukaryotic evolution, yielding novel phenotypes that facilitate ecological diversification and the occupation of new niches [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. 
Numerous genomes originating from recent reticulation events have been documented (summarized in Jia et al [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]), encompassing critical cereals [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], fruits [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e], vegetables [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], trees [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] and fish [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. We evaluated the application of the \u003cem\u003eOrthology Index\u003c/em\u003e in several well-documented cases, ranging from simple to complex evolutionary scenarios (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003e). We conclude that straightforward reticulate speciation can be inferred solely from the \u003cem\u003eOI\u003c/em\u003e-colored dot plots.\u003c/p\u003e \u003cp\u003eThe cases of \u003cem\u003eArabidopsis\u003c/em\u003e [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] and \u003cem\u003eArachis\u003c/em\u003e [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] represent the simple hybridization or allopolyploidization scenarios. From the \u003cem\u003eOI\u003c/em\u003e-colored dot plots, the two subgenomes of both the hybrid (\u003cem\u003eA. thaliana \u0026times; A. arenosa\u003c/em\u003e) and the tetraploid (\u003cem\u003eA. hypogaea\u003c/em\u003e) show clear and separate orthologous relationships with their diploid progenitors (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eA and \u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eC, \u003cb\u003eFigs. S98-99\u003c/b\u003e). 
Based on the orthologous relationships revealed by the \u003cem\u003eOI\u003c/em\u003e, the hybridization events can be easily inferred (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eB and \u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eD).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe orthologous relationships between the genomes in complicated polyploid species complexes (\u003cem\u003eTriticum\u003c/em\u003e and \u003cem\u003ePapaver\u003c/em\u003e) are also clear from the \u003cem\u003eOI\u003c/em\u003e-colored dot plots (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eE and \u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eG, \u003cb\u003eFigs. S100-101\u003c/b\u003e). The two subgenomes of the tetraploid \u003cem\u003eT. turgidum\u003c/em\u003e are orthologous to two of the subgenomes of the hexaploid \u003cem\u003eT. aestivum\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eE, \u003cb\u003eFig. S100\u003c/b\u003e), leading to the inference that \u003cem\u003eT. turgidum\u003c/em\u003e is the intermediate tetraploid progenitor of the hexaploid \u003cem\u003eT. aestivum\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eF), in line with previous results [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Similarly, a reticulate allopolyploidization origin in two \u003cem\u003ePapaver\u003c/em\u003e genomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eH) can be inferred from the \u003cem\u003eOI\u003c/em\u003e-colored dot plots (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e3\u003c/span\u003eG, \u003cb\u003eFig. S101\u003c/b\u003e). 
This inference agrees with our previous work [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. However, the phylogenetic relationships between homoeologous subgenomes cannot be resolved directly from the \u003cem\u003eOI\u003c/em\u003e-colored dot plots and require further evidence, such as chromosome- or subgenome-scale phylogenies [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003ePotential applications of\u003c/b\u003e \u003cb\u003eOI\u003c/b\u003e \u003cb\u003ein phylogenomics\u003c/b\u003e\u003c/p\u003e \u003cp\u003eAccurate inference of orthology plays a crucial role in the estimation of species trees, and pseudo-orthologs (i.e. hidden paralogs) derived from WGD and gene loss can greatly mislead species tree inference under some circumstances [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. As demonstrated in various cases above (Figs.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, S1-90), the \u003cem\u003eOrthology Index\u003c/em\u003e exhibited a high level of accuracy in identifying syntenic orthology (or syntelogs) and can therefore minimize this detrimental influence. Consequently, the results can be directly applied to species tree reconstruction. We used the example of the core eudicots to showcase the application. It is accepted that all core eudicots share a paleohexaploidization event (γ event, WGT), whereas no two orders within the core eudicots share an additional polyploidy event [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. 
Therefore, it is appropriate to use the \u003cem\u003eOrthology Index\u003c/em\u003e to remove the out-paralogy produced from the γ event, and to better resolve the phylogenetic relationships among the orders of the core eudicots, which remain poorly resolved for some orders within the core eudicots, such as the Celastrales\u0026ndash;Oxalidales\u0026ndash;Malpighiales (COM) clade [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. Here we utilized the genome-scale syntenic orthologs inferred by the \u003cem\u003eOrthology Index\u003c/em\u003e to reconstruct a backbone phylogeny of the core eudicots, aiming to minimize the detrimental influence of the γ event.\u003c/p\u003e \u003cp\u003eWe collected a high-quality genomic dataset that covers 28 (70%) of the 40 orders and 98 (33%) of the 298 families of core eudicots treated in APG IV [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. We then applied the \u003cem\u003eOI\u003c/em\u003e to identify syntenic orthologs, which resulted in the identification of 54,322 syntenic orthogroups (SOGs). After filtering, 12,277 multi-copy and 5,154 single-copy SOGs were retrieved, allowing for up to 40% taxa missing (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). This imbalance between the numbers of multi-copy and single-copy SOGs is attributable to lineage-specific polyploidy events within the core eudicots. As a result of these polyploidy events, the occupancy of single-copy SOGs showed significant decreases in species with high relative ploidy (i.e. orthologous syntenic depth to the grape genome) (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eB). 
Nevertheless, the order-level species tree topologies based on the two gene sets were identical and strongly supported with high posterior probabilities (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e, although the tree based on multi-copy SOGs was more robust, with equal or higher posterior probabilities at nearly all nodes (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e and there were slight differences in the positions of a few of the species (\u003cb\u003eFigs. S102-103\u003c/b\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe observed large incongruences between our phylogeny and the APG IV [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], as nearly half of the orders covered in this study have inconsistent phylogenetic positions (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. For example, the COM clade is not monophyletic and is placed with the malvids, in contrast to that from APG IV (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. However, recent phylogenomics/phylotranscriptomic results are consistent with most of our findings [e.g. 34\u0026ndash;40]. For example, the phylogenetic positions of the Fagales, Rosales, Celastrales\u0026ndash;Oxalidales\u0026ndash;Malpighiales, Myrtales, Cornales and Ericales (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e are consistent with the genome-scale phylogenomics based on coalescent-based analysis of 482 single-copy nuclear orthologous sequences [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. 
In addition, the phylogenetic positions of the Santalales (sister to superrosids), Aquifoliales (sister to Garryales), and Crossosomatales (sister to fabids\u0026thinsp;+\u0026thinsp;remaining malvids) (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e are consistent with the phylogenetic inference based on coalescent tree analysis of 410 single-copy gene families extracted from transcriptome and genome data [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eOur results further provide interesting phylogenomic insights for the core eudicots. We discovered that Vitales is sister to Saxifragales with high support (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. Additionally, the Vitales\u0026thinsp;+\u0026thinsp;Saxifragales clade is found to be the sister group of the remaining rosids (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e, rejecting the hypothesis that Vitales are sister to Saxifragales and all other taxa in the superrosids clade [\u003cspan additionalcitationids=\"CR35\" citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. 
We also found that the Zygophyllales were sister to the malvids clade with high support (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e, inconsistent with their placement as sister to the Malvales [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. Considering that our analyses involved ten thousand syntenic orthologous gene families, and that we have minimized the detrimental effects of the shared γ event, we believe that our results are likely to be more robust than those mentioned above in depicting the true tree of life of the core eudicots. Nevertheless, our analyses were limited by taxon sampling (lack of high-quality genomes), a limitation that is expected to be resolved in the near future with the continuing developments and efforts in the field of genome sequencing [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eLimitations\u003c/h2\u003e \u003cp\u003eThe \u003cem\u003eOrthology Index\u003c/em\u003e may not perform well in extremely complex cases. For instance, when Δ\u003cem\u003eT\u003c/em\u003e is notably small (e.g. radiation within a few generations after polyploidization), the \u003cem\u003eOI\u003c/em\u003e may fail to distinguish between out-paralogy and orthology. The method is also constrained wherever ortholog inference and/or synteny detection is itself unreliable. For example, synteny is known not to be conserved in distantly-related lineages [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e], suggesting that this method should not be applied in cases involving lineages that are distantly related (e.g. gymnosperms\u0026ndash;angiosperms). 
Additionally, fragmented assemblies, as well as mis-assemblies, can disrupt synteny and subsequently reduce the efficiency of the index. However, this is likely to cease to be a concern in the near future with the fast development of sequencing and assembly techniques.\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusions","content":"\u003cp\u003eIn summary, we present a human-interpretable and machine-actionable approach to distinguish orthology from out-paralogy for syntenic blocks. The approach can identify orthologous synteny robustly, as validated with nearly 100 representative cases. We have demonstrated the broad and valuable applications of this approach to the reconstruction of evolutionary history in plant genomes, including reconstruction of the tree/network of life, and identification of and placing of polyploidy events on the tree/network. This approach will extend our analytical capacity in evolutionary genomics and might reduce the misleading data generated using some traditional methods.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eData collection and pre-processing\u003c/h2\u003e \u003cp\u003eThe genomic data were obtained from public databases or from the corresponding authors, as detailed in \u003cb\u003eTable \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eAn all-versus-all BLAST search of protein sequences for each species was conducted pairwise using DIAMOND v0.9.24 [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Orthologous relationships were inferred using OrthoFinder v2.3.1 (Emms and Kelly 2019) (parameters: -M msa). Syntenic/collinear blocks were identified with the `-icl` option of WGDI [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] v0.6.2 (default parameters). 
The synonymous substitution rate (\u003cem\u003eKs\u003c/em\u003e) was calculated for homologous gene pairs using the `-ks` option of WGDI (default parameters).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eDistinguishing orthologous and out-paralogous syntenic blocks using the Orthology Index\u003c/h2\u003e \u003cp\u003eWe propose an index, named the \u003cem\u003eOrthology Index\u003c/em\u003e, that can distinguish the syntenic blocks of orthology from out-paralogy. The index (\u003cem\u003eOI\u003c/em\u003e) is defined as:\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eOI\u003c/em\u003e\u0026thinsp;=\u0026thinsp;\u003cem\u003en\u003c/em\u003e\u0026thinsp;/\u0026thinsp;\u003cem\u003em\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003ewhere \u003cem\u003em\u003c/em\u003e represents the total 
number of collinear gene pairs in a pre-inferred collinearity block, and \u003cem\u003en\u003c/em\u003e denotes the number of those collinear gene pairs that were pre-inferred as orthologs. Thus, the index represents the proportion of orthologous syntenic gene pairs within a block. It ranges from 0 to 1 and can differentiate orthology from out-paralogy. A higher index value suggests a higher likelihood of orthology, whereas a lower index value suggests a higher likelihood of out-paralogy. From our experience with many different cases, the peak with the highest index can generally be considered orthologous (Figs. S1\u0026ndash;90).\u003c/p\u003e \u003cp\u003eUsing this index as a foundation, we developed user-friendly all-in-one programs for visualization and downstream analyses. The subcommand \u0026lsquo;dotplot\u0026rsquo; enables visualization and evaluation of synteny, with the dots colored by the index or \u003cem\u003eKs\u003c/em\u003e values. The subcommand \u0026lsquo;filter\u0026rsquo; retrieves orthologous blocks by discarding all blocks with an index value below the cutoff (0.6 by default). The subcommand \u0026lsquo;cluster\u0026rsquo; groups orthologous syntenic genes into syntenic orthogroups (SOGs), by constructing an orthologous syntenic graph and applying the Markov Cluster (MCL) algorithm [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e] to perform graph clustering and break weak links. The subcommand \u0026lsquo;outgroup\u0026rsquo; retrieves syntenic orthologs from outgroups that lack WGDs shared with ingroups. 
The subcommand \u0026lsquo;phylo\u0026rsquo; reconstructs multi-copy or single-copy gene trees, by aligning protein sequences with MAFFT v7.481 [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e], converting protein alignments to codon alignments with PAL2NAL v14 [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e], trimming alignments with trimAl v1.2 [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e] (parameter: -automated1) and reconstructing maximum-likelihood trees with IQ-TREE v2.2.0.3 [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. These gene trees serve as input to infer a species tree with the coalescent-based method ASTRAL-Pro v1.10.1.3 [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]. The default threshold for missing taxa is set to 40%, according to a recent study [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThis tool is implemented in Python 3 and supports synteny outputs from state-of-the-art synteny detectors, including MCscan/JCVI [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], MCScanX [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] and WGDI [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], as well as orthology outputs from OrthoFinder2 [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], OrthoMCL [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] and other tools upon request. The tool can be easily installed via conda or the Apptainer/Singularity container system [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e], and can be seamlessly integrated into other pipelines. 
The source code is accessible on GitHub (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zhangrengang/OrthoIndex\u003c/span\u003e\u003cspan address=\"https://github.com/zhangrengang/OrthoIndex\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e "},{"header":"Declarations","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eCode availability\u003c/h2\u003e \u003cp\u003eThe codes and typical examples used in this study can be found on GitHub (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zhangrengang/OrthoIndex\u003c/span\u003e\u003cspan address=\"https://github.com/zhangrengang/OrthoIndex\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eData availability\u003c/h2\u003e \u003cp\u003eThe codon alignments and gene trees of the core eudicots are available from Figshare (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.6084/m9.figshare.24174930\u003c/span\u003e\u003cspan address=\"10.6084/m9.figshare.24174930\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e\n\u003ch2\u003eCompeting interests\u003c/h2\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e \u003ch2\u003eAuthor contributions\u003c/h2\u003e \u003cp\u003eRGZ and YPM conceived and designed the study; RGZ, HYS and HS programmed and tested the code; RGZ collected and analyzed the data; RGZ and HYS prepared figures; KHJ, RGZ and HYS drafted the manuscript; YPM revised the manuscript; all authors approved the final manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgments\u003c/h2\u003e \u003cp\u003eWe thank Professors Ying-Xiong Qiu, Xiao-Ming Song, Quan-Jun Hu, Tao Zhou, and Zhao-Ying Liu 
for generously sharing their genomic data, and the other researchers who have already released their genomic data publicly.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eSteenwyk JL, Li Y, Zhou X, Shen X, Rokas A (2023) Incongruence in the phylogenomics era. Nat Rev Genet\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang R, Lu C, Li G, Lv J, Wang L, Wang Z, Chen Z, Liu D, Zhao Y, Shi T et al (2023) Subgenome-aware analyses suggest a reticulate allopolyploidization origin in three \u003cem\u003ePapaver\u003c/em\u003e genomes. Nat Commun 14:2204\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMa J, Sun P, Wang D, Wang Z, Yang J, Li Y, Mu W, Xu R, Wu Y, Dong C et al (2021) The \u003cem\u003eChloranthus sessilifolius\u003c/em\u003e genome provides insight into early diversification of angiosperms. Nat Commun 12:6929\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuo C, Luo Y, Gao LM, Yi TS, Li HT, Yang JB, Li DZ (2023) Phylogenomics and the flowering plant tree of life. J Integr Plant Biol 65:299\u0026ndash;323\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang X, Gao S, Guo L, Wang B, Jia Y, Zhou J, Che Y, Jia P, Lin J, Xu T et al (2021) Three chromosome-scale \u003cem\u003ePapaver\u003c/em\u003e genomes reveal punctuated patchwork evolution of the morphinan and noscapine biosynthesis pathway. Nat Commun 12:6030\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang X, Gao S, Xu T, Wang B, Jia Y, Ye K (2023) Reply to Subgenome-aware analyses suggest a reticulate allopolyploidization origin in three \u003cem\u003ePapaver\u003c/em\u003e genomes. 
Nat Commun 14:2203\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAn X, Gao K, Chen Z, Li J, Yang X, Yang X, Zhou J, Guo T, Zhao T, Huang S et al (2021) High quality haplotype-resolved genome assemblies of \u003cem\u003ePopulus tomentosa\u003c/em\u003e Carr., a stabilized interspecific hybrid species that is widespread in Asia. Mol Ecol Resour 22:786\u0026ndash;802\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang T, Qiao Q, Du X, Zhang X, Hou Y, Wei X, Sun C, Zhang R, Yun Q, Crabbe MJC et al (2022) Cultivated hawthorn (\u003cem\u003eCrataegus pinnatifida\u003c/em\u003e var. \u003cem\u003emajor\u003c/em\u003e) genome sheds light on the evolution of Maleae (apple tribe). J Integr Plant Biol 64:1487\u0026ndash;1501\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTang H, Lyons E, Pedersen B, Schnable JC, Paterson AH, Freeling M (2011) Screening synteny blocks in pairwise genome comparisons through integer programming. BMC Bioinformatics 12:102\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun P, Jiao B, Yang Y, Shan L, Li T, Li X, Xi Z, Wang X, Liu J (2022) WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant 15:1841\u0026ndash;1851\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Tang H, DeBarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H et al (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res 40:e49\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen J, Huang Y, Brachi B, Yun Q, Zhang W, Lu W, Li H, Li W, Sun X, Wang G et al (2019) Genome-wide analysis of Cushion willow provides insights into alpine plant divergence in a biodiversity hotspot. Nat Commun 10:5230\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEmms DM, Kelly S (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. 
Genome Biol 20:238\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi L, Stoeckert CJ, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178\u0026ndash;2189\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A (2006) The genome of black cottonwood, \u003cem\u003ePopulus trichocarpa\u003c/em\u003e (Torr. \u0026amp; Gray). Science 313:1596\u0026ndash;1604\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDai X, Hu Q, Cai Q, Feng K, Ye N, Tuskan GA, Milne R, Chen Y, Wan Z, Wang Z (2014) The willow genome and divergent evolution from poplar after the common genome duplication. Cell Res 24:1274\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH (2008) Synteny and collinearity in plant genomes. Science 320:486\u0026ndash;488\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe L, Jia KH, Zhang RG, Wang Y, Shi TL, Li ZC, Zeng SW, Cai XJ, Wagner ND, H\u0026ouml;randl E et al (2021) Chromosome-scale assembly of the genome of \u003cem\u003eSalix dunnii\u003c/em\u003e reveals a male‐heterogametic sex determination system on chromosome 7. Mol Ecol Resour 21:1966\u0026ndash;1982\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSong X, Sun P, Yuan J, Gong K, Li N, Meng F, Zhang Z, Li X, Hu J, Wang J et al (2021) The celery genome sequence reveals sequential paleo-polyploidizations, karyotype evolution and resistance gene reduction in Apiales. 
Plant Biotechnol J 19:731\u0026ndash;744\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang Z, Tu L, Yang W, Zhang Y, Hu T, Ma B, Lu Y, Cui X, Gao J, Wu X et al (2021) The chromosome-level reference genome assembly for \u003cem\u003ePanax notoginseng\u003c/em\u003e and insights into ginsenoside biosynthesis. 
Plant Commun 2:100113\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Zhang H, Ri HC, An Z, Wang X, Zhou J, Zheng D, Wu H, Wang P, Yang J et al (2022) Deletion and tandem duplications of biosynthetic genes drive the diversity of triterpenoids in \u003cem\u003eAralia elata\u003c/em\u003e. Nat Commun 13:2224\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang Z, Chen S, Wang S, Hu Y, Zhang G, Dong Y, Yang S, Miao J, Chen W, Sheng J (2021) Chromosomal-scale genome assembly of \u003cem\u003eEleutherococcus senticosus\u003c/em\u003e provides insights into chromosome evolution in Araliaceae. Mol Ecol Resour 21:2204\u0026ndash;2220\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOtto SP, Whitton J (2000) Polyploid incidence and evolution. Annu Rev Genet 34:401\u0026ndash;437\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJia K, Wang Z, Wang L, Li G, Zhang W, Wang X, Xu F, Jiao S, Zhou S, Liu H et al (2022) SubPhaser: a robust allopolyploid subgenome phasing method based on subgenome-specific \u003cem\u003ek\u003c/em\u003e-mers. New Phytol 235:801\u0026ndash;809\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarcussen T, Sandve SR, Heier L, Spannagl M, Pfeifer M, Jakobsen KS, Wulff BBH, Steuernagel B, Mayer KFX, Olsen OA et al (2014) Ancient hybridizations among the ancestral genomes of bread wheat. Science 345:1250092\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEdger PP, Poorten TJ, VanBuren R, Hardigan MA, Colle M, McKain MR, Smith RD, Teresi SJ, Nelson ADL, Wai CM et al (2019) Origin and evolution of the octoploid strawberry genome. Nat Genet 51:541\u0026ndash;547\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChalhoub B, Denoeud F, Liu S, Parkin IA, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B et al (2014) Early allopolyploid evolution in the post-Neolithic \u003cem\u003eBrassica napus\u003c/em\u003e oilseed genome. 
Science 345:950\u0026ndash;953\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi JT, Wang Q, Huang YM, Li QS, Cui MS, Dong ZJ, Wang HW, Yu JH, Zhao YJ, Yang CR et al (2021) Parallel subgenome structure and divergent expression evolution of allo-tetraploid common carp and goldfish. Nat Genet 53:1493\u0026ndash;1503\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang X, Song Q, Ye W, Chen ZJ (2021) Concerted genomic and epigenomic changes accompany stabilization of \u003cem\u003eArabidopsis\u003c/em\u003e allopolyploids. Nat Ecol Evol 5:1382\u0026ndash;1393\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhuang W, Chen H, Yang M, Wang J, Pandey MK, Zhang C, Chang W, Zhang L, Zhang X, Tang R et al (2019) The genome of cultivated peanut provides insight into legume karyotypes, polyploid evolution and crop domestication. Nat Genet 51:865\u0026ndash;876\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiong H, Wang D, Shao C, Yang X, Yang J, Ma T, Davis CC, Liu L, Xi Z (2022) Species tree estimation and the impact of gene loss following whole-genome duplication. Syst Biol 71:1348\u0026ndash;1361\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan de Peer Y, Mizrachi E, Marchal K (2017) The evolutionary significance of polyploidy. Nat Rev Genet 18:411\u0026ndash;424\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThe Angiosperm Phylogeny Group (2016) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc 181:1\u0026ndash;20\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZeng L, Zhang N, Zhang Q, Endress PK, Huang J, Ma H (2017) Resolution of deep eudicot phylogeny and their temporal diversification using nuclear genes from transcriptomic and genomic datasets. 
New Phytol 214:1338\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLeebens-Mack JH, Barker MS, Carpenter EJ, Deyholos MK, Gitzendanner MA, Graham SW, Grosse I, Li Z, Melkonian M, Mirarab S et al (2019) One thousand plant transcriptomes and the phylogenomics of green plants. Nature 574:679\u0026ndash;685\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang L, Su D, Chang X, Foster CSP, Sun L, Huang C, Zhou X, Zeng L, Ma H, Zhong B (2020) Phylogenomic insights into deep phylogeny of angiosperms based on broad nuclear gene sampling. Plant Commun 1:100027\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang C, Zhang T, Luebert F, Xiang Y, Huang C, Hu Y, Rees M, Frohlich MW, Qi J, Weigend M et al (2020) Asterid phylogenomics/phylotranscriptomics uncover morphological evolutionary histories and support phylogenetic placement for numerous whole genome duplications. Mol Biol Evol msaa160\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu H, Sun P, Yang Y, Ma J, Liu J (2023) Genome-scale angiosperm phylogenies based on nuclear, plastome, and mitochondrial datasets. J Integr Plant Biol 65:1479\u0026ndash;1489\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu L, Chen M, Folk RA, Wang M, Zhao T, Shang F, Soltis DE, Li P (2023) Phylogenomic and syntenic data demonstrate complex evolutionary processes in early radiation of the rosids. Mol Ecol Resour\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBaker WJ, Bailey P, Barber V, Barker A, Bellot S, Bishop D, Botigu\u0026eacute; LR, Brewer G, Carruthers T, Clarkson JJ et al (2022) A comprehensive phylogenomic platform for exploring the angiosperm tree of life. 
Syst Biol 71:301\u0026ndash;319\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLewin HA, Richards S, Lieberman Aiden E, Allende ML, Archibald JM, B\u0026aacute;lint M, Barker KB, Baumgartner B, Belov K, Bertorelle G et al (2022) The Earth BioGenome Project 2020: Starting the clock. 
Proc Natl Acad Sci U S A 119:e2115635118\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKristensen DM, Wolf YI, Mushegian AR, Koonin EV (2011) Computational methods for Gene Orthology inference. Brief Bioinform 12:379\u0026ndash;391\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBuchfink B, Xie C, Huson DH (2015) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59\u0026ndash;60\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Dongen S (2008) Graph clustering via a discrete uncoupling process. SIAM J Matrix Anal Appl 30:121\u0026ndash;141\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKatoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772\u0026ndash;780\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSuyama M, Torrents D, Bork P (2006) PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 34:W609\u0026ndash;W612\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCapella-Gutierrez S, Silla-Martinez J, Gabaldon T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972\u0026ndash;1973\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMinh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R (2020) IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37:1530\u0026ndash;1534\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang C, Mirarab S (2022) ASTRAL-Pro 2: ultrafast species tree reconstruction from multi-copy gene family trees. 
Bioinformatics 38:4949\u0026ndash;4950\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMorel B, Williams TA, Stamatakis A, Schwartz R (2023) Asteroid: a new algorithm to infer species trees from gene trees under high proportions of missing data. Bioinformatics 39:btac832\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKurtzer GM, Sochat V, Bauer MW (2017) Singularity: Scientific containers for mobility of compute. PLoS ONE 12:e0177459\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Orthologous synteny, Polyploidy, Reticulation, Phylogenomics, Orthology Index","lastPublishedDoi":"10.21203/rs.3.rs-4798240/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4798240/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eWe developed a scalable and robust approach, the \u003cem\u003eOrthology Index\u003c/em\u003e (\u003cem\u003eOI\u003c/em\u003e), to accurately identify orthologous synteny by calculating the proportion of pre-inferred orthologs within syntenic blocks. 
Our evaluation of a comprehensive dataset comprising nearly 100 known cases with diverse polyploidy events revealed that the approach is highly reliable and robust in the identification of orthologous synteny. This discovery highlights \u003cem\u003eOI\u003c/em\u003e as a potentially universal criterion for the automated identification of orthologous synteny. Additionally, we demonstrate its broad applications in reconstructing plant genome evolutionary histories, including polyploidy and reticulation inference, and phylogenomics. The index is packaged in an all-in-one toolkit (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/zhangrengang/OrthoIndex\u003c/span\u003e\u003cspan address=\"https://github.com/zhangrengang/OrthoIndex\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e","manuscriptTitle":"Robust identification of orthologous synteny with the Orthology Index and its applications in reconstructing the evolutionary history of plant genomes","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-23 12:18:11","doi":"10.21203/rs.3.rs-4798240/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"6cf5bf20-9819-4435-bd90-0f6b10e25b2e","owner":[],"postedDate":"August 23rd, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":35504958,"name":"Biological sciences/Plant sciences/Plant 
evolution"},{"id":35504959,"name":"Biological sciences/Evolution/Evolutionary genetics"}],"tags":[],"updatedAt":"2024-08-23T12:18:13+00:00","versionOfRecord":[],"versionCreatedAt":"2024-08-23 12:18:11","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4798240","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4798240","identity":"rs-4798240","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
