Identification and characterization of specific motifs in effector proteins of plant parasites using MOnSTER | Research Square

Silvia Bottini, Giulia Calia, Paola Porracciolo, Yongpan Chen, and 6 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-3931000/v1 This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted.

Abstract

Motivation: Plant pathogens cause billions of dollars of crop loss every year and are a major threat to global food security. Identifying and characterizing pathogen effectors is crucial for improving their control. Because of their poor sequence conservation, effector identification is challenging, and current methods generate too many candidates without any indication of how to prioritize experimental studies.
In most phyla, effectors contain specific sequence motifs which influence their localization and targets in the plant. Therefore, there is an urgent need to develop bioinformatics tools tailored for pathogen effectors.

Results: To circumvent these limitations, we have developed MOnSTER, a novel tool that identifies CLUsters of Motifs of Protein sequences (CLUMPs). MOnSTER can be fed with motifs identified by de novo tools or taken from databases such as Pfam and InterProScan. The advantage of MOnSTER is that it reduces motif redundancy by clustering motifs and associating a score with each cluster. This score encompasses the physicochemical properties of amino acids (AAs) and the motif occurrences. We built our method to identify discriminant CLUMPs in oomycete effectors. We then applied MOnSTER to plant-parasitic nematodes (PPNs) and identified six CLUMPs in about 60% of the known nematode candidate parasitism proteins. Furthermore, we found co-occurrences of CLUMPs with protein domains important for invasion and pathogenicity. The potential of this tool goes beyond effector characterization: it can be used to cluster motifs and calculate the CLUMP score on any set of protein sequences.

Availability and implementation: The Python source code and related data are available at: https://github.com/Plant-Net/MOnSTER_PROMOCA.git

Subject areas: Biological sciences/Computational biology and bioinformatics/Protein analysis/Protein sequence analyses; Biological sciences/Computational biology and bioinformatics/Software

Keywords: motif clustering, motif scoring, effectors, plant parasite interaction, oomycetes, nematodes

Introduction

Plant pathogens are a major threat to global food security. To cause infection, pathogenic organisms secrete effector proteins that promote colonization of the host plant by overcoming the physical barriers of plant cell walls, suppressing or evading immune perception, and deriving nutrients from host tissues [1].
Therefore, identifying and characterizing pathogen effectors is crucial for understanding how they manipulate the plant and for combating them more effectively. Effector proteins are often specific to pathogens and essential for causing plant pathology, making them targets of choice for the development of cleaner and more specific control methods [2]–[4]. Because of their poor sequence conservation, effector identification among the set of proteins predicted from the genome (the proteome) is challenging, and current methods generate too many candidates without further indication for prioritizing experimental studies. Classically, effector proteins are indirectly identified among the predicted secretome based on the presence of a signal peptide for secretion and the lack of a transmembrane region [5], [6]. However, these criteria alone suffer from two main limitations: on the one hand, the secretome comprises many proteins that are not effectors; on the other hand, some known effectors do not possess signal peptides for secretion. In most phyla, effectors contain specific sequence motifs which target host proteins with distinct roles in the infection process and control virulence [7]. The best-studied example is the class of effectors secreted via the type III secretion system (T3SS) of Gram-negative bacterial pathogens, which are characterized by a specific motif/domain conferring a repertoire of molecular determinants with important roles during infection [8], [9]. However, these features are not conserved in other bacteria. Indeed, Gram-positive pathogens and certain phloem- and xylem-colonizers, such as Candidatus Liberibacter and Xylella spp., do not encode the T3SS. In these bacteria, effector delivery depends on the presence of the N-terminal signal peptide, which is required for protein secretion [10]. In fungi, effectors are often small and cysteine-rich [11]. Another well-characterized example is the effectors of oomycete pathogens.
Oomycetes are eukaryotic, filamentous, heterotrophic microorganisms, more than 60% of which parasitize plants [12]. Well-known plant pathogens among the oomycetes include the agents of late blight of potato, sudden oak death and root rot (Phytophthora species), and of downy mildew (Peronospora and Bremia species) [13], [14]. These pathogens encode two notable classes of effector proteins, RxLR and Crinkler (CRN), that can be predicted by the occurrence of the related motifs, RxLR, -dEER and LxLFLAK-HVLVxxP, in the N-terminal region downstream of the signal peptide [15]–[17]. Although for some plant pathogens, such as oomycetes, effectors have been studied extensively and characteristic motifs have been identified [18], [19], research on plant-parasitic nematode (PPN) effectors has not identified any consensus motif conserved across multiple species. The most economically important PPNs are the sedentary root-knot nematodes (RKNs) and cyst nematodes [20]. These sedentary parasites induce the formation of a feeding structure that serves as a constant food source for the nematode. Other PPNs are migratory, and a whole spectrum of variations exists between endoparasites and ectoparasites, with semi-endoparasites as an intermediate between the two extremes [21]. The different lifestyles of PPNs are expected to be reflected in their secretions, which presumably contain effectors with different functions according to the nematode's specific needs, thus presenting a wide variety of characteristic motifs and complicating their identification. A first step toward the identification of motifs characteristic of RKN effectors was taken by Vens et al. [22]. The authors developed a bioinformatic tool, called MERCI, to identify motifs with high occurrences in a positive dataset (known effector sequences) and absent from the negative one (non-effector sequences). MERCI uses a graph-based approach incorporating physicochemical features of the amino acids composing protein sequences.
By analyzing the known effector sequences of the RKN species Meloidogyne incognita, one of the most important known crop pathogens [23], they identified 4 motifs. However, at the time of their publication, very few genomes of RKN species were available, and the study was therefore conducted on a single RKN species. Furthermore, the genome used at that time was later shown to be partially incomplete [24]. These limitations prevent the generalization of the previous findings. Da Rocha et al. identified a cis-regulatory promoter motif (Mel-DOG box) characteristic of dorsal gland effectors [25]. Recently, Rocha et al. used this motif combined with other criteria to select new putative effectors and validated 14 new dorsal gland-specific candidate effectors expressed in adult females [26]. Although all these studies have contributed to enlarging the list of known effectors, a global characterization of their properties is still missing. Therefore, there is an urgent need for a new study of the properties of PPN effector sequences and for motif research. By taking advantage of the many proteomes now available for several PPNs, we developed a comprehensive motif mining analysis to identify motifs characteristic of candidate parasitism protein sequences of these species. Sequence motifs are usually short, of constant size, and often repeated and conserved. Typically, motifs conform to a particular sequence pattern, where certain positions are constrained to a specific amino acid whereas others are not [27]. This makes motifs highly degenerate, yielding a huge list of non-redundant motif sequences and, consequently, some motifs that are not characteristic of effector sequences only [28]. Furthermore, different amino acids (AAs) can have similar physicochemical properties, so different motif sequences can share similar properties. However, most available motif discovery tools do not consider these properties.
To circumvent these limitations, we have developed MOnSTER, a novel tool that identifies CLUsters of Motifs of Protein sequences (CLUMPs) and associates a score with each CLUMP. This score encompasses the physicochemical properties of AAs and the motif occurrences. One of the key advantages of MOnSTER is that it reduces the redundancy of motifs found by de novo tools. Furthermore, known motifs from public databases such as Pfam [29] and/or InterProScan [30] can also be used as input to MOnSTER to identify discriminant CLUMPs. We built our method to identify discriminant CLUMPs in 1743 candidate parasitism proteins of plant-pathogenic oomycetes. We showed the reliability of MOnSTER by identifying 5 CLUMPs that correspond to the known motifs: RxLR, -dEER and LxLFLAK-HVLVxxP. After this proof of concept, we applied MOnSTER to PPN effector proteins and identified peculiar motifs in their sequences at an unprecedented level. We selected a set of 4395 protein sequences from 13 PPN species belonging to the genera Meloidogyne, Globodera, Heterodera, Radopholus and Bursaphelenchus. We identified 6 CLUMPs present in 60% of the known effectors (positive dataset). Of note, these CLUMPs were found in only 5% of the sequences of the negative dataset, highlighting the enrichment of the identified motifs in effector sequences. Furthermore, we found a specific co-occurrence of at least two CLUMPs in PPN candidate parasitism protein sequences bearing protein domains important for invasion and pathogenicity. The potential of this tool goes beyond candidate parasitism proteins: it can be used to cluster motifs and calculate the CLUMP score on any set of protein sequences. Furthermore, we also provide a new scoring system capable of measuring the physicochemical properties of motifs grouped in CLUMPs and a motif alignment algorithm to better explore physicochemical properties within the CLUMPs.
MOnSTER is freely available at https://github.com/Plant-Net/MOnSTER_PROMOCA.git

Materials and methods

Datasets

Oomycetes

We used proteins from five oomycete species to create the input datasets for MOnSTER, namely Phytophthora infestans, Phytophthora sojae, Phytophthora ramorum, Hyaloperonospora arabidopsidis and Bremia lactucae.

Positive dataset

The positive dataset consists of 1743 effector proteins belonging to the aforementioned oomycetes, obtained by concatenating proteins selected from the PHI-base database (v4.14) [31], UniProt (release 2023_02) [32], and the work of Haas et al. (2009) [33], in which the protein annotations were manually curated. Since the proteins come from different sources, we used CD-HIT (v4.8.1) [34], with the parameters given in Supplementary information, to filter out identical protein sequences. A total of 1283 proteins are annotated as RxLR effectors, 377 as Crinkler effectors, and the remaining 83 sequences are proteins with no previously identified motif that are known to be involved in the host-pathogen interaction.

Negative dataset

All proteins in the negative dataset come from UniProt (release 2023_02) and from the oomycete species listed above, after removing proteins included in the positive dataset and proteins with evident effector-related annotations. Because a large number of non-effector proteins remained after this filtering, we first used CD-HIT to reduce protein sequence redundancy; then, to reduce the imbalance of the final dataset, we kept only the representative sequences of the orthogroups found with OrthoFinder (v2.5.4) [35]. In total, 3009 non-effector proteins are included in the negative dataset.

Motif Discovery

The last input file consists of a list of motifs identified as enriched in the sequences of the positive dataset compared to the sequences of the negative one. We used MERCI and STREME (v5.5.1) [36], with parameters detailed in Supplementary information.
We imposed different lengths for motif prediction to be inclusive but more stringent on the motifs in which we are interested. STREME’s output is a list of motifs. Hence, we used the tool FIMO (v5.5.1) [37], with default parameters, to extract 246 degenerated motifs from the 4524 different motifs. We obtained the following numbers of non-redundant motifs: 19 with MERCI and 246 with STREME. Then, we removed the identical motifs and created a single non-redundant list containing all the motifs in the same format, which resulted in 265 different motifs.

Plant Parasitic Nematodes (PPNs)

Positive dataset

The positive dataset contains candidate parasitism proteins selected as likely to be secreted by PPNs into their plant host and belonging to 13 species (Meloidogyne incognita, Meloidogyne javanica, Meloidogyne arenaria, Meloidogyne hapla, Meloidogyne chitwoodi, Meloidogyne graminicola, Globodera rostochiensis, Globodera pallida, Heterodera avenae, Heterodera glycines, Heterodera schachtii, Radopholus similis, Bursaphelenchus xylophilus). We collected candidate parasitism proteins by literature mining. More precisely, we considered as candidate parasitism proteins those proteins for which in situ hybridization experiments showed that the corresponding transcript is present in nematode secretory glands (dorsal or sub-ventral), implying that these proteins are likely secreted by the nematodes into the host plant. The literature mining led to the extraction of 163 proteins from NCBI GenBank using the NCBI Entrez API. We also manually extracted 41 sequences from the main text and Supplementary information of the publications. In addition, we downloaded 41 sequences from WormBase ParaSite (www.parasite.wormbase.org, vWBPS17-WS282 [38], [39]), and eight sequences from nematode.net [40]. In total, we obtained 229 candidate parasitism proteins.
We extended the positive dataset with proteins that are non-redundant homologs of the previous candidate parasitism proteins in PPN proteomes. We first used CD-HIT-2D, with the parameters given in Supplementary information, to cluster sequences from PPN proteomes and candidate parasitism proteins [41]. We then pooled all the candidate parasitism proteins from closely related Meloidogyne species (e.g., M. incognita, M. javanica and M. arenaria) and scanned each corresponding proteome with this multi-species set of sequences using CD-HIT. Since the remaining species are genetically distinct, we then scanned each proteome with its own set of candidate parasitism proteins, except for H. avenae and M. chitwoodi, for which no proteome was available. We merged the two sets of selected candidate parasitism proteins and ran CD-HIT within and between species to reduce dataset redundancy (parameters in Supplementary information), retaining only sequences having more than 1% divergence and aligning over more than 80% of their length (the longest sequence from each cluster was kept). The final positive dataset includes 546 candidate parasitism proteins from 13 species.

Negative dataset

The negative dataset is composed of 3849 protein sequences that we obtained by selecting genes widely conserved across the nematode tree of life and close outgroup species, including many non-parasitic species. Specifically, we filtered the results of a previous analysis [42] and only retained genes from orthogroups i) conserved in more than 90% (62/64) of the analyzed species, including two tardigrade species (outgroups), and ii) presenting fewer than 10 genes per species per orthogroup, to avoid multigenic families that would lead to overrepresentation of some proteins. To remove redundancy, we used the same strategy as for the positive dataset (CD-HIT-2D first and then CD-HIT).
Motif Discovery

Using the aforementioned software in the same configuration, we obtained the following numbers of non-redundant motifs: 40 with MERCI and 229 with STREME followed by FIMO. In total, we obtained 269 different motifs. All datasets are available at https://github.com/Plant-Net/MOnSTER_PROMOCA.git and in Supplementary tables 1.1–1.2 and 2.1–2.2.

MOnSTER pipeline

The MOnSTER (MOtifs of cluSTERs) pipeline is composed of three main steps, as described in Fig. 1 and in the following paragraphs.

Feature calculation

The first step of the pipeline is the calculation of parameters that describe protein sequences (Fig. 1A). To allow an easy calculation of the features on any dataset, we calculated the sequence length and used the ProteinAnalysis class from Bio.SeqUtils.ProtParam, a Biopython sub-package, to select 13 additional features based on individual AA properties, belonging to 4 categories:
secondary structure propensity: ‘helix’ (V, I, Y, F, W, L), ‘turn’ (N, P, G, S), and ‘sheet’ (E, M, A, L);
amino acid dimensions: ‘tiny’ (A, C, G, S, T) and ‘small’ (A, C, F, G, I, L, M, P, V, W, Y);
pH: ‘basic’ (H, K, R), ‘acid’ (B, D, E), and ‘charged’ (H, K, R, B, D, E);
physicochemical properties: ‘hydropathy-score’, ‘polar’ (D, E, H, K, N, Q, R, S, T), ‘non-polar’ (A, C, F, G, I, L, M, P, V, W, Y), ‘aromatic’ (F, H, W, Y), and ‘aliphatic’ (A, I, L, V).
We performed feature calculations on the positive and negative datasets and on the list of motifs. At the end of this step, we obtained three tables of features, one for each input (positive dataset, negative dataset and list of motifs).

Clustering

In this step, motifs are clustered based on their properties, as described by the 13 features. To make the features comparable to each other, we normalized the data using the StandardScaler method from sklearn.preprocessing [43]. This normalization consists of removing the mean and scaling to unit variance.
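The feature and normalization steps can be illustrated with a minimal, dependency-free sketch. It is not the MOnSTER implementation: the authors rely on ProteinAnalysis (Biopython) and StandardScaler (scikit-learn), whereas the hypothetical helpers `residue_fraction_features` and `standardize` below only mimic the idea. The sketch covers a subset of the residue classes listed above (no hydropathy score), and treating each class feature as the fraction of residues falling in that class is an assumption.

```python
from statistics import mean, pstdev

# Residue classes copied from the feature list above (subset, for illustration).
RESIDUE_CLASSES = {
    "tiny": set("ACGST"),
    "basic": set("HKR"),
    "acid": set("BDE"),
    "aromatic": set("FHWY"),
    "aliphatic": set("AILV"),
}


def residue_fraction_features(sequence):
    """Return the length and the fraction of residues in each class.

    Assumption: each class feature is a simple residue fraction.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    features = {"length": float(len(seq))}
    for name, residues in RESIDUE_CLASSES.items():
        features[name] = sum(aa in residues for aa in seq) / len(seq)
    return features


def standardize(rows):
    """Z-score every column: remove the mean, scale to unit variance.

    Uses the population standard deviation, the same idea as StandardScaler.
    """
    scaled = [dict() for _ in rows]
    for col in rows[0]:
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        for out, value in zip(scaled, values):
            # A constant column carries no information; map it to 0.
            out[col] = 0.0 if sigma == 0 else (value - mu) / sigma
    return scaled


# Toy motif list (hypothetical input, not taken from the paper's data).
motifs = ["RFLR", "HWTQ", "LFLAK"]
table = [residue_fraction_features(m) for m in motifs]
scaled_table = standardize(table)
```

After scaling, every column of `scaled_table` has zero mean, so features measured on different scales (length versus residue fractions) contribute comparably to a Euclidean distance.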
Then, we performed a hierarchical clustering of the motifs using the Euclidean distance. We then divided the resulting tree into clusters of motifs of proteins (CLUMPs), selecting the threshold distance that minimized the Davies-Bouldin score [44]. For each CLUMP, we removed the redundant motifs. Briefly, we identified motifs that shared a core sequence (for example, ‘HWT’ in ‘HWTQ’ and ‘GHWTQ’), and we only retained the cores (here, ‘HWT’) in the CLUMPs.

Scoring

The final objective is to identify the CLUMP(s) with the highest discriminative power with respect to the positive dataset. Thus, we conceived a new score, called the MOnSTER score, to rank the CLUMPs by their discriminative power. The MOnSTER score is composed of three parts: the CLUMP score and two modified versions of the Jaccard index.

CLUMP score

This score considers the AA composition of the motifs belonging to each CLUMP with respect to the preferences of the sequences of the positive dataset. The procedure that we implemented to calculate this score is shown in Fig. 1B.

a) Feature selection

We used the Mann-Whitney test to identify the features whose values were significantly different between the positive and negative datasets. We only retained the statistically significant features (p-value < 0.05). Then, we assigned each of them a score by calculating -log(p-value). We will refer to it as the ‘feature weight’ hereafter.

b) Average calculation

For each of the selected features (ranging from one to f), we calculated the average value for the positive dataset, the negative dataset, and each CLUMP (ranging from zero to c). We will refer to these values with the notation \(\mu_{f}^{+}\), \(\mu_{f}^{-}\) and \(\mu_{f}^{CLUMP_{c}}\), respectively.

c) CLUMPs sorting

We compared the averages of the positive and negative datasets for each feature and sorted the CLUMPs accordingly.
Thus, if \(\mu_{f}^{+} \ge \mu_{f}^{-}\), the CLUMP averages are sorted in ascending order. Otherwise (\(\mu_{f}^{+} < \mu_{f}^{-}\)), the CLUMP averages are sorted in descending order.

d) CLUMPs voting

For each feature, we divided the CLUMPs into two groups according to the following rules. If \(\mu_{f}^{+} \ge \mu_{f}^{-}\): CLUMPs with \(\mu_{f}^{CLUMP_{c}} \ge \mu_{f}^{+}\) receive a vote ranging from 1 to the number of CLUMPs, in increments of 1; otherwise the vote is set to 0. If \(\mu_{f}^{+} < \mu_{f}^{-}\): CLUMPs with \(\mu_{f}^{CLUMP_{c}} < \mu_{f}^{+}\) receive a vote ranging from 1 to the number of CLUMPs; otherwise the vote is 0.

e) CLUMPs scoring

For each CLUMP (ranging from zero to c) and each feature (ranging from one to f), we multiplied the feature vote by the ‘feature weight’ (\(W_{f}\)) and summed the products to obtain a CLUMP-vote. Then we scaled each CLUMP-vote to a range from 0 to 1 using the following formula:

$$\text{CLUMPscore}_{c} = \frac{V_{c} - \min(V)}{\max(V) - \min(V)}$$

where V is the list of CLUMP-votes and \(V_{c}\) is calculated as:

$$V_{c} = \sum_{\text{features}\,[1, f]} \left(\text{vote}_{f} \subset CLUMP_{c}\right) W_{f}$$

Occurrences indexes

The two indexes respectively consider: i) the occurrences of the motifs of each CLUMP in the positive dataset compared to the negative one, and ii) the number of positive sequences containing the motifs of each CLUMP compared to the number of negative sequences (Fig. 1C).

a) CLUMPs occurrences

We calculated the occurrences of the motifs of each CLUMP in the two datasets (positive and negative).

b) I’s scores

We propose two ways to calculate the dissimilarity between two sets, which will be called \(I_{1}\) and \(I_{2}\) hereafter.
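Before turning to the two indexes, steps a) to e) of the CLUMP score can be sketched in a few lines of standard-library Python. This is one possible reading of the procedure, not the authors' code. The function name `clump_scores` and the toy numbers are hypothetical, the Mann-Whitney p-values are taken as given inputs, the logarithm is assumed to be base 10, and votes are assumed to be handed out as 1, 2, 3, ... to the qualifying CLUMPs in sorted order.

```python
import math


def clump_scores(p_values, mu_pos, mu_neg, mu_clump, alpha=0.05):
    """Sketch of the CLUMP score (steps a to e).

    p_values, mu_pos, mu_neg: dicts mapping feature -> value.
    mu_clump: dict mapping CLUMP name -> dict feature -> average.
    """
    clumps = list(mu_clump)
    votes = {c: 0.0 for c in clumps}
    for feature, p in p_values.items():
        if p >= alpha:
            continue  # step a: keep only significant features
        weight = -math.log10(p)  # feature weight (assumption: base 10)
        higher_in_positive = mu_pos[feature] >= mu_neg[feature]
        # step c: ascending if positives are higher, descending otherwise
        ordered = sorted(
            clumps,
            key=lambda c: mu_clump[c][feature],
            reverse=not higher_in_positive,
        )
        # step d: qualifying CLUMPs get votes 1, 2, ... in sorted order
        rank = 0
        for c in ordered:
            value = mu_clump[c][feature]
            if higher_in_positive:
                qualifies = value >= mu_pos[feature]
            else:
                qualifies = value < mu_pos[feature]
            if qualifies:
                rank += 1
                votes[c] += rank * weight  # step e: weighted sum of votes
    low, high = min(votes.values()), max(votes.values())
    if high == low:
        return {c: 0.0 for c in clumps}  # all CLUMPs tie
    return {c: (v - low) / (high - low) for c, v in votes.items()}


# Toy inputs (hypothetical values, for illustration only).
p_values = {"basic": 0.001, "aromatic": 0.01, "length": 0.2}
mu_pos = {"basic": 0.30, "aromatic": 0.05, "length": 300.0}
mu_neg = {"basic": 0.10, "aromatic": 0.15, "length": 310.0}
mu_clump = {
    "A": {"basic": 0.50, "aromatic": 0.02, "length": 5.0},
    "B": {"basic": 0.35, "aromatic": 0.10, "length": 4.0},
    "C": {"basic": 0.10, "aromatic": 0.04, "length": 6.0},
}
scores = clump_scores(p_values, mu_pos, mu_neg, mu_clump)
```

In this toy example, CLUMP A is more extreme than the positive average on both significant features, so it collects the largest weighted vote and ends with the highest scaled score; 'length' is ignored because its p-value is above the threshold.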
To obtain \(I_{1}\), we calculated the number of occurrences of the motifs of each CLUMP (ranging from zero to c) in the negative dataset over the number of occurrences of the motifs of the same CLUMP in the positive dataset, using the following equation:

$$I_{1c} = \frac{1}{2}\left(1 - \frac{\sum \Delta_{-} \subset CLUMP_{c}}{\sum \Delta_{+} \subset CLUMP_{c}}\right), \quad \forall\, CLUMP\,[0, c]$$

where \(\Delta_{-}\) and \(\Delta_{+}\) are the numbers of occurrences of the motifs of the CLUMP in the negative and in the positive dataset, respectively.

To obtain \(I_{2}\), for each CLUMP (ranging from zero to c), we calculated the number of sequences of the negative dataset that contain at least one motif of the CLUMP over the number of sequences of the positive dataset that contain at least one motif of the same CLUMP, according to the following formula:

$$I_{2c} = \frac{1}{2}\left(1 - \frac{\sum seq_{-} \subset CLUMP_{c}}{\sum seq_{+} \subset CLUMP_{c}}\right), \quad \forall\, CLUMP\,[0, c]$$

where \(seq_{-}\) is the number of sequences of the negative dataset containing at least one motif of the CLUMP, and \(seq_{+}\) is the number of sequences of the positive dataset containing at least one motif of the CLUMP. The ½ factor is applied so that each index takes values between 0 and 0.5 and the two indexes have equal weight in the final score; the (1 – ratio) term expresses the degree of dissimilarity rather than similarity.

MOnSTER score

The MOnSTER score of each CLUMP (from zero to c) is the sum of the corresponding CLUMP score and the two I indexes:

$$\text{MOnSTERscore}_{c} = \text{CLUMPscore}_{c} + I_{1c} + I_{2c}$$

Results & Discussion

MOnSTER identified five CLUMPs containing known motifs characteristic of oomycete effector protein sequences

Motifs characteristic of oomycete effector proteins, such as RxLR, -dEER and LxLFLAK-HVLVxxP, are well known in the literature [15].
Thus, we decided to apply our novel tool, MOnSTER, to oomycete effectors to test its ability to recover well-characterized motifs. We compiled a set of 4752 oomycete proteins, comprising 1743 effectors and 3009 non-effectors, from five oomycete species. We performed motif discovery on this set of proteins using MERCI and STREME and identified 265 significantly enriched motifs (see Methods for further details). Then we fed MOnSTER with these motifs and obtained 11 CLUMPs (Supplementary table 3), using the Davies-Bouldin score as the criterion to cut the tree. By selecting CLUMPs having a MOnSTER score greater than the median of the overall scores, we identified six CLUMPs (CLUMP7, 4, 10, 6, 2 and 9). The five best-scoring CLUMPs, according to the MOnSTER score, correspond to the known motifs (Fig. 2). In Supplementary Fig. 2 we can also observe that the motifs are grouped in two clades: the two characteristic motifs of CRN effectors (LxLFLAK and HVLVxxP) form a separate subclade on the right, while the RxLR and -dEER motifs fall into the left clade, mirroring the effector families to which they belong. More precisely, RxLR motifs are divided into two different CLUMPs: CLUMP6, containing only the RYLR and RFLR motifs, and CLUMP10, containing other RxLR motifs and included in the same sub-clade as the dEER motif (CLUMP2). The last best-scoring CLUMP contains no known motifs, perhaps suggesting a novel putative motif for oomycete effectors to investigate. Since the characterization of oomycete effectors is outside the scope of this article, we did not consider this last CLUMP for further analysis. In support of this choice, CLUMPs 7, 4, 10, 6 and 2 are present in 1205/1743 effectors (~70% of the sequences in the positive dataset), while adding the last significant CLUMP (CLUMP9) detects only two more sequences.
Thus, we investigated the occurrences and co-occurrences of the five selected CLUMPs in oomycete effectors and non-effectors (Supplementary Fig. 3). For the effectors, we analyzed the two distinct families in depth. In total, we found that 68% of the RxLR effectors in the positive dataset contain the motifs in CLUMPs associated with the RxLR motif (CLUMP10, 6 and 2). In particular, CLUMP10 and 6 are present alone in 41% of the RxLR-effectors (1238/1743 RxLR-effectors), while 19% of the RxLR effectors contained the co-occurrence of these CLUMPs with the CLUMP representing the dEER motif (CLUMP2). This reflects the importance of the RxLR motifs in the effector sequences and the role of the attached dEER [51]. On the other hand, the co-occurrence of the CLUMPs specific for LxLFLAK and HVLVxxP (CLUMP7 and 4) in CRN effector sequences accounts for 67% of the relative sequences in the positive dataset (377/1743). The high co-occurrence rate of CLUMP7 and 4 agrees well with the presence of the LxLFLAK and HVLVxxP motifs marking the beginning and the end of the DWL domain in the Crinkler effector family [33]. In the negative dataset, instead, only 15% of the sequences contain CLUMP motifs, with a sharp decrease in CLUMP co-occurrences. Overall, co-occurrences are present in around 30% of positive sequences and in 1% of negative ones. Previous research showed that the motifs characteristic of oomycete effectors have strong sequence position preferences [52]–[54]. Thus, we plotted the CLUMP occurrences in the positive versus the negative dataset (Supplementary Fig. 4). Indeed, we can observe that the CLUMPs are concentrated at the beginning of the sequence in positive sequences and, conversely, spread along the sequence in negative dataset proteins.
More precisely, the five most interesting CLUMPs are concentrated in the first 40% of the sequence, with a preference for the very beginning and for positions around 30% of the sequence, probably corresponding to the N-terminal region of the protein in which the target motifs lie. Altogether, these results highlight the ability of MOnSTER to identify CLUMPs containing biologically relevant motifs.

MOnSTER identified six CLUMPs characteristic of nematode candidate parasitism proteins

The application of MOnSTER to the oomycete effectors served as a proof of concept of our methodology. Thus, we moved to the characterization of nematode candidate parasitism sequences, for which no characteristic motifs have been identified yet. We collected a set of 4395 proteins, including 546 well-known candidate parasitism proteins and 3849 proteins in the negative dataset, coming from 13 nematode species. By running the motif discovery analysis as for the previous dataset, we found 269 motifs enriched in the candidate parasitism protein sequences. By applying MOnSTER with the previous configuration, the 269 input motifs were grouped into 11 CLUMPs. The six best-scoring CLUMPs were selected using the median as the significance threshold (Supplementary table 4). As in the oomycete results, we observe two main clades (Fig. 3): the second and third best-scoring CLUMPs (CLUMP2 and 5, respectively) form a single clade, while the other significant CLUMPs (CLUMP1, 3, 7 and 10) are distributed in the bigger clade with the non-significant ones. Overall, we found at least one occurrence of one of the six CLUMPs in almost 60% of the sequences from the positive dataset, compared to 5% of the sequences from the negative one. Then we investigated the presence of the six CLUMPs in each of the 13 PPN species present in the dataset. Figure 4 shows the abundance of the six best-scoring CLUMPs in the species, arranged according to their phylogenetic tree. The first three species are the most represented in the positive dataset.
Interestingly, very distant species show similar CLUMP frequencies, suggesting that they might share common characteristics at the sequence level for accomplishing similar functions. Furthermore, we could also identify characteristic CLUMPs for species represented in the dataset by very few sequences, reinforcing the previous observation. Overall, this analysis suggests that CLUMPs might be associated with the functional properties of PPNs. Finally, we focused on the positional sequence preferences of CLUMPs in candidate parasitism protein sequences (Supplementary Fig. 5). In general, we observe a difference in the position preferences of the best-scoring CLUMPs between positive and negative dataset sequences. The six CLUMPs tend to occur more frequently in the middle of the sequences in candidate parasitism proteins (positive dataset), with higher abundance in central (around 50% of the sequence) and terminal (around 70%) positions. The same CLUMPs are rare in the central positions of the protein sequences of the negative dataset. Contrary to oomycete effectors, whose characteristic CLUMPs occur mainly at the beginning of the sequence, PPN candidate parasitism proteins showed a different pattern of occurrences, favoring central to C-terminal positions.

Co-occurrences of different CLUMPs are associated with functional protein domains

We investigated the co-occurrence patterns of CLUMPs in the PPN candidate parasitism protein sequences (all possible combinations of co-occurrences are reported in Supplementary Fig. 6). Overall, we notice that CLUMPs tend to co-occur more frequently in the sequences of the positive dataset than in the negative one, despite the positive set being smaller than the negative one. Co-occurrences of the six selected CLUMPs are found in 30% of candidate parasitism protein sequences, while in the negative dataset they are present in less than 1% of the sequences.
As observed for oomycetes, some CLUMPs tend to occur alone, while others tend to co-occur with specific CLUMPs. This suggests that different classes of nematode candidate parasitism proteins might exist, as for the oomycete effectors. Interestingly, among the 311 candidate parasitism proteins bearing at least one occurrence of one of the six selected CLUMPs, 72 do not have a predicted signal peptide; these 72 proteins represent 55% of the proteins in the positive dataset that lack a signal peptide. Of note, this percentage is similar to the percentage of proteins bearing both the CLUMPs and the signal peptide, suggesting that CLUMPs characterize sequence properties beyond the type of secretion. Furthermore, similar co-occurrence patterns of CLUMPs are observed in candidate parasitism proteins with and without the signal peptide, with slightly more co-occurrences in the sequences lacking the signal peptide (Supplementary Fig. 7). Importantly, there is no relationship between sequence length and the number of co-occurrences, possibly suggesting a functional role for CLUMP co-occurrences (Supplementary Fig. 8). To further inspect a putative functional role of CLUMPs in candidate parasitism protein sequences, we queried the sequences having at least one CLUMP or a co-occurrence of multiple CLUMPs against several protein domain databases (see supplementary information; results in Fig. 5 and Supplementary Table 5). Among the 311 candidate parasitism protein sequences bearing at least one occurrence of at least one of the six CLUMPs, 84 also have at least one occurrence of a known protein domain. The most recurrent hits are the coil domain, the intrinsically disordered domain and the signal peptide (SP), followed by the pectate lyase domain, glycosyl hydrolase family 5, the Stichodactyla toxin (ShK) domain, the 14-3-3 family and the cysteine-rich domain.
Importantly, none of these domains was found in the sequences from the negative dataset bearing at least one occurrence of at least one of the six CLUMPs. Interestingly, we observe an almost exclusive association between CLUMPs and functional domains, mainly when multiple CLUMPs co-occur in candidate parasitism protein sequences. The strongest associations that we observe are between the co-occurrence of CLUMPs 7 and 10 and the glycosyl hydrolase family 5 domain on the one hand, and between the co-occurrence of CLUMPs 3, 7 and 10 and the cysteine-rich domain on the other hand. Specifically, all 23 candidate parasitism protein sequences containing the co-occurrence of CLUMPs 7 and 10 also bear the glycosyl hydrolase family 5 domain. By inspecting the positions of CLUMP occurrences within the sequences, we observed that the two CLUMPs flank the domain: CLUMP7 is consistently present at the beginning of the sequence, and consequently of the domain, while CLUMP10 is mostly concentrated at the end of the domain, at around 60–80% of the sequence length (Supplementary Fig. 9). These genes are poorly characterized in nematodes and likely result from horizontal transfer [ 55 ], [ 56 ]. Similarly, all 17 sequences presenting the co-occurrence of CLUMPs 3, 7 and 10 also contain the cysteine-rich domain. Cysteine-rich domains and CAP proteins are known to be involved in the virulence of nematodes [ 57 ]. They are expressed in both plants and pathogens; in the latter, they contribute to virulence by suppressing the host's immune responses and promoting colonization. Interestingly, these sequences do not contain disordered regions or coil domains, consistent with the unique conserved sandwich fold with a large central cavity found in this kind of protein [ 58 ]. Sixteen out of 19 sequences presenting the co-occurrence of CLUMPs 2 and 3 also have the 14-3-3 family domain, which belongs to a eukaryote-specific protein family with a general role in signal transduction [ 59 ].
We also observe only one motif from CLUMP 2 in these sequences (KDKM) and four from CLUMP 3 (NKDKAC, KMKG, PTHPIR, PTHP). Thirteen out of 34 sequences bearing only CLUMP 1 also contain the pectate lyase domain. Of note, these sequences do not contain coiled or disordered regions, and only seven show the presence of the SP. Pectate lyase enzymes in nematodes facilitate the penetration of plant cell walls, which contain pectin [ 60 ]. Numerous recent reports have shown that these enzymes are produced in specialized nematode gland cells and secreted during the parasitism process. In the case of sedentary endoparasitic nematodes, this occurs mainly during juvenile migration through the root tissue, when these enzymes play a crucial role in macerating the plant tissue, thereby facilitating infection [ 61 ]. Finally, eight out of 22 sequences bearing the co-occurrence of CLUMPs 2 and 5 also contain the ShK domain. Although the exact biological function of the ShK domain remains unclear, previous reports have shown that this domain might be associated with immunosuppression [ 62 ], [ 63 ]. Overall, these findings highlight that specific CLUMP co-occurrences are associated with specific functional domains with roles in invasion and/or infection, and might suggest the existence of different classes of candidate parasitism proteins across species.

CLUMP screening identified a novel effector in M. incognita, validated by in situ hybridization

To inspect whether the newly identified CLUMPs could also help to find new effectors, we set out to select a novel putative effector for experimental validation. We first selected all proteins of the Meloidogyne incognita proteome bearing a signal peptide for secretion and no transmembrane domain. We then screened these sequences and retrieved those containing at least one motif from the six significant CLUMPs. Among them, 23% contain at least one occurrence of motifs in CLUMP5 (Supplementary Table 6).
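The screening step described above (retrieving the sequences that contain at least one motif of a significant CLUMP, then measuring the share carrying a given CLUMP) can be sketched in Python as follows. This is an illustrative simplification that uses exact substring matching, which is an assumption and may differ from the motif scanning used in the study. The function names, the input formats and the toy sequences are hypothetical; only the motif strings are taken from the text.

```python
def screen_sequences(sequences, clump_motifs):
    """Return {sequence id: set of CLUMPs with at least one exact motif match}.

    sequences: dict mapping a sequence id to its amino-acid string.
    clump_motifs: dict mapping a CLUMP name to a list of motif strings.
    Sequences without any match are left out of the result.
    """
    hits = {}
    for seq_id, seq in sequences.items():
        found = {
            clump
            for clump, motifs in clump_motifs.items()
            if any(motif in seq for motif in motifs)
        }
        if found:
            hits[seq_id] = found
    return hits


def fraction_with_clump(hits, clump):
    """Fraction of the retrieved sequences that carry the given CLUMP."""
    if not hits:
        return 0.0
    return sum(clump in found for found in hits.values()) / len(hits)


# Motifs quoted in the text for CLUMP 2 and CLUMP 3; the sequences are made up.
motifs = {"CLUMP2": ["KDKM"], "CLUMP3": ["NKDKAC", "KMKG", "PTHPIR", "PTHP"]}
toy_proteome = {"a": "MAAKDKMLL", "b": "MPTHPIRKK", "c": "MAAAA", "d": "MKDKMPTHP"}
retrieved = screen_sequences(toy_proteome, motifs)
print(retrieved, fraction_with_clump(retrieved, "CLUMP2"))
```

In practice the input would first be restricted to proteins with a predicted signal peptide and no transmembrane domain, as described in the text; that filtering is outside the scope of this sketch.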
Since this is the most abundant CLUMP in this species, we decided to focus on it to identify a putative candidate for experimental validation. Through literature mining, we refined our list by setting aside seven sequences that had already been experimentally validated in previous studies (Supplementary Table 6). We then filtered out any candidates having homologs in species other than root-knot nematodes, as well as candidates with more than two gene copies, to avoid dealing with multigene families, following [ 42 ]. Finally, among the eight remaining new putative effector sequences, we studied the expression pattern of one candidate, MiEFF72 ( Minc3s00056g02931 ), by performing in situ hybridisation (ISH, see supplementary information). A specific signal was detected in the subventral oesophageal gland cells of pre-parasitic J2s after hybridisation with digoxigenin-labelled MiEFF72 antisense probes (Fig. 6 A). No signal was detected in pre-parasitic J2s with sense negative controls. MiEFF72 fused to the C-terminus of GFP was transiently expressed in N. benthamiana leaf epidermis. GFP fluorescence was detected in the cytoplasm and in cytoplasmic vesicles (Fig. 6 B). These findings suggest that MiEFF72 may be secreted and may play a role in planta during nematode parasitism.

Conclusions

This work is structured around three main aims: (1) the development of MOnSTER, a novel method to cluster and score discriminant motifs of protein sequences; (2) the validation of the MOnSTER results by applying it to identify CLUMPs specific to candidate parasitism protein sequences of oomycetes; and (3) the application of MOnSTER to protein sequences from plant-parasitic nematodes, yielding an unprecedented detection of discriminant motifs. The application of MOnSTER to oomycetes identified five CLUMPs corresponding to well-known effector-related motifs in oomycetes, such as the RxLR-dEER and LxLFLAK-HVLVxxP motifs.
This demonstrated that the novel scoring method introduced by MOnSTER is a good parameter with which to assess CLUMP specificity for effector protein sequences. When applied to the nematode candidate parasitism proteins, MOnSTER found six novel CLUMPs that had not been previously characterized. The main advantage of MOnSTER is that the definition of CLUMPs allowed us to reduce the redundancy of 265 and 269 motifs (oomycetes and nematodes, respectively) to 11 CLUMPs. Candidate parasitism protein sequences of both pathogens show some common characteristics. Indeed, motifs of the selected CLUMPs are present in about 70% of the input proteins for oomycetes and 60% for PPNs, compared to 15% and 5% of the negative dataset proteins, respectively. Furthermore, in both applications, around 30% of candidate parasitism protein sequences have co-occurring CLUMPs, in contrast with less than 1% of the negative dataset sequences. The main difference between the candidate parasitism protein-specific motifs of the two pathogens is the positional preference: the beginning of the sequence for oomycetes and central to C-terminal positions for PPNs. This highlights the ability of MOnSTER to cluster motifs specifically relevant for candidate parasitism protein sequences without favouring any portion of the sequence, unlike other motif discovery tools. Concerning the newly identified motifs for PPN candidate parasitism proteins, we observed that the pattern of occurrences and co-occurrences of CLUMPs in candidate parasitism protein sequences is associated with specific functional domains and might suggest the existence of different classes of candidate parasitism proteins. Importantly, we did not observe any species-related preferences, which suggests that these results are general. In conclusion, MOnSTER quantifies the motifs and sequence properties in each dataset provided, thus allowing a wide application to other protein classes.
Since the MOnSTER score considers the physicochemical properties and occurrences of the motifs in CLUMPs with respect to the protein sequences provided, it works without the need for a reference dataset. Furthermore, the MOnSTER scores are normalized values, which allows direct comparison between different studies. Our results highlight that MOnSTER is a powerful new method to cluster and score discriminant motifs in protein sequences according to their physicochemical properties and pattern of occurrences. It is also a tool that can be easily used on any set of protein sequences and a list of motifs. Therefore, by constructing a positive dataset of candidate parasitism protein sequences and a negative dataset, MOnSTER can also be used to identify CLUMPs characteristic of fungal or bacterial candidate parasitism proteins. As such, MOnSTER can be included in any pipeline needing motif calling and will be of great use to accelerate both computational and experimental studies relating to protein motif discovery.

Declarations

Data availability
The source code and related data are available at: https://github.com/Plant-Net/MOnSTER_PROMOCA.git

Funding
This work was supported by the French government, through the UCA JEDI Investments in the Future project managed by the National Research Agency (ANR) under reference number ANR-15-IDEX-01.

Competing Interests
The authors declare that they have no competing interests.

Financial Disclosure statement
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Acknowledgements
Microscopy work was performed at the SPIBOC imaging facility of Institut Sophia Agrobiotech, and we thank Dr Olivier Pierre for his availability.

CRediT author contribution
GC: Methodology, Software, Validation, Formal analysis, Writing – original draft, Visualization. PP: Methodology, Software, Writing – original draft. JC: Investigation, Resources. DK: Software, Resources.
HS: Writing – review & editing, Supervision. AC: Writing – review & editing, Supervision. MQ: Investigation, Resources, Writing – review & editing, Supervision. BF: Investigation, Resources, Writing – review & editing, Supervision. EGJD: Conceptualization, Methodology, Writing – review & editing, Supervision. SB: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Supervision, Project administration.

References

[1] Toruño, T.Y., Stergiopoulos, I., Coaker, G.: Plant-Pathogen Effectors: Cellular Probes Interfering with Plant Defenses in Spatial and Temporal Manners. Annu. Rev. Phytopathol. 54, 419–441 (Aug. 2016). 10.1146/annurev-phyto-080615-100204
[2] Haegeman, A., Mantelin, S., Jones, J.T., Gheysen, G.: Functional roles of effectors of plant-parasitic nematodes. Gene 492(1), 19–31 (Jan. 2012). 10.1016/j.gene.2011.10.040
[3] Selin, C., de Kievit, T.R., Belmonte, M.F., Fernando, W.G.D.: Elucidating the Role of Effectors in Plant-Fungal Interactions: Progress and Challenges. Front. Microbiol. 7 (2016). 10.3389/fmicb.2016.00600
[4] Bird, D.M., Jones, J.T., Opperman, C.H., Kikuchi, T., Danchin, E.G.J.: Signatures of adaptation to plant parasitism in nematode genomes. Parasitology 142(Suppl 1), S71–84 (Feb. 2015). 10.1017/S0031182013002163
[5] Sperschneider, J., Williams, A.H., Hane, J.K., Singh, K.B., Taylor, J.M.: Evaluation of Secretion Prediction Highlights Differing Approaches Needed for Oomycete and Fungal Effectors. Front. Plant Sci. 6, 1168 (2015). 10.3389/fpls.2015.01168
[6] Sonah, H., Deshmukh, R.K., Bélanger, R.R.: Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges. Front. Plant Sci. 7, 126 (2016). 10.3389/fpls.2016.00126
[7] Liu, L., et al.: Arms race: diverse effector proteins with conserved motifs. Plant Signal. Behav. 14(2), 1557008 (Jan. 2019). 10.1080/15592324.2018.1557008
[8] Dean, P.: Functional domains and motifs of bacterial type III effector proteins and their roles in infection. FEMS Microbiol. Rev. 35(6), 1100–1125 (Nov. 2011). 10.1111/j.1574-6976.2011.00271.x
[9] Green, E.R., Mecsas, J.: Bacterial Secretion Systems: An Overview. Microbiol. Spectr. 4(1) (Feb. 2016). 10.1128/microbiolspec.VMBF-0012-2015
[10] Natale, P., Brüser, T., Driessen, A.J.M.: Sec- and Tat-mediated protein secretion across the bacterial cytoplasmic membrane—Distinct translocases and mechanisms. Biochim. Biophys. Acta BBA - Biomembr. 1778(9), 1735–1756 (Sep. 2008). 10.1016/j.bbamem.2007.07.015
[11] Sperschneider, J., Dodds, P.N., Gardiner, D.M., Manners, J.M., Singh, K.B., Taylor, J.M.: Advances and Challenges in Computational Prediction of Effectors from Plant Pathogenic Fungi. PLOS Pathog. 11(5), e1004806 (May 2015). 10.1371/journal.ppat.1004806
[12] Beakes, G.W., Glockling, S.L., Sekimoto, S.: The evolutionary phylogeny of the oomycete ‘fungi’. Protoplasma 249(1), 3–19 (Jan. 2012). 10.1007/s00709-011-0269-2
[13] Thines, M., Kamoun, S.: Oomycete-plant coevolution: recent advances and future prospects. Curr. Opin. Plant Biol. 13(4), 427–433 (Aug. 2010). 10.1016/j.pbi.2010.04.001
[14] Wood, K.J., et al.: Effector prediction and characterization in the oomycete pathogen Bremia lactucae reveal host-recognized WY domain proteins that lack the canonical RXLR motif. PLOS Pathog. 16(10), e1009012 (Oct. 2020). 10.1371/journal.ppat.1009012
[15] Franceschetti, M., Maqbool, A., Jiménez-Dalmaroni, M.J., Pennington, H.G., Kamoun, S., Banfield, M.J.: Effectors of Filamentous Plant Pathogens: Commonalities amid Diversity. Microbiol. Mol. Biol. Rev. 81(2), e00066-16 (Mar. 2017). 10.1128/MMBR.00066-16
[16] Jiang, R.H.Y., Tripathy, S., Govers, F., Tyler, B.M.: RXLR effector reservoir in two Phytophthora species is dominated by a single rapidly evolving superfamily with more than 700 members. Proc. Natl. Acad. Sci. U. S. A. 105(12), 4874–4879 (Mar. 2008). 10.1073/pnas.0709303105
[17] Torto, T.A., et al.: EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen Phytophthora. Genome Res. 13(7), 1675–1685 (Jul. 2003). 10.1101/gr.910003
[18] Pritchard, L., Birch, P.: A systems biology perspective on plant–microbe interactions: Biochemical and structural targets of pathogen effectors. Plant Sci. 180(4), 584–603 (Apr. 2011). 10.1016/j.plantsci.2010.12.008
[19] Lovelace, A.H., Dorhmi, S., Hulin, M.T., Li, Y., Mansfield, J.W., Ma, W.: Effector Identification in Plant Pathogens. Phytopathology 113(4), 637–650 (Apr. 2023). 10.1094/PHYTO-09-22-0337-KD
[20] Jones, J.T., et al.: Top 10 plant-parasitic nematodes in molecular plant pathology. Mol. Plant Pathol. 14(9), 946–961 (Dec. 2013). 10.1111/mpp.12057
[21] Holterman, M., et al.: Disparate gain and loss of parasitic abilities among nematode lineages. PLoS One 12(9), e0185445 (2017). 10.1371/journal.pone.0185445
[22] Vens, C., Rosso, M.-N., Danchin, E.G.J.: Identifying discriminative classification-based motifs in biological sequences. Bioinformatics 27(9), 1231–1238 (May 2011). 10.1093/bioinformatics/btr110
[23] Blanc-Mathieu, R., et al.: Hybridization and polyploidy enable genomic plasticity without sex in the most devastating plant-parasitic nematodes. PLoS Genet. 13(6), e1006777 (Jun. 2017). 10.1371/journal.pgen.1006777
[24] Abad, P., et al.: Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat. Biotechnol. 26(8) (Aug. 2008). 10.1038/nbt.1482
[25] Rocha, M.D., et al.: Genome Expression Dynamics Reveal the Parasitism Regulatory Landscape of the Root-Knot Nematode Meloidogyne incognita and a Promoter Motif Associated with Effector Genes. Genes 12(5), 771 (May 2021). 10.3390/genes12050771
[26] Rocha, R.O., Hussey, R.S., Pepi, L.E., Azadi, P., Mitchum, M.G.: Discovery of Novel Effector Protein Candidates Produced in the Dorsal Gland of Adult Female Root-Knot Nematodes. Mol. Plant-Microbe Interact. 36(6), 372–380 (Jun. 2023). 10.1094/MPMI-11-22-0232-R
[27] Davey, N.E., Cyert, M.S., Moses, A.M.: Short linear motifs – ex nihilo evolution of protein regulation. Cell Commun. Signal. 13(1), 43 (Nov. 2015). 10.1186/s12964-015-0120-z
[28] Roberson, E.D.O.: Motif scraper: a cross-platform, open-source tool for identifying degenerate nucleotide motif matches in FASTA files. Bioinformatics 34(22), 3926–3928 (Nov. 2018). 10.1093/bioinformatics/bty437
[29] Mistry, J., et al.: Pfam: The protein families database in 2021. Nucleic Acids Res. 49(D1), D412–D419 (Jan. 2021). 10.1093/nar/gkaa913
[30] Jones, P., et al.: InterProScan 5: genome-scale protein function classification. Bioinformatics 30(9), 1236–1240 (May 2014). 10.1093/bioinformatics/btu031
[31] Urban, M., Pant, R., Raghunath, A., Irvine, A.G., Pedro, H., Hammond-Kosack, K.E.: The Pathogen-Host Interactions database (PHI-base): additions and future developments. Nucleic Acids Res. 43(Database issue), D645–655 (Jan. 2015). 10.1093/nar/gku1165
[32] The UniProt Consortium: UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51(D1), D523–D531 (2023). 10.1093/nar/gkac1052
[33] Haas, B.J., et al.: Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature 461(7262) (Sep. 2009). 10.1038/nature08358
[34] Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28(23), 3150–3152 (Dec. 2012). 10.1093/bioinformatics/bts565
[35] Emms, D.M., Kelly, S.: OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20(1), 238 (Nov. 2019). 10.1186/s13059-019-1832-y
[36] Bailey, T.L.: STREME: accurate and versatile sequence motif discovery. Bioinformatics 37(18), 2834–2840 (Sep. 2021). 10.1093/bioinformatics/btab203
[37] Grant, C.E., Bailey, T.L., Noble, W.S.: FIMO: scanning for occurrences of a given motif. Bioinformatics 27(7), 1017–1018 (Apr. 2011). 10.1093/bioinformatics/btr064
[38] Howe, K.L., et al.: WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res. 44(D1), D774–D780 (Jan. 2016). 10.1093/nar/gkv1217
[39] Howe, K.L., Bolt, B.J., Shafie, M., Kersey, P., Berriman, M.: WormBase ParaSite – a comprehensive resource for helminth genomics. Mol. Biochem. Parasitol. 215, 2–10 (Jul. 2017). 10.1016/j.molbiopara.2016.11.005
[40] Martin, J., et al.: Helminth.net: expansions to Nematode.net and an introduction to Trematode.net. Nucleic Acids Res. 43(D1), D698–D706 (Jan. 2015). 10.1093/nar/gku1128
[41] Li, W., Godzik, A.: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22(13), 1658–1659 (Jul. 2006). 10.1093/bioinformatics/btl158
[42] Grynberg, P., et al.: Comparative Genomics Reveals Novel Target Genes towards Specific Control of Plant-Parasitic Nematodes. Genes 11(11), 1347 (Nov. 2020). 10.3390/genes11111347
[43] Pedregosa, F., et al.: Scikit-learn: Machine Learning in Python.
[44] Davies, D.L., Bouldin, D.W.: A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1(2), 224–227 (Apr. 1979). 10.1109/TPAMI.1979.4766909
[45] Crooks, G.E., Hon, G., Chandonia, J.-M., Brenner, S.E.: WebLogo: a sequence logo generator. Genome Res. 14(6), 1188–1190 (Jun. 2004). 10.1101/gr.849004
[46] Nielsen, H.: Predicting Secretory Proteins with SignalP. In: Kihara, D. (ed.) Protein Function Prediction: Methods and Protocols. Methods in Molecular Biology, pp. 59–73. Springer, New York, NY (2017). 10.1007/978-1-4939-7015-5_6
[47] Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305(3), 567–580 (Jan. 2001). 10.1006/jmbi.2000.4315
[48] Caillaud, M.-C., Favery, B.: In Vivo Imaging of Microtubule Organization in Dividing Giant Cell. In: Caillaud, M.-C. (ed.) Plant Cell Division: Methods and Protocols. Methods in Molecular Biology, pp. 137–144. Springer, New York, NY (2016). 10.1007/978-1-4939-3142-2_11
[49] Jaouannet, M., Nguyen, C.-N., Quentin, M., Jaubert-Possamai, S., Rosso, M.-N., Favery, B.: In situ Hybridization (ISH) in Preparasitic and Parasitic Stages of the Plant-parasitic Nematode Meloidogyne spp. Bio-Protoc. 8(6), e2766 (Mar. 2018). 10.21769/BioProtoc.2766
[50] Mejias, J., et al.: The root-knot nematode effector MiEFF18 interacts with the plant core spliceosomal protein SmD1 required for giant cell formation. New Phytol. 229(6), 3408–3423 (2021). 10.1111/nph.17089
[51] Boutemy, L.S., et al.: Structures of Phytophthora RXLR effector proteins: a conserved but adaptable fold underpins functional diversity. J. Biol. Chem. 286(41), 35834–35842 (Oct. 2011). 10.1074/jbc.M111.262303
[52] Schornack, S., et al.: Ancient class of translocated oomycete effectors targets the host nucleus. Proc. Natl. Acad. Sci. U. S. A. 107(40), 17421–17426 (Oct. 2010). 10.1073/pnas.1008491107
[53] Rehmany, A.P., et al.: Differential recognition of highly divergent downy mildew avirulence gene alleles by RPP1 resistance genes from two Arabidopsis lines. Plant Cell 17(6), 1839–1850 (Jun. 2005). 10.1105/tpc.105.031807
[54] Amaro, T.M.M.M., Thilliez, G.J.A., Motion, G.B., Huitema, E.: A Perspective on CRN Proteins in the Genomics Age: Evolution, Classification, Delivery and Function Revisited. Front. Plant Sci. 8 (2017). 10.3389/fpls.2017.00099
[55] Danchin, E.G.J., et al.: Multiple lateral gene transfers and duplications have promoted plant parasitism ability in nematodes. Proc. Natl. Acad. Sci. U. S. A. 107(41), 17651–17656 (Oct. 2010). 10.1073/pnas.1008486107
[56] Aspeborg, H., Coutinho, P.M., Wang, Y., Brumer, H., Henrissat, B.: Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evol. Biol. 12(1), 186 (Sep. 2012). 10.1186/1471-2148-12-186
[57] Han, Z., Xiong, D., Schneiter, R., Tian, C.: The function of plant PR1 and other members of the CAP protein superfamily in plant–pathogen interactions. Mol. Plant Pathol. 24(6), 651–668 (2023). 10.1111/mpp.13320
[58] Gibbs, G.M., Roelants, K., O’Bryan, M.K.: The CAP superfamily: cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins–roles in reproduction, cancer, and immune defense. Endocr. Rev. 29(7), 865–897 (Dec. 2008). 10.1210/er.2008-0032
[59] Lozano-Durán, R., Robatzek, S.: 14-3-3 Proteins in Plant-Pathogen Interactions. Mol. Plant-Microbe Interact. 28(5), 511–518 (May 2015). 10.1094/MPMI-10-14-0322-CR
[60] Wieczorek, K., et al.: A Distinct Role of Pectate Lyases in the Formation of Feeding Structures Induced by Cyst and Root-Knot Nematodes. Mol. Plant-Microbe Interact. 27(9), 901–912 (Sep. 2014). 10.1094/MPMI-01-14-0005-R
[61] Hewezi, T., Baum, T.J.: Manipulation of plant cells by cyst and root-knot nematode effectors. Mol. Plant-Microbe Interact. 26(1), 9–16 (Jan. 2013). 10.1094/MPMI-05-12-0106-FI
[62] Song, H., et al.: The Meloidogyne javanica effector Mj2G02 interferes with jasmonic acid signalling to suppress cell death and promote parasitism in Arabidopsis. Mol. Plant Pathol. 22(10), 1288–1301 (Oct. 2021). 10.1111/mpp.13111
[63] Niu, J., et al.: Msp40 effector of root-knot nematode manipulates plant immunity to facilitate parasitism. Sci. Rep. 6(1) (Jan. 2016).
10.1038/srep19443 Additional Declarations There is NO Competing Interest. Supplementary Files Supplementarytable1.1.xls Supplementarytable1.2.xls SupplementaryMaterial.docx Supplementarytable2.1.xls Supplementarytable2.2.xls Supplementarytable3.xls Supplementarytable4.xls Supplementarytable5.xls Supplementarytable6.xls Cite Share Download PDF Status: Under Review Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-3931000","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":274186427,"identity":"86095eea-3870-4350-9638-bdc2993bb3b2","order_by":0,"name":"Silvia 
Bottini","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA7ElEQVRIie3PsYrCMBzH8V8I6BJwzaHQVwgIHkK5e5WI0FGErg6KULe6+hj6Bi0Bp8N7gAOp+AKCoB1E/FeLg9C43pAvWfKHD/kHcLn+Y5yPoVEcIIHyxWPs2wgrSY0XJChJYHuGCEpCGbwl3oxFMhth0PDmqRkOf1ufTbPLoLeVRBkieo3wI+IwC/UnunHQVtBhNeFsqnQNveWaiCCiftCRLNfVi00Lcn2SDZH6OYeuJjBskvWiJ0mIiA5sRN1JLOkvfUWkX5BQagvx5iZJ85M/aPB0fxSXr29abHU42BZ7JF/ub4HL5XK5rN0Ayo9N0oDozLcAAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0002-0605-4646","institution":"Institut Sophia Agrobiotech","correspondingAuthor":true,"prefix":"","firstName":"Silvia","middleName":"","lastName":"Bottini","suffix":""},{"id":274186428,"identity":"03b1e362-9fe3-417b-b346-aa404e291edb","order_by":1,"name":"giulia calia","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"giulia","middleName":"","lastName":"calia","suffix":""},{"id":274186429,"identity":"39a2ac54-1244-4c38-9fd6-650c59459e3f","order_by":2,"name":"paola porracciolo","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"paola","middleName":"","lastName":"porracciolo","suffix":""},{"id":274186430,"identity":"e0ebded7-4224-409e-bba4-712703d28037","order_by":3,"name":"yongpan chen","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"yongpan","middleName":"","lastName":"chen","suffix":""},{"id":274186431,"identity":"fa2843e3-3e70-4ab8-ac1c-18fc77a9529c","order_by":4,"name":"djampa kozlowski","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"djampa","middleName":"","lastName":"kozlowski","suffix":""},{"id":274186432,"identity":"e69ca29a-bbca-4261-9df7-c40a3d90ab61","order_by":5,"name":"Hannes Schuler","email":"","orcid":"","institution":"Free University of 
Bozen-Bolzano","correspondingAuthor":false,"prefix":"","firstName":"Hannes","middleName":"","lastName":"Schuler","suffix":""},{"id":274186433,"identity":"e30af8d6-fb5a-43db-99c0-d2ddc0bf61ee","order_by":6,"name":"alessandro cestaro","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"alessandro","middleName":"","lastName":"cestaro","suffix":""},{"id":274186434,"identity":"c81bf110-a327-406a-8fc3-d0f502afe045","order_by":7,"name":"michael quentin","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"michael","middleName":"","lastName":"quentin","suffix":""},{"id":274186435,"identity":"dd19e22f-f5b4-4753-8bb1-d8853e72344f","order_by":8,"name":"bruno favery","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"bruno","middleName":"","lastName":"favery","suffix":""},{"id":274186436,"identity":"a47c8b1b-2637-41dd-b402-e7247a6ae8bc","order_by":9,"name":"Etienne Danchin","email":"","orcid":"https://orcid.org/0000-0003-4146-5608","institution":"Institut Sophia Agrobiotech, INRAE, Université Côte d’Azur, CNRS","correspondingAuthor":false,"prefix":"","firstName":"Etienne","middleName":"","lastName":"Danchin","suffix":""}],"badges":[],"createdAt":"2024-02-05 13:10:12","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3931000/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3931000/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":51540003,"identity":"5d7a9385-07b1-4f37-a7c8-cc682b928372","added_by":"auto","created_at":"2024-02-23 10:57:23","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":501050,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMOnSTER pipeline scheme.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) MOnSTER pipeline is composed of three steps. 
It takes two FASTA protein sequences datasets (positive and negative) and a list of predicted motifs (enriched in the positive dataset) as input. The output is a list of CLUMPs and an associated MOnSTER score. The MOnSTER score is constituted by: (B) CLUMP\u003csub\u003escore\u003c/sub\u003e calculation. (C) Two occurrences Indexes.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/088f5ed6d003e4942f818fab.png"},{"id":51540002,"identity":"60cf7edf-b321-4f13-a395-043e09171fd2","added_by":"auto","created_at":"2024-02-23 10:57:23","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":50644,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMotif logos of CLUMPs compared to the target motifs.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eUpper-panel: alignments of motifs in the respective CLUMP are produced by PROMOCA, and then the aligned motif sequences are used to produce the logos with WebLogo3. The x-axis represents the AA position in the motif, while the y-axis represents log-transformed frequencies translated into bits of information. Lower-panel: characteristic motifs of oomycetes effectors families from literature.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/99134fb309d285447fee43f2.png"},{"id":51540015,"identity":"157a20ef-8db6-45fe-bbd6-69162ab1e3ae","added_by":"auto","created_at":"2024-02-23 10:57:24","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1065420,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDendrogram of CLUMPs in Plant Parasitic Nematodes (PPNs)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e11 CLUMPs produced by MOnSTER (indicated with “/” sign). 
The coloured ones are those selected as best-scoring CLUMPs after MOnSTER-score calculation. Each best-scoring CLUMP is associated with the corresponding motif logo; alignment of motifs in each CLUMP is produced by PROMOCA and then WebLogo 3 is used to produce the image (the x-axis shows the AA position of the motif and the y-axis represents the log-transformed frequency of each AA in terms of bits of information).\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/33a3b7a8dd5b317e122e8fc8.png"},{"id":51540012,"identity":"4b2f4840-8901-481f-b354-bfd915f6dd22","added_by":"auto","created_at":"2024-02-23 10:57:24","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":30338,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCardinality of CLUMPs-motifs in each PPN species considered.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe total number of motifs belonging to each significant CLUMP per PPN species accordingly to their phylogeny. 
(minc: \u003cem\u003eMeloidogyne incognita, \u003c/em\u003emjav:\u003cem\u003e Meloidogyne javanica, \u003c/em\u003emare:\u003cem\u003e Meloidogyne arenaria, \u003c/em\u003emhap:\u003cem\u003e Meloidogyne hapla, \u003c/em\u003emchi:\u003cem\u003e Meloidogyne chitwoodi, \u003c/em\u003emgra:\u003cem\u003e Meloidogyne graminicola, \u003c/em\u003egros:\u003cem\u003e Globodera rostochiensis, \u003c/em\u003egpal:\u003cem\u003e Globodera pallida, \u003c/em\u003ehave:\u003cem\u003e Heterodera avenae, \u003c/em\u003ehgly:\u003cem\u003e Heterodera glycines, \u003c/em\u003ehsch:\u003cem\u003e Heterodera schachtii, \u003c/em\u003ersim:\u003cem\u003e Radopholus similis, \u003c/em\u003ebxyl:\u003cem\u003e Bursaphelenchus xylophilus\u003c/em\u003e)\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/12b11a6b7b90743604126be9.png"},{"id":51540004,"identity":"7f6f9c6e-1fbc-415d-b25c-7839b457b11d","added_by":"auto","created_at":"2024-02-23 10:57:23","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":74796,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCandidate parasitism\u003c/strong\u003e \u003cstrong\u003eproteins showing the presence of CLUMP/s associated with pathogenicity-related protein domain/s.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe table on the right shows the co-occurrence of CLUMP or CLUMPs with specific domain classes (dc): dc1, pectate lyase domain class; dc2, glycosyl hydrolase family 5 domain class; dc3, Stichodactyla toxin (ShK) domain class; dc4, 14-3-3 family domain class; and dc5, cysteine-rich domain class.
The upset plot on the left represents the occurrences and co-occurrences of respective CLUMPs in the positive dataset, highlighting the sequences that also have an interesting protein domain following the table counts.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/3488c0d178cc22b6a19b4589.png"},{"id":51540016,"identity":"d48028f0-753e-4059-9962-b80ae0c303ef","added_by":"auto","created_at":"2024-02-23 10:57:24","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":729201,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMiEFF72 is specifically expressed in the subventral glands.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A)\u003cem\u003e In situ\u003c/em\u003e hybridisation showing EFF72 transcripts in the subventral glands (SvG) of J2s of \u003cem\u003eM. incognita \u003c/em\u003e(two left pictures). Sense probe for the MiEFF72 transcripts was used as a negative control (right picture). (B) MiEFF72 localised to the cytoplasm of plant cells and in cytoplasmic vesicles (red arrows). The MiEFF72 sequence was fused to C-terminal end of the GFP and expressed in \u003cem\u003eN. benthamiana\u003c/em\u003e leaves by agroinfiltration. 
Bars = 20 µm.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/1b75ad85aaff266c14bcf018.png"},{"id":51540687,"identity":"5964b9be-6a14-435d-a8b3-7751d5292565","added_by":"auto","created_at":"2024-02-23 11:13:25","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2287004,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/cc5c2da4-cdd6-40bb-9cfd-d0f05a16c682.pdf"},{"id":51540006,"identity":"7d0e7ac7-90ad-4784-9149-c3c75b97c128","added_by":"auto","created_at":"2024-02-23 10:57:23","extension":"xls","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":686080,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable1.1.xls","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/a371dd9add7ac7dd0f6f2187.xls"},{"id":51540021,"identity":"48ee24e2-2104-4515-8a45-5d602ad4fa7c","added_by":"auto","created_at":"2024-02-23 10:57:24","extension":"xls","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":2458624,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable1.2.xls","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/8743177f74d617bd97db7a41.xls"},{"id":51540005,"identity":"acc13dd8-8a23-41f6-9d38-895910834631","added_by":"auto","created_at":"2024-02-23 10:57:23","extension":"docx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":1098848,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cbr\u003e\u003c/p\u003e","description":"","filename":"SupplementaryMaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/da804c83757020999e04371f.docx"},{"id":51540018,"identity":"64770c3f-7ecf-4970-9cca-e14468bcc1cd","added_by":"auto","created_at":"2024-02-23 
10:57:24","extension":"xls","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":230400,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable2.1.xls","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/26ca0c0206fa9efda4476a98.xls"},{"id":51540020,"identity":"3e8cd376-49d9-4f8b-a581-6afff0bdfbbb","added_by":"auto","created_at":"2024-02-23 10:57:24","extension":"xls","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":2227200,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable2.2.xls","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/53863b4101f249a374f6cf88.xls"},{"id":51540523,"identity":"66f7ef8c-3531-44be-a1b2-2083a4c3ad97","added_by":"auto","created_at":"2024-02-23 11:05:23","extension":"xls","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":7168,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable3.xls","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/472fb5ef00b3a6f1fa4a8412.xls"},{"id":51540007,"identity":"304a4f88-b38a-4f51-a83b-71983b9d78f2","added_by":"auto","created_at":"2024-02-23 10:57:23","extension":"xls","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":7168,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable4.xls","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/4b38553c487e30571a1fb4d4.xls"},{"id":51540017,"identity":"9818059e-8c85-455f-8de0-4c87653b2887","added_by":"auto","created_at":"2024-02-23 
10:57:24","extension":"xls","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":24064,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable5.xls","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/ee5f1a1e9aff6c96529b3f43.xls"},{"id":51540019,"identity":"12fc9f15-c0a6-422a-814f-e9b29368471e","added_by":"auto","created_at":"2024-02-23 10:57:24","extension":"xls","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":16896,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cbr\u003e\u003c/p\u003e","description":"","filename":"Supplementarytable6.xls","url":"https://assets-eu.researchsquare.com/files/rs-3931000/v1/977468d616d07e64fef5bf69.xls"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Identification and characterization of specific motifs in effector proteins of plant parasites using MOnSTER","fulltext":[{"header":"Introduction","content":"\u003cp\u003ePlant pathogens are a major threat to global food security. To cause the infection, pathogenic organisms secrete effector proteins that promote colonization of the host plant by overcoming the physical barriers of plant cell walls, suppressing or evading immune perception, and deriving nutrients from host tissues [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Therefore, identifying and characterizing pathogens effectors is crucial towards understanding how they manipulate the plant and better combat them. Effector proteins are often specific to pathogens and essential for causing plant pathology, constituting targets of choice for the development of cleaner and more specific control methods [\u003cspan additionalcitationids=\"CR3\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u0026ndash;[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. 
Because of their poor sequence conservation, effector identification among the set of predicted proteins from the genome (proteome) is challenging, and current methods generate too many candidates without further indication for prioritizing experimental studies. Classically, effector proteins are indirectly identified among the predicted secretome based on the presence of a signal peptide for secretion and the lack of a transmembrane region [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. However, these criteria alone suffer from two main limitations. On the one hand, the secretome comprises many proteins that are not effectors; on the other hand, some known effectors do not possess signal peptides for secretion. In most phyla, effectors contain specific sequence motifs which target host proteins with distinct roles in the infection process and control virulence [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. The best-studied example is the class of effectors secreted via the type III secretion system (T3SS) of Gram-negative bacterial pathogens, which are characterized by a specific motif/domain conferring a repertoire of molecular determinants with important roles during infection [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. However, these features are not conserved in other bacteria. Indeed, Gram-positive pathogens and certain phloem- and xylem-colonizers, such as \u003cem\u003eCandidatus Liberibacter\u003c/em\u003e and \u003cem\u003eXylella\u003c/em\u003e spp., do not encode the T3SS. In these bacteria, effector delivery is dependent on the presence of the N-terminal signal peptide, which is required for protein secretion [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e].
In fungi, effectors are often small and cysteine-rich [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Another well-characterized example is the effectors of the oomycete pathogens. Oomycetes are eukaryotic, filamentous, heterotrophic microorganisms, more than 60% of which parasitize plants [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Well-known oomycete plant pathogens include the agents of late blight of potato, sudden oak death and root rot (\u003cem\u003ePhytophthora\u003c/em\u003e species), and the downy mildew agents (\u003cem\u003ePeronospora\u003c/em\u003e and \u003cem\u003eBremia\u003c/em\u003e species) [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. These pathogens code for two notable classes of effector proteins, RxLR and Crinkler (CRN), which can be predicted by the occurrence of the related motifs, RxLR, -dEER and LxLFLAK-HVLVxxP, in the N-terminal region downstream of the signal peptide [\u003cspan additionalcitationids=\"CR16\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u0026ndash;[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eAlthough for some plant pathogens, such as oomycetes, effectors have been studied extensively and characteristic motifs have been identified [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e], research on Plant-Parasitic Nematode (PPN) effectors has not identified any consensus motif conserved across multiple species. The most economically important PPNs are the sedentary Root-Knot Nematodes (RKNs) and cyst nematodes [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].
These sedentary parasites induce the formation of a feeding structure that serves as a constant food source for the nematode. Other PPNs are migratory and a whole spectrum of variations exists between endo and ecto parasites, with semi-endoparasites an intermediate between the two extremes [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. The different lifestyles of PPNs are expected to be reflected in their secretions, which presumably contain effectors with different functions according to the nematode's specific needs, thus presenting a high variety of characteristic motifs complicating their identification.\u003c/p\u003e \u003cp\u003eA first step toward the identification of motif characteristics of RKN effectors was performed by Vens et al. [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. The authors developed a bioinformatic tool, called MERCI, to identify motifs with high occurrences in a positive dataset (known effector sequences) and absent in the negative one (non-effector sequences). MERCI uses a graph-based approach incorporating physicochemical features of the amino acids composing protein sequences. By analyzing the known effector sequences of the RKN species \u003cem\u003eMeloidogyne incognita\u003c/em\u003e, one of the most important known crop pathogens among all [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e], they identified 4 motifs. However, at the time of their publication, very few genomes for RKN species were available, and the study was therefore conducted on one single RKN species. Furthermore, the genome used at that time was later shown to be partially incomplete [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. These limitations prevent the generalization of the previous findings. Da Rocha et al. 
identified a \u003cem\u003ecis\u003c/em\u003e-regulatory promoter motif (Mel-DOG box) characteristic of dorsal gland effectors [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Recently, Rocha et al. used this motif combined with other criteria to select new putative effectors and validated 14 new dorsal gland-specific candidate effectors expressed in adult females [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Although all these studies have contributed to enlarging the list of known effectors, a global characterization of their properties is still missing. Therefore, there is an urgent need for a novel study of the properties of PPN effector sequences and motif research.\u003c/p\u003e \u003cp\u003eBy taking advantage of the multitude of proteomes available nowadays for several PPN, we developed a comprehensive motif mining analysis to identify characteristic motifs of candidate parasitism protein sequences of these species. Sequence motifs are usually of constant short size and are often repeated and conserved. Typically, motifs conform to a particular sequence pattern, where certain positions can be constrained to a specific amino acid, whereas others are not [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. This confers a high degeneration of the motifs yielding a huge list of non-redundant motif sequences and consequently, some motifs that are not characteristics of effector sequences only [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. Furthermore, different amino acids (AAs) can have similar physicochemical properties, thus different motif sequences can share similar properties. However, most available motif discovery tools do not consider these properties. 
To circumvent these limitations, we have developed MOnSTER a novel tool that identifies \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eclu\u003c/span\u003esters of \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003em\u003c/span\u003eotifs of \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003ep\u003c/span\u003erotein \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003es\u003c/span\u003eequences (CLUMP) and associates a score to each CLUMP. This score encompasses the physicochemical properties of AAs and the motif occurrences. Overall, one of the key advantages of MOnSTER is that it reduces the redundancy of motifs found by \u003cem\u003ede novo\u003c/em\u003e tools. Furthermore, already known motifs available in publicly available databases such as Pfam [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] and/or InterProScan [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] can also be used as input of MOnSTER to identify discriminant CLUMPs.\u003c/p\u003e \u003cp\u003eWe built up our method to identify discriminant CLUMPs in 1743 candidate parasitism proteins of plant-pathogenic oomycetes. We showed the reliability of MOnSTER by identifying 5 CLUMPs that correspond to the known motifs: RxLR, -dEER and LxLFLAK-HVLVxxP. After this proof of concept, we applied MOnSTER on PPN effector proteins and identified peculiar motifs in their sequences at an unprecedented level. We selected a set of 4395 protein sequences from 13 PPN species belonging to the genera \u003cem\u003eMeloidogyne, Globodera, Heterodera, Radopholus and Bursaphelenchus\u003c/em\u003e. We identified 6 CLUMPs present in 60% of the known effectors (positive dataset). Of note these CLUMPs were found in only 5% of the sequences of the negative datasets, thus highlighting the enrichment of the identified motifs in effector sequences. 
Furthermore, we found a specific co-occurrence of at least two CLUMPs in PPN candidate parasitism protein sequences bearing protein domains important for invasion and pathogenicity.\u003c/p\u003e \u003cp\u003eThe potential of this tool goes beyond candidate parasitism proteins: it can be used to easily cluster motifs and calculate the CLUMP score on any set of protein sequences. Furthermore, we also provide a new scoring system capable of measuring the physicochemical properties of motifs grouped in CLUMPs and a motif alignment algorithm to better explore physicochemical properties within the CLUMPs. MOnSTER is freely available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Plant-Net/MOnSTER_PROMOCA.git\u003c/span\u003e\u003cspan address=\"https://github.com/Plant-Net/MOnSTER_PROMOCA.git\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eDatasets\u003c/h2\u003e \u003cdiv id=\"Sec4\" class=\"Section3\"\u003e \u003ch2\u003eOomycetes\u003c/h2\u003e \u003cp\u003eWe used proteins from five oomycete species to create the input datasets for MOnSTER, namely \u003cem\u003ePhytophthora infestans\u003c/em\u003e, \u003cem\u003ePhytophthora sojae\u003c/em\u003e, \u003cem\u003ePhytophthora ramorum\u003c/em\u003e, \u003cem\u003eHyaloperonospora arabidopsidis\u003c/em\u003e and \u003cem\u003eBremia lactucae\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003ePositive dataset\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe positive dataset consists of 1743 effector proteins from the aforementioned oomycetes, obtained by concatenating proteins selected from the PHI-base database (v4.14) [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e], Uniprot (release 2023_02) [\u003cspan citationid=\"CR32\"
class=\"CitationRef\"\u003e32\u003c/span\u003e], and the work of Haas et al., (2009) [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], in which they have manually curated the annotations of the proteins. Since the proteins come from different sources, we used CD-HIT (v4.8.1) [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e] with the parameters in \u003cb\u003eSupplementary information\u003c/b\u003e, to filter out identical protein sequences. A total of 1283 proteins are annotated as RxLR effectors, 377 as Crinkler effectors and the last 83 sequences are proteins with no previously identified motif and known to be involved in the host-pathogen interaction.\u003c/p\u003e \u003cp\u003e \u003cb\u003eNegative dataset\u003c/b\u003e \u003c/p\u003e \u003cp\u003eProteins in the negative dataset derive all from Uniprot (release 2023_02) and from the oomycetes species cited before filtered from proteins included in the positive dataset and for evident effector-related annotations. Due to the large amount of non-effector proteins remaining from the filtering we firstly used ‘cd-hit’ to reduce protein sequence redundancy and then, to also reduce the unbalance of the final dataset we refined the selection taking only the representative sequences of the orthogroups found with Orthofinder (v2.5.4) [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. In total 3009 non effector proteins are included in the negative dataset.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eMotif Discovery\u003c/h2\u003e \u003cp\u003eThe last input file consists in a list of motifs identified as enriched in the sequences of the positive dataset compared to the sequences of the negative one. 
We used MERCI and STREME (v5.5.1) [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e], with parameters detailed in \u003cb\u003eSupplementary information\u003c/b\u003e. We imposed different lengths for motif prediction to be inclusive but more stringent on the motifs in which we are interested. STREME’s output is a list of motifs. Hence, we used the tool FIMO (v5.5.1) [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e], with default parameters, to extract 246 degenerate motifs from the 4524 different motifs.\u003c/p\u003e \u003cp\u003eWe obtained the following numbers of non-redundant motifs: 19 with MERCI and 246 with STREME. Then, we removed the identical motifs and created a single non-redundant list containing all the motifs in the same format, which resulted in 265 different motifs.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003ePlant Parasitic Nematodes (PPNs)\u003c/h2\u003e \u003cp\u003e \u003cb\u003ePositive dataset\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe positive dataset contains candidate parasitism proteins selected as likely to be secreted by PPNs in their plant host and belonging to 13 species (\u003cem\u003eMeloidogyne incognita, Meloidogyne javanica, Meloidogyne arenaria, Meloidogyne hapla, Meloidogyne chitwoodi, Meloidogyne graminicola, Globodera rostochiensis, Globodera pallida, Heterodera avenae, Heterodera glycines, Heterodera schachtii, Radopholus similis, Bursaphelenchus xylophilus\u003c/em\u003e). We collected candidate parasitism proteins by literature mining. More precisely, we considered as candidate parasitism proteins those proteins for which \u003cem\u003ein-situ\u003c/em\u003e hybridization experiments showed that the corresponding transcript is present in nematode secretory glands (dorsal or sub-ventral), implying that these proteins are likely secreted by the nematodes into the host plant.
The literature mining led to the extraction of 163 proteins from NCBI GenBank via the NCBI ‘entrez’ API. We also manually extracted 41 sequences from the publications’ main text and Supplementary information. In addition, we downloaded 41 sequences from WormBase ParaSite (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e\u003ca href=\"https://github.com/Plant-Net/MOnSTER_PROMOCA.git\" target=\"_blank\"\u003ewww.parasite.wormbase.org\u003c/a\u003e\u003c/span\u003e\u003cspan address=\"http://www.parasite.wormbase.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e, vWBPS17-WS282 [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e], [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]), and eight sequences from nematode.net [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. In total, we obtained 229 candidate parasitism proteins. We extended the positive dataset with proteins that are non-redundant homologs of the previous candidate parasitism proteins in PPN proteomes. We first used cd-hit-2D, with parameters in \u003cb\u003eSupplementary information\u003c/b\u003e, to cluster sequences from PPN proteomes and candidate parasitism proteins [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. We then pooled all the candidate parasitism proteins from closely related \u003cem\u003eMeloidogyne\u003c/em\u003e species (e.g., \u003cem\u003eM. incognita\u003c/em\u003e, \u003cem\u003eM. javanica\u003c/em\u003e and \u003cem\u003eM. arenaria\u003c/em\u003e) and scanned each corresponding proteome with this multi-species set of sequences using cd-hit. Since the remaining species are genetically distinct, we then scanned each proteome with the relative set of candidate parasitism proteins, except for \u003cem\u003eH. avenae\u003c/em\u003e and \u003cem\u003eM.
chitwoodi\u003c/em\u003e for which no proteomes were currently available. We merged the two sets of selected candidate parasitism proteins and we performed CD-HIT intra- and inter-species to reduce dataset redundancy (parameters in \u003cb\u003eSupplementary information\u003c/b\u003e), retaining only sequences having more than 1% divergence and aligning on more than 80% of their length (the longest sequence from each cluster was kept). The final positive dataset includes 546 candidate parasitism proteins from 13 species.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eNegative dataset\u003c/h2\u003e \u003cp\u003eThe negative dataset is composed of 3849 protein sequences that we obtained by selecting genes widely conserved across the nematode tree of life and close outgroup species, including many species that are non-parasites. Specifically, we filtered the results from a previous analysis [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e] and only retained genes from orthogroups i) conserved in more than 90% (62/64) of the analyzed species including two tardigrade species (outgroups), and ii) presenting less than 10 genes/species/orthogroups to avoid multigenic families, which would lead to overrepresentation of some proteins. To remove the redundancy, we used the same strategy as for the positive dataset (cdhit2D first and then CD-HIT).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eMotif Discovery\u003c/h2\u003e \u003cp\u003eUsing the aforementioned software in the same configuration we obtained the following numbers of non-redundant motifs: 40 with MERCI and 229 with STREME applying FIMO. 
In total, we obtained 269 different motifs.\u003c/p\u003e \u003cp\u003eAll datasets are available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Plant-Net/MOnSTER_PROMOCA.git\u003c/span\u003e\u003cspan address=\"https://github.com/Plant-Net/MOnSTER_PROMOCA.git\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e and in \u003cb\u003eSupplementary tables 1.1–1.2 and 2.1–2.2\u003c/b\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eMOnSTER pipeline\u003c/h2\u003e \u003cp\u003eThe MOnSTER (MOtifs of cluSTERs) pipeline is composed of three main steps as described in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and in the following paragraphs.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eFeature calculation\u003c/h2\u003e \u003cp\u003eThe first step of the pipeline concerns the calculation of parameters that describe protein sequences (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). 
To allow an easy calculation of the features on any dataset, we calculated the sequence length and used the \u003cem\u003eProteinAnalysis\u003c/em\u003e class from \u003cem\u003eBio.SeqUtils.ProtParam\u003c/em\u003e, a Python sub-package, to calculate 13 additional features based on individual AA properties, belonging to 4 categories:\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cul\u003e \u003cli\u003e \u003cp\u003esecondary structure propensity (‘helix’ (V, I, Y, F, W, L), ‘turn’ (N, P, G, S), and ‘sheet’ (E, M, A, L)).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eamino-acid dimensions (‘tiny’ (A, C, G, S, T) and ‘small’ (A, C, F, G, I, L, M, P, V, W, Y)).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003epH (‘basic’ (H, K, R), ‘acid’ (B, D, E), and ‘charged’ (H, K, R, B, D, E)).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003ephysicochemical properties (‘hydropathy-score’, ‘polar’ (D, E, H, K, N, Q, R, S, T), ‘non-polar’ (A, C, F, G, I, L, M, P, V, W, Y), ‘aromatic’ (F, H, W, Y), and ‘aliphatic’ (A, I, L, V)).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eWe performed feature calculations on the positive and negative datasets and the list of motifs. At the end of this step, we obtained three tables of features, one for each of the inputs (positive dataset, negative dataset and the list of motifs).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eClustering\u003c/h2\u003e \u003cp\u003eThis step clusters motifs based on their properties, as described by the 13 features. To make the features comparable to each other, we performed data normalization by using the \u003cem\u003eStandardScaler\u003c/em\u003e class from \u003cem\u003esklearn.preprocessing\u003c/em\u003e [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e].
This normalization consists of the removal of the mean and the scaling to unit variance.\u003c/p\u003e \u003cp\u003eThen, we performed a hierarchical clustering of the motifs using the Euclidean distance. We then divided the resulting tree into clusters of motifs of proteins (CLUMPs), selecting the threshold distance that minimized the Davies-Bouldin score [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFor each CLUMP, we removed the redundant motifs. Briefly, we identified motifs that shared a core sequence (for example, ‘HWT’ in ‘HWTQ’ and ‘GHWTQ’), and we only retained the cores (here, ‘HWT’) in the CLUMPs.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eScoring\u003c/h2\u003e \u003cp\u003eThe final objective is to identify the CLUMP(s) with the highest discriminative power for the positive dataset. Thus, we conceived a new score, called the MOnSTER score, to rank the CLUMPs by their discriminative power.\u003c/p\u003e \u003cp\u003eThe MOnSTER score is composed of three parts: the CLUMP score and two modified versions of the Jaccard index.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eCLUMP score\u003c/h2\u003e \u003cp\u003eThis score considers the AA composition of the motifs belonging to each CLUMP relative to the preferences of the sequences of the positive dataset. The procedure that we implemented to calculate this score is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB.\u003c/p\u003e \u003cp\u003ea) Feature selection\u003c/p\u003e \u003cp\u003eWe used the Mann-Whitney test to identify the features whose values were significantly different between the positive and negative datasets. We only retained the statistically significant features, with a p-value \u0026lt; 0.05. Then, we assigned each of them a score by calculating -Log(p-value) of the feature.
We will refer to it as the ‘feature weight’ hereafter.\u003c/p\u003e \u003cp\u003eb) Average calculation\u003c/p\u003e \u003cp\u003eFor each of the selected features (ranging from one to \u003cem\u003ef\u003c/em\u003e), we calculated the average value for the positive dataset, the negative dataset, and each CLUMP (ranging from zero to \u003cem\u003ec\u003c/em\u003e). We refer to these values as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu }_{f}^{+}\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu }_{f}^{-}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu }_{f}^{{CLUMP}_{c}}\\)\u003c/span\u003e\u003c/span\u003e, respectively.\u003c/p\u003e \u003cp\u003ec) CLUMPs sorting\u003c/p\u003e \u003cp\u003eWe compared the averages of the positive and negative datasets for each feature and sorted the CLUMPs accordingly.\u003c/p\u003e \u003cp\u003eThus, if \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu }_{f}^{+} \\ge {\\mu }_{f}^{-}\\)\u003c/span\u003e\u003c/span\u003e, the CLUMP averages are sorted in ascending order.\u003c/p\u003e \u003cp\u003eOtherwise (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu }_{f}^{+} \u0026lt; {\\mu }_{f}^{-}\\)\u003c/span\u003e\u003c/span\u003e), the CLUMP averages are sorted in descending order.\u003c/p\u003e \u003cp\u003ed) CLUMPs voting\u003c/p\u003e \u003cp\u003eFor each feature, we divided the CLUMPs into two groups according to the following rules:\u003c/p\u003e \u003cp\u003eIf \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu }_{f}^{+} \\ge {\\mu }_{f}^{-}\\)\u003c/span\u003e\u003c/span\u003e : CLUMPs with \u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\\({\\mu }_{f}^{{CLUMP}_{c}}\\ge {\\mu }_{f}^{+}\\)\u003c/span\u003e\u003c/span\u003e receive a vote from 1 to the number of CLUMPs, in increments of 1; otherwise the vote is set to 0.\u003c/p\u003e \u003cp\u003eIf \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu }_{f}^{+} \u0026lt; {\\mu }_{f}^{-}\\)\u003c/span\u003e\u003c/span\u003e : CLUMPs with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\mu }_{f}^{{CLUMP}_{c}}\u0026lt;{\\mu }_{f}^{+}\\)\u003c/span\u003e\u003c/span\u003e receive a vote from 1 to the number of CLUMPs, in increments of 1; otherwise the vote is set to 0.\u003c/p\u003e \u003cp\u003ee) CLUMPs scoring\u003c/p\u003e \u003cp\u003eFor each CLUMP (ranging from zero to \u003cem\u003ec\u003c/em\u003e) and each feature (ranging from one to \u003cem\u003ef\u003c/em\u003e), we multiplied the feature-vote by the ‘feature weight’ (\u003cem\u003eW\u003c/em\u003e\u003csub\u003e\u003cem\u003ef\u003c/em\u003e\u003c/sub\u003e) and summed the products to obtain a CLUMP-vote. 
Then we scaled each CLUMP-vote to a range from 0 to 1 using the following formula:\u003c/p\u003e\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$${CLUMPscore}_{c}= \\frac{{V}_{c}- \\text{min}\\left(V\\right)}{\\text{max}\\left(V\\right)-\\text{min}\\left(V\\right)}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003e\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cem\u003eV\u003c/em\u003e is the list of CLUMP votes and \u003cem\u003eV\u003c/em\u003e\u003csub\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sub\u003e is calculated as:\u003c/p\u003e\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$${V}_{c}=\\sum _{features[1, f]}\\left({vote}_{f} \\subset {CLUMP}_{c}\\right) {W}_{f}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003e\u003c/p\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003eOccurrences indexes\u003c/h2\u003e \u003cp\u003eThe two indexes respectively consider: i) the occurrences of the motifs, for each CLUMP, in the positive dataset compared to the negative, and ii) the number of positive sequences containing the motifs in each CLUMP compared to the negatives (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC).\u003c/p\u003e \u003cp\u003ea) CLUMPs occurrences\u003c/p\u003e \u003cp\u003eWe calculated the occurrences of the motifs in each CLUMP in the two datasets (positive and negative).\u003c/p\u003e \u003cp\u003eb) I indexes\u003c/p\u003e \u003cp\u003eWe propose two ways to calculate the dissimilarity between two sets, which will be called I\u003csub\u003e1\u003c/sub\u003e and I\u003csub\u003e2\u003c/sub\u003e hereafter.\u003c/p\u003e \u003cp\u003eTo obtain I\u003csub\u003e1\u003c/sub\u003e, we calculated the number of occurrences of the motifs for each CLUMP (ranging from 
zero to \u003cem\u003ec\u003c/em\u003e) in the negative dataset over the number of occurrences of the motifs of the same CLUMP in the positive dataset, using the following equation:\u003c/p\u003e\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$${I}_{1 \\forall CLUMP[0, c]}= \\frac{1}{2} \\left(1- \\frac{\\sum {\\varDelta }_{-} \\subset {CLUMP}_{c}}{\\sum {\\varDelta }_{+} \\subset {CLUMP}_{c}}\\right)$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003e\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\({\\varDelta }_{-}\\)\u003c/span\u003e \u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\varDelta }_{+}\\)\u003c/span\u003e\u003c/span\u003e are the numbers of occurrences of the motifs of the CLUMP in the negative and in the positive dataset, respectively.\u003c/p\u003e \u003cp\u003eTo obtain I\u003csub\u003e2\u003c/sub\u003e, for each CLUMP (ranging from zero to \u003cem\u003ec\u003c/em\u003e), we calculated the number of sequences of the negative dataset that contain at least one motif of the CLUMP, over the number of sequences of the positive dataset that contain at least one motif of the same CLUMP, according to the following formula:\u003c/p\u003e\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" name=\"EquationSource\"\u003e\n$${I}_{2 \\forall CLUMP[0, c]}= \\frac{1}{2} \\left(1- \\frac{\\sum {seq}_{-}\\subset {CLUMP}_{c}}{\\sum {seq}_{+} \\subset { CLUMP}_{c}}\\right)$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003e\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\({seq}_{-}\\)\u003c/span\u003e \u003c/span\u003e is the number of sequences of the negative 
dataset containing at least one motif of the CLUMP.\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\({seq}_{+}\\)\u003c/span\u003e \u003c/span\u003e is the number of sequences of the positive dataset containing at least one motif of the CLUMP.\u003c/p\u003e \u003cp\u003eThe ½ factor keeps each index between 0 and 0.5 so that the two indexes have equal weight in the final score, and the (1 – Index) term expresses the degree of dissimilarity rather than similarity.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eMOnSTER score\u003c/h2\u003e \u003cp\u003eThe MOnSTER score, for each CLUMP (from zero to \u003cem\u003ec\u003c/em\u003e), is the sum of the corresponding CLUMP score and the two I indexes:\u003c/p\u003e\u003cdiv id=\"Eque\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Eque\" name=\"EquationSource\"\u003e\n$${MOnSTERscore}_{c}={CLUMPscore}_{c}+{I}_{1c}+ {I}_{2c}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003cdiv id=\"Sec17\" class=\"Section3\"\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Results \u0026 Discussion","content":"\u003ch2\u003eMOnSTER identified five CLUMPs containing known motifs characteristic of oomycetes effector protein sequences\u003c/h2\u003e\u003cp\u003eCharacteristic motifs of oomycetes effector proteins are well-known in the literature, such as RxLR, -dEER and LxLFLAK-HVLVxxP [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Thus, we applied our novel tool, MOnSTER, to oomycetes effectors to test its ability to recover well-characterized motifs. 
We compiled a set of 4752 oomycetes proteins, comprising 1743 effectors and 3009 non-effectors, from five oomycetes species. We performed motif discovery on this set of proteins using MERCI and STREME and identified 265 significantly enriched motifs (see Methods for further details). Then we fed MOnSTER with these motifs and obtained 11 CLUMPs (\u003cb\u003eSupplementary table \u003cspan refid=\"MOESM3\" class=\"InternalRef\"\u003e3\u003c/span\u003e\u003c/b\u003e), using the Davies-Bouldin score as the criterion to cut the tree. By selecting CLUMPs with a MOnSTER score greater than the median of the overall scores, we identified six CLUMPs (CLUMP7, 4, 10, 6, 2 and 9). The five best-scoring CLUMPs according to the MOnSTER score correspond to the known motifs (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). In \u003cb\u003eSupplementary Fig.\u0026nbsp;2\u003c/b\u003e we can also observe that the motifs are grouped in two clades: the two characteristic motifs of CRN-effectors (LxLFLAK and HVLVxxP) form a separate subclade on the right, while the RxLR and -dEER motifs fall into the left clade, mirroring the effector families to which they belong. More precisely, RxLR motifs are divided into two different CLUMPs: CLUMP6, containing only RYLR and RFLR motifs, and CLUMP10, containing other RxLR motifs and included in the same sub-clade as the dEER motif (CLUMP2). The sixth best-scoring CLUMP (CLUMP9) contains no known motifs, perhaps suggesting a novel putative motif for oomycetes effectors to investigate. Since the characterization of oomycetes effectors is beyond the scope of this article, we did not consider this last CLUMP for further analysis. 
In support of that, CLUMPs 7, 4, 10, 6 and 2 are present in 1205/1743 effectors (~ 70% of the sequences in the positive dataset), while adding the last significant CLUMP (CLUMP9) detects only two more sequences.\u003c/p\u003e\u003cp\u003eThus, we investigated the occurrences and co-occurrences of the five selected CLUMPs in oomycetes effectors and non-effectors (\u003cb\u003eSupplementary Fig.\u0026nbsp;3\u003c/b\u003e). For the effectors, we analyzed the two distinct families in depth; in total, we found that 68% of the RxLR-effectors in the positive dataset contain the motifs in CLUMPs associated with the RxLR motif (CLUMP10, 6 and 2). In particular, CLUMP10 and 6 are present alone in 41% of the RxLR-effectors (1238/1743 RxLR-effectors), while 19% of the RxLR-effectors contain the co-occurrence of these CLUMPs with the CLUMP representing the dEER motif (CLUMP2). This reflects the importance of the RxLR motifs in the effector sequences and the role of the attached dEER [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]. On the other hand, the co-occurrence of the CLUMPs specific for LxLFLAK and HVLVxxP (CLUMP7 and 4) in CRN-effector sequences accounts for 67% of the corresponding sequences in the positive dataset (377/1743). The high co-occurrence rate of CLUMP7 and 4 agrees well with the presence of the LxLFLAK and HVLVxxP motifs marking the beginning and the end of the DWL-domain in the Crinkler-effector family [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. In the negative dataset, instead, only 15% of the sequences contain CLUMP motifs, with a marked decrease in CLUMP co-occurrences. 
Overall, co-occurrences are present in around 30% of positive sequences and in 1% of negative ones.\u003c/p\u003e\u003cp\u003ePrevious research showed that the motifs characteristic of oomycetes effectors have strong sequence position preferences [\u003cspan additionalcitationids=\"CR53\" citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]–[\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. Thus, we plotted the CLUMP occurrences in the positive versus negative dataset (\u003cb\u003eSupplementary Fig.\u0026nbsp;4\u003c/b\u003e). Indeed, we can observe that the CLUMPs are concentrated at the beginning of the sequence in positive sequences and, conversely, spread across the sequence in negative dataset proteins. More precisely, the five selected CLUMPs are concentrated in the first 40% of the sequence, with a stronger preference for the very beginning and for around 30% of the sequence, probably corresponding to the N-terminal region of the protein in which the target motifs lie.\u003c/p\u003e\u003cp\u003eAltogether, these results highlight the ability of MOnSTER to identify CLUMPs containing biologically relevant motifs.\u003c/p\u003e\u003ch2\u003eMOnSTER identified six CLUMPs characteristic of nematode candidate parasitism proteins\u003c/h2\u003e\u003cp\u003eThe application of MOnSTER to the oomycetes effectors served as a proof of concept of our methodology. Thus, we moved to the characterization of nematode candidate parasitism sequences, for which no characteristic motifs have been identified yet. We collected a set of 4395 proteins, including 546 well-known candidate parasitism proteins and 3849 proteins in the negative dataset, coming from 13 nematode species. By running motif discovery analysis as for the previous dataset, we found 269 motifs enriched in the candidate parasitism protein sequences. Applying MOnSTER with the previous configuration grouped the 269 input motifs into 11 CLUMPs. 
Six best-scoring CLUMPs were selected using the median as the significance threshold (\u003cb\u003eSupplementary table \u003cspan refid=\"MOESM4\" class=\"InternalRef\"\u003e4\u003c/span\u003e\u003c/b\u003e). Similar to the oomycetes results, we observe two main clades (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e): the second and third best-scoring CLUMPs (CLUMP2 and 5, respectively) form a single clade, while the other significant CLUMPs (CLUMP1, 3, 7 and 10) are distributed in the bigger clade with the non-significant ones. Overall, we found at least one occurrence of one of the six CLUMPs in almost 60% of sequences from the positive dataset compared to 5% of sequences from the negative.\u003c/p\u003e\u003cp\u003eThen we investigated the presence of the six CLUMPs in each of the 13 PPN species present in the dataset. Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the abundance of the six best-scoring CLUMPs in the species, ordered according to their phylogenetic tree. The first three species are the most represented in the positive dataset. Interestingly, very distant species show similar CLUMP frequencies, suggesting that they might share common characteristics at the sequence level for accomplishing similar functions. Furthermore, we could identify characteristic CLUMPs even for species represented in the dataset by very few sequences, reinforcing the previous observation. Overall, this analysis suggests that CLUMPs might be associated with the functional properties of PPNs.\u003c/p\u003e\u003cp\u003eFinally, we focused on the positional sequence preferences of CLUMPs in candidate parasitism protein sequences (\u003cb\u003eSupplementary Fig.\u0026nbsp;5\u003c/b\u003e). In general, we observe a difference in the position preferences of the best-scoring CLUMPs between positive and negative dataset sequences. 
The six CLUMPs tend to occur more frequently in the middle of the sequences of candidate parasitism proteins (positive dataset), with higher abundance at central (around 50% of the sequence) and terminal (around 70%) positions. The same CLUMPs are rare in the central positions of the negative dataset protein sequences. Contrary to oomycetes effectors, whose characteristic CLUMPs occur mainly at the beginning of the sequence, PPN candidate parasitism proteins showed a different pattern of occurrences, favoring central to C-terminal positions.\u003c/p\u003e\u003cp\u003e \u003cb\u003eCo-occurrences of different CLUMPs are associated with functional protein domains.\u003c/b\u003e \u003c/p\u003e\u003cp\u003eWe investigated the co-occurrence patterns of CLUMPs in the PPN candidate parasitism protein sequences (all possible combinations of co-occurrences are reported in \u003cb\u003eSupplementary Fig.\u0026nbsp;6\u003c/b\u003e). Overall, we notice that CLUMPs tend to co-occur more frequently in the sequences of the positive dataset than in the negative one, even though the positive set is smaller. Around 30% of candidate parasitism protein sequences show co-occurrences of the six selected CLUMPs, while in the negative dataset co-occurrences are present in less than 1% of the sequences. As observed for oomycetes, some CLUMPs tend to be present alone, while others tend to co-occur with specific CLUMPs. This suggests that different classes of nematode candidate parasitism proteins might exist, similar to the oomycetes effectors. Interestingly, among the 311 candidate parasitism proteins bearing at least one occurrence of one of the six selected CLUMPs, 72 do not have a predicted signal peptide, representing 55% of the proteins in the positive dataset that lack a signal peptide. 
Of note, this is a similar percentage to the percentage of proteins bearing both the CLUMPs and the signal peptide, suggesting that CLUMPs characterize sequence properties beyond the type of secretion. Furthermore, similar patterns of co-occurrences of CLUMPs in candidate parasitism proteins bearing or not the signal peptide are observed with slightly higher co-occurrence presences in the sequences not having the signal peptide (\u003cb\u003eSupplementary Fig.\u0026nbsp;7\u003c/b\u003e). Importantly, there is no relationship between the sequence length and the number of co-occurrences possibly suggesting a functional role for CLUMPs co-occurrences (\u003cb\u003eSupplementary Fig.\u0026nbsp;8\u003c/b\u003e).\u003c/p\u003e\u003cp\u003eTo inspect further a putative functional role of CLUMPs in candidate parasitism protein sequences, we queried the sequences having at least one CLUMP or a co-occurrence of multiple CLUMPs against several protein domain databases (see supplementary information, results in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e and \u003cb\u003eSupplementary table \u003cspan refid=\"MOESM5\" class=\"InternalRef\"\u003e5\u003c/span\u003e\u003c/b\u003e). Among the 311 candidate parasitism protein sequences bearing at least one occurrence of at least one of the six CLUMPs, 84 also have at least an occurrence of a known protein domain. The most recurrent hits are the coil domain, intrinsically disordered domain and the presence of the signal peptide (SP) followed by the pectate lyase domain, glycosyl hydrolase family 5, Stichodactyla toxin (ShK) domain, 14-3-3 family and cysteine-rich domain. Importantly, none of these domains was also found in the sequences from the negative dataset bearing at least one occurrence of at least one of the six CLUMPs. 
Interestingly, we observe an almost exclusive association between CLUMPs and functional domains, mainly when multiple CLUMPs co-occur in candidate parasitism protein sequences.\u003c/p\u003e\u003cp\u003eThe strongest associations that we observe are between the co-occurrence of CLUMPs 7 and 10 and the glycosyl hydrolase family 5 domain on the one hand, and between the co-occurrence of CLUMPs 3, 7 and 10 and the cysteine-rich domain on the other hand. Specifically, all 23 candidate parasitism protein sequences containing the co-occurrence of CLUMPs 7 and 10 also bear the glycosyl hydrolase family 5 domain. By inspecting the position of CLUMP occurrences within the sequences, we observed that the two CLUMPs flank the domain: CLUMP7 is consistently present at the beginning of the sequence, and consequently of the domain, while CLUMP10 mostly concentrates at the end of the domain, around 60–80% of the sequences (\u003cb\u003eSupplementary Fig.\u0026nbsp;9\u003c/b\u003e). These genes are poorly characterized in nematodes and likely result from horizontal transfer [\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e], [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]. Similarly, all 17 sequences presenting the co-occurrence of CLUMPs 3, 7 and 10 also contain the cysteine-rich domain. Cysteine-rich domains and CAP proteins are known to be involved in the virulence of nematodes [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]. They are expressed in both plants and pathogens; in the latter, they contribute to virulence by suppressing the host’s immune responses and promoting colonization. Interestingly, these sequences do not contain disordered regions or coil domains, consistent with the unique conserved sandwich fold with a large central cavity of these kinds of proteins [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]. 
Sixteen out of 19 sequences presenting co-occurrences of CLUMPs 2 and 3 also have the 14-3-3 family domain, a eukaryotic-specific protein family with a general role in signal transduction [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]. We also observe only one motif from CLUMP 2 in these sequences (KDKM) and four from CLUMP 3 (NKDKAC, KMKG, PTHPIR, PTHP). Thirteen out of 34 sequences bearing only CLUMP 1 also contain the pectate lyase domain. Of note, these sequences do not contain coiled or disordered regions, and only seven show the presence of the SP. Pectate lyase enzymes in nematodes facilitate penetration of plant cell walls made of pectin [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]. Numerous recent reports showed that these enzymes are produced in specialized nematode gland cells and secreted during the parasitism process. In the case of sedentary endo-parasitic nematodes, this occurs mainly during juvenile migration through the root tissue, when these enzymes play a crucial role in the maceration of the plant tissue, facilitating the infection [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]. Finally, eight out of 22 sequences bear the co-occurrence of CLUMPs 2 and 5 together with the ShK domain. 
Although the exact biological function of the ShK domain remains unclear, previous reports have shown that this domain might be associated with immunosuppression [\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e], [\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eOverall, these findings highlight that specific CLUMP co-occurrences are associated with specific functional domains with roles in invasion and/or infection and might suggest different classes of candidate parasitism proteins across species.\u003c/p\u003e\u003cp\u003e \u003cb\u003eCLUMPs screening yielded the identification of a novel effector in\u003c/b\u003e \u003cb\u003eM. incognita\u003c/b\u003e \u003cb\u003evalidated by\u003c/b\u003e \u003cb\u003ein situ\u003c/b\u003e \u003cb\u003ehybridization.\u003c/b\u003e\u003c/p\u003e\u003cp\u003eTo inspect whether the newly identified CLUMPs could also help to find new effectors, we focused on the selection of a novel putative effector to validate experimentally. Thus, we selected all proteins of the \u003cem\u003eMeloidogyne incognita\u003c/em\u003e proteome bearing the signal peptide for secreted proteins and no transmembrane domain. Then we screened these sequences and retrieved the ones containing at least one motif of the six significant CLUMPs. Among them, 23% contain at least one occurrence of motifs in CLUMP5 (Supplementary Table\u0026nbsp;6). Since this is the most abundant CLUMP in this species, we decided to focus on this one to identify a putative candidate to validate experimentally. By literature mining, we refined our list by setting aside seven sequences that were already experimentally validated by previous studies (Supplementary Table\u0026nbsp;6). 
Then we filtered out any candidates having homologs in species other than root-knot nematodes and more than two gene copies to avoid dealing with multigene families according to [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. Finally, among these eight new putative effector sequences, we studied the pattern of expression of one candidate: \u003cem\u003eMiEFF72\u003c/em\u003e (\u003cem\u003eMinc3s00056g02931\u003c/em\u003e) by performing \u003cem\u003ein situ\u003c/em\u003e hybridisation (ISH, see supplementary information). A specific signal was detected in the subventral oesophageal gland cells of pre-parasitic J2s after hybridisation with digoxigenin-labelled MiEFF72 antisense probes (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA). No signal was detected in pre-J2s with sense negative controls. MiEFF72 fused to the C-terminus of GFP was transiently expressed in \u003cem\u003eN. benthamiana\u003c/em\u003e leaf epidermis. GFP fluorescence was detected in the cytoplasm and in cytoplasmic vesicles (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB). 
This finding suggests that MiEFF72 may be secreted and play a role \u003cem\u003ein planta\u003c/em\u003e in nematode parasitism.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eThis work is structured around three main aims: (1) the development of a novel method, called MOnSTER, to cluster and score discriminant motifs of protein sequences, (2) the validation of the MOnSTER results by applying it to identify CLUMPs specific to candidate parasitism protein sequences of oomycetes, and (3) the application of MOnSTER to protein sequences from plant-parasitic nematodes, yielding unprecedented detection of discriminant motifs.\u003c/p\u003e \u003cp\u003eThe application of MOnSTER to oomycetes yielded the identification of five CLUMPs corresponding to well-known effector-related motifs in oomycetes, such as the RxLR-dEER and LxLFLAK-HVLVxxP motifs. This demonstrated that the novel scoring method introduced by MOnSTER is a suitable measure with which to calculate CLUMP specificity for effector protein sequences. When applied to the nematode candidate parasitism proteins, MOnSTER found six novel CLUMPs, not previously characterized. The main advantage of MOnSTER is that the definition of CLUMPs allowed us to reduce the redundancy of 265 and 269 motifs (oomycetes and nematodes, respectively) to 11 CLUMPs in each case. Candidate parasitism protein sequences of both pathogens show some common characteristics. Indeed, motifs of the selected CLUMPs are present in about 70% of the input proteins for oomycetes and 60% for PPNs, compared to 15% and 5% of the negative dataset proteins, respectively. Furthermore, around 30% of candidate parasitism protein sequences have co-occurring CLUMPs, in contrast with less than 1% of the negative dataset sequences, in both applications. The main difference between the candidate parasitism protein-specific motifs of the two pathogens is the positional preference: the beginning of the sequence for oomycetes and central to C-terminal positions for PPNs. 
This highlights MOnSTER's ability to cluster motifs specifically relevant for candidate parasitism protein sequences without privileging any portion of the sequence, like other motif discovery tools.\u003c/p\u003e \u003cp\u003eConcerning the newly identified motifs for PPN candidate parasitism proteins, we observed that the pattern of occurrences and co-occurrences of CLUMPs in candidate parasitism protein sequences is associated with specific functional domains and might suggest the existence of different classes of candidate parasitism proteins. Importantly, we did not observe any species-related preferences, suggesting that these results are general.\u003c/p\u003e \u003cp\u003eIn conclusion, MOnSTER quantifies the motifs and sequence properties in each dataset provided, thus allowing a wide application to other protein classes. Since the MOnSTER score considers the physicochemical properties and occurrences of motifs in CLUMPs relative to the protein sequences provided, it works without the need for a reference dataset. Furthermore, the MOnSTER scores are normalized values, allowing direct comparison between different studies.\u003c/p\u003e \u003cp\u003eOur results highlighted that MOnSTER is a powerful new method to cluster and score discriminant motifs in protein sequences according to their physicochemical properties and pattern of occurrences. It is also a tool that can be easily used on any set of protein sequences and a list of motifs. Therefore, given a positive dataset of candidate parasitism protein sequences and a negative dataset, MOnSTER can also be used to identify CLUMPs characteristic of fungal or bacterial candidate parasitism proteins. 
As such, MOnSTER can be included in any pipeline needing motif calling and will be of great use to accelerate both computational and experimental studies relating to protein motif discovery.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe source code and related data are available at: https://github.com/Plant-Net/MOnSTER_PROMOCA.git\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the French government, through the UCA JEDI Investments in the Future project managed by the National Research Agency (ANR) under reference number ANR-15-IDEX-01.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFinancial Disclosure statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMicroscopy work was performed at the SPIBOC imaging facility of Institut Sophia Agrobiotech, and we thank Dr Olivier Pierre for his availability.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCRediT author contribution\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGC: Methodology, Software, Validation, Formal analysis, Writing \u0026ndash; original draft, Visualization. PP: Methodology, Software, Writing \u0026ndash; original draft. JC: Investigation, Resources. DK: Software, Resources. HS: Writing \u0026ndash; review \u0026amp; editing, Supervision. AC: Writing \u0026ndash; review \u0026amp; editing, Supervision. 
MQ: Investigation, Resources, Writing \u0026ndash; review \u0026amp; editing, Supervision. BF: Investigation, Resources, Writing \u0026ndash; review \u0026amp; editing, Supervision. EGJD: Conceptualization, Methodology, Writing \u0026ndash; review \u0026amp; editing, Supervision. SB: Conceptualization, Methodology, Writing \u0026ndash; original draft, Writing \u0026ndash; review \u0026amp; editing, Supervision, Project administration.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eToru\u0026ntilde;o, T.Y., Stergiopoulos, I., Coaker, G.: Plant-Pathogen Effectors: Cellular Probes Interfering with Plant Defenses in Spatial and Temporal Manners. Annu. Rev. Phytopathol. \u003cb\u003e54\u003c/b\u003e, 419\u0026ndash;441 (Aug. 2016). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1146/annurev-phyto-080615-100204\u003c/span\u003e\u003cspan address=\"10.1146/annurev-phyto-080615-100204\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHaegeman, A., Mantelin, S., Jones, J.T., Gheysen, G.: Functional roles of effectors of plant-parasitic nematodes. Gene. \u003cb\u003e492\u003c/b\u003e(1), 19\u0026ndash;31 (Jan. 2012). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.gene.2011.10.040\u003c/span\u003e\u003cspan address=\"10.1016/j.gene.2011.10.040\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSelin, C., de Kievit, T.R., Belmonte, M.F., Fernando, W.G.D.: Elucidating the Role of Effectors in Plant-Fungal Interactions: Progress and Challenges, \u003cem\u003eFront. Microbiol.\u003c/em\u003e, vol. 7, (2016). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fmicb.2016.00600\u003c/span\u003e\u003cspan address=\"10.3389/fmicb.2016.00600\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBird, D.M., Jones, J.T., Opperman, C.H., Kikuchi, T., Danchin, E.G.J.: Signatures of adaptation to plant parasitism in nematode genomes, \u003cem\u003eParasitology\u003c/em\u003e, vol. 142, no. Suppl 1, pp. S71\u0026ndash;84, Feb. (2015). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1017/S0031182013002163\u003c/span\u003e\u003cspan address=\"10.1017/S0031182013002163\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSperschneider, J., Williams, A.H., Hane, J.K., Singh, K.B., Taylor, J.M.: Evaluation of Secretion Prediction Highlights Differing Approaches Needed for Oomycete and Fungal Effectors. Front. Plant. Sci. \u003cb\u003e6\u003c/b\u003e, 1168 (2015). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fpls.2015.01168\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2015.01168\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSonah, H., Deshmukh, R.K., B\u0026eacute;langer, R.R.: Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges. Front. Plant. Sci. \u003cb\u003e7\u003c/b\u003e, 126 (2016). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fpls.2016.00126\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2016.00126\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, L., et al.: Arms race: diverse effector proteins with conserved motifs. Plant. Signal. Behav. \u003cb\u003e14\u003c/b\u003e(2), 1557008 (Jan. 2019). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1080/15592324.2018.1557008\u003c/span\u003e\u003cspan address=\"10.1080/15592324.2018.1557008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDean, P.: Functional domains and motifs of bacterial type III effector proteins and their roles in infection, \u003cem\u003eFEMS Microbiol. Rev.\u003c/em\u003e, vol. 35, no. 6, pp. 1100\u0026ndash;1125, Nov. (2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/j.1574-6976.2011.00271.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1574-6976.2011.00271.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGreen, E.R., Mecsas, J.: Bacterial Secretion Systems: An Overview. Microbiol. Spectr. \u003cb\u003e4\u003c/b\u003e(1) (Feb. 2016). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1128/microbiolspec.VMBF-0012-2015\u003c/span\u003e\u003cspan address=\"10.1128/microbiolspec.VMBF-0012-2015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNatale, P., Br\u0026uuml;ser, T., Driessen, A.J.M.: Sec- and Tat-mediated protein secretion across the bacterial cytoplasmic membrane\u0026mdash;Distinct translocases and mechanisms, \u003cem\u003eBiochim. Biophys. 
Acta BBA - Biomembr.\u003c/em\u003e, vol. 1778, no. 9, pp. 1735\u0026ndash;1756, Sep. (2008). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.bbamem.2007.07.015\u003c/span\u003e\u003cspan address=\"10.1016/j.bbamem.2007.07.015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSperschneider, J., Dodds, P.N., Gardiner, D.M., Manners, J.M., Singh, K.B., Taylor, J.M.: Advances and Challenges in Computational Prediction of Effectors from Plant Pathogenic Fungi. PLOS Pathog. \u003cb\u003e11\u003c/b\u003e(5), e1004806 (May 2015). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.ppat.1004806\u003c/span\u003e\u003cspan address=\"10.1371/journal.ppat.1004806\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBeakes, G.W., Glockling, S.L., Sekimoto, S.: The evolutionary phylogeny of the oomycete \u0026lsquo;fungi\u0026rsquo;. Protoplasma. \u003cb\u003e249\u003c/b\u003e(1), 3\u0026ndash;19 (Jan. 2012). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00709-011-0269-2\u003c/span\u003e\u003cspan address=\"10.1007/s00709-011-0269-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThines, M., Kamoun, S.: Oomycete-plant coevolution: recent advances and future prospects, \u003cem\u003eCurr. Opin. Plant Biol.\u003c/em\u003e, vol. 13, no. 4, pp. 427\u0026ndash;433, Aug. (2010). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.pbi.2010.04.001\u003c/span\u003e\u003cspan address=\"10.1016/j.pbi.2010.04.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWood, K.J., et al.: Effector prediction and characterization in the oomycete pathogen Bremia lactucae reveal host-recognized WY domain proteins that lack the canonical RXLR motif, \u003cem\u003ePLOS Pathog.\u003c/em\u003e, vol. 16, no. 10, p. e1009012, Oct. (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.ppat.1009012\u003c/span\u003e\u003cspan address=\"10.1371/journal.ppat.1009012\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFranceschetti, M., Maqbool, A., Jim\u0026eacute;nez-Dalmaroni, M.J., Pennington, H.G., Kamoun, S., Banfield, M.J.: Effectors of Filamentous Plant Pathogens: Commonalities amid Diversity. Microbiol. Mol. Biol. Rev. \u003cb\u003e81\u003c/b\u003e(2), e00066-16 (Mar. 2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1128/MMBR.00066-16\u003c/span\u003e\u003cspan address=\"10.1128/MMBR.00066-16\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang, R.H.Y., Tripathy, S., Govers, F., Tyler, B.M.: RXLR effector reservoir in two Phytophthora species is dominated by a single rapidly evolving superfamily with more than 700 members, \u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e, vol. 105, no. 12, pp. 4874\u0026ndash;4879, Mar. (2008). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.0709303105\u003c/span\u003e\u003cspan address=\"10.1073/pnas.0709303105\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTorto, T.A., et al.: EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen Phytophthora, \u003cem\u003eGenome Res.\u003c/em\u003e, vol. 13, no. 7, pp. 1675\u0026ndash;1685, Jul. (2003). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/gr.910003\u003c/span\u003e\u003cspan address=\"10.1101/gr.910003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePritchard, L., Birch, P.: A systems biology perspective on plant\u0026ndash;microbe interactions: Biochemical and structural targets of pathogen effectors. Plant. Sci. \u003cb\u003e180\u003c/b\u003e(4), 584\u0026ndash;603 (Apr. 2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.plantsci.2010.12.008\u003c/span\u003e\u003cspan address=\"10.1016/j.plantsci.2010.12.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLovelace, A.H., Dorhmi, S., Hulin, M.T., Li, Y., Mansfield, J.W., Ma, W.: Effector Identification in Plant Pathogens. Phytopathology. \u003cb\u003e113\u003c/b\u003e(4), 637\u0026ndash;650 (Apr. 2023). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1094/PHYTO-09-22-0337-KD\u003c/span\u003e\u003cspan address=\"10.1094/PHYTO-09-22-0337-KD\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJones, J.T., et al.: Top 10 plant-parasitic nematodes in molecular plant pathology, \u003cem\u003eMol. Plant Pathol.\u003c/em\u003e, vol. 14, no. 9, pp. 946\u0026ndash;961, Dec. (2013). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/mpp.12057\u003c/span\u003e\u003cspan address=\"10.1111/mpp.12057\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHolterman, M., et al.: Disparate gain and loss of parasitic abilities among nematode lineages. PLoS One. \u003cb\u003e12\u003c/b\u003e(9), e0185445 (2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pone.0185445\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0185445\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVens, C., Rosso, M.-N., Danchin, E.G.J.: Identifying discriminative classification-based motifs in biological sequences. Bioinformatics. \u003cb\u003e27\u003c/b\u003e(9), 1231\u0026ndash;1238 (May 2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btr110\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btr110\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBlanc-Mathieu, R., et al.: Hybridization and polyploidy enable genomic plasticity without sex in the most devastating plant-parasitic nematodes. PLoS Genet. \u003cb\u003e13\u003c/b\u003e(6), e1006777 (Jun. 2017). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pgen.1006777\u003c/span\u003e\u003cspan address=\"10.1371/journal.pgen.1006777\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbad, P., et al.: Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat. Biotechnol. \u003cb\u003e26\u003c/b\u003e(8) (Aug. 2008). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nbt.1482\u003c/span\u003e\u003cspan address=\"10.1038/nbt.1482\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRocha, M.D., et al.: Genome Expression Dynamics Reveal the Parasitism Regulatory Landscape of the Root-Knot Nematode Meloidogyne incognita and a Promoter Motif Associated with Effector Genes. Genes. \u003cb\u003e12\u003c/b\u003e(5), 771 (May 2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/genes12050771\u003c/span\u003e\u003cspan address=\"10.3390/genes12050771\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRocha, R.O., Hussey, R.S., Pepi, L.E., Azadi, P., Mitchum, M.G.: Discovery of Novel Effector Protein Candidates Produced in the Dorsal Gland of Adult Female Root-Knot Nematodes, \u003cem\u003eMol. Plant-Microbe Interactions\u003c/em\u003e, vol. 36, no. 6, pp. 372\u0026ndash;380, Jun. (2023). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1094/MPMI-11-22-0232-R\u003c/span\u003e\u003cspan address=\"10.1094/MPMI-11-22-0232-R\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDavey, N.E., Cyert, M.S., Moses, A.M.: Short linear motifs \u0026ndash; ex nihilo evolution of protein regulation. Cell. Commun. Signal. \u003cb\u003e13\u003c/b\u003e(1), 43 (Nov. 2015). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12964-015-0120-z\u003c/span\u003e\u003cspan address=\"10.1186/s12964-015-0120-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRoberson, E.D.O.: Motif scraper: a cross-platform, open-source tool for identifying degenerate nucleotide motif matches in FASTA files, \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 34, no. 22, pp. 3926\u0026ndash;3928, Nov. (2018). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/bty437\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/bty437\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMistry, J., et al.: Pfam: The protein families database in 2021, \u003cem\u003eNucleic Acids Res.\u003c/em\u003e, vol. 49, no. D1, pp. D412\u0026ndash;D419, Jan. (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/nar/gkaa913\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkaa913\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJones, P., et al.: InterProScan 5: genome-scale protein function classification. Bioinformatics. \u003cb\u003e30\u003c/b\u003e(9), 1236\u0026ndash;1240 (May 2014). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btu031\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btu031\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUrban, M., Pant, R., Raghunath, A., Irvine, A.G., Pedro, H., Hammond-Kosack, K.E.: The Pathogen-Host Interactions database (PHI-base): additions and future developments. Nucleic Acids Res. \u003cb\u003e43\u003c/b\u003e(Database issue), D645\u0026ndash;655 (Jan. 2015). 10.1093/nar/gku1165\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThe UniProt Consortium: UniProt: the Universal Protein Knowledgebase in 2023, \u003cem\u003eNucleic Acids Res.\u003c/em\u003e, vol. 51, no. D1, pp. D523\u0026ndash;D531, (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/nar/gkac1052\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkac1052\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHaas, B.J., et al.: Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans, \u003cem\u003eNature\u003c/em\u003e, vol. 461, no. 7262, Sep. (2009). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nature08358\u003c/span\u003e\u003cspan address=\"10.1038/nature08358\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFu, L., Niu, B., Zhu, Z., Wu, S., Li, W.: CD-HIT: accelerated for clustering the next-generation sequencing data, \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 28, no. 23, pp. 3150\u0026ndash;3152, Dec. (2012). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/bts565\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/bts565\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEmms, D.M., Kelly, S.: OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. \u003cb\u003e20\u003c/b\u003e(1), 238 (Nov. 2019). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13059-019-1832-y\u003c/span\u003e\u003cspan address=\"10.1186/s13059-019-1832-y\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBailey, T.L.: STREME: accurate and versatile sequence motif discovery, \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 37, no. 18, pp. 2834\u0026ndash;2840, Sep. (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btab203\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btab203\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrant, C.E., Bailey, T.L., Noble, W.S.: FIMO: scanning for occurrences of a given motif, \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 27, no. 7, pp. 1017\u0026ndash;1018, Apr. (2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btr064\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btr064\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHowe, K.L., et al.: WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res. \u003cb\u003e44\u003c/b\u003e(D1), D774\u0026ndash;D780 (Jan. 2016). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/nar/gkv1217\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkv1217\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHowe, K.L., Bolt, B.J., Shafie, M., Kersey, P., Berriman, M.: WormBase ParaSite\u0026thinsp;\u0026ndash;\u0026thinsp;a comprehensive resource for helminth genomics. Mol. Biochem. Parasitol. \u003cb\u003e215\u003c/b\u003e, 2\u0026ndash;10 (Jul. 2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molbiopara.2016.11.005\u003c/span\u003e\u003cspan address=\"10.1016/j.molbiopara.2016.11.005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMartin, J., et al.: Helminth.net: expansions to Nematode.net and an introduction to Trematode.net. Nucleic Acids Res. \u003cb\u003e43\u003c/b\u003e(D1), D698\u0026ndash;D706 (Jan. 2015). 10.1093/nar/gku1128\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, W., Godzik, A.: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 22, no. 13, pp. 1658\u0026ndash;1659, Jul. (2006). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btl158\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btl158\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrynberg, P., et al.: Comparative Genomics Reveals Novel Target Genes towards Specific Control of Plant-Parasitic Nematodes. Genes. \u003cb\u003e11\u003c/b\u003e(11), 1347 (Nov. 2020). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/genes11111347\u003c/span\u003e\u003cspan address=\"10.3390/genes11111347\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePedregosa, F., et al.: Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. \u003cb\u003e12\u003c/b\u003e, 2825\u0026ndash;2830 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDavies, D.L., Bouldin, D.W.: A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. \u003cb\u003ePAMI\u0026ndash;1\u003c/b\u003e(2), 224\u0026ndash;227 (Apr. 1979). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TPAMI.1979.4766909\u003c/span\u003e\u003cspan address=\"10.1109/TPAMI.1979.4766909\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCrooks, G.E., Hon, G., Chandonia, J.-M., Brenner, S.E.: WebLogo: a sequence logo generator, \u003cem\u003eGenome Res.\u003c/em\u003e, vol. 14, no. 6, pp. 1188\u0026ndash;1190, Jun. (2004). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/gr.849004\u003c/span\u003e\u003cspan address=\"10.1101/gr.849004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNielsen, H.: Predicting Secretory Proteins with SignalP, in \u003cem\u003eProtein Function Prediction: Methods and Protocols\u003c/em\u003e, D. Kihara, Ed., in Methods in Molecular Biology. New York, NY: Springer, pp. 59\u0026ndash;73. (2017). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/978-1-4939-7015-5_6\u003c/span\u003e\u003cspan address=\"10.1007/978-1-4939-7015-5_6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKrogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes, \u003cem\u003eJ. Mol. Biol.\u003c/em\u003e, vol. 305, no. 3, pp. 567\u0026ndash;580, Jan. (2001). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1006/jmbi.2000.4315\u003c/span\u003e\u003cspan address=\"10.1006/jmbi.2000.4315\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCaillaud, M.-C., Favery, B.: In Vivo Imaging of Microtubule Organization in Dividing Giant Cell, in \u003cem\u003ePlant Cell Division: Methods and Protocols\u003c/em\u003e, M.-C. Caillaud, Ed., in Methods in Molecular Biology. New York, NY: Springer, pp. 137\u0026ndash;144. (2016). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/978-1-4939-3142-2_11\u003c/span\u003e\u003cspan address=\"10.1007/978-1-4939-3142-2_11\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJaouannet, M., Nguyen, C.-N., Quentin, M., Jaubert-Possamai, S., Rosso, M.-N., Favery, B.: In situ Hybridization (ISH) in Preparasitic and Parasitic Stages of the Plant-parasitic Nematode Meloidogyne spp. Bio-Protoc. \u003cb\u003e8\u003c/b\u003e(6), e2766 (Mar. 2018). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.21769/BioProtoc.2766\u003c/span\u003e\u003cspan address=\"10.21769/BioProtoc.2766\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMejias, J., et al.: The root-knot nematode effector MiEFF18 interacts with the plant core spliceosomal protein SmD1 required for giant cell formation. New. Phytol. \u003cb\u003e229\u003c/b\u003e(6), 3408\u0026ndash;3423 (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/nph.17089\u003c/span\u003e\u003cspan address=\"10.1111/nph.17089\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoutemy, L.S., et al.: Structures of Phytophthora RXLR effector proteins: a conserved but adaptable fold underpins functional diversity, \u003cem\u003eJ. Biol. Chem.\u003c/em\u003e, vol. 286, no. 41, pp. 35834\u0026ndash;35842, Oct. (2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1074/jbc.M111.262303\u003c/span\u003e\u003cspan address=\"10.1074/jbc.M111.262303\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchornack, S., et al.: Ancient class of translocated oomycete effectors targets the host nucleus, \u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e, vol. 107, no. 40, pp. 17421\u0026ndash;17426, Oct. (2010). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.1008491107\u003c/span\u003e\u003cspan address=\"10.1073/pnas.1008491107\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRehmany, A.P., et al.: Differential recognition of highly divergent downy mildew avirulence gene alleles by RPP1 resistance genes from two Arabidopsis lines, \u003cem\u003ePlant Cell\u003c/em\u003e, vol. 17, no. 6, pp. 1839\u0026ndash;1850, Jun. (2005). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1105/tpc.105.031807\u003c/span\u003e\u003cspan address=\"10.1105/tpc.105.031807\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAmaro, T.M.M.M., Thilliez, G.J.A., Motion, G.B., Huitema, E.: A Perspective on CRN Proteins in the Genomics Age: Evolution, Classification, Delivery and Function Revisited, \u003cem\u003eFront. Plant Sci.\u003c/em\u003e, vol. 8, (2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fpls.2017.00099\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2017.00099\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDanchin, E.G.J., et al.: Multiple lateral gene transfers and duplications have promoted plant parasitism ability in nematodes, \u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e, vol. 107, no. 41, pp. 17651\u0026ndash;17656, Oct. (2010). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.1008486107\u003c/span\u003e\u003cspan address=\"10.1073/pnas.1008486107\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAspeborg, H., Coutinho, P.M., Wang, Y., Brumer, H., Henrissat, B.: Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evol. Biol. \u003cb\u003e12\u003c/b\u003e(1), 186 (Sep. 2012). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/1471-2148-12-186\u003c/span\u003e\u003cspan address=\"10.1186/1471-2148-12-186\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHan, Z., Xiong, D., Schneiter, R., Tian, C.: The function of plant PR1 and other members of the CAP protein superfamily in plant\u0026ndash;pathogen interactions. Mol. Plant. Pathol. \u003cb\u003e24\u003c/b\u003e(6), 651\u0026ndash;668 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/mpp.13320\u003c/span\u003e\u003cspan address=\"10.1111/mpp.13320\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGibbs, G.M., Roelants, K., O\u0026rsquo;Bryan, M.K.: The CAP superfamily: cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins\u0026ndash;roles in reproduction, cancer, and immune defense, \u003cem\u003eEndocr. Rev.\u003c/em\u003e, vol. 29, no. 7, pp. 865\u0026ndash;897, Dec. (2008). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1210/er.2008-0032\u003c/span\u003e\u003cspan address=\"10.1210/er.2008-0032\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLozano-Dur\u0026aacute;n, R., Robatzek, S.: 14-3-3 Proteins in Plant-Pathogen Interactions. Mol. Plant-Microbe Interact. \u003cb\u003e28\u003c/b\u003e(5), 511\u0026ndash;518 (May 2015). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1094/MPMI-10-14-0322-CR\u003c/span\u003e\u003cspan address=\"10.1094/MPMI-10-14-0322-CR\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWieczorek, K., et al.: A Distinct Role of Pectate Lyases in the Formation of Feeding Structures Induced by Cyst and Root-Knot Nematodes, \u003cem\u003eMol. Plant-Microbe Interactions\u003c/em\u003e, vol. 27, no. 9, pp. 901\u0026ndash;912, Sep. (2014). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1094/MPMI-01-14-0005-R\u003c/span\u003e\u003cspan address=\"10.1094/MPMI-01-14-0005-R\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHewezi, T., Baum, T.J.: Manipulation of plant cells by cyst and root-knot nematode effectors. Mol. Plant-Microbe Interact. \u003cb\u003e26\u003c/b\u003e(1), 9\u0026ndash;16 (Jan. 2013). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1094/MPMI-05-12-0106-FI\u003c/span\u003e\u003cspan address=\"10.1094/MPMI-05-12-0106-FI\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSong, H., et al.: The Meloidogyne javanica effector Mj2G02 interferes with jasmonic acid signalling to suppress cell death and promote parasitism in Arabidopsis, \u003cem\u003eMol. Plant Pathol.\u003c/em\u003e, vol. 22, no. 10, pp. 1288\u0026ndash;1301, Oct. (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/mpp.13111\u003c/span\u003e\u003cspan address=\"10.1111/mpp.13111\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNiu, J., et al.: Msp40 effector of root-knot nematode manipulates plant immunity to facilitate parasitism, \u003cem\u003eSci. Rep.\u003c/em\u003e, vol. 6, no. 1, Jan. (2016). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/srep19443\u003c/span\u003e\u003cspan address=\"10.1038/srep19443\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Motif clustering , motif scoring , effectors , plant,parasite interaction , oomycetes , nematodes ","lastPublishedDoi":"10.21203/rs.3.rs-3931000/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3931000/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cb\u003eMotivation:\u003c/b\u003e\u003c/p\u003e \u003cp\u003ePlant pathogens cause billions of dollars of crop loss every year and are a major threat to global food security. Identifying and characterizing pathogens effectors is crucial towards their improved control. Because of their poor sequence conservation, effector identification is challenging, and current methods generate too many candidates without indication for prioritizing experimental studies. In most phyla, effectors contain specific sequence motifs which influence their localization and targets in the plant. Therefore, there is an urgent need to develop bioinformatics tools tailored for pathogens effectors.\u003c/p\u003e\u003cp\u003e\u003cb\u003eResults\u003c/b\u003e\u003c/p\u003e \u003cp\u003eTo circumvent these limitations, we have developed MOnSTER a novel tool that identifies \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eclu\u003c/span\u003esters of \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003em\u003c/span\u003eotifs of \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003ep\u003c/span\u003erotein \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003es\u003c/span\u003eequences (CLUMPs). 
MOnSTER can be fed with motifs identified by \u003cem\u003ede novo\u003c/em\u003e tools or from databases such as Pfam and InterProScan. The advantage of MOnSTER is the reduction of motif redundancy by clustering them and associating a score. This score encompasses the physicochemical properties of AAs and the motif occurrences. We built up our method to identify discriminant CLUMPs in oomycetes effectors. Consequently, we applied MOnSTER on PPN and identified six CLUMPs in about 60% of the known nematode candidate parasitism proteins. Furthermore, we found co-occurrences of CLUMPs with protein domains important for invasion and pathogenicity. The potentiality of this tool goes beyond the effector characterization and can be used to easily cluster motifs and calculate the CLUMP-score on any set of protein sequences.\u003c/p\u003e\u003cp\u003e\u003cb\u003eAvailability and implementation:\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThe source python code and related data are available at: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Plant-Net/MOnSTER_PROMOCA.git\u003c/span\u003e\u003cspan address=\"https://github.com/Plant-Net/MOnSTER_PROMOCA.git\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e","manuscriptTitle":"Identification and characterization of specific motifs in effector proteins of plant parasites using MOnSTER","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-02-23 10:57:18","doi":"10.21203/rs.3.rs-3931000/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"communications-biology","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"commsbio","sideBox":"Learn more about [Communications Biology](http://www.nature.com/commsbio/)","snPcode":"","submissionUrl":"","title":"Communications Biology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Communications Series","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"a6640274-0cd3-49cd-9a42-d9c87c990014","owner":[],"postedDate":"February 23rd, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":28901555,"name":"Biological sciences/Computational biology and bioinformatics/Protein analysis/Protein sequence analyses"},{"id":28901556,"name":"Biological sciences/Computational biology and bioinformatics/Software"}],"tags":[],"updatedAt":"2024-06-27T21:25:19+00:00","versionOfRecord":[],"versionCreatedAt":"2024-02-23 10:57:18","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-3931000","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3931000","identity":"rs-3931000","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}