FungiRegEx: A tool for patterns identification in Fungal Proteomic sequences using regular expressions

doi:10.21203/rs.3.rs-3852782/v1

2024 · doi:10.21203/rs.3.rs-3852782/v1

preprint OA: closed CC-BY-NC-4.0


Victor Terron-Macias, Jezreel Mejía-Miranda, Miguel Canseco-Pérez, and 2 more
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-3852782/v1
This work is licensed under a CC BY 4.0 License.

Abstract
In genome-scale research, numerous species and subspecies must be analyzed automatically to discern distinctive features in proteomes that contain specific sequences of interest, because such sequences confer specific properties. This requires recognizing complex sequences across an organism's complete proteome.
This study introduces FungiRegEx, user-friendly software for automatic genome-scale proteome analysis of fungal organisms that addresses the limitations of existing tools. FungiRegEx retrieves data on the different species from the JGI Mycocosm database in real time, without downloading any files. Through its user-friendly GUI, the tool runs efficient regular expression searches across 2,402 fungal species from the JGI Mycocosm portal. Validation with the AXSXG sequence and the RXLR effector motif demonstrates that FungiRegEx identifies user-defined patterns in the retrieved sequences. FungiRegEx retrieves results faster than manual processes and requires neither a console nor programming. It also allows customization, result filtering, and saving the results for future research. FungiRegEx offers a promising solution for researchers exploring specific sequences in fungal proteomes. It combines speed, adaptability, and ease of use, and displays the results in an easy-to-read GUI. Its architecture optimizes resource usage and allows flexible deployment, and specific software parameters can be customized. The paper also discusses the tool's potential for future research, giving a nuanced perspective on its practical use within the fungal genomics community.
Subjects: Biological sciences/Computational biology and bioinformatics/Software; Biological sciences/Computational biology and bioinformatics/Data acquisition; Biological sciences/Computational biology and bioinformatics/Data processing

Background
Understanding the characteristics of species across the fungal phylogenetic tree requires identifying certain sequences in their proteomes. These sequences may in turn correlate with the environment and its conditions 1.
Phylogenetic analysis provides a framework for research and for identifying similarities and conserved regions. Identifying and characterizing proteins requires a detailed analysis of proteomes. Human experts can perform this task, but it becomes challenging at large scale, and a critical aspect of any automatic process is the efficient traversal of proteomes. In the biological field, phylogenetic trees are mostly traversed by hard-coded algorithms.

Some resources can search a string for regular expressions. One example is grep 2, a text-processing program designed for pattern matching within text; the string can be a proteome or any other text sequence. Using grep has several requirements. grep is included with Linux, whereas users of other operating systems must obtain an equivalent tool. The user must also know how to run it from a bash terminal and how to use its wildcards and syntax. In addition, grep has no GUI to display the results.

Another resource that uses regular expressions is msgfdb2pepxml 3. This library converts the output of the MS-GFDB search engine to pepXML, uses regular expressions to recognize the enzymes used and their cleavage rules, and supports PSI-MS. Running it requires knowledge of the Python programming language and of the library's syntax, and, being a library, it has no GUI to present the results.

PhyloPattern 4 is a library for identifying regular expressions in phylogenetic trees; it is not focused on proteomes or any other biological sequence. Running it requires installing the PhyloPattern library in a Prolog engine and knowing the Prolog programming language and its syntax.
Moreover, this library does not directly provide a GUI.

PatScan 5 is a program that searches protein or nucleotide sequences for a pattern (regular expression). Running it requires compiling the source files, a FASTA file to search, and knowledge of terminal use and syntax. PatScan 5 and PatMatch 6 can also identify repeats using regular expressions. However, neither program automates pattern searching, because the user must enter the complete sequence to be searched. Entering every sequence of a genome or proteome can take a long time, given their size and number. PatMatch requires access to the tool through its website and knowledge of its pattern syntax. It also has three limitations: it handles only peptide and nucleotide sequences, the search length is limited to fewer than 20 residues, and it processes only one sequence at a time.

Searching for regular expressions is therefore notably time-intensive when sequences must be entered into the software manually. Moreover, the programs and libraries mentioned above may require specific knowledge of commands, bash, file downloads, or programming. This seriously complicates large-scale regular expression searches for users who are not proficient with programs or libraries for analyzing biological sequences. These tools offer simple pattern-matching systems that may or may not include a GUI, and setting them up can be challenging. In addition, the source of the information they process must be trustworthy.
Fortunately, the Joint Genome Institute (JGI) 7 provides biological sequences, such as DNA and protein sequences, whose information has been validated and is trustworthy. In this context, there is a need for software that integrates easily into automatic genome-scale processes. Such software should read and analyze proteomes at large scale, detect matches, and save considerable time without downloading files.

To this end, we present FungiRegEx. The software takes the information available from the JGI Mycocosm portal and, through its integrated web scraper module, searches the proteome databases of multiple species for a user-defined regular expression. It also provides a user-friendly GUI. Using it does not require installing additional components, downloading additional files, or solid knowledge of any programming language.

The software helps recognize repeated sequences. Such sequences are highly significant because they offer valuable insights into the functional and evolutionary roles of diverse organisms 8,9, driving evolution, inducing variation, and regulating gene expression 10. These patterns can be important for identifying certain protein functions or key structural regions. For example, searching for protein sequences that contain a specific pattern can help identify proteins that bind to certain ligands or have specific enzymatic activity 11. Searching for repetitive patterns in protein sequences can also help identify evolutionarily related proteins, which provides information about how proteins and their functions evolve over time 12. Numerous software applications are available for identifying various types of repeats. However, none of them focuses on pattern detection in fungi through web scraping; FungiRegEx does.
FungiRegEx uses a straightforward sequential search method to identify regular expressions directly in the protein sequences of fungal organisms. The prevalent approach uses a suffix tree or an alignment matrix as the primary data structure. The algorithm introduced in this paper instead identifies regular expressions directly within the protein sequence. This method has several advantages:
- It uses memory efficiently, owing to the way the scraper instances are launched and run.
- It is easier to understand and implement.
- It retrieves multiple sequences at once, much faster than a manual process or tools such as PatMatch 6, which require entering sequences one by one.

FungiRegEx also does not require downloading any FASTA or other file. Another relevant aspect is its GUI: unlike grep or other tools that require a bash interface, FungiRegEx does not require strong knowledge of any programming language or commands. The scraper module can also be adapted to the resources of the computer or server where it runs, using more or fewer scraper instances. Finally, the tool can be deployed on a server or on a personal computer.

In summary, various tools and resources exist for searching regular expressions in biological sequences, such as grep, msgfdb2pepxml, PhyloPattern, and PatScan. However, these tools often require specific knowledge, which limits their accessibility for users without programming proficiency. Manually entering sequences into them is also time-intensive and complex. To address these challenges, we introduce FungiRegEx, user-friendly software designed for automatic genome-scale proteome analysis.
Integrated with a web scraper module, FungiRegEx efficiently searches the proteome databases of multiple fungal species from the JGI Mycocosm database for user-defined regular expressions. Notably, FungiRegEx provides a GUI, which eliminates the need for additional downloads or programming knowledge.

Materials and methods
Architecture of FungiRegEx
The FungiRegEx front-end is built with React JS v17.0.2, a freely available, open-source JavaScript library for building user interfaces 13. The back-end runs on Node JS v16.17, a JavaScript runtime built on the V8 JavaScript engine 14. It works together with Chromium v79.0.3945.117, an open-source browser project dedicated to a more secure, faster, and more dependable way for users to experience the web 15.

A collection of React JS components provides an interactive GUI (see Fig. 1) in which the user can:
- choose the search parameters, including the user-defined regular expression that the Node JS back-end searches for in the proteomes;
- pick the fungal species to search;
- view the results;
- optionally, download the results in CSV format.

The Node JS back-end launches the scraper instances against the JGI Mycocosm database and searches the proteome retrieved by each instance for the regular expression. To save RAM, the back-end reuses each Chromium instance once it has retrieved its information. If an error occurs, the instance is restarted automatically.

FungiRegEx works as follows. First, the user selects the type of search: either across all 2,402 fungal species or within a specific species. New taxonomic additions will not be available in the software unless the user adds them.
Second, the user chooses whether to search a range of identifiers or a list of identifiers in the database. Third, when the search starts, the Node JS script scrapes the proteome sequences from the Joint Genome Institute, and React JS displays them in a table. The table can be sorted alphabetically by species or by the number of matches in the proteome, in ascending or descending order, and its last column shows the matches. Fourth, the results in the table can be downloaded to an output file in CSV format.

FungiRegEx is distributed as a compressed ZIP file. The source code is available for download at SourceForge and in the GitHub repository https://github.com/Maigolinox/fungiregex. After the archive is uncompressed, the FungiRegEx directory contains an empty results.csv file and a package.json file with the instructions (npm scripts) needed to run the web application. For FungiRegEx to run correctly, the user must download the prerequisites listed in Table 1 and in the documentation from their respective platforms, then uncompress and install them. The front-end is started with npm run start:frontend, and the back-end with npm run start:backend. The documentation provides further instructions for each operating system.

Validation of FungiRegEx results
FungiRegEx was tested by querying the databases of various species available on the JGI Mycocosm website. The results were validated by searching for the sequence AXSXG as a regular expression. This pentapeptide, found in a group of lipases, confers thermostability and solvent resistance on the enzyme 16, and it has been little described in fungi. The validation used the following parameters:
Species: Saccharomyces cerevisiae (SacceM3836_1) 17–20.
Pattern: the AXSXG pentapeptide, where X represents any amino acid. To obtain results, the search was run with the regular expression A.S.G, where "." in regular expression notation matches any amino acid.
Range: identifiers 1 to 2000. FungiRegEx therefore launches scraper instances to retrieve 2000 proteome entries from the JGI Mycocosm database.

After the search, the tool showed that, of the 2000 scraped sequences, only the sequence with identifier 1434 contains that pentapeptide, exactly once. The tool also identified matches in 281 similar sequences. Figure 2 shows the results from the JGI Mycocosm database, and Fig. 3 shows the match found by the tool. As mentioned, the tool also searches each complete sequence for other matches of the regular expression (see Fig. 4). In that figure, the proteome column is hidden to keep the image to a manageable size. The tool can thus find the regular expression in the proteome selected by the user, which may be of interest for subsequent studies.

A second use case for validating FungiRegEx involves the search for effectors. Liping Liu et al. 22 identified different effectors, such as RXLR, and noted that fungi, oomycetes, and bacteria release small secreted proteins that are crucial for symbiotic interaction and pathogenicity. They investigated effectors in different species, such as Mg3LysM (Mycosphaerella graminicola LysM), which is secreted by Mycosphaerella graminicola 23,24. For this example, the JGI Mycocosm database of Mycosphaerella graminicola v2.0 is searched for the RXLR sequence, where X represents any amino acid. As a regular expression, this is R.LR. Figure 5 shows the FungiRegEx results, with the proteome column hidden because of the size of the proteomes. As shown in Fig.
5, FungiRegEx finds the effector of interest in the proteome and shows the number of matches and the matching sequences. The results also confirm that Mycosphaerella graminicola does carry this effector (RXLR), as stated by Liping Liu et al. 22.

Implementation
The regular expression search for fungal organisms is based on finding exact repeats of length k along the amino acid chain. The regular expression can be as long as the user wants, and the smaller k is, the faster the comparison proceeds. Sequences in which the regular expression is found are kept, and sequences that do not match are filtered out.

Searching for regular expression matches
The application back-end begins its search by creating a regular expression object. Regular expressions are patterns used to match character combinations in strings, and they can contain various special characters and modifiers that define the pattern to search for 25. The size of the search range directly affects the algorithm's processing time, so smaller ranges are preferred for optimal efficiency.

If matches between the regular expression and the protein sequence are found
The search process identifies the regular expression within the amino acid chain with a length of k, as illustrated in Fig. 3. The detected repeats do not accurately reflect the true length of the repeating pattern, so they must be expanded to the actual repeat length. In this paper, the chosen approach reads the characters to the left or right of every repeat and stores them in an array.

If no matches between the regular expression and the protein sequence are found
If no matches are found, the algorithm continues the search in the next amino acid chain of the proteome. The search algorithm can advance in larger intervals without overlooking any repetitions.
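The match-and-expand steps described above can be sketched in JavaScript, the language FungiRegEx is written in. This is a minimal illustration under stated assumptions, not the tool's actual code. The function names `motifToRegExp`, `findMatches`, and `searchProteome`, the three-residue flank, and the toy sequences are all hypothetical. A fixed-width flank is also a simplification of the expansion step. The sketch converts the X notation used in the validation section (AXSXG, RXLR) into a RegExp and scans each sequence in turn. Sequences without a match are skipped, and each match is stored together with the residues to its left and right.

```javascript
// Minimal sketch of the matching step (hypothetical; not FungiRegEx's code).

// Convert the "X = any amino acid" notation (e.g. AXSXG, RXLR) to a global RegExp.
// Intended only for that notation, not for arbitrary regular expressions.
function motifToRegExp(motif) {
  return new RegExp(motif.toUpperCase().replace(/X/g, "."), "g");
}

// Scan one protein sequence and keep `flank` residues on each side of every match.
// With a global RegExp and exec(), matches are reported left to right without overlap.
function findMatches(sequence, regex, flank = 3) {
  const matches = [];
  regex.lastIndex = 0; // a global RegExp keeps state, so reset it per sequence
  let m;
  while ((m = regex.exec(sequence)) !== null) {
    const start = m.index;
    const end = start + m[0].length;
    matches.push({
      position: start + 1, // 1-based residue position
      match: m[0],
      left: sequence.slice(Math.max(0, start - flank), start),
      right: sequence.slice(end, end + flank),
    });
    if (m[0].length === 0) regex.lastIndex++; // guard against empty matches
  }
  return matches;
}

// Sequential search: sequences without a match are skipped (filtered out).
function searchProteome(records, motif, flank = 3) {
  const regex = motifToRegExp(motif);
  const results = [];
  for (const { id, sequence } of records) {
    const matches = findMatches(sequence, regex, flank);
    if (matches.length > 0) results.push({ id, count: matches.length, matches });
  }
  return results;
}

// Toy sequences invented for illustration; they are not JGI Mycocosm entries.
const toyProteome = [
  { id: 1, sequence: "MKTAHSMGLLRQLRAS" },
  { id: 2, sequence: "MSTNPKPQRKT" },
];
console.log(JSON.stringify(searchProteome(toyProteome, "AXSXG")));
// [{"id":1,"count":1,"matches":[{"position":4,"match":"AHSMG","left":"MKT","right":"LLR"}]}]
```

A global RegExp with `exec()` resumes after each match, so overlapping occurrences are not reported. A lookahead-based variant would be needed to find those.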
The crucial point when advancing in larger intervals is to ensure that the search algorithm never overlooks a match.

Processing speed
The algorithm's processing time for a given regular expression can be approximated by the following formula. Reducing the size of the search intervals can improve processing speed. However, speed may decrease if the intervals become too small, because the program then spends more time launching browser instances than acquiring information and searching for the regular expression. With 200 Puppeteer instances, 50,000 results are obtained in approximately 139 minutes, according to the estimated time shown by the monitor in Fig. 7. This depends on the available computer resources. Depending on those resources, at least seven pages can be consulted per second; at that rate, 50,000 pages take about 50,000 / 7 ≈ 7,143 s, or roughly two hours. Simply requesting a page from the JGI Mycocosm database and receiving a response takes around 6.64 seconds, as shown in Fig. 8.

Results
FungiRegEx was built to identify regular expressions in fungal proteomes. The results are as follows.
Efficiency in Proteomic Sequence Analysis: FungiRegEx streamlines the analysis of the proteomic sequences of the 2,402 species available in the JGI Mycocosm portal.
Real-time Data Retrieval without FASTA Downloads: the FungiRegEx scraper module eliminates the need to download any FASTA files for proteomic analysis. FungiRegEx dynamically requests data from the JGI Mycocosm website, ensuring real-time access to information.
Accelerated Results Retrieval: FungiRegEx retrieves results significantly faster than manual extraction of proteomic data from the JGI Mycocosm website.
Optimized Computational Resource Usage: FungiRegEx makes effective use of the computational resources available on the user's computer.
Adaptability and Customization: users can adapt FungiRegEx to run on any computer and customize the number of scraper instances the program uses.
User-Friendly GUI Presentation: the software's GUI presents results without requiring users to write code or have specialized knowledge, such as using bash.
Platform Independence: FungiRegEx does not require a specific operating system, which gives users flexibility in execution.
Local and Server Deployment Options: the tool can be run locally or deployed on a server, so users can choose their preferred execution mode.
Efficient Result Filtering: FungiRegEx provides result filtering, which helps identify specific sequences and interpret the results.
Console-Free User Experience: users can interpret results without a console, which makes the tool more accessible than tools that require console interaction.
Simplified Search Syntax with User-Defined Parameters: users can enter any sequence of interest and search either the entire proteome or a specific range. This is particularly useful for identifying sequences in specific species and may contribute valuable insights for further research.
Potential for Future Research: matches of the user-defined regular expression in given proteomes can be saved for further research. The validation section illustrates this with the AXSXG sequence, which was of interest because of the properties it confers. The identified sequences may or may not exhibit specific characteristics, paving the way for future research and exploration.
Details of the Matching Sequences: the tool reports the number of sequences that match the regular expression and shows the exact matches.

In conclusion, FungiRegEx improves the efficiency of proteomic sequence analysis. It is a user-friendly, adaptable, and customizable tool with advanced features for interpreting results and exploring them in subsequent research. The programs listed in Table 1 implement the FungiRegEx functionality.

Table 1 Components of FungiRegEx that implement its functionality.
Program: version
NodeJS: v16.17.0
ReactJS: v18.0.0
Chromium: v79.0.3945.117
Nodemon: v2.0.20
Axios: v0.27.2
ExpressJS: v4.17.13
Puppeteer cluster: v0.23.0
Regex Parser: v2.2.11
Cheerio: v1.0.0-rc.12
CORS: v2.8.5
DotEnv: v16.0.1
Flatted: v3.2.6

Discussion
This section discusses the essential findings and limitations of FungiRegEx, a tool designed to streamline the analysis of proteomic sequences from the JGI Mycocosm database. The results highlight the tool's efficiency in handling a large dataset of 2,402 fungal species, with faster retrieval of results and optimized use of computational resources. FungiRegEx also introduces user-friendly features such as real-time data retrieval, adaptability through customization, and a graphical user interface (GUI). Together, these offer a promising solution for researchers engaged in proteomic analysis. However, like any technological advance, the tool has limitations and considerations that bear on its practical use. These include speed considerations, the risk of IP blocking, and the single-task limitation. The following discussion assesses these aspects critically, giving researchers a nuanced perspective on the tool's capabilities and areas for improvement. The results are presented in Table 2 and Table 3; the information was divided into two tables because of its size.
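Several of the trade-offs weighed in the tables come down to how many scraper instances run at once and what happens when one fails. These include speed versus resource use, the risk of IP blocking with many instances, and automatic restarts. The following JavaScript sketch is a generic bounded-concurrency pool with retry. It is not FungiRegEx's own code and does not use the puppeteer-cluster API listed in Table 1. `runPool`, `fakeFetchProteome`, and the `MAX_SCRAPERS` environment variable are hypothetical, and the stub stands in for a Puppeteer page visit so the example runs offline.

```javascript
// Generic bounded worker pool with retry (hypothetical sketch, not FungiRegEx code).
// Each worker plays the role of one scraper instance: it takes the next id,
// runs the task, and retries a failed id up to `retries` extra times.

// Hypothetical environment variable; falls back to 4 workers if unset or invalid.
const DEFAULT_SCRAPERS = Number.parseInt(process.env.MAX_SCRAPERS ?? "", 10) || 4;

async function runPool(ids, task, { concurrency = DEFAULT_SCRAPERS, retries = 2 } = {}) {
  const results = new Map(); // id -> task result
  const failed = []; // ids that still failed after all retries
  let next = 0; // shared cursor; safe because this code runs on a single thread

  async function worker() {
    while (next < ids.length) {
      const id = ids[next++]; // check and increment happen with no await in between
      for (let attempt = 0; ; attempt++) {
        try {
          results.set(id, await task(id));
          break; // success: move on to the next id
        } catch {
          if (attempt >= retries) {
            failed.push(id); // give up on this id after the last retry
            break;
          }
          // Otherwise try again, analogous to restarting a failed instance.
        }
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, ids.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { results, failed };
}

// Offline stand-in for a scraper visit (hypothetical): waits briefly, fails the
// first attempt for id 3 to exercise the retry path, and tracks concurrency.
const attempts = new Map();
let inFlight = 0;
let maxInFlight = 0;

async function fakeFetchProteome(id) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  try {
    attempts.set(id, (attempts.get(id) ?? 0) + 1);
    await new Promise((resolve) => setTimeout(resolve, 5));
    if (id === 3 && attempts.get(id) === 1) throw new Error("transient failure");
    return `SEQ${id}`;
  } finally {
    inFlight--;
  }
}
```

For example, `runPool(ids, fakeFetchProteome, { concurrency: 3 })` never has more than three requests in flight. Raising `concurrency` shortens completion time at the cost of more load on the remote site and a higher risk of being blocked.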
Table 2 Advantages and limitations of FungiRegEx.
Advantage: FungiRegEx has demonstrated remarkable efficiency in analyzing proteomic sequences across the extensive dataset of 2,402 species on the JGI Mycocosm portal. This efficiency is crucial for accelerating research processes and enabling broader research.
Limitation: users must manually enter the characteristics of newly added fungal species. Otherwise, users are restricted to the 2,402 initially registered species, which may limit the software's applicability to future taxonomic additions.

Advantage: the FungiRegEx scraper module eliminates the need to download FASTA files, which is a significant advancement. Real-time data retrieval from the JGI Mycocosm website gives immediate access to the latest proteomic information and contributes to the timeliness of research outcomes.
Limitation: the tool needs an internet connection to retrieve information from the website. Retrieval time also depends on the user's connection, which limits search speed.

Advantage: FungiRegEx retrieves results faster than manual extraction on the JGI Mycocosm website, which underscores its potential to save researchers valuable time. This faster data retrieval could have substantial implications for large-scale studies.
Limitations: [1] The speed of FungiRegEx depends on internet speed, available computational resources, and the number of deployed scraper instances. Balancing these factors is crucial: fewer instances reduce resource consumption but extend task completion time. [2] Deploying many scraper instances (over 100) risks temporary IP blocking by the JGI Mycocosm website, so a careful balance is needed to avoid problems with site access.

Advantage: FungiRegEx makes effective use of computational resources, so the tool operates smoothly and provides a seamless user experience.
Limitation: the user must configure the resources that FungiRegEx uses by changing its parameters.

Advantage: FungiRegEx can be used on any computer, and the number of scraper instances can be customized, which makes it versatile. Users can tailor the tool to their computational capabilities and requirements.
Limitations: [1] The user must configure the resources that FungiRegEx uses. [2] FungiRegEx currently supports only one task at a time. If multiple users attempt simultaneous searches, the tool prioritizes the latest task and may delete the previous user's task, which may affect concurrent usage.

Advantage: the user-friendly GUI simplifies the presentation of results. It makes the tool accessible to users without specialized coding knowledge and promotes ease of use in the scientific community.
Limitation: the results table shows the complete proteome on a single line, which makes the table hard to read for very long proteomes. The GUI could therefore be improved to make the results easier to read and present.

Advantage: FungiRegEx is platform independent and can be deployed locally or on a server, which gives users flexibility. Researchers can choose the execution mode that fits their preferences and available infrastructure.
Limitation: although FungiRegEx offers flexible deployment, users are advised to install the application locally, because FungiRegEx runs a single task at a time.

Table 3 Advantages and limitations of FungiRegEx.
Advantage: the user-defined parameters in the search syntax, including the ability to search for sequences in specific species, make the tool applicable to diverse research scenarios.
Limitation: the user needs to know how many proteomes to search, that is, the length of the proteome of the species of interest.

Advantage: as demonstrated in the validation section, the user-defined regular expression can be identified and filtered in the FungiRegEx results table for future investigations. This highlights the tool's potential for ongoing and future research, since users can explore specific sequences of interest and their potential characteristics within the results already obtained.
Limitation: the user must already know the sequence of interest to search for in the proteome of a given species.

Advantage: FungiRegEx can store the results in CSV format.
Limitation: FungiRegEx has no internal database, so if the user does not save the results, all regular expression search results for the scraped proteomes are lost.

Advantage: the tool reports the number of matching sequences and the exact matches, which makes the results more transparent and helps researchers refine their analyses.
Limitation: the tool locates the sequences that match the regular expression, but the matches are not visually distinguished when the proteome is listed in the results table. For example, the font color of the matching part is not changed, which makes the matches hard to locate in the proteome.

Declarations
Acknowledgements
None.
Funding
Dr. Miguel Angel Canseco Pérez supported the development of FungiRegEx with personal financial resources.
Availability of data and materials
The data used or analyzed with this software tool are available in the Joint Genome Institute database and from the corresponding author.
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Authors’ contributions
VMTM, MACP: Investigation, Web-Application Development, Writing, Resources, Project administration, Formal Analysis, Validation. JMM, MAMM, MTH: Writing, Review & editing.

Availability and Requirements
Project name: FungiRegEx.
Project home page: https://sourceforge.net/projects/fungiregex/
Operating system(s): Windows, Linux.
Programming language: JavaScript.
License: Creative Commons Attribution Non-Commercial License V2.0.
Any restrictions to use by non-academics: None.

References
1. Muggia, L. et al. An overview of genomics, phylogenomics and proteomics approaches in Ascomycota. MDPI Life 10, 356, DOI: 10.3390/life10120356 (2020).
2. Bull, R., Trevors, A., Malton, A. & Godfrey, M. Semantic grep: regular expressions + relational abstraction. In Ninth Working Conference on Reverse Engineering, 2002. Proceedings, 267–276, DOI: 10.1109/WCRE.2002.1173084 (2002).
3. Nagaev, B. et al. msgfdb2pepxml (2011).
4. Gouret, P. et al. PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees. BMC Bioinformatics 10, 298, DOI: 10.1186/1471-2105-10-298 (2009).
5. Dsouza, M. et al. Searching for patterns in genomic data. Trends Genet. 13, 497–498, DOI: 10.1016/s0168-9525 (1997).
6. Yan, T. et al. PatMatch: a program for finding patterns in peptide and nucleotide sequences. Nucleic Acids Res. 33, W262–W266, DOI: 10.1093/nar/gki368 (2005).
7. Joint Genome Institute (JGI). About us (2022).
8. Achaz, G. et al. Associations between inverted repeats and the structural evolution of bacterial genomes. Genetics 164, 1279–1289, DOI: 10.1093/genetics/164.4.1279 (2003).
9. van Belkum, A. et al. Short-sequence DNA repeats in prokaryotic genomes. Microbiol. Mol. Biol. Rev. 62, 275–293, DOI: 10.1128/MMBR.62.2.275-293.1998 (1998).
10. Liao, X. et al. Repetitive DNA sequence detection and its role in the human genome. Commun. Biol. 6, 954, DOI: 10.1038/s42003-023-05322-y (2023).
11. Roche, D. B. et al. Proteins and their interacting partners: an introduction to protein–ligand binding site prediction methods. Int. J. Mol. Sci. 16, DOI: 10.3390/ijms161226202 (2015).
12. Merski, M. et al. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. BMC Bioinformatics 21, DOI: 10.1186/s12859-020-3493-y (2020).
13. Meta Platforms. Getting started, what is React, and documentation (2020).
14. OpenJS Foundation. Getting started, what is Node.js, and documentation (2020).
15. Google LLC. Getting started, what is Chromium, and documentation (2020).
16. Gutiérrez-Domínguez, D. E. et al. Identification of a novel lipase with AHSMG pentapeptide in Hypocreales and Glomerellales filamentous fungi. Int. J. Mol. Sci. 23, 9367, DOI: 10.3390/ijms23169367 (2022).
17. Cherry, J. M. et al. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources. Genetics, DOI: 10.1093/genetics/iyab224 (2022).
18. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, DOI: 10.1093/nar/25.17.3389 (1997).
19. Schaffer, A. A. et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res., DOI: 10.1093/nar/29.14.2994 (2001).
20. Wong, E. D. et al. Saccharomyces Genome Database update: server architecture, pan-genome nomenclature, and external resources. Genetics, DOI: 10.1093/genetics/iyac191 (2023).
21. Brown, S. D. et al. Genome sequences of industrially relevant Saccharomyces cerevisiae strain M3707, isolated from a sample of distillers yeast and four haploid derivatives. Genome Announcements 1, DOI: 10.1128/genomeA.00323-13 (2013).
22. Liu, L. et al. Arms race: diverse effector proteins with conserved motifs. Plant Signal. Behav. 14, 1557008, DOI: 10.1080/15592324.2018.1557008 (2019). PMID: 30621489.
23. Marshall, R. et al. Analysis of two in planta expressed LysM effector homologs from the fungus Mycosphaerella graminicola reveals novel functional properties and varying contributions to virulence on wheat. Plant Physiol. 156, 756–769, DOI: 10.1104/pp.111.176347 (2011).
24. Lee, W.-S., Rudd, J. J., Hammond-Kosack, K. E. & Kanyuka, K. Mycosphaerella graminicola LysM effector-mediated stealth pathogenesis subverts recognition through both CERK1 and CEBiP homologues in wheat. Mol. Plant-Microbe Interact. 27, 236–243, DOI: 10.1094/MPMI-07-13-0201-R (2014).
25. MDN Web Docs. Regular expressions (2023).

Additional Declarations
No competing interests reported.

Supplementary Files
AXSXGresults.csv
RXLRresults.csv
derechosAutorFungiRegEx.pdf
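The discussion section describes two behaviours of the FungiRegEx results table:

- It reports the number of matching sequences together with the exact matched segments.
- It can be narrowed to a specific expression, as in the validation example where A.S.G hits were filtered down to AHSMG.

The same section also notes a limitation: the table does not mark where each match sits inside a long protein sequence.

The JavaScript sketch below illustrates these points using only the standard `RegExp` object and `String.prototype.matchAll`. It is not FungiRegEx's source code. The following are all hypothetical:

- the helper names (`findMatches`, `markMatches`, `buildRows`, `filterByExact`);
- the row layout;
- the example sequence.

The sketch also reports non-overlapping matches only, which may differ from how the tool counts them.

```javascript
// Minimal sketch with hypothetical helper names; this is NOT FungiRegEx's
// source code. It shows how exact matches, match counts and match positions
// can be obtained with JavaScript's built-in RegExp support.

// Returns the non-overlapping matches of `pattern` in `sequence`,
// each with its start (inclusive) and end (exclusive) offset.
function findMatches(sequence, pattern) {
  // String.prototype.matchAll() throws unless the RegExp has the "g" flag.
  const re = new RegExp(pattern, "g");
  const matches = [];
  for (const m of sequence.matchAll(re)) {
    matches.push({ text: m[0], start: m.index, end: m.index + m[0].length });
  }
  return matches;
}

// Wraps every matched segment in `open`/`close` markers so it can be found
// by eye; a GUI could apply a font color to the same offsets instead.
function markMatches(sequence, matches, open = "[", close = "]") {
  let out = "";
  let last = 0;
  for (const { start, end } of matches) {
    out += sequence.slice(last, start) + open + sequence.slice(start, end) + close;
    last = end;
  }
  return out + sequence.slice(last);
}

// Builds one hypothetical results row per protein: id, match count, exact matches.
function buildRows(proteins, pattern) {
  return proteins.map(({ id, sequence }) => {
    const matches = findMatches(sequence, pattern);
    return { id, count: matches.length, matches: matches.map((m) => m.text) };
  });
}

// Keeps only rows whose exact matches include `exact` (e.g. "AHSMG").
function filterByExact(rows, exact) {
  return rows.filter((row) => row.matches.includes(exact));
}

// Usage with a made-up sequence (not real proteome data):
const proteins = [
  { id: 1, sequence: "MKAHSMGLLAQSTGRAHSMG" },
  { id: 2, sequence: "MKLLQRT" },
];
const rows = buildRows(proteins, "A.S.G");
console.log(rows[0].count); // 3
console.log(markMatches(proteins[0].sequence, findMatches(proteins[0].sequence, "A.S.G")));
// MK[AHSMG]LL[AQSTG]R[AHSMG]
console.log(filterByExact(rows, "AHSMG").map((r) => r.id)); // [ 1 ]
```

Recording the start and end offsets of each match is what a front end would need to colour only the matched part of a sequence. Here, the bracket markers simply stand in for that styling.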
Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-3852782","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":267664840,"identity":"835d609c-4570-4577-a917-7c1c61a2e09b","order_by":0,"name":"Victor Terron-Macias","email":"","orcid":"","institution":"Centro de Investigación en Matemáticas CIMAT, A.C.","correspondingAuthor":false,"prefix":"","firstName":"Victor","middleName":"","lastName":"Terron-Macias","suffix":""},{"id":267664841,"identity":"31d7eb35-841a-49e0-a1af-3b1b0080cbed","order_by":1,"name":"Jezreel Mejía-Miranda","email":"","orcid":"","institution":"Centro de Investigación en Matemáticas CIMAT, A.C.","correspondingAuthor":false,"prefix":"","firstName":"Jezreel","middleName":"","lastName":"Mejía-Miranda","suffix":""},{"id":267664842,"identity":"4561f1d5-d544-4816-9452-2bf64f932fbe","order_by":2,"name":"Miguel Canseco-Pérez","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABE0lEQVRIiWNgGAWjYBADOTYIfYAHKsBMUIsx6VoSG6BaYAK4tei2H3728EeFXXof+wE26YKaOzL8s5sff2CosE5sEDtjgE2L2Zk0c2OeM8m5bTwJbNIzjj3jkbhzzEyC4Ux6YoN0WgJWLTcYzKQZ2w7ktjEAtfA2HOZhuJFgxsDYdhioJfkAdi3s3yR//juQzsb/AKJF/kb65w+M/0BaYB5E18JjJsHbcCCBTQJqi8GNHAMJxgY8tpzJKZPmOZZs2CbxsNl6xrHDPIZ3zpRJJBxLN27D5Zfjx7dJ/qixk5fvTz54u6DmsL3c7fbNHz7UWMv2S+dgDTEkwNgAiQsJIAYZz0ZAPRggtIyCUTAKRsEoQAIADA9egwaAu6IAAAAASUVORK5CYII=","orcid":"","institution":"Universidad Politécnica de 
Chiapas","correspondingAuthor":true,"prefix":"","firstName":"Miguel","middleName":"","lastName":"Canseco-Pérez","suffix":""},{"id":267664843,"identity":"1f14069a-26ee-49e3-a13b-ca32688a86f6","order_by":3,"name":"Mirna Muñoz-Mata","email":"","orcid":"","institution":"Centro de Investigación en Matemáticas CIMAT, A.C.","correspondingAuthor":false,"prefix":"","firstName":"Mirna","middleName":"","lastName":"Muñoz-Mata","suffix":""},{"id":267664844,"identity":"0a95acdc-ccf6-44bc-a0b6-6e8299793dff","order_by":4,"name":"Miguel Terron-Hernández","email":"","orcid":"","institution":"Universidad Tecnológica de Tlaxcala","correspondingAuthor":false,"prefix":"","firstName":"Miguel","middleName":"","lastName":"Terron-Hernández","suffix":""}],"badges":[],"createdAt":"2024-01-11 08:19:44","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3852782/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3852782/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":49820863,"identity":"2dd49658-508e-4282-940d-5114774fdfa7","added_by":"auto","created_at":"2024-01-18 14:50:13","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":112966,"visible":true,"origin":"","legend":"\u003cp\u003eGUI of FungiRegEx.\u003c/p\u003e","description":"","filename":"Figure1GUIofFungiRegEx.png","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/57acac21b1a39bb3f5bbdb47.png"},{"id":49820864,"identity":"a97e6d56-2601-4cee-92b4-e516b25c60d2","added_by":"auto","created_at":"2024-01-18 14:50:14","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":293183,"visible":true,"origin":"","legend":"\u003cp\u003eRetrieved sequence of protein Saccharomyces cerevisiae M3836 v1.0. 
with ID 1434.\u003ca href=\"#_bookmark27\"\u003e\u003csup\u003e21\u003c/sup\u003e\u003c/a\u003e\u003c/p\u003e","description":"","filename":"Figure2Retrievedsequenceofprotein.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/35443a72e521419b146d2487.jpg"},{"id":49821543,"identity":"8016cff9-9fb1-42f8-927c-1eb280ac685a","added_by":"auto","created_at":"2024-01-18 15:06:14","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":156216,"visible":true,"origin":"","legend":"\u003cp\u003eResults from FungiRegEx using A.S.G regular expression filtering by the specific Expression: AHSMG.\u003c/p\u003e","description":"","filename":"Figure3ResultsfromFungiRegExfilteringbythespecific.jpg","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/72b583b219af7341af321728.jpg"},{"id":49821078,"identity":"f4724f90-433e-42a1-aa44-269f4851661b","added_by":"auto","created_at":"2024-01-18 14:58:14","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":134139,"visible":true,"origin":"","legend":"\u003cp\u003eResults from FungiRegEx hiding the proteome column using A.S.G regular expression.\u003c/p\u003e","description":"","filename":"Figure4ResultsfromFungiRegExhidingtheproteomecolumn.png","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/640e43e0a7273d3725292fcb.png"},{"id":49821542,"identity":"28bcead4-ba78-49a0-8c69-5c9c0a50fa9f","added_by":"auto","created_at":"2024-01-18 15:06:14","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":132650,"visible":true,"origin":"","legend":"\u003cp\u003eResults from FungiRegEx hiding the proteome column using R.LR effector regular 
expression.\u003c/p\u003e","description":"","filename":"Figure5RXLRresults.png","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/59b4a6d21b08afed810171b1.png"},{"id":49820870,"identity":"71bac052-903d-4029-86c9-a8679af27f4f","added_by":"auto","created_at":"2024-01-18 14:50:14","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":32345,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAn occurrence of a match. \u003c/strong\u003eCoincidences in the amino acid chain are detected by reading all and scanning the regular expression for k matches.\u003c/p\u003e","description":"","filename":"Figure6Anoccurrenceofamatch.png","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/d4ade8edc0cb82fb28b55cb1.png"},{"id":49820865,"identity":"f22a94e6-190c-48f5-9e4b-e4c76a55facd","added_by":"auto","created_at":"2024-01-18 14:50:14","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":138019,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eProgress monitor and calculation of processing time. \u003c/strong\u003eThe Puppeteer cluster includes a tool that monitors the progress of data acquisition and the performance of each instance.\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/52962f521ed43caeb43f308a.png"},{"id":49820868,"identity":"6c9d2a30-5369-46d3-a5e1-c601c49e8b4f","added_by":"auto","created_at":"2024-01-18 14:50:14","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":95979,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eNetwork tool in web browser. 
\u003c/strong\u003eYou can see the time it takes to request the JGI Mycocosm database one by one server is 6.64 seconds; if the process were manual for 50,000 requests, it would take approximately 92 hours.\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/5ef60888e393e87bf75cea66.png"},{"id":52066629,"identity":"5770a486-4a11-4e57-905a-0adc2774371f","added_by":"auto","created_at":"2024-03-06 06:54:38","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1304509,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/8dee3cc9-abf5-4a56-b84e-8d601195e7f8.pdf"},{"id":49820869,"identity":"0a9ddd3b-de8d-419d-8f92-1d5bbdeae262","added_by":"auto","created_at":"2024-01-18 14:50:14","extension":"csv","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":1177594,"visible":true,"origin":"","legend":"","description":"","filename":"AXSXGresults.csv","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/1fa01a9872fe861ab7501128.csv"},{"id":49821080,"identity":"f5bfb383-c3d4-4e6b-a0f7-ee97fa3200ab","added_by":"auto","created_at":"2024-01-18 14:58:14","extension":"csv","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":1324812,"visible":true,"origin":"","legend":"","description":"","filename":"RXLRresults.csv","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/910ee254e4c3ae9224b89ddc.csv"},{"id":49820874,"identity":"20176677-33e6-47ad-8be5-98e4bdf48a0e","added_by":"auto","created_at":"2024-01-18 
14:50:14","extension":"pdf","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":472951,"visible":true,"origin":"","legend":"","description":"","filename":"derechosAutorFungiRegEx.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3852782/v1/1e4be53ef8533021ef0f11e0.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"FungiRegEx: A tool for patterns identification in Fungal Proteomic sequences using regular expressions","fulltext":[{"header":"Background","content":"\u003cp\u003eUnderstanding the characteristics of the species of the fungi phylogenetic tree requires the identification of certain sequences in the proteomes, which in turn may be correlated with the environment and its conditions\u003csup\u003e1\u003c/sup\u003e. The phylogenetic analysis provides a framework to develop research and identify multiple similarities and conservation zones. To identify and characterize proteins, a detailed analysis of proteomes is essential. Human experts can perform this task, but the analysis is challenging on a large scale.\u003c/p\u003e \u003cp\u003eA critical aspect of such automatic processes is the efficient traversal of proteomes. 
In the biological field, hard-coded algorithms mostly traverse phylogenetic trees, and some resources, such as grep\u003csup\u003e2\u003c/sup\u003e, which is a text-processing program designed for regular pattern matching within the text, allow the search of regular expressions in a string; this string can be a proteome or any other type of text sequence; some requisites of grep are:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eIf the user runs Linux as the Operating System, this tool is included; if not, the user must acquire a similar tool for the respective Operating System that performs the same functions as grep.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eKnowledge about how to use the bash to execute this tool from a terminal.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eKnowledge about how to use the wildcards and syntax to perform the search. 
Note: This program has no GUI to display the results.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eAnother resource to find regular expressions is msgfdb2pepxml\u003csup\u003e3\u003c/sup\u003e; this resource is a library that converts the output from the MS-GFDB search engine to pepXML, uses regular expressions to recognize enzyme uses and cleavage rules, and supports PSI-MS, in order to execute this library requires:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eKnowledge in Python programming language.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eKnowledge of the syntax of the library to use it correctly.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eNote\u003c/strong\u003e \u003cp\u003eAs this is a library, we do not have a GUI to present the results directly.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eAnother resource to find regular expressions is PhyloPattern\u003csup\u003e4\u003c/sup\u003e, which is a library focused on identifying regular expressions in phylogenetic trees; this library is not focused on proteomes or any other biological sequence, also to execute this tool requires:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eProlog syntax knowledge.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eInstall in Prolog engine the library PhyloPattern.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eKnowledge in Prolog programming language.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eNote\u003c/strong\u003e \u003cp\u003eknowledge another critical aspect is that this library does not provide a GUI directly.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eAnother resource is 
PatScan\u003csup\u003e5\u003c/sup\u003e; which is a program focused on searches for protein or nucleotide sequences of a pattern (regular expression). In order to execute this program the requisites are:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eCompile the source files.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eA FASTA file to perform the search on.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eKnowledge in terminal use and syntax.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eAnother resource to identify repeats using regular expressions is Patscan\u003csup\u003e5\u003c/sup\u003e and PatMatch\u003csup\u003e6\u003c/sup\u003e. These programs do not automate searching for patterns within the sequences because they require the user to write the complete sequence in which they want to search for the pattern; entering all the sequences of a genome or proteome can take a long time for the size and amount of elements. PatMatch requires:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eAccess to this tool through their website.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eKnowledge about the pattern syntax to perform the search.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eNotes: [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] Focuses only on peptide and nucleotide sequences. [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] The length limitation to search for is less than 20 residues. 
[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] Only can process one sequence by time.\u003c/p\u003e \u003cp\u003eThe process of searching for regular expressions is notably time-intensive if the sequences are introduced to any software manually.\u003c/p\u003e \u003cp\u003eHowever, the programs and libraries already mentioned could require specific knowledge, like commands, bash, computer, download of files, or programming knowledge; this seriously complicates the process of searching for regular expressions on a large scale for users without proficiency in using programs or libraries that help analyze biological sequences. Tools like those already mentioned above are examples of software that offers a simple pattern-matching system, which may or may not include a GUI, and its implementation could be challenging. Also, the information source needed to function must be trustworthy.\u003c/p\u003e \u003cp\u003eFortunately, the Joint Genome Institute (JGI)\u003csup\u003e7\u003c/sup\u003e offers biological sequences like DNA or Protein/Nucleotide sequences with the certainty that all the information has been validated and is trustworthy.\u003c/p\u003e \u003cp\u003eIn that context, software that can be easily integrated into automatic genome-scale processes to read and analyze proteomes on a large scale, detect matches, and save considerable time without downloading files is now needed.\u003c/p\u003e \u003cp\u003eIn this way, we present FungiRegEx, a software that takes the available information from the JGI Mycocosm portal and performs a search into the proteome databases of the multiple species with the user-defined regular expression through its web scraper module integrated into the tool; also, it integrates a GUI with a user-friendly interface, to use it it is not necessary for the user to install any additional components, download additional files or have solid programming knowledge in any programming language.\u003c/p\u003e 
\u003cp\u003eThis software helps to the recognition of repeated sequences, which holds substantial significance, as it offers valuable insights into the functional and evolutionary roles of diverse organisms\u003csup\u003e8,9\u003c/sup\u003e, driving evolution, inducing variation, and regulating gene expression\u003csup\u003e10\u003c/sup\u003e. These patterns can be important for identifying certain protein functions or key structural regions. For\u003c/p\u003e \u003cp\u003eexample, searching for protein sequences containing a specific pattern can help identify proteins that bind to certain ligands or have specific enzymatic activity\u003csup\u003e11\u003c/sup\u003e. Searching for repetitive patterns in protein sequences can also help to identify evolutionarily related proteins, which can provide information about the evolution of proteins and their functions over time\u003csup\u003e12\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eNumerous software applications are accessible for the identification of various types of repeats. Nonetheless, no one has focused on FUNGI pattern detection through web scrapping; FungiRegEx does it. FungiRegEx employs a straightforward sequential search method to identify regular expressions directly from the protein sequences of FUNGI organisms. Diverging from the prevalent approach of employing a suffix tree or alignment matrix as a primary data structure, the algorithm introduced in this paper operates by directly identifying regular expressions within the protein sequence. As a result, this methodology exhibits efficiency in memory usage due to launching and running the scraper instances, boasts enhanced comprehensibility and ease of implementation, and offers great speed in getting multiple sequences at the same time compared to if the process were carried out manually or using tools like PatMatch\u003csup\u003e6\u003c/sup\u003e that requires to introduce one by one sequence. 
Also, FungiRegEx does not require downloading any fasta or file.\u003c/p\u003e \u003cp\u003eAnother relevant aspect of the tool is that it includes a GUI, which means the user does not need to have a strong knowledge of any programming language or commands to use it, compared to if grep or another tool that requires a bash interface were used. Also, the scrapper module is customizable to adapt it to the resources of the computer or server where it is executed (in case the user wants a greater or lesser number of scraper instances). Finally, this tool could be deployed on a server or a computer if the user wants to.\u003c/p\u003e \u003cp\u003eVarious tools and resources, such as grep, msgfdb2pepxml, PhyloPattern, and PatScan, exist for searching regular expressions in biological sequences. However, these tools often require specific knowledge, limiting accessibility for users without programming proficiency. The process of manually introducing sequences to these tools is time-intensive and complex.\u003c/p\u003e \u003cp\u003eAddressing these challenges, FungiRegEx is introduced as a user-friendly software designed for automatic genome-scale proteome analysis. Integrated with a web scraper module, FungiRegEx efficiently searches user-defined regular expressions in the proteome databases of multiple fungal species sourced from the JGI Mycocosm database. 
Notably, FungiRegEx stands out by providing a GUI, eliminating the need for additional downloads or programming knowledge.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003eArchitecture of FungiRegEx\u003c/h2\u003e\n \u003cp\u003eFungiRegEx front-end is based on React JS 17.0.2v, is a JavaScript library that is both available and open-source, designed for constructing interfaces\u003csup\u003e13\u003c/sup\u003e, and Node JS 16.17v, serving as the back-end, is a JavaScript runtime constructed upon the V8 JavaScript engine\u003csup\u003e14\u003c/sup\u003e, Chromium version 79.0.3945.117, an open-source browser project, is dedicated to creating a more secure, expeditious, and dependable means for users to engage with the web\u003csup\u003e15\u003c/sup\u003e. A collection of React JS components was created to execute these functions: provide an interactive GUI (see Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e) to choose specific parameters to perform a search through the NodeJS back-end with the user-defined Regular Expression into proteomes, pick the species or species into fungi organisms to perform the search, visualize the results of the search, and download the results in CSV format if the user wants to, also the Node JS back-end was designed to perform these tasks: launch the scrapper instances into the JGI Mycocosm database, with the obtained information of every instance the regular expression is looked for into the proteome, to save RAM memory the backend reuses each instance of Chromium once they have obtained the information, in case of error an automatic restart of the instance is performed.\u003c/p\u003e\n \u003cp\u003eFungiRegEx works as follows: first, the user selects the type of search he wants to perform globally into all 2,402 different species of fungi or a specific species (\u003cstrong\u003eThis means that new taxonomic additions will not be available in the 
software, unless that the user add it.\u003c/strong\u003e). Second, the user selects to scrap the regular expression into a range of identifiers or a list of identifiers in the database. Third, to start the search, the retrieved sequences of proteomes are scrapped into the Joint Genome Institute through the Node JS script and displayed in the table through React JS; the table can be ordered through alphabetical order of Specie or the number of matches into the proteome in ascending or descending order, the last column of the table displays the coincidences. Fourth, the results of the table can be downloaded in an output file in CSV format.\u003c/p\u003e\n \u003cp\u003eFungiRegEx is distributed as a compressed file in ZIP format. The source code is available for download at SourceForge and \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Maigolinox/fungiregex\u003c/span\u003e\u003c/span\u003e GitHub repository. Once the FungiRegEx directory has been uncompressed, it shows the FungiRegEx directory in which the results.csv file is empty, and package.json contains all the instructions to execute the web application correctly. The user must download each prerequisite from the platforms indicated in Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e and the documentation and uncompress and install them for the correct execution of FungiRegEx. The command to execute the front-end of FungiRegEx is \u003cem\u003enpm run start:frontend\u003c/em\u003e, and the command to execute the back-end of FungiRegEx is \u003cem\u003enpm run start:backend\u003c/em\u003e. 
Also, you can find more instructions in the documentation according to your operating system.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003eValidation of FungiRegEx results\u003c/h2\u003e\n \u003cp\u003eFungiRegEx was challenged by querying multiple databases of various species available on the JGI Mycocosm website., the results were validated with the search of the sequence AXSXG as a regular expression, which is a pentapeptide of a lipase group that brings thermostability and resistance to solvents of an enzyme\u003csup\u003e16\u003c/sup\u003e that has been little described in fungi.\u003c/p\u003e\n \u003cp\u003eSimilarly, the results that show FungiRegEx have been validated with the next considerations:\u003c/p\u003e\n \u003col\u003e\n \u003cli\u003eSpecie:\u0026nbsp;Saccharomyces\u0026nbsp;cerevisiae\u0026nbsp;(SacceM3836_1)\u003csup\u003e17\u0026ndash;20\u003c/sup\u003e.\u003c/li\u003e\n \u003cli\u003eAXSXG pentapeptide where X represents whatever amino acid.\u0026nbsp;\u003c/li\u003e\n \u003c/ol\u003e\n \u003cp\u003eTo bring some results, we perform the search with the next parameters:\u003c/p\u003e\n \u003col\u003e\n \u003cli\u003eA.S.G,\u0026nbsp;where\u0026nbsp;.\u0026nbsp;in\u0026nbsp;regular\u0026nbsp;expressions\u0026nbsp;notation\u0026nbsp;means\u0026nbsp;whatever\u0026nbsp;amino\u0026nbsp;acid.\u003c/li\u003e\n \u003cli\u003eSearch in a specific range: 1 to 2000. This means that FungiRegEx will launch the scrapper instances to retrieve the data in the JGI Mycocosm database from 2000 proteomes.\u003c/li\u003e\n \u003c/ol\u003e\n \u003cp\u003eAfter running the search, the tool showed that of the 2000 scraped sequences, only one with identifier 1434 has that pentapeptide only once, while it also identified matches in 281 sequences with similarity. Figure \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e shows the results of the JGI Mycocosm database and Fig. 
\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e shows the match with the tool:\u003c/p\u003e\n \u003cp\u003eAs mentioned, the tool also looks into the complete sequence for other similarities according to the regular expression (see Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e). We hide the proteome column for the image size to show the results of FungiRegEx with the mentioned parameters.\u003c/p\u003e\n \u003cp\u003eIn this way, the tool proves to be capable of finding the regular expression in the search proteome determined by the user, which may be of interest for subsequent studies.\u003c/p\u003e\n \u003cp\u003eA second use case executed to validate FungiRegEx functionality involves the search for effectors. Liping Liu et al.\u003csup\u003e22\u003c/sup\u003e identified different effectors, such as the RXLR, asserting that fungi, oomycetes, and bacteria release small secreted proteins crucial for symbiotic interaction and pathogenicity. Liping Liu investigated various effectors in different species, such as Mg3LysM (Mycosphaerella graminicola LysM), secreted by Mycosphaerella graminicola\u003csup\u003e23,24\u003c/sup\u003e. For this example, the JGI Mycocosm database of Mycosphaerella gramin\u0026iacute;cola v2.0 will be used with the RXLR sequence, where X represents any amino acid. In regular expression language, the regular expression is R.LR.\u003c/p\u003e\n \u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e shows the results of FungiRegEx hiding the proteome column due to the size of the proteomes to show the results in the figure.\u003c/p\u003e\n \u003cp\u003eAs shown in Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e, FungiRegEx can find the effector of interest in the proteome, showing the number of matches and the sequences. 
In addition to the above, it is identified that the species Mycosphaerella graminicola does indeed have this effector (RXRL), as stated by Liping Liu et al.\u003csup\u003e22\u003c/sup\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003eImplementation\u003c/h2\u003e\n \u003cp\u003eRegular expressions search for FUNGI organisms is based on finding exact repeats of length \u003cstrong\u003ek\u003c/strong\u003e along the amino acid chain. The regular expression can be as long as the user wants. If the protein sequence of length \u003cstrong\u003ek\u003c/strong\u003e is diminutive, the comparative process proceeds with greater expeditiousness. Once the regular expression is found in the protein sequence of the FUNGI organism, it is filtered to eliminate those that do not match.\u003c/p\u003e\n \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e\n \u003ch2\u003eSearching for regular expression matches\u003c/h2\u003e\n \u003cp\u003eThe application back-end begins its search by creating a regular expression object. Regular expressions are patterns used to search for character combinations in text strings. Regular expressions can contain various special characters and modifiers that define the pattern to search for\u003csup\u003e25\u003c/sup\u003e.\u003c/p\u003e\n \u003cp\u003eThe magnitude of the search range directly impacts the algorithm\u0026rsquo;s processing time; in this way, smaller ranges are preferred for optimal efficiency.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003eIf matches between regular expression and protein sequence are found\u003c/h2\u003e\n \u003cp\u003eThe aforementioned search process identifies the regular expression within the amino acid chain with a length of k, as illustrated in Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e. 
Because the detected repeats do not reflect the true length of the repeating pattern, they must be extended to the actual repeat length. The approach chosen in this work reads the characters located to the left or right of each repeat and stores them in an array.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003eIf no matches between the regular expression and the protein sequence are found\u003c/h2\u003e\n \u003cp\u003eIf no matches are found, the algorithm continues the search in the next amino acid chain of the proteome. The search can then advance in larger intervals without overlooking any repetitions.\u003c/p\u003e\n \u003cp\u003eThe key requirement when advancing in larger intervals is that the search algorithm never skips a match.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n \u003ch2\u003eProcessing speed\u003c/h2\u003e\n \u003cp\u003eThe time the algorithm takes to process the regular expression can be approximated by the following formula:\u003c/p\u003e\n \u003cp\u003e\u003cimg 
src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAALoAAAAyCAYAAADr7cFEAAAIwklEQVR4Ae2b21ElOwxFyYVE+CMWQiESAiEOYuHWombd2Qj367yG0y1V9djW29K2u4Gah8+mrsABKvBwgD32FrsCnw30BsEhKtBAP0Sbe5MN9Bti4OHh4ZPn+fn5hlG3hzLPl5eX7cZ/LN7e3r72iq/39/eT/VzKsIF+qUqu9APIX19fV2rfVu3j4+MLnACT5/HxcTIB9gGI50iwz+ncSjaf6a2yOEgcgcT4GwlgX/IQ8kY4561wyRo10C9ZzQVf3HBzt+SC+VXFAHzpht6aAHtlz7+BDgV0b1Qa6q3qKzgbYtPVoVHw/LbmlsKHt59rfWjv2kZ7w6mfeajDCECQZQx5mcOaQ6Nd+spYWRN0lnzySWNuWR98um/k6qmTe1aeeVx7fiigW0zBwkhDaBCNgJjz0HBkErrwJfSRMwJo5My1d64+o0DyABADPUlwGBdf5oqOPhnhp0wfjvoyloDWt3qO+Mr9yZ8a64Go+bhX7M1F0M/lPRXvXP7hgE7RAQqPTRdAFJNmCIosLo2zUfBpNMAQSMgFOXKaqYy1zTYmPAHLHMJHyiv4kOMz/f4x/TGYTwrMOXnOa2z5o5H45C4xr8DPWNYz66ftrcbDAR0w0pgEC03OJqCTtw7gy0aiy9pmC2JtbGw2sfpEhr4+GLXXnzJ0zSF56T/nxGJPleDlvpUbz/XSSA7YQNq6huf+kzeV01KsS8l/VuNSnn+pH8AkoEiRZiSI4SFPQCSokSNLINlEG4s8Y4x8wsOHNsRgzVPzQdcYeSDhj4jY9UCY88ge3ZrvyK+83PvIFl7qaDfSVXbt8XBArw0AQDySt5GAoDkAL4FfD0IFFjZzPomFPMFFXmmDDjHNgxyq3JzrONKFl/HSZqSf8pxzMNNP3Ts5ZyxqkcQ+PdzJv/b8UECnCbXwNCWbl3N0saE5AE5b1zanNk+5gLD56gNYdJLQJRcJG+3r4VNnaiRPbdHBb/pOO32vBR+585CfORoLH8ZWDxlzCLm1yRxuMf9e7Y0RLZIA2Gh+c3XypDlJFD6Lj1wec/cID8qDwFp9fVZ9+TRcvwJDGWPaoZc1JcYUUNNHztE3nkBLufOtvt2H+QleYrkvY7Mn5+ZCvH9Bm4BOkmysaT8VALBzB2EvO90EdE5l074qcJSerkKur6t8/XATsPZ1RfuV88rKtdDwNeY6R20Z/9XrLfPZ85wa0yN6eJRarwI6TQfQ9RVHkSoPMEMWk5HPHfh+h3oQ0MOHuqwpPk8l9dCtz0i/2vf6bwW8cGrv/mrsb7Ya6AlGywD485udOTxvecFJQQG3QNfedQKfJmDX1BW4ZAVWAd0buQb29pYPoOEJfm5a1t4cjHn7ciBc56HQX49dgUtVYBXQK0AJDjAFqckkcOEB8jwMyPO2zk8Q3wL6qqMHIW2c1zyqba+7AquAXgFK2QBXgtbPEEvq2tsdfn7+KE8f6Hj76+eU8enp6cd3vIeix58/4+yhJvR8jlYBnUJAeetyUwNWeQDWObqs8zZXzsgDIU+b0ZvjS7H/6QqcWYFFoHvzAnZv5xGv3vAAOD8pADE+koe/vE0ucZufWY8232kFFoG+0333tg5WgQb6wRp+1O020K/YeT/X8rPviuFOdp2fonv9fGygnwyPdYb+HLJO+/Za5gfY/YXB7bO4fsQG+pVrzA/f+QP4lcNtds/bxt+CbTa+I4MG+pWbxa9QfyuQOID5K+BTSsHePCz+Bo23A6SMGMgg3yDqyh99PqlziYti10C3UBQXomDw6ncovFpMeFN2+hHAfou7/goW/5kim1vjqCsYMj/9wgMI+jEvbXMUXNjwjEg/6uTfMkb6Szxy98CwP3IghjzWxsicmLMvntRl3/CmarWUz0g+rsRI8055FtgGZNHZEsWuPIGQW8Yevn78OwFN4YHPmMQ6QUAcm6ue
sRghbMyZNXx8MMKv9vphRC5gWJNT+kpd5uiS0zkEIDMn4pkzMog9WRv3gI088mTNw1wiN+si79Rx10Cn0BSWgtpQ5hbTRrC26BQydSwsoEg/Nko74hhDG3g8kqDI5uEn1+arjbmZs/w6kge+ksin8pSbizWQv3UkhrlZN3x64Nhb7lHdjMOerQHy9Jn1S5ut8++V2Wr9y/UtmGAkXRpgUU0/GwGP4mIr2TibZPNsgqBR37H61U5w4U8fyoyhD/JFRxv5dSRW7hM5awFX9dnflKzqzq3Jl9g87gX95DE3FjrKzJdc5DG3FvDS51weS7JdA50mWGAKMQIkRU0di57AkmcxaRBNwBZCXhtS/aKHXeoR1wZnDsYxXwEhv44CI3NGB5/14GhLHlMyddaMxKhxp+zQtWbonBPfnqz1sWug1yZQnFoY1vJoGDY8SYACW6mCBPsKxvSrXR4OeKyrHXEEDrKai75ytOkjXgIr5cTOPaVsae7hvOU4yok+TO2v6u8W6BSAQiQJ2ORbLEEOuASugKOhzvHHOgusPP0C0AQSOhXU6CeQ0U8fzKtN7sc5uWVOAj/jq8tI7uac/Hua26+1Oe8W6ACkggRQJSAokjwBB8gTlPXACCILLMiwSRJM8GtM9dIWHWInTdmljnPzMp780Uhd8kCNdJZ41qnmiF941hM/6iYPvrlmLmmLfsqyXuy39ncu5+/dmdNs2W4qAHimbvu1m6ygxQ6efpUDcg9wgjYPCHwuBh8PImv9eHGYX9rLmxsb6HPV2aFs6yt/qgQAWlCjw9wbFuAyn4oF8NXFVqAzR+Yan8whgI0/aMrvl3Dinwb6RGH2xAYYAAUSRJfYH6DkptUvMfKmBaje5hmPWzpBa27oIPMQkCs+qh/46qTfuXkDfa46O5IJQkBzLnnL4kvA5QGSzzjSBcwekAQ2PPXzcBIj4+RhWruXBvraSrXe/xXw0DB6M/sNDS9vcXWTl7oCGOfoeBAZsQX4gp61thyQLdRA31Kt1r3bCjTQ77Z1nfiWCjTQt1Srde+2Ag30u21dJ76lAg30LdVq3butwH9tgH38VjxX5gAAAABJRU5ErkJggg==\" width=\"186\" height=\"50\"\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n \u003cp\u003eReducing the size of search intervals can improve processing speed. However, the algorithm\u0026rsquo;s speed may decrease if the intervals become too small. This is because smaller intervals can cause the program to spend more time launching browser instances than acquiring information and performing the search for the regular expression.\u003c/p\u003e\n \u003cp\u003eWith 200 puppeteer instances, 50,000 results are obtained in approximately 139 minutes, as can be seen in Fig. 
\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e, which also shows the estimated time reported by the progress monitor. This time depends on the available computational resources. If the computer can query at least seven pages per second, the 50,000 results can be retrieved in about 50,000 / 7 \u0026asymp; 7,143 s, or roughly 119 minutes. Note that simply requesting the JGI Mycocosm database and receiving a response on the webpage takes around 6.64 seconds, as shown in Fig. \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003eFungiRegEx was built to identify user-defined regular expressions in fungal proteomes. The results are listed below:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eEfficiency in Proteomic Sequence Analysis: FungiRegEx streamlines the analysis of proteomic sequences from the 2,402 species available in the JGI Mycocosm portal.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eReal-time Data Retrieval without FASTA Downloads: The FungiRegEx scraper module eliminates the need to download FASTA files for proteomic analysis. 
FungiRegEx dynamically requests data from the JGI Mycocosm website, providing real-time access to the information.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eAccelerated Results Retrieval: FungiRegEx retrieves results considerably faster than manual extraction of proteomic data from the JGI Mycocosm website.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eOptimized Computational Resource Usage: FungiRegEx makes effective use of the computational resources available on the user\u0026rsquo;s computer.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eAdaptability and Customization: Users can easily adapt FungiRegEx to run on any computer by customizing the number of scraper instances the program uses.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eUser-Friendly GUI Presentation: The software features a GUI that presents results without requiring users to write code or to have specialized knowledge, such as how to use bash.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePlatform Independence: FungiRegEx does not require a specific operating system, which gives users flexibility in how they run it.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eLocal and Server Deployment Options: The tool can be launched locally or deployed on a server, letting users choose their preferred execution mode.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eEfficient Result Filtering: FungiRegEx offers features for filtering results, which facilitate the identification of specific sequences and the interpretation of results.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e 
\u003cspan\u003e \u003cli\u003e \u003cp\u003eConsole-Free User Experience: Users can interpret results without a console, which improves accessibility compared with tools that require console interaction.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eSimplified Search Syntax with User-Defined Parameters: FungiRegEx allows users to enter any sequence of interest and search either the entire proteome or a specific range. This customization is particularly useful for identifying sequences in specific species and may contribute valuable insights to further research.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePotential for Future Research: Users can save the matches of a user-defined regular expression in selected proteomes for further research. The validation section illustrates this with the AXSXG example sequence, which was of interest because of the properties it provides. 
These identified sequences may or may not exhibit specific characteristics, which opens the way for further research and exploration.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eDetails of the Matched Sequences: The tool reports the number of sequences that match the regular expression and shows the exact match.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eIn conclusion, FungiRegEx improves the efficiency of proteomic sequence analysis. It provides a user-friendly, adaptable, and customizable tool with features for interpreting results and exploring them in subsequent research.\u003c/p\u003e \u003cp\u003eFungiRegEx is built on the programs listed in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSoftware components used to implement FungiRegEx.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eProgram\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVersion\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNodeJS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e16.17.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReactJS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e18.0.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChromium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e79.0.3945.117\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNodemon\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.0.20\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAxios\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.27.2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eExpressJS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4.17.13\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePuppeteer cluster\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.23.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRegex Parser\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.2.11\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCheerio\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.0.0-rc.12\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCORS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.8.5\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDotEnv\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e16.0.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFlatted\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3.2.6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis section discusses the main findings and limitations of FungiRegEx, a tool designed to streamline the analysis of proteomic sequences from the JGI Mycocosm database. The results show that the tool handles a dataset of 2,402 fungal species, retrieves results faster than manual extraction, and makes effective use of computational resources. FungiRegEx also offers user-friendly features such as real-time data retrieval, a configurable number of scraper instances, and a graphical user interface (GUI). However, the tool also has limitations that affect its practical use. 
These aspects, from speed and the risk of IP blocking to the single-task limitation, are assessed critically below to give researchers a balanced view of the tool\u0026rsquo;s capabilities and areas for improvement.\u003c/p\u003e \u003cp\u003eThe results are presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. The information was split into two tables because of its length.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAdvantages and limitations of FungiRegEx.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAdvantages\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLimitations\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFungiRegEx analyzes proteomic sequences efficiently across the dataset of 2,402 species on the JGI Mycocosm portal. This efficiency accelerates research processes and enables broader studies.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUsers must enter the characteristics of newly added fungal species manually. 
Otherwise, users are restricted to the 2,402 species registered initially, which may limit the software\u0026rsquo;s applicability to future taxonomic additions.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEliminating the need to download FASTA files through the FungiRegEx scraper module is a significant advantage. Retrieving data in real time from the JGI Mycocosm website gives immediate access to the latest proteomic information, which contributes to the timeliness of research outcomes.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAn internet connection is needed to use the tool and retrieve the available information from the website. The retrieval time also depends on the user\u0026rsquo;s connection, which limits the search speed.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThe increase in result retrieval speed observed with FungiRegEx compared with manual extraction on the JGI Mycocosm website shows the tool\u0026rsquo;s potential to save researchers valuable time. This faster data retrieval could have substantial implications for large-scale studies.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e[1] The speed of FungiRegEx is influenced by internet speed, available computational resources, and the number of deployed scraper instances. 
Balancing these factors is crucial, as fewer instances reduce resource consumption but extend task completion time.\u003c/p\u003e \u003cp\u003e[2] Deploying many scraper instances (over 100) risks a temporary IP block by the JGI Mycocosm website, so the number of instances must be balanced carefully to avoid problems with site access.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFungiRegEx makes effective use of computational resources, which allows the tool to run smoothly and provides a seamless user experience.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThe resources that FungiRegEx uses must be configured by the user by changing the parameters of FungiRegEx.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFungiRegEx can be adapted for use on any computer, and the number of scraper instances can be customized, which makes the tool versatile. Users can tailor it to their computational capabilities and requirements.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e[1] The resources that FungiRegEx uses must be configured by the user.\u003c/p\u003e \u003cp\u003e[2] FungiRegEx currently supports only one task at a time. If multiple users attempt simultaneous searches, the tool prioritizes the latest task and may delete the previous user\u0026rsquo;s task. 
This limitation may affect concurrent usage scenarios.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThe user-friendly GUI of FungiRegEx simplifies the presentation of results and makes them accessible to users without specialized coding knowledge. This feature promotes ease of use in the scientific community.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThe FungiRegEx results table shows the complete proteome in a single line, which makes the table difficult to read for very long proteomes. The GUI could therefore be improved to make the results easier to read and present.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThe platform independence of FungiRegEx, together with the choice between local and server deployment, gives users flexibility. Researchers can choose the execution mode that matches their preferences and available infrastructure.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlthough FungiRegEx offers flexibility in deployment, users are advised to install the application locally. 
This recommendation reflects the fact that FungiRegEx runs only one task at a time.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAdvantages and limitations of FungiRegEx (continued).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAdvantages\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLimitations\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThe user-defined parameters in the search syntax, including the ability to specify sequences in certain species, make the tool applicable to diverse research scenarios.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThe user needs to know how many proteomes to search, that is, the length of the proteome of the species of interest.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAs demonstrated in the validation section, identifying and filtering the user-defined regular expression in the FungiRegEx results table for future investigations shows the tool\u0026rsquo;s potential for ongoing and future 
research. This capability allows users to explore specific sequences of interest and their potential characteristics within the results already obtained.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThe user must already know the sequence of interest that they wish to search for in the proteome of a given species.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFungiRegEx can store the results in CSV format.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFungiRegEx does not include an internal database to store the information. If the user does not save the results, all regular expression matches found in the scraped proteomes are lost.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReporting the number of matching sequences and the exact match makes the results more transparent and helps researchers refine their analyses.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlthough the tool locates the sequences that match the regular expression, the proteome listed in the results table does not visually distinguish the matches (for example, by changing the font color of only the matching part). This makes the matching sequences difficult to locate in the proteome.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e 
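The last limitation in Table 3 is that matches are not visually distinguished in the listed proteome. It can be illustrated, together with the reported number of matches and exact match, with a minimal sketch. The JavaScript below is hypothetical and is not taken from the FungiRegEx code base. It assumes the search uses JavaScript's built-in RegExp, which fits a tool written in JavaScript that cites the MDN regular expression documentation. The function name findAndHighlight, its return shape, and the bracket markers are illustrative choices. A GUI could instead render the marked spans in a different font color.

```javascript
// Hypothetical sketch, not FungiRegEx source code: count the matches of a
// user-defined pattern in one protein sequence and mark each match so that a
// results table could display it differently (e.g. in another font color).
function findAndHighlight(sequence, pattern, open = '[', close = ']') {
  // matchAll requires the global flag; matches are reported left to right
  // and do not overlap (standard JavaScript RegExp behavior).
  const regex = new RegExp(pattern, 'g');
  const matches = [];
  for (const m of sequence.matchAll(regex)) {
    matches.push({ start: m.index, end: m.index + m[0].length, text: m[0] });
  }

  // Rebuild the sequence, wrapping every match in the open/close markers.
  let highlighted = '';
  let last = 0;
  for (const { start, end, text } of matches) {
    highlighted += sequence.slice(last, start) + open + text + close;
    last = end;
  }
  highlighted += sequence.slice(last);

  return { count: matches.length, matches, highlighted };
}

// Usage with the RXLR motif written as R.LR on a toy sequence (not real data).
const result = findAndHighlight('MKRALRSTRQLRG', 'R.LR');
console.log(result.count);       // 2
console.log(result.highlighted); // MK[RALR]ST[RQLR]G
```

String.prototype.matchAll returns non-overlapping matches, so two occurrences of a motif that share residues are counted once. If overlapping occurrences matter, one option is a lookahead pattern such as (?=(R.LR)), whose captured group yields each occurrence. The highlighting step would then need to work from those captures rather than from the zero-length matches.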
\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNone.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDr. Miguel Angel Canseco P\u0026eacute;rez supported the development of FungiRegEx with personal financial resources.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data used or analyzed with this software tool are available in the Joint Genome Institute database and from the corresponding author.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eVMTM, MACP: Investigation, Web-Application Development, Writing, Resources, Project administration, Formal Analysis, Validation. 
JMM, MAMM, MTH: Writing \u0026ndash; review \u0026amp; editing\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability and Requirements\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eProject name: FungiRegEx.\u003c/li\u003e\n \u003cli\u003eProject home page: https://sourceforge.net/projects/fungiregex/\u003c/li\u003e\n \u003cli\u003eOperating system(s): Windows, Linux.\u003c/li\u003e\n \u003cli\u003eProgramming language: JavaScript.\u003c/li\u003e\n \u003cli\u003eLicense: Creative Commons Attribution Non-Commercial License V2.0.\u003c/li\u003e\n \u003cli\u003eAny restrictions to use by non-academics: None.\u003c/li\u003e\n\u003c/ul\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eLucia Muggia, K. S., Claudio G. Ametrano \u0026amp; Tesei, D. An overview of genomics, phylogenomics and proteomics approaches in Ascomycota. \u003cem\u003eMDPI Life \u003c/em\u003e\u003cstrong\u003e10\u003c/strong\u003e, 356, DOI: 10.3390/life10120356 (2020).\u003c/li\u003e\n\u003cli\u003eBull, R., Trevors, A., Malton, A. \u0026amp; Godfrey, M. Semantic grep: regular expressions + relational abstraction. In \u003cem\u003eNinth Working Conference on Reverse Engineering, 2002. Proceedings.\u003c/em\u003e, 267\u0026ndash;276, DOI: 10.1109/WCRE.2002.1173084 (2002).\u003c/li\u003e\n\u003cli\u003eBoris Nagaev, K. Y. \u0026amp; Palmblad, M. msgfdb2pepxml (2011).\u003c/li\u003e\n\u003cli\u003ePhilippe Gouret, J. D. T. \u0026amp; Pontarotti, P. PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees. \u003cem\u003eBMC Bioinforma. \u003c/em\u003e\u003cstrong\u003e10\u003c/strong\u003e, 298, DOI: 10.1186/1471-2105-10-298 (2009).\u003c/li\u003e\n\u003cli\u003eDsouza M, O. R., Larsen N. Searching for patterns in genomic data. \u003cem\u003eTrends Genet. \u003c/em\u003e\u003cstrong\u003e13\u003c/strong\u003e, 497\u0026ndash;498, DOI: 10.1016/s0168-9525 (1997).\u003c/li\u003e\n\u003cli\u003eYan T, B. T. M. L. W. D. W. S. C. J. R. 
S., Yoo D. PatMatch: a program for finding patterns in peptide and nucleotide sequences. \u003cem\u003eNucleic Acids Res. \u003c/em\u003e\u003cstrong\u003e13\u003c/strong\u003e, 262\u0026ndash;266, DOI: 10.1093/nar/gki368 (2005).\u003c/li\u003e\n\u003cli\u003eJoint Genome Institute (JGI). About us (2022).\u003c/li\u003e\n\u003cli\u003eAchaz G, N. P. R. E., Coissac E. Associations between inverted repeats and the structural evolution of bacterial genomes. \u003cem\u003eGenetics \u003c/em\u003e\u003cstrong\u003e164\u003c/strong\u003e, 1279\u0026ndash;1289, DOI: 10.1093/genetics/164.4.1279 (2003).\u003c/li\u003e\n\u003cli\u003evan Belkum A, v. A. L. V. H., Scherer S. Short-sequence DNA repeats in prokaryotic genomes. \u003cem\u003eMicrobiol Mol Biol Rev \u003c/em\u003e\u003cstrong\u003e62\u003c/strong\u003e, 275\u0026ndash;293, DOI: 10.1128/MMBR.62.2.275-293.1998 (1998).\u003c/li\u003e\n\u003cli\u003eXingyu Liao, J. Z. H. L. X. X. B. Z., Wufei Zhu \u0026amp; Gao, X. Repetitive DNA sequence detection and its role in the human genome. \u003cem\u003eCommun. Biol. \u003c/em\u003e\u003cstrong\u003e6\u003c/strong\u003e, 954, DOI: 10.1038/s42003-023-05322-y (2023).\u003c/li\u003e\n\u003cli\u003eDaniel Barry Roche, D. A. B. \u0026amp; McGuffin, L. J. Proteins and their interacting partners: An introduction to protein-ligand binding site prediction methods. \u003cem\u003eInt. J. Mol. Sci. \u003c/em\u003e\u003cstrong\u003e16\u003c/strong\u003e, DOI: 10.3390/ijms161226202 (2015).\u003c/li\u003e\n\u003cli\u003eMatthew Merski, J. L. J. S. S. D.-H. . M. W. G., Krzysztof Młynarczyk. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. \u003cem\u003eBMC Bioinforma. \u003c/em\u003e\u003cstrong\u003e21\u003c/strong\u003e, DOI: 10.1186/s12859-020-3493-y (2020).\u003c/li\u003e\n\u003cli\u003eMeta Platforms, F. O. S. Getting started, what is React and documentation (2020).\u003c/li\u003e\n\u003cli\u003eFoundation, O. 
Getting started, what is Node.js and documentation (2020).\u003c/li\u003e\n\u003cli\u003eLLC, G. Getting started, what is Chromium and documentation (2020).\u003c/li\u003e\n\u003cli\u003eDenise Esther Guti\u0026eacute;rrez-Dom\u0026iacute;nguez, M. M. R.-A. J. N. A. T. I. I.-F. M. C.-P., Bartolom\u0026eacute; Ch\u0026iacute;-Manzanero \u0026amp; Canto-Canch\u0026eacute;, B. Identification of a novel lipase with AHSMG pentapeptide in Hypocreales and Glomerellales filamentous fungi. \u003cem\u003eInt. J. Mol. Sci. \u003c/em\u003e\u003cstrong\u003e23\u003c/strong\u003e, 9367, DOI: 10.3390/ijms23169367 (2022).\u003c/li\u003e\n\u003cli\u003eCherry JM, A. C.-B. R. B. G.-C. E. C. K. C. M. D. S. E. S. F. D. H. J. H. B. K. K. K. C. M. S. N. R. P. J. S. M. S. M. W. S. W. E., Hong EL. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources. \u003cem\u003eGenetics \u003c/em\u003eDOI: 10.1093/genetics/iyab224 (2022).\u003c/li\u003e\n\u003cli\u003eStephen F. Altschul, A. A. S. J. Z. Z. Z. W. M., Thomas L. Madden \u0026amp; Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, DOI: 10.1093/nar/25.17.3389 (1997).\u003c/li\u003e\n\u003cli\u003eAlejandro A. Schaffer, T. L. M. S. S. J. L. S. Y. I. W. E. V. K., L. Aravind \u0026amp; Altschul, S. F. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. \u003cem\u003eNucleic Acids Res \u003c/em\u003eDOI: 10.1093/nar/29.14.2994 (2001).\u003c/li\u003e\n\u003cli\u003eEdith D Wong, S. A. K. K. R. S. N. M. S. S. S. W. S. R. E. J. M. C., Stuart R Miyasato. Saccharomyces Genome Database update: server architecture, pan-genome nomenclature, and external resources. \u003cem\u003eGenetics \u003c/em\u003eDOI: 10.1093/genetics/iyac191 (2023).\u003c/li\u003e\n\u003cli\u003eSteven D. Brown, C. M. J. A. C. A. A. A. S. A. S., Dawn M. Klingeman. 
Genome sequences of industrially relevant saccharomyces cerevisiae strain m3707, isolated from a sample of distillers yeast and four haploid derivatives. \u003cem\u003eASM Journals - Genome Anouncements \u003c/em\u003e\u003cstrong\u003e1\u003c/strong\u003e, DOI: 10.1128/genomeA.00323-13 (2013).\u003c/li\u003e\n\u003cli\u003eLiping Liu, Q. J. R. P. R. O. W. Z., Le Xu \u0026amp; Wu, C. Arms race: diverse effector proteins with conserved motifs. \u003cem\u003ePlant Signal. \u0026amp; Behav. \u003c/em\u003e\u003cstrong\u003e14\u003c/strong\u003e, 1557008, DOI: 10.1080/15592324.2018.1557008 (2019). PMID: 30621489, https://doi.org/10.1080/15592324.2018.1557008.\u003c/li\u003e\n\u003cli\u003eMarshall, R. \u003cem\u003eet al. \u003c/em\u003eAnalysis of Two in Planta Expressed LysM Effector Homologs from the Fungus Mycosphaerella graminicola Reveals Novel Functional Properties and Varying Contributions to Virulence on Wheat. \u003cem\u003ePlant Physiol. \u003c/em\u003e\u003cstrong\u003e156\u003c/strong\u003e, 756\u0026ndash;769, DOI: 10.1104/pp.111.176347 (2011).\u003c/li\u003e\n\u003cli\u003eLee, W.-S., Rudd, J. J., Hammond-Kosack, K. E. \u0026amp; Kanyuka, K. Mycosphaerella graminicola lysm effector-mediated stealth pathogenesis subverts recognition through both cerk1 and cebip homologues in wheat. \u003cem\u003eMol. Plant-Microbe Interactions \u003c/em\u003e\u003cstrong\u003e27\u003c/strong\u003e, 236\u0026ndash;243, DOI: 10.1094/MPMI-07-13-0201-R (2014).\u003c/li\u003e\n\u003cli\u003eweb docs, M. 
Regular expressions (2023).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-3852782/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3852782/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIn the context of genome-scale research, it is imperative to automatically analyze numerous species and sub-species to discern distinctive features present in multiple proteomes that contain specific sequences of interest since they provide specific properties. Complex sequences must be recognized within an organism\u0026rsquo;s complete set of proteomes to accomplish this. This study introduces FungiRegEx, a user-friendly software for automatic genome-scale proteome analysis of fungi organisms, addressing the limitations of existing tools.\u003c/p\u003e \u003cp\u003eFungiRegEx utilizes real-time data retrieval of the different species from the JGI Mycocosm database without downloading any files. 
With a user-friendly GUI, the tool offers efficient regular expression searches across 2,402 fungal species from the JGI Mycocosm portal. Validation with the sequence AXSXG or effector RXRL demonstrates FungiRegEx\u0026rsquo;s effectiveness in identifying user-defined patterns in the retrieved sequences. FungiRegEx accelerates result retrieval compared to manual processes, providing a console-free and programming-free experience; this tool allows customization, result filtering, and the possibility of saving the results for future research.\u003c/p\u003e \u003cp\u003eFungiRegEx offers a promising solution for researchers exploring specific sequences in the fungal proteomes. It combines speed, adaptability, and ease of use, displaying the results in a GUI and making it easy to read. Its architecture ensures optimized resource usage and deployment flexibility, allowing the customization of specific software parameters.\u003c/p\u003e \u003cp\u003eThe tool\u0026rsquo;s potential for future research and exploration is emphasized, providing a nuanced perspective on its practical use within the fungal genomics community.\u003c/p\u003e","manuscriptTitle":"FungiRegEx: A tool for patterns identification in Fungal Proteomic sequences using regular expressions","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-01-18 14:50:09","doi":"10.21203/rs.3.rs-3852782/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"3957699f-3788-42a8-ac92-18c82ea3252d","owner":[],"postedDate":"January 18th, 
2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":28203318,"name":"Biological sciences/Computational biology and bioinformatics/Software"},{"id":28203319,"name":"Biological sciences/Computational biology and bioinformatics/Data acquisition"},{"id":28203320,"name":"Biological sciences/Computational biology and bioinformatics/Data processing"}],"tags":[],"updatedAt":"2024-03-06T06:46:29+00:00","versionOfRecord":[],"versionCreatedAt":"2024-01-18 14:50:09","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-3852782","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3852782","identity":"rs-3852782","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
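The searches FungiRegEx performs come down to matching a motif against every protein in a proteome. The motif is written in single-letter amino-acid code with X as a wildcard for any residue, as in AXSXG or RXRL. The Python sketch below illustrates that idea only and is not the tool's implementation. It reads FASTA text supplied by the caller rather than querying JGI MycoCosm. The helper names (`motif_to_regex`, `parse_fasta`, `find_motif`) and the toy sequences are hypothetical.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Hypothetical helper: turn a motif such as AXSXG into a compiled regex.

    X is treated as a wildcard for any residue letter; other letters must match exactly.
    """
    parts = []
    for ch in motif.upper():
        if ch == "X":
            parts.append("[A-Z]")  # any residue letter, but never a '*' stop symbol
        elif ch.isalpha():
            parts.append(re.escape(ch))
        else:
            raise ValueError(f"unsupported motif character: {ch!r}")
    # A zero-width lookahead with a capture group lets finditer report overlapping hits.
    return re.compile("(?=(" + "".join(parts) + "))")


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, where header is the text after '>'."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:  # sequence lines before the first header are ignored
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def find_motif(fasta_text: str, motif: str) -> list[tuple[str, int, str]]:
    """Return (header, 1-based start position, matched peptide) for every hit."""
    pattern = motif_to_regex(motif)
    hits = []
    for header, seq in parse_fasta(fasta_text).items():
        for m in pattern.finditer(seq):
            hits.append((header, m.start() + 1, m.group(1)))
    return hits


# Toy input (invented sequences, not real fungal proteins).
example = """>prot1 hypothetical lipase
MKLAHSMGTTAGSAGQ
>prot2
MRERLKK
"""
print(find_motif(example, "AXSXG"))
# → [('prot1 hypothetical lipase', 4, 'AHSMG'), ('prot1 hypothetical lipase', 11, 'AGSAG')]
```

Wrapping the pattern in a lookahead, `(?=(...))`, is a choice made for this sketch so that overlapping occurrences are all reported. Restricting X to `[A-Z]` prevents a match from spanning a `*` stop symbol, which some proteome FASTA files append to sequences.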
