{"paper_id":"20f445df-6506-4633-a9e7-df10de76a390","body_text":"R2G2: A Python-R Framework for Seamless Integration of R/Bioconductor Tools into \nGalaxy \n \nJayadev Joshi1, Fabio Cumbo1, Daniel Blankenberg1,2,* \n1 Center for Computational Life Sciences, Cleveland Clinic Research, Cleveland Clinic, \nCleveland, OH, USA \n2 Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case \nWestern Reserve University, Cleveland, OH, USA \n \n* To whom correspondence should be addressed: \nDaniel Blankenberg1,2,*, Center for Computational Life Sciences, Cleveland Clinic Research, \nCleveland Clinic, 9500 Euclid Avenue, NA2, Cleveland, OH 44195, USA. \nEmail: blanked2@ccf.org \n \n \nABSTRACT \nR is widely used in statistical computing, data analysis, and bioinformatics. A key contributor to \nits success in bioinformatics and computational biology is the open-source project Bioconductor. \nAs of its latest release (3.20), the Bioconductor community offers 2,289 software packages for \nbiomedical research, including genomic, transcriptomic, and proteomic analyses. Given R’s \ngrowing importance, integrating R and Bioconductor tools into platforms like Galaxy enhances \naccessibility, reproducibility, and scalability in bioinformatics workflows. The Galaxy Toolshed \nprovides multiple tools leveraging R and Bioconductor packages. Additionally, various \nopen-source public Galaxy servers, such as usegalaxy.org and usegalaxy.eu, already host several \nR and Bioconductor-based tools, highlighting the importance of past integration efforts. \nHowever, given the vast number of available packages, the full potential of R and Bioconductor \nwithin the Galaxy ecosystem remains underutilized. Galaxy’s web-based interface makes these \npowerful tools more accessible to researchers without programming expertise, fostering broader \ncollaboration. Despite its advantages, integrating R packages into Galaxy can be complex. 
It \nrequires XML wrappers to define inputs, outputs, and parameters, which can be time-consuming. \nManaging dependencies from CRAN and Bioconductor, resolving installation issues, and \nensuring compatibility across different package versions further complicate the process. Many \ntools also require custom scripting, creating a steep learning curve for non-programmers. To \naddress these challenges, we have developed a tool that automates the generation of Galaxy \nwrappers for R packages. This eliminates the need for manual XML writing, reduces complexity, \nand saves time. Our tool provides an intuitive interface for creating Galaxy-compatible tools \nwithout programming expertise and automates dependency management for seamless execution. \nBioconductor has revolutionized bioinformatics, with thousands of researchers relying on its \ntools. Automating its integration into Galaxy removes technical barriers, democratizing access to \nadvanced bioinformatics tools and workflows. Our solution bridges the gap between R-based \nanalysis and user-friendly, scalable tools, ultimately advancing research accessibility and \nscientific discovery.  \n \nbioRxiv preprint doi: https://doi.org/10.64898/2025.12.22.695980; this version posted December 24, 2025. The copyright holder for this preprint \n(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made \navailable under a CC-BY 4.0 International license. \n \n1. Introduction \n \nIn recent decades, the world has experienced unprecedented growth in data, particularly in the \nbiomedical sciences, where high-throughput experiments and large-scale studies generate \nmassive datasets. Community-driven initiatives, such as open-source software platforms and \ncollaborative databases, have played a crucial role in enabling researchers to manage, share, and \nanalyze these massive datasets effectively (Silva et al. 
2025; Mansueto et al. 2024). \nProgramming languages, specifically R and Python, have played a crucial role in biomedical \nresearch by enabling rapid development of tools and algorithms to handle and analyze these \nmassive and diverse datasets (Giorgi et al. 2022). In the recent past, we have observed an \nenormous growth in software packages related to these programming languages. For example, \nBioconductor, the leading repository for R-based bioinformatics tools, has grown substantially \nand hosts 2,289 software packages as of release 3.20, with approximately 75 new \npackages added annually through two releases each year. The release also provides 928 annotation packages \nand 431 experiment data packages, supporting diverse genomic analyses. Similarly, Python's \nbioinformatics ecosystem, exemplified by the Biopython project, benefits from the broader \nPython Package Index (PyPI), which contains over 614,000 packages as of March 2025 (Giorgi \net al. 2022; Chan 2018; Staples 2023). Owing to its mathematically oriented community, R has become one of \nthe most widely adopted programming environments for statistical computing and \nmathematical applications in bioinformatics. Its active development community and rigorous \nstandards for software interoperability and reproducibility have made R and Bioconductor an \nindispensable resource in the field (Siraji and Rahman 2023; Gentleman et al. 2004; Gentleman \net al. 2005). Despite the availability of numerous powerful methods provided free of cost through \nthese programming libraries, the lack of advanced computational literacy among biologists remains \nthe biggest limiting factor in the widespread adoption of these techniques. Recent surveys indicate \na similar trend: while bioinformatics tools are becoming essential in life science \nresearch, many researchers lack the confidence and training to use them effectively. 
In a Saudi \nArabia-based survey of 309 scientists, 42.4% reported using bioinformatics tools in their \nresearch, but only 30.1% identified as working in bioinformatics-related fields. Among those \nusing bioinformatics tools, more than half (51.9%) did so only occasionally. Furthermore, 56.4% \nof respondents acknowledged lacking sufficient bioinformatics knowledge (Alomair and \nAbolfotouh 2023). A global training survey by SEB/GOBLET reported that 57% of wet-lab \nscientists lacked confidence in using bioinformatics tools, 74% had no programming experience, \nand 58% felt uncomfortable with statistical methods (Attwood et al. 2019; Williams et al. 2019). \nThese findings highlight a substantial gap between the growing demand for bioinformatics \nanalysis and the practical ability of many researchers to perform such analyses independently \n(Balamurugan et al. 2021; Wilson Sayres et al. 2018; Verli and de Melo Minardi 2022). In recent \nyears, open-source platforms such as Galaxy have played a significant role in bridging this gap \nby providing user-friendly, web-based graphical interfaces that help biologists adopt these \nadvanced methods without computational expertise (Grüning et al. 2017; Joshi and Blankenberg \n2022; Blankenberg et al. 2010). Integrating R and Bioconductor packages within Galaxy not \nonly democratizes access to sophisticated statistical and bioinformatics methods but also \nfacilitates collaboration across diverse research groups (Afgan et al. 2018; Blankenberg et al. \n2010; Baichoo et al. 2018; Goecks et al. 2010; Langer et al. 2025). 
To date, 82 Bioconductor \npackages have been integrated with Galaxy; some of the most popular examples include DESeq2 \n(Love et al. 2014) and limma (Ritchie et al. 2015) for differential gene expression analysis. \nHowever, developing Galaxy-compatible tools for R and Bioconductor packages typically \nrequires writing XML-based tool wrappers to define inputs, outputs, and parameters, a process \nthat is time-consuming, prone to error, and presents a steep learning curve for researchers \nwithout software development expertise (Cock et al. 2013; Joshi and Blankenberg 2022). \nAlthough the development of tools and workflows for Galaxy is largely community-driven, it \nstill requires a substantial amount of human labor and time to develop, test, and maintain \nhigh-quality tools for the research community. In addition, managing dependencies from \nCRAN and Bioconductor, resolving package installation issues, and ensuring compatibility \nacross different software versions introduce further technical challenges. \nTo address these barriers, save time and labor, and reduce technical complexity, we have \ndeveloped R2G2, a Python- and R-based package that streamlines the integration of R packages \ninto the Galaxy platform. This solution eliminates the need for manual XML wrapper creation, \nsimplifies dependency management, and provides an intuitive interface for generating \nGalaxy-compatible tools. By automating these tasks, R2G2 lowers technical barriers, accelerates \ntool development, and broadens access to R and Bioconductor’s extensive resources, on which thousands \nof investigators worldwide rely for advanced biological data analysis (Ruprecht \net al. 2024; Giorgi et al. 2022). 
By automating the integration of R and Bioconductor packages \ninto Galaxy, R2G2 democratizes access to these resources, helps bridge the computational skills \ngap identified by recent surveys, and fosters broader collaboration between computational and \nexperimental scientists. This work contributes to ongoing community-driven efforts to make \nstate-of-the-art bioinformatics tools more accessible, scalable, and reproducible for the wider life \nscience research community. \n2. Materials and Methods \n2.1 The main motivation of the work – The primary objective of this project was to develop a \ncomprehensive framework to facilitate the integration of valuable R and Bioconductor \nfunctionalities into Galaxy. While the R and Bioconductor ecosystem offers an extensive \ncollection of R-based packages for statistical analysis and visualization of various omics \ndatasets, its adoption within integrative workflow environments such as Galaxy remains limited \ndue to the previously mentioned challenges. To achieve this objective, we designed a Python- and R-based \nlibrary that automates the generation of wrapper scripts for Bioconductor packages, thereby \nenabling their seamless deployment as Galaxy tools. Python was selected as the implementation \nlanguage because of its interoperability, wide adoption in the bioinformatics community, and \nnatural compatibility with the Galaxy ecosystem. \nThe library systematically manages package dependency resolution, argument parsing, and \ninput/output standardization, thereby reducing the burden on developers who would otherwise \nneed to manually configure wrappers. 
Beyond building wrappers directly from R scripts with a \ncommand-line interface (CLI), we also implemented a dedicated module \ncapable of generating Galaxy wrappers for complete R packages. In both approaches, the \nworkflow begins with either an R script or a Bioconductor package. The subsequent step \ninvolves identifying and mapping the arguments into Galaxy input and output parameters. \nFinally, the system generates the command-line interface along with the input and output \nsections required for a fully functional Galaxy wrapper. The complete framework is shown in \nFigure 1, and the full implementation details follow. \n \nFigure 1. Automatic generation of R-based tools for integration into the Galaxy platform. \nThis workflow illustrates how R library functions are mapped and converted into Galaxy tools. \nOn the left, R library packages are parsed to extract function objects, arguments, and \ndefinitions, which are then translated into Galaxy parameter sets. On the right, starting from an R \ncommand-line script, the FakeArg.r script exports the arguments in Python format as JSON \nobjects, which are converted back into Python code via a custom argument parsing class, \n“CustomFakeArg”. These arguments are subsequently mapped to Galaxy parameters. \nTogether, both processes enable the automatic generation of XML wrappers to integrate \nR-based bioinformatics tools within Galaxy’s graphical user interface (GUI). The GUI on the \nright side was created using the ggplot2 library, while the GUI on the left side was generated \nusing an Rscript based on the DEP package. \n \n2.2 System Architecture and Environment – A detailed overview of the system architecture is as \nfollows. \n2.2.1 Galaxy Platform Setup – The R2G2 library can generate XML wrappers that are \ncompatible with the Galaxy software and function across multiple Galaxy releases. Additionally, \nthe provided tool template allows flexibility to specify a particular tool profile when needed. \n2.2.2 R Version and Bioconductor Release – By default, Galaxy tools rely on Conda \nenvironments for dependency resolution and reproducibility. Therefore, when generating a \nGalaxy tool wrapper from an R package using the R2G2 library, it is essential that a \ncorresponding Conda dependency is available. This ensures that the required R version, \nBioconductor release, and associated packages can be correctly installed and executed within \nGalaxy. Similarly, when generating wrappers from R scripts, the underlying R package should \nalso be available as an R or Bioconductor Conda package. This integration allows R2G2 to \nseamlessly reference the correct dependencies and manage them through Conda (or alternatives \nsuch as mamba, Docker, or Singularity when needed), ensuring consistent runtime environments \nacross Galaxy instances. \n2.2.3 Generating Galaxy tools from R packages – Developing software packages for various \nmethods and algorithms is one of the most common and effective approaches for disseminating \ncode within the scientific community. This practice enhances reproducibility and facilitates the \nbroader adoption of new methodologies. 
By providing ready-to-use functions and classes, \npackages enable researchers to utilize existing algorithms without the need to reinvent the wheel, \nwhile also supporting the development of novel algorithms that can build upon established \nfunctionalities. Given the broad availability of R and Bioconductor packages, we implemented \nfunctionality to automatically generate Galaxy wrappers directly from R packages. The \nimplementation details are as follows: \n2.2.3.1 Wrapper Generation Workflow – The Python script “r2g2_on_package.py” \nautomatically converts R library functions into Galaxy-compatible tool wrappers. The script \nleverages the rpy2 interface to interact with R packages, extract function signatures, and \ndynamically generate the corresponding Galaxy tool XML definitions. This process ensures \nthat R-based functionality can be seamlessly integrated into Galaxy workflows without \nrequiring extensive manual tool wrapper development. \n2.2.3.2 Input Parameters – The script accepts several command-line arguments to generate \nGalaxy tool wrappers. The “--name” parameter specifies the R package to be wrapped and is \nrequired. The “--package_name” argument defines the corresponding Conda package name, \nwhich is optional and defaults to the R package name if not provided. Similarly, the \n“--package_version” argument allows the user to specify the Conda package version, with the \ndefault set to the detected version. The “--out” parameter designates the output directory for \nthe generated Galaxy wrapper files, which defaults to out/. 
Finally, the \n“--galaxy_tool_version” argument assigns a version string to the Galaxy tools, with a default \nvalue of 0.0.1. Full parameter details are available through R2G2’s “--help” option. \n2.2.3.3 R Package Import and Metadata Extraction – We utilized the rpy2 package to \nenable seamless integration of R- and Bioconductor-based tool generation. The \nrpy2.robjects.packages module acts as Python’s gateway to the R package ecosystem, with its \ncore function importr() calling R’s library() and wrapping the resulting R namespace into a \nPackage object. This makes R functions accessible as callable Python objects through \nSignatureTranslatedFunction. R2G2 leverages these capabilities to dynamically import the \nspecified R package and extract its metadata and other R objects. The package version is \nautomatically detected unless explicitly provided. Based on this information, a macro XML \nfile is generated, containing reusable tool components such as requirements, macros, and \nversioning details. \n2.2.3.4 Function Iteration and Documentation – The next step is to iterate over each \nfunction in the imported R package. For each function: \na. Function Discovery: Functions are identified via dir() on the package object. \nb. Documentation Retrieval: Help pages are accessed through rpy2.robjects.help.pages, \nwhich extracts the underlying R help files (.Rd documentation associated with the R \nobject). When available, these .Rd entries are converted to reStructuredText (RST) and \nembedded into the tool’s help section. If multiple help pages exist, all are concatenated. \nWhen no R documentation is found, the Python docstring is used as a fallback. \nc. Tool Metadata Initialization: A metadata dictionary is constructed, containing the \nGalaxy tool ID, version, function name, description, and other required XML fields. 
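As an illustration of Sections 2.2.3.2 and 2.2.3.4, the following minimal Python sketch mirrors the command-line options described above and builds a per-function metadata dictionary. It is not the R2G2 source code: the helper names build_parser and init_tool_metadata, the tool ID format, and the example package and function are assumptions made for illustration only, and automatic version detection is omitted.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the options described in Section 2.2.3.2.
    parser = argparse.ArgumentParser(description='Sketch of an R-package-to-Galaxy CLI')
    parser.add_argument('--name', required=True, help='R package to wrap')
    parser.add_argument('--package_name', default=None, help='Conda package name')
    parser.add_argument('--package_version', default=None, help='Conda package version')
    parser.add_argument('--out', default='out/', help='Output directory')
    parser.add_argument('--galaxy_tool_version', default='0.0.1', help='Galaxy tool version')
    return parser


def init_tool_metadata(args, function_name, description=''):
    # Fall back to the R package name when no Conda package name is given.
    package_name = args.package_name or args.name
    return {
        'id': args.name + '_' + function_name,  # assumed ID format
        'name': function_name,
        'version': args.galaxy_tool_version,
        'package_name': package_name,
        'package_version': args.package_version,  # None stands in for a detected version
        'description': description,
    }


# Example: only the required option is given, so every default applies.
args = build_parser().parse_args(['--name', 'limma'])
meta = init_tool_metadata(args, 'lmFit', 'Fit linear models')
print(meta['id'], meta['version'], meta['package_name'])
```

In this sketch the metadata dictionary is a plain Python dict so that it can be passed directly to a template engine when the XML wrapper is rendered.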
\n2.2.3.5 Parameter Processing – Function parameters are analyzed using the \npackage_obj.formals() method, which retrieves the formal arguments defined for each R \nfunction. For every parameter, the script identifies its default value, data type (e.g., integer, \nfloating point, string, or logical/boolean), and whether it represents a single or multiple input. \nEach parameter is then mapped to the corresponding Galaxy XML input template (Table 1), \nenabling seamless integration of function arguments into the Galaxy tool interface. When the \nparameter type cannot be reliably inferred, a generic not_determined template is applied to \nensure flexibility in handling diverse input types. Furthermore, the script incorporates \nspecialized handling for the ellipsis (...) parameter, which is converted into repeatable and \nconditional Galaxy inputs, thereby supporting functions that accept variable-length argument \nlists. \n2.2.3.6 R Script Generation – For each wrapped function, an R script block is \nprogrammatically generated. This script begins by loading the required R package \n[library(package_name)], then maps Galaxy inputs to the corresponding R function arguments \nwith appropriate type conversions, such as using readRDS for datasets or applying string \nquoting for text values. 
Once the inputs are processed, the script executes the R function with \nthe mapped arguments and saves the resulting object(s) in RDS format, a native binary file \nformat in R (R Data Serialization) used to store single R objects and later read them back into \nan R session, ensuring compatibility with downstream Galaxy tools. However, the RDS format \nrelies on R’s internal serialization mechanism and therefore shares the same security drawbacks \nas Python’s pickle format. The main security concern is that the deserialization process can \nexecute embedded code. In practice, this risk can be mitigated by restricting file sources to \ncontrolled environments and using alternative portable formats for data sharing. \n \nR Argument Parsing (argparse) | Meaning in CLI | Galaxy Tool Parameter | Notes / Translation Rule \ntype = \"character\" | String input | <param type=\"text\"> | Default text input in Galaxy. \ntype = \"integer\" | Integer input | <param type=\"integer\"> | Restrict input to integers. \ntype = \"double\" / \"numeric\" | Floating-point input | <param type=\"float\"> | Maps to numeric entry. \naction = \"store_true\" | Boolean flag (set True if present) | <param type=\"boolean\" truevalue=\"--flag\" falsevalue=\"\" /> | Common for switches like --verbose. \naction = \"store_false\" | Boolean flag (set False if present) | <param type=\"boolean\" truevalue=\"\" falsevalue=\"--flag\" /> | Inverted flag logic. \nchoices = c(\"A\",\"B\",\"C\") | Restrict to set of values | <param type=\"select\"> with <options> | Drop-down menus in Galaxy. \nnargs = \"+\" or \"*\" | Multiple values allowed | <param type=\"data\" multiple=\"true\" />, <repeat> | \nmetavar = \"FILE\" | Expected input file | <param type=\"data\"> | Galaxy handles file input. \nSubparsers/Mutually exclusive groups | Argument parsing groups | <conditional> | Galaxy conditional parameter block \nTable 1. Mapping of R argument parsing parameters to corresponding Galaxy tool input and \noutput parameters. The table illustrates how commonly used R command-line arguments \n(defined via argument parsers) can be systematically translated into Galaxy XML tool \ndefinitions, enabling seamless integration of R-based tools within the Galaxy platform. 
\n2.2.3.7 XML Tool Wrapper Creation – The processed metadata, function documentation, \nparameter specifications, and generated R script are inserted into an XML tool template \n(tool_xml). Each function results in a corresponding Galaxy tool XML file saved in the output \ndirectory. The tool IDs are automatically sanitized (removing invalid characters) to meet \nGalaxy ToolShed requirements. Optionally, a specialized r_load_matrix.xml wrapper is \ngenerated to facilitate matrix-to-RDS conversion. This wrapper serves as a helper tool \nthat allows Galaxy users to transform tabular data into a format that R functions, \nand the automatically generated wrappers, can readily interpret and process. By bridging the \nformat gap between Galaxy’s standard tabular inputs and R’s native RDS format, it ensures \ncompatibility, since most R functions require structured R objects rather than raw text. \nMoreover, it standardizes inputs by providing a consistent mechanism for reading data once \nconverted into RDS, thereby simplifying integration and enhancing interoperability across all \nwrapped R functions. \n2.2.3.8 Error Handling – The script includes error handling for functions that cannot be \nwrapped. If a function cannot be \nprocessed (e.g., due to undocumented parameters or unsupported argument types), the \nfunction is skipped with a logged warning.  
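The parameter mapping and wrapper creation steps of Sections 2.2.3.5 to 2.2.3.7 can be sketched with the Python standard library alone. The example below is an assumption-laden illustration rather than the R2G2 implementation: the type mapping follows the type rows of Table 1, the sanitization rule (replace anything other than letters, digits, and underscores) is a guess at what removing invalid characters means, unknown types fall back to a text input instead of the not_determined template, and the names GALAXY_TYPES, sanitize_tool_id, build_param, and build_tool are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed mapping, following the type rows of Table 1.
GALAXY_TYPES = {'character': 'text', 'integer': 'integer',
                'double': 'float', 'numeric': 'float', 'logical': 'boolean'}


def sanitize_tool_id(raw_id):
    # Assumed rule: keep letters, digits and underscores; replace everything else.
    return re.sub('[^A-Za-z0-9_]', '_', raw_id)


def build_param(name, r_type, default=None):
    # Unknown R types fall back to a plain text input in this sketch.
    param = ET.Element('param', {'name': name, 'type': GALAXY_TYPES.get(r_type, 'text')})
    if default is not None:
        param.set('value', str(default))
    return param


def build_tool(package, function, params, version='0.0.1'):
    # One <tool> element per wrapped function, with its <inputs> section.
    tool = ET.Element('tool', {'id': sanitize_tool_id(package + '.' + function),
                               'name': function, 'version': version})
    inputs = ET.SubElement(tool, 'inputs')
    for name, r_type, default in params:
        inputs.append(build_param(name, r_type, default))
    return tool


# Example: two arguments of the p.adjust function from the stats package.
tool = build_tool('stats', 'p.adjust',
                  [('method', 'character', 'holm'), ('n', 'integer', 10)])
xml_text = ET.tostring(tool, encoding='unicode')
print(xml_text)
```

A production wrapper would also need command, outputs, requirements, and help sections; they are left out here to keep the focus on the type mapping and ID sanitization.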
\n2.2.4 Generating Galaxy tools from R scripts – Another common and user-friendly way of \ndistributing algorithms is through command-line tools implemented in programming languages. \nLibraries such as argparse in Python and R make command-line argument handling more \nintuitive and powerful. Scripts built on such argument parsers enable the creation of more \ncomprehensive and complete tools that require almost no adjustment and can be used directly \nwithin a Galaxy tool. A detailed description of the workflow of \nthis implementation is provided below. \n2.2.4.1 R Script Wrapper Generation – To enable Galaxy tool integration for R scripts with \ncommand-line arguments, we implemented a Python-based wrapper generation workflow. \nThis functionality enables users to generate a Galaxy tool wrapper for an R script that runs as \na command-line tool built on the r-argparse library.  \n2.2.4.2 Converting R based arguments to Python based arguments – The first step in this \nprocess is to extract information about the command-line arguments that an R script accepts. \nTo achieve this, we implemented the FakeArg.r script based on the r-argument package. The \nFakeArgs class is implemented using the R6 package in R, which provides an object-oriented \nprogramming (OOP) system. This OOP system offers a reference-based approach and \nsupports a traditional object-oriented programming style similar to that of Python, Java, or \nC++, including features such as methods, fields, inheritance, and encapsulation. 
The FakeArgs \nclass leverages the native functionality of the r-argparse library to extract arguments from the \nR script and convert them into Python-compatible strings. These Python-compatible strings, \nwhich contain the argument definitions, are then saved into a JSON file for downstream \nprocessing. \n2.2.4.3 Convert arguments strings into Python code – In the next step, we used the \njson_to_python, json_to_python_for_param_info, and extract_simple_parser_info functions to \nhelp determine whether each argument belongs to a conditional block. This is determined \nby iterating over the arguments and inspecting their grouping, such as subparsers or mutually \nexclusive groups.  \n2.2.4.4 Dynamically extracting the parameters from the converted Python arguments – \nIn this step, we leverage the anvi’o (Eren et al. 2020) tool-wrapper-generation package. \nSpecifically, our CustomFakeArg class extends the functionality of the FakeArg class \nprovided by the anvio package (from anvio import FakeArg), which itself inherits from \nPython’s standard argparse library, thereby enabling structured handling of command-line \narguments within the wrapper generation workflow. Parameter metadata, including names, \ntypes, and categories, is extracted using CustomFakeArg. This information is then \nencapsulated within a blankenberg_parameters object, which provides methods to generate \nconditional input blocks, mutually exclusive groups, miscellaneous parameters, and \ncorresponding command-line representations. 
These components are subsequently combined \nto construct the full tool command section, ensuring that the original R script can be executed \nvia Galaxy with correctly mapped arguments. \n2.2.4.5 Mapping Extracted parameters to Galaxy tool wrapper – Once all parameter \ndetails, including names, types, metadata, and information about various groups such as \nmutually exclusive groups or subparsers, are obtained, the next step is to convert them into the \ncorresponding components of a Galaxy wrapper. Various methods of the CustomFakeArg \nclass, such as generate_conditional_block, generate_mutual_group_conditionals, \ngenerate_misc_params, and generate_command_section_subpro, are used to construct \nindividual sections of the tool.  \n2.2.4.6 Dependency management – Dependencies required by the R script are identified \nusing the return_dependencies function and formatted into Galaxy-compatible <requirements> \ntags via return_galax_tag. This ensures that all necessary R packages are available in the \nConda environment at runtime.  \n2.2.4.7 XML Wrapper Generation – All components generated in the previous steps, \nincluding input blocks, output blocks, dependency blocks, and command-line sections, are fed \ninto a Jinja2-based XML template (from jinja2 import Template) to generate the complete \nGalaxy XML wrapper. The tool metadata, such as ID, name, version, description, inputs, \noutputs, commands, help text, and dependencies, is incorporated into the template to produce a \nrobust and fully defined Galaxy tool. 
The resulting XML file is written to the specified output \ndirectory, and any temporary working directories and intermediate files are removed to \nmaintain a clean workspace. This workflow enables any R script with argument parsing to be \nautomatically converted into a fully functional Galaxy tool wrapper, bridging the gap between \nstandalone R scripts and reproducible Galaxy workflows while ensuring consistent parameter \nhandling, dependency management, and standardized input/output formats. \n2.2.5 Integration with Galaxy Toolshed – The Galaxy ToolShed is a central repository for \nsharing Galaxy tool wrappers, enabling researchers to easily access, install, and use a wide range \nof tools within the Galaxy platform. It serves as both a distribution hub and a version-controlled \narchive, helping maintain consistent tool functionality across different Galaxy instances. To \nfacilitate the development and deployment of high-quality tools, the Planemo toolkit can be used. \nPlanemo provides commands to lint, test, and validate Galaxy tool wrappers locally before \nsubmission, ensuring adherence to Galaxy’s standards and best practices. Once a wrapper passes \nvalidation, Planemo can be used to push it directly to a Galaxy ToolShed repository, streamlining \nthe process of distribution and making the tool readily available to the broader Galaxy \ncommunity. By leveraging ToolShed and Planemo, developers can maintain reliable, \nreproducible tools while simplifying installation and sharing across the broader Galaxy and \nresearch community. \n2.2.6 Reproducibility – For reproducibility, Galaxy supports containerization via Docker and \nSingularity, allowing users to deploy pre-installed Galaxy instances with the generated R-based \ntools, along with supporting data libraries and workflow suites. 
This approach enables \nresearchers to provide a fully configured, containerized environment that ensures a robust and \nreproducible data analysis experience with the developed tools. These workflows not only \nprovide access to the tools but also guarantee reproducibility of analyses, making them available \nto a global community of over 600,000 Galaxy users. By combining containerized Galaxy \ninstances, ToolShed distribution, and workflow sharing, developers can deliver reliable, \nreproducible, and easily accessible computational tools and workflows at scale. \n3. Results and Discussion \nTo evaluate the effectiveness of our automated Galaxy wrapper generation tool, we focused on \nintegrating R and Bioconductor packages into the Galaxy platform. Given the extensive \nrepertoire of over 2,200 Bioconductor packages, manual wrapper creation is time-consuming and \nprone to errors, especially for complex tools with multiple inputs, outputs, and interdependent \nparameters. Using our approach, wrappers can be generated automatically from R scripts, \nsignificantly reducing development time while ensuring proper handling of dependencies and \ncompatibility across versions. We applied the tool to both in-house and open-source R scripts, \nincluding widely used packages such as ggplot2, demonstrating its capability to produce fully \nfunctional Galaxy tools. This automation not only streamlines the integration process but also \nenhances reproducibility, accessibility, and scalability, enabling researchers without \nprogramming expertise to leverage advanced R-based analyses within Galaxy workflows. 
By \nbridging the gap between Bioconductor and Galaxy, our approach facilitates the broader \nadoption of R-based bioinformatics tools, supporting more reproducible and collaborative \nresearch practices. \nWe present several use cases that leverage publicly available R scripts alongside R scripts \ndeveloped in-house for various biological analyses. \n3.3.1 Generating wrappers from R scripts that support command-line argument parsing. \n3.3.1.1 Based on in-house generated R scripts – Integrated analysis toolset for robust and \nreproducible analysis of mass spectrometry proteomics data.  \na. Dataset: In this use case, we utilized the processed example dataset provided by the DEP \npackage (Zhang et al. 2018). The original data belong to the authors of the study in \nwhich ubiquitin-protein interactors were characterized (Zhang et al. 2017). Before \nbeing distributed with the DEP package, the raw mass spectrometry data were processed \nusing MaxQuant (Zhang et al. 2017; Cox and Mann 2008), and we simply utilized the \nresulting dataset provided through the DEP package. We used this publicly available \nprocessed file exclusively to demonstrate the usability of R2G2 on our in-house \ngenerated R script for automated Galaxy wrapper generation.  \nb. The DEP_preprocessing.r CLI tool enables automated preprocessing and quality control of \nlabel-free quantitative proteomics data within Galaxy. This command-line R \nmodule wraps the core DEP preprocessing functions behind a structured argument \ninterface. The script provides reproducible and parameterized execution of all major \npreprocessing steps, including filtering, normalization, imputation, and QC \nvisualization (Figure S1). The tool takes two primary inputs: the unique proteins table \nwith LFQ (Label Free Quantification) intensity values and the experimental design file, \nboth supplied in CSV format. 
Users may also specify the prefix used for LFQ intensity \ncolumns, allowing compatibility with datasets from different quantification pipelines \n(e.g., MaxQuant, FragPipe). Using these inputs, the script constructs the \nSummarizedExperiment object required for DEP-based processing. Following \nnormalization, protein intensities are imputed using the user-selected method (e.g., \nMinProb, MinDet, kNN, QRILC, MLE, bpca). The resulting imputed dataset is saved in \nboth RDS and CSV formats for downstream analysis. Additional diagnostic plots, \nincluding imputation density plots, are generated to help users evaluate the effect of the \nimputation strategy. For convenience, the tool produces a combined PDF containing all \ngenerated plots, facilitating rapid review of preprocessing quality in a single document. \nOverall, the Galaxy wrapper generated from this R script with the R2G2 \npackage provides a comprehensive and flexible preprocessing pipeline for LFQ \nproteomics data within Galaxy, enabling users to perform standardized QC, \nnormalization, and imputation without requiring direct interaction with R.  \nc. The DEP_DE_analysis.r CLI tool exposes the core DEP differential expression and \nvisualization functions through a standardized argument interface. The tool accepts an \nimputed DEP SummarizedExperiment object (RDS format) and performs differential \nexpression testing using test_diff() based on a user-selected control condition (Figure S2). 
\nSignificance thresholds for P-value (α) and log₂ fold change can be set through Galaxy \nparameters, and the tool automatically annotates significant proteins using \nadd_rejections(). All differential expression results are exported as a CSV file for \ndownstream use within Galaxy workflows. In addition, the script provides two \nplot-generation modes, PCA and volcano, implemented as subparsers. Users \ncan configure principal components, the number of variable proteins, point size, \ncontrast definitions, label size, and whether to display protein names. The script \ngenerates publication-quality plots using the corresponding DEP functions (plot_pca() or \nplot_volcano()) and saves them to the Galaxy-designated output directory. \nOverall, together with the Galaxy wrapper generated by the R2G2 package, this R script \nserves as the computational backend for the Galaxy tool, enabling flexible, reproducible \nexecution of DEP differential expression analysis and visualization directly within Galaxy’s \nworkflow system. It provides parameterized control, standardized outputs, and seamless \nintegration with upstream preprocessing and downstream interpretation tools. \n3.3.1.2 Based on open-source R scripts – To demonstrate the usability and versatility of the \nR2G2 package, we downloaded publicly available R script-based tools implemented in a \ncommand-line style using the r-argparse library. In total, we collected 41 \ntools, some of which are highlighted below, with details presented in Table 2. 
\n \nS. No. | Script name | Category \n1 | indelfindr.R | Genetic variants detection \n2 | CpG_island_identificator.R | CpG Island research \n3 | DESeq2.R | DE analysis \n4 | edgeR.R | DE analysis \n5 | extract_or_remove_and_split_from_multifasta.R | Fasta file manipulation for sequence analysis \n6 | extract_or_remove_seqs_from_fasta.R | Fasta file manipulation for sequence analysis \n7 | fasta_select_or_remove_by_header_pattern.R | Fasta file manipulation for sequence analysis \n8 | fasta_split_to_multifasta_by_win_step.R | Fasta file manipulation for sequence analysis \n9 | Update fasta_split_to_multifasta_by_win_step.R | Fasta file manipulation for sequence analysis \n10 | fasta_split_to_singlefastas_by_win_step.R | Fasta file manipulation for sequence analysis \n11 | Update fasta_split_to_singlefastas_by_win_step.R | Fasta file manipulation for sequence analysis \n12 | merge_multifasta_to_singlefasta.R | Fasta file manipulation for sequence analysis \n13 | merge_seqs_from_singlefasta.R | Fasta file manipulation for sequence analysis \n14 | parse_headers_from_fasta.R | Fasta file manipulation for sequence analysis \n15 | select_fasta_or_header_by_length.R | Fasta file manipulation for sequence analysis \n16 | singlefasta_to_multifasta.R | Fasta file manipulation for sequence analysis \n17 | split_multifasta_to_multifastas.R | Fasta file manipulation for sequence analysis \n18 | split_multifasta_to_singlefasta.R | Fasta file manipulation for sequence analysis \n19 | trim_multifasta.R | Fasta file manipulation for sequence analysis \n20 | combine_dssp_statistics_to_table.R | PDB data analysis \n21 | pdb_to_fasta.R | PDB data analysis \n22 | split_pdb_to_chains.R | PDB data analysis \n23 | split_pdb_to_fasta.R | PDB data analysis \n24 | trim_pdb.R | PDB data analysis \n25 | trim_pdb_by_chain.R | PDB data analysis \n26 | trim_pdb_to_fasta.R | PDB data analysis \n27 | Update trim_pdb_to_fasta.R | PDB data analysis \n28 | trim_pdb_to_fasta_by_chain.R | PDB data analysis \n29 | 
Update trim_pdb_to_fasta_by_chain.R | PDB data analysis \n30 | all_aas_content_multifasta.R | Sequence analysis and statistics \n31 | calculate_and_select_aa_content.R | Sequence analysis and statistics \n32 | calculate_and_select_at_content.R | Sequence analysis and statistics \n33 | calculate_and_select_by_length.R | Sequence analysis and statistics \n34 | calculate_and_select_gc_content.R | Sequence analysis and statistics \n35 | calculate_biophysical_properties_multifasta.R | Sequence analysis and statistics \n36 | get_length_from_multifasta.R | Sequence analysis and statistics \n37 | multifastas_gc_at_stats.R | Sequence analysis and statistics \n38 | multifastas_length_stats.R | Sequence analysis and statistics \n39 | select_fasta_or_header_by_aa.R | Sequence analysis and statistics \n40 | select_fasta_or_header_by_at.R | Sequence analysis and statistics \n41 | select_fasta_or_header_by_gc.R | Sequence analysis and statistics \nTable 2 Summary of publicly available R scripts for computational biology analysis, used in \nthis study to demonstrate the usability of R2G2 in automatic Galaxy tool generation. \n \na. INDELfindR is an open-source command-line tool developed in R for the detection of \nboth simple and complex insertion-deletion (INDEL) variants, a common form of genetic \nvariation where nucleotides are either inserted into or deleted from the genome. INDELs \ncan have profound biological consequences, often altering coding sequences, disrupting \nprotein function, or affecting gene regulation. Detecting these variants is therefore \nessential for understanding genetic diversity, disease mechanisms, and potential \nbiomarkers. The tool processes sequencing data and outputs INDEL calls in VCF v4.3 \nformat, making the results directly compatible with widely used downstream analysis and \nannotation pipelines. This ensures smooth integration with existing workflows for variant \nannotation, population genetics studies, and clinical interpretation. 
With INDELfindR, \nresearchers can analyze data from whole-genome or exome sequencing experiments to \nidentify INDELs linked to disease-associated genes, functional variants that impact \nprotein structure, or candidate markers for diagnostics and therapeutic targeting. By \nproviding a streamlined, reproducible, and standards-compliant workflow, INDELfindR \nsupports both fundamental research and translational applications in genomics and \nprecision medicine (https://github.com/TranslationalBioinformaticsLab/INDELfindR). \nb. biomisc_R is a collection of command-line bioinformatics scripts written in \nR. This toolkit provides modular utilities for common tasks such as handling FASTA \nfiles, analyzing sequence statistics, identifying CpG islands, performing \ndifferential-expression analysis, manipulating PDB structures, and processing synthetic \nbiology constructs. The repository leverages essential R packages including r-argparse, \nDESeq2, edgeR, ape, phylotools, stringr, bio3d, Biostrings, GeneGA, and Peptides to \nachieve functionality across domains like structural bioinformatics, genomics, and \nstatistics (https://github.com/olgatsiouri1996/biomisc_R).  \n3.3.1.3 R scripts from Bioconductor packages – In this study, we systematically collected \nand analyzed 2,289 Bioconductor packages to extract R scripts that utilize the argparse library \nfor command-line argument parsing. From these packages, we identified 51 R scripts, reported \nin Table 3, belonging to key tools, including CircSeqAlignTk, MAGAR, RnBeads, infercnv, \nand openCyto. Using R2G2, Galaxy wrappers for these R scripts were generated and are \nprovided alongside this manuscript. This approach demonstrates that automated wrapper \ngeneration can standardize script execution and simplify integration into reproducible \npipelines. The wrappers enable streamlined access to diverse analytical functions, reduce \nmanual intervention, and enhance scalability across multiple datasets. 
Overall, this work \nillustrates the feasibility of creating a comprehensive, command-line-based interface for a \nwide array of Bioconductor tools, supporting more efficient and reproducible computational \nanalyses in genomics and proteomics research. \n \nS. No. | Bioconductor package | R script \n1 | CircSeqAlignTk: A toolkit for end-to-end analysis of RNA-seq data for circular genomes | alignment.R \n2 | MAGAR: R-package to compute methylation Quantitative Trait Loci (methQTL) from DNA methylation and genotyping data | cluster.R \n3 | MAGAR | rscript_chromosome_job.R \n4 | MAGAR | rscript_summary.R \n5 | RnBeads: facilitates comprehensive analysis of various types of DNA methylation data at the genome scale | rscript_differential.R \n6 | RnBeads | rscript_differential_chunk.R \n7 | RnBeads | rscript_differential_wrapup.R \n8 | RnBeads | rscript_exploratory.R \n9 | RnBeads | rscript_import.R \n10 | RnBeads | rscript_inference.R \n11 | RnBeads | rscript_preprocessing.R \n12 | RnBeads | rscript_qc.R \n13 | RnBeads | rscript_tnt.R \n14 | RnBeads | rscript_wrapup.R \n15 | infercnv: Infer Copy Number Variation from Single-Cell RNA-Seq Data | inferCNV_constants.R \n16 | infercnv | KS_matrix_comparison.R \n17 | infercnv | KS_matrix_comparison.use_infercnv_obj.R \n18 | infercnv | QQ_matrix_comparison.R \n19 | infercnv | apply_median_filtering.R \n20 | infercnv | boxplot_cell_exprs.R \n21 | infercnv | cross_cell_scaling_normalization.R \n22 | infercnv | dropout_matrix_comparison.R \n23 | infercnv | examine_normal_cutoffs_vs_KS.R \n24 | infercnv | examine_normal_sampling_distributions.R \n25 | infercnv | examine_normal_sampling_distributions.i3.R \n26 | infercnv | explore_HMM_exec.R \n27 | infercnv | explore_HMM_exec.hspike.R \n28 | infercnv | genome_smoothed_lineplots.R \n29 | infercnv | inferCNV_to_HB.R \n30 | infercnv | infercnv_obj_to_input_files.R \n31 | infercnv | meanvar_sim_counts.R \n32 | infercnv | plot_hspike.R \n33 | infercnv | plot_hspike.by_num_cells.R \n34 | infercnv | 
plot_hspike.diff_normal_tumor.R \n35 | infercnv | plot_hspike_vs_sample_chrs.R \n36 | infercnv | plot_infercnv_obj.R \n37 | infercnv | plot_tumor_vs_normal_chr_densities.R \n38 | infercnv | plot_tumor_vs_normal_chr_densities.i3.R \n39 | infercnv | recursive_random_tree_height_cutting.random_trees.R \n40 | infercnv | recursive_random_tree_height_cutting.sigclust2.R \n41 | infercnv | recursive_random_tree_height_cutting.using_hmms.R \n42 | infercnv | run.stub.R \n43 | infercnv | run_BayesNet.R \n44 | infercnv | run_HMM_each_cell_separately.R \n45 | infercnv | run_HMM_on_hspike.R \n46 | infercnv | run_HMM_on_subclusters.R \n47 | infercnv | run_HMM_per_chr.R \n48 | infercnv | sim_vs_orig_counts.QQplot.R \n49 | infercnv | splatterScrape_sim_counts.R \n50 | openCyto: Hierarchical Gating Pipeline for flow cytometry data | as.data.table.R \n51 | openCyto | functions.R \nTable 3 Summary of the Bioconductor packages and R scripts extracted to demonstrate the \nusability of R2G2 in automatic Galaxy tool wrapper generation. \n \n3.3.2 Generating tool wrappers from an R package – The ggplot2 package is one of the most \nwidely used libraries in R for creating high-quality, publication-ready visualizations. By \ngenerating Galaxy tool wrappers for ggplot2, we demonstrate how standard visualization tasks \ncan be seamlessly integrated into the Galaxy platform. This use case highlights the automation of \nwrapper creation for the core functions of the ggplot2 library, enabling users to access advanced \nplotting capabilities and leverage these building blocks for developing more sophisticated \nGalaxy tools. R2G2 iterates over all available objects in the ggplot2 library, identifies functions \nbased on their signatures, and determines their arguments to generate corresponding Galaxy tool \nparameters. Through this process, more than 450 Galaxy tool wrappers are created. 
While the \nusability of these wrappers as standalone tools is limited for comprehensive analyses, the \nprimary motivation behind this implementation is to dynamically extract function-level \ninformation and provide a complete set of building blocks for robust Galaxy tool development. \nThese extracted wrappers can then be combined to assemble more complex and practical tools, \nparticularly in cases where argument-parsing scripts are not readily available. \nNevertheless, the sheer number of available R and Bioconductor packages indicates that their \nfull potential within the Galaxy ecosystem has yet to be realized. The results demonstrate that the \nR2G2 package provides comprehensive and detailed support in streamlining the development of \nGalaxy tools for R-based packages. Efforts such as R2G2 are essential for harnessing the true \npower of these packages in bioinformatics analyses. By leveraging Galaxy’s web-based \ninterface, these tools become more accessible to researchers without programming expertise, \npromoting broader collaboration and facilitating the adoption of advanced computational \nanalyses across diverse research communities. R2G2 not only supports wrapper generation for \nR-based scripts implemented using argument parsing but also enables package-based wrapper \ngeneration, thereby enhancing usability and providing broader coverage in R-based tool \ndevelopment. Despite the powerful functionality of the R2G2 package, we note \nsome limitations observed during testing, which could inform future development \nand enhancement of the package. Currently, R2G2 attempts to infer output parameters based on \nkeywords, specifically by checking if an argument name begins with certain predefined terms \nsuch as \"output.\" While this heuristic can often identify output parameters, it has limitations and \nmay fail to correctly distinguish true outputs from inputs. 
As a result, certain input \nand output parameters may be misassigned if the user relies entirely on the automatic \ninput-output detection mechanism. This necessitates a brief manual review and adjustment after \nwrapper generation. To make output parameter definition more robust and straightforward, we \nhave introduced an argument-based output parameter specification, which effectively addresses \nthese scenarios.  
Our testing also highlighted the need for a comprehensive and robust automated \ntesting framework for the tools generated by R2G2, allowing users to perform broad-coverage \ntests quickly and thereby improving tool reliability. Currently, we rely on the default testing \napproach, which limits the automated testing capabilities of R2G2. We aim to enhance and \nexpand this functionality in future versions. Generating tool wrappers directly from packages can \nsometimes produce an enormous number of tools, not all of which are immediately useful. Users \nmust identify the desired function-based tools and merge them into more practical and efficient \nworkflows. While R2G2 simplifies this process, it still requires user intuition and creativity to \nmake these individual wrappers truly useful, which can sometimes limit the overall usability of \nthis approach. Nevertheless, to our knowledge this represents the first approach of its kind, and in future versions, \nwe aim to make this functionality more robust, enabling the generation of more practical tools \ndirectly from packages, even when a command-line R script is not available. Overall, R2G2 not \nonly helps new users learn and implement useful tools in Galaxy but also saves considerable \neffort in creating complex Galaxy tool wrappers. R2G2 can generate these complex wrappers in \nseconds with a single command, substantially reducing the human effort typically \nneeded for developing such tools.  \nConclusion \nIn this study, we present a comprehensive framework for automatically generating Galaxy tool \nwrappers from R packages and scripts, with a focus on Bioconductor and ggplot2 workflows. By \nleveraging R2G2, we demonstrated how individual functions and argument parsing information \ncan be dynamically extracted to create robust, reusable building blocks for Galaxy tools. 
This \napproach significantly reduces the time and technical expertise required to integrate R-based \nanalyses into the Galaxy platform, while maintaining reproducibility, standardized input/output \nformats, and dependency management. Overall, the framework provides a scalable and flexible \nsolution for democratizing access to R-based bioinformatics tools. Ultimately, this work bridges \nthe gap between R’s extensive computational capabilities and Galaxy’s user-friendly interface, \nfacilitating reproducible, accessible, and scalable data analysis for the broader research \ncommunity. \nAvailability \nThe R2G2 package can be installed either directly from PyPI or from its GitHub repository. For a \nsimplified installation, R2G2 is available as a PyPI package (https://pypi.org/project/r2g2/0.1.1/) \nand can be installed using standard Python package managers. Alternatively, the source code can \nbe downloaded directly from GitHub at https://github.com/BlankenbergLab/r2g2, where detailed \ninstallation and usage instructions are provided. All open-source R scripts used to demonstrate \nthe usability of R2G2 were obtained from publicly available GitHub repositories, including \nhttps://github.com/TranslationalBioinformaticsLab/INDELfindR, \nhttps://github.com/olgatsiouri1996/biomisc_R, and the respective GitHub repositories of the \nreferenced Bioconductor packages. All Galaxy-compatible R script wrappers generated in this \nwork to demonstrate the capabilities of R2G2 are available at \nhttps://github.com/jaidevjoshi83/galaxy_tool_wrappers. 
\n \nAuthor Contributions \n \nDB conceived the project and supervised the research. JJ, FC and DB developed the code; JJ \nperformed the analysis. JJ, FC, and DB wrote the manuscript and approved the final version. \n \nConflict of Interests \n \nDB has a significant financial interest in GalaxyWorks, a company that may have a commercial \ninterest in the results of this research and technology. This potential conflict of interest has been \nreviewed and is managed by the Cleveland Clinic.  \nJJ and FC have no conflicts to disclose.  \n \nFunding  \n \nThis work was supported by the Wellcome Trust [313498/Z/24/Z]. \n \nReference:  \nAfgan, Enis, Dannon Baker, Bérénice Batut, et al. 2018. “The Galaxy Platform for Accessible, \nReproducible and Collaborative Biomedical Analyses: 2018 Update.” Nucleic Acids \nResearch 46 (W1): W537–W544. \nAlomair, Lamya, and Mostafa A. Abolfotouh. 2023. “Awareness and Predictors of the Use of \nBioinformatics in Genome Research in Saudi Arabia.” International Journal of General \nMedicine 16 (August): 3413–3425. \nAttwood, Teresa K., Sarah Blackford, Michelle D. Brazas, Angela Davies, and Maria Victoria \nSchneider. 2019. “A Global Perspective on Evolving Bioinformatics and Data Science \nTraining Needs.” Briefings in Bioinformatics 20 (2): 398–404. \nBaichoo, Shakuntala, Yassine Souilmi, Sumir Panji, et al. 2018. “Developing Reproducible \nBioinformatics Analysis Workflows for Heterogeneous Computing Environments to \nSupport African Genomics.” BMC Bioinformatics 19 (1): 457. \nBalamurugan, S., Anand T. Krishnan, Dinesh Goyal, Balakumar Chandrasekaran, and Boomi \nPandi. 2021. 
Computation in BioInformatics: Multidisciplinary Applications. John Wiley & \nSons. \nBlankenberg, Daniel, Gregory Von Kuster, Nathaniel Coraor, et al. 2010. “Galaxy: A Web-Based \nGenome Analysis Tool for Experimentalists.” Current Protocols in Molecular Biology \nChapter 19 (January): Unit 19.10.1–21. \nChan, Bertram K. C. 2018. “Data Analysis Using R Programming.” Advances in Experimental \nMedicine and Biology 1082: 47–122. \nCock, Peter J. A., Björn A. Grüning, Konrad Paszkiewicz, and Leighton Pritchard. 2013. \n“Galaxy Tools and Workflows for Sequence Analysis with Applications in Molecular Plant \nPathology.” PeerJ 1 (September): e167. \nCox, Jürgen, and Matthias Mann. 2008. “MaxQuant Enables High Peptide Identification Rates, \nIndividualized P.p.b.-Range Mass Accuracies and Proteome-Wide Protein Quantification.” \nNature Biotechnology 26 (12): 1367–1372. \nEren, A. Murat, Evan Kiefl, Alon Shaiber, et al. 2020. “Community-Led, Integrated, \nReproducible Multi-Omics with Anvi’o.” Nature Microbiology 6 (1): 3–6. \nGentleman, Robert, Vincent Carey, Wolfgang Huber, Rafael Irizarry, and Sandrine Dudoit. 2005. \nBioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer \nScience & Business Media. \nGentleman, Robert C., Vincent J. Carey, Douglas M. Bates, et al. 2004. Bioconductor: Open \nSoftware Development for Computational Biology and Bioinformatics. \nGiorgi, Federico M., Carmine Ceraolo, and Daniele Mercatelli. 2022. “The R Language: An \nEngine for Bioinformatics and Data Science.” Life (Basel, Switzerland) 12 (5). \nhttps://doi.org/10.3390/life12050648. \nGoecks, Jeremy, Anton Nekrutenko, James Taylor, and Galaxy Team. 2010. “Galaxy: A \nComprehensive Approach for Supporting Accessible, Reproducible, and Transparent \nComputational Research in the Life Sciences.” Genome Biology 11 (8): R86. \nGrüning, Björn, Eric Rasche, Boris Rebolledo-Jaramillo, et al. 2017. 
Jupyter and Galaxy: Easing \nEntry Barriers Into Complex Data Analyses for Biomedical Researchers. \nJoshi, Jayadev, and Daniel Blankenberg. 2022. “PDAUG: A Galaxy Based Toolset for Peptide \nLibrary Analysis, Visualization, and Machine Learning Modeling.” BMC Bioinformatics 23 \n(1): 197. \nLanger, Björn E., Andreia Amaral, Marie-Odile Baudement, et al. 2025. “Empowering \nBioinformatics Communities with Nextflow and Nf-Core.” Genome Biology 26 (1): 228. \nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold \nChange and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 550. \nMansueto, Locedie, Tobias Kretzschmar, Ramil Mauleon, and Graham J. King. 2024. “Building \na Community-Driven Bioinformatics Platform to Facilitate Multi-Omics Research.” \nGigaByte (Hong Kong, China) 2024 (October): gigabyte137. \nRitchie, Matthew E., Belinda Phipson, Di Wu, et al. 2015. “Limma Powers Differential \nExpression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids \nResearch 43 (7): e47. \nRuprecht, Nathan A., Joshua D. Kennedy, Benu Bansal, et al. 2024. “Transcriptomics and \nEpigenetic Data Integration Learning Module on Google Cloud.” Briefings in \nBioinformatics 25 (Supplement_1). https://doi.org/10.1093/bib/bbae352. \nSilva, Danilo, Monika Moir, Marcel Dunaiski, et al. 2025. “Review of Open-Source Software for \nDeveloping Heterogeneous Data Management Systems for Bioinformatics Applications.” \nBioinformatics Advances 5 (1): vbaf168. \nSiraji, Mushfiqul Anwar, and Munia Rahman. 2023. 
“Primer on Reproducible Research in R: \nEnhancing Transparency and Scientific Rigor.” Clocks & Sleep 6 (1): 1–10. \nStaples, Timothy L. 2023. “Expansion and Evolution of the R Programming Language.” Royal \nSociety Open Science 10 (4): 221550. \nVerli, Hugo, and Raquel Cardoso de Melo Minardi. 2022. Original Strategies for Training and \nEducational Initiatives in Bioinformatics. Frontiers Media SA. \nWilliams, Jason J., Jennifer C. Drew, Sebastian Galindo-Gonzalez, et al. 2019. “Barriers to \nIntegration of Bioinformatics into Undergraduate Life Sciences Education: A National \nStudy of US Life Sciences Faculty Uncover Significant Barriers to Integrating \nBioinformatics into Undergraduate Instruction.” PloS One 14 (11): e0224288. \nWilson Sayres, Melissa A., Charles Hauser, Michael Sierk, et al. 2018. “Bioinformatics Core \nCompetencies for Undergraduate Life Sciences Education.” PloS One 13 (6): e0196878. \nZhang, Xiaofei, Arne H. Smits, Gabrielle B. A. van Tilburg, et al. 2017. “An Interaction \nLandscape of Ubiquitin Signaling.” Molecular Cell 65 (5): 941–955.e8. \nZhang, Xiaofei, Arne H. Smits, Gabrielle Ba van Tilburg, Huib Ovaa, Wolfgang Huber, and \nMichiel Vermeulen. 2018. “Proteome-Wide Identification of Ubiquitin Interactions Using \nUbIA-MS.” Nature Protocols 13 (3): 530–550. \n \nFigure S1: Automated preprocessing outputs generated by the \nDEP_pre_processing tool, including (A) protein coverage, (B) proteins per sample, (C) \nprotein identification overlap, (D) missing value pattern heatmap, (E) intensity \ndistributions before and after normalization, and (F) density plots of normalized and \nimputed data. \n \nFigure S2: Key downstream outputs generated by the DEP_DE_analysis tool, \n(A) differential expression and (B) PCA clustering, allowing users \nto interpret proteomic responses, verify sample grouping, and assess global expression \nchanges across experimental conditions. 
","source_license":"CC-BY-4.0","license_restricted":false}