Abstract
The growing volume of public proteomics datasets and the advent of novel machine learning (ML)-based methods create unprecedented opportunities for discovery through large-scale reanalysis. However, traditional desktop tools are increasingly insufficient for processing and integrating data at this scale. To address this challenge, we present a novel package, quantms-rescoring, that extends the cloud-native quantms workflow with a machine learning-based rescoring module. Unlike prior tools that rescore single-engine outputs, quantms-rescoring integrates multiple search engines (SAGE, COMET, and MSGF+), performs automatic model selection and model fine-tuning, and scales reproducibly on cloud infrastructure. quantms-rescoring relies on multiple fragment-ion intensity predictors (AlphaPeptDeep and MS2PIP) and a retention-time predictor (DeepLC) to improve results from multiple peptide database search engines. It features automatic model selection, fine-tuning, and retraining for MS/MS intensity and retention time prediction to select the best model for a given dataset. We applied the novel workflow to five representative datasets spanning DDA label-free quantification, TMT 10-plex isobaric labelling of tumor proteomics data, immunopeptidomics, phosphoproteomics, and unseen lysine malonylation experiments. We achieved a 16% to 22.8% increase in identified spectra, along with the quantification of 2191 additional phosphorylated peptides and 1337 phosphosites. In the tandem mass tag (TMT)-labeled clear cell renal cell carcinoma dataset, 76 novel differentially expressed proteins were identified by multiple search engines with quantms-rescoring. Additionally, 11,688 novel potential HLA-II binders were detected in the immunopeptidomics dataset by multiple search engines with quantms-rescoring. For unseen malonylation data, we identified over 58.8% more malonylation PSMs and 30.5% more modification sites than with COMET alone.
Together, these results show that multi-engine searches and machine learning-derived features can be combined in a scalable workflow that enhances identification, PTM localization, and quantification performance.
Competing Interest Statement
T.S. and O.K. are officers in OpenMS Inc., a non-profit foundation managing OpenMS development. All remaining authors declare no competing interests.