{"paper_id":"2312207d-c8a4-47a6-b6ed-2993134a2e0f","body_text":"Collisional cross-section prediction for multiconformational peptide ions with IM2Deep
bioRxiv preprint, doi: https://doi.org/10.1101/2025.02.18.638865
Robbe Devreese (1, 2), Alireza Nameni (1, 2), Arthur Declercq (1, 2), Emmy Terryn (1, 2), Ralf Gabriels (1, 2), Francis Impens (1, 2), Kris Gevaert (1, 2), Lennart Martens (1, 2, 3), Robbin Bouwmeester (1, 2)
1 VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
2 Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
3 BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg, France
For correspondence: lennart.martens{at}ugent.be
Abstract
Peptide collisional cross-section (CCS) prediction is complicated by the tendency of peptide ions to adopt multiple conformations in the gas phase. This further complicates downstream analysis of proteomics data, for example identification, or quantification through feature finding. Here, we present an improved version of IM2Deep that is trained on a carefully curated dataset to predict CCS values of multiconformational peptides. The training data are derived from a large collection of publicly available datasets. Together with a tailored architecture, this comprehensive training dataset enables accurate CCS prediction for multiple conformational states of a peptide ion. The enhanced IM2Deep model also retains high precision for peptides with a single observed conformation. IM2Deep is publicly available under a permissive open-source license at https://github.com/compomics/IM2Deep.
Introduction
In recent years, technical advances in LC-IM-MS/MS technology have strongly enhanced identification capabilities in complex proteomics workflows (1–3). Traditionally, LC-MS/MS systems relied solely on liquid chromatography for peptide separation and on mass analyzers for precursor selection before fragmentation.
In contrast, MS instruments that incorporate ion mobility, such as the timsTOF series, additionally separate ions in the gas phase by ion mobility spectrometry (IMS) and use parallel accumulation–serial fragmentation (PASEF) to increase sensitivity and acquisition speed (4). With PASEF, precursor ions accumulate in the trapped ion mobility spectrometry (TIMS) tunnel before being released sequentially, which particularly increases sensitivity for low-abundance ions and reduces spectral complexity. The inverse reduced ion mobility (1/K0), measured by IMS, and the mass-to-charge ratio, measured by MS, can be used to calculate the collisional cross section (CCS) of a peptide ion with the Mason-Schamp equation (5). CCS represents the rotationally averaged effective area with which an ion collides with a neutral gas. It is closely tied to the ion’s chemical structure and three-dimensional conformation, which makes it an important distinguishing characteristic of an ion in the gas phase. Comparing predicted and observed CCS values can therefore increase identification confidence (6, 7). A predicted CCS value can also be used for more accurate quantification through feature finding (8) and to prioritize acquisition time (9). These potential benefits have led to the development of various machine learning models for CCS prediction based on amino acid- or atom-level features and physicochemical properties (6, 7, 10). However, current machine learning-based CCS prediction models overlook the possibility that peptide ions adopt multiple conformations in the gas phase (Figure 1), as they are often trained only on the most abundant conformer within a dataset.
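The Mason-Schamp conversion from 1/K0 to CCS mentioned above can be sketched as follows. This is a minimal illustration, not IM2Deep, MaxQuant or vendor code: the function names are hypothetical, the drift gas is assumed to be N2 (28.0134 Da), the gas temperature is assumed to be 305 K, and the ion mass is approximated as m/z × z.

```python
import math

# CODATA physical constants in SI units
ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23  # J/K
DALTON = 1.66053906660e-27  # kg
LOSCHMIDT = 2.686780111e25  # gas number density (m^-3) at 273.15 K and 101.325 kPa


def _ccs_per_inv_k0(mz, charge, gas_mass=28.0134, temperature=305.0):
    # Factor F such that CCS (Å²) = F * 1/K0 (V·s/cm²).
    # Assumptions: N2 drift gas and 305 K; ion mass approximated as m/z * z.
    ion_mass = mz * charge  # Da
    reduced_mass = ion_mass * gas_mass / (ion_mass + gas_mass) * DALTON  # kg
    factor_si = (3 * charge * ELEMENTARY_CHARGE) / (16 * LOSCHMIDT) * math.sqrt(
        2 * math.pi / (reduced_mass * BOLTZMANN * temperature)
    )
    # 1 V·s/cm² equals 1e4 V·s/m², and 1 m² equals 1e20 Å²
    return factor_si * 1e4 * 1e20


def inv_k0_to_ccs(inv_k0, mz, charge, **kwargs):
    # Mason-Schamp: CCS is proportional to charge / (sqrt(reduced mass * T) * K0)
    return _ccs_per_inv_k0(mz, charge, **kwargs) * inv_k0


def ccs_to_inv_k0(ccs, mz, charge, **kwargs):
    # Inverse conversion, e.g. to place a predicted CCS on a 1/K0 axis
    return ccs / _ccs_per_inv_k0(mz, charge, **kwargs)


ccs = inv_k0_to_ccs(1.0, mz=500.0, charge=2)
print(f'{ccs:.1f} Å²')
```

For a doubly charged precursor at m/z 500 with 1/K0 = 1.0 V·s/cm², this gives roughly 406 Å²; the inverse function maps that value back to 1/K0 = 1.0.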
Training only on the most abundant conformer can exclude valuable data and bias the model towards the most prevalent conformer in each experiment. It implicitly assumes that the conformer that dominates in one experiment will consistently dominate in others, which is not a foregone conclusion and can lead to inaccurate predictions.
Figure 1: Ion mobility distribution of a representative peptide precursor ion (UniProt ID of inferred protein: O75594; data from PRIDE accession PXD046507). The upper panel shows the ion mobility distribution with multiple distinct peaks, indicating the presence of multiple conformations. The lower panel displays ion intensity as a function of retention time (s) and CCS (Å²). MaxQuant reported two CCS features for this precursor, one at 577 Å² and one at 605 Å².
Previous studies have identified several factors that contribute to the multiconformationality of precursor ions in LC-IM-MS/MS experiments. The emergence and distribution of conformers can depend on the composition of the solvent from which the peptide is electrosprayed (11), suggesting that different conformational states in solution can give rise to distinct gas-phase conformers upon dehydration that are distinguishable by IMS-MS. Additionally, peptide ions may adopt different conformations depending on the applied activation voltage (12), and molecular modeling indicates that cis-trans isomerization of proline residues may also influence conformation (13). Furthermore, when a peptide lacks a preferential charge site, charge localization can also result in multiple gas-phase conformations (14).
The rising popularity of ion mobility-assisted LC-MS/MS experiments, combined with an increasing commitment to open science within the proteomics community, has led to a substantial expansion of publicly available TIMS-acquired proteomics datasets. These datasets are critical for advancing data-driven proteomics software.
Deep learning models, which rely on large volumes of data for accurate predictions, particularly benefit from such datasets. By pooling multiple datasets, more comprehensive training sets can be created, especially for less common phenomena such as multiconformational peptide ions. While a single LC-IM-MS/MS dataset might not yield enough training data for accurate multiconformer CCS prediction, aggregating datasets is expected to increase both the quantity and the reliability of training samples.
We have recently introduced IM2Deep, a deep learning-based peptide CCS predictor that builds on the principles of DeepLC, a retention time predictor (7, 15). IM2Deep encodes peptidoforms at the level of atomic composition, enabling precise CCS predictions for peptides carrying modifications not encountered during training. In this study, we present an improved version of IM2Deep designed to predict CCS values for peptide ions that exhibit multiple conformations in the gas phase. To achieve this, we modified the IM2Deep architecture to support multi-output prediction and used transfer learning to tailor the model to a much smaller dataset of multiconformational peptide ions. These changes to the training data and architecture enable IM2Deep to accurately predict CCS values for multiconformational peptides.
Methods
Datasets
We searched the PRIDE public proteomics data repository for datasets generated on timsTOF Pro and timsTOF Pro 2 instruments, focusing on those with readily available identification output that includes CCS annotations. In total, we collected MaxQuant (16) evidence output from 30 PRIDE projects, totaling 1,248 LC-IM-MS/MS runs (Supplementary Table 1).
Creation of a multiconformer peptide dataset
This section details the steps taken to aggregate the separate experimental datasets into a unified dataset of multiconformational peptides (Supplementary Figure S1).
First, only peptide ions identified from separate MS1 clusters (features) were kept for further processing. Next, CCS values were aligned across all LC-IM-MS/MS runs within each experiment. This was done by calculating a charge-specific linear offset (y = x + b) from overlapping peptide-charge state pairs, similar to the method described by Meier et al. (10). For each peptide-charge state pair, the CCS of the identification with the highest intensity was used as the alignment reference, but the CCS values of less abundant identifications were also aligned and retained. Starting from the run with the highest number of identifications, all other runs were sequentially aligned and added to the dataset in descending order of their number of identifications. To ensure good alignment, runs sharing fewer than 100 peptide-charge state pairs with the aligned dataset were excluded (n = 284 runs).
After CCS alignment, conformers were identified within each LC-IM-MS/MS run. Identifications of the same precursor within a run whose CCS values differed by more than 2% were considered distinct conformational states, as suggested by previous research (10). To increase confidence in the detected conformers, we required conformers to recur across multiple runs. Specifically, we compared the CCS values reported for each precursor across runs and retained only multiconformational precursors whose multiple CCS values could be matched to corresponding values in other runs within a 2% tolerance. For each conformer, the mean CCS value was retained in the dataset.
Creation of a uniconformer peptide dataset
Precursors that did not meet the above criteria were classified as uniconformational and placed in a separate dataset.
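The alignment and conformer-detection steps above, together with the split into multi- and uniconformer precursors, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used to build the datasets: all function names are hypothetical, the offset b is estimated here as the median CCS difference per charge state (the estimator is not specified in the text), the 2% difference is taken relative to the smaller CCS value, values are grouped by single linkage on sorted CCS, and precursors that fail both criteria are labeled excluded.

```python
from statistics import mean, median

REL_TOL = 0.02  # 2% CCS tolerance used throughout the text


def charge_offsets(reference, run, min_overlap=100):
    # reference, run: {(peptidoform, charge): CCS of the most intense identification}
    # Returns {charge: b} so that aligned CCS = run CCS + b, or None when the run
    # shares fewer than min_overlap precursors with the reference (run is skipped).
    shared = [key for key in run if key in reference]
    if len(shared) < min_overlap:
        return None
    diffs_by_charge = {}
    for key in shared:
        diffs_by_charge.setdefault(key[1], []).append(reference[key] - run[key])
    # Assumption: the median difference is used as a robust estimate of b.
    return {charge: median(diffs) for charge, diffs in diffs_by_charge.items()}


def group_by_gap(items, rel_tol=REL_TOL):
    # items: (ccs, run_id) tuples. After sorting by CCS, a new group starts whenever a
    # value exceeds the previous one by more than rel_tol (single-linkage grouping).
    groups = []
    for ccs, run in sorted(items):
        if groups and ccs <= groups[-1][-1][0] * (1 + rel_tol):
            groups[-1].append((ccs, run))
        else:
            groups.append([(ccs, run)])
    return groups


def classify_precursor(per_run, rel_tol=REL_TOL):
    # per_run: {run_id: [aligned CCS values of one precursor in that run]}
    # Step 1: within each run, values more than rel_tol apart are distinct conformers.
    conformers, multi_within_run = [], False
    for run, values in per_run.items():
        groups = group_by_gap([(v, run) for v in values], rel_tol)
        multi_within_run = multi_within_run or len(groups) > 1
        conformers += [(mean(c for c, _ in g), run) for g in groups]
    # Step 2: keep conformers matched within rel_tol in at least two runs; average them.
    recurring = [
        mean(c for c, _ in g)
        for g in group_by_gap(conformers, rel_tol)
        if len({r for _, r in g}) >= 2
    ]
    if multi_within_run and len(recurring) >= 2:
        return 'multi', recurring
    # Step 3: uniconformer only if all values across runs lie within rel_tol.
    values = [v for vals in per_run.values() for v in vals]
    if max(values) <= min(values) * (1 + rel_tol):
        return 'uni', [mean(values)]
    return 'excluded', []


print(classify_precursor({'run_a': [400.0, 420.0], 'run_b': [401.0, 421.0]}))
# ('multi', [400.5, 420.5])
```

In a full pipeline, each run would first be shifted by its per-charge offsets against the growing aligned dataset, starting from the run with the most identifications, before the per-precursor classification is applied.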
The uniconformer dataset underwent additional filtering to exclude multiconformational precursors whose distinct conformers appeared only between runs (and not within the same run). Thus, only precursors whose CCS values deviated by no more than 2% between aligned runs were included in the uniconformer peptide dataset.
IM2Deep architecture for multiconformer CCS prediction
The architecture of IM2Deep, as described in Declercq & Devreese et al. (7), is nearly identical to that of DeepLC (15). Briefly, IM2Deep employs a convolutional architecture with four distinct pathways through which each encoded peptide is processed. Three of these pathways use convolutional and max-pooling layers to capture local structure, handling the atomic composition of amino acids, the atomic composition of diamino acids, and a one-hot encoding of unmodified amino acids, respectively. A fourth pathway processes global features using densely connected layers. The outputs of all pathways are combined, flattened, and passed through six dense layers in the final combined path, culminating in a single output node that predicts CCS.
To enable multiconformer CCS prediction, this architecture was adapted to support multi-output prediction. Instead of a single output node, IM2Deep now features a branched output with two final dense layers, each ending in a single output node that produces a distinct CCS prediction. We trained the multi-output IM2Deep models both with and without a pretrained single-output model as a starting point. The single-output model had been trained on the dataset used in Declercq & Devreese et al. (7). Pretrained weights were loaded into the shared layers of the multi-output model prior to training, and all parameters remained fully trainable. The multi-output IM2Deep models were trained and evaluated on the multiconformer dataset, which was randomly split into 81% training, 9% validation, and 10% test sets.
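The custom loss function detailed in the next paragraph can be sketched for a single batch as follows. This is a plain-Python illustration with a hypothetical function name; an actual training loop would express the same operations on tensors in a deep learning framework so that gradients can flow. It assumes that each precursor has exactly two target CCS values and that the difference term compares the spacing between the two sorted predictions with the spacing between the two sorted targets.

```python
def ordered_two_conformer_loss(predictions, targets, diff_weight=5.0):
    # predictions, targets: lists of (ccs_a, ccs_b) pairs, one pair per precursor in the batch.
    n = len(predictions)
    mae_1 = mae_2 = mae_diff = 0.0
    for pred, target in zip(predictions, targets):
        p1, p2 = sorted(pred)  # order the two outputs by size
        t1, t2 = sorted(target)  # order the two targets by size
        mae_1 += abs(p1 - t1) / n
        mae_2 += abs(p2 - t2) / n
        # compare predicted and observed spacing between the two conformers
        mae_diff += abs((p2 - p1) - (t2 - t1)) / n
    return mae_1 + mae_2 + diff_weight * mae_diff


# Toy batch (hypothetical values in Å²): both outputs collapse onto one value
collapsed = [(410.0, 410.0)]
observed = [(400.0, 420.0)]
print(ordered_two_conformer_loss(collapsed, observed, diff_weight=0.0))  # 20.0
print(ordered_two_conformer_loss(collapsed, observed))  # 120.0
```

Setting diff_weight to zero recovers the plain sum of the two ordered MAEs; with the weight of five, a collapsed prediction pair is penalized six times as heavily in this toy case, which pushes the two outputs towards distinct conformers.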
Training ran for a maximum of 500 epochs, with early stopping based on validation set performance to prevent overfitting.
To ensure distinct predictions for each target, we implemented a custom loss function. Simple loss functions, such as the sum of the mean absolute errors (MAEs) of each prediction-target pair, can lead to cases in which both predictions converge to the same target. Our custom loss function addresses this issue by forcing the model to produce two distinct predictions that correspond to two separate targets. First, both the targets and the model outputs are sorted by size to impose a consistent prediction-target pairing. The MAE is then calculated between each ordered prediction and its corresponding ordered target:
MAE_k = (1/N) · Σ_{i=1}^{N} |ŷ_{i,k} − y_{i,k}|,  k ∈ {1, 2}
where N is the batch size and ŷ_{i,k} and y_{i,k} denote the k-th smallest prediction and the k-th smallest target of precursor i. Additionally, the difference between the two targets and the difference between the two predictions are computed:
Δy_i = y_{i,2} − y_{i,1},  Δŷ_i = ŷ_{i,2} − ŷ_{i,1}
The MAE of these differences is then calculated:
MAE_Δ = (1/N) · Σ_{i=1}^{N} |Δŷ_i − Δy_i|
The total loss is the sum of the two prediction-target MAEs and the MAE of the differences, the latter multiplied by a weight w that emphasizes the importance of the difference term:
L = MAE_1 + MAE_2 + w · MAE_Δ
This custom loss function ensures that each prediction corresponds closely to a distinct target. For the final models, w was set to five, which gave the highest accuracy on the validation set in a tuning experiment in which w ranged from zero to ten (Supplementary Figure S2).
Visualization of ion mobilograms
To illustrate the advantages of the multiconformational IM2Deep model, ion mobilograms were generated to visualize the ion mobility distributions of precursors identified in an independent public LC-IM-MS/MS dataset (PXD046507) (17). The raw timsTOF measurements were read from the raw .d folders using the AlphaTims (version 1.0.8) Python package (18).
Precursors identified by MaxQuant were matched to the raw ion mobility measurements based on their retention time and m/z values, with a retention time tolerance of 4 s and a precursor m/z tolerance of 10 ppm. Ion mobility measurements were converted to CCS using the Mason-Schamp equation (5).
Results
Description of the (multi)conformer dataset
In this study, we compiled identification data from 1,248 LC-IM-MS/MS runs across 30 PRIDE projects to determine peptide conformers. To increase the reliability of our dataset, we excluded conformers that were not identified in multiple runs (Supplementary Figure S3). After filtering, the final multiconformer dataset included 29,691 unique peptidoform-charge state pairs. This is about 4.1% of the size of the uniconformer dataset (n = 728,249), in line with previously published estimates (6, 10). The number of observed conformational states for a single precursor in the multiconformer dataset reached up to five (Supplementary Figure S4).
We identified differences in the physicochemical properties of the uni- and multiconformational subsets of precursor ions (Figure 2). Notably, multiconformational precursor ions tended to carry higher charges (Figure 2A) and to be longer (Figure 2B). These observations support findings by Meier et al., who reported that prediction models trained to estimate only one CCS value show a bias that particularly affects the accuracy for longer, 3+ and 4+ charged peptides (10).
Figure 2: Physicochemical properties and CCS differences between uniconformational (blue) and multiconformational (orange) peptides. A) Distribution of charge states. B) Sequence length distribution. C) Relative frequency of proline residues. D) Distribution of absolute differences in CCS between conformers for peptides with different charge states.
Given the hypothesis that cis-trans isomerization of proline residues could contribute to multiconformationality, we examined the presence of proline in the two subsets. Indeed, proline was enriched among multiconformational peptides: 61.9% of them contained proline, compared to 52.4% of peptides not exhibiting multiple conformations. Additionally, the relative frequency of proline residues, defined as the number of prolines divided by the sequence length, is also higher in the multiconformer subset (Figure 2C). Furthermore, peptides with higher charge states exhibited larger CCS differences between conformers (Figure 2D), with a broader distribution of CCS differences among highly charged peptide ions. Small amino acids such as alanine and valine also occur more often in multiconformational peptides, whereas larger amino acids such as glutamic acid, lysine and arginine have a lower relative presence (Supplementary Figure S5).
Multi-output IM2Deep for two-conformer CCS prediction
Using the multiconformer dataset, filtered for precursors exhibiting exactly two conformations (n = 26,970), we trained a new version of the IM2Deep model adapted for multi-output prediction, targeting two CCS values, one per conformer. We employed two training strategies: training a model from scratch, and fine-tuning. In the fine-tuning approach, weights from a pretrained single-output IM2Deep model were loaded into the shared layers of the multi-output model before training started. This approach leverages the features learned by the pretrained model to improve prediction accuracy, a strategy that has previously proven successful for challenging peptide property predictions (19, 20). Learning curves on the validation set show that the fine-tuned model outperforms the model trained from scratch (Figure 3).
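Loading pretrained weights into the shared layers, as described above, amounts to copying every parameter whose layer name and shape also exist in the multi-output model, while the new output branches keep their fresh initialization. A minimal framework-free sketch follows; the layer names are hypothetical and weights are stored as nested lists purely for illustration.

```python
def transfer_shared_weights(pretrained, multi_output):
    # pretrained, multi_output: {layer_name: 2-D weight matrix as a list of lists}
    # Copies every pretrained layer whose name and shape also occur in the
    # multi-output model; layers without a match (e.g. new output branches) are untouched.
    def shape(matrix):
        return (len(matrix), len(matrix[0]) if matrix else 0)

    transferred = []
    for name, weights in pretrained.items():
        target = multi_output.get(name)
        if target is not None and shape(target) == shape(weights):
            multi_output[name] = [row[:] for row in weights]  # copy values; nothing is frozen
            transferred.append(name)
    return transferred


# Hypothetical layer names: one shared dense layer and separate output heads
pretrained = {'shared.dense_1': [[1.0, 2.0], [3.0, 4.0]], 'head.output': [[0.5, 0.5]]}
multi = {
    'shared.dense_1': [[0.0, 0.0], [0.0, 0.0]],
    'head_1.output': [[0.1, 0.1]],
    'head_2.output': [[0.2, 0.2]],
}
print(transfer_shared_weights(pretrained, multi))  # ['shared.dense_1']
```

In PyTorch, a similar effect is commonly obtained by filtering the pretrained state_dict to matching keys and shapes and passing it to load_state_dict with strict=False; this is a common pattern, not necessarily the IM2Deep implementation.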
The test set results further corroborate these findings (Figure 4): the fine-tuned model shows higher prediction accuracy than the model trained from random initialization. Importantly, the median relative error of both CCS predictions of the fine-tuned model remains well below the 2% threshold used to define multiconformationality. This suggests that the model can effectively distinguish between conformers and predict their corresponding CCS values with high precision. Although both the training and test datasets contained predominantly tryptic peptides, the model’s performance on non-tryptic peptides is comparable (Supplementary Figure S6).
Figure 3: Performance curves over 500 training epochs for the multi-output IM2Deep model, comparing different parameter initialization strategies. The model initialized with pretrained weights from a single-output IM2Deep model (orange) shows a consistently lower validation MAE than the model initialized with random weights (blue). Dotted lines mark the epochs at which the highest validation accuracy is achieved, i.e., where early stopping occurs.
Figure 4: Scatter plots comparing predicted versus observed CCS values for the multi-output IM2Deep model trained with different parameter initialization strategies. The top row shows the performance of the model trained with random weight initialization, and the bottom row the performance of the model fine-tuned from the pretrained weights of a single-output IM2Deep model. Each point represents a peptide ion, colored by charge state. The diagonal red line represents perfect agreement between predicted and observed CCS values. The left plots correspond to the first (smaller) CCS prediction, and the right plots to the second (larger) CCS prediction of each multiconformational precursor.
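The comparison with the 2% threshold above relies on the median relative error of each output. A minimal sketch, assuming the predicted and observed pairs are already sorted from smaller to larger CCS; the function name and the toy values are hypothetical and are not results from this study.

```python
from statistics import median


def median_relative_errors(predicted, observed):
    # predicted, observed: lists of (smaller, larger) CCS pairs, one per precursor.
    # Returns the median relative error of the first and of the second output.
    errors = []
    for k in range(2):
        rel = [abs(p[k] - o[k]) / o[k] for p, o in zip(predicted, observed)]
        errors.append(median(rel))
    return errors


THRESHOLD = 0.02  # relative CCS difference used to define distinct conformers
pred = [(404.0, 420.0), (495.0, 520.0), (600.0, 640.0)]
obs = [(400.0, 420.0), (500.0, 525.0), (600.0, 650.0)]
errs = median_relative_errors(pred, obs)
print([round(e, 4) for e in errs], all(e < THRESHOLD for e in errs))  # [0.01, 0.0095] True
```

Reporting the errors per output, rather than pooled, makes it visible if one of the two branches lags behind the other.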
Evaluating model robustness and comparison with baseline models
To ensure that the availability of a second prediction does not artificially inflate performance, we compared the model with two baseline models: (1) a baseline model trained on CCS values from peptides exhibiting only one conformation (the uniconformer dataset, see Methods); and (2) a baseline model trained on the CCS value of one randomly selected conformation of each peptide in our multiconformational training set. Although trained on only one target CCS value per peptide, these baseline models have two output nodes and thus also make two predictions. They serve as benchmarks to evaluate the benefit of training on multiple CCS values for multiconformational prediction. The loss function used to train the baseline models is the sum of the MAEs between each prediction and the single target CCS. As illustrated in Figure 5, the transfer-learned model strongly outperforms both baseline models in predicting multiple CCS values, with marked improvements in both MAE and Pearson correlation coefficient. These results underscore that the model captures the multiconformational nature of these peptides owing to its training on multiple conformations, and not merely because it can make two predictions per precursor.
Figure 5: Scatter plots comparing predicted versus observed CCS values for the multi-output IM2Deep model and two baseline models, each trained on different datasets. The left plot shows the first CCS prediction, and the right plot the second CCS prediction for each peptide. Red dots represent predictions from the baseline model trained on peptides with only one conformation. Blue dots represent predictions from the baseline model trained on one randomly selected conformation of each multiconformational peptide.
Green dots represent predictions from the transfer-learned multiconformational model trained on both conformations of multiconformational peptides. Each dot corresponds to a peptide ion; the dashed line represents perfect agreement between predicted and observed CCS values.
An ideal multiconformational model should also accurately predict CCS values for peptides for which only one conformation is observed. To assess this, we compared, on a test set of uniconformational peptides, the accuracy of the closest prediction of the transfer-learned multiconformational model with that of a model trained exclusively on uniconformational peptides. Because the multiconformational model benefits from making two predictions and selecting the best one, we controlled for this potential bias by also letting the baseline model trained on uniconformational peptides make two predictions, again selecting the best one. The results (Figure 6) indicate that the multiconformational model, despite being trained on peptides with multiple conformations, maintains or even improves its accuracy for peptides with only one observed conformation, in terms of both MAE and Pearson correlation. This improvement likely stems from the model’s ability to capture structural information that is not fully represented in models trained on a single CCS value.
Figure 6: Scatter plot comparing predicted versus observed CCS values for uniconformational peptides using the different models.
Examples demonstrating the advantages of the multiconformer model on an independent dataset
To showcase the capabilities of the multi-output IM2Deep model, we predicted two CCS values for all identified precursors from a run of a different public LC-IM-MS/MS experiment (PXD046507) and visualized these predictions on mobilograms derived from raw timsTOF data.
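Both the uniconformer comparison above and the rescoring scenario discussed in this section score an observed CCS against whichever prediction lies closest. A minimal sketch with hypothetical function names; the example values are invented for illustration and do not come from the figures.

```python
def closest_prediction(observed, predictions):
    # Return the predicted CCS (Å²) nearest to the observed CCS.
    return min(predictions, key=lambda p: abs(p - observed))


def closest_prediction_mae(observed_values, prediction_sets):
    # MAE when each observed CCS is compared with its closest prediction.
    errors = [
        abs(closest_prediction(obs, preds) - obs)
        for obs, preds in zip(observed_values, prediction_sets)
    ]
    return sum(errors) / len(errors)


# Hypothetical case: the search engine reports the less intense conformer (450 Å²),
# a single-output model predicts the other conformer, a two-output model predicts both.
observed = 450.0
single_output = [469.0]
two_output = [452.0, 469.0]
for preds in (single_output, two_output):
    best = closest_prediction(observed, preds)
    print(best, round(100 * abs(best - observed) / observed, 2), '%')
```

Here the single prediction misses the reported value by about 4.2%, well beyond the 2% conformer threshold, whereas the closest of the two predictions is within 0.5%.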
This section highlights specific scenarios in which the new multiconformational model offers added value over the original IM2Deep model, which was limited to predicting a single CCS value. In Figure 7A, a precursor ion displays two distinct peaks in its ion mobility distribution, indicating the presence of two conformations. The original IM2Deep model accurately predicts the CCS of one of these conformers (red), whereas MaxQuant reports the CCS of the other, less intense conformer (black). In PSM rescoring, where observed CCS values are compared to predicted ones, this discrepancy could lead to the penalization and potential exclusion of the PSM, even though IM2Deep made an accurate prediction. In contrast, the multiconformational model accurately predicts the CCS values of both conformations, which eliminates this issue when the prediction closest to the observed CCS is used for rescoring.
Figure 7: Mobilograms displaying the ion mobility distributions of selected precursor ions, highlighting the predictive capabilities of the multi-output IM2Deep model. The CCS value reported by MaxQuant is indicated by black dashed lines, the classical (single-output) IM2Deep prediction by a red dashed line, and the CCS predictions of the multiconformational model by green dashed lines. A) Precursor ion (UniProt ID of inferred protein: P46782) showing two distinct peaks in the ion mobility distribution, where the original model predicts the CCS of the non-reported conformer. B) Precursor ion (UniProt ID of inferred protein: Q9Y2B0) showing two distinct conformations, where the multiconformational model accurately predicts the CCS of a conformer not identified by MaxQuant. C) Precursor ion (UniProt ID of inferred protein: O15230) showing a single dominant conformation, accurately predicted by the multiconformational model.
D) Precursor ion (UniProt IDs of inferred proteins: P0C0L4 and P0C0L5) for which the classical IM2Deep prediction appears to be the average of two conformations. Observed and predicted CCS values are rounded to the nearest integer for simplicity.
A notable advantage of the new multiconformational model is its ability to accurately predict CCS values for conformations not identified by the search engine. For example, in Figure 7B, MaxQuant reported the CCS value of the most intense conformer, while the multiconformational model also accurately predicted the CCS value of a less intense, unreported conformer. This capability offers deeper insight into the structural dynamics and conformational diversity of peptide ions in the gas phase.
In the majority of cases, peptide ions exhibit a single conformation (Figure 7C). As demonstrated above, the multiconformational model still makes accurate predictions for these peptide ions. When the best prediction is selected, the second prediction might be redundant, but it could also represent an accurate prediction of an unobserved conformer.
As discussed by others, a maximum likelihood estimation approach to CCS prediction typically converges to the mean CCS value of all conformers when no prior conformer filtering is performed and a mean value for each peptide ion is retained in the final training set (6). Previous approaches have tried to avoid this erroneous convergence by retaining only the most intense conformer in each dataset. However, when datasets are combined and CCS values are averaged over multiple datasets, this assumes that the most intense conformer of each peptide is the same across all experiments. This assumption may not always hold. Indeed, there are instances in which the original IM2Deep model, which relies on this assumption, predicts a CCS value close to the mean of two conformers (Figure 7D).
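The convergence to the mean can be made concrete with a small worked example. Assuming a squared-error objective (the maximum likelihood loss under Gaussian noise) and reusing, purely as an illustration, the two CCS values that MaxQuant reported for the Figure 1 precursor (577 and 605 Å²), the single prediction that minimizes the loss is their mean, about 591 Å², which misses both conformers by more than the 2% threshold.

```python
# Two conformer CCS values (Å²), taken from the Figure 1 caption for illustration only
conformers = [577.0, 605.0]


def squared_error(prediction, targets):
    # Loss of a single-output prediction against all observed conformers
    return sum((prediction - t) ** 2 for t in targets)


# Brute-force search over single predictions from 550 to 630 Å² in 0.01 Å² steps
candidates = [550.0 + i * 0.01 for i in range(8001)]
best = min(candidates, key=lambda x: squared_error(x, conformers))
relative_errors = [abs(best - t) / t for t in conformers]
print(round(best, 2), [round(100 * e, 2) for e in relative_errors])  # 591.0 [2.43, 2.31]
```

A two-output model trained on both conformers is not forced into this compromise, because each output can approach one of the two values.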
In contrast to these previous approaches, our new approach assumes this consistency only when calculating linear offsets to align CCS values between datasets, where the impact of the relatively uncommon multiconformational peptides should be minor. For training, however, no hierarchical order is imposed on conformers; instead, they are matched across datasets based on their CCS values, and an average value is retained for each conformer. This allows more accurate and flexible predictions that account for multiple conformations which are not necessarily consistent between experiments. Importantly, none of the precursor ions used as examples in this analysis were included in the training data of the multiconformational model.
Discussion
In this study, we developed a new multi-output IM2Deep model designed to predict CCS values for peptides that exhibit multiple conformations. Using a multiconformer dataset, generated by combining LC-IM-MS/MS data from various experiments and selecting peptides with multiple observed conformations, we trained the model to predict two distinct CCS values for each peptide. The results demonstrate that the multiconformational model accurately predicts CCS values for multiple conformers while retaining its ability to make accurate predictions for peptides that exhibit only one conformation. This marks an important step forward in CCS prediction. Because of the limited size of our training set, we restricted the multi-output IM2Deep model to two CCS values. While this already provides important improvements, predicting CCS values for additional conformers would likely further improve performance for peptides showing more than two conformations (Figure 8B). The current model architecture is flexible and would require only slight adjustments to the architecture and training procedure to accommodate such higher-order predictions.
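The cross-dataset alignment and conformer matching described at the end of the Results section can be sketched as follows. This is a minimal illustration under stated assumptions, not IM2DeepTrainer code: the helper names linear_offset and match_conformers, the use of a median as a single additive offset, the greedy grouping and the tolerance value are all hypothetical choices.

```python
from statistics import mean, median

# Hypothetical helpers sketching the two steps; they are not IM2DeepTrainer functions.

def linear_offset(reference, other):
    # reference and other map a precursor key (e.g. 'PEPTIDE/2') to the CCS of
    # its most intense conformer in that dataset. Only this step relies on the
    # most intense conformer being consistent across datasets.
    shared = reference.keys() & other.keys()
    if not shared:
        raise ValueError('no shared precursors to align on')
    # Assumed additive shift that moves the other dataset onto the reference scale
    return median(reference[key] - other[key] for key in shared)

def match_conformers(ccs_per_dataset, tolerance):
    # ccs_per_dataset holds one list of offset-corrected conformer CCS values
    # per dataset for a single precursor. Conformers are grouped purely by CCS
    # (no intensity ranking): a sorted value within tolerance of the running
    # mean of the current group joins it, otherwise it starts a new group.
    groups = []
    for value in sorted(v for values in ccs_per_dataset for v in values):
        if groups and abs(value - mean(groups[-1])) <= tolerance:
            groups[-1].append(value)
        else:
            groups.append([value])
    # One averaged CCS value is retained per matched conformer
    return [mean(group) for group in groups]

offset = linear_offset({'A/2': 400.0, 'B/2': 500.0}, {'A/2': 398.0, 'B/2': 497.0})
print(offset)  # 2.5

dataset_1 = [400.0, 430.0]
dataset_2 = [v + offset for v in [398.0, 428.0]]
print(match_conformers([dataset_1, dataset_2], tolerance=5.0))  # [400.25, 430.25]
```

The tolerance has to be smaller than the CCS gap between genuine conformers of the same precursor; otherwise distinct conformers would be merged into a single averaged value.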
Figure 8: Mobilograms of selected precursor ions demonstrating remaining challenges in CCS prediction. The CCS value reported by MaxQuant is indicated by a black dashed line, the classical (single-output) IM2Deep prediction by a red dashed line, and the CCS predictions of the multiconformational model by green dashed lines. A) Precursor ion (UniProt ID: Q14974) whose mobilogram shows no clear distinction between conformers, especially in the shoulder of the main conformation. B) Precursor ion (UniProt ID: P12111) clearly exhibiting three peaks in the ion mobility distribution.
Another intriguing question is whether it is feasible to predict if a precursor will exhibit multiconformational behavior. It is possible that all precursors can exhibit multiple conformations, depending on experimental conditions. Further investigation into the factors that influence multiconformationality is essential for developing such predictors. Combining a predictor for multiconformational behavior with a multi-output CCS predictor could be an effective tool for improving identification confidence. However, larger challenges remain, particularly in distinguishing between closely related conformers, owing to overlapping ion mobility signals and minimal CCS differences (Figure 8A). Because search engines typically report apex CCS values, future research should focus on accurately linking ion mobility measurements to identified peptides. This would enable the prediction of entire ion mobility distributions rather than discrete CCS values. Accurately capturing these distributions would require advanced modeling techniques capable of handling the inherent complexity and variability of ion mobility data. Deep generative models, such as variational autoencoders or generative adversarial networks that can integrate experimental information, could be leveraged to address these challenges.
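Why apex CCS values struggle with closely spaced conformers can be illustrated with a toy mobilogram. The sketch below models each conformer as a Gaussian peak and reports the CCS of every local maximum above a relative intensity threshold. It is not the peak detection used by MaxQuant or IM2Deep; the Gaussian peak shape, the 0.5 Å² grid, the peak parameters and the min_rel_height threshold are all hypothetical.

```python
import math

def gaussian(x, center, width, height):
    # Gaussian peak used as a stand-in for one conformer's mobility profile
    return height * math.exp(-0.5 * ((x - center) / width) ** 2)

def synthetic_mobilogram(grid, peaks):
    # Sum the (center, width, height) peaks at every CCS grid point
    return [sum(gaussian(x, c, w, h) for c, w, h in peaks) for x in grid]

def find_apexes(grid, intensity, min_rel_height=0.1):
    # Return the CCS values of local maxima that reach at least
    # min_rel_height times the most intense point of the mobilogram
    threshold = min_rel_height * max(intensity)
    apexes = []
    for i in range(1, len(intensity) - 1):
        rising = intensity[i] > intensity[i - 1]
        not_rising_after = intensity[i] >= intensity[i + 1]
        if rising and not_rising_after and intensity[i] >= threshold:
            apexes.append(grid[i])
    return apexes

grid = [400 + 0.5 * i for i in range(161)]  # 400 to 480 Å²

# Two well-separated conformers: both apexes are found
separated = synthetic_mobilogram(grid, [(420.0, 3.0, 1.0), (450.0, 3.0, 0.4)])
print(find_apexes(grid, separated))  # [420.0, 450.0]

# A minor conformer only 6 Å² (two peak widths) from the main one merges into
# a shoulder, so only a single apex is detected
shoulder = synthetic_mobilogram(grid, [(420.0, 3.0, 1.0), (426.0, 3.0, 0.5)])
print(len(find_apexes(grid, shoulder)))  # 1
```

The second case mirrors the difficulty in Figure 8A: once a minor conformer sits within the shoulder of the main peak, a procedure that only reports apexes cannot recover it, which is one reason to model full ion mobility distributions rather than discrete CCS values.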
Continued data collection and sharing within the proteomics community are vital for advancing CCS (and other peptide behavior) prediction, as they provide the data diversity and volume needed to develop more sophisticated models. The improvements provided by the multiconformational IM2Deep model have several implications for proteomics research. More accurate CCS prediction can lead to more confident peptide identification. Furthermore, the ability to predict CCS values for multiple conformations can offer deeper insight into the structural dynamics and behavior of peptides in the gas phase.
Conclusion
In this study, we enhanced IM2Deep by implementing a multi-output deep learning approach to predict collision cross sections for peptides exhibiting multiple conformations. Our findings demonstrate that the enhanced IM2Deep model accurately predicts CCS values for various conformers while maintaining high accuracy for peptides with a single observed conformation. These findings highlight the model’s potential to improve peptide identification and our understanding of peptide behavior in the gas phase.
Availability
IM2Deep is open source under the permissive Apache-2.0 license and is freely available as a Python package through PyPI and Bioconda. The source code is available on GitHub at https://github.com/compomics/IM2Deep. IM2DeepTrainer, a Python package used to train new (multi-output) IM2Deep models, is open source under the permissive Apache-2.0 license and freely available on PyPI. The source code is available on GitHub at https://github.com/rodvrees/IM2DeepTrainer.
All data and scripts required to reproduce the presented results are available on Zenodo at https://doi.org/10.5281/zenodo.14886715.
Author contributions
Robbe Devreese: Conceptualization, methodology, software, validation, formal analysis, visualization, investigation, writing – original draft. Alireza Nameni: Methodology, writing – review & editing. Arthur Declercq: Writing – review & editing. Emmy Terryn: Investigation, data curation. Ralf Gabriels: Writing – review & editing. Francis Impens: Validation, writing – review & editing. Kris Gevaert: Validation, writing – review & editing. Lennart Martens: Conceptualization, supervision, funding acquisition, writing – review & editing. Robbin Bouwmeester: Conceptualization, methodology, software, validation, supervision, writing – review & editing.
Supplementary Information
Supplementary Table 1: Overview of the public datasets used for further processing, along with the number of LC-IM-MS/MS runs in each and the species from which the data were acquired.
Supplementary Figure 1: Processing workflow for the creation of the multiconformational precursor ion dataset.
Supplementary Table 2: Hyperparameters of the final multi-output IM2Deep model.
Supplementary Figure 2: Validation accuracy for models trained with difference weights from 0 to 10.
Supplementary Figure 3: Distribution of conformer counts identified across multiple LC-IM-MS/MS runs, illustrating the impact of filtering on the final dataset. The x-axis represents the number of runs in which a conformer pairing was identified, and the y-axis (log scale) shows the total number of conformer pairs. Yellow bars represent conformers that were included in the final dataset after filtering.
Supplementary Figure 4: Distribution of the number of conformers observed per peptidoform-charge state pair in the multiconformer dataset after filtering, for each charge state. The x-axis represents the number of conformers identified for each precursor, and the y-axis (log scale) shows the count of peptidoform-charge state pairs.
Supplementary Figure 5: Boxplots comparing the relative presence of all 20 amino acids between multiconformational and uniconformational peptides.
Supplementary Figure 6: Scatter plots comparing predicted versus observed CCS values for the transfer-learned multi-output IM2Deep model, with tryptic and non-tryptic precursor ions in the test set plotted separately.
Acknowledgements
R.D., A.D., R.G., L.M. and R.B. acknowledge funding from the Research Foundation Flanders (FWO) [1SH9O24N, 12B7123N, 1SE3724N, G010023N, G028821N, 12A6L24N]. A.N. acknowledges funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement N° 956148. L.M. and F.I. acknowledge funding from the Horizon Europe projects BAXERNA 2.0 [101080544] and COMBINE [101191739]. L.M. acknowledges funding from the Ghent University Concerted Research Action [BOF21/GOA/033] and from the CHIST-ERA project ODEEP-EU [G0GDV23N].
Footnotes
https://github.com/compomics/IM2Deep
https://doi.org/10.5281/zenodo.14886715
References
1. Vitko, D. et al. timsTOF HT Improves Protein Identification and Quantitative Reproducibility for Deep Unbiased Plasma Protein Biomarker Discovery. J Proteome Res 2, 929–938 (2024).
2. Gomez-Zepeda, D. et al. Thunder-DDA-PASEF enables high-coverage immunopeptidomics and is boosted by MS2Rescore with MS2PIP timsTOF fragmentation prediction model. Nat Commun 15 (2024).
3. Phulphagar, K. M. et al. Sensitive, High-Throughput HLA-I and HLA-II Immunopeptidomics Using Parallel Accumulation-Serial Fragmentation Mass Spectrometry. Molecular and Cellular Proteomics 22 (2023).
4. Meier, F. et al. Parallel accumulation-serial fragmentation (PASEF): Multiplying sequencing speed and sensitivity by synchronized scans in a trapped ion mobility device. J Proteome Res 14, 5378–5387 (2015).
5. Michelmann, K., Silveira, J. A., Ridgeway, M. E. & Park, M. A. Fundamentals of trapped ion mobility spectrometry. J Am Soc Mass Spectrom 26, 14–24 (2014).
6. Teschner, D. et al. Ionmob: a Python package for prediction of peptide collisional cross-section values. Bioinformatics 9 (2023).
7. Declercq, A. et al. TIMS2Rescore: A Data Dependent Acquisition-Parallel Accumulation and Serial Fragmentation-Optimized Data-Driven Rescoring Pipeline Based on MS2Rescore. J Proteome Res (2025). doi: 10.1021/acs.jproteome.4c00609
8. Schmid, R. et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat Biotechnol 41, 447–449 (2023). doi: 10.1038/s41587-023-01690-2
9. Skowronek, P. et al. Rapid and In-Depth Coverage of the (Phospho-)Proteome With Deep Libraries and Optimal Window Design for dia-PASEF. Molecular and Cellular Proteomics 21 (2022).
10. Meier, F. et al. Deep learning the collisional cross sections of the peptide universe from a million experimental values. Nat Commun 12 (2021).
11. Pierson, N. A., Chen, L., Valentine, S. J., Russell, D. H. & Clemmer, D. E. Number of solution states of bradykinin from ion mobility and mass spectrometry measurements. J Am Chem Soc 1, 13810–13813 (2011).
12. Khanal, N., Gaye, M. M. & Clemmer, D. E. Multiple solution structures of the disordered peptide indolicidin from IMS-MS analysis. Int J Mass Spectrom 427, 52–58 (2018).
13. Pierson, N. A., Chen, L., Russell, D. H. & Clemmer, D. E. Cis-Trans isomerizations of proline residues are key to bradykinin conformations. J Am Chem Soc 1 5, 3186–3192 (2013).
14. McCann, A. et al. Cyclic Peptide Protomer Detection in the Gas Phase: Impact on CCS Measurement and Fragmentation Patterns. J Am Soc Mass Spectrom, 851–858 (2022).
15. Bouwmeester, R., Gabriels, R., Hulstaert, N., Martens, L. & Degroeve, S. DeepLC can predict retention times for peptides that carry as-yet unseen modifications. Nat Methods 18, 1363–1369 (2021).
16. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26, 1367–1372 (2008).
17. Hu, Z. et al. Proteogenomic insights into early-onset endometrioid endometrial carcinoma: predictors for fertility-sparing therapy response. Nat Genet 56, 637–651 (2024).
18. Willems, S., Voytik, E., Skowronek, P., Strauss, M. T. & Mann, M. AlphaTims: Indexing trapped ion mobility spectrometry-TOF data for fast and easy accession and visualization. Molecular and Cellular Proteomics 20 (2021).
19. Ma, C. et al. Improved Peptide Retention Time Prediction in Liquid Chromatography through Deep Learning. Anal Chem 90, 10881–10888 (2018).
20. Zeng, W. F. et al. MS/MS Spectrum prediction for modified peptides using pDeep2 Trained by Transfer Learning. Anal Chem 91, 9724–9731 (2019).
21. Zila, N. et al. Proteomic Profiling of Advanced Melanoma Patients to Predict Therapeutic Response to Anti-PD-1 Therapy. Clinical Cancer Research 0, 159–175 (2024).
22. Will, A., Oliinyk, D., Bleiholder, C. & Meier, F. Peptide collision cross sections of 22 post-translational modifications. Anal Bioanal Chem 415, 6633–6645 (2023).
23. Romero-Gavilán, F. et al. Proteomic evaluation of human osteoblast responses to titanium implants over time. J Biomed Mater Res A 111, 45–59 (2023).
24. Ries, A. et al. Primary and hTERT-Transduced Mesothelioma-Associated Fibroblasts but Not Primary or hTERT-Transduced Mesothelial Cells Stimulate Growth of Human Mesothelioma Cells. Cells 12 (2023).
25. Pahmeier, F. et al. Identification of host dependency factors involved in SARS-CoV-2 replication organelle formation through proteomics and ultrastructural analysis. J Virol 97 (2023).
26. Lamsal, A. et al. Opposite and dynamic regulation of the interferon response in metastatic and non-metastatic breast cancer. Cell Communication and Signaling 21 (2023).
27. Ries, A. et al. Mesothelioma-associated fibroblasts enhance proliferation and migration of pleural mesothelioma cells via c-Met/PI3K and WNT signaling but do not protect against cisplatin. Journal of Experimental and Clinical Cancer Research 42 (2023).
28. Filatov, S., Dyčka, F., Sterba, J. & Rego, R. O. M. A simple non-invasive method to collect soft tick saliva reveals differences in Ornithodoros moubata saliva composition between ticks infected and uninfected with Borrelia duttonii spirochetes. Front Cell Infect Microbiol 1 (2023).
29. Bradić, I. et al. Metabolic changes and propensity for inflammation, fibrosis, and cancer in livers of mice lacking lysosomal acid lipase. J Lipid Res 64 (2023).
30. Puzio, M. et al. An Electrophysiological and Proteomic Analysis of the Effects of the Superoxide Dismutase Mimetic, MnTMPyP, on Synaptic Signalling Post-Ischemia in Isolated Rat Hippocampal Slices. Antioxidants 12 (2023).
31. de Jonckheere, B. et al. Critical shifts in lipid metabolism promote megakaryocyte differentiation and proplatelet formation. Nature Cardiovascular Research 2, 835–852 (2023).
32. Meulders, B. et al. Preconception Diet Interventions in Obese Outbred Mice and the Impact on Female Offspring Metabolic Health and Oocyte Quality. Int J Mol Sci 25 (2024).
33. Wang, Z. et al. Global profiling of the proteome, phosphoproteome, and N-glycoproteome of protoscoleces and adult worms of Echinococcus granulosus. Front Vet Sci 10 (2023).
34. Mansour, H. et al. Characterization of GEXP15 as a Potential Regulator of Protein Phosphatase 1 in Plasmodium falciparum. Int J Mol Sci 24 (2023).
35. Zhu, M. & Dai, X. Stringent response ensures the timely adaptation of bacterial growth to nutrient downshift. Nat Commun 14 (2023).
36. Kovarik, J. J. et al. A multi-omics based anti-inflammatory immune signature characterizes long COVID-19 syndrome. iScience 26 (2023).
37. Scally, S. W. et al. PCRCR complex is essential for invasion of human erythrocytes by Plasmodium falciparum. Nat Microbiol 7, 2039–2053 (2022).
38. Huang, Q. et al. Proteomic Characterization of Peritoneal Extracellular Vesicles in a Mouse Model of Peritoneal Fibrosis. J Proteome Res 22, 908–918 (2023).
39. Triglia, T. et al. Plasmepsin X activates the PCRCR complex of Plasmodium falciparum by processing PfRh5 for erythrocyte invasion. Nat Commun 14 (2023).
40. Wang, Z., Ouyang, X., Tan, Z., Yang, L. & Dong, B. Quantitative Phosphoproteomics Reveals the Requirement of DYRK1-Mediated Phosphorylation of Ion Transport- and Cell Junction-Related Proteins for Notochord Lumenogenesis in Ascidian. Cells 12 (2023).
41. Cao, H. et al. Malonylation of Acetyl-CoA carboxylase 1 promotes hepatic steatosis and is attenuated by ketogenic diet in NAFLD. Cell Rep 42 (2023).
42. Feng, J. et al.
Widespread Involvement of Acetylation in the Retinal Metabolism of Form-Deprivation Myopia in Guinea Pigs. ACS Omega 8, 23825–23839 (2023).
43. Chen, X., Wang, S., Wu, M. & Zhao, Y. Role of Succinylation in Pseudorabies Virus Infection. J Virol 97 (2023).
44. Li, C. et al. Immunization with a heat-killed prm1 deletion strain protects the host from Cryptococcus neoformans infection. Emerg Microbes Infect 12 (2023).
45. Xu, L. et al. ANXA3-Rich Exosomes Derived from Tumor-Associated Macrophages Regulate Ferroptosis and Lymphatic Metastasis of Laryngeal Squamous Cell Carcinoma. Cancer Immunol Res 12, 614–630 (2024).
46. Xu, T. et al. A small molecule inhibitor of the UBE2F-CRL5 axis induces apoptosis and radiosensitization in lung cancer. Signal Transduct Target Ther 7 (2022).
47. Klusch, N. et al. Cryo-EM structure of the respiratory I + III2 supercomplex from Arabidopsis thaliana at 2 Å resolution. Nat Plants 9, 142–156 (2023).
Posted February 23, 2025.
Subject Area: Bioinformatics","source_license":"CC-BY-4.0","license_restricted":false}