How to Predict Effective Drug Combinations - Moving beyond Synergy Scores

doi:10.1101/2024.11.22.624812


2024 · bioRxiv preprint · CC-BY 4.0


Abstract

To improve our understanding of multi-drug therapies, cancer cell line panels screened with drug combinations are frequently studied using machine learning (ML). ML models trained on such data typically focus on predicting synergy scores, which support drug development and repurposing efforts but have limitations when deriving personalized treatment recommendations. To simulate a more realistic personalized treatment scenario, we pioneer ML models that predict the relative growth inhibition (instead of synergy scores) and that can be applied to previously unseen cell lines. Our approach is highly flexible: it enables the reconstruction of dose-response curves and matrices, as well as various measures of drug sensitivity (and synergy), from model predictions, which can ultimately be used to derive cell line-specific prioritizations of both mono- and combination therapies.

bioRxiv preprint doi: https://doi.org/10.1101/2024.11.22.624812; this version posted November 23, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Introduction

Tailoring drug treatments to the individual patient is a major goal of cancer research. Due to ethical concerns and the limited availability of tumor material, relationships between molecular properties of cancer cells and their drug responses are generally not studied in humans directly but in model systems, most prominently cell lines. For monotherapy, large cell line panels such as the Genomics of Drug Sensitivity in Cancer (GDSC) database 1,2 have been available for more than a decade, providing both molecular characterizations and drug screening data of cancer cell lines. However, combination therapies are frequently preferred over monotherapies for cancer treatment due to increased efficacy and a decreased risk of treatment resistance. 3 More recently, large data resources have also become available for drug combination screens: in 2019, the DrugComb data portal was introduced, 4,5 which accumulates harmonized results of drug screens from different sources. To date, a total of 37 datasets are available in DrugComb. 5 Databases like the GDSC or DrugComb enable the systematic evaluation of the effect that different drugs have on different types of cancer cells. Two main use cases that can be addressed with these data are (1) making personalized treatment recommendations for a given patient (cell line) and (2) finding promising drugs or drug combinations that should be further explored, e.g., for drug repurposing or the development of novel (combination) therapies. Due to the complexity and high dimensionality of the data, machine learning (ML) is commonly used to address these tasks. ML models trained on monotherapy drug responses are usually suitable for both use cases, (1) and (2), since they directly predict measures of drug effectiveness, such as the IC50 or AUC value.
In comparison, methods using drug combination data typically predict so-called drug synergy scores, 6,7 which are usually suited for the second task but less applicable to the first, as we briefly outline in the following. These scores quantify the synergistic or antagonistic potential of two compounds for a given cell line by comparing their combined effect on cell growth to the expected effect obtained from a baseline model that assumes no synergism or antagonism. 8 Prominent examples are the Loewe, 9 Bliss, 10 HSA, 11 and ZIP 8 synergy scores. For each of these scores, values > 0 indicate synergism, and values < 0 indicate antagonism, making it possible to classify drug combinations based on their synergy score. Undoubtedly, estimating the synergistic potential of compound combinations through synergy scores can be valuable for the identification of promising combination treatments to undergo more detailed screening, the development of novel compounds, or drug repurposing. However, even though synergy score prediction is sometimes motivated as a step toward achieving personalized treatment recommendations, 6,12 we believe that synergy scores have shortcomings that limit their usefulness for this application. Briefly summarized, synergy scores are based on various (in part very strong) model assumptions, some of which differ fundamentally between scores. 8,13 Additionally, Vlot et al. 13 and Yadav et al. 8 observed disagreement between scores, weakening their informative value.
Two factors that are especially relevant for personalized treatment recommendations are that (1) the scores are aggregated over multiple drug concentrations, which do not necessarily correspond well to clinically feasible concentration ranges 13 (cf. Supplementary Figure 1), and (2) a high synergy between two compounds does not guarantee a high effectiveness of the combination treatment. 5 Thus, instead of relying on synergy scores, we advocate exploring other strategies to estimate the effectiveness of combination treatments. For the prediction of drug combination sensitivity, several models that do not rely on synergy scores have been published: Malyutina et al. 14 and Zagidullin et al. 4 trained cell line-specific models that predict CSS (Combination Sensitivity Score) values, a sensitivity measure for two-drug combination therapies. 14 However, the CSS is an aggregated measure of sensitivity based on drug-specific AUC values. Thus, like the AUC for monotherapies, 15 it depends strongly on the investigated concentration ranges and is not comparable across compounds. Instead of focusing on one specific measure of drug sensitivity, an alternative approach is to directly predict the response (in terms of relative inhibition/viability) of cell lines at various treatment concentrations. Various drug sensitivity or synergy measures can then be reconstructed from the model predictions. For monotherapy, this approach has already been explored by Rahman and Pal et al. 16,17 For combination therapy, Zheng et al. 5 trained a CatBoost model that predicts the relative inhibition of two drugs at given concentrations for a given cell line. Similarly, comboFM by Julkunen et al. 18 employs higher-order factorization machines (HOFMs) to predict relative cell growth.
A drawback of all combination prediction approaches mentioned above is that they cannot make predictions for previously unseen cell lines: Malyutina et al. 14 and Zagidullin et al. 4 trained cell line-specific models, while Zheng et al. 5 and Julkunen et al. 18 employ a one-hot encoding of cell lines and drugs in the model input, such that both have to be known during training. Thus, these models are difficult to apply to personalized treatment recommendations, where predictions should be made for a previously unseen patient (cell line). According to Codicè et al., this setting is frequently overlooked or insufficiently evaluated in ML-based drug response prediction. 19 In this manuscript, we present ML models for the prediction of drug combination sensitivity that do not rely on synergy scores and are able to make predictions for previously unseen cell lines, thereby mimicking a personalized treatment scenario. Instead of predicting an aggregated measure of treatment response, our models predict the relative inhibition at arbitrary treatment concentrations provided in the model input. Consequently, various measures of drug sensitivity or synergy, including dose-response curves and matrices, as well as IC50 values or synergy scores, can be reconstructed from the model predictions. We investigate not only different ML algorithms (neural networks, random forest, elastic net) but also analyze the benefit of including different drug characterizations (MACCS fingerprints, physico-chemical properties), as well as information on drug targets.
The different model architectures provide different benefits, e.g., the ability to make predictions not just for two-drug combinations but also for monotherapies and combination treatments consisting of more than two drugs. Some of the investigated architectures also enable predictions to be made for any previously unseen drug, given that the features of the drug (e.g., its MACCS fingerprint) are known. Our results show that random forests outperform the other algorithms in all investigated settings. Additionally, we analyze which inhibition intervals are predicted most accurately and investigate the reconstruction of mono- and combination sensitivity measures from our model predictions. Lastly, using our recently published drug response measure called CMax viability, 15 we showcase how our models can be applied to perform drug prioritization for mono- and combination therapies based on clinically feasible treatment concentrations. Drug prioritization, i.e., the ranking of drugs by their predicted effectiveness for a given cell line (patient), is a major goal in personalized medicine: it goes beyond the mere prediction of sensitivity measures and moves toward deriving actual treatment recommendations.

Materials and Data Processing

Drug Response Data

Drug screening data for our analyses were obtained from the DrugComb database Version 1.5. More specifically, we employed the DrugComb API (https://api.drugcomb.org/) to download the list of all cell lines and their corresponding COSMIC IDs, the full list of drugs with their SMILES encodings and their target molecules, and the full dose-response matrices. To assign the respective cell line and drug information to each dose-response matrix, we downloaded the core database from https://drugcomb.org/download, which provides a unique identifier for each dose-response experiment. Consequently, each database entry can be written as:

(cell line, drug row, drug col, conc row, conc col, inhibition)

Here, cell line is the COSMIC ID of the investigated cell line, and drug row and drug col are the names of the tested drugs. The entries conc row and conc col are the micromolar concentrations of the tested compounds. For monotherapies, one of the drug names is set to NULL and the corresponding concentration is set to 0. Finally, inhibition denotes the relative inhibition measured after administration of the denoted drug concentration(s) (see Supplement for further information). Relative inhibitions > 0 denote reduced cell growth through the drug treatment, while inhibitions < 0 indicate increased growth.
We removed the following entries from the dataset:
• poor quality entries as defined by the authors of DrugComb 5 with inhibition 200
• entries where the concentration of all tested drugs is 0 (conc row = conc col = 0)
• entries where the corresponding cell line had no COSMIC ID or no gene expression data provided in the GDSC database

Additionally, we converted entries where drug row and drug col denote the same drug into monotherapies by summing the respective treatment concentrations and setting drug col to NULL:

(cell line, drug row, NULL, conc row + conc col, 0, inhibition)

Cases where two different drugs are provided but only one has a concentration > 0 were modified to denote a monotherapy by replacing the drug with concentration 0 with NULL. Afterwards, all replicates involving the same cell line, the same drug(s), and the same concentration(s) were averaged. Lastly, we log1p-normalized (log1p(x) = log(x + 1)) the concentration values in conc row and conc col.
To keep the dataset size manageable, we only considered entries involving those 265 drugs (cf. Supplementary Table 1) for which at least 10,000 entries are provided after performing all the steps described above (cf. Discussion). Note that after this reduction, more than 10,000 entries still remained for each of the drugs. In total, the final dataset consists of 5,291,424 entries covering 947 cell lines, 265 drugs, and 9,535 drug combinations.
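The preprocessing steps above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the authors' code: the entries are assumed to sit in a DataFrame with hypothetical columns `cell_line`, `drug_row`, `drug_col`, `conc_row`, `conc_col`, and `inhibition`, NULL is represented as `None`, the set of cell lines with a COSMIC ID and GDSC expression data is passed in, and the DrugComb quality filter as well as the restriction to 265 drugs are omitted.

```python
import numpy as np
import pandas as pd

# Hypothetical column names for the entry tuple described above.
KEYS = ["cell_line", "drug_row", "drug_col", "conc_row", "conc_col"]


def preprocess(df: pd.DataFrame, valid_cell_lines: set) -> pd.DataFrame:
    """Sketch of the filtering/merging steps (quality filter omitted)."""
    # Drop entries in which no drug was applied (conc_row = conc_col = 0).
    df = df[~((df["conc_row"] == 0) & (df["conc_col"] == 0))]
    # Keep only cell lines with a COSMIC ID and GDSC gene expression data.
    df = df[df["cell_line"].isin(valid_cell_lines)].copy()

    # Same drug in both slots: a monotherapy with the summed concentration.
    same = df["drug_row"] == df["drug_col"]
    df.loc[same, "conc_row"] = df.loc[same, "conc_row"] + df.loc[same, "conc_col"]
    df.loc[same, "conc_col"] = 0.0
    df.loc[same, "drug_col"] = None

    # Two different drugs, but only one applied: set the unused drug to NULL.
    df.loc[df["conc_col"] == 0, "drug_col"] = None
    df.loc[df["conc_row"] == 0, "drug_row"] = None

    # Average replicates (same cell line, drug(s), and concentration(s));
    # dropna=False keeps groups whose drug_col is NULL.
    df = df.groupby(KEYS, dropna=False, as_index=False)["inhibition"].mean()

    # log1p-normalize the concentrations: log1p(x) = log(x + 1).
    df["conc_row"] = np.log1p(df["conc_row"])
    df["conc_col"] = np.log1p(df["conc_col"])
    return df
```

Replicates are averaged before the log1p transform, mirroring the order of the steps in the text, so that grouping happens on the raw micromolar concentrations.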
Additionally, the CMax concentrations for 77 of the investigated drugs were obtained from Liston and Davis. 20 The CMax value denotes the peak plasma concentration after administering the highest clinically recommended dose of a drug. 20 In a recently published manuscript, we employed CMax to derive a novel drug sensitivity measure called the CMax viability, which will be described below. 15 We also use this measure to perform drug prioritization in the Results section.

Drug Properties

For the representation of drugs in the inputs of our models, we investigated four different settings, which will be discussed below (cf. also Figure 1). Using the SMILES drug representations provided by DrugComb, we used RDKit version 2023.3.2 21 to calculate two types of drug features:
• binary MACCS fingerprints 22 of length 166
• 209 physico-chemical drug properties using the function CalcMolDescriptors from the rdkit.Chem.Descriptors module 23

We removed all properties that showed no variation across the investigated 265 drugs, resulting in MACCS fingerprints of length 162 and 182 physico-chemical properties. Additionally, 735 drug target molecules for the investigated drugs were obtained from DrugComb.

Gene Expression Data

Normalized gene expression data of 17,419 genes (Affymetrix Human Genome U219 Array) were obtained from the GDSC database Release 8.3 (https://www.cancerrxgene.org/downloads/bulk_download).
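The drug featurization described under Drug Properties can be sketched as follows. It uses RDKit's `MACCSkeys.GenMACCSKeys` and `Descriptors.CalcMolDescriptors` (the latter is available in the RDKit version used here, 2023.3.2); the function names `featurize` and `drop_constant_columns` are hypothetical, and sorting descriptors by name is an assumed convention to get a fixed column order. RDKit is imported inside `featurize` so that the zero-variance filter runs without it.

```python
import numpy as np


def featurize(smiles_list):
    """Return MACCS fingerprints (166 bits) and RDKit descriptors per SMILES."""
    # Imported lazily so that drop_constant_columns also works without RDKit.
    from rdkit import Chem
    from rdkit.Chem import Descriptors, MACCSkeys

    maccs, physchem = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        # GenMACCSKeys returns 167 bits; bit 0 is unused, leaving 166 keys.
        maccs.append(MACCSkeys.GenMACCSKeys(mol).ToList()[1:])
        desc = Descriptors.CalcMolDescriptors(mol)  # dict: name -> value
        physchem.append([desc[name] for name in sorted(desc)])
    return np.array(maccs, dtype=float), np.array(physchem, dtype=float)


def drop_constant_columns(X):
    """Remove features that take the same value for every drug (row)."""
    X = np.asarray(X, dtype=float)
    keep = ~np.all(X == X[0], axis=0)
    return X[:, keep], keep
```

The boolean `keep` mask should be computed once on the 265 training drugs and reused for any new drug, so that its feature vector has the same 162 or 182 columns.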

Methods

Model Inputs and Outputs

We train multi-drug models that predict the relative inhibition for a given cell line treated with given concentrations of one or more drugs. The model inputs comprise cell line features based on gene expression, a representation of the applied drugs, and the corresponding drug concentrations. For the representation of drugs, we investigated four different settings, which are depicted in Figure 1 and described below.
To characterize cell lines in the model input, we performed a principal component analysis (PCA) on the gene expression values of the training cell lines and used the first 300 principal components (PCs) as cell line features. This dimension reduction method performed well in our recently published benchmarking of drug sensitivity prediction methods. 24 The feature coefficients computed on the training data were used to project the test cell lines into the same 300-dimensional space. To perform the cross-validation discussed below, we re-computed the PCs based on the respective training folds. In addition to the cell line features, we investigated four different settings for the encoding of drugs in the model input:
Setting 1 (OneHot): In this setting, no drug properties are included. Instead, a 265-dimensional encoding of drugs is used. Each feature corresponds to one of the 265 drugs in our dataset. If a drug is part of the current entry, its feature is set to the corresponding log1p-normalized treatment concentration; otherwise, it is set to 0.
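The cell line features and the Setting 1 encoding above, together with the Setting 2 target counts described next, can be sketched as follows. This is an illustration, not the authors' implementation: it uses scikit-learn's `PCA`, assumes raw micromolar concentrations as input (so `log1p` is applied inside the encoder), and the helper names `cell_line_features`, `onehot_concentrations`, and `target_counts` as well as the dictionary-based drug and target indices are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def cell_line_features(train_expr, test_expr, n_components=300):
    """Fit PCA on the training cell lines only, then project both sets."""
    pca = PCA(n_components=n_components).fit(train_expr)
    return pca.transform(train_expr), pca.transform(test_expr)


def onehot_concentrations(treatment, drug_index):
    """Setting 1: one feature per drug, holding log1p(concentration) or 0.

    treatment maps drug name -> concentration (µM); any number of drugs
    can be encoded, which is what allows three-drug predictions.
    """
    x = np.zeros(len(drug_index))
    for drug, conc in treatment.items():
        x[drug_index[drug]] = np.log1p(conc)
    return x


def target_counts(drugs, drug_targets, target_index):
    """Setting 2: number of drugs in the treatment that hit each target."""
    x = np.zeros(len(target_index))
    for drug in drugs:
        for target in drug_targets.get(drug, ()):
            if target in target_index:  # only targets kept by the >= 5 drugs filter
                x[target_index[target]] += 1
    return x
```

Fitting the PCA on training cell lines only (and refitting per CV fold) keeps test-set expression data out of the feature construction, matching the unseen-cell-line scenario.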
Setting 2 (OneHotTar): This setting uses the same concentration encoding as Setting 1 but additionally includes 290 drug target features. More precisely, we used the drug target annotations provided by DrugComb and included as targets all molecules that were targeted by at least five of the drugs in our dataset, resulting in a total of 290 target features. Each feature is then set to the number of drugs in the current entry that target the corresponding molecule (0, 1, or 2): since DrugComb provides only data on monotherapies and two-drug combinations, a target feature can have a maximum value of 2, reached when the molecule is targeted by both drugs in a two-drug combination entry. Note also that one drug can target more than one molecule.
Setting 3 (MACCS): In this setting, each drug is represented by a 162-dimensional binary Molecular ACCess System (MACCS) fingerprint. 22 Each position of the fingerprint corresponds to a molecular substructure, e.g., a functional group that may be present in a drug molecule. The respective bit is set to 1 if the corresponding substructure is present in the drug molecule at least once, and to 0 otherwise. Additionally, one input feature for each drug is needed to denote its treatment concentration. Consequently, this setting uses a total of 2 · 162 + 2 · 1 = 326 drug features. To encode monotherapies, one of the fingerprints and the corresponding concentration are set to 0.
Setting 4 (PhysChem): This setting is similar to Setting 3 but replaces each MACCS fingerprint with 182 numerical physico-chemical descriptors that denote different properties of the respective drugs, such as the molecular weight, the number of valence electrons, or the logP value that measures lipophilicity. Consequently, this setting uses a total of 2 · 182 + 2 · 1 = 366 drug features. To denote monotherapies, one set of properties and the corresponding concentration are set to 0.
Depending on the desired application, the different settings provide different benefits: Settings 3 and 4 allow making predictions for arbitrary drug molecules, given that their MACCS fingerprint or physico-chemical properties are known. Consequently, the resulting models can be used to make predictions for previously unseen, e.g., newly developed, compounds. In contrast, models derived from Settings 1 and 2 are limited to the 265 drugs that were present in our dataset and hence encoded in the input. However, these models can make predictions not only for single drugs and two-drug combinations but even for treatments using three or more drugs simultaneously. While three-drug combination therapies have already been approved for cancer treatment by the United States Food and Drug Administration (FDA), 25 DrugComb does not provide such data.

Machine Learning Algorithms

We investigate the predictive performance of three ML algorithms: neural networks, random forests, and elastic nets. We chose these models since neural networks and tree-based methods are commonly used for synergy prediction. 7 Furthermore, neural networks are also popular for drug sensitivity prediction, 26–28 while random forests and elastic nets are used less frequently for this task. 29–34 In our recently published benchmarking, however, we found that tree-based methods and elastic nets frequently outperform neural networks in predicting drug responses. 24 In line with our findings, several studies found that deep learning does not improve over conventional ML algorithms for making predictions on tabular data 35–37 or for generating feature representations for model inputs. 24,38
All prediction models were implemented in Python 3.11. Random forests and elastic net models were implemented using scikit-learn Version 1.5.0, 39 while neural networks were implemented using tensorflow Version 2.16.1 40 with GPU support. The hyperparameters for each algorithm are provided in Table 1.

Figure 1: Prediction pipeline. This figure summarizes our pipeline for the prediction of relative inhibitions. The large blue box depicts the different types of input features and representations we investigated. The grey box at the top right lists our data resources. The yellow box shows the different ML algorithms we used. The green box at the bottom depicts the model output, i.e., the relative inhibition for a given cell-drug-drug combination at defined treatment concentrations. Lastly, the purple box shows potential downstream analyses that can be performed based on the model predictions.

Table 1: Hyperparameters of the investigated ML algorithms. This table denotes the tuned hyperparameters for each ML algorithm. For hyperparameters not stated explicitly, the default parameters as provided by the respective Python package were employed. Explicitly tuned hyperparameters are marked in bold. For the PhysChem setting (i.e., the setting with the largest data matrix), we were unable to train neural networks with the ELU activation or learning rates of 0.1 due to insufficient memory for resource allocation, even when decreasing the batch size.

Model | Parameter | Value(s)
Elastic net | alpha | 0.01, 0.1, 1, 10, 100
Elastic net | l1_ratio | 0, 0.25, 0.5, 0.75, 1
Random forest | max_depth | 100, 1000000
Random forest | max_features | 25, 50, 100, 250
Random forest | min_samples_leaf | 2, 20, 100, 1000
Random forest | n_estimators | 500
Neural network | loss | mean squared error
Neural network | activation | tanh, ELU (none in last layer)
Neural network | optimizer | Adam
Neural network | learning rate | 0.0001, 0.001, 0.1
Neural network | hidden layers | 1, 2, 3, 4, 5
Neural network | size of hidden layers | equally spaced between input and output size
Neural network | dropout | 0.1, 0.3
Neural network | batch size | 256
Neural network | bias initializer | 0.01
Neural network | kernel initializer | glorot uniform for tanh, he normal for ELU activation
Neural network | kernel regularizer | l2
Neural network | epochs | 300
Neural network | validation split | 0.2
Neural network | early stopping | yes
Neural network | patience | 15
Neural network | restore best weights | True

Model Training and Testing

After filtering and processing the data as described above, we randomly divided the remaining cell lines into a training set (80% of cell lines) and a test set (20%).
Since multiple data entries exist for each cell line (screening of different drugs/drug combinations at different concentrations), the final training data consist of all entries involving a cell line from the training set (3,741,209 entries). The final test data contain all remaining entries (1,550,215), i.e., all entries involving a cell line from the test set. This splitting ensures that the test performance is always evaluated on cell lines that were unseen during model training, thereby mimicking the scenario of making predictions for a previously unseen patient. In contrast, the same drugs and drug combinations can occur in both the training and test data.
On the training data, we performed a 5-fold cross-validation (CV) to determine the best-performing hyperparameters of each ML model (see Table 1). The CV folds were generated by randomly dividing the training cell lines into five disjoint folds and assigning all entries involving a certain cell line to the corresponding fold. Since the number of available entries per cell line differs, the size of the CV folds varies between 644,308 and 857,361 entries. For the hyperparameter combination with the smallest mean absolute error (MAE) averaged across all five folds, one final model is trained on the complete training data, and its performance is evaluated on the test data.
For the models using one-hot encodings (Setting 1 and Setting 2), each drug has a designated input node.
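The cell line-grouped train/test split and the hyperparameter search described above can be sketched as follows. This is a minimal sketch, not the authors' pipeline: the helper names are hypothetical, cell lines are shuffled with a seeded NumPy generator and split with `np.array_split`, and the hyperparameter grid is a list of dictionaries (such a list can be produced from Table 1 with `sklearn.model_selection.ParameterGrid`). Per-fold PCA refitting is left out for brevity.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_absolute_error


def split_cell_lines(cell_ids, test_frac=0.2, seed=0):
    """Split entry indices so that no cell line occurs in both sets."""
    rng = np.random.default_rng(seed)
    cells = rng.permutation(np.unique(cell_ids))
    test_cells = cells[: int(round(test_frac * len(cells)))]
    is_test = np.isin(cell_ids, test_cells)
    return np.where(~is_test)[0], np.where(is_test)[0]


def cell_line_folds(cell_ids, n_folds=5, seed=0):
    """Randomly divide cell lines into disjoint folds; entries follow their cell line."""
    rng = np.random.default_rng(seed)
    cells = rng.permutation(np.unique(cell_ids))
    folds = []
    for fold_cells in np.array_split(cells, n_folds):
        val = np.isin(cell_ids, fold_cells)
        folds.append((np.where(~val)[0], np.where(val)[0]))
    return folds


def select_by_cv(estimator, param_grid, X, y, cell_ids, n_folds=5, seed=0):
    """Return the parameter dict with the smallest MAE averaged over the folds."""
    folds = cell_line_folds(cell_ids, n_folds, seed)
    best_params, best_mae = None, np.inf
    for params in param_grid:
        maes = []
        for tr, va in folds:
            model = clone(estimator).set_params(**params).fit(X[tr], y[tr])
            maes.append(mean_absolute_error(y[va], model.predict(X[va])))
        if np.mean(maes) < best_mae:
            best_params, best_mae = params, float(np.mean(maes))
    return best_params, best_mae
```

Because folds are built from cell lines rather than entries, fold sizes vary with the number of entries per cell line, as observed for the real data.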
This is not the case for the models using drug features (Setting 3 and Setting 4), where swapping the features and concentration of the first drug with those of the second drug represents the same treatment but changes the input representation (cf. input visualization in Figure 1). However, the model output should not depend on the order of the drugs in the input, i.e., it should not depend on whether the features of a drug A are located in front of or behind those of a drug B in the input vector. Therefore, each original sample is included twice in the datasets for Settings 3 and 4. These duplicate samples differ only in the order of the drug features and concentrations: once in the order A-B, once in the order B-A. In the Results section, we investigate the impact on model performance when models are trained using the duplicated versus non-duplicated data. The test performance is always evaluated on the duplicated entries.

Fitting of Dose-Response Curves and Computation of Sensitivity Measures

Using the relative inhibitions predicted by our models, it is possible to reconstruct dose-response curves for monotherapies and dose-response matrices for combination therapies (cf. Figure 2). Based on these curves/matrices, various measures of drug response can be derived. To this end, we first converted the (actual and predicted) relative inhibitions into relative viabilities by subtracting the relative inhibitions from 100 and dividing the result by 100. Additionally, we clamped viabilities to [0, 1].
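The order duplication for Settings 3 and 4 and the inhibition-to-viability conversion can be sketched as follows. The concatenation layout [features of drug A, concentration A, features of drug B, concentration B] is an assumption for illustration (Figure 1 defines the actual input layout), and the function names are hypothetical. A monotherapy would pass an all-zero feature vector and a concentration of 0 for the second drug.

```python
import numpy as np


def feature_input(feat_a, conc_a, feat_b, conc_b):
    """Settings 3/4 input (assumed layout): [feat A, conc A, feat B, conc B]."""
    return np.concatenate([feat_a, [conc_a], feat_b, [conc_b]])


def duplicate_swapped(feat_a, conc_a, feat_b, conc_b):
    """Return one sample in both drug orders (A-B and B-A) as two input rows."""
    return np.stack([
        feature_input(feat_a, conc_a, feat_b, conc_b),
        feature_input(feat_b, conc_b, feat_a, conc_a),
    ])


def inhibition_to_viability(inhibition):
    """Relative viability = (100 - inhibition) / 100, clamped to [0, 1]."""
    return np.clip((100.0 - np.asarray(inhibition, dtype=float)) / 100.0, 0.0, 1.0)
```

With 162-bit MACCS fingerprints, each row has 2 · 162 + 2 = 326 entries, matching the feature count given for Setting 3; both rows share the same target inhibition.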
Note that we report relative viabilities in the range [0, 1] rather than [0, 100] to keep the results consistent and comparable to our previous study. 15 To perform the curve fitting for monotherapies, we employed a three-parameter logistic function from the drc R package: 41,42

$$f(x) = c + \frac{1 - c}{1 + \exp\left(b \cdot (\log(x) - \log(e))\right)} \qquad (1)$$

Here, f(x) denotes the estimated relative viability of the considered cell line at drug concentration x, c denotes the curve asymptote for increasing concentrations, b denotes the curve's slope, and e denotes the concentration at the inflection point. We only fit curves when at least five dose-response points were available, and we discarded all curves where the root mean squared error (RMSE) between the actual viabilities and those derived from the curve was greater than 0.3, a threshold that was previously employed for the data generation in the GDSC database. 43,44 From the fitted curves, we derived two measures of monotherapy drug response, namely IC50 values and CMax viabilities. The CMax viability is a novel drug sensitivity measure that we recently published. 15 It is defined as the relative viability at the CMax concentration of the respective drug. The CMax concentration denotes the peak plasma concentration of a drug after administering the highest clinically recommended dose. 20 Thus, the CMax viability is designed to estimate the maximal effect a treatment can realistically achieve. For the computation of CMax viabilities, we evaluated the function of the fitted curve at the drug's CMax concentration (cf. Figure 2A).
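The monotherapy curve fit, the RMSE filter, the CMax viability, and the IC50 (the crossing of the curve with viability 0.5) can be sketched in Python as follows. This swaps in SciPy's `curve_fit` for the drc R package used in the paper, so fitted values may differ slightly; it also fits log(e) instead of e to keep e positive, assumes raw (not log1p-normalized) concentrations greater than 0, and the starting values and function names are hypothetical choices.

```python
import numpy as np
from scipy.optimize import curve_fit


def ll3(x, b, c, log_e):
    """Eq. (1): c + (1 - c) / (1 + exp(b * (log x - log e))), with e = exp(log_e)."""
    return c + (1.0 - c) / (1.0 + np.exp(b * (np.log(x) - log_e)))


def fit_curve(conc, viability, min_points=5, max_rmse=0.3):
    """Fit Eq. (1); return (b, c, log_e), or None for too few points or a poor fit."""
    conc = np.asarray(conc, dtype=float)
    viability = np.asarray(viability, dtype=float)
    if conc.size < min_points:
        return None
    # Rough starting values: unit slope, lowest viability, median concentration.
    p0 = [1.0, viability.min(), np.log(np.median(conc))]
    try:
        params, _ = curve_fit(ll3, conc, viability, p0=p0, maxfev=10000)
    except RuntimeError:  # raised when the optimizer does not converge
        return None
    rmse = np.sqrt(np.mean((ll3(conc, *params) - viability) ** 2))
    return params if rmse <= max_rmse else None


def cmax_viability(params, cmax):
    """CMax viability: the fitted curve evaluated at the drug's CMax."""
    return float(ll3(cmax, *params))


def ic50(params):
    """Concentration at which the fitted curve crosses a viability of 0.5."""
    b, c, log_e = params
    if c >= 0.5:
        return None  # the curve never drops to 50% viability
    # Solving f(x) = 0.5 gives exp(b * (log x - log e)) = 0.5 / (0.5 - c).
    return float(np.exp(log_e + np.log(0.5 / (0.5 - c)) / b))
```

Returning `None` for an undefined IC50 reflects that curves whose lower asymptote c stays above 0.5 never intersect the 0.5 line.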
For the computation of IC50 values, we intersected the dose-response curves with a horizontal line with y-intercept 0.5. Figure 2: Exemplary dose-response curve and matrix. Sub-figure A depicts a dose-response curve (blue) for the monotherapy treatment of a cancer cell line (COSMIC ID 683667) with the drug Vorinostat. The fit is based on nine dose-response points (black). The yellow di- amond marks the CMax concentration of Vorinostat (1.2µM) , and the red star marks the corresponding CMax viability (0.41) derived from the curve (cf. Methods). Sub-figure B depicts a dose-response matrix for the combination treatment of cell line 909755 with Dasa- tinib and Lapatinib, where the x- and y-axes denote the respective treatment concentrations. The yellow and blue diamonds approximately mark the CMax concentration of both drugs, which are used to limit the considered concentration combinations for the computation of the combination CMax viability (cf. Methods). For combination therapies, we developed a variation of the CMax viability we call the com- bination CMax viability that can be derived from an actual/predicted dose-response matrix (cf. Figure 2B). Our initial idea was to interpolate the values in the dose-response matrix to derive the relative viability when administering the CMax concentration of both combina- tion drugs simultaneously. However, two synergistic drugs may have certain concentration windows with particularly high synergy/effectiveness.45 Thus, it is possible that the smallest viability is reached at a concentration combination smaller than the CMax concentrations. (Note that this should not happen for the dose-response curves we employed to compute the 15 .CC-BY 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 23, 2024. 
CMax viability for monotherapies, since these curves are monotonically decreasing.) Consequently, we considered the entire concentration range below the respective CMax values to compute our sensitivity measure. Conceptually, we want to derive the smallest viability within the area defined by the two concentration windows of the drugs, each limited by its CMax concentration. To compute the combination CMax viability, we divided the concentration interval from 0 to the CMax of each drug into 100 equally spaced concentrations, resulting in 10,000 concentration combinations. For each combination, we estimated its relative viability through bilinear interpolation (R package pracma 46) from the full dose-response matrix. Finally, we defined the minimum of all 10,000 values as the combination CMax viability. As the CMax denotes the maximal feasible treatment concentration for a drug monotherapy, it may not be feasible to administer the CMax concentrations of two drugs in combination. Yet, we believe that the respective CMax concentrations are a reasonable upper bound for the computation of combination CMax viabilities. Note also that administering the CMax concentration of a monotherapy might likewise not be feasible in all cases. Furthermore, the presented approach can in principle be applied to any desired concentrations other than CMax.
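The combination CMax viability computation can be sketched in Python with NumPy. The sketch implements bilinear interpolation by hand instead of calling pracma, so it approximates the described procedure rather than reproducing the authors' implementation. It assumes a rectangular dose-response matrix whose rows and columns are indexed by ascending concentrations starting at 0 (the monotherapy row and column), and that both CMax values lie within the tested ranges. The function names are hypothetical.

```python
import numpy as np

def bilinear(xs, ys, values, qx, qy):
    """Bilinear interpolation of values[i, j], given at (xs[i], ys[j]), at query points (qx, qy)."""
    # Index of the lower grid corner of the cell that contains each query point.
    i = np.clip(np.searchsorted(xs, qx, side="right") - 1, 0, len(xs) - 2)
    j = np.clip(np.searchsorted(ys, qy, side="right") - 1, 0, len(ys) - 2)
    # Relative position of each query point inside its cell (0 to 1 along each axis).
    tx = (qx - xs[i]) / (xs[i + 1] - xs[i])
    ty = (qy - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * values[i, j]
            + tx * (1 - ty) * values[i + 1, j]
            + (1 - tx) * ty * values[i, j + 1]
            + tx * ty * values[i + 1, j + 1])

def combination_cmax_viability(conc_a, conc_b, viab, cmax_a, cmax_b, n=100):
    """Minimum interpolated viability over an n x n grid spanning [0, CMax_A] x [0, CMax_B]."""
    conc_a = np.asarray(conc_a, dtype=float)
    conc_b = np.asarray(conc_b, dtype=float)
    viab = np.asarray(viab, dtype=float)
    if conc_a[0] != 0 or conc_b[0] != 0:
        raise ValueError("matrix must include the zero-concentration row and column")
    if cmax_a > conc_a[-1] or cmax_b > conc_b[-1]:
        raise ValueError("CMax outside the tested range would require extrapolation")
    grid_a = np.linspace(0.0, cmax_a, n)  # n equally spaced concentrations of drug A
    grid_b = np.linspace(0.0, cmax_b, n)  # n equally spaced concentrations of drug B
    qa, qb = np.meshgrid(grid_a, grid_b, indexing="ij")  # n * n concentration combinations
    return float(bilinear(conc_a, conc_b, viab, qa, qb).min())
```

With the default n = 100, the grid contains the 10,000 concentration combinations described above. Taking the minimum over the whole grid, rather than reading off the single value at both CMax concentrations, is what captures interior synergy windows.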

Results

Challenges of Using Synergy Scores for Personalized Treatment Recommendations

The idea behind synergy scores is to measure the synergistic or antagonistic potential of two compounds for a given cell line by comparing their experimentally measured combined effect on cell survival to the expected effect obtained from a baseline model that assumes no synergism or antagonism. 8 The baseline model is derived from monotherapy data of both compounds. It estimates their combined effect at the concentrations that were tested in the actual combination screening. The baseline and the actually measured treatment responses are then subtracted from each other, and the result is averaged over all concentration combinations to obtain a final synergy score. 13 Prominent examples of synergy scores that differ solely in their computation of the baseline are the Loewe, 9 Bliss, 10 HSA, 11 and ZIP 8 scores. For each of these scores, values > 0 indicate synergism and values < 0 indicate antagonism. A detailed description of the scores can be found in the Supplement. Undoubtedly, estimating the synergistic potential of compound combinations through synergy scores can be valuable for identifying promising combination treatments for more detailed screening, for the development of novel compounds, or for drug repurposing. However, synergy scores have known limitations, which have been summarized and extensively discussed in a review by Vlot et al., 13 who also performed several analyses using a large-scale drug combination dataset.
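To make the baseline-and-difference construction concrete, the following Python sketch computes two of the simpler scores, Bliss and HSA, for one dose-response matrix. It is a simplified illustration under stated assumptions, not the reference implementation used by DrugComb or other tools: inhibitions are fractions in [0, 1], the baseline uses the measured monotherapy inhibitions at the matching concentrations (no curve fitting), and the function names are hypothetical. Loewe and ZIP are omitted because they require fitted monotherapy curves.

```python
import numpy as np

def bliss_expected(inh_a, inh_b):
    # Bliss independence: both drugs act as statistically independent events.
    return inh_a + inh_b - inh_a * inh_b

def hsa_expected(inh_a, inh_b):
    # Highest single agent: the combination is expected to match the better drug.
    return np.maximum(inh_a, inh_b)

def synergy_score(observed, inh_a, inh_b, expected_fn):
    """Mean of (observed - expected) over all tested concentration combinations.

    observed[k] is the combination inhibition, and inh_a[k] / inh_b[k] are the
    monotherapy inhibitions at the same concentrations; > 0 suggests synergism.
    """
    observed = np.asarray(observed, dtype=float)
    expected = expected_fn(np.asarray(inh_a, dtype=float), np.asarray(inh_b, dtype=float))
    return float(np.mean(observed - expected))
```

For observed combination inhibitions [0.5, 0.8] and monotherapy inhibitions [0.2, 0.5] and [0.3, 0.5], the Bliss score is 0.055 and the HSA score is 0.25. Even on the same data, the two baselines lead to different magnitudes.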
The findings of Vlot et al. can be briefly summarized as follows: First, each synergy score is based on certain model assumptions, some of which might frequently be violated by real-world data. 47,48 For example, both the Loewe and ZIP scores require fitting dose-response curves of a certain shape to the monotherapy data. The Loewe score furthermore requires both drugs to have the same minimum and maximum effect as well as a constant potency ratio. 13 In comparison, the Bliss score relies on the assumption that the effects of two non-interacting drugs are statistically independent. Even though pharmacological independence is not strictly required for statistical independence, 13 statistical independence is most likely caused by pharmacological independence. However, due to crosstalk between the biological processes affected by either drug, true pharmacological independence may be unlikely. 48 These examples also highlight that the assumptions of the different scores differ fundamentally. In their data analysis, Vlot et al. observed only a moderate to low correlation between the four scores calculated on the same data, which might be explained by the different model assumptions. They also found that value ranges are not comparable between scores: the HSA and ZIP scores generally result in higher values than Loewe and Bliss. Additionally, Vlot et al. observed that synergy scores are relatively difficult to reproduce between replicated experiments, even though the measured drug responses used to derive the scores correlated well between replicates. Furthermore, while misclassifications (synergism vs.
antagonism) between scores were rare, several scenarios were identified in which scores are likely to disagree, which could typically be traced back to a violation of model assumptions. Based on these findings, Vlot et al. advise against the automated analysis of large-scale data using individual synergy scores. Instead, they recommend a careful investigation of the respective dose-response curves in order to select an appropriate score. When training models that only predict synergy scores (instead of concentration-specific inhibitions/viabilities), this is hardly possible, since the underlying dose-response relationship cannot be assessed to validate the model assumptions. We agree with these conclusions by Vlot et al. but would like to emphasize further points that make synergy scores difficult to use and interpret, especially for personalized treatment recommendations: A methodological criticism of synergy scores is that they are an aggregated measure over concentration ranges. The choice of meaningful concentration ranges is especially challenging for experimental drugs but crucial for drawing meaningful conclusions in personalized medicine. We have previously shown that the screened concentration ranges in the GDSC database do not correspond well to clinically feasible treatment concentrations, 24 and similar observations can also be made for the DrugComb database (cf. Supplementary Figure 1). Another major factor that hampers the use of synergy scores for treatment recommendation is that a high synergy between two compounds solely implies that the combination treatment is more effective than the respective monotherapies. However, it does not guarantee an overall high effectiveness (in terms of large relative inhibition) of the combination treatment. 5 It follows that synergy scores alone should not be used to compare the suitability of different treatment options for a given patient (cell line).
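A small numerical example, with hypothetical inhibition values chosen purely for illustration, shows why synergy does not imply effectiveness. Using the Bliss baseline (inhibitions as fractions in [0, 1]), a pair of weak drugs can score as strongly synergistic while a pair of potent drugs scores as merely additive. The helper name is hypothetical.

```python
def bliss_excess(observed, inh_a, inh_b):
    """Observed combination inhibition minus the Bliss-independence expectation."""
    expected = inh_a + inh_b - inh_a * inh_b
    return observed - expected

# Hypothetical pair 1: weak monotherapies (5% each), combination reaches 35% inhibition.
weak_pair = bliss_excess(observed=0.35, inh_a=0.05, inh_b=0.05)
# Hypothetical pair 2: potent monotherapies (80% and 50%), combination reaches 90%.
potent_pair = bliss_excess(observed=0.90, inh_a=0.80, inh_b=0.50)

pairs = {"weak pair": (weak_pair, 0.35), "potent pair": (potent_pair, 0.90)}
best_by_synergy = max(pairs, key=lambda k: pairs[k][0])
best_by_inhibition = max(pairs, key=lambda k: pairs[k][1])
print(best_by_synergy, "|", best_by_inhibition)  # prints: weak pair | potent pair
```

The weak pair has a Bliss excess of about +0.25 (0.35 − 0.0975), whereas the potent pair has an excess of about 0 (0.90 − 0.90). Ranking by synergy would therefore favor the combination that inhibits only 35% of cells over the one that inhibits 90%.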
In particular, synergy scores cannot be used to compare the effectiveness of different combination treatments. Furthermore, they cannot be used to compare the effectiveness of combination therapies with that of monotherapies involving different drugs. Based on these drawbacks of synergy scores in general and for treatment recommendation in particular, the analyses presented in the following focus on sensitivity prediction instead. Compared to the number of synergy prediction methods, 6,7 sensitivity prediction of drug combinations is understudied, especially when the goal is to make predictions for previously unseen cell lines, as we outlined in the Introduction. In the following, we analyze how accurately drug responses (here: relative inhibitions) can be predicted for combination therapies. We compare different ML algorithms and model inputs and investigate the reconstruction of sensitivity measures from the model predictions. Additionally, we show how both mono- and combination therapies can be ranked by their effectiveness for a given cell line using our recently developed sensitivity measure, the CMax viability. 15

Overall Performance Comparison

Figure 3 shows the performance of all investigated models in terms of test MAE (mean absolute error). The optimized hyperparameters for each model are provided in Supplementary Table 2. The first row depicts the results for the entire test data, while the second and third rows focus on the data subsets representing mono- and combination therapies, respectively.
Across all four settings, random forests resulted in the lowest error, followed by neural networks, while elastic nets had the worst performance. An exception is the PhysChem setting, where neural networks were outperformed by elastic nets. The overall smallest test error (MAE of 12.14) was achieved using a random forest with MACCS fingerprints as input. Additionally, even the worst-performing random forest model (OneHot, MAE of 13.04) still outperforms the best neural network (OneHot, MAE of 14.08) and elastic net (OneHotTar, MAE of 16.46) models. Thus, the choice of ML algorithm seems to have a stronger impact on performance than the choice of input features, even though the different input representations differ considerably (cf. Methods and Figure 1). Notably, the addition of drug targets slightly improves predictions for random forests and elastic nets but has the opposite effect for neural networks. To contextualize the obtained errors, we compare them to two baseline models: A simple baseline model that always predicts the mean of the training data has a test MAE of 24.2. A more advanced baseline that always predicts the mean inhibition per drug for monotherapies and the mean inhibition per combination for combination therapies has a test MAE of 19.74. Consequently, our best model improves upon these baselines by 50% and 37%, respectively. While all random forest models outperform the baselines, some elastic nets and neural networks are not superior to them. When investigating mono- and combination therapies separately (cf.
rows 2 and 3 of Figure 3), the same overall trends can be observed, with the random forest model with MACCS features again having the smallest error. Generally, both types of therapies can be predicted similarly well, even though the training data contain slightly more combination (60%) than monotherapy data (40%). Besides the MAE, we also investigated the Pearson correlation coefficient (PCC) between the actual and predicted inhibitions. The overall PCC for the best-performing model was 0.8 (0.77 and 0.82 for mono- and combination therapies, respectively). However, computing correlations across the entire data artificially increases the PCC: since some drugs/combinations generally have lower/higher inhibitions than others, even mean predictions for each drug/combination (requiring no ML at all) would result in a correlation above zero. 19 Thus, we computed the mean per-drug PCC for monotherapies (0.58) and the mean per-combination PCC for combination therapies (0.56) (see also Supplementary Figure 2). These values have a similar magnitude to what we previously observed for monotherapy sensitivity prediction. 15 Note that Zheng et al. 5 and Julkunen et al. 18 also provide overall correlations and errors for the prediction of relative inhibition/growth (cf. Introduction). However, their results are not comparable to ours, since we investigate the performance for unknown cell lines, which
cannot be evaluated using the other two methods. It is known that the cell-line-blind scenario increases errors considerably compared to making predictions for known cell lines. 49,50 To nevertheless assess how our random forest MACCS model would perform for known cell lines, we retrained the model using a random split of the available data into a training (80%) and a test set (20%). This split does not guarantee that cell lines in the test set were unseen during model training. Note that we still ensured that duplicated entries denoting the same treatment are contained either exclusively in the training set or exclusively in the test set (cf. Methods).

Figure 3: Test set performance. This figure shows the prediction errors (in terms of absolute difference between actual and predicted values) for each setting (columns: OneHot, OneHotTar, MACCS, PhysChem) and each investigated ML algorithm (coloring: Random Forest, Neural Network, Elastic Net). The first row shows the results for the entire test dataset, while the second and third rows show the results for the data subsets corresponding to mono- and combination therapies, respectively. On top of each boxplot, the mean absolute error (MAE) is shown.
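The difference between the cell-line-blind evaluation and the random split with grouped duplicates can be sketched as follows. This is a hypothetical illustration rather than the authors' pipeline. Each record is assumed to be a dict with the keys `cell_line`, `drug_a`, `conc_a`, `drug_b`, and `conc_b`, with monotherapies assumed to be encoded as `drug_b=""` and `conc_b=0.0`. The function names are made up for this example.

```python
import random
from collections import defaultdict

def cell_line_blind_split(records, test_fraction=0.2, seed=0):
    """Put whole cell lines into either train or test, so test cell lines are unseen."""
    lines = sorted({r["cell_line"] for r in records})
    random.Random(seed).shuffle(lines)
    test_lines = set(lines[:max(1, round(test_fraction * len(lines)))])
    train = [r for r in records if r["cell_line"] not in test_lines]
    test = [r for r in records if r["cell_line"] in test_lines]
    return train, test

def treatment_key(r):
    """Order-independent key: swapping drug A and drug B yields the same treatment."""
    drugs = tuple(sorted([(r["drug_a"], r["conc_a"]), (r["drug_b"], r["conc_b"])]))
    return (r["cell_line"], drugs)

def random_grouped_split(records, test_fraction=0.2, seed=0):
    """Random split over treatments; all duplicates of one treatment stay together."""
    groups = defaultdict(list)
    for r in records:
        groups[treatment_key(r)].append(r)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    test_keys = set(keys[:round(test_fraction * len(keys))])
    train = [r for k in keys if k not in test_keys for r in groups[k]]
    test = [r for k in keys if k in test_keys for r in groups[k]]
    return train, test
```

In the random split, a cell line can appear in both sets, which is exactly why this setting is easier than the cell-line-blind one. Grouping by `treatment_key` only prevents the two drug orders of the same measurement from leaking across the split.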
With a PCC of 0.96 and an RMSE of 8.41, our performance for known cell lines is comparable to that reported by Zheng et al. (PCC = 0.98, RMSE = 7.12) 5 and Julkunen et al. (PCC = 0.97, RMSE = 9.86 in cross-validation; PCC = 0.92 on validation data). 18 However, the dataset used in our analyses is much larger and more heterogeneous, comprising 947 cell lines, 265 drugs, and 9,535 drug combinations from different sources. In contrast, Zheng et al. employed solely the O'Neil dataset (39 cell lines, 38 drugs, 583 drug combinations), 51 which is known to be of high quality, 4,5 whereas Julkunen et al. employed solely the AstraZeneca DREAM dataset (85 cell lines, 118 drugs, 910 drug combinations). 6

Range Performance Comparison

Next, we investigated whether certain inhibition ranges can be predicted more accurately than others. Figure 4 shows the distribution of absolute test errors for different inhibition intervals in the range (−25, 100]. This range covers 99% of the training and test data. Predictions are (on average) most accurate in the interval (0, 25], followed by the interval (−25, 0]. As the actual inhibition increases, the error increases as well. This could be explained by the amount of available training data for each interval: Most data are located in the intervals (0, 25] (41%) and (−25, 0] (25%), while each of the other intervals is only covered by around 10% of the data. In Supplementary Section 3 and Supplementary Figure 3, we provide further analysis of how the amount of training data for individual drugs/combinations affects prediction performance.
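The per-interval evaluation behind Figure 4 amounts to binning test points by their actual inhibition and averaging the absolute errors within each bin. A minimal NumPy sketch, assuming inhibitions in percent and the half-open intervals used in the text; the function name is hypothetical, and it also returns the number of points per bin, since the text attributes the error trend to the amount of data per interval.

```python
import numpy as np

def mae_by_interval(actual, predicted, edges=(-25, 0, 25, 50, 75, 100)):
    """(MAE, number of points) per interval (lo, hi] of the actual inhibition."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(actual - predicted)
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (actual > lo) & (actual <= hi)  # half-open interval (lo, hi]
        mae = float(abs_err[in_bin].mean()) if in_bin.any() else float("nan")
        result[(lo, hi)] = (mae, int(in_bin.sum()))
    return result
```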
Data points with high inhibition represent cases where the drug treatment greatly reduced the number of viable cells, i.e., cases of effective treatment. Such data are commonly underrepresented in drug screening datasets. 34,52,53 They are, however, of particular interest for personalized therapy, where the most effective treatment options for a given patient should be determined. Thus, for monotherapies, we previously developed SAURON-RF, a random forest-based model designed to improve predictions for drug-sensitive samples in both classification and regression. 15,29 To this end, SAURON-RF relies (among other things) on sample-specific weights. Consequently, we also tried to incorporate sample weights into the models presented here to increase the importance of the underrepresented intervals. Unfortunately, the sample weights had only little impact on predictions, especially for the cases with the highest inhibition (see Supplementary Figure 4).

Correlation of Duplicated Entries

As discussed in the Methods section, for the MACCS and PhysChem settings, the same treatment can be described by two different input representations by switching the order of the considered drugs (cf. Figure 1). Hence, we decided to include both input representations in the training and test data of our models. Ideally, predictions for both input representations should correlate well. Figure 5A shows the correlation of predictions for the best-performing random forest model trained using MACCS fingerprints. As desired, both predictions are highly correlated (PCC ≈ 1), and the mean absolute difference between them is very small (0.8). Figure 5B shows the same analysis for a model where we removed the duplicated entries from the training data. Even though the correlation is still high (PCC = 0.82), it decreased strongly, while prediction differences increased notably to 9.12 on average.
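The two input representations of one treatment, and the consistency check shown in Figure 5, can be sketched as below. The feature layout (cell line features, then drug A features and concentration, then drug B features and concentration) is an assumption for illustration only; the actual encodings are described in the Methods and Figure 1. The function names are hypothetical.

```python
import numpy as np

def make_input(cell, drug_a, conc_a, drug_b, conc_b):
    """Concatenate features in a fixed (hypothetical) order: cell | drug A | conc A | drug B | conc B."""
    return np.concatenate([cell, drug_a, [conc_a], drug_b, [conc_b]])

def both_representations(cell, drug_a, conc_a, drug_b, conc_b):
    """The same treatment encoded twice, once per drug order (the 'duplicated entries')."""
    return (make_input(cell, drug_a, conc_a, drug_b, conc_b),
            make_input(cell, drug_b, conc_b, drug_a, conc_a))

def order_consistency(pred_1, pred_2):
    """Pearson correlation and mean absolute difference between predictions for both orders."""
    pred_1 = np.asarray(pred_1, dtype=float)
    pred_2 = np.asarray(pred_2, dtype=float)
    pcc = float(np.corrcoef(pred_1, pred_2)[0, 1])
    return pcc, float(np.mean(np.abs(pred_1 - pred_2)))
```

A model that only ever sees one drug order during training has no reason to treat the two vectors as equivalent. Presumably this is why including both representations in the training data brings the predictions for swapped inputs into close agreement.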
The mean PCCs per drug (for monotherapies) and per drug combination are 0.98 and 0.97 for the duplicated training data and decrease to 0.78 and 0.86 for the non-duplicated training data, respectively. This is also reflected in the test error, where the model with duplicated training entries achieved an MAE of 12.39 compared to 14.6 for non-duplicated entries. Similar trends can also be observed for the PhysChem setting (see Supplementary Figure 5).

Figure 4: Test set performance for different inhibition ranges. This figure shows the prediction errors (in terms of absolute difference between actual and predicted values) for each setting (columns: OneHot, OneHotTar, MACCS, PhysChem) and each investigated ML algorithm (coloring: Random Forest, Neural Network, Elastic Net). Each row shows the performance for a different interval of actual relative inhibitions ((−25, 0], (0, 25], (25, 50], (50, 75], (75, 100]). On top of each boxplot, the mean absolute error (MAE) is shown.

Figure 5: Correlation of duplicated entries from the test data. This figure shows the correlation between the model predictions for duplicated entries (Prediction 1 vs. Prediction 2, colored by the number of data points). Duplicated entries refer to the same drug-drug-cell line combination and the same treatment concentrations but can be represented by two different model inputs by swapping the features of the respective drugs (cf. Methods and Figure 1). Sub-figure A (Duplicated; R = 1, p < 2.2e−16) shows the test predictions when including duplicated entries in the training data, while Sub-figure B (Not duplicated; R = 0.82, p < 2.2e−16) shows the predictions when training only on non-duplicated entries. In both figures, the black diagonal line represents the identity, and R denotes the Pearson correlation between the predictions.

Reconstruction of Drug Sensitivity Measures

A benefit of predicting concentration-specific inhibition values is that dose-response curves and matrices can be reconstructed from the model's predictions. These can in turn be used to compute various measures of drug sensitivity or synergy. Since the focus of this paper is on sensitivity prediction and Vlot et al. discourage the computation of arbitrary synergy scores on large-scale data, 13 we reconstructed two measures of drug sensitivity, namely
our recently published measure called CMax viability for monotherapies, and a modification of this measure for drug combinations, which we call the combination CMax viability (cf. Methods). Unlike conventional sensitivity measures such as the IC50 or AUC, the (combination) CMax viability is comparable across drugs 15 and drug combinations. Consequently, it can be used to prioritize drugs/combinations for a given cell line (i.e., rank them by their effectiveness), which will be investigated in the next section. For the computation of monotherapy CMax viabilities, we first used the actual/predicted monotherapy entries of the test data to generate actual/predicted dose-response curves (cf. Methods). An example is shown in Figure 2, where we also highlight how the CMax viability is derived from the curves. In total, we were able to compute both the actual and predicted CMax viabilities for 7,352 out of 32,564 cell line-drug combinations. The reduced number of combinations stems from the fact that CMax concentrations were only available for 77 of the investigated drugs. Figure 6 depicts the prediction errors for the reconstructed monotherapy CMax viability values. The MAE averaged over all drugs is 0.12, and the mean MSE is 0.04, which is comparable to the error we previously achieved when predicting CMax viabilities directly using either the SAURON-RF algorithm by Lenhof et al. 29 (MSE = 0.03) or a slightly adjusted version of DeepDR by Chiu et al. 54 (MSE = 0.09). 15 A baseline error can be obtained from a model that, for every treatment concentration, predicts the mean inhibition of each drug obtained from the training data. For such a model, the CMax viability (i.e., the viability at the CMax concentration) would also be derived from this mean. This would result in a baseline MAE of 0.2, which our model improves upon by 40%. The overall PCC is 0.58 for the CMax viabilities and 0.41 for the baseline.
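Why an overall PCC can look reasonable while drug-specific PCCs are poor can be seen in a toy example with made-up viabilities. If predictions capture only the differences between drugs but exactly invert the ordering of cell lines within each drug, the pooled correlation still stays high. The helper below (hypothetical name) computes both quantities with NumPy.

```python
import numpy as np

def overall_and_mean_per_drug_pcc(drug_ids, actual, predicted):
    """Pooled PCC over all points, and the mean of the PCCs computed within each drug."""
    drug_ids = np.asarray(drug_ids)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    overall = float(np.corrcoef(actual, predicted)[0, 1])
    per_drug = []
    for d in np.unique(drug_ids):
        m = drug_ids == d
        # The PCC is undefined for constant vectors (e.g., mean-only predictions).
        if m.sum() > 1 and actual[m].std() > 0 and predicted[m].std() > 0:
            per_drug.append(np.corrcoef(actual[m], predicted[m])[0, 1])
    mean_per_drug = float(np.mean(per_drug)) if per_drug else float("nan")
    return overall, mean_per_drug

# Toy data: three drugs with different average viabilities; within each drug,
# the predicted ordering of the three cell lines is exactly reversed.
drugs = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
actual = [0.15, 0.20, 0.25, 0.45, 0.50, 0.55, 0.75, 0.80, 0.85]
predicted = [0.25, 0.20, 0.15, 0.55, 0.50, 0.45, 0.85, 0.80, 0.75]
overall, per_drug = overall_and_mean_per_drug_pcc(drugs, actual, predicted)
print(round(overall, 2), round(per_drug, 2))  # prints: 0.95 -1.0
```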
However, the drug-specific PCC is only 0.1 (cf. Figure 6B). While a drug-specific baseline PCC cannot be computed for constant predictions, adding random noise with mean 0 to these constant predictions results in an expected baseline PCC of 0. Thus, our predictions improve upon this baseline, but only slightly. When using our models to reconstruct IC50 values, we observe a similar phenomenon (overall PCC = 0.71, mean PCC per drug = 0.01; cf. Supplementary Figure 6). To investigate the reasons for these low drug-specific correlations, we developed and evaluated different hypotheses, which can be found in the Supplement. Based on our evaluation of these hypotheses, we conclude that even though prediction errors are relatively small and comparable to our previous work, the derived measures cannot be used to compare the effect of a drug monotherapy on different cell lines. For the combination CMax viability (26,946 drug-drug-cell line combinations), we obtained similar results, which are depicted in Figure 6C and D.

Figure 6: Reconstruction of (combination) CMax viabilities from predicted dose-response curves/matrices. Sub-figures A and B (red) show the distribution of MAE and PCC per drug for the reconstruction of CMax viabilities using dose-response curves fit on the test set monotherapy data.
Sub-figures C and D (blue) show the distribution of MAE and PCC per drug combination for the reconstruction of combination CMax viabilities using dose-response matrices derived from the test set drug combination data.

Nevertheless, we would like to highlight that drug-specific correlations, as evaluated here, are frequently not assessed for drug sensitivity and synergy prediction (cf. Supplementary Table 3, where we compare the investigated settings and analyses for 39 state-of-the-art methods). Thus, similar problems may often go undetected. Due to the novelty of our prediction approach, there is no method we could directly compare our findings to. However, our analyses presented earlier show that our models are competitive in performance with the approaches by Zheng et al. 5 and Julkunen et al. 18 for making predictions for known cell lines on drug combination data. Note that neither approach provides drug- or combination-specific correlations. For cell-line-blind evaluations on monotherapy data, we found three related approaches that provide drug-specific correlations: Our recently published method SAURON-RF achieves a mean PCC of 0.56 when directly predicting CMax viabilities using drug-specific models. 15 In the same manuscript, we also show that an adjusted version of the multi-drug model DeepDR by Chiu et al. 54 achieves a PCC of 0 for the same task. In comparison, Chawla et al. employ multi-drug models for the prediction of IC50 values and achieve mean PCCs between ca. 0.18 and 0.5 for different ML algorithms.
Lastly, Rahman and Pal achieve mean PCCs between 0.29 and 0.44 when reconstructing AUC values from predicted dose-response curves. While not directly comparable to our approach, these works indicate that at least weak to moderate drug-specific correlations can be achieved (1) when predicting CMax viabilities, (2) when using multi-drug models, and (3) when deriving sensitivity measures from predicted curves. Yet, it remains to be investigated whether and how comparable results can be achieved when combining all three factors and additionally considering combination therapies, thereby enabling predictions for arbitrary drugs/combinations and measures, which is what we aim for here.

Treatment Prioritization

In our final analysis, we investigate how accurately drugs and drug combinations can be prioritized for a given cell line based on the model predictions: For each cell line in the test set, we used the computed CMax viabilities for the monotherapy and combination data to rank drugs and drug combinations from most to least effective. Drug prioritization is meant to mimic a personalized treatment scenario, where the goal is a list of the most effective treatment suggestions for a given patient. The results are shown in Figure 7, where the first row shows the results for monotherapies only, while the second row shows the results when combining mono- and combination therapies into one list. The results for combination therapies only are shown in Supplementary Figure 8.
For monotherapies, the Spearman correlation coefficient (SCC) between the actual and predicted rankings was 0.74 (baseline, as defined in the previous section: 0.54). Our predictions clearly outperform the baseline. Still, the baseline correlation is relatively high, indicating that the differences in effectiveness between drugs are easier to predict than the differences between cell lines receiving the same treatment. While an accurate ranking of the entire list is desirable, one would typically place more emphasis on correctly identifying the most effective treatments. Thus, we computed the mean overlap between the first k elements of the actual and predicted rankings. For monotherapies, the average length of the predicted drug lists is 31.15. The average overlaps between the top k = 5 and k = 10 actual and predicted most effective drugs are 3.16 (baseline: 2.14) and 7.68 (baseline: 6.55), respectively (results for further k are shown in Supplementary Figure 9). Furthermore, the median rank of the actually most effective drug in the predicted ranking is 2.5 (baseline: 8), and the median rank of the drug predicted to be most effective in the actual list is 3 (baseline: 6). The median difference between the true CMax viabilities of the actually most effective and the predicted most effective drugs is only 0.02 (baseline: 0.31). The second row of Figure 7 shows the analogous prioritization results when combining mono- and combination treatments into one list. The SCC of 0.76 (baseline: 0.62) is comparable to the results for monotherapies. Since the average list length is much greater when including drug combinations (838.62), the overlaps at k = 5 (1.26, baseline: 0.68) and k = 10 (3.38, baseline: 2.09) are lower (cf. also Supplementary Figures 9 and 10).
Furthermore, the median rank of the actually best treatment in the predicted list (27, baseline: 170.5) and that of the predicted best treatment in the actual list (9.5, baseline: 12) are worse (i.e., higher) than for monotherapies. Still, the results clearly improve over the baseline. Furthermore, the median difference in viability between the actually most effective treatment and the treatment predicted to be most effective remains small (0.02, baseline: 0.03).

Figure 7: Treatment prioritization. This figure depicts the test set prioritization results for mono- and combination therapies. Sub-figures A to F (red) focus on the prioritization of monotherapies, including: (A) the SCC between the actual and predicted rankings for each cell line, (B)/(C) the intersection size between the 5/10 actual and predicted most effective treatments, (D) the predicted rank of the actual most effective treatment, (E) the actual rank of the treatment predicted to be most effective, and (F) the difference between the actual CMax viabilities of the actual and predicted most effective treatments.
Sub-figures G to L (blue) show the analogous prioritization results when combining mono- and combination treatments into one list. 30 .CC-BY 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 23, 2024. ; https://doi.org/10.1101/2024.11.22.624812doi: bioRxiv preprint
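
The prioritization measures reported above can be sketched in plain Python. These are the per-cell-line SCC, the top-k overlap, the rank of the actual best treatment in the predicted list and vice versa, and the viability gap between the actual and predicted best treatments. This is a minimal illustration, not the authors' implementation. It assumes that each cell line comes with two dicts mapping treatment names to actual and predicted CMax viabilities, that a lower viability means a more effective treatment, and that ties are broken alphabetically. The function names are hypothetical. The aggregation follows the text: the mean is used for overlaps, and the median is used for ranks and viability gaps across cell lines.

```python
from statistics import mean, median


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for constant rankings
    return cov / (vx * vy) ** 0.5


def prioritization_metrics(actual, predicted, k=5):
    """Metrics for one cell line; dicts map treatment -> CMax viability (lower = better)."""
    treatments = sorted(actual)  # alphabetical order doubles as tie-breaker
    act_list = sorted(treatments, key=lambda t: actual[t])
    pred_list = sorted(treatments, key=lambda t: predicted[t])
    best_act, best_pred = act_list[0], pred_list[0]
    return {
        "scc": spearman([actual[t] for t in treatments],
                        [predicted[t] for t in treatments]),
        "overlap_at_k": len(set(act_list[:k]) & set(pred_list[:k])),
        "pred_rank_of_act_best": pred_list.index(best_act) + 1,
        "act_rank_of_pred_best": act_list.index(best_pred) + 1,
        # viability "lost" by choosing the predicted instead of the actual best treatment
        "viability_gap": actual[best_pred] - actual[best_act],
    }


def summarize(per_cell_line):
    """Aggregate per-cell-line metrics: mean overlap, medians for ranks and gaps."""
    return {
        "mean_overlap_at_k": mean(m["overlap_at_k"] for m in per_cell_line),
        "median_pred_rank_of_act_best": median(m["pred_rank_of_act_best"] for m in per_cell_line),
        "median_act_rank_of_pred_best": median(m["act_rank_of_pred_best"] for m in per_cell_line),
        "median_viability_gap": median(m["viability_gap"] for m in per_cell_line),
    }
```

Using the median for ranks keeps a few cell lines with a badly misplaced best treatment from dominating the summary. This matters most for the long combined lists.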

Discussion

Administering not only single but multiple drugs in combination is common in cancer treatment. However, while drug response datasets for monotherapy have been available for more than a decade, large-scale datasets for combination therapy have only recently become publicly available, e.g., the DrugComb database. 4,5 While the DrugComb data have been studied extensively for the prediction of drug synergy, they are still underused for the prediction of drug sensitivity, especially with a focus on making personalized treatment recommendations. For this use case, we found the scores widely used for synergy prediction to be less suitable, for various reasons discussed in this manuscript. To exploit the available drug combination data for predicting drug responses without relying on synergy scores, we developed and evaluated several ML algorithms and architectures that directly predict concentration-specific drug responses in the form of relative inhibitions. We are convinced that this approach has various benefits for personalized treatment recommendation. First, our approach allows the reconstruction of dose-response curves and matrices from the model predictions. From these curves/matrices, various sensitivity or synergy measures can be derived, and inspecting individual curves/matrices can help validate the underlying assumptions of certain measures. Second, our approach can predict responses to both mono- and combination therapies. Third, it allows for making predictions for unseen cell lines, thereby mimicking the scenario of assessing drug responses for a new patient. Together with our novel sensitivity measure, the (combination) CMax viability, this framework finally enables the prioritization of both mono- and combination therapy options for unseen cell lines (patients).
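
Once inhibitions have been predicted for each concentration (or concentration pair), summary measures can be computed from the reconstructed curves and matrices. The Python sketch below illustrates this with three simplified measures:

- a normalized area under the inhibition curve on a log10 concentration axis;
- an IC50 obtained by linear interpolation in log space;
- a Bliss excess matrix, i.e., observed minus expected inhibition, where the expected inhibition under Bliss independence is I_A + I_B - I_A * I_B.

The sketch assumes that inhibitions are fractions in [0, 1] and that concentrations are positive and sorted in ascending order. The function names are hypothetical. Because the measures are computed from unfitted curves, they are not necessarily the definitions used in this work. They will generally differ from measures derived from fitted dose-response models such as log-logistic fits.

```python
import math


def log_auc(concs, inhibitions):
    """Trapezoidal area under the inhibition curve over log10(concentration),
    divided by the log10 range so the result lies in [0, 1] for inhibitions in [0, 1]."""
    x = [math.log10(c) for c in concs]
    area = sum(
        (x[i + 1] - x[i]) * (inhibitions[i] + inhibitions[i + 1]) / 2
        for i in range(len(x) - 1)
    )
    return area / (x[-1] - x[0])


def interpolated_ic50(concs, inhibitions, level=0.5):
    """First concentration at which inhibition rises to `level`, interpolated
    linearly in log10 space. Returns None if no tested interval brackets `level`."""
    for i in range(len(concs) - 1):
        i0, i1 = inhibitions[i], inhibitions[i + 1]
        if i0 < level <= i1:  # guarantees i1 > i0, so the division is safe
            x0, x1 = math.log10(concs[i]), math.log10(concs[i + 1])
            return 10 ** (x0 + (level - i0) * (x1 - x0) / (i1 - i0))
    return None


def bliss_excess(matrix, mono_a, mono_b):
    """matrix[i][j]: combination inhibition at the i-th dose of A and j-th dose of B;
    mono_a[i], mono_b[j]: single-agent inhibitions at the same doses.
    Positive entries mean more inhibition than Bliss independence predicts."""
    return [
        [matrix[i][j] - (mono_a[i] + mono_b[j] - mono_a[i] * mono_b[j])
         for j in range(len(mono_b))]
        for i in range(len(mono_a))
    ]
```

Averaging the excess matrix gives a single Bliss-style summary. Published Bliss scores may additionally involve curve fitting or baseline correction, so their values need not match this simplified version.
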
Our evaluations on the DrugComb database show that our models substantially outperform baseline models and show very little variation when predicting the same treatment using different input representations. Notably, we evaluated our models on unseen cell lines, which is often neglected in drug sensitivity prediction. 19 Moreover, our models are also competitive with state-of-the-art approaches when making predictions for known cell lines. Furthermore, we achieved strong correlations for treatment prioritization. However, our analyses also reveal weaknesses of directly predicting relative inhibitions. While prediction errors for the reconstructed drug response measures are competitive with other approaches, the drug-specific correlations between these measures only slightly improve over a baseline model. Additionally, we observed increased prediction errors for data samples with high inhibitions, corresponding to cases of treatment sensitivity. This issue is relatively well known for classification but has rarely been discussed or addressed for regression. 29,34

Three main factors can be adjusted to potentially address such challenges, namely the choice of ML algorithm, the choice and representation of input features, and the data used:

ML algorithm: We investigated neural networks (highly popular for sensitivity and synergy prediction), random forests, and elastic nets. In our recently published benchmarking, we found both elastic nets and random forests to outperform neural networks when predicting drug sensitivity. 24 For the prediction of inhibitions, as investigated here, random forests are superior to the other algorithms. In general, a plethora of further (potentially more sophisticated) approaches can be used to model the prediction of inhibitions. However, as discussed in our benchmarking 24 and also by Li et al., 55 more complex approaches are not necessarily superior to simpler ML algorithms, and careful evaluation is required to ensure a fair performance comparison.

Input features and representation: For the characterization of cell lines in the model input, several sources found gene expression to be the most informative omics type for predicting drug responses. 54,56,57 However, the inclusion of further omics or a priori knowledge, e.g., known sensitivity biomarkers or protein interactions, might improve predictions. Similarly, further drug properties, e.g., Morgan fingerprints, 58 could be investigated, or graph neural networks could be employed to represent drugs as molecular graphs. However, the superiority of molecular graphs over conventional drug fingerprints for sensitivity/synergy prediction and drug discovery has been questioned. 27,59

Dataset: With 947 cell lines, 265 drugs, and 9,535 drug combinations, the dataset investigated here is notably larger than those used by other approaches working on drug combination data. 5,12,14,18,49,60,61 Unfortunately, given the size of the investigated dataset, hardware restrictions become a limiting factor for ML. Despite training models on a compute cluster whose machines have 500 gigabytes of working memory, we had to reduce our data with respect to the number of considered drugs, features, and methods (cf. Methods). Generally, a large amount of training data benefits model training and robustness. Yet, if the dataset is heterogeneous, e.g., due to different data sources, as is the case for DrugComb, this may decrease performance compared to models built and evaluated on a more homogeneous dataset. Even though Zagidullin et al. found the reproducibility between replicates from different datasets satisfactory in the first release of DrugComb, 4 disagreement between drug response data from different sources is a well-known problem. 57,62,63 Especially for clinical applications, combining data from different sources (e.g., different hospitals) is essential, and models should be able to cope with this degree of heterogeneity. To this end, meta- or transfer-learning methods could be leveraged. 64

Investigating different ML algorithms, input representations, and datasets can potentially improve the predictive performance. However, especially in a sensitive field such as personalized medicine, performance should not be regarded as the sole building block of model trustworthiness. 65 For example, uncertainty estimation frameworks like conformal prediction could be applied to assess the reliability of individual predictions. 15,66–68 Additionally, incorporating interpretability mechanisms 12,65,69 into the model design and evaluation can aid in identifying drug or cell line properties that impact the predicted response. This could not only make predictions more comprehensible but also help to infer novel mechanisms of drug sensitivity or synergy.
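
As a minimal illustration of the conformal prediction idea mentioned above, the Python sketch below implements split conformal prediction for a regression model such as an inhibition predictor. It makes three assumptions:

- a held-out calibration set that was not used for training;
- exchangeability between calibration and test samples;
- absolute residuals as nonconformity scores.

The function names are hypothetical, and the sketch is not the procedure used in the cited works. Under these assumptions, the intervals contain the true value with probability at least 1 - alpha, marginally over test samples.

```python
import math


def conformal_quantile(scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration samples for this alpha
    return sorted(scores)[k - 1]


def conformal_intervals(cal_true, cal_pred, test_pred, alpha=0.1):
    """Symmetric prediction intervals around point predictions for new samples."""
    scores = [abs(y - p) for y, p in zip(cal_true, cal_pred)]  # nonconformity scores
    q = conformal_quantile(scores, alpha)
    return [(p - q, p + q) for p in test_pred]
```

In an unseen-cell-line setting, the calibration set would presumably consist of cell lines held out from training, so that calibration mimics the test scenario. Clipping intervals to the valid inhibition range is an optional refinement not shown here.
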

Data and Software Availability

The drug response data used for our analyses can be downloaded from the DrugComb website (https://drugcomb.org/download) and the DrugComb API (https://api.drugcomb.org/, cf. Methods). The gene expression data of cancer cell lines can be downloaded from the GDSC website (https://www.cancerrxgene.org/downloads/bulk_download). CMax concentrations for 77 of the investigated drugs can be derived from Liston and Davis. 20 Our code is available on GitHub (https://github.com/unisb-bioinf/Drug_Combination_Sensitivity_Prediction), where we also provide the SMILES, MACCS fingerprints, and physico-chemical properties derived from RDKit, 21 as well as the one-hot encoded target molecules of the investigated compounds.
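
The repository is described as providing one-hot encoded target molecules. Since a compound can have several targets, such an encoding is effectively a multi-hot vector over a shared target vocabulary. The Python sketch below shows one way to build it. The input format (a dict from drug name to target names), the drug names, and the function name are hypothetical and do not reflect the repository's actual files.

```python
def encode_targets(drug_targets):
    """Map each drug to a 0/1 vector over the sorted union of all targets.

    drug_targets: dict mapping drug name -> iterable of target names.
    Returns (vocabulary, dict mapping drug name -> list of 0/1 ints).
    """
    vocabulary = sorted({t for targets in drug_targets.values() for t in targets})
    position = {target: i for i, target in enumerate(vocabulary)}
    encoded = {}
    for drug, targets in drug_targets.items():
        vector = [0] * len(vocabulary)
        for target in targets:
            vector[position[target]] = 1  # mark every listed target of this drug
        encoded[drug] = vector
    return vocabulary, encoded


# Hypothetical example input (not taken from the repository)
vocab, enc = encode_targets({
    "drug_A": ["EGFR", "ERBB2"],
    "drug_B": ["EGFR"],
    "drug_C": ["MTOR"],
})
```

Sorting the vocabulary keeps the column order reproducible across runs, which matters if the vectors are concatenated with other drug features.
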

References

(1) Yang, W.; Soares, J.; Greninger, P.; Edelman, E. J.; Lightfoot, H.; Forbes, S.; Bindal, N.; Beare, D.; Smith, J. A.; Thompson, I. R.; others Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic acids research 2012, 41, D955–D961.
(2) Iorio, F.; Knijnenburg, T. A.; Vis, D. J.; Bignell, G. R.; Menden, M. P.; Schubert, M.; Aben, N.; Gonçalves, E.; Barthorpe, S.; Lightfoot, H.; others A landscape of pharmacogenomic interactions in cancer. Cell 2016, 166, 740–754.
(3) Mokhtari, R. B.; Homayouni, T. S.; Baluch, N.; Morgatskaya, E.; Kumar, S.; Das, B.; Yeger, H. Combination therapy in combating cancer. Oncotarget 2017, 8, 38022.
(4) Zagidullin, B.; Aldahdooh, J.; Zheng, S.; Wang, W.; Wang, Y.; Saad, J.; Malyutina, A.; Jafari, M.; Tanoli, Z.; Pessia, A.; others DrugComb: an integrative cancer drug combination data portal. Nucleic acids research 2019, 47, W43–W51.
(5) Zheng, S.; Aldahdooh, J.; Shadbahr, T.; Wang, Y.; Aldahdooh, D.; Bao, J.; Wang, W.; Tang, J. DrugComb update: a more comprehensive drug sensitivity data repository and analysis portal. Nucleic acids research 2021, 49, W174–W184.
(6) Menden, M. P.; Wang, D.; Mason, M. J.; Szalai, B.; Bulusu, K. C.; Guan, Y.; Yu, T.; Kang, J.; Jeon, M.; Wolfinger, R.; others Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nature communications 2019, 10, 2674.
(7) Torkamannia, A.; Omidi, Y.; Ferdousi, R. A review of machine learning approaches for drug synergy prediction in cancer. Briefings in Bioinformatics 2022, 23, bbac075.
(8) Yadav, B.; Wennerberg, K.; Aittokallio, T.; Tang, J. Searching for drug synergy in complex dose–response landscapes using an interaction potency model. Computational and structural biotechnology journal 2015, 13, 504–513.
(9) Loewe, S. The problem of synergism and antagonism of combined drugs. Arzneimittelforschung 1953, 3, 285–290.
(10) Bliss, C. I. The toxicity of poisons applied jointly. Annals of applied biology 1939, 26, 585–615.
(11) Berenbaum, M. C. What is synergy? Pharmacological reviews 1989, 41, 93–141.
(12) Janizek, J. D.; Celik, S.; Lee, S.-I. Explainable machine learning prediction of synergistic drug combinations for precision cancer medicine. BioRxiv 2018, 331769.
(13) Vlot, A. H.; Aniceto, N.; Menden, M. P.; Ulrich-Merzenich, G.; Bender, A. Applying synergy metrics to combination screening data: agreements, disagreements and pitfalls. Drug discovery today 2019, 24, 2286–2298.
(14) Malyutina, A.; Majumder, M. M.; Wang, W.; Pessia, A.; Heckman, C. A.; Tang, J. Drug combination sensitivity scoring facilitates the discovery of synergistic and efficacious drug combinations in cancer. PLoS computational biology 2019, 15, e1006752.
(15) Lenhof, K.; Eckhart, L.; Rolli, L.-M.; Volkamer, A.; Lenhof, H.-P. Reliable anti-cancer drug sensitivity prediction and prioritization. Scientific Reports 2024, 14, 12303.
(16) Rahman, R.; Pal, R. Analyzing drug sensitivity prediction based on dose response curve characteristics. IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). 2016; pp 140–143.
(17) Rahman, R.; Dhruba, S. R.; Ghosh, S.; Pal, R. Functional random forest with applications in dose-response predictions. Scientific reports 2019, 9, 1628.
(18) Julkunen, H.; Cichonska, A.; Gautam, P.; Szedmak, S.; Douat, J.; Pahikkala, T.; Aittokallio, T.; Rousu, J. Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects. Nature communications 2020, 11, 6136.
(19) Codicè, F.; Pancotti, C.; Rollo, C.; Moreau, Y.; Fariselli, P.; Raimondi, D. The Specification Game: Rethinking the Evaluation of Drug Response Prediction for Precision Oncology. bioRxiv 2024, 2024–10.
(20) Liston, D. R.; Davis, M. Clinically Relevant Concentrations of Anticancer Drugs: A Guide for Nonclinical Studies. Clinical cancer research 2017, 23, 3489–3498.
(21) Landrum, G.; others RDKit: Open-source cheminformatics. version 2023.3.2.
(22) Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. Journal of chemical information and computer sciences 2002, 42, 1273–1280.
(23) Landrum, G.; others RDKit Documentation - rdkit.Chem.Descriptors module. https://www.rdkit.org/docs/source/rdkit.Chem.Descriptors.html, Accessed: 2024-11-11.
(24) Eckhart, L.; Lenhof, K.; Rolli, L.-M.; Lenhof, H.-P. A comprehensive benchmarking of machine learning algorithms and dimensionality reduction methods for drug sensitivity prediction. Briefings in bioinformatics 2024, 25.
(25) Fudio, S.; Sellers, A.; Pérez Ramos, L.; Gil-Alberdi, B.; Zeaiter, A.; Urroz, M.; Carcas, A.; Lubomirov, R. Anti-cancer drug combinations approved by US FDA from 2011 to 2021: main design features of clinical trials and role of pharmacokinetics. Cancer Chemotherapy and Pharmacology 2022, 90, 285–299.
(26) Baptista, D.; Ferreira, P. G.; Rocha, M. Deep learning for drug response prediction in cancer. Briefings in Bioinformatics 2020, 22, 360–379.
(27) An, X.; Chen, X.; Yi, D.; Li, H.; Guan, Y. Representation of molecules for drug response prediction. Briefings in Bioinformatics 2021, 23, bbab393.
(28) Chen, Y.; Zhang, L. How much can deep learning improve prediction of the responses to drugs in cancer cell lines? Briefings in bioinformatics 2022, 23, bbab378.
(29) Lenhof, K.; Eckhart, L.; Gerstner, N.; Kehl, T.; Lenhof, H.-P. Simultaneous regression and classification for drug sensitivity prediction using an advanced random forest method. Scientific Reports 2022, 12, 13458.
(30) Rahman, R.; Matlock, K.; Ghosh, S.; Pal, R. Heterogeneity aware random forest for drug sensitivity prediction. Scientific reports 2017, 7, 1–11.
(31) Su, R.; Liu, X.; Wei, L.; Zou, Q. Deep-Resp-Forest: a deep forest model to predict anti-cancer drug response. Methods 2019, 166, 91–102.
(32) Oskooei, A.; Manica, M.; Mathis, R.; Martínez, M. R. Network-based biased tree ensembles (NetBiTE) for drug sensitivity prediction and drug sensitivity biomarker identification in cancer. Scientific reports 2019, 9, 15918.
(33) Fang, Y.; Xu, P.; Yang, J.; Qin, Y. A quantile regression forest based method to predict drug response and assess prediction reliability. PLoS One 2018, 13, e0205155.
(34) Basu, A.; Mitra, R.; Liu, H.; Schreiber, S. L.; Clemons, P. A. RWEN: response-weighted elastic net for prediction of chemosensitivity of cancer cell lines. Bioinformatics 2018, 34, 3332–3339.
(35) Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Advances in neural information processing systems 2022, 35, 507–520.
(36) Shwartz-Ziv, R.; Armon, A. Tabular data: Deep learning is not all you need. Information Fusion 2022, 81, 84–90.
(37) Borisov, V.; Leemann, T.; Seßler, K.; Haug, J.; Pawelczyk, M.; Kasneci, G. Deep Neural Networks and Tabular Data: A Survey. IEEE Transactions on Neural Networks and Learning Systems 2024, 35, 7499–7519.
(38) Smith, A. M.; Walsh, J. R.; Long, J.; Davis, C. B.; Henstock, P.; Hodge, M. R.; Maciejewski, M.; Mu, X. J.; Ra, S.; Zhao, S.; others Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data. BMC bioinformatics 2020, 21.
(39) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; others Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 2011, 12, 2825–2830.
(40) Abadi, M.; others TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015; https://www.tensorflow.org/.
(41) Ritz, C.; Baty, F.; Streibig, J. C.; Gerhard, D. Dose-Response Analysis Using R. PLOS ONE 2015, 10.
(42) drc: Analysis of Dose-Response Curves. https://cran.r-project.org/web/packages/drc/drc.pdf, Accessed: 2024-11-11.
(43) Vis, D. J.; Bombardelli, L.; Lightfoot, H.; Iorio, F.; Garnett, M. J.; Wessels, L. F. Multilevel models improve precision and speed of IC50 estimates. Pharmacogenomics 2016, 17, 691–700.
(44) Wellcome Sanger Institute, GDSC database Resources Download - IC50 Data definitions. https://cog.sanger.ac.uk/cancerrxgene/GDSC_release8.5/GDSC_Fitted_Data_Description.pdf, 2024; Accessed: 2024-11-11.
(45) Ianevski, A.; Giri, A. K.; Gautam, P.; Kononov, A.; Potdar, S.; Saarela, J.; Wennerberg, K.; Aittokallio, T. Prediction of drug combination effects with a minimal set of experiments. Nature machine intelligence 2019, 1, 568–577.
(46) Borchers, H. W. pracma: Practical Numerical Math Functions. 2022; R package version 2.4.2.
(47) Lederer, S.; Dijkstra, T. M.; Heskes, T. Additive dose response models: explicit formulation and the loewe additivity consistency condition. Frontiers in pharmacology 2018, 9, 31.
(48) Greco, W. R.; Bravo, G.; Parsons, J. C. The search for synergy: a critical review from a response surface perspective. Pharmacological Reviews 1995, 47, 331–385.
(49) Preuer, K.; Lewis, R. P.; Hochreiter, S.; Bender, A.; Bulusu, K. C.; Klambauer, G. DeepSynergy: predicting anti-cancer drug synergy with Deep Learning. Bioinformatics 2018, 34, 1538–1546.
(50) Liu, P.; Li, H.; Li, S.; Leung, K.-S. Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinformatics 2019, 20.
(51) O’Neil, J.; Benita, Y.; Feldman, I.; Chenard, M.; Roberts, B.; Liu, Y.; Li, J.; Kral, A.; Lejnine, S.; Loboda, A.; others An unbiased oncology compound screen to identify novel combination strategies. Molecular cancer therapeutics 2016, 15, 1155–1162.
(52) Knijnenburg, T. A.; Klau, G. W.; Iorio, F.; Garnett, M. J.; McDermott, U.; Shmulevich, I.; Wessels, L. F. Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Scientific reports 2016, 6, 1–14.
(53) Lenhof, K.; Gerstner, N.; Kehl, T.; Eckhart, L.; Schneider, L.; Lenhof, H.-P. MERIDA: a novel Boolean logic-based integer linear program for personalized cancer therapy. Bioinformatics 2021, 37, 3881–3888.
(54) Chiu, Y.-C.; Chen, H.-I. H.; Zhang, T.; Zhang, S.; Gorthi, A.; Wang, L.-J.; Huang, Y.; Chen, Y. Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC medical genomics 2019, 12, 143–155.
(55) Li, Y.; Hostallero, D. E.; Emad, A. Interpretable deep learning architectures for improving drug response prediction performance: myth or reality? Bioinformatics 2023, 39, btad390.
(56) Costello, J. C.; Heiser, L. M.; Georgii, E.; Gönen, M.; Menden, M. P.; Wang, N. J.; Bansal, M.; Ammad-Ud-Din, M.; Hintsanen, P.; Khan, S. A.; others A community effort to assess and improve drug sensitivity prediction algorithms. Nature biotechnology 2014, 32, 1202–1212.
(57) Jang, I. S.; Neto, E. C.; Guinney, J.; Friend, S. H.; Margolin, A. A. Biocomputing 2014; World Scientific, 2014; pp 63–74.
(58) Rogers, D.; Hahn, M. Extended-connectivity fingerprints. Journal of chemical information and modeling 2010, 50, 742–754.
(59) Jiang, D.; Wu, Z.; Hsieh, C.-Y.; Chen, G.; Liao, B.; Wang, Z.; Shen, C.; Cao, D.; Wu, J.; Hou, T. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. Journal of cheminformatics 2021, 13, 1–23.
(60) Li, X.; Xu, Y.; Cui, H.; Huang, T.; Wang, D.; Lian, B.; Li, W.; Qin, G.; Chen, L.; Xie, L. Prediction of synergistic anti-cancer drug combinations based on drug target network and drug induced gene expression profiles. Artificial intelligence in medicine 2017, 83, 35–43.
(61) Sidorov, P.; Naulaerts, S.; Ariey-Bonnet, J.; Pasquier, E.; Ballester, P. J. Predicting synergism of cancer drug combinations using NCI-ALMANAC data. Frontiers in chemistry 2019, 7, 509.
(62) Haibe-Kains, B.; El-Hachem, N.; Birkbak, N. J.; Jin, A. C.; Beck, A. H.; Aerts, H. J.; Quackenbush, J. Inconsistency in large pharmacogenomic studies. Nature 2013, 504, 389–393.
(63) Hatzis, C.; Bedard, P. L.; Birkbak, N. J.; Beck, A. H.; Aerts, H. J.; Stern, D. F.; Shi, L.; Clarke, R.; Quackenbush, J.; Haibe-Kains, B. Enhancing reproducibility in cancer drug screening: how do we move forward? Cancer research 2014, 74, 4016–4023.
(64) Sharifi-Noghabi, H.; Peng, S.; Zolotareva, O.; Collins, C. C.; Ester, M. AITL: adversarial inductive transfer learning with input and output space adaptation for pharmacogenomics. Bioinformatics 2020, 36, i380–i388.
(65) Lenhof, K.; Eckhart, L.; Rolli, L.-M.; Lenhof, H.-P. Trust me if you can: a survey on reliability and interpretability of machine learning approaches for drug sensitivity prediction in cancer. Briefings in Bioinformatics 2024, 25, bbae379.
(66) Angelopoulos, A. N.; Bates, S. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511 2021.
(67) Norinder, U.; Carlsson, L.; Boyer, S.; Eklund, M. Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. Journal of chemical information and modeling 2014, 54, 1596–1603.
(68) Alvarsson, J.; McShane, S. A.; Norinder, U.; Spjuth, O. Predicting with confidence: using conformal prediction in drug discovery. Journal of Pharmaceutical Sciences 2021, 110, 42–49.
(69) Tang, Y.-C.; Gottlieb, A. Explainable drug sensitivity prediction through cancer pathway enrichment. Scientific reports 2021, 11, 1–10.
