Abstract
To improve our understanding of multi-drug therapies, cancer cell line panels screened
with drug combinations are frequently studied using machine learning (ML). ML mod-
els trained on such data typically focus on predicting synergy scores, which support
drug development and repurposing efforts but have limitations when deriving personal-
ized treatment recommendations. To simulate a more realistic personalized treatment
scenario, we pioneer ML models that predict the relative growth inhibition (instead of
synergy scores), and that can be applied to previously unseen cell lines. Our approach
is highly flexible: it enables the reconstruction of dose-response curves and matrices,
as well as various measures of drug sensitivity (and synergy) from model predictions,
which can finally even be used to derive cell line-specific prioritizations of both mono-
and combination therapies.
bioRxiv preprint doi: https://doi.org/10.1101/2024.11.22.624812; this version posted November 23, 2024.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder,
who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a
CC-BY 4.0 International license.
Introduction
Tailoring drug treatments to the individual patient is a major goal of cancer research. Due
to ethical concerns and limited availability of tumor material, relationships between molec-
ular properties of cancer cells and their drug responses are generally not studied on humans
directly, but instead using model systems, most prominently, cell lines. For monotherapy,
large cell line panels such as the Genomics of Drug Sensitivity in Cancer (GDSC) database 1,2
have been available for more than a decade, providing both molecular characterizations and
drug screening data of cancer cell lines. However, combination therapies are frequently pre-
ferred over monotherapies for cancer treatment due to increased efficacy and a decreased
risk of treatment resistance. 3 More recently, large data resources have also become available
for drug combination screens: In 2019, the DrugComb data portal was introduced, 4,5 which
accumulates harmonized results of drug screens from different sources. To date, a total of
37 datasets are available in DrugComb. 5
Databases like the GDSC or DrugComb enable the systematic evaluation of the effect that
different drugs have on different types of cancer cells. Thus, two main use cases that can be
addressed using this data include (1) making personalized treatment recommendations for
a given patient (cell line) and (2) finding promising drugs or drug combinations that should
be further explored, e.g., for drug repurposing or the development of novel (combination)
therapies. Due to the complexity and high dimensionality of the data, machine learning
(ML) is commonly used to address these tasks.
ML models trained on monotherapy drug responses are usually suitable for both use cases,
(1) and (2), since they directly predict measures of drug effectiveness, such as the IC50 or
AUC value. In comparison, methods using drug combination data typically predict so-called
drug synergy scores, 6,7 which are usually suited for the second task but less applicable for
the first one as we briefly outline in the following: These scores quantify the synergistic or
antagonistic potential of two compounds for a given cell line by comparing their combined
effect on cell growth to the expected effect obtained from a baseline model that assumes
no synergism or antagonism. 8 Prominent examples are the Loewe, 9 Bliss, 10 HSA, 11 and
ZIP 8 synergy scores. For each of these scores, values > 0 indicate synergism, and values
< 0 indicate antagonism, making it possible to classify drug combinations based on their
synergy score. Undoubtedly, estimating the synergistic potential of compound combinations
through synergy scores can be valuable for the identification of promising combination treat-
ments to undergo more detailed screening, the development of novel compounds, or drug
repurposing. However, even though synergy score prediction is sometimes motivated as a
step toward achieving personalized treatment recommendations, 6,12 we believe that synergy
scores have shortcomings that limit their usefulness for this application. Briefly
summarized, synergy scores are based on various (in part very strong) model assumptions, some
of which differ fundamentally between scores. 8,13 Additionally, disagreement between scores
was observed by Vlot et al. 13 and Yadav et al., 8 weakening their informative value. Two
factors that are especially relevant for personalized treatment recommendations are that (1)
the scores are aggregated over multiple drug concentrations, which do not necessarily cor-
respond well to clinically feasible concentration ranges 13 (cf. Supplementary Figure 1) and
(2) a high synergy between two compounds does not guarantee a high effectiveness of the
combination treatment. 5
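To make point (2) concrete, the following minimal sketch compares two hypothetical drug
combinations at a single concentration pair. It uses the standard textbook forms of two
baseline models (Bliss independence and highest single agent) on fractional inhibitions.
The function names and all numbers are illustrative assumptions rather than data from this
study, and real synergy scores are aggregated over a full dose-response matrix.

```python
def bliss_expected(inh_a: float, inh_b: float) -> float:
    """Expected fractional inhibition if both drugs act independently (Bliss)."""
    return inh_a + inh_b - inh_a * inh_b


def hsa_expected(inh_a: float, inh_b: float) -> float:
    """Expected fractional inhibition under the highest-single-agent (HSA) model."""
    return max(inh_a, inh_b)


def excess(observed: float, expected: float) -> float:
    """Positive values indicate synergism, negative values antagonism."""
    return observed - expected


# Hypothetical examples: (inhibition of drug A, inhibition of drug B, observed combination)
combinations = {
    "weak but synergistic": (0.05, 0.05, 0.30),
    "strong, score-dependent": (0.60, 0.50, 0.75),
}

for name, (inh_a, inh_b, observed) in combinations.items():
    bliss = excess(observed, bliss_expected(inh_a, inh_b))
    hsa = excess(observed, hsa_expected(inh_a, inh_b))
    print(f"{name}: observed={observed:.2f}, Bliss excess={bliss:+.4f}, HSA excess={hsa:+.4f}")
```

In this toy setting, the first combination has the larger excess under both baselines
although it inhibits growth far less than the second one, and the second combination is
antagonistic under Bliss but synergistic under HSA.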
Thus, instead of relying on synergy scores, we advocate exploring other strategies to estimate
the effectiveness of combination treatments. For the prediction of drug combination sensi-
tivity, several models that do not rely on synergy scores have been published: Malyutina
et al. 14 and Zagidullin et al. 4 trained cell line-specific models that predict CSS
(Combination Sensitivity Score) values, a sensitivity measure for two-drug combination therapies. 14
However, the CSS score is an aggregated measure of sensitivity based on drug-specific AUC
values. Thus, like the AUC for monotherapies, 15 it depends strongly on the investigated
concentration ranges and is not comparable across compounds.
Instead of focusing on one specific measure of drug sensitivity, an alternative approach is to
directly predict the response (in terms of relative inhibition/viability) of cell lines at various
treatment concentrations. Various drug sensitivity or synergy measures can then be
reconstructed from the model predictions. For monotherapy, this approach has already
been explored by Rahman and Pal et al. 16,17 For combination therapy, Zheng et al. 5
trained a CatBoost model that predicts the relative inhibition of two drugs at given concen-
trations for a given cell line. Similarly, comboFM by Julkunen et al. 18 employs higher-order
factorization machines (HOFMs) to predict relative cell growth.
A drawback of all combination prediction approaches mentioned above is that they cannot
make predictions for previously unseen cell lines: Malyutina et al. 14 and Zagidullin
et al. 4 trained cell line-specific models, while Zheng et al. 5 and Julkunen et al. 18
employ a one-hot encoding of cell lines and drugs in the model input such that both have
to be known during training already. Thus, these models are difficult to apply for personal-
ized treatment recommendations, where predictions should be made for a previously unseen
patient (cell line). According to Codicè et al., this setting is frequently overlooked or
insufficiently evaluated in ML-based drug response prediction. 19
In this manuscript, we present ML models for the prediction of drug combination sensitivity
that do not rely on synergy scores and are able to make predictions for previously unseen cell
lines, thereby mimicking a personalized treatment scenario. Instead of predicting an aggre-
gated measure of treatment response, our models predict the relative inhibition at arbitrary
treatment concentrations provided in the model input. Consequently, various measures of
drug sensitivity or synergy, including dose-response curves and matrices, as well as IC50
values or synergy scores can be reconstructed from the model predictions.
We investigate not only different ML algorithms (neural networks, random forest, elastic
net) but also analyze the benefit of including different drug characterizations (MACCS fin-
gerprints, physico-chemical properties), as well as information on drug targets. The different
model architectures provide different benefits, e.g., the ability to make predictions not just
for two-drug combinations but also for monotherapies and combination treatments consisting
of more than two drugs. Some of the investigated architectures also enable predictions to be
made for any previously unseen drug, given that the features of the drug (e.g., MACCS fin-
gerprint) are known. Our results show that random forests outperform the other algorithms
in all investigated settings. Additionally, we analyze which inhibition intervals are predicted
most accurately and investigate the reconstruction of mono- and combination sensitivity
measures from our model predictions.
Lastly, using our recently published drug response measure called CMax viability, 15 we
showcase how our models can be applied to perform drug prioritization for mono- and combination
therapies based on clinically feasible treatment concentrations. Drug prioritization, i.e., the
ranking of drugs by their predicted effectiveness for a given cell line (patient) is a major goal
in personalized medicine: it exceeds the mere prediction of sensitivity measures and moves
toward deriving actual treatment recommendations.
Materials and Data Processing
Drug response data
Drug screening data for our analyses was obtained from the DrugComb database Version 1.5.
More specifically, we employed the DrugComb API (https://api.drugcomb.org/) to down-
load the list of all cell lines and their corresponding COSMIC IDs, the full list of drugs with
their SMILES encodings and their target molecules, and the full dose-response matrices.
To assign the respective cell line and drug information to each dose-response matrix, we
downloaded the core database from https://drugcomb.org/download, which provides a unique
identifier for each dose-response experiment. Consequently, each database entry can be
written as:
(cell_line, drug_row, drug_col, conc_row, conc_col, inhibition)
Here, cell_line is the COSMIC ID of the investigated cell line, and drug_row and drug_col
are the names of the tested drugs. The entries conc_row and conc_col are the micromolar
concentrations of the tested compounds. For monotherapies, one of the drug names is set
to NULL and the corresponding concentration is set to 0. Finally, inhibition denotes the
relative inhibition measured after administration of the denoted drug concentration(s) (see
Supplement for further information). Relative inhibitions > 0 denote reduced cell growth
through the drug treatment, while inhibitions < 0 indicate increased growth.
We removed the following entries from the dataset:
• poor quality entries as defined by the authors of DrugComb 5 with inhibition 200
• entries where the concentration of all tested drugs is 0 (conc_row = conc_col = 0)
• entries where the corresponding cell line had no COSMIC ID or no gene expression
data provided in the GDSC database
Additionally, we converted entries where drug_row and drug_col denote the same drug into
monotherapies by summing the respective treatment concentrations and setting drug_col to
NULL:
(cell_line, drug_row, NULL, conc_row + conc_col, 0, inhibition).
Cases where two different drugs are provided but only one has a concentration > 0 were
modified to denote a monotherapy by replacing the drug with concentration 0 with NULL.
Afterwards, all replicates involving the same cell line, the same drug(s), and the same
concentration(s) were averaged. Lastly, we log1p-normalized (log1p(x) = log(x + 1)) the
concentration values in conc_row and conc_col.
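The processing steps above can be sketched in a few lines of Python. This is a minimal
illustration, not the code used for this study: the function names are hypothetical, None
stands in for NULL, the natural logarithm is assumed for log1p, and moving the remaining
drug of a converted monotherapy into the row slot is an assumption made here so that
replicates share one key.

```python
import math
from collections import defaultdict


def normalize_entry(entry):
    """Map one raw entry to a canonical form, or return None if it is dropped."""
    cell, d_row, d_col, c_row, c_col, inhibition = entry
    if c_row == 0 and c_col == 0:
        return None  # no drug administered at all
    if d_row == d_col:
        # same drug twice: monotherapy with summed concentration
        return (cell, d_row, None, c_row + c_col, 0.0, inhibition)
    if d_col is None or c_col == 0:
        return (cell, d_row, None, c_row, 0.0, inhibition)
    if d_row is None or c_row == 0:
        # assumption: keep the remaining drug in the row slot
        return (cell, d_col, None, c_col, 0.0, inhibition)
    return (cell, d_row, d_col, c_row, c_col, inhibition)


def preprocess(entries):
    """Drop empty entries, average replicates, and log1p-transform concentrations."""
    replicates = defaultdict(list)
    for entry in entries:
        normalized = normalize_entry(entry)
        if normalized is not None:
            replicates[normalized[:5]].append(normalized[5])
    processed = []
    for (cell, d_row, d_col, c_row, c_col), values in replicates.items():
        mean_inhibition = sum(values) / len(values)
        processed.append(
            (cell, d_row, d_col, math.log1p(c_row), math.log1p(c_col), mean_inhibition)
        )
    return processed


# Toy entries with hypothetical drug names "A" and "B"
raw = [
    ("683667", "A", "A", 1.0, 2.0, 40.0),   # same drug twice
    ("683667", "A", None, 3.0, 0.0, 60.0),  # replicate of the entry above
    ("683667", "A", "B", 0.0, 0.0, 0.0),    # removed: all concentrations are 0
    ("683667", "A", "B", 0.0, 5.0, 10.0),   # effectively a monotherapy with B
]
for row in preprocess(raw):
    print(row)
```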
To keep the dataset size manageable, we only considered entries involving those 265 drugs
(cf. Supplementary Table 1) for which at least 10,000 entries are provided after performing
all the steps described above (cf. Discussion). Note that after this reduction still more than
10,000 entries remained for each of the drugs. In total, the final dataset consists of 5,291,424
entries covering 947 cell lines, 265 drugs, and 9,535 drug combinations.
Additionally, the CMax concentrations for 77 of the investigated drugs were obtained from
Liston and Davis. 20 The CMax value denotes the peak plasma concentration after administering
the highest clinically recommended dose of a drug. 20 In a recently published manuscript,
we employed CMax to derive a novel drug sensitivity measure called the CMax viability,
which will be described below. 15 We also use this measure to perform drug prioritization in
the Results section.
Drug Properties
For the representation of drugs in the inputs of our models, we investigated four different
settings, which will be discussed below (cf. also Figure 1). Using the SMILES drug repre-
sentations provided by DrugComb, we used RDKit version 2023.3.2 21 to calculate two types
of drug features:
• binary MACCS fingerprints 22 of length 166
• 209 physico-chemical drug properties using the function CalcMolDescriptors from the
rdkit.Chem.Descriptors module 23
We removed all properties that showed no variation across the investigated 265 drugs, re-
sulting in MACCS fingerprints of length 162 and 182 physico-chemical properties.
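The removal of uninformative properties can be sketched as follows. This is a minimal
illustration with a hypothetical helper name and toy values: a feature is kept only if it
takes at least two distinct values across the drugs.

```python
def drop_constant_features(feature_names, rows):
    """Remove features that show no variation across all drugs.

    feature_names: list of column names
    rows: one equal-length list of feature values per drug
    """
    keep = [
        j for j in range(len(feature_names))
        if len({row[j] for row in rows}) > 1
    ]
    kept_names = [feature_names[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return kept_names, kept_rows


# Toy feature matrix for three hypothetical drugs
names = ["bit_1", "bit_2", "mol_weight"]
features = [
    [0, 1, 264.3],
    [0, 0, 581.1],
    [0, 1, 488.0],
]
kept_names, kept_features = drop_constant_features(names, features)
print(kept_names)
```

Here, bit_1 is 0 for every drug and is therefore dropped, analogous to the reduction from
166 to 162 fingerprint bits and from 209 to 182 physico-chemical properties.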
Additionally, 735 drug target molecules for the investigated drugs were obtained from Drug-
Comb.
Gene Expression Data
Normalized gene expression data of 17,419 genes (Affymetrix Human Genome U219 Ar-
ray) was obtained from the GDSC database Release 8.3 (https://www.cancerrxgene.org/
downloads/bulk_download).
Methods
Model Inputs and Outputs
We train multi-drug models that predict the relative inhibition for a given cell line being
treated with given concentrations of one or more drug(s). The model inputs comprise cell
line features based on gene expression, a representation of the applied drugs, and the corre-
sponding drug concentrations. For the representation of drugs, we investigated four different
settings, which are depicted in Figure 1 and will be described below.
To characterize cell lines in the model input, we performed a principal component analysis
(PCA) on the gene expression values of the training cell lines and used the first 300 prin-
cipal components (PCs) as cell line features. This dimension reduction method performed
well in our recently published benchmarking of drug sensitivity prediction methods. 24 The
feature coefficients computed on the training data were used to project the test cell lines
into the same 300-dimensional space. To perform the cross-validation discussed below, we
re-computed the PCs based on the respective training folds.
In addition to the cell line features, we investigated four different settings for the encoding
of drugs in the model input:
Setting 1 (OneHot):
In this setting, no drug properties are included. Instead, a 265-dimensional encoding of
drugs is used. Each feature corresponds to one of the 265 drugs in our dataset. If a drug is
part of the current entry, its feature is set to the corresponding log1p-normalized treatment
concentration, otherwise it is set to 0.
Setting 2 (OneHotTar):
This setting uses the same concentration encoding as Setting 1 but additionally includes
290 drug target features. More precisely, we used the drug target annotations provided by
DrugComb and included all molecules as targets that were targeted by at least five of the
drugs in our dataset, resulting in a total of 290 target features. Each feature is then set to
the number of drugs in the current entry that target the corresponding molecule (0, 1, or
2): Since DrugComb provides only data on monotherapies and two-drug combinations, the
maximum value a target feature can have is 2, if it is targeted by both drugs in a two-drug
combination entry. Note also that one drug can target more than one molecule.
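Settings 1 and 2 can be illustrated with a small sketch. The drug names, target names, and
annotations below are placeholders, and the helper functions are hypothetical; the real
encodings use 265 drug features and 290 target features.

```python
import math


def encode_onehot(treatment, drug_index):
    """Setting 1: one feature per drug, holding its log1p-normalized concentration."""
    vector = [0.0] * len(drug_index)
    for drug, concentration in treatment.items():
        vector[drug_index[drug]] = math.log1p(concentration)
    return vector


def encode_targets(treatment, target_index, drug_targets):
    """Setting 2 (additional features): number of administered drugs hitting each target."""
    vector = [0] * len(target_index)
    for drug in treatment:
        for target in drug_targets.get(drug, ()):
            if target in target_index:  # targets below the frequency cutoff are skipped
                vector[target_index[target]] += 1
    return vector


# Placeholder vocabulary and annotations
drug_index = {"drug_A": 0, "drug_B": 1, "drug_C": 2}
target_index = {"T1": 0, "T2": 1}
drug_targets = {"drug_A": {"T1", "T2"}, "drug_B": {"T2"}, "drug_C": {"T3"}}

# Two-drug combination: concentrations in micromolar
treatment = {"drug_A": 1.0, "drug_B": 0.5}
onehot = encode_onehot(treatment, drug_index)
targets = encode_targets(treatment, target_index, drug_targets)
print(onehot + targets)
```

Because every drug has its own feature, the same functions also accept a treatment
dictionary with one drug or with three or more drugs.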
Setting 3 (MACCS):
In this setting, each drug is represented by a 162-dimensional binary molecular access system
(MACCS) fingerprint. 22 Each position of the fingerprint corresponds to a molecular
substructure, e.g., a functional group that may be present in a drug molecule. The respective bit is
set to 1 if the corresponding substructure is present in the drug molecule at least once, and
0, otherwise. Additionally, one input feature for each drug is needed to denote its treatment
concentration. Consequently, this setting uses a total of 2 · 162 + 2 · 1 = 326 drug features.
To encode monotherapies, one of the fingerprints and the corresponding concentration are
set to 0.
Setting 4 (PhysChem):
This setting is similar to Setting 3 but replaces each MACCS fingerprint with 182 numer-
ical physico-chemical descriptors that denote different properties of the respective drugs,
such as the molecular weight, number of valence electrons, or the logP value that measures
lipophilicity. Consequently, this setting uses a total of 2 · 182 + 2· 1 = 366 drug features. To
denote monotherapies, one set of properties and the corresponding concentration are set to 0.
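The drug part of the input for Settings 3 and 4 can be sketched as a simple concatenation.
The ordering used here (features of the first drug, its concentration, features of the
second drug, its concentration) and the function name are assumptions for illustration;
only the total feature counts are taken from the text.

```python
import math


def encode_drug_pair(features_a, conc_a, features_b=None, conc_b=0.0):
    """Concatenate drug features and log1p-normalized concentrations.

    For monotherapies (features_b is None), the second feature block and its
    concentration are set to 0.
    """
    if features_b is None:
        features_b, conc_b = [0] * len(features_a), 0.0
    return (
        list(features_a) + [math.log1p(conc_a)]
        + list(features_b) + [math.log1p(conc_b)]
    )


# Toy fingerprints of length 162 (Setting 3); the bit patterns are arbitrary
fingerprint_a = [1, 0] * 81
fingerprint_b = [0, 1] * 81

combination = encode_drug_pair(fingerprint_a, 1.0, fingerprint_b, 2.5)
monotherapy = encode_drug_pair(fingerprint_a, 1.0)
print(len(combination), len(monotherapy))
```

With 182 physico-chemical descriptors per drug (Setting 4), the same function yields
2 · 182 + 2 · 1 = 366 drug features.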
Depending on the desired application, the different settings provide different benefits: Set-
tings 3 and 4 allow making predictions for arbitrary drug molecules given that their MACCS
fingerprint or physico-chemical properties are known. Consequently, the resulting models
can be used to make predictions for previously unseen, e.g., newly developed compounds.
In contrast, models derived from Settings 1 and 2 are limited to those 265 drugs that were
present in our dataset and hence encoded in the input. However, these models can not only
make predictions for single drugs and two-drug combinations but even for treatments using
three or more drugs simultaneously. While three-drug combination therapies have already
been approved for cancer treatment by the United States Food and Drug Administration
(FDA),25 DrugComb does not provide such data.
Machine Learning Algorithms
We investigate the predictive performance of three ML algorithms: neural networks, random
forests, and elastic net. We chose these models since neural networks and tree-based
methods are commonly used for synergy prediction. 7 Furthermore, neural networks are also
popular for drug sensitivity prediction, 26–28 while random forests and elastic nets are used
less frequently for this task. 29–34 In our recently published benchmarking, we found, however,
that tree-based methods and elastic nets frequently outperform neural networks in predicting
drug responses. 24 In line with our findings, several studies found that deep learning does not
improve over conventional ML algorithms for making predictions on tabular data 35–37 or for
generating feature representations for model inputs. 24,38
All prediction models were implemented in Python 3.11. Random forests and elastic net
models were implemented using scikit-learn Version 1.5.0, 39 while neural networks were
implemented using TensorFlow Version 2.16.1 40 with GPU support. The hyperparameters for
each algorithm are provided in Table 1.
Figure 1: Prediction pipeline. This figure summarizes our pipeline for the prediction of
relative inhibitions. The large blue box depicts the different types of input features and
representations we investigated. The grey box at the top right lists our data resources.
The yellow box shows the different ML algorithms we used. The green box at the bottom
depicts the model output, i.e., the relative inhibition for a given cell-drug-drug combination
at defined treatment concentrations. Lastly, the purple box shows potential downstream
analyses that can be performed based on the model predictions.
Table 1: Hyperparameters of the investigated ML algorithms. This table denotes the tuned
hyperparameters for each ML algorithm. For hyperparameters not stated explicitly, the
default parameters as provided by the respective Python package were employed. Explicitly
tuned hyperparameters are marked in bold. For the PhysChem setting (i.e., the setting with
the largest data matrix), we were unable to train neural networks with the ELU activation or
learning rates of 0.1 due to insufficient memory for resource allocation even when decreasing
the batch size.
Model           Parameter              Value(s)
Elastic net     alpha                  0.01, 0.1, 1, 10, 100
                l1_ratio               0, 0.25, 0.5, 0.75, 1
Random forest   max_depth              100, 1000000
                max_features           25, 50, 100, 250
                min_samples_leaf       2, 20, 100, 1000
                n_estimators           500
Neural network  loss                   mean squared error
                activation             tanh, ELU (none in last layer)
                optimizer              Adam
                learning_rate          0.0001, 0.001, 0.1
                hidden layers          1, 2, 3, 4, 5
                size of hidden layers  equally spaced between input and output size
                dropout                0.1, 0.3
                batch_size             256
                bias initializer       0.01
                kernel initializer     glorot_uniform for tanh, he_normal for ELU activation
                kernel regularizer     l2
                epochs                 300
                validation_split       0.2
                early stopping         yes
                patience               15
                restore_best_weights   True
Model Training and Testing
After filtering and processing the data as described above, we randomly divided the remain-
ing cell lines into a training set (80% of cell lines) and a test set (20%). Since multiple data
entries exist for each cell line (screening of different drugs/drug combinations at different
concentrations), the final training data consists of all entries involving a cell line from the
training set (3,741,209 entries). The final test data contains all remaining entries (1,550,215),
i.e., all entries involving a cell line from the test set. This splitting ensures that the test per-
formance is always evaluated on cell lines that were unseen during model training, thereby
mimicking the scenario of making predictions for a previously unseen patient. In contrast,
the same drugs and drug combinations can occur in both the training and test data.
On the training data, we performed a 5-fold cross validation (CV) to determine the best-
performing hyperparameters of each ML model (see Table 1). The CV folds were generated
by randomly dividing the training cell lines into five disjoint folds and assigning all entries
involving a certain cell line to the corresponding fold. Since the number of available entries
per cell line differs, the size of CV folds varies slightly between 644,308 and 857,361 entries.
For the hyperparameter combination with the smallest mean absolute error (MAE) averaged
across all five folds, one final model is trained on the complete training data and its
performance is evaluated on the test data.
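The cell line-based splitting and the selection of hyperparameters can be sketched as
follows. This is a minimal illustration with hypothetical function names, seeds, and toy
numbers; the key point is that all entries of a cell line follow that cell line into
exactly one partition.

```python
import random


def split_cell_lines(cell_lines, test_fraction=0.2, seed=0):
    """Randomly split the set of cell lines into training and test cell lines."""
    lines = sorted(set(cell_lines))
    random.Random(seed).shuffle(lines)
    n_test = round(len(lines) * test_fraction)
    return set(lines[n_test:]), set(lines[:n_test])


def make_folds(train_lines, k=5, seed=0):
    """Distribute the training cell lines over k disjoint CV folds."""
    lines = sorted(train_lines)
    random.Random(seed).shuffle(lines)
    return [set(lines[i::k]) for i in range(k)]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best(cv_maes):
    """Return the hyperparameter configuration with the smallest mean MAE across folds."""
    return min(cv_maes, key=lambda config: sum(cv_maes[config]) / len(cv_maes[config]))


# Toy data: 50 hypothetical cell line identifiers
cell_lines = [f"cell_{i}" for i in range(50)]
train_lines, test_lines = split_cell_lines(cell_lines)
folds = make_folds(train_lines)

# An entry belongs to the partition of its cell line (first tuple element)
entries = [("cell_3", "drug_A", None, 0.69, 0.0, 42.0)]
train_entries = [entry for entry in entries if entry[0] in train_lines]

# Hypothetical per-fold MAEs for two configurations
best = select_best({
    ("max_features", 25): [11.0, 12.0, 11.5, 12.5, 11.0],
    ("max_features", 50): [10.0, 11.0, 10.5, 11.5, 10.0],
})
print(len(train_lines), len(test_lines), [len(fold) for fold in folds], best)
```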
For the models using one-hot encodings (Setting 1 and Setting 2), each drug has a designated
input node. This is not the case for the models using drug features (Setting 3 and Setting
4), where swapping the features and concentration of the first drug with those of the second
drug represents the same treatment but results in changes in the input representation (cf.
input visualization in Figure 1). However, the model output should not depend on the order
of the drugs in the input, i.e., it should not depend on whether drug features of a drug A in
the input vector are located in front of or behind those of a drug B. Therefore, each original
sample is included twice in the datasets for Settings 3 and 4. These duplicate samples differ
only in the order of the drug features and concentrations: once in the order A-B, once in the
order B-A. In the Results section, we investigate the impact on model performance when
models are trained using the duplicated versus non-duplicated data. The test performance
is always evaluated on the duplicated entries.
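The duplication of samples in both drug orders can be sketched as follows. The layout of
the input vector (both drug blocks first, each consisting of the drug features followed by
one concentration, then the cell line features) and the function names are assumptions made
for this illustration.

```python
def swap_drug_order(vector, n_drug_features):
    """Exchange the blocks (features + concentration) of the first and second drug."""
    block = n_drug_features + 1  # drug features plus one concentration value
    first, second = vector[:block], vector[block:2 * block]
    cell_line_features = vector[2 * block:]
    return second + first + cell_line_features


def augment_with_swapped_order(samples, n_drug_features):
    """Include every (input, inhibition) sample in the orders A-B and B-A."""
    augmented = []
    for vector, inhibition in samples:
        augmented.append((vector, inhibition))
        augmented.append((swap_drug_order(vector, n_drug_features), inhibition))
    return augmented


# Toy sample: two features per drug, one concentration each, one cell line feature
sample = ([1, 0, 0.7, 0, 1, 0.3, 9.9], 55.0)
augmented = augment_with_swapped_order([sample], n_drug_features=2)
for vector, inhibition in augmented:
    print(vector, inhibition)
```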
Fitting of Dose-Response Curves and Computation of Sensitivity Measures
Using the relative inhibitions predicted by our models, it is possible to reconstruct dose-
response curves for monotherapies and dose-response matrices for combination therapies (cf.
Figure 2). Based on these curves/matrices, various measures of drug response can be derived.
To this end, we first converted the (actual and predicted) relative inhibitions into relative
viabilities by subtracting the relative inhibitions from 100 and dividing the result by 100.
Additionally, we clamped viabilities to [0, 1]. Note that we report relative viabilities in range
[0, 1] rather than range [0, 100] to keep the results consistent and comparable to our previous
study. 15
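As a small worked example of this conversion (the function name is hypothetical):

```python
def inhibition_to_viability(inhibition_percent: float) -> float:
    """Convert a relative inhibition (in percent) to a relative viability in [0, 1]."""
    viability = (100.0 - inhibition_percent) / 100.0
    return min(1.0, max(0.0, viability))  # clamp to [0, 1]


for inhibition in (40.0, -20.0, 130.0):
    print(inhibition, inhibition_to_viability(inhibition))
```

An inhibition of 40 corresponds to a viability of 0.6, while a negative inhibition
(increased growth) and an inhibition above 100 are clamped to 1 and 0, respectively.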
To perform the curve-fitting for monotherapies, we employed a three-parametric logistic
function from the drc R-package: 41,42
$$ f(x) = c + \frac{1 - c}{1 + \exp\bigl(b \cdot (\log(x) - \log(e))\bigr)} \tag{1} $$
Here, f(x) denotes the estimated relative viability of the considered cell line at drug concen-
tration x, c denotes the curve asymptote for increasing concentrations, b denotes the curve’s
slope, and e denotes the concentration at the inflection point. We only fit curves when at
least five dose-response points were available and we discarded all curves where the root
mean squared error (RMSE) between the actual viabilities and those derived from the curve
was greater than 0.3, a threshold that was previously employed for the data generation in
the GDSC database. 43,44 From the fitted curves, we derived two measures of monotherapy
drug responses, namely IC50 values and CMax viabilities. The CMax viability is a novel
drug sensitivity measure which we recently published. 15 It is defined as the relative viability
at the CMax concentration of the respective drug. The CMax concentration denotes the
peak plasma concentration of a drug after administering the highest clinically recommended
dose. 20 Thus, the CMax viability is designed to estimate the maximal effect a treatment can
realistically achieve. For the computation of CMax viabilities, we evaluated the function of
the fitted curve at the drug’s CMax concentration (cf. Figure 2A). For the computation of
IC50 values, we intersected the dose-response curves with a horizontal line with y-intercept
0.5.
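Given fitted parameters b, c, and e of Equation 1, both measures can be computed directly.
The sketch below is an illustration with hypothetical parameter values and function names;
the closed-form IC50 is obtained by solving f(x) = 0.5 for x and assumes a decreasing curve
(b > 0), which is algebraically equivalent to the intersection described above but not
necessarily how it was computed for this study.

```python
import math


def fitted_viability(x, b, c, e):
    """Three-parameter logistic curve of Equation 1 (upper asymptote fixed at 1); x > 0."""
    return c + (1.0 - c) / (1.0 + math.exp(b * (math.log(x) - math.log(e))))


def cmax_viability(cmax, b, c, e):
    """Relative viability predicted by the fitted curve at the drug's CMax concentration."""
    return fitted_viability(cmax, b, c, e)


def ic50(b, c, e):
    """Concentration at which the fitted curve reaches a viability of 0.5.

    Returns None if the lower asymptote c is not below 0.5, i.e., the curve
    never crosses the horizontal line at 0.5.
    """
    if c >= 0.5:
        return None
    return e * (0.5 / (0.5 - c)) ** (1.0 / b)


# Hypothetical fit: slope 1.5, lower asymptote 0.2, inflection point at 2 µM
b, c, e = 1.5, 0.2, 2.0
print("IC50:", ic50(b, c, e))
print("CMax viability at a hypothetical CMax of 1.2 µM:", cmax_viability(1.2, b, c, e))
```

For c = 0, the IC50 coincides with the inflection point e; for c > 0, it lies at a higher
concentration because the curve must fall more than halfway toward its asymptote.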
Figure 2: Exemplary dose-response curve and matrix. Sub-figure A depicts a dose-response
curve (blue) for the monotherapy treatment of a cancer cell line (COSMIC ID 683667) with
the drug Vorinostat. The fit is based on nine dose-response points (black). The yellow
diamond marks the CMax concentration of Vorinostat (1.2 µM), and the red star marks the
corresponding CMax viability (0.41) derived from the curve (cf. Methods). Sub-figure B
depicts a dose-response matrix for the combination treatment of cell line 909755 with Dasa-
tinib and Lapatinib, where the x- and y-axes denote the respective treatment concentrations.
The yellow and blue diamonds approximately mark the CMax concentration of both drugs,
which are used to limit the considered concentration combinations for the computation of
the combination CMax viability (cf. Methods).
For combination therapies, we developed a variation of the CMax viability we call the com-
bination CMax viability that can be derived from an actual/predicted dose-response matrix
(cf. Figure 2B). Our initial idea was to interpolate the values in the dose-response matrix to
derive the relative viability when administering the CMax concentration of both combina-
tion drugs simultaneously. However, two synergistic drugs may have certain concentration
windows with particularly high synergy/effectiveness. 45 Thus, it is possible that the smallest
viability is reached at a concentration combination smaller than the CMax concentrations.
(Note that this should not happen for the dose-response curves we employed to compute the
CMax viability for monotherapies since these curves are monotonically decreasing.) Con-
sequently, we considered the entire concentration range below the respective CMax values
to compute our sensitivity measure. Conceptually, we want to derive the smallest viability
within the rectangular concentration region spanned by both drugs, bounded above by their
respective CMax concentrations. To compute the combination CMax viability, we divided the
concentration interval from 0 to the CMax of each drug into 100 equally spaced concentrations,
resulting in 100 × 100 = 10,000 concentration combinations. For each combination,
we estimated its relative viability through bilinear interpolation (R package pracma 46) from
the full dose-response matrix. Finally, we define the minimum of all 10,000 values as the
combination CMax viability.
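The grid search described above can be sketched in a few lines. The following Python example is only an illustration under stated assumptions: it replaces the R package pracma with a NumPy-only bilinear interpolation, the function names (bilinear_interpolate, combination_cmax_viability) and the toy matrix are hypothetical, and query points outside the tested concentration range are clipped rather than extrapolated.

```python
import numpy as np


def bilinear_interpolate(conc_a, conc_b, viability, query_a, query_b):
    """Bilinearly interpolate viability[i, j], measured at (conc_a[i], conc_b[j])."""
    conc_a = np.asarray(conc_a, dtype=float)
    conc_b = np.asarray(conc_b, dtype=float)
    viability = np.asarray(viability, dtype=float)
    # Assumption: queries outside the tested range are clipped, not extrapolated.
    qa = np.clip(query_a, conc_a[0], conc_a[-1])
    qb = np.clip(query_b, conc_b[0], conc_b[-1])
    # Index of the lower grid neighbour along each axis.
    ia = np.clip(np.searchsorted(conc_a, qa, side="right") - 1, 0, len(conc_a) - 2)
    ib = np.clip(np.searchsorted(conc_b, qb, side="right") - 1, 0, len(conc_b) - 2)
    # Fractional position inside the enclosing grid cell.
    ta = (qa - conc_a[ia]) / (conc_a[ia + 1] - conc_a[ia])
    tb = (qb - conc_b[ib]) / (conc_b[ib + 1] - conc_b[ib])
    return (
        (1 - ta) * (1 - tb) * viability[ia, ib]
        + ta * (1 - tb) * viability[ia + 1, ib]
        + (1 - ta) * tb * viability[ia, ib + 1]
        + ta * tb * viability[ia + 1, ib + 1]
    )


def combination_cmax_viability(conc_a, conc_b, viability, cmax_a, cmax_b, n_steps=100):
    """Minimum interpolated viability over an n_steps x n_steps grid on [0, CMax]^2."""
    grid_a = np.linspace(0.0, cmax_a, n_steps)
    grid_b = np.linspace(0.0, cmax_b, n_steps)
    query_a, query_b = np.meshgrid(grid_a, grid_b, indexing="ij")
    values = bilinear_interpolate(conc_a, conc_b, viability, query_a, query_b)
    return float(values.min())


# Hypothetical 3 x 3 dose-response matrix (relative viabilities), not taken from the paper.
toy_conc = [0.0, 1.0, 2.0]
toy_matrix = [[1.0, 0.8, 0.6], [0.7, 0.3, 0.5], [0.5, 0.4, 0.45]]
toy_value = combination_cmax_viability(toy_conc, toy_conc, toy_matrix, cmax_a=0.5, cmax_b=0.5)
```

Because a bilinear interpolant attains its extremes at the corners of each grid cell, the grid minimum can only lie strictly inside the CMax window if a measured concentration combination does.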
As the CMax denotes the maximal feasible treatment concentration for a drug monotherapy,
it may not be feasible to administer the CMax concentration of two drugs in combination.
Yet, we believe that the respective CMax concentrations are a reasonable upper bound to
consider for the computation of combination CMax viabilities. Note also that administering
the CMax concentration for monotherapies might likewise not be feasible in all cases. Fur-
thermore, the presented approach can theoretically be applied to any desired concentrations
other than CMax.
Results
Challenges of Using Synergy Scores for Personalized Treatment
Recommendations
The idea behind synergy scores is to measure the synergistic or antagonistic potential of
two compounds for a given cell line by comparing their experimentally measured combined
effect on cell survival to the expected effect obtained from a baseline model that assumes no
synergism or antagonism. 8 The baseline model is derived from monotherapy data of both
compounds. It estimates their combined effect at the concentrations that were tested in the
actual combination screening. The baseline and actually measured treatment responses are
then subtracted from each other and the result is averaged over all concentration combi-
nations to obtain a final synergy score. 13 Prominent examples of synergy scores that differ
solely in their computation of the baseline are the Loewe, 9 Bliss, 10 HSA, 11 and ZIP 8 scores.
For each of these scores, values > 0 indicate synergism and values < 0 indicate antagonism.
A detailed description of the scores can be found in the Supplement.
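To make the general recipe concrete, the following Python sketch computes two of the four scores from fractional inhibitions in [0, 1]. It uses the textbook baselines (Bliss independence: a + b − ab; HSA: max(a, b)) and assumes that the score is the mean of observed minus expected inhibition over all concentration combinations. The exact scaling used in our Supplement may differ, Loewe and ZIP are omitted because they require curve fitting, and all names and numbers are illustrative.

```python
import numpy as np


def bliss_expected(inhib_a, inhib_b):
    """Bliss independence baseline for every dose pair (rows: drug A, columns: drug B)."""
    a = np.asarray(inhib_a, dtype=float)[:, None]
    b = np.asarray(inhib_b, dtype=float)[None, :]
    return a + b - a * b


def hsa_expected(inhib_a, inhib_b):
    """Highest single agent baseline: the larger of the two monotherapy effects."""
    a = np.asarray(inhib_a, dtype=float)[:, None]
    b = np.asarray(inhib_b, dtype=float)[None, :]
    return np.maximum(a, b)


def synergy_score(observed, expected):
    """Assumed convention: mean excess of observed over expected inhibition."""
    return float(np.mean(np.asarray(observed, dtype=float) - expected))


# Hypothetical monotherapy inhibitions at two doses each and the observed 2 x 2 matrix.
mono_a = [0.2, 0.4]
mono_b = [0.1, 0.5]
observed = [[0.38, 0.70], [0.56, 0.80]]

bliss_score = synergy_score(observed, bliss_expected(mono_a, mono_b))
hsa_score = synergy_score(observed, hsa_expected(mono_a, mono_b))
```

For inhibitions in [0, 1], the Bliss baseline is never smaller than the HSA baseline, since a + b − ab − a = b(1 − a) ≥ 0. The same measurements therefore yield an HSA score at least as large as the Bliss score.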
Undoubtedly, estimating the synergistic potential of compound combinations through syn-
ergy scores can be valuable for the identification of promising combination treatments to
undergo more detailed screening, the development of novel compounds, or drug repurposing.
However, there are known limitations of synergy scores, which have been summarized and
extensively discussed in a review by Vlot et al.,13 where they also performed several analyses
using a large-scale drug combination dataset. Their findings can be briefly summarized as
follows: Firstly, each synergy score is based on certain model assumptions, some of which
might frequently be violated by real-world data. 47,48 For example, both the Loewe and ZIP
score require fitting dose-response curves of a certain shape to the monotherapy data. The
Loewe score furthermore requires both drugs to have the same minimum and maximum
effect as well as a constant potency ratio. 13 In comparison, the Bliss score relies on the assumption
that the effects of two non-interacting drugs are statistically independent.
Even though pharmacological independence is not necessarily required to achieve statistical
independence, 13 it is most likely that statistical independence is caused by pharmacologi-
cal independence. However, due to crosstalk between biological processes affected by either
drug, achieving true pharmacological independence may be unlikely. 48
These examples also highlight that the assumptions between scores differ fundamentally. In
their data analysis, Vlot et al. observed only a moderate to low correlation between the four
different scores calculated on the same data, which might be explained by the different model
assumptions. They also found that value ranges between scores are not comparable: the HSA
and ZIP scores generally result in higher values than Loewe and Bliss. Additionally, Vlot
et al. observed that synergy scores are relatively difficult to reproduce between replicated
experiments, even though the measured drug responses used to derive the scores correlated
well between replicates. Furthermore, while misclassifications (synergism vs. antagonism)
between scores were rare, several scenarios were identified where scores are likely to disagree,
which could typically be retraced to a violation of model assumptions.
Based on these findings, Vlot et al. advocate against the automated analysis of large-scale
data using individual synergy scores. Instead, they recommend a careful investigation of the
respective dose-response curves to then select an appropriate score. When training mod-
els that only predict synergy scores (instead of concentration-specific inhibitions/viabilities),
this is hardly possible since we are unable to assess the underlying dose-response relationship
to validate model assumptions.
We agree with these conclusions by Vlot et al. but would like to emphasize further points
that make synergy scores difficult to use and interpret, especially for personalized treatment
recommendations: A methodological criticism of synergy scores is that they are an aggre-
gated measure over concentration ranges. The choice of meaningful concentration ranges is
especially challenging for experimental drugs but crucial to draw meaningful conclusions for
personalized medicine. We have previously shown that the screened concentration ranges in
the GDSC database do not correspond well to clinically feasible treatment concentrations 24
and similar observations can also be made for the DrugComb database (cf. Supplementary
Figure 1). Another major factor that hampers the use of synergy scores for treatment
recommendation is that a high synergy between two compounds solely implies that the com-
bination treatment is more effective than the respective monotherapies. However, it does
not guarantee an overall high effectiveness (in terms of large relative inhibition) of the com-
bination treatment. 5 It follows that synergy scores alone should not be used to compare
the suitability of different treatment options for a given patient (cell line). In particular,
synergy scores cannot be used to compare the effectiveness of different combination treat-
ments. Furthermore, it is not possible to compare the effectiveness of combination therapies
to monotherapies involving different drugs.
Based on these drawbacks of synergy scores in general and for treatment recommendation
in particular, our analyses presented in the following focus on sensitivity prediction instead.
Compared to the number of synergy prediction methods, 6,7 sensitivity prediction of drug
combinations is understudied, especially when the goal is to make predictions for previously
unseen cell lines as we have outlined in the Introduction section.
In the following, we analyze how accurately drug responses (here: relative inhibitions) can
be predicted for combination therapies. We compare different ML algorithms and model
inputs and investigate the reconstruction of sensitivity measures from the model predictions.
Additionally, we show how both mono- and combination therapies can be ranked by their
effectiveness for a given cell line using our recently developed sensitivity measure: the CMax
viability. 15
Overall Performance Comparison
Figure 3 shows the performance of all investigated models in terms of test MAE (mean abso-
lute error). The optimized hyperparameters for each model are provided in Supplementary
Table 2. The first row depicts the results for the entire test data, while the second and third
row focus on the data subsets representing mono- and combination therapies, respectively.
Across all four settings, random forests resulted in the lowest error, followed by neural net-
works, while elastic net had the worst performance. An exception is the PhysChem setting,
where neural networks were outperformed by elastic net.
The overall smallest test error (MAE 12.14) was achieved using a random forest with MACCS
fingerprints as input. Additionally, even the worst performing random forest model (One-
Hot, MAE of 13.04) still outperforms the best neural network (OneHot, MAE 14.08) and
elastic net (OneHotTar, MAE of 16.46) models. Thus, the choice of ML algorithm seems to
have a stronger impact on performance than the choice of input features, even though the
different input representations differ considerably (cf. Methods and Figure 1). Notably, the
addition of drug targets slightly improves predictions for random forest and elastic net but
has the opposite effect for neural networks.
To contextualize the obtained errors, we compare them to two baseline models: A simple
baseline model that always predicts the mean of the training data has a test MAE of 24.2. A
more advanced baseline that always predicts the mean inhibition per drug for monotherapies
and the mean inhibition of the combination for combination therapies has a test MAE of
19.74. Consequently, our best model improves these baselines by 50% and 37%, respectively.
While all of the random forest models outperform the baselines, some elastic nets and neural
networks are not superior to the baselines.
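The two baselines can be sketched as follows. This is a minimal Python illustration, not our actual implementation: treatments are identified by an order-independent set of drug names, treatments unseen during training fall back to the global mean (an assumption), and the toy inhibition values are invented.

```python
from collections import defaultdict

import numpy as np


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(diff)))


def global_mean_baseline(train_y, n_test):
    """Simple baseline: always predict the mean of the training data."""
    return np.full(n_test, float(np.mean(train_y)))


def treatment_mean_baseline(train_treatments, train_y, test_treatments):
    """Advanced baseline: mean training inhibition per drug or per drug combination."""
    sums, counts = defaultdict(float), defaultdict(int)
    for drugs, y in zip(train_treatments, train_y):
        key = frozenset(drugs)  # order-independent: (A, B) equals (B, A)
        sums[key] += y
        counts[key] += 1
    fallback = float(np.mean(train_y))  # assumption for treatments unseen in training
    preds = []
    for drugs in test_treatments:
        key = frozenset(drugs)
        preds.append(sums[key] / counts[key] if key in counts else fallback)
    return np.array(preds)


# Hypothetical toy data: one monotherapy (A) and one combination (A + B).
train_treatments = [("A",), ("A",), ("A", "B"), ("B", "A")]
train_y = [10.0, 30.0, 60.0, 80.0]
test_treatments = [("A",), ("B", "A")]
test_y = [25.0, 65.0]

simple_mae = mae(test_y, global_mean_baseline(train_y, len(test_y)))
advanced_mae = mae(test_y, treatment_mean_baseline(train_treatments, train_y, test_treatments))
relative_improvement = 1.0 - advanced_mae / simple_mae
```

The relative improvement of a model over a baseline is computed in the same way, as one minus the ratio of the model MAE to the baseline MAE.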
When investigating mono- and combination therapies separately (cf. row 2 and 3 of Figure
3), the same overall trends can be observed, with the random forest model with MACCS
features again having the smallest error. Generally, both types of therapies can be predicted
similarly well, even though the training data contains slightly more combination (60%) than
monotherapy data (40%).
Besides the MAE, we also investigated the Pearson correlation (PCC) between the actual
and predicted inhibitions. The overall PCC for the best-performing model was 0.8 (0.77 and
0.82 for mono- and combination therapies, respectively). However, computing correlations
across the entire data artificially increases the PCC: since some drugs/combinations generally
have lower/higher inhibitions than others, even mean predictions for each drug/combination
(requiring no ML at all) would result in a correlation above 0. 19 Thus, we computed the mean
per-drug PCC for monotherapies (0.58) and the mean per-combination PCC for combination
therapies (0.56) (see also Supplementary Figure 2). These values have a similar magnitude
to what we previously observed for monotherapy sensitivity prediction. 15
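The difference between the overall and the per-drug correlation can be illustrated with a small, purely hypothetical example. In the Python sketch below, the predictions capture the difference between a weak and a strong drug but reverse the order of cell lines within each drug. The helper names and all numbers are assumptions made for illustration.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def mean_group_pcc(groups, actual, predicted):
    """Mean PCC computed separately for each drug or drug combination."""
    groups = np.asarray(groups)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pccs = []
    for group in np.unique(groups):
        mask = groups == group
        # The PCC is undefined for constant vectors, so such groups are skipped.
        if np.std(actual[mask]) == 0 or np.std(predicted[mask]) == 0:
            continue
        pccs.append(pearson(actual[mask], predicted[mask]))
    return float(np.mean(pccs)) if pccs else float("nan")


# Hypothetical inhibitions of three cell lines for a weak and a strong drug.
drug = ["weak", "weak", "weak", "strong", "strong", "strong"]
actual = [0.0, 10.0, 20.0, 60.0, 70.0, 80.0]
# Predictions get the drug level right but the within-drug order wrong.
predicted = [12.0, 10.0, 8.0, 72.0, 70.0, 68.0]

overall_pcc = pearson(actual, predicted)
per_drug_pcc = mean_group_pcc(drug, actual, predicted)
```

Here the overall PCC is high although every per-drug PCC is negative, which is why the per-drug and per-combination values are the more informative measure of how well differences between cell lines are captured.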
[Figure 3: boxplots of the absolute prediction error (y-axis "| absolute − predicted |", 0 to 60) for each setting (columns: OneHot, OneHotTar, MACCS, PhysChem) and data subset (rows: Complete data, Monotherapy, Combin. therapy), colored by model (Random Forest, Neural Network, Elastic Net). MAE values printed on top of the boxplots:]
Setting    Data subset      Random Forest  Neural Network  Elastic Net
OneHot     Complete data    13.04          14.08           16.99
OneHot     Monotherapy      12.95          14.29           17.89
OneHot     Combin. therapy  13.09          13.98           16.56
OneHotTar  Complete data    12.77          14.44           16.46
OneHotTar  Monotherapy      12.91          14.49           17.09
OneHotTar  Combin. therapy  12.7           14.41           16.17
MACCS      Complete data    12.14          15.7            20.56
MACCS      Monotherapy      12.76          14.69           18.83
MACCS      Combin. therapy  11.85          16.18           21.38
PhysChem   Complete data    12.2           21.91           20.5
PhysChem   Monotherapy      12.96          22.37           19.02
PhysChem   Combin. therapy  11.84          21.69           21.19
Figure 3: Test set performance. This figure shows the prediction errors (in terms of absolute
difference between actual and predicted values) for each setting (columns) and each investigated
ML algorithm (coloring). The first row shows the results for the entire test dataset,
while the second and third row show the results for the data subsets corresponding to mono-
and combination therapies, respectively. On top of each boxplot, the mean absolute error
(MAE) is shown.
Note that Zheng et al. 5 and Julkunen et al. 18 also provide overall correlations and errors
for the prediction of relative inhibition/growth (cf. Introduction). However, their results are
not comparable to ours since we investigate the performance for unknown cell lines, which
cannot be evaluated using the other two methods. It is known that the cell-line blind scenario
increases errors considerably compared to making predictions for known cell lines. 49,50
Nevertheless, to assess how our random forest MACCS model would perform for known cell
lines, we retrained the model using a random split of the available data into a training
(80%) and test set (20%). This split does not guarantee that cell lines in the test set were
unseen during model training. Note that we still assured that duplicated entries denoting the
same treatment are either exclusively contained in the training or the test set (cf. Methods).
With a PCC of 0.96 and RMSE of 8.41, our performance for known cell lines is comparable
to that reported by Zheng et al. (PCC = 0.98, RMSE = 7.12) 5 and Julkunen et al. (PCC
= 0.97, RMSE = 9.86 in cross-validation; PCC = 0.92 on validation data). 18 However,
the dataset used in our analyses is much larger and more heterogeneous, comprising 947 cell
lines, 265 drugs, and 9,535 drug combinations from different sources. In contrast, Zheng et al.
employed solely the O’Neil dataset (39 cell lines, 38 drugs, 583 drug combinations), 51 which
is known to be of high quality, 4,5 whereas Julkunen et al. employed solely the AstraZeneca
DREAM dataset (85 cell lines, 118 drugs, 910 drug combinations). 6
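The two splitting strategies discussed above differ only in the unit that is assigned to the training or test set. The following Python sketch contrasts them under simplifying assumptions: each record is a hypothetical dictionary with a cell line and a drug-to-concentration mapping, and the helper names are not taken from our code base.

```python
import random


def treatment_key(record):
    """Order-independent identifier, shared by swapped-drug duplicates of a treatment."""
    return (record["cell_line"], tuple(sorted(record["doses"].items())))


def cell_blind_split(records, test_fraction=0.2, seed=0):
    """Hold out entire cell lines, so test cell lines are unseen during training."""
    cell_lines = sorted({r["cell_line"] for r in records})
    random.Random(seed).shuffle(cell_lines)
    n_test = max(1, round(test_fraction * len(cell_lines)))
    test_cells = set(cell_lines[:n_test])
    train = [r for r in records if r["cell_line"] not in test_cells]
    test = [r for r in records if r["cell_line"] in test_cells]
    return train, test


def grouped_random_split(records, test_fraction=0.2, seed=0):
    """Random split over treatments; duplicates of a treatment stay in the same set."""
    keys = sorted({treatment_key(r) for r in records})
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test_keys = set(keys[:n_test])
    train = [r for r in records if treatment_key(r) not in test_keys]
    test = [r for r in records if treatment_key(r) in test_keys]
    return train, test


# Hypothetical toy data: five cell lines, one combination (stored in both drug orders)
# and one monotherapy per cell line.
toy_records = []
for cell in ["C1", "C2", "C3", "C4", "C5"]:
    toy_records.append({"cell_line": cell, "doses": {"A": 1.0, "B": 2.0}})
    toy_records.append({"cell_line": cell, "doses": {"B": 2.0, "A": 1.0}})  # swapped duplicate
    toy_records.append({"cell_line": cell, "doses": {"A": 1.0}})

blind_train, blind_test = cell_blind_split(toy_records)
random_train, random_test = grouped_random_split(toy_records)
```

In the grouped random split, a test cell line may also occur in the training set with a different treatment, which is what makes this scenario easier than the cell-line blind one.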
Range Performance Comparison
Next, we investigated whether certain inhibition ranges can be predicted more accurately
than others. Figure 4 shows the distribution of test MAEs for different inhibition intervals
in the range (−25, 100]. This range covers 99% of the training and test data. Predictions are (on
average) most accurate in the interval (0, 25] followed by the interval (−25, 0]. As the actual
inhibition increases, the error increases as well. This could be explained by the amount of
available training data for each interval: Most data is located in the intervals (0, 25] (41%)
and (−25, 0] (25%), while each of the other intervals is only covered by around 10% of the
data. In Supplementary Section 3 and Supplementary Figure 3, we provide further analysis
on how the amount of training data for individual drugs/combinations affects prediction
performance.
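An interval-wise evaluation of this kind can be sketched as follows. The Python example bins samples by their actual inhibition into the right-closed intervals used above and reports the MAE and the share of samples per interval. The function name and the toy values are illustrative assumptions.

```python
import numpy as np

# Interval boundaries for the actual relative inhibition, as used in the text.
EDGES = np.array([-25.0, 0.0, 25.0, 50.0, 75.0, 100.0])


def mae_per_interval(actual, predicted, edges=EDGES):
    """Return {interval label: (MAE, share of all samples)} for intervals (lo, hi]."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # right=True assigns x to bin i if edges[i-1] < x <= edges[i].
    idx = np.digitize(actual, edges, right=True)
    result = {}
    for i in range(1, len(edges)):
        mask = idx == i
        label = f"({edges[i - 1]:g}, {edges[i]:g}]"
        if mask.any():
            error = float(np.mean(np.abs(actual[mask] - predicted[mask])))
        else:
            error = float("nan")  # no samples in this interval
        result[label] = (error, float(mask.mean()))
    return result


# Hypothetical values; the last sample lies outside (-25, 100] and is ignored.
toy_actual = [0.0, 10.0, 25.0, 80.0, 120.0]
toy_predicted = [5.0, 10.0, 20.0, 60.0, 0.0]
toy_result = mae_per_interval(toy_actual, toy_predicted)
```

Reporting the share of samples next to each error makes it easier to see whether a high error coincides with a sparsely populated interval.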
Data points with high inhibition represent cases where the drug treatment greatly reduced
the amount of viable cells, i.e., cases of effective treatment. Such data are commonly under-
represented in drug screening datasets. 34,52,53 They are, however, of particular interest for
personalized therapy, where the most effective treatment options for a given patient should
be determined.
For monotherapies, we therefore previously developed SAURON-RF, a random forest-based model that is
designed to improve predictions of drug-sensitive samples for both classification and regres-
sion. 15,29 To this end, SAURON-RF relies (among other things) on sample-specific weights.
Consequently, we also tried to incorporate sample weights into our models presented here
to increase the importance of the underrepresented intervals. Unfortunately, the sample
weights had only little impact on predictions, especially for the cases with highest inhibition
(see Supplementary Figure 4).
[Figure 4: boxplots of the absolute prediction error (y-axis "| absolute − predicted |", 0 to 125) for each setting (columns: OneHot, OneHotTar, MACCS, PhysChem) and interval of actual relative inhibition (rows), colored by model (Random Forest, Neural Network, Elastic Net). MAE values printed on top of the boxplots:]
Setting    Interval   Random Forest  Neural Network  Elastic Net
OneHot     (−25,0]    10.69          12.83           18.86
OneHot     (0,25]     9.02           8.59            8.78
OneHot     (25,50]    16.74          19.13           16.31
OneHot     (50,75]    21.25          24.35           31.74
OneHot     (75,100]   27.05          32.98           47.84
OneHotTar  (−25,0]    10.29          12.67           17.29
OneHotTar  (0,25]     8.75           10.26           9.35
OneHotTar  (25,50]    16.4           19.05           14.96
OneHotTar  (50,75]    21.02          22.92           28.89
OneHotTar  (75,100]   26.58          29.62           43.66
MACCS      (−25,0]    10.8           16.22           22.8
MACCS      (0,25]     8.37           13.78           15.13
MACCS      (25,50]    14.36          19.6            12.86
MACCS      (50,75]    18.69          18.64           27.35
MACCS      (75,100]   24.67          19.15           48.36
PhysChem   (−25,0]    10.73          25.96           22.42
PhysChem   (0,25]     8.3            12.93           15.2
PhysChem   (25,50]    14.09          14.23           12.93
PhysChem   (50,75]    19.05          38.87           27.18
PhysChem   (75,100]   25.81          65.17           48.63
Figure 4: Test set performance for different inhibition ranges. This figure shows the prediction
errors (in terms of absolute difference between actual and predicted values) for each
setting (columns) and each investigated ML algorithm (coloring). Each row shows the performance
for a different interval of actual relative inhibitions. On top of each boxplot, the
mean absolute error (MAE) is shown.
Correlation of Duplicated Entries
As discussed in the Methods section, for the MACCS and PhysChem settings, the same
treatment can be described by two different input representations through switching the
order of the considered drugs (cf. Figure 1). Hence, we decided to include both input
representations into the training and test data of our models. Ideally, predictions for both
input representations should correlate well. Figure 5A shows the correlation of predictions
for the best-performing random forest model trained using MACCS fingerprints. As desired,
both predictions are highly correlated (PCC ≈ 1) and the mean absolute difference between
them is very small (0.8). Figure 5B shows the same analysis for a model where we removed
the duplicated entries from the training data. Even though the correlation is still high (PCC
= 0.82), it decreased strongly, while prediction differences increased notably to 9.12 on
average. The mean PCCs per drug (for monotherapies) and per drug combination are 0.98
and 0.97 for the duplicated training data and decrease to 0.78 and 0.86 for the non-duplicated
training data, respectively. This is also represented in the test error where the model with
duplicated training entries achieved an MAE of 12.39 compared to 14.6 for non-duplicated
entries. Similar trends can also be observed for the PhysChem setting (see Supplementary
Figure 5).
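The duplication of entries and the consistency check described above can be sketched as follows. The Python example assumes a hypothetical feature layout in which each row consists of a block for the first drug, an equally sized block for the second drug, and the remaining (e.g., cell line) features. The actual layout of our inputs is described in the Methods, and all names below are illustrative.

```python
import numpy as np


def swap_drug_blocks(features, n_drug_features):
    """Swap the two drug blocks of each row; assumed layout: [drug 1 | drug 2 | rest]."""
    features = np.asarray(features, dtype=float)
    d = n_drug_features
    return np.concatenate(
        [features[:, d:2 * d], features[:, :d], features[:, 2 * d:]], axis=1
    )


def augment_with_swaps(features, targets, n_drug_features):
    """Add the swapped representation of every entry with an unchanged target."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    swapped = swap_drug_blocks(features, n_drug_features)
    return np.vstack([features, swapped]), np.concatenate([targets, targets])


def order_consistency(predict, features, n_drug_features):
    """Mean absolute difference and PCC between predictions for both drug orders."""
    pred_1 = predict(np.asarray(features, dtype=float))
    pred_2 = predict(swap_drug_blocks(features, n_drug_features))
    return float(np.mean(np.abs(pred_1 - pred_2))), float(np.corrcoef(pred_1, pred_2)[0, 1])


# Hypothetical toy inputs: two features per drug plus one cell line feature.
toy_x = np.array([[1.0, 2.0, 3.0, 4.0, 9.0], [0.0, 1.0, 5.0, 2.0, 7.0], [2.0, 2.0, 1.0, 0.0, 3.0]])
toy_y = np.array([10.0, 40.0, 70.0])
aug_x, aug_y = augment_with_swaps(toy_x, toy_y, n_drug_features=2)

# A predictor that ignores drug order versus one that only looks at the first drug.
symmetric_diff, symmetric_pcc = order_consistency(lambda x: x.sum(axis=1), toy_x, 2)
asymmetric_diff, _ = order_consistency(lambda x: x[:, 0], toy_x, 2)
```

A model that is invariant to the drug order yields a mean difference of zero. For learned models such as the random forests above, this invariance is not built in and can only be encouraged through the duplicated training entries.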
[Figure 5: density scatter plots of Prediction 1 (x-axis, −25 to 100) against Prediction 2 (y-axis, −25 to 100), colored by the number of data points. Panel A (Duplicated): R = 1, p < 2.2e−16. Panel B (Not duplicated): R = 0.82, p < 2.2e−16.]
Figure 5: Correlation of duplicated entries from the test data. This figure shows the cor-
relation between the model predictions for duplicated entries. Duplicated entries refer to
the same drug-drug-cell combination and the same treatment concentrations but can be
represented by two different model inputs through swapping the features of the respective
drugs (cf. Methods and Figure 1). Sub-figure A shows the test predictions when including
duplicated entries into the training data, while Sub-figure B shows the predictions when
training only on non-duplicated entries. In both figures, the black diagonal line represents
the identity and R denotes the Pearson correlation between the predictions.
Reconstruction of Drug Sensitivity Measures
A benefit of predicting concentration-specific inhibition values is that based on the model’s
predictions, dose-response curves and matrices can be reconstructed. These can in turn
be used to compute various measures of drug sensitivity or synergy. Since the focus of this
paper is on sensitivity prediction and Vlot et al. discourage the computation of arbitrary syn-
ergy scores on large-scale data, 13 we reconstructed two measures of drug sensitivity, namely
our recently published measure called CMax viability for monotherapies, and a modification
of this measure for drug combinations, which we call the combination CMax viability (cf.
Methods). Unlike conventional sensitivity measures like the IC50 or AUC, the (combina-
tion) CMax viability is comparable across drugs 15 and drug combinations. Consequently, it
can be used to prioritize drugs/combinations for a given cell line (i.e., rank them by their
effectiveness), which will be investigated in the next section.
For the computation of monotherapy CMax viabilities, we first used the actual/predicted
monotherapy entries of the test data to generate actual/predicted dose-response-curves (cf.
Methods). An example is shown in Figure 2, where we also highlight how the CMax viability
is derived from the curves. In total, we were able to compute both the actual and predicted
CMax viabilities for 7,352 out of 32,564 cell line-drug combinations. The decreased num-
ber of combinations stems from the fact that CMax concentrations were only available for
77 of the investigated drugs. Figure 6 depicts the prediction errors for the reconstructed
monotherapy CMax viability values. The mean MAE averaged over all drugs is 0.12 and
the mean MSE is 0.04, which is comparable to the error we previously achieved when pre-
dicting CMax viabilities directly using either the SAURON-RF algorithm by Lenhof et al. 29
(MSE = 0.03) or a slightly adjusted version of DeepDR by Chiu et al. 54 (MSE = 0.09). 15 A
baseline error can be obtained from a model that for every treatment concentration predicts
the mean inhibition for each drug obtained from the training data. For such a model, the
CMax viability (i.e., the viability at the CMax concentration) would also be predicted as
this mean. This would result in a baseline MAE of 0.2, which our model improves by 40%.
The overall PCC is 0.58 for the CMax viabilities and 0.41 for the baseline. However, the
drug-specific PCC is only 0.1 (cf. Figure 6B). While a drug-specific baseline PCC cannot
be computed for constant predictions, adding random noise with mean 0 to these constant
predictions results in a baseline PCC of 0. Thus, our predictions improve this baseline but
only slightly. When using our models to reconstruct IC50 values, we observe a similar phe-
nomenon (overall PCC = 0.71, mean PCC per drug = 0.01, cf. Supplementary Figure 6). To
investigate the reasons for these low drug-specific correlations, we developed and evaluated
different hypotheses, which can be found in the Supplement. Based on our evaluation of
these hypotheses, we conclude that even though prediction errors are relatively small and
comparable to our previous work, the derived measures cannot be used to compare the effect
of a drug monotherapy on different cell lines. For the combination CMax viability (26,946
drug-drug-cell line combinations), we obtained similar results, which are depicted in Figure
6C and D.
[Figure 6: distribution plots. Panel A: MAE per drug (y-axis 0.0 to 0.4). Panel B: PCC per drug (y-axis −1.0 to 1.0). Panel C: MAE per drug combination (y-axis 0.0 to 0.6). Panel D: PCC per drug combination (y-axis −1.0 to 1.0).]
Figure 6: Reconstruction of (combination) CMax viabilities from predicted dose-response
curves/matrices. Sub-figures A and B (red) show the distribution of MAE and PCC per
drug for the reconstruction of CMax viabilities using dose-response curves fit on the test set
monotherapy data. Sub-figures C and D (blue) show the distribution of MAE and PCC per
drug combination for the reconstruction of combination CMax viabilities using dose-response
matrices derived from the test set drug combination data.
Nevertheless, we would like to highlight that such an evaluation of drug-specific correlations
as conducted here is frequently not performed for drug sensitivity and synergy prediction
(cf. Supplementary Table 3, where we compare the investigated settings and analyses for 39
state-of-the-art methods). Thus, similar problems may often go undetected.
Due to the novelty of our prediction approach, there is no method we could directly compare
our findings to. Nevertheless, our analyses presented earlier show that our models are com-
petitive in performance to the approaches by Zheng et al. 5 and Julkunen et al. 18 for making
predictions using known cell lines and drug combination data. Note that both approaches
do not provide drug-/combination-specific correlations.
For cell-blind evaluations on monotherapy data, we found three related approaches that
provide drug-specific correlations: Our recently published method SAURON-RF achieves a
mean PCC of 0.56 when directly predicting CMax viabilities using drug-specific models.15 In
the same manuscript we also show that an adjusted version of the multi-drug model DeepDR
by Chiu et al. 54 achieves a PCC of 0 for the same task. In comparison, Chawla et al. employ
multi-drug models for the prediction of IC50 values and achieve mean PCCs between ca. 0.18
and 0.5 for different ML algorithms. Lastly, Rahman and Pal achieve mean PCCs between
0.29 and 0.44 when reconstructing AUC values from predicted dose-response curves. While
not directly comparable to our approach, these works underline that at least weak to moderate
drug-specific correlations can be achieved (1) for predicting CMax viabilities, (2) when
using multi-drug models, and (3) when deriving sensitivity measures from predicted curves. Yet,
it remains to be investigated further if and how comparable results can be achieved when
combining all three factors and also considering combination therapies, thereby enabling
predictions for arbitrary drugs/combinations and measures, which we aim to achieve here.
Treatment Prioritization
In our final analysis, we investigate how accurately drugs and drug combinations can be
prioritized for a given cell line based on the model predictions: For each cell line in the test
set, we used the computed CMax viabilities for the monotherapy and combination data to
obtain a ranking of drugs and drug combinations from most to least effective. Drug prioritization
is supposed to mimic a personalized treatment scenario with the goal of obtaining a list
of the most effective treatment suggestions for a given patient. The results are shown in Figure
7, where the first row shows the results for monotherapies only, while the second row shows
the results when combining mono- and combination therapies into one list. The results for
combination therapies only are shown in Supplementary Figure 8.
For monotherapies, the Spearman correlation coefficient (SCC) between the actual and predicted
rankings was 0.74 (baseline as defined in the previous section: 0.54). Our predictions
clearly outperform the baseline. Still, the baseline correlation is relatively high, indicating
that the differences in effectiveness between drugs are easier to predict than the differences
between cell lines receiving the same treatment.
While an accurate ranking for the entire list is desirable, one would typically place more
emphasis on the correct identification of the most effective treatments. Thus, we computed
the mean overlap between the first k elements of the actual and predicted rankings. For
monotherapies, the average length of the predicted drug lists is 31.15. The average over-
lap between the top k = 5 and k = 10 actual and predicted most effective drugs is 3.16
(baseline: 2.14) and 7.68 (baseline: 6.55), respectively (results for further k are shown in
Supplementary Figure 9). Furthermore, the median rank of the actually most effective drug
in the predicted ranking is 2.5 (baseline: 8), and the median rank of the drug predicted to
be most effective in the actual list is 3 (baseline 6). The median difference between the true
CMax viabilities of the actual most effective and predicted most effective drugs is only 0.02
(baseline 0.31).
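The prioritization measures reported above can be computed for a single cell line as sketched below. The Python example assumes that a lower CMax viability means a more effective treatment and that there are no ties, so that the Spearman correlation equals the Pearson correlation of the ranks. The function names and toy viabilities are illustrative.

```python
import numpy as np


def ranks(values):
    """Rank 1 is assigned to the smallest value; assumes that there are no ties."""
    order = np.argsort(values, kind="stable")
    result = np.empty(len(values), dtype=int)
    result[order] = np.arange(1, len(values) + 1)
    return result


def prioritization_metrics(actual, predicted, k=5):
    """Compare actual and predicted (combination) CMax viabilities of one cell line."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rank_actual, rank_pred = ranks(actual), ranks(predicted)
    top_actual = set(np.argsort(actual, kind="stable")[:k].tolist())
    top_pred = set(np.argsort(predicted, kind="stable")[:k].tolist())
    best_actual = int(np.argmin(actual))
    best_pred = int(np.argmin(predicted))
    return {
        # Without ties, the Spearman correlation is the Pearson correlation of the ranks.
        "scc": float(np.corrcoef(rank_actual, rank_pred)[0, 1]),
        "top_k_overlap": len(top_actual & top_pred),
        "pred_rank_of_actual_best": int(rank_pred[best_actual]),
        "actual_rank_of_pred_best": int(rank_actual[best_pred]),
        # Difference in true viability between the predicted and the actual best treatment.
        "viability_gap": float(actual[best_pred] - actual[best_actual]),
    }


# Hypothetical CMax viabilities of five treatments for one cell line.
toy_actual = [0.10, 0.40, 0.25, 0.80, 0.60]
toy_predicted = [0.30, 0.35, 0.20, 0.90, 0.50]
toy_metrics = prioritization_metrics(toy_actual, toy_predicted, k=2)
```

The viability gap is evaluated on the true viabilities, so it directly quantifies how much effectiveness is lost if the treatment predicted to be best is chosen instead of the actually best one.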
The second row of Figure 7 shows the analogous prioritization results when combining mono-
and combination treatments into one list. The SCC of 0.76 (baseline: 0.62) is comparable to
the results for monotherapies. Since the average list length is much greater when including
drug combinations (838.62), the overlaps at k = 5 (1.26, baseline: 0.68) and k = 10 (3.38,
baseline: 2.09) are lower (cf. also Supplementary Figures 9 and 10). Furthermore, the median
rank of the actually best treatment in the predicted list (27, baseline: 170.5) and of the
predicted best treatment in the actual list (9.5, baseline: 12) worsen. Still, results clearly
improve over the baseline. Furthermore, the median difference in viability between the actu-
ally most effective treatment and the treatment predicted to be most effective remains small
(0.02, baseline 0.03).
[Figure 7 image. Panels A to F (monotherapies) and G to L (mono- and combination therapies). Y-axes in each row: SCC; overlap of actual/predicted lists, k = 5; overlap of actual/predicted lists, k = 10; predicted rank of actual best drug/treatment; actual rank of predicted best drug/treatment; difference in CMax viability between actual and predicted best drug/treatment.]
Figure 7: Treatment prioritization. This figure depicts the test set prioritization results for
mono- and combination therapies. Sub-figures A to F (red) focus on the prioritization of
monotherapies including: (A) the SCC between the actual and predicted rankings for each
cell line, (B)/(C) the intersection size between the 5/10 actual and predicted most effective
treatments, (D) the predicted rank of the actual most effective treatment, (E) the actual
rank of the treatment predicted to be most effective, and (F) the difference between the
actual CMax viabilities for the actual and predicted most effective treatment. Sub-figures G
to L (blue) show the analogous prioritization results when combining mono- and combination
treatments into one list.
Discussion
Administering not only single but multiple drugs in combination is common in cancer treatment.
However, while drug response datasets for monotherapy have been available for
more than a decade, large-scale datasets for combination therapy have only become publicly
available more recently, e.g., the DrugComb database. 4,5 While the DrugComb data have
been studied extensively for the prediction of drug synergy, they are still underused for the
prediction of drug sensitivity, especially with a focus on making personalized treatment
recommendations. For this application case, we found the scores that are widely used for
synergy prediction less suitable, for various reasons discussed in this manuscript.
To exploit the available drug combination data for predicting drug responses without rely-
ing on synergy scores, we developed and evaluated several ML algorithms and architectures
that directly predict concentration-specific drug responses in the form of relative inhibitions.
We are convinced that this approach has various benefits for personalized treatment rec-
ommendation: First, our approach allows the reconstruction of dose-response curves and
matrices from the model predictions. From these curves/matrices, various sensitivity or syn-
ergy measures can be reconstructed. The inspection of individual curves/matrices can aid
in validating the underlying assumptions for certain measures. Next, our approach can predict
responses to both mono- and combination therapies. Additionally, our approach allows for making
predictions for unseen cell lines, thereby mimicking the scenario of assessing drug responses
for a new patient. Together with our novel sensitivity measure, the (combination) CMax
viability, this framework finally enables the prioritization of both mono- and combination
therapy options for unseen cell lines (patients).
Our evaluations on the DrugComb database show that our models improve substantially over
baseline models and show very little variation when predicting the same treatment using
different input representations. Notably, we evaluated our models on unseen cell lines, which
is often neglected in drug sensitivity prediction. 19 Moreover, our models are also competitive
with state-of-the-art approaches when making predictions for known cell lines. Furthermore,
we achieved strong correlations for treatment prioritization. However, our analyses also re-
veal weaknesses of directly predicting relative inhibitions: While prediction errors for the
reconstruction of drug response measures are competitive with other approaches, the drug-
specific correlations between these measures only slightly improve over a baseline model.
Additionally, we observed increased prediction errors for data samples with high inhibitions,
corresponding to cases of treatment sensitivity. This issue is relatively well-known for clas-
sification but has rarely been discussed or addressed for regression. 29,34
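One way to make such response-dependent errors visible is to stratify the error by the true response. The following is a minimal sketch, assuming inhibitions on a percentage scale and hypothetical bin edges; `mse_by_bin` is an illustrative helper, not part of our published code.

```python
from bisect import bisect_right


def mse_by_bin(y_true, y_pred, edges):
    """Mean squared error per bin of the true response.

    Bin i holds samples with edges[i] <= y < edges[i + 1]. Values below the
    first edge fall into the first bin, values at or above the last edge
    into the last bin. Empty bins are reported as None.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, p in zip(y_true, y_pred):
        i = min(max(bisect_right(edges, t) - 1, 0), n_bins - 1)
        sums[i] += (t - p) ** 2
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical toy values: low inhibitions are predicted well, high ones poorly.
y_true = [10, 40, 60, 90]
y_pred = [12, 38, 50, 70]
errors = mse_by_bin(y_true, y_pred, edges=[0, 50, 100])
print(errors)
```

A markedly larger error in the upper bins would correspond to the pattern described above, where sensitive (high-inhibition) samples are predicted less accurately.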
Three main factors can be adjusted to potentially address such challenges, namely the choice
of ML algorithm, the choice and representation of input features, and the data used:
ML algorithm: We investigated neural networks (highly popular for sensitivity and syn-
ergy prediction), random forests, and elastic nets. In our recently published benchmarking,
we found both elastic nets and random forests to outperform neural networks when predict-
ing drug sensitivity. 24 For the prediction of inhibitions, as investigated here, random forests
are superior to the other algorithms. In general, a plethora of further (potentially more
sophisticated) approaches can be used to model the prediction of inhibitions. However, as
discussed in our benchmarking 24 and also by Li et al., 55 more complex approaches are not
necessarily superior to simpler ML algorithms, and careful evaluation is required to ensure
a fair performance comparison.
Input features and representation: For the characterization of cell lines in the model
input, several sources found gene expression to be the most informative omics-type for pre-
dicting drug responses. 54,56,57 However, the inclusion of further omics or a priori knowledge,
e.g., known sensitivity biomarkers or protein interactions, might improve predictions.
Similarly, further drug properties, e.g., Morgan fingerprints, 58 could be investigated, or graph
neural networks could be employed to represent drugs as molecular graphs. However, the
superiority of molecular graphs over conventional drug fingerprints for sensitivity/synergy
prediction and drug discovery has been questioned. 27,59
Dataset: With 947 cell lines, 265 drugs, and 9,535 drug combinations, the dataset investigated
here is notably larger than those used by other approaches working on drug combination
data. 5,12,14,18,49,60,61 Unfortunately, given the size of the investigated dataset, hardware re-
strictions become a limiting factor for ML. Despite training models on a compute cluster
whose machines have 500 gigabytes of working memory, we had to reduce our data regarding the
number of considered drugs, features, and methods (cf. Methods).
Generally, a large amount of training data benefits model training and robustness. Yet, if
the dataset is heterogeneous, e.g., due to different data sources, as is the case for DrugComb,
this may decrease performance compared to models built and evaluated on a more homo-
geneous dataset. Even though Zagidullin et al. found the reproducibility between replicates
from different datasets satisfactory in the first release of DrugComb, 4 disagreement between
drug response data from different sources is a well-known problem. 57,62,63 Especially for clin-
ical applications, combining data from different sources (e.g., different hospitals) is essential,
and models should be able to cope with this degree of heterogeneity. To this end, meta- or
transfer-learning methods could be leveraged. 64
Investigating different ML algorithms, input representations, and datasets can potentially
improve the predictive performance. However, especially in a sensitive field such as personalized
medicine, performance alone should not be regarded as the sole building block of
model trustworthiness. 65 E.g., to assess the reliability of individual predictions, uncertainty
estimation frameworks like conformal prediction could be applied. 15,66–68 Additionally, in-
corporating interpretability mechanisms 12,65,69 into the model design and evaluation can aid
in identifying drug or cell line properties that impact the predicted response. This could not
only make predictions more comprehensible but also be useful to infer novel mechanisms of
drug sensitivity or synergy.
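To illustrate the idea behind conformal prediction for regression, the following minimal sketch implements the split (inductive) variant with absolute residuals as nonconformity scores. It is a generic textbook formulation under the usual exchangeability assumption, not the procedure of any cited work; all names and toy values are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of absolute residuals."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based order statistic
    if rank > n:
        # Too few calibration samples to guarantee coverage at this alpha.
        return math.inf
    return sorted(residuals)[rank - 1]


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Symmetric prediction interval around a new point prediction.

    cal_true / cal_pred: measured and predicted responses of a held-out
    calibration set that was not used for model training.
    """
    residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]
    q = conformal_quantile(residuals, alpha)
    return new_pred - q, new_pred + q


# Hypothetical calibration data (not real measurements).
cal_true = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
cal_pred = [0.12, 0.17, 0.34, 0.35, 0.56, 0.53, 0.78, 0.71, 0.80, 0.99]
lower, upper = split_conformal_interval(cal_true, cal_pred, 0.50, alpha=0.2)
print(round(lower, 2), round(upper, 2))
```

A wide interval flags a prediction as unreliable. For bounded responses, the interval could additionally be clipped to the admissible range.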
Data and Software Availability
The drug response data used for our analyses can be downloaded from the DrugComb website
(https://drugcomb.org/download) and the DrugComb API (https://api.drugcomb.org/,
cf. Methods). The gene expression data of cancer cell lines can be downloaded from
the GDSC website (https://www.cancerrxgene.org/downloads/bulk_download). CMax
concentrations for 77 of the investigated drugs can be derived from Liston and Davis. 20
Our code is available at GitHub (https://github.com/unisb-bioinf/Drug_Combination_Sensitivity_Prediction),
where we also provide the SMILES, MACCS fingerprints and
physico-chemical properties derived from RDKit, 21 as well as the one-hot encoded target
molecules of the investigated compounds.
References
(1) Yang, W.; Soares, J.; Greninger, P.; Edelman, E. J.; Lightfoot, H.; Forbes, S.;
Bindal, N.; Beare, D.; Smith, J. A.; Thompson, I. R.; others Genomics of Drug Sen-
sitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer
cells. Nucleic acids research 2012, 41, D955–D961.
(2) Iorio, F.; Knijnenburg, T. A.; Vis, D. J.; Bignell, G. R.; Menden, M. P.; Schubert, M.;
Aben, N.; Gonçalves, E.; Barthorpe, S.; Lightfoot, H.; others A landscape of pharmacogenomic
interactions in cancer. Cell 2016, 166, 740–754.
(3) Mokhtari, R. B.; Homayouni, T. S.; Baluch, N.; Morgatskaya, E.; Kumar, S.; Das, B.;
Yeger, H. Combination therapy in combating cancer. Oncotarget 2017, 8, 38022.
(4) Zagidullin, B.; Aldahdooh, J.; Zheng, S.; Wang, W.; Wang, Y.; Saad, J.; Malyutina, A.;
Jafari, M.; Tanoli, Z.; Pessia, A.; others DrugComb: an integrative cancer drug combi-
nation data portal. Nucleic acids research 2019, 47, W43–W51.
(5) Zheng, S.; Aldahdooh, J.; Shadbahr, T.; Wang, Y.; Aldahdooh, D.; Bao, J.; Wang, W.;
Tang, J. DrugComb update: a more comprehensive drug sensitivity data repository
and analysis portal. Nucleic acids research 2021, 49, W174–W184.
(6) Menden, M. P.; Wang, D.; Mason, M. J.; Szalai, B.; Bulusu, K. C.; Guan, Y.; Yu, T.;
Kang, J.; Jeon, M.; Wolfinger, R.; others Community assessment to advance compu-
tational prediction of cancer drug combinations in a pharmacogenomic screen. Nature
communications 2019, 10, 2674.
(7) Torkamannia, A.; Omidi, Y.; Ferdousi, R. A review of machine learning approaches for
drug synergy prediction in cancer. Briefings in Bioinformatics 2022, 23, bbac075.
(8) Yadav, B.; Wennerberg, K.; Aittokallio, T.; Tang, J. Searching for drug synergy in
complex dose–response landscapes using an interaction potency model. Computational
and structural biotechnology journal 2015, 13, 504–513.
(9) Loewe, S. The problem of synergism and antagonism of combined drugs. Arzneimittel-
forschung 1953, 3, 285–290.
(10) Bliss, C. I. The toxicity of poisons applied jointly 1. Annals of applied biology 1939,
26, 585–615.
(11) Berenbaum, M. C. What is synergy? Pharmacological reviews 1989, 41, 93–141.
(12) Janizek, J. D.; Celik, S.; Lee, S.-I. Explainable machine learning prediction of synergistic
drug combinations for precision cancer medicine. BioRxiv 2018, 331769.
(13) Vlot, A. H.; Aniceto, N.; Menden, M. P.; Ulrich-Merzenich, G.; Bender, A. Applying
synergy metrics to combination screening data: agreements, disagreements and pitfalls.
Drug discovery today 2019, 24, 2286–2298.
(14) Malyutina, A.; Majumder, M. M.; Wang, W.; Pessia, A.; Heckman, C. A.; Tang, J. Drug
combination sensitivity scoring facilitates the discovery of synergistic and efficacious
drug combinations in cancer. PLoS computational biology 2019, 15, e1006752.
(15) Lenhof, K.; Eckhart, L.; Rolli, L.-M.; Volkamer, A.; Lenhof, H.-P. Reliable anti-cancer
drug sensitivity prediction and prioritization. Scientific Reports 2024, 14, 12303.
(16) Rahman, R.; Pal, R. Analyzing drug sensitivity prediction based on dose response
curve characteristics. IEEE-EMBS International Conference on Biomedical and Health
Informatics (BHI). 2016; pp 140–143.
(17) Rahman, R.; Dhruba, S. R.; Ghosh, S.; Pal, R. Functional random forest with applica-
tions in dose-response predictions. Scientific reports 2019, 9, 1628.
(18) Julkunen, H.; Cichonska, A.; Gautam, P.; Szedmak, S.; Douat, J.; Pahikkala, T.; Ait-
tokallio, T.; Rousu, J. Leveraging multi-way interactions for systematic prediction of
pre-clinical drug combination effects. Nature communications 2020, 11, 6136.
(19) Codicè, F.; Pancotti, C.; Rollo, C.; Moreau, Y.; Fariselli, P.; Raimondi, D. The Specification
Game: Rethinking the Evaluation of Drug Response Prediction for Precision
Oncology. bioRxiv 2024, 2024–10.
(20) Liston, D. R.; Davis, M. Clinically Relevant Concentrations of Anticancer Drugs: A
Guide for Nonclinical Studies. Clinical
cancer research 2017, 23, 3489–3498.
(21) Landrum, G.; others RDKit: Open-source cheminformatics. version 2023.3.2.
(22) Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL keys
for use in drug discovery. Journal of chemical information and computer sciences 2002,
42, 1273–1280.
(23) Landrum, G.; others RDKit Documentation - rdkit.Chem.Descriptors module.
https://www.rdkit.org/docs/source/rdkit.Chem.Descriptors.html, Accessed: 2024-11-11.
(24) Eckhart, L.; Lenhof, K.; Rolli, L.-M.; Lenhof, H.-P. A comprehensive benchmarking of
machine learning algorithms and dimensionality reduction methods for drug sensitivity
prediction. Briefings in bioinformatics 2024, 25 .
(25) Fudio, S.; Sellers, A.; Pérez Ramos, L.; Gil-Alberdi, B.; Zeaiter, A.; Urroz, M.; Carcas, A.;
Lubomirov, R. Anti-cancer drug combinations approved by US FDA from 2011
to 2021: main design features of clinical trials and role of pharmacokinetics. Cancer
Chemotherapy and Pharmacology 2022, 90, 285–299.
(26) Baptista, D.; Ferreira, P. G.; Rocha, M. Deep learning for drug response prediction in
cancer. Briefings in Bioinformatics 2020, 22, 360–379.
(27) An, X.; Chen, X.; Yi, D.; Li, H.; Guan, Y. Representation of molecules for drug response
prediction. Briefings in Bioinformatics 2021, 23, bbab393.
(28) Chen, Y.; Zhang, L. How much can deep learning improve prediction of the responses
to drugs in cancer cell lines? Briefings in bioinformatics 2022, 23, bbab378.
(29) Lenhof, K.; Eckhart, L.; Gerstner, N.; Kehl, T.; Lenhof, H.-P. Simultaneous regres-
sion and classification for drug sensitivity prediction using an advanced random forest
method. Scientific Reports 2022, 12, 13458.
(30) Rahman, R.; Matlock, K.; Ghosh, S.; Pal, R. Heterogeneity aware random forest for
drug sensitivity prediction. Scientific reports 2017, 7, 1–11.
(31) Su, R.; Liu, X.; Wei, L.; Zou, Q. Deep-Resp-Forest: a deep forest model to predict
anti-cancer drug response. Methods 2019, 166, 91–102.
(32) Oskooei, A.; Manica, M.; Mathis, R.; Martínez, M. R. Network-based biased tree ensembles
(NetBiTE) for drug sensitivity prediction and drug sensitivity biomarker identification
in cancer. Scientific reports 2019, 9, 15918.
(33) Fang, Y.; Xu, P.; Yang, J.; Qin, Y. A quantile regression forest based method to predict
drug response and assess prediction reliability. PLoS One 2018, 13, e0205155.
(34) Basu, A.; Mitra, R.; Liu, H.; Schreiber, S. L.; Clemons, P. A. RWEN: response-weighted
elastic net for prediction of chemosensitivity of cancer cell lines. Bioinformatics 2018,
34, 3332–3339.
(35) Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outper-
form deep learning on typical tabular data? Advances in neural information processing
systems 2022, 35, 507–520.
(36) Shwartz-Ziv, R.; Armon, A. Tabular data: Deep learning is not all you need. Informa-
tion Fusion 2022, 81, 84–90.
(37) Borisov, V.; Leemann, T.; Seßler, K.; Haug, J.; Pawelczyk, M.; Kasneci, G. Deep Neural
Networks and Tabular Data: A Survey. IEEE Transactions on Neural Networks and
Learning Systems 2024, 35, 7499–7519.
(38) Smith, A. M.; Walsh, J. R.; Long, J.; Davis, C. B.; Henstock, P.; Hodge, M. R.; Ma-
ciejewski, M.; Mu, X. J.; Ra, S.; Zhao, S.; others Standard machine learning approaches
outperform deep representation learning on phenotype prediction from transcriptomics
data. BMC bioinformatics 2020, 21 .
(39) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blon-
del, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; others Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research 2011, 12, 2825–2830.
(40) Abadi, M.; others TensorFlow: Large-Scale Machine Learning on Heterogeneous Sys-
tems. 2015; https://www.tensorflow.org/.
(41) Ritz, C.; Baty, F.; Streibig, J. C.; Gerhard, D. Dose-Response Analysis Using R. PLOS
ONE 2015, 10 .
(42) drc: Analysis of Dose-Response Curves. https://cran.r-project.org/web/packages/drc/drc.pdf,
Accessed: 2024-11-11.
(43) Vis, D. J.; Bombardelli, L.; Lightfoot, H.; Iorio, F.; Garnett, M. J.; Wessels, L. F.
Multilevel models improve precision and speed of IC50 estimates. Pharmacogenomics
2016, 17, 691–700.
(44) Wellcome Sanger Institute, GDSC database Resources Download - IC50 Data definitions.
https://cog.sanger.ac.uk/cancerrxgene/GDSC_release8.5/GDSC_Fitted_Data_Description.pdf,
2024; Accessed: 2024-11-11.
(45) Ianevski, A.; Giri, A. K.; Gautam, P.; Kononov, A.; Potdar, S.; Saarela, J.; Wenner-
berg, K.; Aittokallio, T. Prediction of drug combination effects with a minimal set of
experiments. Nature machine intelligence 2019, 1, 568–577.
(46) Borchers, H. W. pracma: Practical Numerical Math Functions. 2022; R package version
2.4.2.
(47) Lederer, S.; Dijkstra, T. M.; Heskes, T. Additive dose response models: explicit formu-
lation and the loewe additivity consistency condition. Frontiers in pharmacology 2018,
9, 31.
(48) Greco, W. R.; Bravo, G.; Parsons, J. C. The search for synergy: a critical review from
a response surface perspective. Pharmacological Reviews 1995, 47, 331–385.
(49) Preuer, K.; Lewis, R. P.; Hochreiter, S.; Bender, A.; Bulusu, K. C.; Klambauer, G.
DeepSynergy: predicting anti-cancer drug synergy with Deep Learning. Bioinformatics
2018, 34, 1538–1546.
(50) Liu, P.; Li, H.; Li, S.; Leung, K.-S. Improving prediction of phenotypic drug response
on cancer cell lines using deep convolutional network. BMC Bioinformatics 2019, 20 .
(51) O’Neil, J.; Benita, Y.; Feldman, I.; Chenard, M.; Roberts, B.; Liu, Y.; Li, J.; Kral, A.;
Lejnine, S.; Loboda, A.; others An unbiased oncology compound screen to identify
novel combination strategies. Molecular cancer therapeutics 2016, 15, 1155–1162.
(52) Knijnenburg, T. A.; Klau, G. W.; Iorio, F.; Garnett, M. J.; McDermott, U.; Shmule-
vich, I.; Wessels, L. F. Logic models to predict continuous outputs based on binary
inputs with an application to personalized cancer therapy. Scientific reports 2016, 6,
1–14.
(53) Lenhof, K.; Gerstner, N.; Kehl, T.; Eckhart, L.; Schneider, L.; Lenhof, H.-P. MERIDA:
a novel Boolean logic-based integer linear program for personalized cancer therapy.
Bioinformatics 2021, 37, 3881–3888.
(54) Chiu, Y.-C.; Chen, H.-I. H.; Zhang, T.; Zhang, S.; Gorthi, A.; Wang, L.-J.; Huang, Y.;
Chen, Y. Predicting drug response of tumors from integrated genomic profiles by deep
neural networks. BMC medical genomics 2019, 12, 143–155.
(55) Li, Y.; Hostallero, D. E.; Emad, A. Interpretable deep learning architectures for im-
proving drug response prediction performance: myth or reality? Bioinformatics 2023,
39, btad390.
(56) Costello, J. C.; Heiser, L. M.; Georgii, E.; Gönen, M.; Menden, M. P.; Wang, N. J.;
Bansal, M.; Ammad-Ud-Din, M.; Hintsanen, P.; Khan, S. A.; others A community
effort to assess and improve drug sensitivity prediction algorithms. Nature biotechnology
2014, 32, 1202–1212.
(57) Jang, I. S.; Neto, E. C.; Guinney, J.; Friend, S. H.; Margolin, A. A. Biocomputing 2014;
World Scientific, 2014; pp 63–74.
(58) Rogers, D.; Hahn, M. Extended-connectivity fingerprints. Journal of chemical informa-
tion and modeling 2010, 50, 742–754.
(59) Jiang, D.; Wu, Z.; Hsieh, C.-Y.; Chen, G.; Liao, B.; Wang, Z.; Shen, C.; Cao, D.;
Wu, J.; Hou, T. Could graph neural networks learn better molecular representation
for drug discovery? A comparison study of descriptor-based and graph-based models.
Journal of cheminformatics 2021, 13, 1–23.
(60) Li, X.; Xu, Y.; Cui, H.; Huang, T.; Wang, D.; Lian, B.; Li, W.; Qin, G.; Chen, L.;
Xie, L. Prediction of synergistic anti-cancer drug combinations based on drug target
network and drug induced gene expression profiles. Artificial intelligence in medicine
2017, 83, 35–43.
(61) Sidorov, P.; Naulaerts, S.; Ariey-Bonnet, J.; Pasquier, E.; Ballester, P. J. Predicting
synergism of cancer drug combinations using NCI-ALMANAC data. Frontiers in chem-
istry 2019, 7, 509.
(62) Haibe-Kains, B.; El-Hachem, N.; Birkbak, N. J.; Jin, A. C.; Beck, A. H.; Aerts, H. J.;
Quackenbush, J. Inconsistency in large pharmacogenomic studies. Nature 2013, 504,
389–393.
(63) Hatzis, C.; Bedard, P. L.; Birkbak, N. J.; Beck, A. H.; Aerts, H. J.; Stern, D. F.; Shi, L.;
Clarke, R.; Quackenbush, J.; Haibe-Kains, B. Enhancing reproducibility in cancer drug
screening: how do we move forward? Cancer research 2014, 74, 4016–4023.
(64) Sharifi-Noghabi, H.; Peng, S.; Zolotareva, O.; Collins, C. C.; Ester, M. AITL: adver-
sarial inductive transfer learning with input and output space adaptation for pharma-
cogenomics. Bioinformatics 2020, 36, i380–i388.
(65) Lenhof, K.; Eckhart, L.; Rolli, L.-M.; Lenhof, H.-P. Trust me if you can: a survey
on reliability and interpretability of machine learning approaches for drug sensitivity
prediction in cancer. Briefings in Bioinformatics 2024, 25, bbae379.
(66) Angelopoulos, A. N.; Bates, S. A gentle introduction to conformal prediction and
distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511 2021.
(67) Norinder, U.; Carlsson, L.; Boyer, S.; Eklund, M. Introducing conformal prediction
in predictive modeling. A transparent and flexible alternative to applicability domain
determination. Journal of chemical information and modeling 2014, 54, 1596–1603.
(68) Alvarsson, J.; McShane, S. A.; Norinder, U.; Spjuth, O. Predicting with confidence:
using conformal prediction in drug discovery. Journal of Pharmaceutical Sciences 2021,
110, 42–49.
(69) Tang, Y.-C.; Gottlieb, A. Explainable drug sensitivity prediction through cancer path-
way enrichment. Scientific reports 2021, 11, 1–10.