{"paper_id":"21168491-c9ae-46ca-8a4a-66534bf6c358","body_text":"How to Predict Effective Drug Combinations -\nMoving beyond Synergy Scores\nLea Eckhart,∗,† Kerstin Lenhof,†,‡ Lutz Herrmann,† Lisa-Marie Rolli,† and\nHans-Peter Lenhof†\n†Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123\nSaarbrücken, Germany\n‡Computational Biology Group, Department of Biosystems Science and Engineering, ETH\nZürich, 4056 Basel, Switzerland\nE-mail: lea.eckhart@uni-saarland.de\nAbstract\nTo improve our understanding of multi-drug therapies, cancer cell line panels screened\nwith drug combinations are frequently studied using machine learning (ML). ML models\ntrained on such data typically focus on predicting synergy scores, which support\ndrug development and repurposing efforts but have limitations when deriving personalized\ntreatment recommendations. To simulate a more realistic personalized treatment\nscenario, we pioneer ML models that predict the relative growth inhibition (instead of\nsynergy scores), and that can be applied to previously unseen cell lines. Our approach\nis highly flexible: it enables the reconstruction of dose-response curves and matrices,\nas well as various measures of drug sensitivity (and synergy) from model predictions,\nwhich can ultimately be used to derive cell line-specific prioritizations of both mono-\nand combination therapies.\nbioRxiv preprint doi: https://doi.org/10.1101/2024.11.22.624812; this version posted November 23, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.\n\nIntroduction\nTailoring drug treatments to the individual patient is a major goal of cancer research. 
Due\nto ethical concerns and limited availability of tumor material, relationships between molecular\nproperties of cancer cells and their drug responses are generally not studied on humans\ndirectly, but instead using model systems, most prominently, cell lines. For monotherapy,\nlarge cell line panels such as the Genomics of Drug Sensitivity in Cancer (GDSC) database 1,2\nhave been available for more than a decade, providing both molecular characterizations and\ndrug screening data of cancer cell lines. However, combination therapies are frequently preferred\nover monotherapies for cancer treatment due to increased efficacy and a decreased\nrisk of treatment resistance. 3 More recently, large data resources have also become available\nfor drug combination screens: In 2019, the DrugComb data portal was introduced, 4,5 which\naccumulates harmonized results of drug screens from different sources. To date, a total of\n37 datasets are available in DrugComb. 5\nDatabases like the GDSC or DrugComb enable the systematic evaluation of the effect that\ndifferent drugs have on different types of cancer cells. Two main use cases that can be\naddressed using these data are (1) making personalized treatment recommendations for\na given patient (cell line) and (2) finding promising drugs or drug combinations that should\nbe further explored, e.g., for drug repurposing or the development of novel (combination)\ntherapies. Due to the complexity and high dimensionality of the data, machine learning\n(ML) is commonly used to address these tasks.\nML models trained on monotherapy drug responses are usually suitable for both use cases,\n(1) and (2), since they directly predict measures of drug effectiveness, such as the IC50 or\nAUC value. 
In comparison, methods using drug combination data typically predict so-called\ndrug synergy scores, 6,7 which are usually suited for the second task but less applicable for\nthe first one, as we briefly outline in the following: These scores quantify the synergistic or\nantagonistic potential of two compounds for a given cell line by comparing their combined\neffect on cell growth to the expected effect obtained from a baseline model that assumes\nno synergism or antagonism. 8 Prominent examples are the Loewe, 9 Bliss, 10 HSA, 11 and\nZIP 8 synergy scores. For each of these scores, values > 0 indicate synergism, and values\n< 0 indicate antagonism, making it possible to classify drug combinations based on their\nsynergy score. Undoubtedly, estimating the synergistic potential of compound combinations\nthrough synergy scores can be valuable for the identification of promising combination treatments\nto undergo more detailed screening, the development of novel compounds, or drug\nrepurposing. However, even though synergy score prediction is sometimes motivated as a\nstep toward achieving personalized treatment recommendations, 6,12 we believe that synergy\nscores have shortcomings that limit their usefulness for this application. Briefly summarized,\nsynergy scores are based on various (in part very strong) model assumptions, some\nof which differ fundamentally between scores. 8,13 Additionally, disagreement between scores\nwas observed by Vlot et al. 13 and Yadav et al., 8 weakening their informative value. 
Two\nfactors that are especially relevant for personalized treatment recommendations are that (1)\nthe scores are aggregated over multiple drug concentrations, which do not necessarily correspond\nwell to clinically feasible concentration ranges 13 (cf. Supplementary Figure 1) and\n(2) a high synergy between two compounds does not guarantee a high effectiveness of the\ncombination treatment. 5\nThus, instead of relying on synergy scores, we advocate exploring other strategies to estimate\nthe effectiveness of combination treatments. For the prediction of drug combination sensitivity,\nseveral models that do not rely on synergy scores have been published: Malyutina\net al. 14 and Zagidullin et al. 4 trained cell line-specific models that predict CSS (Combination\nSensitivity Score) values, a sensitivity measure for two-drug combination therapies. 14\nHowever, the CSS is an aggregated measure of sensitivity based on drug-specific AUC\nvalues. Thus, like the AUC for monotherapies, 15 it depends strongly on the investigated\nconcentration ranges and is not comparable across compounds.\nInstead of focusing on one specific measure of drug sensitivity, an alternative approach is to\ndirectly predict the response (in terms of relative inhibition/viability) of cell lines at various\ntreatment concentrations. Various drug sensitivity or synergy measures could then be\nreconstructed from the model predictions. For monotherapy, this approach has already\nbeen explored by Rahman and Pal et al. 16,17 For combination therapy, Zheng et al. 
5\ntrained a CatBoost model that predicts the relative inhibition of two drugs at given concentrations\nfor a given cell line. Similarly, comboFM by Julkunen et al. 18 employs higher-order\nfactorization machines (HOFMs) to predict relative cell growth.\nA drawback of all combination prediction approaches mentioned above is that they cannot\nmake predictions for previously unseen cell lines: Malyutina et al. 14 and Zagidullin\net al. 4 trained cell line-specific models, while Zheng et al. 5 and Julkunen et al. 18\nemploy a one-hot encoding of cell lines and drugs in the model input such that both have\nto be known during training already. Thus, these models are difficult to apply for personalized\ntreatment recommendations, where predictions should be made for a previously unseen\npatient (cell line). According to Codicè et al., this setting is frequently overlooked or insufficiently\nevaluated in ML-based drug response prediction. 19\nIn this manuscript, we present ML models for the prediction of drug combination sensitivity\nthat do not rely on synergy scores and are able to make predictions for previously unseen cell\nlines, thereby mimicking a personalized treatment scenario. Instead of predicting an aggregated\nmeasure of treatment response, our models predict the relative inhibition at arbitrary\ntreatment concentrations provided in the model input. Consequently, various measures of\ndrug sensitivity or synergy, including dose-response curves and matrices, as well as IC50\nvalues or synergy scores, can be reconstructed from the model predictions.\nWe investigate not only different ML algorithms (neural networks, random forest, elastic\nnet) but also analyze the benefit of including different drug characterizations (MACCS fingerprints,\nphysico-chemical properties), as well as information on drug targets. 
The different\nmodel architectures provide different benefits, e.g., the ability to make predictions not just\nfor two-drug combinations but also for monotherapies and combination treatments consisting\nof more than two drugs. Some of the investigated architectures also enable predictions to be\nmade for any previously unseen drug, given that the features of the drug (e.g., MACCS fingerprint)\nare known. Our results show that random forests outperform the other algorithms\nin all investigated settings. Additionally, we analyze which inhibition intervals are predicted\nmost accurately and investigate the reconstruction of mono- and combination sensitivity\nmeasures from our model predictions.\nLastly, using our recently published drug response measure called CMax viability, 15 we showcase\nhow our models can be applied to perform drug prioritization for mono- and combination\ntherapies based on clinically feasible treatment concentrations. 
Drug prioritization, i.e., the\nranking of drugs by their predicted effectiveness for a given cell line (patient), is a major goal\nin personalized medicine: it exceeds the mere prediction of sensitivity measures and moves\ntoward deriving actual treatment recommendations.\nMaterials and Data Processing\nDrug response data\nDrug screening data for our analyses was obtained from the DrugComb database Version 1.5.\nMore specifically, we employed the DrugComb API (https://api.drugcomb.org/) to download\nthe list of all cell lines and their corresponding COSMIC IDs, the full list of drugs with\ntheir SMILES encodings and their target molecules, and the full dose-response matrices.\nTo assign the respective cell line and drug information to each dose-response matrix, we\ndownloaded the core database from https://drugcomb.org/download, which provides a unique\nidentifier for each dose-response experiment. Consequently, each database entry can be written\nas:\n(cell_line, drug_row, drug_col, conc_row, conc_col, inhibition)\nHere, cell_line is the COSMIC ID of the investigated cell line, and drug_row and drug_col\nare the names of the tested drugs. The entries conc_row and conc_col are the micromolar\nconcentrations of the tested compounds. For monotherapies, one of the drug names is set\nto NULL and the corresponding concentration is set to 0. Finally, inhibition denotes the\nrelative inhibition measured after administration of the denoted drug concentration(s) (see\nSupplement for further information). 
Relative inhibitions > 0 denote reduced cell growth\nthrough the drug treatment, while inhibitions < 0 indicate increased growth.\nWe removed the following entries from the dataset:\n• poor quality entries as defined by the authors of DrugComb 5 with inhibition < −200\nor inhibition > 200\n• entries where the concentration of all tested drugs is 0 (conc_row = conc_col = 0)\n• entries where the corresponding cell line had no COSMIC ID or no gene expression\ndata provided in the GDSC database\nAdditionally, we converted entries where drug_row and drug_col denote the same drug into\nmonotherapies by summing the respective treatment concentrations and setting drug_col to\nNULL:\n(cell_line, drug_row, NULL, conc_row + conc_col, 0, inhibition).\nCases where two different drugs are provided but only one has a concentration > 0 were\nmodified to denote a monotherapy by replacing the drug with concentration 0 with NULL.\nAfterwards, all replicates involving the same cell line, the same drug(s), and same concentration(s)\nwere averaged. Lastly, we log1p-normalized (log1p(x) = log(x+1)) the concentration\nvalues in conc_row and conc_col.\nTo keep the dataset size manageable, we only considered entries involving those 265 drugs\n(cf. Supplementary Table 1) for which at least 10,000 entries are provided after performing\nall the steps described above (cf. Discussion). Note that after this reduction still more than\n10,000 entries remained for each of the drugs. 
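The filtering and normalization steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name preprocess is hypothetical, None stands in for NULL, entries are plain tuples in the order given above, and moving a lone active drug into the row slot is a convention chosen here for the sketch. The COSMIC ID, gene expression, and 10,000-entry filters are omitted.

```python
import math
from collections import defaultdict


def preprocess(entries):
    # entries: iterable of (cell_line, drug_row, drug_col, conc_row, conc_col, inhibition)
    # Returns a dict mapping the cleaned key to the mean inhibition over replicates.
    grouped = defaultdict(list)
    for cell, d_row, d_col, c_row, c_col, inh in entries:
        if not -200 <= inh <= 200:
            continue  # poor-quality measurement
        if c_row == 0 and c_col == 0:
            continue  # no drug was administered
        if d_row == d_col:
            # Same drug in both slots: sum the concentrations, treat as monotherapy.
            c_row, d_col, c_col = c_row + c_col, None, 0.0
        if c_row == 0:
            # Only the column drug is active: move it to the row slot
            # (assumption made for this sketch, not stated in the text).
            d_row, c_row, d_col, c_col = d_col, c_col, None, 0.0
        elif c_col == 0:
            d_col = None  # only the row drug is active
        # log1p-normalize the concentrations; equal inputs give equal keys.
        key = (cell, d_row, d_col, math.log1p(c_row), math.log1p(c_col))
        grouped[key].append(inh)
    # Average replicates that share cell line, drug(s), and concentration(s).
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}
```

Applying log1p before averaging is equivalent to the order described in the text, because replicates share identical concentrations by definition.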
In total, the final dataset consists of 5,291,424\nentries covering 947 cell lines, 265 drugs, and 9,535 drug combinations.\nAdditionally, the CMax concentrations for 77 of the investigated drugs were obtained from\nListon and Davis. 20 The CMax value denotes the peak plasma concentration after administering\nthe highest clinically recommended dose of a drug. 20 In a recently published manuscript,\nwe employed CMax to derive a novel drug sensitivity measure called the CMax viability,\nwhich will be described below. 15 We also use this measure to perform drug prioritization in\nthe Results section.\nDrug Properties\nFor the representation of drugs in the inputs of our models, we investigated four different\nsettings, which will be discussed below (cf. also Figure 1). Using the SMILES drug representations\nprovided by DrugComb, we used RDKit version 2023.3.2 21 to calculate two types\nof drug features:\n• binary MACCS fingerprints 22 of length 166\n• 209 physico-chemical drug properties using the function CalcMolDescriptors from the\nrdkit.Chem.Descriptors module 23\nWe removed all properties that showed no variation across the investigated 265 drugs, resulting\nin MACCS fingerprints of length 162 and 182 physico-chemical properties.\nAdditionally, 735 drug target molecules for the investigated drugs were obtained from DrugComb.\nGene Expression Data\nNormalized gene expression data of 17,419 genes (Affymetrix Human Genome U219 Array)\nwas obtained from the GDSC database Release 8.3\n(https://www.cancerrxgene.org/downloads/bulk_download).\nMethods\nModel Inputs and Outputs\nWe train multi-drug models that predict the relative inhibition for a given cell line being\ntreated with given concentrations of one or more drug(s). The model inputs comprise cell\nline features based on gene expression, a representation of the applied drugs, and the corresponding\ndrug concentrations. For the representation of drugs, we investigated four different\nsettings, which are depicted in Figure 1 and will be described below.\nTo characterize cell lines in the model input, we performed a principal component analysis\n(PCA) on the gene expression values of the training cell lines and used the first 300 principal\ncomponents (PCs) as cell line features. This dimension reduction method performed\nwell in our recently published benchmarking of drug sensitivity prediction methods. 24 The\nfeature coefficients computed on the training data were used to project the test cell lines\ninto the same 300-dimensional space. To perform the cross-validation discussed below, we\nre-computed the PCs based on the respective training folds.\nIn addition to the cell line features, we investigated four different settings for the encoding\nof drugs in the model input:\nSetting 1 (OneHot):\nIn this setting, no drug properties are included. Instead, a 265-dimensional encoding of\ndrugs is used. Each feature corresponds to one of the 265 drugs in our dataset. If a drug is\npart of the current entry, its feature is set to the corresponding log1p-normalized treatment\nconcentration, otherwise it is set to 0.\nSetting 2 (OneHotTar):\nThis setting uses the same concentration encoding as Setting 1 but additionally includes\n290 drug target features. More precisely, we used the drug target annotations provided by\nDrugComb and included all molecules as targets that were targeted by at least five of the\ndrugs in our dataset, resulting in a total of 290 target features. Each feature is then set to\nthe number of drugs in the current entry that target the corresponding molecule (0, 1, or\n2). Since DrugComb provides only data on monotherapies and two-drug combinations, the\nmaximum value a target feature can have is 2, if it is targeted by both drugs in a two-drug\ncombination entry. Note also that one drug can target more than one molecule.\nSetting 3 (MACCS):\nIn this setting, each drug is represented by a 162-dimensional binary molecular access system\n(MACCS) fingerprint. 22 Each position of the fingerprint corresponds to a molecular substructure,\ne.g., a functional group that may be present in a drug molecule. The respective bit is\nset to 1 if the corresponding substructure is present in the drug molecule at least once, and\n0 otherwise. Additionally, one input feature for each drug is needed to denote its treatment\nconcentration. 
Consequently, this setting uses a total of 2 · 162 + 2 · 1 = 326 drug features.\nTo encode monotherapies, one of the fingerprints and the corresponding concentration are\nset to 0.\nSetting 4 (PhysChem):\nThis setting is similar to Setting 3 but replaces each MACCS fingerprint with 182 numerical\nphysico-chemical descriptors that denote different properties of the respective drugs,\nsuch as the molecular weight, number of valence electrons, or the logP value that measures\nlipophilicity. Consequently, this setting uses a total of 2 · 182 + 2 · 1 = 366 drug features. To\ndenote monotherapies, one set of properties and the corresponding concentration are set to 0.\nDepending on the desired application, the different settings provide different benefits: Settings\n3 and 4 allow making predictions for arbitrary drug molecules given that their MACCS\nfingerprint or physico-chemical properties are known. Consequently, the resulting models\ncan be used to make predictions for previously unseen, e.g., newly developed compounds.\nIn contrast, models derived from Settings 1 and 2 are limited to those 265 drugs that were\npresent in our dataset and hence encoded in the input. However, these models can not only\nmake predictions for single drugs and two-drug combinations but even for treatments using\nthree or more drugs simultaneously. 
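To make the difference between the encodings concrete, the following Python sketch builds the drug part of an input vector for Setting 1, the target counts added in Setting 2, and the slot-based layout of Settings 3 and 4. All function names are hypothetical, a treatment is assumed to be a list of (drug, micromolar concentration) pairs, and the ordering of features within a slot (features first, then concentration) is an assumption for illustration; the actual layout is only shown in Figure 1.

```python
import math


def encode_onehot(drug_index, treatment):
    # Setting 1: one feature per known drug, holding its log1p concentration.
    vec = [0.0] * len(drug_index)
    for drug, conc in treatment:
        vec[drug_index[drug]] = math.log1p(conc)
    return vec


def encode_target_counts(target_index, drug_targets, treatment):
    # Setting 2 add-on: number of administered drugs hitting each target.
    vec = [0] * len(target_index)
    for drug, _ in treatment:
        for target in drug_targets.get(drug, ()):
            if target in target_index:
                vec[target_index[target]] += 1
    return vec


def encode_feature_slots(drug_features, treatment, n_slots=2):
    # Settings 3/4: one slot per drug (features + concentration);
    # unused slots are filled with zeros to encode monotherapies.
    n_feat = len(next(iter(drug_features.values())))
    vec = []
    for slot in range(n_slots):
        if slot < len(treatment):
            drug, conc = treatment[slot]
            vec += list(drug_features[drug]) + [math.log1p(conc)]
        else:
            vec += [0] * n_feat + [0.0]
    return vec
```

With 162-bit fingerprints and two slots, encode_feature_slots returns 2 · 162 + 2 = 326 values, matching the count given for Setting 3. The one-hot variant accepts any number of drugs per treatment, which is why Settings 1 and 2 extend to three or more drugs, whereas the slot-based variant is fixed to n_slots drugs but works for any drug with known features.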
While three-drug combination therapies have already\nbeen approved for cancer treatment by the United States Food and Drug Administration\n(FDA), 25 DrugComb does not provide such data.\nMachine Learning Algorithms\nWe investigate the predictive performance of three ML algorithms: neural networks, random\nforests, and elastic net. We chose these models since neural networks and tree-based\nmethods are commonly used for synergy prediction. 7 Furthermore, neural networks are also\npopular for drug sensitivity prediction, 26–28 while random forest and elastic nets are used\nless frequently for this task. 29–34 In our recently published benchmarking, we found, however,\nthat tree-based methods and elastic nets frequently outperform neural networks in predicting\ndrug responses. 24 In line with our findings, several studies found that deep learning does not\nimprove over conventional ML algorithms for making predictions on tabular data, 35–37 or to\ngenerate feature representations for model inputs. 24,38\nAll prediction models were implemented in Python 3.11. Random forests and elastic net\nmodels were implemented using scikit-learn Version 1.5.0, 39 while neural networks were implemented\nusing tensorflow Version 2.16.1 40 with GPU support. The hyperparameters for\neach algorithm are provided in Table 1.\nFigure 1: Prediction pipeline. This figure summarizes our pipeline for the prediction of\nrelative inhibitions. The large blue box depicts the different types of input features and\nrepresentations we investigated. The grey box at the top right lists our data resources.\nThe yellow box shows the different ML algorithms we used. The green box at the bottom\ndepicts the model output, i.e., the relative inhibition for a given cell-drug-drug combination\nat defined treatment concentrations. Lastly, the purple box shows potential downstream\nanalyses that can be performed based on the model predictions.\nTable 1: Hyperparameters of the investigated ML algorithms. This table denotes the tuned\nhyperparameters for each ML algorithm. For hyperparameters not stated explicitly, the\ndefault parameters as provided by the respective Python package were employed. Explicitly\ntuned hyperparameters are marked in bold. For the PhysChem setting (i.e., the setting with\nthe largest data matrix), we were unable to train neural networks with the ELU activation or\nlearning rates of 0.1 due to insufficient memory for resource allocation even when decreasing\nthe batch size.\nModel | Parameter | Value(s)\nElastic net | alpha | 0.01, 0.1, 1, 10, 100\nElastic net | l1_ratio | 0, 0.25, 0.5, 0.75, 1\nRandom forest | max_depth | 100, 1000000\nRandom forest | max_features | 25, 50, 100, 250\nRandom forest | min_samples_leaf | 2, 20, 100, 1000\nRandom forest | n_estimators | 500\nNeural network | loss | mean squared error\nNeural network | activation | tanh, ELU (none in last layer)\nNeural network | optimizer | Adam\nNeural network | learning_rate | 0.0001, 0.001, 0.1\nNeural network | hidden layers | 1, 2, 3, 4, 5\nNeural network | size of hidden layers | equally spaced btw. in-/output size\nNeural network | dropout | 0.1, 0.3\nNeural network | batch_size | 256\nNeural network | bias_initializer | 0.01\nNeural network | kernel_initializer | glorot_uniform for tanh, he_normal for ELU activation\nNeural network | kernel_regularizer | l2\nNeural network | epochs | 300\nNeural network | validation_split | 0.2\nNeural network | early stopping | yes\nNeural network | patience | 15\nNeural network | restore_best_weights | True\nModel Training and Testing\nAfter filtering and processing the data as described above, we randomly divided the remaining\ncell lines into a training set (80% of cell lines) and a test set (20%). Since multiple data\nentries exist for each cell line (screening of different drugs/drug combinations at different\nconcentrations), the final training data consists of all entries involving a cell line from the\ntraining set (3,741,209 entries). The final test data contains all remaining entries (1,550,215),\ni.e., all entries involving a cell line from the test set. This splitting ensures that the test performance\nis always evaluated on cell lines that were unseen during model training, thereby\nmimicking the scenario of making predictions for a previously unseen patient. In contrast,\nthe same drugs and drug combinations can occur in both the training and test data.\nOn the training data, we performed a 5-fold cross validation (CV) to determine the best-performing\nhyperparameters of each ML model (see Table 1). The CV folds were generated\nby randomly dividing the training cell lines into five disjoint folds and assigning all entries\ninvolving a certain cell line to the corresponding fold. 
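A cell line-wise split of this kind can be sketched as follows. The helper name split_by_cell_line, the seed handling, and the round-robin assignment of shuffled cell lines to folds are illustrative assumptions; the text only states that cell lines were divided randomly.

```python
import random


def split_by_cell_line(cell_lines, test_fraction=0.2, n_folds=5, seed=0):
    # cell_lines: the cell line ID of every entry (duplicates allowed).
    # Returns (test_cells, fold_of), where fold_of maps each training
    # cell line to a CV fold index in 0..n_folds-1.
    rng = random.Random(seed)
    unique = sorted(set(cell_lines))
    rng.shuffle(unique)
    n_test = round(test_fraction * len(unique))
    test_cells = set(unique[:n_test])
    train_cells = unique[n_test:]
    # Every entry of a cell line inherits the fold of that cell line,
    # so no cell line is shared between folds or with the test set.
    fold_of = {cell: i % n_folds for i, cell in enumerate(train_cells)}
    return test_cells, fold_of
```

An entry then belongs to the test data if its cell line is in test_cells, and otherwise to fold fold_of[cell_line]. Because the split is made on cell lines rather than entries, fold sizes in terms of entries can differ when cell lines have different numbers of measurements.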
Since the number of available entries\nper cell line differs, the size of CV folds varies slightly between 644,308 and 857,361 entries.\nFor the hyperparameter combination with smallest mean absolute error (MAE) averaged\nacross all five folds, one final model is trained on the complete training data and its performance\nis evaluated on the test data.\nFor the models using one-hot encodings (Setting 1 and Setting 2), each drug has a designated\ninput node. This is not the case for the models using drug features (Setting 3 and Setting\n4), where swapping the features and concentration of the first drug with those of the second\ndrug represents the same treatment but results in changes in the input representation (cf.\ninput visualization in Figure 1). However, the model output should not depend on the order\nof the drugs in the input, i.e., it should not depend on whether drug features of a drug A in\nthe input vector are located in front of or behind those of a drug B. Therefore, each original\nsample is included twice in the datasets for Settings 3 and 4. These duplicate samples differ\nonly in the order of the drug features and concentrations: once in the order A-B, once in the\norder B-A. In the Results section, we investigate the impact on model performance when\nmodels are trained using the duplicated versus non-duplicated data. The test performance\nis always evaluated on the duplicated entries.\nFitting of Dose-Response Curves and Computation of Sensitivity\nMeasures\nUsing the relative inhibitions predicted by our models, it is possible to reconstruct dose-response\ncurves for monotherapies and dose-response matrices for combination therapies (cf.\nFigure 2). Based on these curves/matrices, various measures of drug response can be derived.\nTo this end, we first converted the (actual and predicted) relative inhibitions into relative\nviabilities by subtracting the relative inhibitions from 100 and dividing the result by 100.\nAdditionally, we clamped viabilities to [0, 1]. Note that we report relative viabilities in range\n[0, 1] rather than range [0, 100] to keep the results consistent and comparable to our previous\nstudy. 15\nTo perform the curve-fitting for monotherapies, we employed a three-parametric logistic\nfunction from the drc R-package: 41,42\nf(x) = c + (1 − c) / (1 + exp(b · (log(x) − log(e))))    (1)\nHere, f(x) denotes the estimated relative viability of the considered cell line at drug concentration\nx, c denotes the curve asymptote for increasing concentrations, b denotes the curve’s\nslope, and e denotes the concentration at the inflection point. We only fit curves when at\nleast five dose-response points were available and we discarded all curves where the root\nmean squared error (RMSE) between the actual viabilities and those derived from the curve\nwas greater than 0.3, a threshold that was previously employed for the data generation in\nthe GDSC database. 43,44 From the fitted curves, we derived two measures of monotherapy\ndrug responses, namely IC50 values and CMax viabilities. The CMax viability is a novel\ndrug sensitivity measure which we recently published. 15 It is defined as the relative viability\nat the CMax concentration of the respective drug. 
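As a minimal sketch of how both measures follow from a fitted curve, the Python functions below evaluate Equation 1 and invert it at a viability of 0.5. The fitting step itself (done with the drc R package in the paper) is not reproduced; the parameters b, c, and e are assumed to be given, and the function names are hypothetical. The closed-form IC50 is obtained here by solving f(x) = 0.5 for x, which yields x = e · (0.5 / (0.5 − c))^(1/b) and exists only if c < 0.5.

```python
import math


def inhibition_to_viability(inhibition):
    # Relative inhibition in percent -> relative viability clamped to [0, 1].
    return min(1.0, max(0.0, (100.0 - inhibition) / 100.0))


def viability(x, b, c, e):
    # Three-parameter logistic curve of Equation 1; x and e must be > 0.
    return c + (1.0 - c) / (1.0 + math.exp(b * (math.log(x) - math.log(e))))


def ic50(b, c, e):
    # Concentration at which the curve crosses a viability of 0.5.
    # Returns None if the lower asymptote c never falls below 0.5.
    if c >= 0.5:
        return None
    return e * (0.5 / (0.5 - c)) ** (1.0 / b)


def cmax_viability(cmax, b, c, e):
    # CMax viability: the fitted curve evaluated at the CMax concentration.
    return viability(cmax, b, c, e)
```

For c = 0 the IC50 coincides with the inflection point e, which is a quick sanity check on the inversion.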
The CMax concentration denotes the\npeak plasma concentration of a drug after administering the highest clinically recommended\ndose. 20 Thus, the CMax viability is designed to estimate the maximal effect a treatment can\nrealistically achieve. For the computation of CMax viabilities, we evaluated the function of\nthe fitted curve at the drug’s CMax concentration (cf. Figure 2A). For the computation of\nIC50 values, we intersected the dose-response curves with a horizontal line with y-intercept\n0.5.\nFigure 2: Exemplary dose-response curve and matrix. Sub-figure A depicts a dose-response\ncurve (blue) for the monotherapy treatment of a cancer cell line (COSMIC ID 683667) with\nthe drug Vorinostat. The fit is based on nine dose-response points (black). The yellow diamond\nmarks the CMax concentration of Vorinostat (1.2 µM), and the red star marks the\ncorresponding CMax viability (0.41) derived from the curve (cf. Methods). Sub-figure B\ndepicts a dose-response matrix for the combination treatment of cell line 909755 with Dasatinib\nand Lapatinib, where the x- and y-axes denote the respective treatment concentrations.\nThe yellow and blue diamonds approximately mark the CMax concentrations of both drugs,\nwhich are used to limit the considered concentration combinations for the computation of\nthe combination CMax viability (cf. Methods).\nFor combination therapies, we developed a variation of the CMax viability we call the combination\nCMax viability that can be derived from an actual/predicted dose-response matrix\n(cf. Figure 2B). 
Our initial idea was to interpolate the values in the dose-response matrix to\nderive the relative viability when administering the CMax concentration of both combina-\ntion drugs simultaneously. However, two synergistic drugs may have certain concentration\nwindows with particularly high synergy/effectiveness.45 Thus, it is possible that the smallest\nviability is reached at a concentration combination below the CMax concentrations.\n(Note that this should not happen for the dose-response curves we employed to compute the\nCMax viability for monotherapies since these curves are monotonically decreasing.) Con-\nsequently, we considered the entire concentration range below the respective CMax values\nto compute our sensitivity measure. Conceptually, we want to derive the smallest viabil-\nity within the area defined by the two concentration windows of the drugs limited at their\nrespective CMax concentration. To compute the combination CMax viability, we divided\nthe concentration interval from 0 to the CMax of each drug into 100 equally spaced\nconcentrations, resulting in 10,000 concentration combinations. For each combination,\nwe estimated its relative viability through bilinear interpolation (R package pracma 46) from\nthe full dose-response matrix. 
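The grid evaluation described here, together with taking the minimum over all interpolated values, could look roughly as follows in Python. This is a sketch under several assumptions that the text does not settle: the 100 equally spaced concentrations include both endpoints (0 and CMax), interpolation is done on the linear concentration scale, concentrations outside the measured range are clamped to the nearest measured concentration, and the matrix is indexed as viability[i][j] for the i-th concentration of drug A and the j-th concentration of drug B. The paper uses the R package pracma for the interpolation; the hand-written bilinear routine and all names below are hypothetical stand-ins.

```python
from bisect import bisect_right


def linspace(start, stop, num):
    # num equally spaced values from start to stop (both included).
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def _bracket(axis, value):
    # Clamp value to the measured range, then return the index i of the lower
    # neighbour and the fractional position t between axis[i] and axis[i + 1].
    value = min(max(value, axis[0]), axis[-1])
    i = min(bisect_right(axis, value) - 1, len(axis) - 2)
    t = (value - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def bilinear(conc_a, conc_b, viability, a, b):
    # conc_a, conc_b: strictly increasing measured concentrations.
    i, t = _bracket(conc_a, a)
    j, u = _bracket(conc_b, b)
    low_a = (1 - u) * viability[i][j] + u * viability[i][j + 1]
    high_a = (1 - u) * viability[i + 1][j] + u * viability[i + 1][j + 1]
    return (1 - t) * low_a + t * high_a


def combination_cmax_viability(conc_a, conc_b, viability, cmax_a, cmax_b, steps=100):
    # Smallest interpolated viability over the steps x steps grid spanning
    # [0, cmax_a] x [0, cmax_b] (10,000 combinations for steps = 100).
    grid_a = linspace(0.0, cmax_a, steps)
    grid_b = linspace(0.0, cmax_b, steps)
    return min(bilinear(conc_a, conc_b, viability, a, b)
               for a in grid_a for b in grid_b)
```

Because the minimum is taken over the whole rectangle rather than only at its corner, a matrix whose lowest viability lies below both CMax concentrations yields a smaller value than interpolation at the CMax pair alone.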
Finally, we define the minimum of all 10,000 values as the\ncombination CMax viability.\nAs the CMax denotes the maximal feasible treatment concentration for a drug monotherapy,\nit may not be feasible to administer the CMax concentration of two drugs in combination.\nYet, we believe that the respective CMax concentrations are a reasonable upper bound to\nconsider for the computation of combination CMax viabilities. Note also that administering\nthe CMax concentration for monotherapies might likewise not be feasible in all cases. Fur-\nthermore, the presented approach can theoretically be applied to any desired concentrations\nother than CMax.\nResults\nChallenges of Using Synergy Scores for Personalized Treatment\nRecommendations\nThe idea behind synergy scores is to measure the synergistic or antagonistic potential of\ntwo compounds for a given cell line by comparing their experimentally measured combined\neffect on cell survival to the expected effect obtained from a baseline model that assumes no\nsynergism or antagonism. 8 The baseline model is derived from monotherapy data of both\ncompounds. It estimates their combined effect at the concentrations that were tested in the\nactual combination screening. The baseline and actually measured treatment responses are\nthen subtracted from each other and the result is averaged over all concentration combi-\nnations to obtain a final synergy score. 
13 Prominent examples of synergy scores that differ\nsolely in their computation of the baseline are the Loewe, 9 Bliss, 10 HSA, 11 and ZIP 8 scores.\nFor each of these scores, values > 0 indicate synergism and values < 0 indicate antagonism.\nA detailed description of the scores can be found in the Supplement.\nUndoubtedly, estimating the synergistic potential of compound combinations through syn-\nergy scores can be valuable for the identification of promising combination treatments to\nundergo more detailed screening, the development of novel compounds, or drug repurposing.\nHowever, there are known limitations of synergy scores, which have been summarized and\nextensively discussed in a review by Vlot et al.,13 where they also performed several analyses\nusing a large-scale drug combination dataset. Their findings can be briefly summarized as\nfollows: Firstly, each synergy score is based on certain model assumptions, some of which\nmight frequently be violated by real-world data. 47,48 For example, both the Loewe and ZIP\nscore require fitting dose-response curves of a certain shape to the monotherapy data. The\nLoewe score furthermore requires both drugs to have the same minimum and maximum\neffect as well as a constant potency ratio. 13 In comparison, the Bliss score relies on the as-\nsumption that the combined effect of two non-interacting drugs is statistically independent.\nEven though pharmacological independence is not necessarily required to achieve statistical\nindependence, 13 it is most likely that statistical independence is caused by pharmacologi-\ncal independence. However, due to crosstalk between biological processes affected by either\ndrug, achieving true pharmacological independence may be unlikely. 48\nThese examples also highlight that the assumptions between scores differ fundamentally. In\ntheir data analysis, Vlot et al. 
observed only a moderate to low correlation between the four\ndifferent scores calculated on the same data, which might be explained by the different model\nassumptions. They also found that value ranges between scores are not comparable: the HSA\nand ZIP scores generally result in higher values than Loewe and Bliss. Additionally, Vlot\net al. observed that synergy scores are relatively difficult to reproduce between replicated\nexperiments, even though the measured drug responses used to derive the scores correlated\nwell between replicates. Furthermore, while misclassifications (synergism vs. antagonism)\nbetween scores were rare, several scenarios were identified where scores are likely to disagree,\nwhich could typically be retraced to a violation of model assumptions.\nBased on these findings, Vlot et al. advocate against the automated analysis of large-scale\ndata using individual synergy scores. Instead, they recommend a careful investigation of the\nrespective dose-response curves to then select an appropriate score. When training mod-\nels that only predict synergy scores (instead of concentration-specific inhibitions/viabilities),\nthis is hardly possible since we are unable to assess the underlying dose-response relationship\nto validate model assumptions.\nWe agree with these conclusions by Vlot et al. but would like to emphasize further points\nthat make synergy scores difficult to use and interpret, especially for personalized treatment\nrecommendations: A methodological criticism of synergy scores is that they are an aggre-\ngated measure over concentration ranges. 
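To make the aggregation concrete, the following Python sketch computes a Bliss-style score as the mean excess of the measured over the expected inhibition across a small dose-response matrix. It is an illustration only: the exact score definitions used in the paper are given in its Supplement and are not reproduced here, the expected effect uses the textbook Bliss independence formula on fractional inhibitions in [0, 1] (the paper reports inhibitions in percent), and all names and numbers are hypothetical toy values.

```python
def bliss_expected(inh_a, inh_b):
    # Expected fractional inhibition of two independently acting drugs.
    return inh_a + inh_b - inh_a * inh_b


def mean_bliss_excess(mono_a, mono_b, combo):
    # mono_a[i], mono_b[j]: monotherapy inhibitions at the tested concentrations.
    # combo[i][j]: measured inhibition of the combination at those concentrations.
    # Positive values indicate synergism, negative values antagonism.
    excesses = [combo[i][j] - bliss_expected(mono_a[i], mono_b[j])
                for i in range(len(mono_a))
                for j in range(len(mono_b))]
    return sum(excesses) / len(excesses)


# Two toy matrices with the same aggregated score but different shapes:
mono = [0.1, 0.2]
uniform = [[0.29, 0.38], [0.38, 0.46]]   # small excess at every concentration pair
localized = [[0.19, 0.28], [0.28, 0.76]]  # large excess at a single pair only

score_uniform = mean_bliss_excess(mono, mono, uniform)
score_localized = mean_bliss_excess(mono, mono, localized)
```

Both toy matrices yield the same mean excess, although the combination reaches very different inhibitions at the highest tested concentrations, which is exactly the information an aggregated score discards.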
The choice of meaningful concentration ranges is\nespecially challenging for experimental drugs but crucial to draw meaningful conclusions for\npersonalized medicine. We have previously shown that the screened concentration ranges in\nthe GDSC database do not correspond well to clinically feasible treatment concentrations 24\nand similar observations can also be made for the DrugComb database (cf. Supplementary\nFigure 1). Another major factor that hampers the use of synergy scores for treatment\nrecommendation is that a high synergy between two compounds solely implies that the com-\nbination treatment is more effective than the respective monotherapies. However, it does\nnot guarantee an overall high effectiveness (in terms of large relative inhibition) of the com-\nbination treatment. 5 It follows that synergy scores alone should not be used to compare\nthe suitability of different treatment options for a given patient (cell line). In particular,\nsynergy scores cannot be used to compare the effectiveness of different combination treat-\nments. Furthermore, it is not possible to compare the effectiveness of combination therapies 
to monotherapies involving different drugs.\nBased on these drawbacks of synergy scores in general and for treatment recommendation\nin particular, our analyses presented in the following focus on sensitivity prediction instead.\nCompared to the number of synergy prediction methods, 6,7 sensitivity prediction of drug\ncombinations is understudied, especially when the goal is to make predictions for previously\nunseen cell lines as we have outlined in the Introduction section.\nIn the following, we analyze how accurately drug responses (here: relative inhibitions) can\nbe predicted for combination therapies. We compare different ML algorithms and model\ninputs and investigate the reconstruction of sensitivity measures from the model predictions.\nAdditionally, we show how both mono- and combination therapies can be ranked by their\neffectiveness for a given cell line using our recently developed sensitivity measure: the CMax\nviability. 15\nOverall Performance Comparison\nFigure 3 shows the performance of all investigated models in terms of test MAE (mean abso-\nlute error). The optimized hyperparameters for each model are provided in Supplementary\nTable 2. The first row depicts the results for the entire test data, while the second and third\nrow focus on the data subsets representing mono- and combination therapies, respectively.\nAcross all four settings, random forests resulted in the lowest error, followed by neural net-\nworks, while elastic net had the worst performance. An exception is the PhysChem setting,\nwhere neural networks were outperformed by elastic net.\nThe overall smallest test error (MAE 12.14) was achieved using a random forest with MACCS\nfingerprints as input. Additionally, even the worst performing random forest model (One-\nHot, MAE of 13.04) still outperforms the best neural network (OneHot, MAE 14.08) and\nelastic net (OneHotTar, MAE of 16.46) models. 
Thus, the choice of ML algorithm seems to\nhave a stronger impact on performance than the choice of input features, even though the\ndifferent input representations differ considerably (cf. Methods and Figure 1). Notably, the\naddition of drug targets slightly improves predictions for random forest and elastic net but\nhas the opposite effect for neural networks.\nTo contextualize the obtained errors, we compare them to two baseline models: A simple\nbaseline model that always predicts the mean of the training data has a test MAE of 24.2. A\nmore advanced baseline that always predicts the mean inhibition per drug for monotherapies\nand the mean inhibition of the combination for combination therapies has a test MAE of\n19.74. Consequently, our best model improves on these baselines by 50% and 37%, respectively.\nWhile all of the random forest models outperform the baselines, some elastic nets and neural\nnetworks are not superior to the baselines.\nWhen investigating mono- and combination therapies separately (cf. row 2 and 3 of Figure\n3), the same overall trends can be observed, with the random forest model with MACCS\nfeatures again having the smallest error. Generally, both types of therapies can be predicted\nsimilarly well, even though the training data contains slightly more combination (60%) than\nmonotherapy data (40%).\nBesides the MAE, we also investigated the Pearson correlation (PCC) between the actual\nand predicted inhibitions. The overall PCC for the best-performing model was 0.8 (0.77 and\n0.82 for mono- and combination therapies, respectively). 
However, computing correlations\nacross the entire data artificially increases the PCC: since some drugs/combinations generally\nhave lower/higher inhibitions than others, even mean predictions for each drug/combination\n(requiring no ML at all) would result in a correlation above 0. 19 Thus, we computed the mean\nper-drug PCC for monotherapies (0.58) and the mean per-combination PCC for combination\ntherapies (0.56) (see also Supplementary Figure 2). These values have a similar magnitude\nto what we previously observed for monotherapy sensitivity prediction. 15\n[Figure 3: boxplots of the absolute prediction error per setting (columns) and data subset (rows). MAE on top of each boxplot, given as Random Forest / Neural Network / Elastic Net:]\nData subset | OneHot | OneHotTar | MACCS | PhysChem\nComplete data | 13.04 / 14.08 / 16.99 | 12.77 / 14.44 / 16.46 | 12.14 / 15.7 / 20.56 | 12.2 / 21.91 / 20.5\nMonotherapy | 12.95 / 14.29 / 17.89 | 12.91 / 14.49 / 17.09 | 12.76 / 14.69 / 18.83 | 12.96 / 22.37 / 19.02\nCombin. therapy | 13.09 / 13.98 / 16.56 | 12.7 / 14.41 / 16.17 | 11.85 / 16.18 / 21.38 | 11.84 / 21.69 / 21.19\nFigure 3: Test set performance. This figure shows the prediction errors (in terms of absolute\ndifference between actual and predicted values) for each setting (columns) and each investi-\ngated ML algorithm (coloring). The first row shows the results for the entire test dataset,\nwhile the second and third row show the results for the data subsets corresponding to mono-\nand combination therapies, respectively. On top of each boxplot, the mean absolute error\n(MAE) is shown.\n
Note that Zheng et al. 5 and Julkunen et al. 18 also provide overall correlations and errors\nfor the prediction of relative inhibition/growth (cf. Introduction). However, their results are\nnot comparable to ours since we investigate the performance for unknown cell lines, which\ncannot be evaluated using the other two methods. It is known that the cell-line blind scenario\nincreases errors considerably compared to making predictions for known cell lines. 49,50 To\nnevertheless assess how our random forest MACCS model would perform for known cell\nlines, we retrained the model using a random split of the available data into a training\n(80%) and test set (20%). This split does not guarantee that cell lines in the test set were\nunseen during model training. Note that we still ensured that duplicated entries denoting the\nsame treatment are either exclusively contained in the training or the test set (cf. Methods).\nWith a PCC of 0.96 and RMSE of 8.41, our performance for known cell lines is comparable\nto that reported by Zheng et al. (PCC = 0.98, RMSE = 7.12) 5 and Julkunen et al. (PCC\n= 0.97, RMSE = 9.86 in cross-validation; PCC = 0.92 on validation data). 18 However,\nthe dataset used in our analyses is much larger and more heterogeneous, comprising 947 cell\nlines, 265 drugs, and 9,535 drug combinations from different sources. In contrast, Zheng et al.\nemployed solely the O’Neil dataset (39 cell lines, 38 drugs, 583 drug combinations), 51 which\nis known to be of high quality, 4,5 whereas Julkunen et al. employed solely the AstraZeneca\nDREAM dataset (85 cell lines, 118 drugs, 910 drug combinations). 
6\nRange Performance Comparison\nNext, we investigated whether certain inhibition ranges can be predicted more accurately\nthan others. Figure 4 shows the distribution of test MAEs for different inhibition intervals\nin range (−25, 100]. This range covers 99% of the training and test data. Predictions are (on\naverage) most accurate in the interval (0, 25] followed by the interval (−25, 0]. As the actual\ninhibition increases, the error increases as well. This could be explained by the amount of\navailable training data for each interval: Most data is located in the intervals (0, 25] (41%)\nand (−25, 0] (25%), while each of the other intervals is only covered by around 10% of the\ndata. In Supplementary Section 3 and Supplementary Figure 3, we provide further analysis\non how the amount of training data for individual drugs/combinations affects prediction\nperformance.\nData points with high inhibition represent cases where the drug treatment greatly reduced\nthe amount of viable cells, i.e., cases of effective treatment. Such data are commonly under-\nrepresented in drug screening datasets. 34,52,53 They are, however, of particular interest for\npersonalized therapy, where the most effective treatment options for a given patient should\nbe determined.\nThus, for monotherapies, we developed SAURON-RF, a random forest-based model that is\ndesigned to improve predictions of drug-sensitive samples for both classification and regres-\nsion. 
15,29 To this end, SAURON-RF relies (among other things) on sample-specific weights.\nConsequently, we also tried to incorporate sample weights into our models presented here\nto increase the importance of the underrepresented intervals. Unfortunately, the sample\nweights had only little impact on predictions, especially for the cases with highest inhibition\n(see Supplementary Figure 4).\n[Figure 4: boxplots of the absolute prediction error per setting (columns) and interval of actual relative inhibition (rows). MAE on top of each boxplot, given as Random Forest / Neural Network / Elastic Net:]\nInterval | OneHot | OneHotTar | MACCS | PhysChem\n(−25, 0] | 10.69 / 12.83 / 18.86 | 10.29 / 12.67 / 17.29 | 10.8 / 16.22 / 22.8 | 10.73 / 25.96 / 22.42\n(0, 25] | 9.02 / 8.59 / 8.78 | 8.75 / 10.26 / 9.35 | 8.37 / 13.78 / 15.13 | 8.3 / 12.93 / 15.2\n(25, 50] | 16.74 / 19.13 / 16.31 | 16.4 / 19.05 / 14.96 | 14.36 / 19.6 / 12.86 | 14.09 / 14.23 / 12.93\n(50, 75] | 21.25 / 24.35 / 31.74 | 21.02 / 22.92 / 28.89 | 18.69 / 18.64 / 27.35 | 19.05 / 38.87 / 27.18\n(75, 100] | 27.05 / 32.98 / 47.84 | 26.58 / 29.62 / 43.66 | 24.67 / 19.15 / 48.36 | 25.81 / 65.17 / 48.63\nFigure 4: Test set performance for different inhibition ranges. This figure shows the pre-\ndiction errors (in terms of absolute difference between actual and predicted values) for each\nsetting (columns) and each investigated ML algorithm (coloring). Each row shows the per-\nformance for a different interval of actual relative inhibitions. On top of each boxplot, the\nmean absolute error (MAE) is shown.\nCorrelation of Duplicated Entries\nAs discussed in the Methods section, for the MACCS and PhysChem settings, the same\ntreatment can be described by two different input representations through switching the\norder of the considered drugs (cf. Figure 1). Hence, we decided to include both input\nrepresentations into the training and test data of our models. Ideally, predictions for both\ninput representations should correlate well. Figure 5A shows the correlation of predictions\nfor the best-performing random forest model trained using MACCS fingerprints. As desired,\nboth predictions are highly correlated (PCC ≈ 1) and the mean absolute difference between\nthem is very small (0.8). Figure 5B shows the same analysis for a model where we removed\nthe duplicated entries from the training data. Even though the correlation is still high (PCC\n= 0.82), it decreased strongly, while prediction differences increased notably to 9.12 on\naverage. The mean PCCs per drug (for monotherapies) and per drug combination are 0.98\nand 0.97 for the duplicated training data and decrease to 0.78 and 0.86 for the non-duplicated\ntraining data, respectively. This is also represented in the test error where the model with\n
duplicated training entries achieved an MAE of 12.39 compared to 14.6 for non-duplicated\nentries. Similar trends can also be observed for the PhysChem setting (see Supplementary\nFigure 5).\n[Figure 5: density scatter plots of Prediction 1 (x-axis) vs. Prediction 2 (y-axis), both ranging from −25 to 100, colored by number of data points. Panel A (Duplicated): R = 1, p < 2.2e−16. Panel B (Not duplicated): R = 0.82, p < 2.2e−16.]\nFigure 5: Correlation of duplicated entries from the test data. 
This figure shows the cor-\nrelation between the model predictions for duplicated entries. Duplicated entries refer to\nthe same drug-drug-cell combination and the same treatment concentrations but can be\nrepresented by two different model inputs through swapping the features of the respective\ndrugs (cf. Methods and Figure 1). Sub-figure A shows the test predictions when including\nduplicated entries into the training data, while Sub-figure B shows the predictions when\ntraining only on non-duplicated entries. In both figures, the black diagonal line represents\nthe identity and R denotes the Pearson correlation between the predictions.\nReconstruction of Drug Sensitivity Measures\nA benefit of predicting concentration-specific inhibition values is that based on the model’s\npredictions, dose-response curves and matrices can be reconstructed. These can in turn\nbe used to compute various measures of drug sensitivity or synergy. Since the focus of this\npaper is on sensitivity prediction and Vlot et al. discourage the computation of arbitrary syn-\nergy scores on large-scale data, 13 we reconstructed two measures of drug sensitivity, namely\nour recently published measure called CMax viability for monotherapies, and a modification\nof this measure for drug combinations, which we call the combination CMax viability (cf.\nMethods). Unlike conventional sensitivity measures like the IC50 or AUC, the (combina-\ntion) CMax viability is comparable across drugs 15 and drug combinations. 
Consequently, it\ncan be used to prioritize drugs/combinations for a given cell line (i.e., rank them by their\neffectiveness), which will be investigated in the next section.\nFor the computation of monotherapy CMax viabilities, we first used the actual/predicted\nmonotherapy entries of the test data to generate actual/predicted dose-response-curves (cf.\nMethods). An example is shown in Figure 2, where we also highlight how the CMax viability\nis derived from the curves. In total, we were able to compute both the actual and predicted\nCMax viabilities for 7,352 out of 32,564 cell line-drug combinations. The decreased num-\nber of combinations stems from the fact that CMax concentrations were only available for\n77 of the investigated drugs. Figure 6 depicts the prediction errors for the reconstructed\nmonotherapy CMax viability values. The mean MAE averaged over all drugs is 0.12 and\nthe mean MSE is 0.04, which is comparable to the error we previously achieved when pre-\ndicting CMax viabilities directly using either the SAURON-RF algorithm by Lenhof et al. 29\n(MSE = 0.03) or a slightly adjusted version of DeepDR by Chiu et al. 54 (MSE = 0.09). 15 A\nbaseline error can be obtained from a model that for every treatment concentration predicts\nthe mean inhibition for each drug obtained from the training data. For such a model, the\nCMax viability (i.e., the viability at the CMax concentration) would also be predicted as\nthis mean. This would result in a baseline MAE of 0.2, which our model improves by 40%.\nThe overall PCC is 0.58 for the CMax viabilities and 0.41 for the baseline. However, the\ndrug-specific PCC is only 0.1 (cf. Figure 6B). While a drug-specific baseline PCC cannot\nbe computed for constant predictions, adding random noise with mean 0 to these constant\npredictions results in a baseline PCC of 0. Thus, our predictions improve this baseline but\nonly slightly. 
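The gap between the overall and the drug-specific PCC can be reproduced with a small toy example. The Python sketch below is an illustration with made-up numbers, not the paper's data: two hypothetical drugs differ strongly in their mean viability, and within each drug the predictions carry no information about the differences between cell lines. The pooled correlation is nevertheless high, while the mean per-drug correlation is zero. All names are hypothetical.

```python
import math


def pearson(xs, ys):
    # Pearson correlation; returns None if either input is constant.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def mean_per_group_pcc(records):
    # records: iterable of (group, actual, predicted), e.g. group = drug name.
    groups = {}
    for group, actual, predicted in records:
        pair = groups.setdefault(group, ([], []))
        pair[0].append(actual)
        pair[1].append(predicted)
    values = [pearson(a, p) for a, p in groups.values()]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)


# Toy data: (drug, actual viability, predicted viability) for three cell lines each.
records = [
    ('drug_x', 0.1, 0.2), ('drug_x', 0.2, 0.1), ('drug_x', 0.3, 0.2),
    ('drug_y', 0.7, 0.8), ('drug_y', 0.8, 0.7), ('drug_y', 0.9, 0.8),
]
pooled_pcc = pearson([r[1] for r in records], [r[2] for r in records])
per_drug_pcc = mean_per_group_pcc(records)
```

In this toy setting the pooled PCC exceeds 0.9 even though the model cannot distinguish cell lines treated with the same drug, which mirrors why a high overall correlation alone says little about cell line-specific accuracy.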
When using our models to reconstruct IC50 values, we observe a similar phe-\nnomenon (overall PCC = 0.71, mean PCC per drug = 0.01, cf. Supplementary Figure 6). To\ninvestigate the reasons for these low drug-specific correlations, we developed and evaluated\ndifferent hypotheses, which can be found in the Supplement. Based on our evaluation of\nthese hypotheses, we conclude that even though prediction errors are relatively small and\ncomparable to our previous work, the derived measures cannot be used to compare the effect\nof a drug monotherapy on different cell lines. For the combination CMax viability (26,946\ndrug-drug-cell line combinations), we obtained similar results, which are depicted in Figure\n6C and D.\n[Figure 6: distribution plots with panels (A) MAE per drug, (B) PCC per drug, (C) MAE per drug combination, (D) PCC per drug combination.]\nFigure 6: Reconstruction of (combination) CMax viabilities from predicted dose-response\ncurves/matrices. Sub-figures A and B (red) show the distribution of MAE and PCC per\ndrug for the reconstruction of CMax viabilities using dose-response curves fit on the test set\nmonotherapy data. Sub-figures C and D (blue) show the distribution of MAE and PCC per\ndrug combination for the reconstruction of combination CMax viabilities using dose-response\nmatrices derived from the test set drug combination data.\nNevertheless, we would like to highlight that such an evaluation of drug-specific correlations\nas conducted here is frequently not performed for drug sensitivity and synergy prediction\n(cf. 
Supplementary Table 3, where we compare the investigated settings and analyses for 39\nstate-of-the-art methods). Thus, similar problems may often go undetected.\nDue to the novelty of our prediction approach, there is no method we could directly compare\nour findings to. Nevertheless, our analyses presented earlier show that our models are com-\npetitive in performance to the approaches by Zheng et al. 5 and Julkunen et al. 18 for making\npredictions using known cell lines and drug combination data. Note that neither approach\nprovides drug-/combination-specific correlations.\nFor cell-blind evaluations on monotherapy data, we found three related approaches that\nprovide drug-specific correlations: Our recently published method SAURON-RF achieves a\nmean PCC of 0.56 when directly predicting CMax viabilities using drug-specific models.15 In\nthe same manuscript we also show that an adjusted version of the multi-drug model DeepDR\nby Chiu et al. 54 achieves a PCC of 0 for the same task. In comparison, Chawla et al. employ\nmulti-drug models for the prediction of IC50 values and achieve mean PCCs between ca. 0.18\nand 0.5 for different ML algorithms. Lastly, Rahman and Pal achieve mean PCCs between\n0.29 and 0.44 when reconstructing AUC values from predicted dose-response curves. While\nnot directly comparable to our approach, these works underline that at least weak to mod-\nerate drug-specific correlations can be achieved (1) for predicting CMax viabilities, (2) when\nusing multi-drug models, and (3) when deriving sensitivity measures from predicted curves. 
Yet,\nit remains to be investigated further if and how comparable results can be achieved when\ncombining all three factors and also considering combination therapies, thereby enabling\npredictions for arbitrary drugs/combinations and measures, which we aim to achieve here.\nTreatment Prioritization\nIn our final analysis, we investigate how accurately drugs and drug combinations can be\nprioritized for a given cell line based on the model predictions: For each cell line in the test\nset, we used the computed CMax viabilities for the monotherapy and combination data to\nobtain a ranking of drugs and drug combinations from most to least effective. Drug prioriti-\nzation is supposed to mimic a personalized treatment scenario with the goal of obtaining a list\nof the most effective treatment suggestions for a given patient. The results are shown in Figure\n7, where the first row shows the results for monotherapies only, while the second row shows\nthe results when combining mono- and combination therapies into one list. The results for\ncombination therapies only are shown in Supplementary Figure 8.\nFor monotherapies, the Spearman correlation coefficient (SCC) between the actual and pre-\ndicted rankings was 0.74 (baseline (as defined in the previous section): 0.54). Our predictions\nclearly outperform the baseline. 
Still, the baseline correlation is relatively high, indicating\nthat the differences in effectiveness between drugs are easier to predict than the differences\nbetween cell lines receiving the same treatment.\nWhile an accurate ranking for the entire list is desirable, one would typically place more\nemphasis on the correct identification of the most effective treatments. Thus, we computed\nthe mean overlap between the first k elements of the actual and predicted rankings. For\nmonotherapies, the average length of the predicted drug lists is 31.15. The average over-\nlap between the top k = 5 and k = 10 actual and predicted most effective drugs is 3.16\n(baseline: 2.14) and 7.68 (baseline: 6.55), respectively (results for further k are shown in\nSupplementary Figure 9). Furthermore, the median rank of the actually most effective drug\nin the predicted ranking is 2.5 (baseline: 8), and the median rank of the drug predicted to\nbe most effective in the actual list is 3 (baseline: 6). The median difference between the true\nCMax viabilities of the actual most effective and predicted most effective drugs is only 0.02\n(baseline: 0.31).\nThe second row of Figure 7 shows the analogous prioritization results when combining mono-\nand combination treatments into one list. The SCC of 0.76 (baseline: 0.62) is comparable to\nthe results for monotherapies. Since the average list length is much greater when including\ndrug combinations (838.62), the overlaps at k = 5 (1.26, baseline: 0.68) and k = 10 (3.38,\nbaseline: 2.09) are lower (cf. also Supplementary Figures 9 and 10). Furthermore, the me-\ndian rank of the actually best treatment in the predicted list (27, baseline: 170.5) and of the\npredicted best treatment in the actual list (9.5, baseline: 12) are worse (i.e., numerically\nlarger) than for monotherapies. Still, results clearly\nimprove over the baseline. 
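The prioritization metrics used in this section can be sketched as follows for a single cell line. This is a minimal Python illustration under stated assumptions, not the evaluation code of the paper: treatments are ranked by ascending CMax viability (a lower viability meaning a more effective treatment), ties are broken alphabetically, the Spearman formula shown is only valid without tied ranks, and the treatment names and values are hypothetical.

```python
def rank_by_effectiveness(viabilities):
    # viabilities: dict mapping treatment -> (combination) CMax viability.
    # Returns treatments ordered from most to least effective.
    return sorted(viabilities, key=lambda name: (viabilities[name], name))


def top_k_overlap(actual, predicted, k):
    # Size of the intersection of the k most effective treatments in both rankings.
    top_actual = set(rank_by_effectiveness(actual)[:k])
    top_predicted = set(rank_by_effectiveness(predicted)[:k])
    return len(top_actual & top_predicted)


def spearman_no_ties(actual, predicted):
    # Spearman correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); assumes no ties.
    actual_rank = {t: r for r, t in enumerate(rank_by_effectiveness(actual), 1)}
    predicted_rank = {t: r for r, t in enumerate(rank_by_effectiveness(predicted), 1)}
    n = len(actual_rank)
    d_squared = sum((actual_rank[t] - predicted_rank[t]) ** 2 for t in actual_rank)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def best_treatment_summary(actual, predicted):
    actual_order = rank_by_effectiveness(actual)
    predicted_order = rank_by_effectiveness(predicted)
    actual_best = actual_order[0]
    predicted_best = predicted_order[0]
    return {
        'pred_rank_of_actual_best': predicted_order.index(actual_best) + 1,
        'actual_rank_of_pred_best': actual_order.index(predicted_best) + 1,
        # Difference in true viability between the predicted and the actual best.
        'viability_gap': actual[predicted_best] - actual[actual_best],
    }


# Toy example for one cell line with four hypothetical treatments.
actual = {'A': 0.10, 'B': 0.30, 'C': 0.50, 'D': 0.70}
predicted = {'A': 0.25, 'B': 0.20, 'C': 0.60, 'D': 0.55}
summary = best_treatment_summary(actual, predicted)
```

In the paper these quantities are computed per cell line and then summarized across cell lines (mean overlap, median ranks, median viability difference); the sketch covers only the per-cell-line step.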
Furthermore, the median difference in viability between the actually\nmost effective treatment and the treatment predicted to be most effective remains small\n(0.02, baseline: 0.03).\n[Figure 7: panels A to L; y-axes: SCC, overlap of actual/predicted lists (k = 5, k = 10), predicted rank of actual best drug/treatment, actual rank of predicted best drug/treatment, difference in CMax viability between actual and predicted best drug/treatment]\nFigure 7: Treatment prioritization. This figure depicts the test set prioritization results for\nmono- and combination therapies. Sub-figures A to F (red) focus on the prioritization of\nmonotherapies including: (A) the SCC between the actual and predicted rankings for each\ncell line, (B)/(C) the intersection size between the 5/10 actual and predicted most effective\ntreatments, (D) the predicted rank of the actual most effective treatment, (E) the actual\nrank of the treatment predicted to be most effective, and (F) the difference between the\nactual CMax viabilities for the actual and predicted most effective treatment. 
Sub-figures G\nto L (blue) show the analogous prioritization results when combining mono- and combination\ntreatments into one list.\nDiscussion\nAdministering not only single but multiple drugs in combination is common in cancer treatment.\nHowever, while drug response datasets for monotherapy have been available for\nmore than a decade, large-scale datasets for combination therapy have only become publicly\navailable more recently, e.g., the DrugComb database. 4,5 While the DrugComb data have\nbeen studied extensively for the prediction of drug synergy, they are still underused for the\nprediction of drug sensitivity, especially with a focus on making personalized treatment\nrecommendations. For this application case, we found the scores that are widely used for\nsynergy prediction less suited for various reasons discussed in this manuscript.\nTo exploit the available drug combination data for predicting drug responses without relying\non synergy scores, we developed and evaluated several ML algorithms and architectures\nthat directly predict concentration-specific drug responses in the form of relative inhibitions.\nWe are convinced that this approach has various benefits for personalized treatment recommendation:\nFirst, our approach allows the reconstruction of dose-response curves and\nmatrices from the model predictions. From these curves/matrices, various sensitivity or synergy\nmeasures can be reconstructed. The inspection of individual curves/matrices can aid\nin validating the underlying assumptions for certain measures. Next, our approach can predict\nboth mono- and combination therapies. 
Additionally, our approach allows for making\npredictions for unseen cell lines, thereby mimicking the scenario of assessing drug responses\nfor a new patient. Together with our novel sensitivity measure, the (combination) CMax\nviability, this framework finally enables the prioritization of both mono- and combination\ntherapy options for unseen cell lines (patients).\nOur evaluations on the DrugComb database show that our models substantially improve over\nbaseline models and show very little variation when predicting the same treatment using\ndifferent input representations. Notably, we evaluated our models on unseen cell lines, which\nis often neglected in drug sensitivity prediction. 19 Moreover, our models are also competitive\nwith state-of-the-art approaches when making predictions for known cell lines. Furthermore,\nwe achieved strong correlations for treatment prioritization. However, our analyses also reveal\nweaknesses of directly predicting relative inhibitions: While prediction errors for the\nreconstruction of drug response measures are competitive with other approaches, the drug-specific\ncorrelations between these measures only slightly improve over a baseline model.\nAdditionally, we observed increased prediction errors for data samples with high inhibitions,\ncorresponding to cases of treatment sensitivity. This issue is relatively well-known for classification\nbut has rarely been discussed or addressed for regression. 
29,34\nThree main factors can be adjusted to potentially address such challenges, namely the choice\nof ML algorithm, the choice and representation of input features, and the data used:\nML algorithm: We investigated neural networks (highly popular for sensitivity and synergy\nprediction), random forests, and elastic nets. In our recently published benchmarking,\nwe found both elastic nets and random forests to outperform neural networks when predicting\ndrug sensitivity. 24 For the prediction of inhibitions, as investigated here, random forests\nare superior to the other algorithms. In general, a plethora of further (potentially more\nsophisticated) approaches can be used to model the prediction of inhibitions. However, as\ndiscussed in our benchmarking 24 and also by Li et al., 55 more complex approaches are not\nnecessarily superior to simpler ML algorithms, and careful evaluation is required to ensure\na fair performance comparison.\nInput features and representation: For the characterization of cell lines in the model\ninput, several sources found gene expression to be the most informative omics-type for predicting\ndrug responses. 54,56,57 However, the inclusion of further omics or a priori knowledge,\ne.g., known sensitivity biomarkers or protein interactions, might improve predictions.\nSimilarly, further drug properties, e.g., Morgan fingerprints, 58 could be investigated, or graph\nneural networks could be employed to represent drugs as molecular graphs. However, the\nsuperiority of molecular graphs over conventional drug fingerprints for sensitivity/synergy\nprediction and drug discovery has been questioned. 27,59\n
Dataset: With 947 cell lines, 265 drugs, and 9,535 drug combinations, the dataset investigated\nhere is notably larger than those used by other approaches working on drug combination\ndata. 5,12,14,18,49,60,61 Unfortunately, given the size of the investigated dataset, hardware restrictions\nbecome a limiting factor for ML. Despite training models on a compute cluster\nwith machines of 500 gigabytes of working memory, we had to reduce the\nnumber of considered drugs, features, and methods (cf. Methods).\nGenerally, a large amount of training data benefits model training and robustness. Yet, if\nthe dataset is heterogeneous, e.g., due to different data sources, as is the case for DrugComb,\nthis may decrease performance compared to models built and evaluated on a more homogeneous\ndataset. Even though Zagidullin et al. found the reproducibility between replicates\nfrom different datasets satisfactory in the first release of DrugComb, 4 disagreement between\ndrug response data from different sources is a well-known problem. 57,62,63 Especially for clinical\napplications, combining data from different sources (e.g., different hospitals) is essential,\nand models should be able to cope with this degree of heterogeneity. To this end, meta- or\ntransfer-learning methods could be leveraged. 64\nInvestigating different ML algorithms, input representations, and datasets can potentially\nimprove the predictive performance. However, especially in a sensitive field such as personalized\nmedicine, performance alone should not be regarded as the sole building block of\nmodel trustworthiness. 65 For example, to assess the reliability of individual predictions, uncertainty\nestimation frameworks like conformal prediction could be applied. 
15,66–68 Additionally, incorporating\ninterpretability mechanisms 12,65,69 into the model design and evaluation can aid\nin identifying drug or cell line properties that impact the predicted response. This could not\nonly make predictions more comprehensible but also be useful to infer novel mechanisms of\ndrug sensitivity or synergy.\nData and Software Availability\nThe drug response data used for our analyses can be downloaded from the DrugComb website\n(https://drugcomb.org/download) and the DrugComb API (https://api.drugcomb.org/,\ncf. Methods). The gene expression data of cancer cell lines can be downloaded from\nthe GDSC website (https://www.cancerrxgene.org/downloads/bulk_download). CMax\nconcentrations for 77 of the investigated drugs can be derived from Liston and Davis. 20\nOur code is available on GitHub (https://github.com/unisb-bioinf/Drug_Combination_Sensitivity_Prediction),\nwhere we also provide the SMILES, MACCS fingerprints and\nphysico-chemical properties derived from RDKit, 21 as well as the one-hot encoded target\nmolecules of the investigated compounds.\nReferences\n(1) Yang, W.; Soares, J.; Greninger, P.; Edelman, E. J.; Lightfoot, H.; Forbes, S.;\nBindal, N.; Beare, D.; Smith, J. A.; Thompson, I. R.; others Genomics of Drug Sensitivity\nin Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer\ncells. Nucleic acids research 2012, 41, D955–D961.\n(2) Iorio, F.; Knijnenburg, T. A.; Vis, D. J.; Bignell, G. R.; Menden, M. P.; Schubert, M.;\nAben, N.; Gonçalves, E.; Barthorpe, S.; Lightfoot, H.; others A landscape of pharmacogenomic\ninteractions in cancer. 
Cell 2016, 166, 740–754.\n(3) Mokhtari, R. B.; Homayouni, T. S.; Baluch, N.; Morgatskaya, E.; Kumar, S.; Das, B.;\nYeger, H. Combination therapy in combating cancer. Oncotarget 2017, 8, 38022.\n(4) Zagidullin, B.; Aldahdooh, J.; Zheng, S.; Wang, W.; Wang, Y.; Saad, J.; Malyutina, A.;\nJafari, M.; Tanoli, Z.; Pessia, A.; others DrugComb: an integrative cancer drug combination\ndata portal. Nucleic acids research 2019, 47, W43–W51.\n(5) Zheng, S.; Aldahdooh, J.; Shadbahr, T.; Wang, Y.; Aldahdooh, D.; Bao, J.; Wang, W.;\nTang, J. DrugComb update: a more comprehensive drug sensitivity data repository\nand analysis portal. Nucleic acids research 2021, 49, W174–W184.\n(6) Menden, M. P.; Wang, D.; Mason, M. J.; Szalai, B.; Bulusu, K. C.; Guan, Y.; Yu, T.;\nKang, J.; Jeon, M.; Wolfinger, R.; others Community assessment to advance computational\nprediction of cancer drug combinations in a pharmacogenomic screen. Nature\ncommunications 2019, 10, 2674.\n(7) Torkamannia, A.; Omidi, Y.; Ferdousi, R. A review of machine learning approaches for\ndrug synergy prediction in cancer. Briefings in Bioinformatics 2022, 23, bbac075.\n(8) Yadav, B.; Wennerberg, K.; Aittokallio, T.; Tang, J. Searching for drug synergy in\ncomplex dose–response landscapes using an interaction potency model. Computational\nand structural biotechnology journal 2015, 13, 504–513.\n(9) Loewe, S. The problem of synergism and antagonism of combined drugs. Arzneimittelforschung\n1953, 3, 285–290.\n(10) Bliss, C. I. The toxicity of poisons applied jointly. Annals of applied biology 1939,\n26, 585–615.\n(11) Berenbaum, M. C. What is synergy? 
Pharmacological reviews 1989, 41, 93–141.\n(12) Janizek, J. D.; Celik, S.; Lee, S.-I. Explainable machine learning prediction of synergistic\ndrug combinations for precision cancer medicine. bioRxiv 2018, 331769.\n(13) Vlot, A. H.; Aniceto, N.; Menden, M. P.; Ulrich-Merzenich, G.; Bender, A. Applying\nsynergy metrics to combination screening data: agreements, disagreements and pitfalls.\nDrug discovery today 2019, 24, 2286–2298.\n(14) Malyutina, A.; Majumder, M. M.; Wang, W.; Pessia, A.; Heckman, C. A.; Tang, J. Drug\ncombination sensitivity scoring facilitates the discovery of synergistic and efficacious\ndrug combinations in cancer. PLoS computational biology 2019, 15, e1006752.\n(15) Lenhof, K.; Eckhart, L.; Rolli, L.-M.; Volkamer, A.; Lenhof, H.-P. Reliable anti-cancer\ndrug sensitivity prediction and prioritization. Scientific Reports 2024, 14, 12303.\n(16) Rahman, R.; Pal, R. Analyzing drug sensitivity prediction based on dose response\ncurve characteristics. IEEE-EMBS International Conference on Biomedical and Health\nInformatics (BHI). 2016; pp 140–143.\n(17) Rahman, R.; Dhruba, S. R.; Ghosh, S.; Pal, R. Functional random forest with applications\nin dose-response predictions. Scientific reports 2019, 9, 1628.\n(18) Julkunen, H.; Cichonska, A.; Gautam, P.; Szedmak, S.; Douat, J.; Pahikkala, T.; Aittokallio,\nT.; Rousu, J. Leveraging multi-way interactions for systematic prediction of\npre-clinical drug combination effects. Nature communications 2020, 11, 6136.\n(19) Codicè, F.; Pancotti, C.; Rollo, C.; Moreau, Y.; Fariselli, P.; Raimondi, D. 
The Specification\nGame: Rethinking the Evaluation of Drug Response Prediction for Precision\nOncology. bioRxiv 2024, 2024–10.\n(20) Liston, D. R.; Davis, M. Clinically Relevant Concentrations of Anticancer Drugs: A\nGuide for Nonclinical Studies. Clinical\ncancer research 2017, 23, 3489–3498.\n(21) Landrum, G.; others RDKit: Open-source cheminformatics. version 2023.3.2.\n(22) Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL keys\nfor use in drug discovery. Journal of chemical information and computer sciences 2002,\n42, 1273–1280.\n(23) Landrum, G.; others RDKit Documentation - rdkit.Chem.Descriptors module.\nhttps://www.rdkit.org/docs/source/rdkit.Chem.Descriptors.html, Accessed: 2024-11-11.\n(24) Eckhart, L.; Lenhof, K.; Rolli, L.-M.; Lenhof, H.-P. A comprehensive benchmarking of\nmachine learning algorithms and dimensionality reduction methods for drug sensitivity\nprediction. Briefings in bioinformatics 2024, 25.\n(25) Fudio, S.; Sellers, A.; Pérez Ramos, L.; Gil-Alberdi, B.; Zeaiter, A.; Urroz, M.; Carcas,\nA.; Lubomirov, R. Anti-cancer drug combinations approved by US FDA from 2011\nto 2021: main design features of clinical trials and role of pharmacokinetics. Cancer\nChemotherapy and Pharmacology 2022, 90, 285–299.\n(26) Baptista, D.; Ferreira, P. G.; Rocha, M. Deep learning for drug response prediction in\ncancer. Briefings in Bioinformatics 2020, 22, 360–379.\n(27) An, X.; Chen, X.; Yi, D.; Li, H.; Guan, Y. Representation of molecules for drug response\nprediction. 
Briefings in Bioinformatics 2021, 23, bbab393.\n(28) Chen, Y.; Zhang, L. How much can deep learning improve prediction of the responses\nto drugs in cancer cell lines? Briefings in bioinformatics 2022, 23, bbab378.\n(29) Lenhof, K.; Eckhart, L.; Gerstner, N.; Kehl, T.; Lenhof, H.-P. Simultaneous regression\nand classification for drug sensitivity prediction using an advanced random forest\nmethod. Scientific Reports 2022, 12, 13458.\n(30) Rahman, R.; Matlock, K.; Ghosh, S.; Pal, R. Heterogeneity aware random forest for\ndrug sensitivity prediction. Scientific reports 2017, 7, 1–11.\n(31) Su, R.; Liu, X.; Wei, L.; Zou, Q. Deep-Resp-Forest: a deep forest model to predict\nanti-cancer drug response. Methods 2019, 166, 91–102.\n(32) Oskooei, A.; Manica, M.; Mathis, R.; Martínez, M. R. Network-based biased tree ensembles\n(NetBiTE) for drug sensitivity prediction and drug sensitivity biomarker identification\nin cancer. Scientific reports 2019, 9, 15918.\n(33) Fang, Y.; Xu, P.; Yang, J.; Qin, Y. A quantile regression forest based method to predict\ndrug response and assess prediction reliability. PLoS One 2018, 13, e0205155.\n(34) Basu, A.; Mitra, R.; Liu, H.; Schreiber, S. L.; Clemons, P. A. RWEN: response-weighted\nelastic net for prediction of chemosensitivity of cancer cell lines. Bioinformatics 2018,\n34, 3332–3339.\n(35) Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform\ndeep learning on typical tabular data? Advances in neural information processing\nsystems 2022, 35, 507–520.\n(36) Shwartz-Ziv, R.; Armon, A. Tabular data: Deep learning is not all you need. 
Information\nFusion 2022, 81, 84–90.\n(37) Borisov, V.; Leemann, T.; Seßler, K.; Haug, J.; Pawelczyk, M.; Kasneci, G. Deep Neural\nNetworks and Tabular Data: A Survey. IEEE Transactions on Neural Networks and\nLearning Systems 2024, 35, 7499–7519.\n(38) Smith, A. M.; Walsh, J. R.; Long, J.; Davis, C. B.; Henstock, P.; Hodge, M. R.; Maciejewski,\nM.; Mu, X. J.; Ra, S.; Zhao, S.; others Standard machine learning approaches\noutperform deep representation learning on phenotype prediction from transcriptomics\ndata. BMC bioinformatics 2020, 21.\n(39) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel,\nM.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; others Scikit-learn: Machine learning\nin Python. Journal of Machine Learning Research 2011, 12, 2825–2830.\n(40) Abadi, M.; others TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.\n2015; https://www.tensorflow.org/.\n(41) Ritz, C.; Baty, F.; Streibig, J. C.; Gerhard, D. Dose-Response Analysis Using R. PLOS\nONE 2015, 10.\n(42) drc: Analysis of Dose-Response Curves. https://cran.r-project.org/web/packages/drc/drc.pdf,\nAccessed: 2024-11-11.\n(43) Vis, D. J.; Bombardelli, L.; Lightfoot, H.; Iorio, F.; Garnett, M. J.; Wessels, L. F.\nMultilevel models improve precision and speed of IC50 estimates. Pharmacogenomics\n2016, 17, 691–700.\n(44) Wellcome Sanger Institute, GDSC database Resources Download - IC50 Data definitions.\nhttps://cog.sanger.ac.uk/cancerrxgene/GDSC_release8.5/GDSC_Fitted_Data_Description.pdf,\n2024; Accessed: 2024-11-11.\n(45) Ianevski, A.; Giri, A. 
K.; Gautam, P.; Kononov, A.; Potdar, S.; Saarela, J.; Wennerberg,\nK.; Aittokallio, T. Prediction of drug combination effects with a minimal set of\nexperiments. Nature machine intelligence 2019, 1, 568–577.\n(46) Borchers, H. W. pracma: Practical Numerical Math Functions. 2022; R package version\n2.4.2.\n(47) Lederer, S.; Dijkstra, T. M.; Heskes, T. Additive dose response models: explicit formulation\nand the loewe additivity consistency condition. Frontiers in pharmacology 2018,\n9, 31.\n(48) Greco, W. R.; Bravo, G.; Parsons, J. C. The search for synergy: a critical review from\na response surface perspective. Pharmacological Reviews 1995, 47, 331–385.\n(49) Preuer, K.; Lewis, R. P.; Hochreiter, S.; Bender, A.; Bulusu, K. C.; Klambauer, G.\nDeepSynergy: predicting anti-cancer drug synergy with Deep Learning. Bioinformatics\n2018, 34, 1538–1546.\n(50) Liu, P.; Li, H.; Li, S.; Leung, K.-S. Improving prediction of phenotypic drug response\non cancer cell lines using deep convolutional network. BMC Bioinformatics 2019, 20.\n(51) O’Neil, J.; Benita, Y.; Feldman, I.; Chenard, M.; Roberts, B.; Liu, Y.; Li, J.; Kral, A.;\nLejnine, S.; Loboda, A.; others An unbiased oncology compound screen to identify\nnovel combination strategies. Molecular cancer therapeutics 2016, 15, 1155–1162.\n(52) Knijnenburg, T. A.; Klau, G. W.; Iorio, F.; Garnett, M. J.; McDermott, U.; Shmulevich,\nI.; Wessels, L. F. Logic models to predict continuous outputs based on binary\ninputs with an application to personalized cancer therapy. 
Scientific reports 2016, 6,\n1–14.\n(53) Lenhof, K.; Gerstner, N.; Kehl, T.; Eckhart, L.; Schneider, L.; Lenhof, H.-P. MERIDA:\na novel Boolean logic-based integer linear program for personalized cancer therapy.\nBioinformatics 2021, 37, 3881–3888.\n(54) Chiu, Y.-C.; Chen, H.-I. H.; Zhang, T.; Zhang, S.; Gorthi, A.; Wang, L.-J.; Huang, Y.;\nChen, Y. Predicting drug response of tumors from integrated genomic profiles by deep\nneural networks. BMC medical genomics 2019, 12, 143–155.\n(55) Li, Y.; Hostallero, D. E.; Emad, A. Interpretable deep learning architectures for improving\ndrug response prediction performance: myth or reality? Bioinformatics 2023,\n39, btad390.\n(56) Costello, J. C.; Heiser, L. M.; Georgii, E.; Gönen, M.; Menden, M. P.; Wang, N. J.;\nBansal, M.; Ammad-Ud-Din, M.; Hintsanen, P.; Khan, S. A.; others A community\neffort to assess and improve drug sensitivity prediction algorithms. Nature biotechnology\n2014, 32, 1202–1212.\n(57) Jang, I. S.; Neto, E. C.; Guinney, J.; Friend, S. H.; Margolin, A. A. Biocomputing 2014;\nWorld Scientific, 2014; pp 63–74.\n(58) Rogers, D.; Hahn, M. Extended-connectivity fingerprints. Journal of chemical information\nand modeling 2010, 50, 742–754.\n(59) Jiang, D.; Wu, Z.; Hsieh, C.-Y.; Chen, G.; Liao, B.; Wang, Z.; Shen, C.; Cao, D.;\nWu, J.; Hou, T. Could graph neural networks learn better molecular representation\nfor drug discovery? A comparison study of descriptor-based and graph-based models.\nJournal of cheminformatics 2021, 13, 1–23.\n(60) Li, X.; Xu, Y.; Cui, H.; Huang, T.; Wang, D.; Lian, B.; Li, W.; Qin, G.; Chen, L.;\nXie, L. 
Prediction of synergistic anti-cancer drug combinations based on drug target\nnetwork and drug induced gene expression profiles. Artificial intelligence in medicine\n2017, 83, 35–43.\n(61) Sidorov, P.; Naulaerts, S.; Ariey-Bonnet, J.; Pasquier, E.; Ballester, P. J. Predicting\nsynergism of cancer drug combinations using NCI-ALMANAC data. Frontiers in chemistry\n2019, 7, 509.\n(62) Haibe-Kains, B.; El-Hachem, N.; Birkbak, N. J.; Jin, A. C.; Beck, A. H.; Aerts, H. J.;\nQuackenbush, J. Inconsistency in large pharmacogenomic studies. Nature 2013, 504,\n389–393.\n(63) Hatzis, C.; Bedard, P. L.; Birkbak, N. J.; Beck, A. H.; Aerts, H. J.; Stern, D. F.; Shi, L.;\nClarke, R.; Quackenbush, J.; Haibe-Kains, B. Enhancing reproducibility in cancer drug\nscreening: how do we move forward? Cancer research 2014, 74, 4016–4023.\n(64) Sharifi-Noghabi, H.; Peng, S.; Zolotareva, O.; Collins, C. C.; Ester, M. AITL: adversarial\ninductive transfer learning with input and output space adaptation for pharmacogenomics.\nBioinformatics 2020, 36, i380–i388.\n(65) Lenhof, K.; Eckhart, L.; Rolli, L.-M.; Lenhof, H.-P. Trust me if you can: a survey\non reliability and interpretability of machine learning approaches for drug sensitivity\nprediction in cancer. Briefings in Bioinformatics 2024, 25, bbae379.\n(66) Angelopoulos, A. N.; Bates, S. A gentle introduction to conformal prediction and\ndistribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511 2021.\n(67) Norinder, U.; Carlsson, L.; Boyer, S.; Eklund, M. Introducing conformal prediction\nin predictive modeling. 
A transparent and flexible alternative to applicability domain\ndetermination. Journal of chemical information and modeling 2014, 54, 1596–1603.\n(68) Alvarsson, J.; McShane, S. A.; Norinder, U.; Spjuth, O. Predicting with confidence:\nusing conformal prediction in drug discovery. Journal of Pharmaceutical Sciences 2021,\n110, 42–49.\n(69) Tang, Y.-C.; Gottlieb, A. Explainable drug sensitivity prediction through cancer pathway\nenrichment. Scientific reports 2021, 11, 1–10.","source_license":"CC-BY-4.0","license_restricted":false}