A Machine Learning Platform for Interconnecting Antibody-Drug Conjugate Cytotoxic Design with Tumor Cell Biology

doi:10.21203/rs.3.rs-6256038/v1


2025 · doi:10.21203/rs.3.rs-6256038/v1

preprint OA: closed CC-BY-4.0



Jeffrey Leyton, Hazem Mslati, Gael Coulombe, Mehdi Ezzine, Tiana Yuen, and 1 more

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-6256038/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted (Version 1)

Abstract

Antibody-drug conjugates (ADCs) represent a significant advancement in therapeutic oncology, as they precisely deliver cytotoxic drugs to target tumor cells. However, ADC development is complex due to the entangled interplay between chemical design and tumor cell biology. Therefore, a platform was developed consisting of an ADC-tumor cell interconnected multimodal framework for machine learning applications.
It contains ADC records from the past two decades that detail linkers, payloads, drug-antibody ratios, and cytotoxicity IC50 values. Biological interconnection was achieved by integrating omics data from ~1,400 human tumor cell lines. Moreover, a protein intensity prediction tool was developed that further enriched the multifaceted framework by concentrating on cell surface antigens. A deep learning model was trained on the framework and accurately predicted ADC in vitro activity across tumor cell lines at relevant nanomolar thresholds. This work exposes the complexities at the ADC-tumor cell interface and can significantly influence current empirical ADC design decisions.

Biological sciences/Drug discovery/Drug screening/Virtual screening
Biological sciences/Computational biology and bioinformatics/Data integration

Introduction

Antibody-drug conjugates (ADCs) have transformed cancer therapy by enabling targeted drug delivery and improving efficacy over traditional chemotherapy. However, their development remains highly inefficient because of the intricate interplay among their structural components: ADCs combine monoclonal antibodies (mAbs), specialized linkers, and cytotoxic payloads at optimized drug-antibody ratios (DARs). Beyond structural complexity, ADCs rely on a highly coordinated mechanism of action. They must selectively bind to overexpressed antigens on the surface of tumor cells, undergo rapid internalization, and traffic through the endosomal-lysosomal system for degradation and payload release 1. Unfortunately, the ADC-tumor cell interplay is also highly intricate, as these processes vary not only between tumor and normal cells but also across tumor types and are therefore not fully understood 2–4. ADC research faces high attrition rates in early development stages 5, with many candidates also failing in clinical trials 6.
A significant challenge in developing effective ADCs stems from an incomplete understanding of the intricate interconnectedness between chemical design and cellular biology. ADCs must navigate a highly variable biological landscape, including cellular differences in antigen expression and endosomal-lysosomal processing pathways. This variability often leads to suboptimal therapeutic efficacy or increased off-target toxicity. Even with ADCs in clinical trials and U.S. FDA approvals on the rise, the recent withdrawal of two commercial products due to post-marketing inefficacy 7 , 8 underscores the need to address these gaps. A pressing fundamental challenge lies in refining antigen selection criteria and improving knowledge on the ADC relationship with internalization kinetics and lysosomal targeting efficiencies. Successful ADC design hinges on the expression profile of the cell surface target antigen and the subsequent processing governed by the endosomal-lysosomal machinery. Typically, ADC development prioritizes antigens with high tumor expression, as greater antigen density enhances ADC binding and payload accumulation 9 , 10 . However, high antigen expression alone does not guarantee success. First, the minimum antigen expression threshold required for good tumor selectivity remains unclear, as many tumor-associated antigens are also present on normal tissues, raising concerns about off-target toxicity. Second, overexpression of target antigens for specific cancer types has been shown to inhibit apoptosis 11 , suggesting that ADC efficacy may be enhanced by targeting tumors with relatively lower antigen expression levels. Beyond absolute and relative antigen expression levels, antigen internalization kinetics and endocytic pathways are important to understand 12 , not only between tumor and normal cells but also across different tumor types. However, antigen internalization is not well understood 12 . 
Even for well-established ADC targets like HER2, variations in internalization rates and intracellular routing can alter therapeutic efficacy between tumor types 13 . ADCs must then navigate the complex endosomal system to reach lysosomes, where degradation releases the cytotoxic payload. However, this pathway is highly variable among tissue types, which is compounded in cancer. The endosomal network consists of multiple compartments and their organization and dynamics are regulated by select members of the Rab proteins, the largest family of monomeric GTPases, and their effectors 14 , 15 . Many components of the endosomal machinery are mutated or have altered expression in several cancers impairing ADC lysosomal targeting 3 , 4 . Additionally, lysosomal dysfunction, including alkalization due to aberrant vacuolar H + ATPase activity, has been a significant factor linked to ADC resistance 16 – 18 . Despite advances in linker chemistry designed to exploit endosomal-lysosomal conditions for payload release 19 , these significant biological variabilities complicate ADC optimization. Adding to the complexity of ADC development is the strikingly high redundancy in design, where the vast majority of ADCs rely on a narrow range of structural components interfacing with an even narrower range of biological elements. Among ~ 6,500 reported ADCs, the diversity of the structural components is approximately 18%, 8%, and 7% for the mAb, linker, and payload, respectively 5 . Furthermore, the diversity on the biological side such as target antigen and intracellular target is 5% and < 1%, respectively 5 . Diversity is further reduced for these elements for ADCs in the clinic 20 . Moreover, the empirical nature of ADC development is insufficient for systematically evaluating efficacy across all available tumor cell lines/xenograft models representing the diverse and numerous types of cancer. 
This often leads to the premature elimination of potentially viable candidates or the advancement of ADCs with hidden suboptimal efficacy. As a result, the ADC field is currently at a crossroads – simply switching the mAb to universally broaden ADC applicability across multiple cancer types has proven to be both overly simplified and an economically unsustainable strategy. Simultaneously, the empirical development of novel ADCs remains constrained by a lack of comprehensive and systematic evaluation methods. Consequently, the field continues to struggle with high attrition rates and unpredictable clinical outcomes, increasing research and development costs. Addressing these challenges necessitates the creation of innovative, data-driven approaches that integrate structural and biological insights to optimize ADC design and enhance the possibility of clinical success. This work introduces a machine learning (ML)-powered ADC design platform based on a multimodal framework interconnecting ADC structures and activity with tumor cell biology that accurately predicts ADC in vitro activities across tumor cell lines (Fig. 1). The platform is built on three synergistic components. The first is the ADCpedia database that contains over two decades of scientific literature on ADC chemical structure-activity relationships. ADCpedia also focuses on tumor cell biology by including comprehensive transcriptomic and proteomic data for ~ 1,400 human tumor cell lines 21 . The second component is the gene cell expression prediction (GENCEP) model, which is a late fusion neural network and by integrating multiple tumor cell features provides a universal and standardized system to quantify protein intensities. The third component is the ADC multimodal (AMM) model, which is based on a multimodal convolutional neural network (CNN) with a processing system tailored for the ADC-tumor cell interface. 
As a proof of concept, the training focused on target antigen protein expression across all tumor cell lines and at clinically relevant nanomolar (nM) ADC cytotoxicity IC50 values. The ADC design platform demonstrated accurate predictions of ADC in vitro activities across diverse tumor cell lines expressing various target antigen intensities. Strikingly, accurate prediction was also achieved for blinded real-world ADCs tested against tumor cell lines, where both the ADCs and the cell lines were absent from the training set because they had not yet been reported in the literature. The rich multifaceted framework also exposed the complexities between ADC design based on different payloads and linker types and their cytotoxic activities across target antigens and tumor types. This work potentially realizes the development of a robust platform that links ADC activity and complex tumor cell biology using a data-centric yet holistic multimodal approach. This platform also represents a potentially transformative solution for addressing the challenges of ADC design, as it is a first-of-a-kind system that integrates chemical and cellular insights, architecturally organized for deep learning (DL), to offer researchers accurate predictions and insights into tumor-specific ADC efficacy. Hence, this platform has the potential to become a new standard for precision ADC development by streamlining and increasing early decision confidence for novel candidate ADC design and for repurposing efforts for past ADCs, which ultimately will decrease resource-intensive research efforts.

Results

Database building

Although previous curations of ADC structure-activity databases were created by extracting as much information as possible, these inherently leave large gaps.
Databases built by searching multiple patent organizations and extracting information from the R&D pipelines listed on the websites of pharmaceutical companies 5 inherently require significant inference of structure and/or activity information for a given ADC or leave gaps in the database, a concern that is detrimental to training 22. Additionally, such extensive searches most likely identify ADCs developed very early in the field that are not representative of today's much more refined field. Therefore, the curation process for ADCpedia focused on preclinically and clinically validated ADCs listed in the ADC DrugMap 23. Surprisingly, the ADC DrugMap listed structures that do not traditionally constitute an ADC, such as biparatopic antibodies and α-particle emitting payloads 23. Therefore, the database was refined based on standardized inclusion criteria for ADC structure. Briefly, inclusion criteria consisted of only full monospecific mAbs, hetero-bifunctional chemical linkers, and traditional small molecule/anti-mitotic peptide payloads, akin to chemotherapeutic activities. A targeted search was then performed on these ADCs in the literature between January 2000 and April 15, 2024. A key second inclusion layer was that ADC structural information had to be accompanied by cytotoxicity IC50 values to ensure completeness in the database. Also, to mitigate inconsistencies in the methodologies used to determine quantitative IC50 values, only studies reporting cytotoxicity assays conducted in multi-well plate formats, with single time-point incubation periods (e.g., 72 h, 120 h, etc.), ADC concentration-response relationships, and readouts based on fluorescence- or luminescence-based reagents were included in the database.
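The inclusion criteria described above amount to a record-level filter. The following is a minimal sketch of such a filter, not the authors' curation code: the `AdcRecord` fields, the category labels, and the example records are all hypothetical, and the linker criterion (hetero-bifunctional chemical linkers) is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical category labels; the source describes the criteria only in prose.
ALLOWED_ANTIBODY_FORMATS = {"monospecific_mab"}
ALLOWED_PAYLOAD_CLASSES = {"small_molecule", "antimitotic_peptide"}
ALLOWED_READOUTS = {"fluorescence", "luminescence"}


@dataclass
class AdcRecord:
    """One literature-derived ADC cytotoxicity record (illustrative fields)."""
    name: str
    antibody_format: str
    payload_class: str
    ic50_nm: Optional[float]      # None when no IC50 was reported
    multiwell_plate: bool         # assay run in a multi-well plate format
    single_timepoint: bool        # single incubation period (e.g. 72 h)
    concentration_response: bool  # full concentration-response relationship
    readout: str                  # detection chemistry of the assay


def meets_inclusion_criteria(record: AdcRecord) -> bool:
    """Return True when a record passes the structural and assay criteria."""
    structure_ok = (
        record.antibody_format in ALLOWED_ANTIBODY_FORMATS
        and record.payload_class in ALLOWED_PAYLOAD_CLASSES
    )
    has_ic50 = record.ic50_nm is not None
    assay_ok = (
        record.multiwell_plate
        and record.single_timepoint
        and record.concentration_response
        and record.readout in ALLOWED_READOUTS
    )
    return structure_ok and has_ic50 and assay_ok


# Invented example records, for illustration only.
records = [
    AdcRecord("adc_a", "monospecific_mab", "small_molecule", 1.2,
              True, True, True, "luminescence"),
    AdcRecord("adc_b", "biparatopic", "small_molecule", 0.4,
              True, True, True, "fluorescence"),
    AdcRecord("adc_c", "monospecific_mab", "small_molecule", None,
              True, True, True, "fluorescence"),
]
included = [r.name for r in records if meets_inclusion_criteria(r)]
```

In this sketch a record is dropped as soon as any single criterion fails, which mirrors the layered "structure first, then IC50, then assay format" description in the text.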
As a result of these standardization measures, a total of 151 articles and patents were collected, yielding 1,167 datapoints on ADC linker-payload compositions, mAb conjugation sites, DAR, the accompanying IC50 values, target antigen and tumor types, and tumor cell line names.

The evolution of ADC design

Post-curation analysis of ADCpedia revealed significant trends in the evolution of linker-payload combinations over the 24-year search span. The most frequent payload-linker combinations are shown in Supplemental Figure 1. Between 2001 and 2009, ADC research focused heavily on DNA-damaging payloads like calicheamicin paired with acetyl-butyrate (AcBut) linkers. These linkers exploit the elevated intracellular glutathione levels to reduce the incorporated disulfide bonds and enable payload release 24. ADCs using this combination, such as gemtuzumab-ozogamicin (GO), were associated with IC50 values below 10 nM and were predominantly tested on leukemic cell lines (e.g., HL-60). In contrast, anti-microtubule payloads were paired with more diverse linkers, exhibited higher IC50 values, and were tested on solid tumor cell lines (e.g., A549 and U251). This likely reflects early challenges in optimizing anti-microtubule ADCs during this period compared to the more developed AcBut-calicheamicin linker-payloads at the time. This stage of ADC development aligns with GO being the first clinically approved ADC in 2000, although GO was withdrawn in 2010 due to fatal toxicities and then re-approved in 2017. Early exploration of PEG incorporation into linker structures also emerged during this interval. From 2010 to 2018, research shifted toward protease-cleavable linkers, particularly dipeptide-based systems. This period saw extensive use of PEG moieties, likely addressing challenges in conjugating hydrophobic small molecules to mAb side chains with a potential aim of higher DARs.
Not coincidentally, there was also significant exploration of linker conjugation strategies involving lysine residues, which likely followed the development and success of ado-trastuzumab emtansine (T-DM1) 25. Anti-microtubule payloads paired with cysteine conjugation strategies also gained prominence during this period, coinciding with the success of brentuximab vedotin (BV) 26. However, the ADCs during this interval showed diverse IC50 values and were tested across both solid and liquid tumors. This likely reflects a phase of experimentation to identify effective payloads and optimize their pairing with various linker technologies for different tumor types. The period from 2019 onward is marked by a notable effort to increase payload diversity, including newer-generation DNA-damaging agents and microtubule inhibitors. The valine-citrulline dipeptide coupled to the self-immolative para-aminobenzyl spacer (vc-PAB) became the most frequently used linker, most likely due to its ability to enable payload release without residual linker and/or antibody components and its improved controlled release relative to hydrazone-based linkers 27. This period also saw a research focus on refining payload-linker pairings to maximize ADC efficacy against traditionally difficult-to-treat solid tumors such as glioblastoma (e.g., SNB75), metastatic lung cancer (e.g., NCI-H1930), and colorectal cancer (e.g., HT-29). However, the ADC IC50 values were highly variable, reflecting not only the ongoing challenges in optimizing ADC structural combinations but also the emerging importance of the tumor type interface. Regarding cytotoxicity, the field initially attempted to pair mAbs with traditional chemotherapeutics with IC50 values in the micromolar range, such as doxorubicin; however, these ADCs had limited clinical anti-tumor activity 28. The five records in ADCpedia for ADCs with doxorubicin have IC50 values ranging from 800 to 1250 nM.
This prompted a shift to develop ADCs with ultra-potent IC50 values in the picomolar range in the early 2000s. The notable shift, based on publication dates, occurred around 2010. ADC in vitro activities appeared to settle between 0.1 nM and 10 nM. However, upon closer inspection, ADC IC50 values varied >300-fold (Suppl. Fig. 2). This wide distribution highlights areas of confusion within the field on what constitutes a ‘potent’ ADC design for a given target antigen and tumor type, and it is further complicated by inconsistent conditions in cytotoxicity assays. As optimization remains empirical, there is a need for cost-effective systematic strategies for the field to move forward 29, 30.

Descriptive embeddings connecting ADC structures and tumor cell biology

To model ADC structural complexities in an ML framework, classical properties such as molecular weight, log P values, hydrogen bond donors and acceptors, rotatable bonds, molecular fingerprints, and topological surface area for the linkers and payloads were integrated 31. More ADC-specific elements, such as DAR and antibody conjugation site (e.g., lysines, cysteines, sugars), were embedded as distinct chemical features 32. The payload-biology connection was represented based on the payloads' intracellular targets. Specifically, payloads were stratified into the three mechanisms driving cytotoxicity, namely DNA damagers, microtubule disruptors, and topoisomerase I inhibitors. Each cytotoxic mechanism was encoded as a binary feature (1 = present, 0 = absent). The tumor cell-specific features related to the complex multi-step cellular delivery process of ADCs were captured in multiple layers using two key approaches. First, the canonical amino acid sequences for target antigens were retrieved from UniProt and transformed into principal component analysis (PCA)-reduced Evolutionary Scale Modeling (ESM) embeddings 32.
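The ADC-side encoding described above (binary mechanism flags plus DAR and conjugation site as distinct features) can be sketched as follows. This is an illustrative assumption about the layout, not the authors' feature pipeline: the category names, the one-hot treatment of the conjugation site, and the function names are hypothetical, and the physicochemical descriptors and fingerprints are left out.

```python
# Hypothetical category vocabularies taken from the prose description.
MECHANISMS = ("dna_damager", "microtubule_disruptor", "topoisomerase_i_inhibitor")
CONJUGATION_SITES = ("lysine", "cysteine", "sugar")


def one_hot(value: str, vocabulary: tuple) -> list:
    """Encode a category as 1 (present) or 0 (absent) per vocabulary entry."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == entry else 0 for entry in vocabulary]


def encode_adc(mechanism: str, conjugation_site: str, dar: float) -> list:
    """Concatenate mechanism flags, conjugation-site flags, and DAR."""
    return (
        one_hot(mechanism, MECHANISMS)
        + one_hot(conjugation_site, CONJUGATION_SITES)
        + [float(dar)]
    )


# Example: a microtubule-disrupting payload, cysteine-conjugated, DAR 4.
features = encode_adc("microtubule_disruptor", "cysteine", 4.0)
```

In a full pipeline such a vector would be concatenated with the linker and payload descriptors and with the PCA-reduced embeddings; those steps depend on external libraries and are not shown here.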
The amino acid sequences for mAbs were not scrutinized and were treated as a constant feature because most mAbs used in ADCs are of the human IgG1 isotype and have molecular weights of approximately 150 kDa. Moreover, to the best of our search, limited publicly available information was found on mAb sequences and epitope binding specifics, most likely for proprietary reasons. Second, to address the aberrant activity of key ADC intracellular trafficking processes that have been shown to impair ADC efficacy 10, 12, 33, the transcriptomic data from the Cell Model Passport repository 21 were used as a reference database to derive transcriptomic signatures for cell lines (Fig. 1A). At the time of this study, the repository contained comprehensive RNA-seq profiles (read counts) for approximately 37,600 protein-encoding genes from 1,479 human tumor cell lines that represented 42 different tumor types (Fig. 1B), as well as an additional 12 non-cancer cell lines representing diverse tissue types. The read counts for each cell line with ADC activity information were integrated as PCA-reduced features.

Antigen expression prediction

The density of target antigens on the surface of cancer cells is a critical determinant of ADC efficacy. Increased antigen density allows for increased ADC binding to target cells and enhances cytotoxic payload accumulation 9. The use of antigen expression level as a marker for ‘go/no go’ decisions is evident in the ADC field, and although empirical research is important, it is limited.
For example, ADC designs are typically screened by either evaluating 1) different linkers and keeping the payload constant 34 or different payloads and keeping the linker constant 35 on a few cell lines containing high and negative target antigen expression based on previous common knowledge of the tumor system or 2) developing an in-house flow cytometric system to comparatively connect the protein cell surface density of a target antigen across numerous tumor cell lines but for a few tumor types and associate IC 50 values to a single ADC design 36 . These approaches, while important, fall short in systematically addressing the complexities of ADC design, particularly in early-stage development. Although ADCpedia contained comprehensive proteomic information, there were significant challenges. Protein intensity and mRNA transcription levels are often poorly correlated 21, 37, 38 as demonstrated by their weak relationship (Pearson = 0.26) in the Cell Model Passport 21 (Fig. 2A). Another critical challenge was that protein intensity across cell lines contained significant gaps, both in the antigens listed in ADCpedia and in the greater proteomic data in the Cell Model passport (Suppl. Fig. 3A). Approximately 91% of the proteomic information was missing compared to the mRNA transcription counterparts. These disparities are common and attributable to degradation, post-translational modifications and/or inadequate capture techniques 39 . To address these challenges, the GENCEP model (Fig. 1A) was created to predict the protein intensities from all cell lines in the Cell Model Passport. GENCEP used a fusion neural network model that was trained on ~4.8 million data points, integrating ESM embeddings of target genes, their transcriptomic profiles, and the reported protein intensities for each cell line. Hyperparameter optimization and training over 24 epochs with early stopping was performed (Suppl. Fig. 3B). 
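The agreement metrics used in this section can be sketched in a few lines: the Pearson correlation quoted above for mRNA versus protein levels, and the coefficient of determination (R²) together with a zero-rule baseline, i.e. a regressor that always predicts the training-set mean. This is a generic illustration with hypothetical function names, not the authors' evaluation code. A zero-rule regressor scores an R² at or just below zero on held-out data, which makes it a natural floor for judging a trained model.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def zero_rule_predictions(y_train, n_test):
    """Zero-rule baseline: predict the training mean for every test item."""
    return [fmean(y_train)] * n_test


# Toy numbers, for illustration only.
y_train = [2.0, 3.0]          # training mean is 2.5
y_test = [1.0, 2.0, 3.0]      # test mean is 2.0
baseline_r2 = r_squared(y_test, zero_rule_predictions(y_train, len(y_test)))
```

Because the baseline's constant prediction rarely equals the test-set mean exactly, its R² is typically a small negative number rather than exactly zero.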
GENCEP demonstrated strong predictive performance, achieving R² = 0.86 on a set of ~400,000 random proteins from all cell lines in the Cell Model Passport (Suppl. Fig. 3C). GENCEP was also able to predict the intensities for the target antigens in ADCpedia from cell lines challenged with ADCs, with R² = 0.75 against ground truth labels on the test set (Fig. 2B). This significantly outperformed a zero-rule regression model (R² = -8E-6). This showed that GENCEP was not only effective at capturing meaningful relationships in the tumor cell biological data to predict protein expression intensities but also provided universal standardization for protein levels. Since the amount of physical protein, especially in cancer, drives cellular functions, GENCEP is also powerful at capturing protein properties in the cell lines of the Cell Model Passport. Taking advantage of this achievement, the antigen intensity predictions generated by GENCEP were embedded within ADCpedia, replacing the ground truth intensity values, organized as a priority feature, and first applied to evaluate antigen densities across the tumor cell lines and tumor types. HER2 served as a benchmark antigen due to the amplification of the HER2/neu oncogene and the resulting high cell surface antigen densities, which have been directly correlated to outcomes for patients treated with trastuzumab 40, 41. Moreover, the relationship between ADC cytotoxicity and various levels of HER2 expression in tumor cells has previously been studied 42. For example, the HCC1937 and MCF-7 breast cancer cell lines are often accepted as ‘low/negative’ HER2-expressing cells used in studying T-DM1 34, 43 and trastuzumab deruxtecan (T-DXd) 42. The HER2 real/predicted intensities were 2.96/2.77 for MCF-7 and 2.24/3.16 for HCC1937 cell lines. The ovarian cancer cell line SK-OV-3 has been used as a HER2 ‘high’-expressing cell line 42.
The HER2 real/predicted intensities for SK-OV-3 were 8.34/7.74. Therefore, predicted intensities were stratified into standardized high (≥7.5), mid (≥2.78 to 7.4), and low/negative (<2.78) expression levels across all tumor cell lines. Interestingly, HER2 intensity stratification revealed that fewer cell lines with reported ADC activities than anticipated expressed high levels. Tumor cell lines NCI-N87 (gastric carcinoma) and UACC-812 (breast ductal carcinoma) expressed very high HER2 levels of >10 (Fig. 3A). At the mid expression level, most HER2-positive cell lines belonged to breast cancer types. When evaluating the mean HER2 intensities for different cancer systems, novel insights became apparent. The average HER2 intensity was the highest at 3.94 for breast cancer, which was lower than the 4.24 mean intensity for all ADC-challenged tumor cell lines in ADCpedia (Fig. 3A and B). This indicated that other tumor types existed with attractive HER2 expression profiles. For example, blood and gastric cancers were potentially attractive targets, as their predicted HER2 intensities were 3.56 and 3.3, respectively. Additionally, lung, renal, melanoma, central nervous system (CNS), and female reproductive system-based cancers had mean HER2 intensities of ~3.0. All remaining tumor types except esophageal, pancreatic, colorectal, and prostate cancers were potential candidates for the development of effective ADCs, as their mean HER2 intensities were ≥2.78 (Fig. 3B). Interestingly, these means were all lower than the global mean of the tumor cell lines that had been empirically tested with investigational ADCs. This indicates that ADC development strategies could be biased toward high-expressing HER2-positive tumor cell lines and not reflective of the overall expression within specific tumor types. Therefore, considering that there are scant records of ADC activities in many of these tumor types, there is considerable space for future anti-HER2 ADC development.
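The three-level stratification of predicted intensities described above can be written as a small lookup. This is a minimal sketch with hypothetical names; it assumes that the mid band runs from 2.78 up to (but not including) 7.5, since the stated ranges leave values between 7.4 and 7.5 unassigned.

```python
HIGH_CUTOFF = 7.5   # predicted intensity >= 7.5 is "high"
MID_CUTOFF = 2.78   # predicted intensity >= 2.78 (and below high) is "mid"


def stratify_intensity(predicted_intensity: float) -> str:
    """Map a predicted antigen intensity to an expression level."""
    if predicted_intensity >= HIGH_CUTOFF:
        return "high"
    if predicted_intensity >= MID_CUTOFF:
        return "mid"
    return "low/negative"


# Predicted HER2 intensities quoted in the text.
predicted_her2 = {"MCF-7": 2.77, "HCC1937": 3.16, "SK-OV-3": 7.74}
levels = {cell: stratify_intensity(x) for cell, x in predicted_her2.items()}
```

Under these cut-offs the quoted predicted values map to low/negative for MCF-7, mid for HCC1937, and high for SK-OV-3.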
Beyond HER2, GENCEP determined that the target antigens for approved ADCs all have intensities in the mid-to-low range and represents a significant under explored space for ADC design. For example, Trop2 is a 46-kDa monomeric glycoprotein whose expression has been reported in ~30 different tumor types, albeit using different and semi-quantitative methods 44 , and is the target of the approved ADC sacituzumab govitecan (SG). SG was approved in 2021 for adult patients with Trop2-positive advanced and/or metastatic urothelial and triple-negative breast cancers based on the results from the TROPHY-U-01 45 and ASCENT 46 studies, respectively. Unfortunately, SG was withdrawn for urothelial cancer as it did not meet the primary endpoints in the post-approval Phase 3, TROPiCS-04 study. Interestingly, there was no requirement nor measure of Trop2 for the TROPHY-U-01 study 45 . However, archival tumor samples from patients enrolled in the TROPHY-U-01 study revealed that patient responses did not depend on Trop-2 expression levels, based on immunohistochemistry 47 . Specifically, there was no statistical significance between groups based on high, mid, and low Trop2 expression and objective response rates, progression-free survival, and overall survival. Strikingly, patients with tumors with high Trop2 expression had poorer survival outcomes. A recent report suggests that the instability of the CL2A linker in SG contributes to premature release of SN38, leading to excessive systemic exposure and severe adverse events 48 . Additionally, when expressed at high levels in ovarian cancer, Trop2 inhibits apoptosis by increasing the expression of Bcl-2 and decreasing the expression of Bax 11 . Therefore, Trop2 warrants alternative ADC designs that may be more effective and evaluated at distinct Trop2 expression levels for specific tumor types. 
In ADCpedia, many tumor cell lines overexpressing Trop2 that had been challenged by an ADC reached the high expression level threshold (Fig. 3C). Several more tumor cell lines fell within the mid-level expression range. Notably, the mean predicted Trop2 intensity for ADC-challenged tumor cell lines was 5.50, higher than that for HER2. Extending to cell lines with no recorded ADC challenges, all organ systems and tumor types except for esophageal and colorectal cancers had mean Trop2 intensities well above the 2.77 cut-off (Fig. 3D). Also important are antigens overexpressed in the relatively low range, such as Nectin-4. The mean expression intensity for ADC-challenged tumor cell lines was 2.58 (Fig. 3E). Enfortumab vedotin is approved for the treatment of patients with locally advanced or metastatic Nectin-4-positive urothelial cancer based on the results of a Phase III study 49. Interestingly, these data indicate that Nectin-4-positive tumor cells are sensitive at these lower relative overexpression levels. Outside of ADCpedia, the mean Nectin-4 intensity for the 28 bladder tumor cell lines was 2.07 (Fig. 3F). Based on these findings, head and neck, breast, and biliary cancers appear as attractive tumor types for anti-Nectin-4 ADC development. There were several other interesting patterns with additional antigens including CD30, CD19, and folate receptor-α (FR-α). Only a few tumor cell lines expressing high levels of CD30 were reported as challenged by anti-CD30 ADCs (Suppl. Fig. 4A). The mean CD30 intensity for these tested cell lines was 6.73. However, the mean CD30 intensities decreased substantially across all tumor types in the Cell Model Passport that had not been tested with ADCs (Suppl. Fig. 4B). Hematologic malignancies had the highest mean intensity of 1.51. Based on this threshold, central nervous system-based tumors, thyroid, mesothelium, liver, and kidney cancers are all potentially attractive targets for ADC development.
Lymphoma cell lines expressed mid-levels of CD19 and accounted for most of the cases challenged by ADCs (Suppl. Fig. 4C). Hematologic malignancies also had the highest relative mean intensity of 2.47 (Suppl. Fig. 4D). No other tumor type surpassed an intensity mean of 2.0 in the Cell Model Passport, which could indicate that CD19 targetability is restricted to blood cancer types. There were only a few records of FR-α-expressing cell lines challenged by ADCs, and the mean intensity was notably low (1.42) (Suppl. Fig. 4E). Evaluating the entire Cell Model Passport revealed even lower mean FR-α intensities (Suppl. Fig. 4F). FR-α is the target of mirvetuximab soravtansine, which is indicated for ovarian, fallopian tube, or peritoneal cancers. In the ovarian/endometrial/cervical cancer group, the mean FR-α intensity was 1.25. Based on this expression threshold, CNS-based cancers, melanoma, sarcomas, head and neck, thyroid, kidney, breast, mesothelium, and gastric cancers all appear as potential targets for ADC development. These findings demonstrate the value of GENCEP's ability to standardize and quantify antigen density. This standardized metric revealed that HER2 and Trop2 have substantially increased expression in multiple tumor types compared to the target antigens for the other approved ADCs. Importantly, it also revealed that extremely high antigen expression, represented by HER2, is not an absolute requirement to develop effective ADCs. The antigens examined reveal a large grey zone where multiple tumor types are potentially attractive for ADC development. Some antigens, such as FR-α, even indicate that targets expressed at very low levels can be developed as targets for future ADCs. Taken together, GENCEP provides a system that could significantly improve decisions on what tumor types should and should not be pursued for ADC development.
A multimodal ADC framework for ML training

By integrating predicted antigen intensities into ADCpedia, it was possible to contextualize several intricate associations between antigen expression, ADC structural elements, and ADC activities across tumor types. In doing so, influential factors affecting ADC complexities were illuminated, emphasizing the deep multifaceted content linking ADCs and tumor cell biology. The framework also highlighted the need for ML-based assistance in ADC design.

i) The contextual relationship between predicted target antigen intensity and ADC cytotoxicity

While high antigen expression is often considered a key determinant of ADC efficacy, the deeper and systematic analysis afforded by this platform exposed critical nuances. Across all target antigens in ADCpedia, ADCs with high cytotoxic potency (IC50 100 nM) potencies (Fig. 4A). However, this trend was not statistically significant. Additionally, a point-biserial analysis revealed a poor association (r = 0.145) between predicted antigen intensities and ADC IC50 values split into active (coded 1; <10 nM) and non-active (coded 0; ≥10 nM) potencies (Fig. 4B). As anticipated, mRNA read count values were broadly spread across the range of predicted antigen intensities for all three ADC potency stratifications (Suppl. Fig. 5), reinforcing that raw mRNA information is an insufficient indicator for reliable antigen selection, even when considered together with protein intensity. For HER2-targeted cases, while ADCs tested against low HER2-expressing cell lines consistently exhibited IC50 values >100 nM, the notable finding was that no clear distinction was observed between mid and high expression levels and their associated potencies (Fig. 5A). This demonstrates that even for a well-studied antigen like HER2, relationships between expression at a targetable intensity and efficacy remain ambiguous.
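The point-biserial analysis described above for Fig. 4B can be illustrated with a minimal sketch. It assumes the coding stated in the text (active = 1 for IC50 <10 nM, non-active = 0 otherwise); the function names and the four example records are hypothetical and are not taken from ADCpedia.

```python
from math import sqrt


def label_active(ic50_nm, threshold_nm=10.0):
    """Code an IC50 value as active (1, below the threshold) or non-active (0)."""
    return 1 if ic50_nm < threshold_nm else 0


def point_biserial(labels, values):
    """Point-biserial correlation between 0/1 labels and continuous values."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    ones = [v for b, v in zip(labels, values) if b == 1]
    zeros = [v for b, v in zip(labels, values) if b == 0]
    n1, n0 = len(ones), len(zeros)
    n = n1 + n0
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    mean_all = sum(values) / n
    # Population standard deviation of the continuous variable (divide by n).
    sd = sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values have zero variance")
    return (sum(ones) / n1 - sum(zeros) / n0) / sd * sqrt(n1 * n0 / n ** 2)


# Hypothetical (IC50 in nM, predicted antigen intensity) pairs, illustration only.
records = [(0.5, 4.0), (7.6, 6.0), (20.0, 1.0), (150.0, 3.0)]
labels = [label_active(ic50) for ic50, _ in records]
intensities = [intensity for _, intensity in records]
r = point_biserial(labels, intensities)
```

The statistic is numerically identical to a Pearson correlation computed on the 0/1 labels, so a value near zero, such as the reported r = 0.145, means that knowing the predicted intensity says little about whether an ADC falls on the active side of the cut-off.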
There was also notable overlap in IC50 values for ADCs tested on cells with low- and mid-level Trop2 expression (Fig. 5B). This further demonstrates the ambiguity between expression and efficacy, yet also presents potentially broader therapeutic opportunities for Trop2-targeting ADCs. The clinical experience with SG targeting Trop2, together with the fact that most ADCs targeting Trop2 in ADCpedia are SG (Suppl. Fig. 1), strongly indicates that alternative ADC designs, such as different linker types, are warranted against this antigen. CD79b is a component of the B-cell receptor complex and is critical to the proper endocytosis of bound foreign antigens as part of the immune presentation pathway 12. CD79b is well documented to be exclusively expressed in immature and mature B cells but overexpressed in ≥80% of B cell-based neoplasms 50, 51. Polatuzumab vedotin (Pola-V) is a clinically approved ADC specifically indicated for the treatment of patients with diffuse large B-cell lymphoma (DLBCL) based on results from the POLARIX trial 52. Preclinical testing of various Pola-V prototypes incorporating different combinations of linkers, payloads, and conjugation sites on multiple CD79b-positive lymphoma cell lines revealed that enhanced activity favored cleavable linker-incorporated ADCs 53. However, the CD79b expression levels on these cell lines were initially unknown. Notably, a main reason for selecting the linker-payload design for Pola-V was that ADCs incorporating a non-cleavable linker exhibited poor internalization and, hence, ineffective payload release and anti-tumor activity 54. It was subsequently reported that, below a geometric mean fluorescence intensity threshold, CD79b-positive lymphoma cell lines were insensitive to anti-CD79b ADCs at a concentration of ~70 nM 36.
Evaluation of primary samples from patients with chronic lymphocytic leukemia, marginal zone lymphoma, hairy cell leukemia, follicular lymphoma, mantle cell lymphoma, and DLBCL showed that CD79b expression varied and was marginally higher than on normal B cells, and that there was a trend for relatively higher expression to correlate with more potent ADC IC50 values 36. However, these expression levels were relative, with absolute numbers missing. In this work, GENCEP predictions revealed that the mean CD79b intensity for hematologic malignancies was 3.28. Interestingly, all the tumor types apart from esophageal and colorectal cancers had considerably higher CD79b predicted intensities. However, there were no differences in IC50 values when stratifying cell lines into high and mid CD79b intensities (Fig. 5C). Taken together, CD79b is an attractive target for further ADC development for multiple types of cancer, yet the relationship between expression and cytotoxic potency remains ambiguous.

ii) The contextual relationship between intracellular targets and ADC cytotoxicity

Although DNA and microtubules are common intracellular targets for both ADCs and traditional chemotherapeutics, the notable distinction is the ultra-cytotoxicity of the payloads that ADCs transport. The rationale is that these highly toxic agents can overcome the natural insensitivities certain tumor types may have to traditional chemotherapeutic mechanisms of action, the latter having evolved and been optimized over several decades for the treatment of specific tumor types 55. Analyzing the tumor cell lines in ADCpedia revealed significant variability in IC50 values across tumor types, which highlights the challenge of matching specific payload structures to tumor biology. For ADCs delivering microtubule inhibitors, several tumor types were sensitive, with IC50 values <10 nM.
DM1 is a derivative of maytansine, whose in vitro activity is 100-fold and up to 270-fold more potent than that of the traditional chemotherapeutic microtubule inhibitors vinca alkaloids and paclitaxel, respectively 56-58. Monomethyl auristatin E (MMAE) and monomethyl auristatin F are peptide analogs of dolastatin 10, which showed ultra-cytotoxic activities against human cancer cell lines 59, 60. As anticipated, breast and blood cancer cell lines were sensitive to microtubule inhibitor-incorporated ADCs, with the majority of IC50 values ≤10 nM (Fig. 6A). This aligns with the indications for T-DM1 and Pola-V, as previously described. There was considerable variation for many tumor types, possibly reflecting investigational ADCs that did not achieve the potency of more successful or approved counterparts. Notably, prostate cancer was the most sensitive tumor type, while sarcomas and mesothelial, kidney, and colorectal cancers were notably less sensitive, with ADC mean IC50 values approaching 100 nM. Interestingly, lung cancers were distinctly divided in sensitivity between small cell and non-small cell types, which explains the large variation in ADC cytotoxic potency. ADCs targeting non-small cell lung cancer (NSCLC) were highly potent, with mean IC50 values just below 1 nM. In contrast, ADCs targeting small cell lung cancer cell lines displayed IC50 values approaching 100 nM. Other cancers, such as ovarian and oral cavity cancer cell lines, had lower mean sensitivities compared to breast cancer and non-Hodgkin's lymphoma cell lines, albeit with significant variability, suggesting that these tumor types can be targets for future ADC development. In contrast, colorectal tumor cell lines exhibited the least mean sensitivity, which aligns with colorectal cancer being known as inherently resistant to anti-microtubule chemotherapy 61.
However, the variability is wide, which potentially indicates that a microtubule inhibitor-incorporated ADC with a different design format (e.g., linker, DAR) may prove otherwise. For ADCs delivering DNA-damaging payloads, 13 tumor types had been tested (Fig. 6B). It is currently thought that ADCs transporting calicheamicin payloads have had limited clinical effectiveness against solid tumors because of poor tumor penetration and/or poor access to the DNA at concentrations below dose-limiting toxicity 62, 63. GO is indicated for the treatment of adult patients with newly diagnosed or relapsed/refractory CD33-positive acute myeloid leukemia (AML), and in pediatric patients 64. In general, DNA-damaging-incorporated ADCs had a mean IC50 value approaching 1 nM for hematologic malignancy cell lines, which supports the effectiveness of constructing ADCs against AML. However, the blood cancer subtype acute lymphoblastic leukemia displayed much less sensitivity to these ADCs. Melanoma was highly resistant to DNA-damaging-incorporated ADCs, suggesting that DNA-damaging-based ADCs may not be ideal for this tumor type and that the cytotoxic potency itself, and not the systemic dosing issue, should be considered. Nevertheless, such profound contrasts reinforce that ADCs delivering DNA-damaging agents are underexplored. Topoisomerase I inhibitors represent a unique class of agents in the ADC landscape, with potencies that test the definition of ‘ultra-toxic’ 65. Topoisomerase I was originally validated as a cancer target when tumor cells died after treatment with camptothecin, whose use was limited by its poor solubility and unacceptable toxicity 66. Both SN-38 and Dxd, the payloads of the currently approved SG and T-Dxd, respectively, are camptothecin derivatives. SN-38 is the active component of irinotecan and, importantly, is listed in the mid-to-high nM range, distant from the sub-nM payloads targeting DNA and tubulin 65.
Comparatively, Dxd, a derivative of exatecan, has been reported to be 10-fold more potent than SN-38 67, 68. Unsurprisingly, topoisomerase I inhibitor derivatives have been widely pursued: at least 72 novel ADCs transporting more than 15 different camptothecin-based payloads, in combination with more than 21 different linkers across 24 different targets, have been developed 35. Recently, several novel topoisomerase I inhibitors, all based on the camptothecin backbone, were reported to exhibit sub-nM to low nM IC50 values 35, and the available structures were integrated into ADCpedia. An analysis of the ADCs delivering topoisomerase I inhibitors in ADCpedia revealed more uniform potencies, with mean values all hovering around 10 nM (Fig. 6C). The eight tumor types evaluated were notably fewer than the tumor systems evaluated with ADCs incorporating microtubule inhibitors and DNA damagers. ADCs delivering topoisomerase I inhibitors were the most potent against thyroid cancer, followed by hematologic malignancies and ovarian/endometrial/cervical-grouped cancers. There was wide IC50 variability for lung cancers, since NSCLC tumor cell lines were sensitive while small cell lung cancer cell lines were less sensitive. Interestingly, breast cancer was the most resistant tumor type, with IC50 values approaching 100 nM. The structural diversity of the topoisomerase I inhibitor-incorporated ADCs was significantly less than that of the ADCs incorporating the two other payload types, as 129/138 ADCs were SG or T-Dxd or very similar (Suppl. Fig. 1). Additionally, these ADCs were almost exclusively constructed with DARs of 8, eliminating DAR as a confounding variable and narrowing the focus to tumor biology matching. This further underscores the importance of identifying the optimal tumor type-payload combination when designing an ADC in early development.
iii) The contextual relationship between linker types and ADC cytotoxicity

Linkers play a pivotal role in ADC design, substantially influencing both the stability and release of the cytotoxic payload. Over the past two decades, significant innovation in linker chemistry has been achieved, yielding a rich array of constructs that are captured within ADCpedia (Suppl. Fig. 1). There were 18 tumor systems/types tested with cleavable linker-incorporated ADCs, while only seven were tested with non-cleavable linker-incorporated ADCs. The ADCs incorporating cleavable linkers included those susceptible to cysteine-specific cathepsins (i.e., vc linker), cysteine- and serine-specific cathepsins (e.g., GGFG linker), and glutathione- (i.e., AcBut) and pH-based (e.g., CL2A) payload release mechanisms. As a group, these types of ADCs varied greatly in cytotoxic potency across all tumor types (Fig. 7A). The mean IC50 values indicated that prostate cancer, B-cell lymphoma, breast cancer, NSCLC, AML, and small cell lung cancer cell lines were highly sensitive, albeit with wide variation. In contrast, ovarian and colorectal cancers exhibited notably less sensitivity, also with wide variability. Although mesothelial cancer displayed less sensitivity, there were only two ADC records. The large IC50 value differences may be immediately due to the diversification of the cleavable linkers contained in ADCpedia. For instance, for linkers with novel amino acid combinations and sequence lengths, such as the GGFG linker used in T-Dxd 68, which is sensitive to both cysteine and serine proteases, it is not known how efficient cleavage and payload release are compared to other cleavable linkers. On the other hand, non-cleavable linkers require efficient lysosomal delivery and digestion of the ADC to liberate the payload. For non-cleavable linker-incorporated ADCs, all seven tumor types displayed sensitivities of ≤10 nM (Fig. 7B).
In ADCpedia, the overall comparison of non-cleavable and cleavable linkers reveals a striking variability in cytotoxic potency that is driven by target expression heterogeneity and evolving linker chemistries. Non-cleavable linker-incorporated ADCs, although showing a narrower range of mean IC50 potencies than cleavable linker-incorporated ADCs, also exhibited wide variability, indicating that the inconsistencies are tied to target antigen expression heterogeneity in tumor tissue, since these payloads are trapped inside cells when released. In contrast, cleavable linker-incorporated ADCs showed much larger deviations across and within tumor types, which may be more reflective of the different release mechanisms for these linkers.

iv) The contextual relationship between DAR and ADC cytotoxicity

DAR is greatly influential in ADC design, as it affects overall potency, stability, and pharmacokinetics. The DARs in ADCpedia ranged from 0.8 to 9.0, with a mean DAR of 3.73. The most frequent DARs were 2.0, 3.4, 4.0, and 8.0 (Fig. 8). Importantly, both ultra-toxic and moderate-toxic payloads have been developed across this DAR spectrum over the past two decades. Interestingly, when DAR values were analyzed across tumor types and target antigens, no consistent patterns emerged. For cytotoxicity patterns, ADCs with DARs of 2 and 4 were more frequent with both high potency (IC50 100 nM). This suggests that DARs in the 2-4 range did not uniformly correlate with enhanced cytotoxic potency. Notably, higher DARs were not associated with increased cytotoxic potency. This lack of association can be partially attributed to the inherent differences in free payload potency. For example, highly potent payloads such as PBD dimers achieve strong cytotoxic effects even at very low DARs of 1 69, 70. In contrast, less potent payloads like SN-38 are almost exclusively assembled with DARs of 8 to compensate for reduced potency 65.
Because ADCs with DARs of 8, particularly those delivering topoisomerase I inhibitors, are more recent advancements and are underreported in the current ADC literature, they are not sufficiently represented to reveal emerging trends in this specific type of ADC design.

A multimodal model for ADC activity prediction across tumor cell lines and tumor types

AMM is a multimodal convolutional neural network model designed to predict the probability of ADC in vitro activity by analyzing ADC structures and their IC50 values and connecting these with the tumor cell biological information integrated in ADCpedia (Fig. 9A). Importantly, AMM also learned from the predicted antigen intensity embeddings from GENCEP and the proteomic and transcriptomic cell line embeddings (Fig. 9B). To this end, the maximum number of neurons was dedicated to the antigen intensity prediction embeddings, in line with AMM's training and its overall goal of generating predictions on the likelihood of cytotoxic potencies at defined nM values for given tumor cell lines expressing their individual levels of the target antigen. As previously mentioned, several other tumor cell biological parameters, beyond antigen expression, are important for ADC efficacy. Therefore, GENCEP-predicted intensities were also embedded for proteins involved in key areas of the ADC mechanism of action and tumor cell resistance, such as the endosomal-lysosomal system (RAB7, CTSB, LAMP1), drug efflux (ABCG1, ABCG2), DNA repair (BRCA1, BRCA2, TP53, RAD51), and survival signaling (PIK3CA, AKT1, mTOR). As the main goal of this study was to evaluate the performance of the ADC design platform based on antigen expression, these other tumor cell protein embeddings were assigned only a single neuron.
The AMM model was then trained as an ADC activity classifier using thresholds of 10 nM, 5 nM, and 1 nM to define active versus inactive ADCs, because the median IC50 value in ADCpedia was ~2.5 nM, with the 25th and 75th percentile IC50 values being 0.2 nM and 36.1 nM, respectively. By training at these three distinct IC50 values, it was possible to evaluate predictions within a range reflecting current ADC potencies. The database was split so that ADC structural features were diverse in the training, validation, and test sets. Extensive hyperparameter optimization was then performed, using a loss function that reduced penalties for incorrect predictions within ±1 nM of the specific thresholds. The AMM model was evaluated on internal and external test sets, which consisted of structurally diverse ADCs. For the internal test set, the real-world cytotoxicity data were known. To assess generalizability, the IC50 values of the external test set were blinded to the authors of this study and unavailable until after the predictions were made. These evaluations tested whether the AMM model could effectively predict ADC potency across multiple tumor cell lines from different tumor types, based on the model's training emphasizing antigen expression. The AMM model successfully distinguished active from inactive ADCs with high confidence across the 1-10 nM range of relevant ADC cytotoxic potencies, with the 10 nM threshold performing the best. The internal validation set results for the area under the receiver operating characteristic curve (AUC) scores were 0.85 (10 nM), 0.71 (5 nM), and 0.78 (1 nM), while the AUC scores for the internal test sets were 0.84 (10 nM), 0.87 (5 nM), and 0.87 (1 nM) (Fig. 10A). Evaluations were also performed at a 50% probability classification threshold (active ≥50% and inactive <50%) to provide further insights into the AMM model's performance.
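As a minimal sketch of how the active/inactive labels and the AUC scores above relate to each other, the following example labels IC50 values at the three thresholds and computes a rank-based AUC. It assumes an ADC counts as active when its IC50 is strictly below the threshold (the coding used elsewhere in the text); the function names, IC50 values, and predicted probabilities are hypothetical and do not come from ADCpedia or the AMM model.

```python
THRESHOLDS_NM = (10.0, 5.0, 1.0)  # thresholds named in the text


def labels_at_threshold(ic50_values_nm, threshold_nm):
    """Assumption: active (1) if IC50 is below the threshold, else inactive (0)."""
    return [1 if ic50 < threshold_nm else 0 for ic50 in ic50_values_nm]


def roc_auc(labels, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive score count as half a win.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical measured IC50 values (nM) and predicted activity probabilities.
ic50_nm = [0.5, 7.6, 20.0, 150.0]
predicted = [0.9, 0.3, 0.4, 0.1]

auc_by_threshold = {
    t: roc_auc(labels_at_threshold(ic50_nm, t), predicted) for t in THRESHOLDS_NM
}
```

In practice a separate model is trained per threshold, so each threshold would have its own probability vector; a single vector is reused here only to keep the illustration short.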
On the internal test set, the AMM model at the 10 nM threshold had a good and balanced performance, with high accuracy (0.81), precision (0.70), recall (0.67), and F1-score (0.68) (Suppl. Fig. 6). At 5 nM, the AMM model also performed well on the internal test set, with accuracy (0.72) and F1-score (0.70) comparable to the model's performance at 10 nM. Notably, its precision (0.95) was higher. However, it had poor recall (0.56), indicating that it missed many true active ADCs. At 1 nM, the AMM model also performed well. It had good accuracy (0.79), precision (0.85), recall (0.65), and F1-score (0.74), comparable to the model's performance at 10 nM. These results indicated that the model successfully distinguished active and inactive ADCs with high confidence, especially at 1 and 10 nM, across a broad range of relevant ADC cytotoxic potencies in the internal test sets. The blinded, external, and unpublished ADC dataset provided an opportunity to determine whether the AMM model could accurately classify ADC potency in a real-world scenario, in which the model performed notably well. This dataset comprised four distinct ADCs (T-DM1, T-Dxd, disitamab-vc-MMAE [D-MMAE], and trastuzumab-pAcF-Amberstatin 269 [T-A269]) targeting HER2 across nine tumor cell lines representing breast, esophageal, and gastric cancers (Fig. 10B). Additionally, the tumor cell lines had HER2 levels with varying predicted expression intensities, and there were a total of four different linker and payload types and three different DARs. The AUC scores were 0.95 (10 nM), 0.87 (5 nM), and 0.82 (1 nM) (Fig. 10A). The confusion matrix evaluations further supported the AUC scores, with accuracy values of 0.90 (10 nM), 0.85 (5 nM), and 0.70 (1 nM) (Suppl. Fig. 6). Notably, at 10 nM the AMM model outperformed its internal test set predictions, with higher precision (0.88 vs. 0.70), recall (0.78 vs. 0.67), and F1-score (0.82 vs. 0.68).
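The confusion-matrix metrics quoted above can be illustrated with a minimal sketch that applies the 50% probability cut-off described in the text (active ≥0.5, inactive <0.5). The function names, labels, and probabilities are hypothetical and serve only to show how accuracy, precision, recall, and F1-score are related.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(labels, probabilities, cutoff=0.5):
    """Accuracy, precision, recall, and F1 at a probability cut-off."""
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    pairs = list(zip(labels, predicted))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true actives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false actives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # missed actives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true inactives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Hypothetical true labels (1 = active) and predicted activity probabilities.
true_labels = [1, 1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.2, 0.7, 0.1]
metrics = classification_metrics(true_labels, probabilities)
```

As a consistency check on the reported figures, the internal test set precision of 0.70 and recall of 0.67 at 10 nM give a harmonic mean of about 0.68, which matches the reported F1-score. High precision paired with low recall, as at 5 nM, means most predicted actives are correct but a sizeable share of true actives are missed.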
At 5 nM, AMM also outperformed its internal test set predictions for accuracy, recall, and F1-score. This highlights its potential robustness in real-world testing. This was particularly evident in cases such as the D-MMAE and T-A269 ADCs tested against the gastric SNU-216, breast Hs 578T, and esophageal OE19 cancer cell lines. These ADC-cell line pairings were absent from the original training set, as they have not been reported in the literature. Furthermore, there were observable instances where AMM predictions aligned with HER2 expression intensities. It was observed that the AMM model's predicted probabilities for ADC activity shifted in response to IC50 values near specific thresholds. For example, at the 10 nM threshold, AMM predicted high probabilities (0.89 ± 0.06) of T-DM1 activity in high (≥7.5) HER2-expressing cell lines (representing breast and esophageal cancers), where actual IC50 values confirmed strong potencies (0.50 nM ± 0.58 nM) (Suppl. Table 1). For the mid-level (5.31) HER2-expressing breast cancer JIMT-1 cells, AMM shifted its probability (0.32), as the T-DM1 IC50 values were 7.62 and 6.89 nM. For the slightly lower (4.92) HER2-expressing gastric cancer SNU-216 cells, where T-DM1 failed to reach an IC50 value, AMM predicted an even lower probability of 0.03. In the lowest (3.09) HER2-expressing Hs 578T cell line, where T-DM1 was also ineffective, AMM predicted a minimal activity probability of 0.06. Interestingly, at the 5 nM threshold, AMM's probability for JIMT-1 increased to 0.47, indicating that this threshold better captured T-DM1's activity near the 5 nM IC50 value (Suppl. Table 2). Similarly, at the 1 nM threshold, AMM assigned a probability of 0.17 to JIMT-1 and 0.01 to Hs 578T (Suppl. Table 3).
Although there is room to improve sensitivities, these findings suggest that the AMM model is able to refine predictions at changing HER2 expression levels by adjusting the IC50 nM thresholds and, therefore, can improve ADC activity classifications. This reflects AMM's capacity to integrate the intricate ADC-tumor cell interconnections into accurate ADC activity predictions. These AMM predictions also trended with the ADC T-Dxd. Notably, at the 10 nM threshold, T-Dxd was ineffective against JIMT-1 cells and AMM assigned a probability of 0.28 (Suppl. Table 1). However, for OE19 cells, AMM assigned a probability of 0.80 when T-Dxd was also cytotoxically ineffective. Shifting to the 1 nM threshold, AMM was able to adjust and assigned probabilities of 0.09 and 0.65 for the JIMT-1 and OE19 cell lines, respectively (Suppl. Table 3), again indicating that the model improves prediction when shifting to a different nM threshold. Further analyses on specific cell lines, which were selected because they had real IC50 value data above and below the given nM thresholds, revealed significant additional insights. For example, in the internal test set, AMM achieved perfect accuracies (1.0) for the triple-negative breast cancer MDA-MB-468 and non-Hodgkin's lymphoma NU-DUL-1 cell lines (Suppl. Fig. 7). However, the AMM model was inaccurate for the colorectal carcinoma HT-29 cell line at 10 nM. Interestingly, AMM improved accuracy to 1.00 and 0.50 at the 5 and 1 nM thresholds, respectively. For the high HER2-expressing cell lines from the blind external test set, AMM had an accuracy value of 0.75 at the 10 nM threshold and values of 0.67 at the 5 and 1 nM thresholds. Interestingly, the poor accuracy of 0.50 at the 10 nM threshold for JIMT-1 cells notably improved to 0.75 when the AMM model predicted at the 5 nM threshold. This again indicates the model's ability to shift and adjust when the nM threshold is closer to the real IC50 values.
Based on the above, the AMM model's predictive performance showed strong potential for generalizability. These cases illustrate how the AMM model can shift according to data points lying close to the classification cutoff to improve ADC in vitro activity probabilities. These results indicate an impressive predictive performance for ADC activity across different structural designs and tumor cell lines. Moreover, the AMM model showed a capacity to generalize to unseen data.

The platform as an online tool for predicting in vitro ADC efficacies

A public online platform (www.adcpedia.com) was developed that allows users to access the AMM model for predicting ADC efficacies on the selected tumor cell lines SK-BR-3, SKOV3, MCF7, OE19, and COLO 205. The platform is divided into two primary interfaces. The Predict interface allows users to select a target antigen, linker, payload, DAR, and one or all of the tumor cell lines previously indicated (Suppl. Fig. 8A). Users begin by creating an ADC ‘nickname’ to open a new file. The linker and payload chemical structures must be provided conjugated together. The Predict interface offers the option to manually upload the linker-payload chemical structure details encoded in SMILES, SDF, SMI, or CSV formats. Users can also draw the linker-payload using the JSME molecular editor (Suppl. Fig. 8B) if ML-readable formats are not available. Once all parameters are set, users can click on ‘Predict in-vitro activity’ to generate ADC predictions. The Retrieve interface delivers ADC activity predictions, currently set at 10 nM.

Discussion

ADCs are a powerful yet complex class of therapeutics and face formidable challenges in delivering their payloads accurately and efficiently to target tumor cells.
In the current ADC landscape, it is well known that factors such as the nature of the ADC linker, payload, DAR, target antigen expression, internalization, lysosomal targeting, and intrinsic tumor cell sensitivity directly affect antitumor efficacy and off-target toxicity. Moreover, the field understands that unique and optimized combinations of these factors significantly influence ADC efficacy for certain tumor types but not others. Yet little has been documented on AI-powered approaches for strategically and systematically assembling ADC structural factors and connecting them with tumor biology factors. Although rigorous experimental studies evaluating ADC design with specific biological characteristics such as antigen expression have been performed, they are limited, vary in approach, and do not universally address this longstanding challenge. As previously described, typical ADC design studies base ‘go/no-go’ advancement decisions on empirical studies that vary a specific element of an ADC design while keeping the other structural elements constant and evaluating cytotoxicity on only a limited number of cell lines 34, 35. Alternatively, they use an in-house antigen expression evaluation system on multiple cell lines in parallel with testing a single ADC design 36. These approaches, while important, fall short in systematically addressing the complexities of ADC design, particularly in early-stage development. This work introduced an ADC design platform based on a multimodal learning approach (Figs. 1 and 9) that provides a route for enabling confident ADC designs for target antigens and tumor types at the early yet critical initial stage of ADC development. The platform relied on three foundational components. The first was the ADCpedia database, which was constructed and organized to interconnect ADC chemical structure-activity relationships with tumor cell biology.
The second was the creation of the GENCEP model for predicting protein intensities, which provided clarity on long-known, but never clearly resolved, complex relationships between ADC structures and activity across target antigens and tumor types. The third was the key integration of the GENCEP model into the AMM model, which enabled accurate probability predictions of ADC in vitro activities at relevant IC50 values. AMM accurately predicted the activity of a set of external, blinded, real-world ADCs targeting HER2 at clinically relevant IC50 values and showed instances of adjusting to varying HER2 expression levels. As a result, this platform demonstrated the potential to provide significant advantages for the generalizable selection of effective ADC designs for multiple specific target antigens and tumor types. Beyond these advancements, the GENCEP model addressed a critical gap in proteomics by imputing missing protein intensity data, a common challenge in mass spectrometry-based proteomic mining. When integrated into the AMM model, GENCEP enhanced the model's ability to decipher the complex interplay between ADC design and antigen expression. Specifically, this work uncovered key insights, such as the variability in antigen expression across the entire Cell Model Passport, thereby identifying potential new targets/tumor systems for ADC development. Furthermore, this work revealed the importance of optimizing ADC designs, as even mid- and low-range overexpressed antigens are targetable. Importantly, by integrating a standardized antigen intensity metric system, this study demonstrated that antigens in these newly quantified overexpression ‘grey zones’ must be carefully paired with ADCs optimized for linker, payload, and DAR selection. This work also highlighted the variability in ADC cytotoxicity for different payload and linker types, and DARs.
Although IC50 values varied widely for different ADCs targeting the same tumor types, tracking the mean cytotoxicity values revealed important trends. For example, ADCs transporting anti-microtubule or DNA-damaging payloads could be very attractive to develop against NSCLC as well as other tumor types for which there is very limited ADC activity information. In comparison, most ADCs transporting topoisomerase I inhibitors were very potent across all evaluated tumor types except for breast cancer. However, nearly all these ADCs were T-Dxd or SG. It was recently cited that the CL2A linker of SG is highly unstable and causes premature release of SN-38, leading to patients receiving significantly high doses and suffering from serious adverse events 48. For T-Dxd, a meta-analysis revealed that it had a considerably higher systemic free payload concentration compared to other clinical-stage ADCs 71. Taken together, there is considerable support for testing different linker designs that conjugate topoisomerase I inhibitors to mAbs, and this platform could then be used to screen those designs. ML-based approaches are rapidly transforming drug discovery, extending beyond small molecules to encompass complex therapeutic modalities such as proteolysis-targeting chimeras, RNA therapies, peptides, and unconjugated mAbs 72. Among these, ADCs present a particularly formidable challenge due to limited publicly available datasets for model training and their intricate multicomponent structures. Moreover, the therapeutic efficacy of ADCs hinges on the sophisticated, multifaceted nature of their mechanism of action, which involves a series of coordinated steps commencing with antigen recognition and binding, followed by internalization, lysosomal degradation, payload release, and intracellular target engagement by the payload.
Inspired by recent advances in multimodal DL strategies in drug discovery 73-75, we hypothesized that the development of a multimodal modeling framework that specifically captures the complex biological and chemical interactions governing ADC function can significantly streamline yet comprehensively improve ADC development. To this end, we developed an integrated platform capable of jointly learning from diverse chemical and biological data streams within a unified model architecture. This platform demonstrated high predictive accuracy for generating probabilities for in vitro activity of real-world, blinded ADCs. Notably, the AMM model exhibited great potential for robust generalizability, accurately predicting bioactivities for ADCs and cell lines absent from the training set. This suggests the model's strong potential for guiding the rational design of novel ADC therapeutics and, to the best of our knowledge, represents the first significant application of artificial intelligence in ADC design. After binding the target antigen on the cell surface, ADCs must be rapidly internalized through clathrin-mediated, caveolin-mediated, or other pathways. Only after internalization can ADCs traverse the endosomal system and be delivered to lysosomes, where they must be efficiently degraded so that the payload is released and is able to bind to its intracellular target and induce tumor cell death. Unfortunately, the internalization mechanisms and kinetics for target antigens are not well understood 12. The aberrant expression and activity of Rab proteins and their associated interacting partners are only recently becoming apparent, as they also greatly influence lysosomal targeting and ADC degradation efficiency and are linked to resistance 33. Apart from HER2 and Trop2, the variability observed for most targeted antigens suggests that mid and even low expression levels can support potent ADC activity under the right conditions.
Although the ADC community already knows that antigen expression alone is not a marker of ADC success and that understanding tumor-specific biological conditions is important, this interconnectedness is not yet understood, even though it is critical for ADC design. By embedding GENCEP-predicted antigen intensities into ADCpedia, a framework was provided for systematically identifying optimal antigens and tumor types for early ADC development. The authors understand the importance of also emphasizing these additional intracellular elements, and the ADC design platform is therefore already being expanded with the end goal of covering the entire antigen-to-lysosome delivery process. Additionally, post-release elements such as efflux proteins and anti-apoptotic protein expression will be further scrutinized. In the current work, a few relevant proteins were incorporated and integrated into the overall training of the AMM model but were not assigned multiple neurons. Future work will systematically explore these proteins and their relationship with different linker and payload types for ADCs across tumor types. In conclusion, this multimodal ADC design platform, which interconnects ADC structural design with tumor cell biology, offers a novel, systematic, and universal approach to advancing ADC development.

Methods

Literature curation and ADCpedia construction

The curation process was performed by taking the names of ADCs from the ADC DrugMap and searching for them on PubMed, Google Scholar, and patent office websites for the period from January 2000 to April 15, 2024. In addition, ADC structure-activity data were included from relevant, very recent publications that appeared in 2024 near the drafting and submission of this work. To standardize IC₅₀ values, values in mass units (e.g., µg/mL) were converted to nM using a molecular weight of 150 kDa for the mAb plus the molecular weights of the given linker and payload multiplied by the DAR.
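The unit standardization described above can be illustrated with a minimal sketch. It assumes that the ADC molecular weight is the 150 kDa mAb plus DAR copies of the linker-payload, which is one reading of the sentence above. The function names and the example linker and payload weights are hypothetical and are not taken from ADCpedia.

```python
MAB_MW_DA = 150_000.0  # molecular weight assumed for the mAb (150 kDa), in g/mol


def adc_molecular_weight(linker_mw, payload_mw, dar, mab_mw=MAB_MW_DA):
    """Approximate ADC molecular weight in g/mol.

    Assumption: each of the DAR conjugation sites carries one linker
    and one payload, so their weights are added DAR times to the mAb.
    """
    return mab_mw + dar * (linker_mw + payload_mw)


def ug_per_ml_to_nm(conc_ug_per_ml, mw_g_per_mol):
    """Convert a mass concentration (ug/mL) to a molar concentration (nM).

    1 ug/mL = 1e-3 g/L; dividing by g/mol gives mol/L; multiplying by 1e9
    gives nM, hence the overall factor of 1e6.
    """
    return conc_ug_per_ml / mw_g_per_mol * 1e6


if __name__ == "__main__":
    # Hypothetical linker and payload weights, for illustration only.
    mw = adc_molecular_weight(linker_mw=500.0, payload_mw=700.0, dar=4)
    print(f"ADC MW: {mw:.0f} g/mol")
    print(f"IC50: {ug_per_ml_to_nm(0.05, mw):.3f} nM")
```

Keeping the conversion as a separate function makes it straightforward to apply the same rule to every curated record reported in mass units.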
Tumor cell line transcriptomic and proteomic data integration

The complete raw read counts of the mRNA expression profiles for the human tumor cell lines, totaling 37,603 protein-encoding genes, were integrated into ADCpedia from the Cell Model Passport [21]. The cell lines with published ADC activities were then subjected to principal component analysis (PCA) to reduce dimensionality while retaining 95% of the variance, resulting in 910-dimensional cell line embeddings suited to ML applications (Fig. 9). A parallel procedure was performed for all the available proteomic data. Cancer model names from the Cell Model Passport were used to enrich the database with detailed annotations on tissue, cancer type, and sample site for each tested cell line (Fig. 1B).

Antigen sequence embedding generation

The mature and canonical amino acid sequences (no alternatively spliced isoforms) of all antigens listed in ADCpedia were retrieved from UniProt and transformed into 1,280-dimensional embeddings using ESM (650M model) [32]. PCA was then applied, preserving 95% of the variance and resulting in 356 dimensions. This enhanced computational efficiency without significant loss of information.

GENCEP model

The GENCEP model took as inputs (i) raw mRNA expression data and (ii) ESM embeddings of all protein sequences, both PCA-reduced, and (iii) gene-specific read counts for cell lines (Fig. 9B). The ESM embeddings were processed through a fully connected neural network with two layers, each containing 512 neurons, with ReLU activations, batch normalization, and a dropout rate of 15%. The mRNA data were processed through two dense layers of 512 neurons each, with ReLU activations, batch normalization, and a dropout rate of 15%. The read counts were passed through a neural network layer with 64 neurons, with ReLU activation, batch normalization, and a dropout rate of 15%.
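The 95% variance criterion used for the PCA reductions above can be illustrated with a small sketch. It assumes that the per-component variances (eigenvalues of the covariance matrix) have already been computed by a PCA routine. The function name and the toy eigenvalues are hypothetical and do not reproduce the reported 910 or 356 dimensions.

```python
def n_components_for_variance(eigenvalues, target=0.95):
    """Return the smallest number of leading principal components whose
    cumulative explained-variance ratio reaches `target`.

    `eigenvalues` are the non-negative variances along each component;
    they are sorted in descending order before accumulation.
    """
    if not eigenvalues:
        raise ValueError("eigenvalues must not be empty")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


if __name__ == "__main__":
    # Toy spectrum for illustration only.
    toy = [6.0, 3.0, 0.7, 0.2, 0.1]
    print(n_components_for_variance(toy, target=0.95))
```

The number of retained dimensions is therefore a property of the data spectrum rather than a fixed hyperparameter, which is why the mRNA and ESM embeddings end up with different sizes.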
The three outputs were then concatenated (2,112 dimensions) and fed into a network consisting of three fully connected layers with decreasing neuron counts (2,112, 1,024, 512, 64). A ReLU activation function, batch normalization, and a dropout rate of 0.15 were applied to each layer. The outputs from the mRNA expression and read count components were also used as embedded inputs for the AMM model (described in the next section). The model was trained using a root mean squared error (RMSE) loss to measure the difference between predicted and experimental protein intensities:

$$Loss_{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{pred,i}-y_{true,i}\right)^{2}}\quad\left(\text{Eq. 1}\right)$$

Optimization was performed using the Adam optimizer [76] with an initial learning rate of 1E − 3 and a weight decay of 1E − 5. The dataset split was performed with the goal of evaluating the applicability of the model to ADC-targeted antigens. Thus, all the antigens in ADCpedia were kept in the hold-out test set, which was further populated via random sampling to a size equal to 10% of the dataset. Training and validation sets were also built via random sampling (80% and 10%, respectively). Early stopping was implemented based on the validation RMSE value with a patience of 10 epochs.

AMM model

i) Chemical descriptors

Physicochemical features for the payload and linker structures were computed using the RDKit cheminformatics [77] and Descriptastorus [31] libraries. A set of 200 physicochemical properties was calculated, including log P, molecular weight, hydrogen bond donors and acceptors, rotatable bonds, and topological polar surface areas. Additionally, 2,048-bit molecular fingerprints were generated for the payload and linker chemical structures [78].

ii) Data preprocessing

All features were converted to numeric form, with missing values set to zero. Categorical variables, such as the intracellular target classes, were one-hot encoded.
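The two preprocessing rules just described (missing values set to zero, categorical variables one-hot encoded) can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' pipeline. The class names in `TARGET_CLASSES`, the helper names, and the example feature values are assumptions chosen for illustration.

```python
# Hypothetical intracellular target classes, for illustration only.
TARGET_CLASSES = ("microtubule", "DNA", "topoisomerase I")


def zero_fill(values):
    """Convert features to floats, replacing missing entries with 0.0.

    An entry counts as missing if it is None or NaN (NaN != NaN).
    """
    return [0.0 if v is None or v != v else float(v) for v in values]


def one_hot(label, categories=TARGET_CLASSES):
    """One-hot encode `label`; an unknown label yields an all-zero vector."""
    return [1 if label == category else 0 for category in categories]


def build_feature_row(numeric_features, target_class):
    """Concatenate zero-filled numeric features with the one-hot class."""
    return zero_fill(numeric_features) + one_hot(target_class)


if __name__ == "__main__":
    # Example record: DAR, a missing descriptor, and a log P value.
    print(build_feature_row([4, None, 2.7], "DNA"))
```

Mapping unknown classes to an all-zero vector is a design choice of this sketch; a production pipeline might instead raise an error or reserve an explicit "other" category.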
A StandardScaler [79] was applied to continuous features. For labeling, IC₅₀ values were binarized at 10 nM (IC₅₀ < 10 nM assigned label 1; otherwise 0). The same procedure was performed for the 5 and 1 nM thresholds. To prevent data leakage, we performed Butina clustering [80] (80% threshold) on the Tanimoto dissimilarity matrix of the linker-payload fingerprints. The resulting clusters were then partitioned with stratification into training, validation, and testing sets in proportions of 80%, 10%, and 10%, respectively. Additionally, the training set was balanced by oversampling the minority class (0).

iii) Architecture

The architecture was a multimodal CNN that processes the following inputs: (i) DAR and intracellular target information, (ii) PCA-reduced ESM embeddings of the target antigen, (iii) predicted antigen expression intensity (from the GENCEP model), (iv) cell line-specific mRNA embeddings (from the GENCEP model), (v) gene-specific mRNA read count embeddings (also from the GENCEP model), and (vi) physicochemical (200 total) and MACCS fingerprint descriptors (167 bits) of the payload-linker system (Fig. 8A). Each feature stream in the multimodal framework first underwent an initial linear projection to reduce dimensionality. DAR and ESM embeddings were merged into Feature Stream 1, antigen intensities and mRNA embeddings constituted Feature Stream 2, and molecular descriptors plus fingerprints formed Feature Stream 3. The linearly projected features within each stream were concatenated and passed to a dedicated 1D convolutional block, whose number of layers, hidden channels, kernel size, and dropout were tuned via Optuna. All convolutional outputs then went through a global average pooling step and were subsequently concatenated into a combined embedding, which was refined by an attention mechanism. Finally, a fully connected (FC) classifier (with ReLU activations, batch normalization, and dropout) output a sigmoid probability for binary classification.
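The leakage-aware split described under data preprocessing can be sketched in plain Python. This is an illustrative re-implementation of Butina-style clustering and not the authors' code. Fingerprints are represented as sets of on-bit indices, and the "80% threshold" is interpreted here as a Tanimoto similarity cutoff of 0.8, which is an assumption. The greedy cluster-to-split assignment is also a simplification that omits the stratification mentioned in the text.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def butina_clusters(fingerprints, similarity_cutoff=0.8):
    """Butina-style clustering: repeatedly take the unassigned item with the
    most neighbours as a centroid and absorb its unassigned neighbours."""
    n = len(fingerprints)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fingerprints[i], fingerprints[j]) >= similarity_cutoff:
                neighbors[i].add(j)
                neighbors[j].add(i)
    # Most-connected items first; ties broken by index for determinism.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned, clusters = set(), []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + sorted(neighbors[centroid] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def split_by_cluster(clusters, n_items, val_frac=0.1, test_frac=0.1):
    """Assign whole clusters to test, validation, or training so that no
    cluster is shared between sets (simplified, no stratification)."""
    train, val, test = [], [], []
    for members in sorted(clusters, key=len):  # small clusters first
        if len(test) + len(members) <= test_frac * n_items:
            test.extend(members)
        elif len(val) + len(members) <= val_frac * n_items:
            val.extend(members)
        else:
            train.extend(members)
    return train, val, test
```

Because entire clusters are assigned to a single set, structurally similar linker-payloads cannot appear on both sides of the train/test boundary, which is the point of the procedure.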
A custom binary cross-entropy (BCE) loss function was designed to emphasize classifications far from the selected nM threshold boundary while down-weighting ambiguous samples near each nM threshold. Specifically:

$$\text{Penalty Scale}=1-e^{-penalty_{slope}\cdot\left|IC_{50}-threshold\right|}\quad\left(\text{Eq. 2}\right)$$

$$Loss=BCE\left(y_{pred},y_{true}\right)\cdot\text{Penalty Scale}\quad\left(\text{Eq. 3}\right)$$

where $BCE\left(y_{pred},y_{true}\right)$ was the standard binary cross-entropy loss, $penalty_{slope}$ controlled the rate at which the penalty decreased near the boundary (set to 10), $IC_{50}$ was the IC₅₀ value for each data point, and $threshold$ was set to either 1, 5, or 10 nM. Early stopping was implemented based on validation loss, halting training if no improvement was observed over eight consecutive epochs. Learning rate scheduling via ReduceLROnPlateau [81] was employed to reduce the learning rate when the validation loss plateaued.

iv) Hyperparameter optimization

Hyperparameter searches were conducted using the default Tree-structured Parzen Estimator within Optuna [82]. The exploration covered 1 to 3 CNN layers, hidden dimensions ranging from 32 to 512, kernel sizes up to 7, dropout up to 50%, learning rates in [1E − 5, 1E − 2], and batch sizes of 32, 64, or 128, each with an early-stopping patience of 10. Decision thresholds of 1, 5, and 10 nM were used for labeling. The best identified hyperparameter combinations are detailed in Suppl. Tables 4, 5, and 6.

Computational infrastructure

All computational tasks were performed on high-performance computing clusters provided by the Digital Research Alliance of Canada, including a Béluga virtual machine allocation with 8 virtual CPUs and 30 GB of RAM (p8-30gb configuration). The operating system used was Ubuntu-22.04.4-Jammy-x64-2024-06.
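The penalty-weighted loss of Eq. 2 and Eq. 3 can be illustrated with a short per-sample sketch in plain Python. It is not the training code used for the AMM model. The function names, the clipping constant `eps`, and the averaging over a batch are assumptions introduced here for illustration.

```python
import math


def binary_label(ic50_nm, threshold_nm):
    """Label 1 if the IC50 is below the threshold (active), otherwise 0."""
    return 1 if ic50_nm < threshold_nm else 0


def penalty_scale(ic50_nm, threshold_nm, penalty_slope=10.0):
    """Eq. 2: close to 0 for samples at the threshold, close to 1 far from it."""
    return 1.0 - math.exp(-penalty_slope * abs(ic50_nm - threshold_nm))


def bce(y_pred, y_true, eps=1e-7):
    """Standard binary cross-entropy for one sample; `eps` avoids log(0)."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def penalized_bce(y_pred, ic50_nm, threshold_nm, penalty_slope=10.0):
    """Eq. 3: BCE multiplied by the penalty scale for one sample."""
    y_true = binary_label(ic50_nm, threshold_nm)
    return bce(y_pred, y_true) * penalty_scale(ic50_nm, threshold_nm, penalty_slope)


def mean_penalized_bce(y_preds, ic50s_nm, threshold_nm, penalty_slope=10.0):
    """Batch loss; averaging over samples is an assumption of this sketch."""
    losses = [
        penalized_bce(p, ic50, threshold_nm, penalty_slope)
        for p, ic50 in zip(y_preds, ic50s_nm)
    ]
    return sum(losses) / len(losses)
```

With a slope of 10, a sample whose IC₅₀ lies 0.1 nM from the threshold already receives about 63% of its full BCE weight, so the down-weighting only affects samples in a narrow band around the decision boundary.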
For inference and website integration, job scheduling and resource allocation were managed by the SLURM workload manager. The computational environment was also configured to support multi-threaded operations and to scale with demand.

External and blinded ADC test set

The external ADCs were kindly provided by Dr. Mark Barok (University of Helsinki). ADC structures (antibody name, linker type, payload, and DAR) and the cell lines against which they were tested were provided. The information was processed by the AMM model, and the predicted probabilities of activity were sent back to Dr. Barok for verification. A portion of the ADCs has since been published [83]. Chemical structures and activity data for this set are reported in Supplemental Table 7.

ADCpedia website implementation

Deployment with the Digital Research Alliance of Canada used a virtual machine allocation for public access, facilitated by a "Floating IP." A subdomain (server.adcpedia.com) was linked to this IP, and the application was developed with WordPress for the frontend and Django for the backend, integrating prediction models and data pipelines. Nginx and Gunicorn were used for deployment, and the server was managed over SSH. Project files were transferred, and settings were configured to handle cookies, CSRF, and CORS. Security rules for HTTP and HTTPS enabled requests on ports 80 and 443. Nginx was set up to redirect traffic to Gunicorn, which served the Django application. Static files were collected, database migrations were executed, and SSL certificates were obtained via Certbot for HTTPS security. File permissions ensured proper access, and multiple Gunicorn instances maintained performance under heavy loads. Monitoring of Nginx logs helped detect and resolve issues. CSRF tokens secured POST requests, while CORS was configured to manage requests from different origins, which was essential for the Digital Research Alliance of Canada deployment.
Statistical analysis

Multivariable analytical methods were used to assess associations between ADC components, antigen expression, and IC₅₀ values. Continuous variables were compared using Pearson correlation coefficients and linear regression models, with model performance quantified by the coefficient of determination (R²). For instance, correlations between scaled mRNA read counts and true protein intensities were assessed by scatter plots overlaid with regression lines, while differences in antigen expression across predefined IC₅₀ bins were examined using boxplots and strip plots. Outliers were identified and excluded based on Z-score and interquartile range (IQR) criteria. In the boxplots, the box encompasses the 75th interquartile range, the mean is indicated by the horizontal line in the box, and the whiskers span the 25th IQR. The IC₅₀ values were log-transformed (pIC₅₀ = −log₁₀[IC₅₀]) when appropriate. In addition, classification models for drug sensitivity (using an IC₅₀ threshold of 10 nM) were evaluated by receiver operating characteristic (ROC) analyses, with AUC values computed for the validation, test, and blind test datasets. Confusion matrices were further constructed for cell line-specific analyses. All tests were two-sided, with statistical significance defined as p < 0.05, and analyses were performed using Python (pandas, SciPy, scikit-learn, and seaborn).

Declarations

Acknowledgements

This research was funded by the Canadian Institutes of Health Research (J. Leyton; 378389) and by the Natural Sciences and Engineering Research Council of Canada (F. Gentile; RGPIN-2023-04129). The authors thank the Digital Research Alliance of Canada for computational resources and for a Resource Allocation Competition grant awarded to F. Gentile.

References

Dumontet, C., Reichert, J.M., Senter, P.D., Lambert, J.M. & Beck, A. Antibody-drug conjugates come of age in oncology. Nat Rev Drug Discov 22 , 641-661 (2023). Jin, H. et al. Rab GTPases: Central Coordinators of Membrane Trafficking in Cancer.
Front Cell Dev Biol 9 , 648384 (2021). Mellman, I. & Yarden, Y. Endocytosis and cancer. Cold Spring Harb Perspect Biol 5 , a016949 (2013). Mosesson, Y., Mills, G.B. & Yarden, Y. Derailed endocytosis: an emerging feature of cancer. Nat Rev Cancer 8 , 835-850 (2008). Shen, L. et al. ADCdb: the database of antibody-drug conjugates. Nucleic Acids Res 52 , D1097-D1109 (2024). Nessler, I., Menezes, B. & Thurber, G.M. Key metrics to expanding the pipeline of successful antibody-drug conjugates. Trends Pharmacol Sci 42 , 803-812 (2021). Blenrep withdrawn for multiple myeloma https://www.gsk.com/en-gb/media/press-releases/gsk-provides-update-on-blenrep-us-marketing-authorisation/. (2022). Trodelvy withdrawn for metastatic urothelial cancer. https://www.gilead.com/company/company-statements/2024/gilead-provides-update-on-us-indication-for-trodelvy-in-metastatic-urothelial-cancer#:~:text=Foster%20City%2C%20Calif.%2C%20October,and%20Drug%20Administration%20(FDA). (2024). Fu, Z., Li, S., Han, S., Shi, C. & Zhang, Y. Antibody drug conjugate: the "biological missile" for targeted cancer therapy. Signal Transduct Target Ther 7 , 93 (2022). Khongorzul, P., Ling, C.J., Khan, F.U., Ihsan, A.U. & Zhang, J. Antibody-Drug Conjugates: A Comprehensive Review. Mol Cancer Res 18 , 3-19 (2020). Liu, T. et al. Overexpression of TROP2 predicts poor prognosis of patients with cervical cancer and promotes the proliferation and invasion of cervical cancer cells by regulating ERK signaling pathway. PLoS One 8 , e75864 (2013). Hammood, M., Craig, A.W. & Leyton, J.V. Impact of Endocytosis Mechanisms for the Receptors Targeted by the Currently Approved Antibody-Drug Conjugates (ADCs)-A Necessity for Future ADC Research and Development. Pharmaceuticals (Basel) 14 (2021). Leyton, J.V. Improving Receptor-Mediated Intracellular Access and Accumulation of Antibody Therapeutics-The Tale of HER2. Antibodies (Basel) 9 (2020). Cullen, P.J. & Steinberg, F. 
To degrade or not to degrade: mechanisms and significance of endocytic recycling. Nat Rev Mol Cell Biol 19 , 679-696 (2018). Zerial, M. & McBride, H. Rab proteins as membrane organizers. Nat Rev Mol Cell Biol 2 , 107-117 (2001). Nadal-Serrano, M. et al. The Second Generation Antibody-Drug Conjugate SYD985 Overcomes Resistances to T-DM1. Cancers (Basel) 12 (2020). Rios-Luci, C. et al. Resistance to the Antibody-Drug Conjugate T-DM1 Is Based in a Reduction in Lysosomal Proteolytic Activity. Cancer Res 77 , 4639-4651 (2017). Weng, W. et al. Antibody-Exatecan Conjugates with a Novel Self-immolative Moiety Overcome Resistance in Colon and Lung Cancer. Cancer Discov 13 , 950-973 (2023). Kostova, V., Desos, P., Starck, J.B. & Kotschy, A. The Chemistry Behind ADCs. Pharmaceuticals (Basel) 14 (2021). Maecker, H., Jonnalagadda, V., Bhakta, S., Jammalamadaka, V. & Junutula, J.R. Exploration of the antibody-drug conjugate clinical landscape. MAbs 15 , 2229101 (2023). Goncalves, E. et al. Pan-cancer proteomic map of 949 human cell lines. Cancer Cell 40 , 835-849 e838 (2022). Foerderer, J. Should we trust web-scaped data? ArXiv:2308.02231 (2023). ADC Review. www.adcreview.com (accessed 04/12/2024). Hamann, P.R. et al. Gemtuzumab ozogamicin, a potent and selective anti-CD33 antibody-calicheamicin conjugate for treatment of acute myeloid leukemia. Bioconjug Chem 13 , 47-58 (2002). Lewis Phillips, G.D. et al. Targeting HER2-positive breast cancer with trastuzumab-DM1, an antibody-cytotoxic drug conjugate. Cancer Res 68 , 9280-9290 (2008). Doronina, S.O. et al. Novel peptide linkers for highly potent antibody-auristatin conjugate. Bioconjug Chem 19 , 1960-1963 (2008). Doronina, S.O. et al. Development of potent monoclonal antibody auristatin conjugates for cancer therapy. Nat Biotechnol 21 , 778-784 (2003). Tolcher, A.W. et al. Randomized phase II study of BR96-doxorubicin conjugate in patients with metastatic breast cancer. J Clin Oncol 17 , 478-484 (1999). 
Damelin, M., Zhong, W., Myers, J. & Sapra, P. Evolving Strategies for Target Selection for Antibody-Drug Conjugates. Pharm Res 32 , 3494-3507 (2015). Samantasinghar, A. et al. A comprehensive review of key factors affecting the efficacy of antibody drug conjugate. Biomed Pharmacother 161 , 114408 (2023). https://github.com/bp-kelley/descriptastorus. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379 , 1123-1130 (2023). Leyton, J.V. The endosomal-lysosomal system in ADC design and cancer therapy. Expert Opin Biol Ther 23 , 1067-1076 (2023). Erickson, H.K. et al. The effect of different linkers on target cell catabolism and pharmacokinetics/pharmacodynamics of trastuzumab maytansinoid conjugates. Mol Cancer Ther 11 , 1133-1142 (2012). Petersen, M.E. et al. Design and Evaluation of ZD06519, a Novel Camptothecin Payload for Antibody Drug Conjugates. Mol Cancer Ther 23 , 606-618 (2024). Dornan, D. et al. Therapeutic potential of an anti-CD79b antibody-drug conjugate, anti-CD79b-vc-MMAE, for the treatment of non-Hodgkin lymphoma. Blood 114 , 2721-2729 (2009). Kim, S.B. et al. Relationship between tumor biomarkers and efficacy in TH3RESA, a phase III study of trastuzumab emtansine (T-DM1) vs. treatment of physician's choice in previously treated HER2-positive advanced breast cancer. Int J Cancer 139 , 2336-2342 (2016). Savage, S.R. et al. Pan-cancer proteogenomics expands the landscape of therapeutic targets. Cell 187 , 4389-4407 e4315 (2024). Dupree, E.J. et al. A Critical Review of Bottom-Up Proteomics: The Good, the Bad, and the Future of this Field. Proteomes 8 (2020). Slamon, D.J. et al. Human breast cancer: correlation of relapse and survival with amplification of the HER-2/neu oncogene. Science 235 , 177-182 (1987). Vogel, C.L. et al. Efficacy and safety of trastuzumab as a single agent in first-line treatment of HER2-overexpressing metastatic breast cancer. J Clin Oncol 20 , 719-726 (2002). 
Yu, S.Y., Park, J., Kwon, W.S., Jeong, I., Kang, S.K., Bae, H.J., Kim, T.S., Chung, H.C., Rha, S.Y. Abstract 945: Trastuzumab deruxtecan (T-DXd) sensitivity invarious levels of HER2 expressing gastric cancer cells. Cancer Res 81 (12_Supplement) (2021). Lacasse, V., Beaudoin, S., Jean, S. & Leyton, J.V. A Novel Proteomic Method Reveals NLS Tagging of T-DM1 Contravenes Classical Nuclear Transport in a Model of HER2-Positive Breast Cancer. Mol Ther Methods Clin Dev 19 , 99-119 (2020). Lenart, S. et al. Trop2: Jack of All Trades, Master of None. Cancers (Basel) 12 (2020). Tagawa, S.T. et al. TROPHY-U-01: A Phase II Open-Label Study of Sacituzumab Govitecan in Patients With Metastatic Urothelial Carcinoma Progressing After Platinum-Based Chemotherapy and Checkpoint Inhibitors. J Clin Oncol 39 , 2474-2485 (2021). Bardia, A., Hurvitz, S.A. & Rugo, H.S. Sacituzumab Govitecan in Metastatic Breast Cancer. Reply. N Engl J Med 385 , e12 (2021). Loriot, Y. et al. Sacituzumab Govitecan Demonstrates Efficacy across Tumor Trop-2 Expression Levels in Patients with Advanced Urothelial Cancer. Clin Cancer Res 30 , 3179-3188 (2024). Santi, D.V., Ashley, G.W., Cabel, L., Bidard, F.C. Could a Long-Acting Prodrug of SN-38 be Efficacious in Sacituzumab Govitecan-Resistant Tumors? BioDrugs 38 , 171-176 (2024). Powles, T. et al. Enfortumab Vedotin in Previously Treated Advanced Urothelial Carcinoma. N Engl J Med 384 , 1125-1135 (2021). Chu, P.G. & Arber, D.A. CD79: a review. Appl Immunohistochem Mol Morphol 9 , 97-106 (2001). Pfeifer, M. et al. Anti-CD22 and anti-CD79B antibody drug conjugates are active in different molecular diffuse large B-cell lymphoma subtypes. Leukemia 29 , 1578-1586 (2015). Tilly, H. et al. Polatuzumab Vedotin in Previously Untreated Diffuse Large B-Cell Lymphoma. N Engl J Med 386 , 351-363 (2022). Polson, A.G. et al. Antibody-drug conjugates targeted to CD79 for the treatment of non-Hodgkin lymphoma. Blood 110 , 616-623 (2007). Polson, A.G. et al. 
Antibody-drug conjugates for the treatment of non-Hodgkin's lymphoma: target and linker-drug selection. Cancer Res 69 , 2358-2364 (2009). Tolcher, A.W. Antibody drug conjugates: lessons from 20 years of clinical experience. Ann Oncol 27 , 2168-2172 (2016). Cassady, J.M., Chan, K.K., Floss, H.G. & Leistner, E. Recent developments in the maytansinoid antitumor agents. Chem Pharm Bull (Tokyo) 52 , 1-26 (2004). Junttila, T.T., Li, G., Parsons, K., Phillips, G.L. & Sliwkowski, M.X. Trastuzumab-DM1 (T-DM1) retains all the mechanisms of action of trastuzumab and efficiently inhibits growth of lapatinib insensitive breast cancer. Breast Cancer Res Treat 128 , 347-356 (2011). Remillard, S., Rebhun, L.I., Howie, G.A. & Kupchan, S.M. Antimitotic activity of the potent tumor inhibitor maytansine. Science 189 , 1002-1005 (1975). Steube, K.G. et al. Dolastatin 10 and dolastatin 15: effects of two natural peptides on growth and differentiation of leukemia cells. Leukemia 6 , 1048-1053 (1992). Quentmeier, H., Brauer, S., Pettit, G.R., Drexler, H.G. Cytotostatic effects of dolastatin 10 and dolastatin 15 on human leukemia cell lines. Leuk. Lymphoma 6 , 245-250 (1992). Krause, W. Resistance to anti-tubulin agents: From vinca alkaloids to epothilones. Cancer Drug Resist 2 , 82-106 (2019). Morgensztern, D., Ready, N.E., Johnson, M.L., Dowlati, A., Choudhury, N.J., Carbone, D.P., Schaefer, E.S., Arnold, S.M., Puri, S., Piotrowska, Z. First-in-human study of ABBV-011, a seizure-related homolog protein 6 (SEZ6)–targeting antibody-drug conjugate, in patients with small cell lung cancer. J. Clin. Oncol. 41 (2023). Wiedemeyer, W.R. et al. ABBV-011, A Novel, Calicheamicin-Based Antibody-Drug Conjugate, Targets SEZ6 to Eradicate Small Cell Lung Cancer Tumors. Mol Cancer Ther 21 , 986-998 (2022). Norsworthy, K.J. et al. FDA Approval Summary: Mylotarg for Treatment of Patients with Relapsed or Refractory CD33-Positive Acute Myeloid Leukemia. Oncologist 23 , 1103-1108 (2018). Goldenberg, D.M. 
& Sharkey, R.M. Antibody-drug conjugates targeting TROP-2 and incorporating SN-38: A case study of anti-TROP-2 sacituzumab govitecan. MAbs 11 , 987-995 (2019). Avendano, C., Menendez, J.C. Medicinal Chemistry of Anticancer Drugs. (Elsevier Science, 2008). Nakada, T., Sugihara, K., Jikoh, T., Abe, Y. & Agatsuma, T. The Latest Research and Development into the Antibody-Drug Conjugate, [fam-] Trastuzumab Deruxtecan (DS-8201a), for HER2 Cancer Therapy. Chem Pharm Bull (Tokyo) 67 , 173-185 (2019). Ogitani, Y. et al. DS-8201a, A Novel HER2-Targeting ADC with a Novel DNA Topoisomerase I Inhibitor, Demonstrates a Promising Antitumor Efficacy with Differentiation from T-DM1. Clin Cancer Res 22 , 5097-5108 (2016). de Bever, L. et al. Generation of DAR1 Antibody-Drug Conjugates for Ultrapotent Payloads Using Tailored GlycoConnect Technology. Bioconjug Chem 34 , 538-548 (2023). Wang, S., Zhang, R., Zhong, K., Guo, W. & Tong, A. An Anti-CD7 Antibody-Drug Conjugate Target Showing Potent Antitumor Activity for T-Lymphoblastic Leukemia (T-ALL). Biomolecules 14 (2024). Tang, S.C. et al. Influence of antibody-drug conjugate cleavability, drug-to-antibody ratio, and free payload concentration on systemic toxicities: A systematic review and meta-analysis. Cancer Metastasis Rev 44 , 18 (2024). Nagra, N.S. et al. The company landscape for artificial intelligence in large-molecule drug discovery. Nat Rev Drug Discov 22 , 949-950 (2023). Lu, X. et al. Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph. Comput Struct Biotechnol J 23 , 1666-1679 (2024). Luo, Y. et al. Toward Unified AI Drug Discovery with Multimodal Knowledge. Health Data Sci 4 , 0113 (2024). Steurer, B., Vanhaelen, Q. & Zhavoronkov, A. Multimodal Transformers and Their Applications in Drug Target Discovery for Aging and Age-Related Diseases. J Gerontol A Biol Sci Med Sci 79 (2024). Kingma, D.P., Ba, J. Adam: A method for stochastic optimization. 
arXiv:1412.6980 (2014). RDKit https://www.rdkit.org/ (accessed 2024-04-26). Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J Chem Inf Model 50 , 742-754 (2010). Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. Scikit-learn: Machine Learning in Python. The journal of Machine Learning Research 12 , 2825-2830 (2011). Butina, D. Unsupervised Data Base Clustering Based on Daylight's Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets. Journal of Chemical Information and Computer Sciences 39 (1999). Dauphin, Y., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., Bengio, Y. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. ArXiv:1406.2572 (2014). Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M. Optuna: A next-generation hyperparameter optimization framework. Archiv (2019). Pourjamal, N., Le Joncour, V., Vereb, G., Honkamaki, C., Isola, J., Leyton, J.V., Laakkonen, P., Joensuu, H., Barok, M. Disitamab vedotin in preclinical models of HER2-positive breast and gastric cancers resistant to trastuzumab emtansine and trastuzumab deruxtecan. Transl. Oncol. 53 (2025). Additional Declarations There is NO Competing Interest. Supplementary Files ADCpediaSupplMaterialsv5.docx Supplementary Tables 1-7 FigsS1.png FigsS2.png FigsS3.png FigsS4.png FigsS5.png FigsS6.png FigsS7.png FigsS8.png
Figure 1. The multimodal framework of the ADC design platform. A) The platform enables the prediction of ADC activities across tumor cell lines. The platform is comprised of three components: 1) ADCpedia, which is a curated database that includes records on ADC structures and cytotoxic IC₅₀ values. It also contains tumor cell biology such as the identities of tumor cell lines, cell surface and intracellular targets, and gene expression omics data for 1,491 human tumor cell lines (purple shaded area). 2) The gene cell expression prediction (GENCEP) model, which provides universal and standardized protein intensity quantification (teal shaded area). 3) The ADC MultiModal (AMM) model, which trains on this information at 1, 5, and 10 nM IC₅₀ value thresholds and generates a probability for activity based on an ADC design for a given tumor cell line. B) Overview of the tumor types and the number of cell lines in ADCpedia. Dashed arrow indicates the link between the multi-omics integrated into ADCpedia.

Figure 2. Bridging transcriptomic and proteomic data with the GENCEP model. A) Relationship between protein expression intensities (y-axis) and mRNA counts (x-axis) in the Cell Model Passport database, showing a low correlation. Each point represents a protein-cell line pair. Both axes are standardized. B) Scatter plot illustrating the correlation between the expression intensities of ADC antigens predicted by GENCEP and the ground-truth values reported in the Cell Model Passport database in the hold-out test set. Each point represents a protein-cell line pair.

Figure 3. Standardization of antigen intensities across tumor cell lines. Diversity of the predicted intensities for A) HER2, C) Trop2, and E) Nectin-4 in tumor cell lines challenged by ADCs in ADCpedia. The colored segments are for cell lines grouped according to the tumor types with the most ADC records, and the grey areas indicate tumor types with very few ADC records. The mean intensities for B) HER2, D) Trop2, and F) Nectin-4 across each organ system or tumor type across the entire Cell Model Passport. The number of cell lines per tumor type and the color bars are the same as in Fig. 1B. Stars indicate the tumor indications for the approved ADCs targeting the respective antigens.

Figure 4. Relationship between predicted ADC antigen intensities and efficacies. A) Boxplots of predicted antigen intensities in different IC₅₀ bins. B) Point-biserial correlation between IC₅₀ values represented as binary values (<10 nM: 1; ≥10 nM: 0) and predicted antigen intensities of ADCpedia entries, yielding a low correlation coefficient of 0.12 across 1,072 data points.

Figure 5. Relationship between predicted antigen intensity and IC₅₀ values in ADCpedia. Boxplots evaluating the association between IC₅₀ values (−log(IC₅₀)) across the three different protein intensity ranges based on A) HER2, B) Trop2, and C) CD79b.

Figure 6. The relationships between ADC potencies and payloads across tumor types. The IC₅₀ values for ADCs in ADCpedia and the relationship with payload intracellular targets: A) microtubule inhibitors, B) DNA damagers, and C) topoisomerase I inhibitors. The zone between the dashed lines represents the 10-100 nM IC₅₀ value range. Above and below this zone are IC₅₀ values <10 nM and >100 nM, respectively.
Legend lists the tumor types according to color based on Fig. 1B\u003c/p\u003e","description":"","filename":"Slide6.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/99c88bc11b98fc242676a945.png"},{"id":81178840,"identity":"0bb47fe7-04e1-47a4-b366-7d02aa2af506","added_by":"auto","created_at":"2025-04-23 06:57:30","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":41062,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe relationship between ADC potencies and linkers across tumor types. \u003c/strong\u003eThe IC50 values for ADCs in ADCpedia and the relationship with \u003cstrong\u003eA) \u003c/strong\u003ecleavable, and \u003cstrong\u003eB) \u003c/strong\u003enon-cleavable linkers. Y-axes are the negative log of IC50 values in nM. Boxes are color-coded by tumor type as in Fig. 1B\u003c/p\u003e","description":"","filename":"Slide7.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/8c9d993ea87c9071200605ac.png"},{"id":81178837,"identity":"525e7f1d-afd2-410b-9912-134d865deb82","added_by":"auto","created_at":"2025-04-23 06:57:30","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":30306,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eADC potency variability as a function of DAR. \u003c/strong\u003eIC50 values are binned across different DARs. On the top x-axis, the distribution of DAR values across the entire ADCpedia database is reported. 
The right Y-axis represents the distribution of ADCs in ADCpedia.\u003c/p\u003e","description":"","filename":"Slide8.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/9acbe88a16d620c3862388ec.png"},{"id":81179193,"identity":"af9a9d8a-f99e-445a-86eb-7ea59591c0d8","added_by":"auto","created_at":"2025-04-23 07:05:30","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":234643,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSchematic of the architectures of GENCEP and AMM models. A. \u003c/strong\u003eEmbeddings and outputs from GENCEP’s antigen expression predictions are fed into the AMM model, which processes these and other biological and chemical features via convolutional layers, to predict ADC efficacy (IC50) as a classification task (\u0026lt;10 nM: 1; ≥10 nM: 0). \u003cstrong\u003eB. \u003c/strong\u003eThe GENCEP model predicts protein expression intensities, leveraging mRNA counts, ESM embeddings, and target gene mRNA counts as inputs that are first processed by separated dense layers and then concatenated as an input for a regression head.\u003c/p\u003e","description":"","filename":"Slide9.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/4a99973eac3b43ddbfe1a313.png"},{"id":81177957,"identity":"6911f8dd-3073-4d4c-99ff-4e99f6306247","added_by":"auto","created_at":"2025-04-23 06:49:30","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":113988,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePerformances of AMM predictive model. A. \u003c/strong\u003eArea under the receiver operating characteristic curve (AUC) values reporting classification performances of AMM model at 10, 5, and 1 nM thresholds on the validation (golden dotted line), internal hold-out test (blue solid line) and external blind test (green solid line) sets. \u003cstrong\u003eB. 
\u003c/strong\u003eA schematic of the diversity represented in the external blind test set.\u003c/p\u003e","description":"","filename":"Slide10.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/02b438559a8c011c6e88af5a.png"},{"id":86020081,"identity":"ffb9023f-7fb6-4c89-8ee9-2b02862ab7d7","added_by":"auto","created_at":"2025-07-04 11:41:17","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2501519,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/28ea698c-e4ca-4ee7-9bdc-eda29da73420.pdf"},{"id":81177934,"identity":"7268934c-d574-4296-95ad-26e544aad64d","added_by":"auto","created_at":"2025-04-23 06:49:30","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":54664,"visible":true,"origin":"","legend":"Supplementary Tables 1-7","description":"","filename":"ADCpediaSupplMaterialsv5.docx","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/114d45f3ecae05aaf488ff50.docx"},{"id":81178835,"identity":"e593d9ee-9ce9-498d-87c9-820332b74744","added_by":"auto","created_at":"2025-04-23 06:57:30","extension":"png","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":89235,"visible":true,"origin":"","legend":"","description":"","filename":"FigsS1.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/ada80608df9b33a71a3bf499.png"},{"id":81179190,"identity":"c782f50f-17e5-4ac6-88eb-9733ec970097","added_by":"auto","created_at":"2025-04-23 
07:05:30","extension":"png","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":41814,"visible":true,"origin":"","legend":"","description":"","filename":"FigsS2.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/333b7c4726fc1226a894f759.png"},{"id":81177945,"identity":"d3fd5daf-945e-47df-a0d1-a94a48da102f","added_by":"auto","created_at":"2025-04-23 06:49:30","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":119185,"visible":true,"origin":"","legend":"","description":"","filename":"FigsS3.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/cf0577b60db2d97c3d46653a.png"},{"id":81179192,"identity":"cb48b9b9-3133-4783-9a5f-6a4c0ed31901","added_by":"auto","created_at":"2025-04-23 07:05:30","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":144358,"visible":true,"origin":"","legend":"","description":"","filename":"FigsS4.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/c0b7cf6a22189f22a58b4697.png"},{"id":81177947,"identity":"7df8ca4a-006c-4660-b8ef-f9ecd2be23e2","added_by":"auto","created_at":"2025-04-23 06:49:30","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":76052,"visible":true,"origin":"","legend":"","description":"","filename":"FigsS5.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/5825c8a288781733cf66483f.png"},{"id":81177952,"identity":"3bf40005-3c27-4e99-b1b3-f49581153467","added_by":"auto","created_at":"2025-04-23 06:49:30","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":77188,"visible":true,"origin":"","legend":"","description":"","filename":"FigsS6.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/ad55c62d101df7bd3a7c34e3.png"},{"id":81177954,"identity":"d5465542-fa8b-4bf3-82b2-a2fa7c85d49a","added_by":"auto","created_at":"2025-04-23 
06:49:30","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":127441,"visible":true,"origin":"","legend":"","description":"","filename":"FigsS7.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/a15655f338a6517000bad7e1.png"},{"id":81177959,"identity":"be69db24-9fef-4433-9359-28bdf127fa70","added_by":"auto","created_at":"2025-04-23 06:49:30","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":189753,"visible":true,"origin":"","legend":"","description":"","filename":"FigsS8.png","url":"https://assets-eu.researchsquare.com/files/rs-6256038/v1/976b914ae76d3971894fdfee.png"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"A Machine Learning Platform for Interconnecting Antibody-Drug Conjugate Cytotoxic Design with Tumor Cell Biology","fulltext":[{"header":"Introduction","content":"\u003cp\u003eAntibody-drug conjugates (ADCs) have transformed cancer therapy by enabling targeted drug delivery and improving efficacy over traditional chemotherapy. However, their development remains highly inefficient due to the intricate complexity between their structural components as ADCs combine monoclonal antibodies (mAbs), specialized linkers, and cytotoxic payloads, at optimized drug-antibody ratios (DARs). Beyond structural complexity, ADCs rely on a highly coordinated mechanism of action. They must selectively bind to overexpressed antigens on the surface of tumor cells, undergo rapid internalization, and traffic through the endosomal-lysosomal system for degradation and payload release\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. 
Unfortunately, the ADC-tumor cell interplay is also highly intricate, as these processes vary not only from those in normal cells but also across tumor types, and is therefore not fully understood\u003csup\u003e\u003cspan additionalcitationids=\"CR3\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eADC research faces high attrition rates in early development stages\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e, with many of the candidates also failing in clinical trials\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. A significant challenge in developing effective ADCs stems from an incomplete understanding of the intricate interconnectedness between chemical design and cellular biology. ADCs must navigate a highly variable biological landscape, including cellular differences in antigen expression and endosomal-lysosomal processing pathways. This variability often leads to suboptimal therapeutic efficacy or increased off-target toxicity. Even with ADCs in clinical trials and U.S. FDA approvals on the rise, the recent withdrawal of two commercial products due to post-marketing inefficacy\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e underscores the need to address these gaps. A pressing fundamental challenge lies in refining antigen selection criteria and improving knowledge of the relationship between ADCs and internalization kinetics and lysosomal targeting efficiencies.\u003c/p\u003e \u003cp\u003eSuccessful ADC design hinges on the expression profile of the cell surface target antigen and the subsequent processing governed by the endosomal-lysosomal machinery. 
Typically, ADC development prioritizes antigens with high tumor expression, as greater antigen density enhances ADC binding and payload accumulation\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. However, high antigen expression alone does not guarantee success. First, the minimum antigen expression threshold required for good tumor selectivity remains unclear, as many tumor-associated antigens are also present on normal tissues, raising concerns about off-target toxicity. Second, overexpression of target antigens for specific cancer types has been shown to inhibit apoptosis\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e, suggesting that ADC efficacy may be enhanced by targeting tumors with relatively lower antigen expression levels.\u003c/p\u003e \u003cp\u003eBeyond absolute and relative antigen expression levels, antigen internalization kinetics and endocytic pathways are important to understand\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, not only between tumor and normal cells but also across different tumor types. However, antigen internalization is not well understood\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. Even for well-established ADC targets like HER2, variations in internalization rates and intracellular routing can alter therapeutic efficacy between tumor types\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. ADCs must then navigate the complex endosomal system to reach lysosomes, where degradation releases the cytotoxic payload. However, this pathway is highly variable among tissue types, which is compounded in cancer. 
The endosomal network consists of multiple compartments and their organization and dynamics are regulated by select members of the Rab proteins, the largest family of monomeric GTPases, and their effectors\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. Many components of the endosomal machinery are mutated or have altered expression in several cancers impairing ADC lysosomal targeting\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Additionally, lysosomal dysfunction, including alkalization due to aberrant vacuolar H\u0026thinsp;+\u0026thinsp;ATPase activity, has been a significant factor linked to ADC resistance\u003csup\u003e\u003cspan additionalcitationids=\"CR17\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. Despite advances in linker chemistry designed to exploit endosomal-lysosomal conditions for payload release\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e, these significant biological variabilities complicate ADC optimization.\u003c/p\u003e \u003cp\u003eAdding to the complexity of ADC development is the strikingly high redundancy in design, where the vast majority of ADCs rely on a narrow range of structural components interfacing with an even narrower range of biological elements. Among ~\u0026thinsp;6,500 reported ADCs, the diversity of the structural components is approximately 18%, 8%, and 7% for the mAb, linker, and payload, respectively\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. 
Furthermore, the diversity on the biological side such as target antigen and intracellular target is 5% and \u0026lt;\u0026thinsp;1%, respectively\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. Diversity is further reduced for these elements for ADCs in the clinic\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eMoreover, the empirical nature of ADC development is insufficient for systematically evaluating efficacy across all available tumor cell lines/xenograft models representing the diverse and numerous types of cancer. This often leads to the premature elimination of potentially viable candidates or the advancement of ADCs with hidden suboptimal efficacy. As a result, the ADC field is currently at a crossroads \u0026ndash; simply switching the mAb to universally broaden ADC applicability across multiple cancer types has proven to be both overly simplified and an economically unsustainable strategy. Simultaneously, the empirical development of novel ADCs remains constrained by a lack of comprehensive and systematic evaluation methods. Consequently, the field continues to struggle with high attrition rates and unpredictable clinical outcomes, increasing research and development costs. Addressing these challenges necessitates the creation of innovative, data-driven approaches that integrate structural and biological insights to optimize ADC design and enhance the possibility of clinical success.\u003c/p\u003e \u003cp\u003eThis work introduces a machine learning (ML)-powered ADC design platform based on a multimodal framework interconnecting ADC structures and activity with tumor cell biology that accurately predicts ADC \u003cem\u003ein vitro\u003c/em\u003e activities across tumor cell lines (Fig.\u0026nbsp;1). The platform is built on three synergistic components. 
The first is the ADCpedia database that contains over two decades of scientific literature on ADC chemical structure-activity relationships. ADCpedia also focuses on tumor cell biology by including comprehensive transcriptomic and proteomic data for ~\u0026thinsp;1,400 human tumor cell lines\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. The second component is the gene cell expression prediction (GENCEP) model, which is a late fusion neural network and by integrating multiple tumor cell features provides a universal and standardized system to quantify protein intensities. The third component is the ADC multimodal (AMM) model, which is based on a multimodal convolutional neural network (CNN) with a processing system tailored for the ADC-tumor cell interface. As a proof of concept, the training focused on target antigen protein expression across all tumor cell lines and at clinically relevant nanomolar (nM) ADC cytotoxicity IC\u003csub\u003e50\u003c/sub\u003e values.\u003c/p\u003e \u003cp\u003eThe ADC design platform demonstrated accurate predictions of ADC \u003cem\u003ein vitro\u003c/em\u003e activities across diverse tumor cell lines expressing various target antigen intensities. Strikingly, accurate prediction was also achieved on blinded real-world ADCs tested against tumor cell lines where both were absent in the training set as they were not yet reported in the literature. The rich multifaceted framework also exposed the complexities between ADC design based on different payloads and linker types and their cytotoxic activities across target antigens and tumor types. This work potentially realizes the development of a robust platform that links ADC activity and complex tumor cell biology using a data-centric yet holistic multimodal approach. 
This platform also represents a potentially transformative solution for addressing the challenges of ADC design, as it is a first-of-its-kind system that integrates chemical and cellular insights, architecturally organized for deep learning (DL), to offer researchers accurate predictions and enable insights into tumor-specific ADC efficacy. Hence, this platform has the potential to become a new standard for precision ADC development by streamlining and increasing early decision confidence for novel candidate ADC design and repurposing efforts for past ADCs, which ultimately will decrease resource-intensive research efforts.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eDatabase building\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAlthough previous curations of ADC structure-activity databases were created by extracting as much information as possible, these inherently leave large gaps. Databases built by searching multiple patent organizations and extracting information from the R\u0026amp;D pipelines listed on the websites of pharmaceutical companies\u003csup\u003e5\u003c/sup\u003e inherently require significant inference of structure and/or activity information for a given ADC or leave gaps in the database, a concern that is detrimental to training\u003csup\u003e22\u003c/sup\u003e. Additionally, such extensive searches most likely identify ADCs developed very early in the field that are not representative of today\u0026rsquo;s much more refined field.\u003c/p\u003e\n\u003cp\u003eTherefore, the curation process for ADCpedia focused on preclinically and clinically validated ADCs listed in the ADC DrugMap\u003csup\u003e23\u003c/sup\u003e. Surprisingly, the ADC DrugMap listed structures that do not traditionally constitute an ADC, such as biparatopic antibodies and α-particle-emitting payloads\u003csup\u003e23\u003c/sup\u003e. Therefore, the database was refined based on standardized inclusion criteria for ADC structure. 
Briefly, inclusion criteria consisted of only full monospecific mAbs, hetero-bifunctional chemical linkers, and traditional small molecule/anti-mitotic peptide payloads, akin to chemotherapeutic activities. A targeted search was then performed on these ADCs in the literature from January 2000 to April 15, 2024. A key second inclusion criterion was that ADC structural information had to include accompanying cytotoxicity IC\u003csub\u003e50\u003c/sub\u003e values to ensure completeness in the database. Also, to mitigate inconsistencies in the methodologies used to determine quantitative IC\u003csub\u003e50\u003c/sub\u003e values, only studies reporting cytotoxicity assays conducted in multi-well plate formats, single time-point incubation periods (\u003cem\u003ee.g\u003c/em\u003e., 72 h or 120 h), ADC concentration-response relationships, and readouts based on fluorescence- or luminescence-based reagents were included in the database. As a result of these standardization measures, 151 articles and patents were collected, yielding a total of 1,167 data points on ADC linker-payload compositions, mAb conjugation sites, DAR, the accompanying IC\u003csub\u003e50\u003c/sub\u003e values, target antigen and tumor types, and tumor cell line names.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe evolution of ADC design\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePost-curation analysis of ADCpedia revealed significant trends in the evolution of linker-payload combinations over the 24-year search span. A representation of the most frequent payload-linker combinations is shown in Supplemental Figure 1. Between 2001 and 2009, ADC research focused heavily on DNA damaging payloads like calicheamicin paired with acetyl-butyrate (AcBut) linkers. These linkers exploit the elevated intracellular glutathione levels to reduce the incorporated disulfide bonds and enable payload release\u003csup\u003e24\u003c/sup\u003e. 
ADCs using this combination, such as gemtuzumab-ozogamicin (GO), were associated with IC\u003csub\u003e50\u003c/sub\u003e values below 10 nM and were predominantly tested on leukemic cell lines (\u003cem\u003ee.g\u003c/em\u003e., HL-60). In contrast, anti-microtubule payloads were paired with more diverse linkers, exhibited higher IC\u003csub\u003e50\u003c/sub\u003e values, and were tested on solid tumor cell lines (\u003cem\u003ee.g.\u003c/em\u003e, A549 and U251). This likely reflects early challenges in optimizing anti-microtubule ADCs during this period compared to the more developed AcBut-calicheamicin linker-payloads at the time. This stage of ADC development aligns with GO being the first clinically approved ADC in 2000, albeit GO was withdrawn in 2010 due to fatal toxicities and then re-approved in 2017. Early exploration of PEG incorporation into linker structures also emerged during this interval.\u003c/p\u003e\n\u003cp\u003eFrom 2010-2018, research shifted toward protease-cleavable linkers, particularly dipeptide-based systems. This period saw extensive use of PEG moieties likely addressing challenges in conjugating hydrophobic small molecules to mAb side chains with a potential aim at higher DARs. Not coincidentally, there was also significant exploration of linker conjugation strategies involving lysine residues, which likely followed the development and success of ado-trastuzumab emtansine (T-DM1)\u003csup\u003e25\u003c/sup\u003e. Anti-microtubule payloads paired with cysteine conjugation strategies also gained prominence during this period and coincides with the success of brentuximab vedotin (BV)\u003csup\u003e26\u003c/sup\u003e. However, the ADCs during this interval showed diverse IC\u003csub\u003e50\u003c/sub\u003e values and were tested across both solid and liquid tumors. 
This likely reflects a phase of experimentation to identify effective payloads and optimize their pairing with various linker technologies for different tumor types.\u003c/p\u003e\n\u003cp\u003eThe period from 2019 onward is marked by a notable effort to increase payload diversity. These include newer generation DNA damaging agents and microtubule inhibitors. The valine-citrulline dipeptide coupled to the self-immolative para-aminobenzyl spacer (vc-PAB) became the most frequently used linker, most likely due to its ability to enable payload release without residual linker and/or antibody components and its improved controlled release relative to hydrazone-based linkers\u003csup\u003e27\u003c/sup\u003e. This period also brought a research focus on refining payload-linker pairings to maximize ADC efficacy against traditionally difficult-to-treat solid tumors such as glioblastoma (\u003cem\u003ee.g\u003c/em\u003e., SNB75), metastatic lung cancer (\u003cem\u003ee.g\u003c/em\u003e., NCI-H1930), and colorectal cancer (\u003cem\u003ee.g\u003c/em\u003e., HT-29). However, the ADC IC\u003csub\u003e50\u003c/sub\u003e values were highly variable, reflecting both the ongoing challenges in optimizing ADC structural combinations and the emerging importance of the tumor type interfaces.\u003c/p\u003e\n\u003cp\u003eRegarding cytotoxicity, the field initially attempted to pair mAbs with traditional chemotherapeutics with IC\u003csub\u003e50\u003c/sub\u003e values in the micromolar range, such as doxorubicin; however, these ADCs had limited clinical anti-tumor activity\u003csup\u003e28\u003c/sup\u003e. The five records in ADCpedia for ADCs with doxorubicin have IC\u003csub\u003e50\u003c/sub\u003e values ranging from 800 to 1,250 nM. This prompted a shift to develop ADCs with ultra-potent IC\u003csub\u003e50\u003c/sub\u003e values in the picomolar range in the early 2000s. The notable shift, based on publication dates, occurred around 2010. 
ADC \u003cem\u003ein vitro\u003c/em\u003e activities appeared to settle between 0.1 nM and 10 nM. However, upon closer inspection, ADC IC\u003csub\u003e50\u003c/sub\u003e values varied \u0026gt;300-fold (Suppl. Fig. 2). This wide distribution highlights areas of confusion within the field on what constitutes a \u0026lsquo;potent\u0026rsquo; ADC design for a given target antigen and tumor type and is further complicated due to inconsistent conditions in cytotoxicity assays, and as optimization remains empirical, there is a need for cost-effective systematic strategies for the field to move forward\u003csup\u003e29, 30\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDescriptive embeddings connecting ADC structures and tumor cell biology\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo model ADC structural complexities in an ML framework, classical properties such as molecular weight, log P values, hydrogen bond donors and acceptors, rotatable bonds, molecular fingerprints, topological surface area for the linkers and payloads were integrated\u003csup\u003e31\u003c/sup\u003e. More ADC-specific elements, such as DAR and antibody conjugation site (\u003cem\u003ee.g\u003c/em\u003e., lysines, cysteines, sugars) were embedded as distinct chemical features\u003csup\u003e32\u003c/sup\u003e. The payload-biology connection was represented based on their intracellular targets. Specifically, payloads were stratified into the three mechanisms driving cytotoxicity, namely, DNA damagers, microtubule disruptors, and topoisomerase I inhibitors. Each cytotoxic mechanism was described as either present or absent in binary 1 or 0, respectively.\u003c/p\u003e\n\u003cp\u003eThe tumor cell-specific features related to the complex multi-step cellular delivery process of ADCs were captured in multiple layers using two key approaches. 
First, the canonical amino acid sequences for target antigens were retrieved from UniProt and transformed into principal component analysis (PCA)-reduced Evolutionary Scale Modeling (ESM) embeddings\u003csup\u003e32\u003c/sup\u003e. The amino acid sequences for mAbs were not scrutinized and treated as a constant feature because most mAbs utilized as ADCs are of the human IgG\u003csub\u003e1\u003c/sub\u003e isotype and have molecular weights of approximately 150 kDa, and to the best of our search, limited publicly available information was found on mAb sequences and epitope binding specifics, most likely for proprietary reasons. Second, to address the aberrant activity of key ADC intracellular trafficking processes that have been shown to impair ADC efficacy\u003csup\u003e10, 12, 33\u003c/sup\u003e, the transcriptomic data from the Cell Model Passport repository\u003csup\u003e21\u003c/sup\u003e was utilized as a reference database to derive transcriptomic signatures for cell lines (Fig. 1A). At the time of this study, the repository contained comprehensive RNA-seq profiles \u0026lsquo;read counts\u0026rsquo; for approximately 37,600 protein encoding genes from 1,479 human tumor cell lines that represented 42 different tumor types (Fig. 1B), including an additional 12 non-cancer cell lines representing diverse tissue types. The read counts for each cell line with ADC activity information were integrated as PCA-reduced features.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAntigen expression prediction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe density of target antigens on the surface of cancer cells is a critical determinant of ADC efficacy. Increased antigen density allows for increased ADC binding to target cells and enhances cytotoxic payload accumulation\u003csup\u003e9\u003c/sup\u003e. 
The antigen expression level as a marker for \u0026lsquo;go/no go\u0026rsquo; decisions is evident in the ADC field, and although empirical research is important, it is limited. For example, ADC designs are typically screened by either 1) evaluating different linkers while keeping the payload constant\u003csup\u003e34\u003c/sup\u003e, or different payloads while keeping the linker constant\u003csup\u003e35\u003c/sup\u003e, on a few cell lines with high and negative target antigen expression based on prior knowledge of the tumor system or 2) developing an in-house flow cytometric system to compare the cell surface protein density of a target antigen across numerous tumor cell lines, but for only a few tumor types, and associating IC\u003csub\u003e50\u003c/sub\u003e values with a single ADC design\u003csup\u003e36\u003c/sup\u003e. These approaches, while important, fall short in systematically addressing the complexities of ADC design, particularly in early-stage development.\u003c/p\u003e\n\u003cp\u003eAlthough ADCpedia contained comprehensive proteomic information, there were significant challenges. Protein intensity and mRNA transcription levels are often poorly correlated\u003csup\u003e21, 37, 38\u003c/sup\u003e, as demonstrated by their weak relationship (Pearson = 0.26) in the Cell Model Passport\u003csup\u003e21\u003c/sup\u003e (Fig. 2A). Another critical challenge was that protein intensity across cell lines contained significant gaps, both in the antigens listed in ADCpedia and in the greater proteomic data in the Cell Model Passport (Suppl. Fig. 3A). Approximately 91% of the proteomic information was missing compared to the mRNA transcription counterparts. These disparities are common and attributable to degradation, post-translational modifications, and/or inadequate capture techniques\u003csup\u003e39\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eTo address these challenges, the GENCEP model (Fig. 
1A) was created to predict the protein intensities from all cell lines in the Cell Model Passport. GENCEP used a fusion neural network model that was trained on ~4.8 million data points, integrating ESM embeddings of target genes, their transcriptomic profiles, and the reported protein intensities for each cell line. Hyperparameter optimization and training over 24 epochs with early stopping were performed (Suppl. Fig. 3B). As a result, GENCEP demonstrated strong predictive performance, achieving R\u003csup\u003e2\u003c/sup\u003e = 0.86 on a set of ~400,000 random proteins from all cell lines in the Cell Model Passport (Suppl. Fig. 3C). GENCEP was also able to predict the intensities for the target antigens in ADCpedia from cell lines challenged with ADCs, achieving R\u003csup\u003e2\u003c/sup\u003e = 0.75 against ground truth labels on the test set (Fig. 2B). This significantly outperformed a zero-rule regression model (R\u003csup\u003e2\u003c/sup\u003e = -8 \u0026times; 10\u003csup\u003e-6\u003c/sup\u003e). This showed that GENCEP not only captured meaningful relationships in the tumor cell biological data to predict protein expression intensities but also provided universal standardization for protein levels. Since the amount of physical protein, especially in cancer, drives cellular functions, GENCEP is also powerful at capturing protein properties in the cell lines of the Cell Model Passport. 
Building on this result, the antigen intensity predictions generated by GENCEP were embedded within ADCpedia, replacing the ground truth intensity values, and organized as a priority feature. They were first applied to evaluate antigen densities across the tumor cell lines and tumor types.\u003c/p\u003e\n\u003cp\u003eHER2 served as a benchmark antigen due to the amplification of the HER2/neu oncogene and the resulting high cell surface antigen densities, which have been directly correlated to outcomes for patients treated with trastuzumab\u003csup\u003e40, 41\u003c/sup\u003e. Moreover, the relationship between ADC cytotoxicity and the level of HER2 expression in tumor cells has previously been studied\u003csup\u003e42\u003c/sup\u003e. For example, the HCC1937 and MCF-7 breast cancer cell lines are often accepted as \u0026lsquo;low/negative\u0026rsquo; HER2-expressing cells used in studying T-DM1\u003csup\u003e34, 43\u003c/sup\u003e and trastuzumab deruxtecan (T-Dxd)\u003csup\u003e42\u003c/sup\u003e. The HER2 real/predicted intensities were 2.96/2.77 for MCF-7 and 2.24/3.16 for HCC1937 cell lines. The ovarian cancer cell line SK-OV-3 has been used as a HER2 \u0026lsquo;high\u0026rsquo;-expressing cell line\u003csup\u003e42\u003c/sup\u003e. The HER2 real/predicted intensities for SK-OV-3 were 8.34/7.74.\u003c/p\u003e\n\u003cp\u003eTherefore, predicted intensities were stratified into standardized high (\u0026ge;7.5), mid (\u0026ge;2.78-7.4), and low/negative (\u0026lt;2.78) expression levels across all tumor cell lines. Interestingly, HER2 intensity stratification revealed that fewer cell lines with reported ADC activities than anticipated expressed high levels. Tumor cell lines NCI-N87 (gastric carcinoma) and UACC-812 (breast ductal carcinoma) expressed very high HER2 levels of \u0026gt;10 (Fig. 3A). At the mid expression level, most HER2-positive cell lines belonged to breast cancer types. 
When evaluating the mean HER2 intensities for different cancer systems, novel insights became apparent. The mean HER2 intensity was highest for breast cancer at 3.94, which was lower than the 4.24 mean intensity for all ADC-challenged tumor cell lines in ADCpedia (Fig. 3A and B). This indicated that other tumor types existed with attractive HER2 expression profiles. For example, blood and gastric cancers were potentially attractive targets as their HER2 predicted intensities were 3.56 and 3.3, respectively. Additionally, lung, renal, melanoma, central nervous system (CNS), and female reproductive system-based cancers had mean HER2 intensities of ~3.0. The remaining tumor types, except for esophageal, pancreas, colorectal, and prostate cancers, were all potential tumor types for the development of effective ADCs as their mean HER2 intensities were \u0026ge;2.78 (Fig. 3B). Interestingly, these means were all lower than the global mean of the tumor cell lines that had been empirically tested with investigational ADCs. This indicates that ADC development strategies could be biased toward high-expressing HER2-positive tumor cell lines and not reflective of the overall expression within specific tumor types. Therefore, considering that there are scant records of ADC activities in many of these tumor types, there is considerable space for future anti-HER2 ADC development.\u003c/p\u003e\n\u003cp\u003eBeyond HER2, GENCEP determined that the target antigens for approved ADCs all have intensities in the mid-to-low range, which represents a significantly underexplored space for ADC design. For example, Trop2 is a 46-kDa monomeric glycoprotein whose expression has been reported in ~30 different tumor types, albeit using different and semi-quantitative methods\u003csup\u003e44\u003c/sup\u003e, and is the target of the approved ADC sacituzumab govitecan (SG). 
SG was approved in 2021 for adult patients with Trop2-positive advanced and/or metastatic urothelial and triple-negative breast cancers based on the results from the TROPHY-U-01\u003csup\u003e45\u003c/sup\u003e and ASCENT\u003csup\u003e46\u003c/sup\u003e studies, respectively. Unfortunately, SG was withdrawn for urothelial cancer as it did not meet the primary endpoints in the post-approval Phase 3 TROPiCS-04 study. Interestingly, there was neither a requirement for nor a measurement of Trop2 expression in the TROPHY-U-01 study\u003csup\u003e45\u003c/sup\u003e. However, analysis of archival tumor samples from patients enrolled in the TROPHY-U-01 study revealed that patient responses did not depend on Trop2 expression levels, based on immunohistochemistry\u003csup\u003e47\u003c/sup\u003e. Specifically, there were no statistically significant differences in objective response rates, progression-free survival, or overall survival among groups with high, mid, and low Trop2 expression. Strikingly, patients with tumors with high Trop2 expression had poorer survival outcomes. A recent report suggests that the instability of the CL2A linker in SG contributes to premature release of SN-38, leading to excessive systemic exposure and severe adverse events\u003csup\u003e48\u003c/sup\u003e. Additionally, when expressed at high levels in ovarian cancer, Trop2 inhibits apoptosis by increasing the expression of Bcl-2 and decreasing the expression of Bax\u003csup\u003e11\u003c/sup\u003e. Therefore, Trop2 warrants alternative ADC designs that may be more effective and that are evaluated at distinct Trop2 expression levels for specific tumor types.\u003c/p\u003e\n\u003cp\u003eIn ADCpedia, many Trop2-overexpressing tumor cell lines that had been challenged by an ADC reached the high expression level threshold (Fig. 3C). Several more tumor cell lines fell within the mid-level expression range. Notably, the mean predicted Trop2 intensity for ADC-challenged tumor cell lines was 5.50, higher than HER2. 
Extending to cell lines with no recorded ADC challenges, all organ systems and tumor types except for esophageal and colorectal cancers had mean Trop2 intensities well above the 2.77 cut-off (Fig. 3D).\u003c/p\u003e\n\u003cp\u003eAlso important are antigens overexpressed in the relatively low range, such as Nectin-4. The mean expression intensity for ADC-challenged tumor cell lines was 2.58 (Fig. 3E). Enfortumab vedotin is approved for the treatment of patients with locally advanced or metastatic Nectin-4-positive urothelial cancer based on the results of a Phase III study\u003csup\u003e49\u003c/sup\u003e. Interestingly, these data indicate that Nectin-4-positive tumor cells are sensitive at these lower relative overexpression levels. Outside of ADCpedia, the mean Nectin-4 intensity for the 28 bladder tumor cell lines was 2.07 (Fig. 3F). Based on these findings, head and neck, breast, and biliary cancers appear as attractive tumor types for anti-Nectin-4 ADC development.\u003c/p\u003e\n\u003cp\u003eThere were several other interesting patterns with additional antigens including CD30, CD19, and folate receptor-ɑ (FR-ɑ). There were only a few tumor cell lines expressing high levels of CD30 that were reported challenged by anti-CD30 ADCs (Suppl. Fig. 4A). The mean CD30 intensity for these tested cell lines was 6.73. However, the mean CD30 intensities decreased substantially across all tumor types in the Cell Model Passport that had not been tested with ADCs (Suppl. Fig. 4B). Hematologic malignancies had the highest mean intensity of 1.51. Based on this threshold, central nervous system-based tumors, thyroid, mesothelium, liver, and kidney cancers are all potentially attractive targets for ADC development. Lymphoma cell lines expressed mid-levels of CD19 and accounted for most of the cases challenged by ADCs (Suppl. Fig. 4C). Hematologic malignancies also had the highest relative mean intensity of 2.47 (Suppl. Fig. 4D). 
No other tumor type surpassed a mean intensity of 2.0 in the Cell Model Passport, which could indicate that CD19 targetability is restricted to blood cancer types. There were only a few records of FR-ɑ-expressing cell lines challenged by ADCs and the mean intensity was notably low (1.42) (Suppl. Fig. 4E). Evaluating the entire Cell Model Passport revealed even lower mean FR-ɑ intensities (Suppl. Fig. 4F). FR-ɑ is the target of mirvetuximab soravtansine, which is indicated for ovarian, fallopian tube, or peritoneal cancers. In the ovarian/endometrial/cervical cancer group, the mean FR-ɑ intensity was 1.25. Based on this expression threshold, CNS-based cancers, melanoma, sarcomas, head and neck, thyroid, kidney, breast, mesothelium, and gastric cancers all appear as potential targets for ADC development.\u003c/p\u003e\n\u003cp\u003eThese findings demonstrate the value of GENCEP\u0026rsquo;s ability to standardize and quantify antigen density. This standardized metric revealed that HER2 and Trop2 have substantially increased expression in multiple tumor types compared to the target antigens for the other approved ADCs. Importantly, it also revealed that extremely high antigen expression, represented by HER2, is not an absolute requirement to develop effective ADCs. The antigens examined reveal a large grey zone where multiple tumor types are potentially attractive for ADC development. Some antigens, such as FR-ɑ, indicate that even targets expressed at very low levels can be developed for future ADCs. 
Taken together, GENCEP provides a system that could significantly improve decisions on which tumor types should and should not be pursued for ADC development.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA multimodal ADC framework for ML training\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIntegrating predicted antigen intensities into ADCpedia enabled the contextualization of several intricate associations between antigen expression, ADC structural elements, and ADC activities across tumor types. In doing so, influential factors affecting ADC complexities were illuminated, emphasizing the deep multifaceted content linking ADCs and tumor cell biology. The framework also highlighted the need for ML-based assistance in ADC design.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ei) The contextual relationship between predicted target antigen intensity and ADC cytotoxicity\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWhile high antigen expression is often considered a key determinant of ADC efficacy, the deeper and systematic analysis afforded by this platform exposed critical nuances. Across all target antigens in ADCpedia, ADCs with high cytotoxic potency (IC\u003csub\u003e50\u003c/sub\u003e \u0026lt;10 nM) were more frequently associated with antigens predicted to have high expression levels, compared to mid (IC\u003csub\u003e50\u003c/sub\u003e = 10-100 nM) and low (IC\u003csub\u003e50\u003c/sub\u003e \u0026gt;100 nM) potencies (Fig. 4A). However, this trend was not statistically significant. Additionally, a point-biserial analysis revealed a weak association (\u003cem\u003er\u003c/em\u003e = 0.145) between antigen predicted intensities and ADC IC\u003csub\u003e50\u003c/sub\u003e values split into active (coded 1; \u0026lt;10 nM) and non-active (coded 0; \u0026ge;10 nM) potencies (Fig. 4B). 
As anticipated, mRNA read count values were broadly spread across the range of predicted antigen intensities for all three ADC potency stratifications (Suppl. Fig. 5), reinforcing that raw mRNA information alone is an insufficient indicator, even when combined with protein intensity, for reliable antigen selection.\u003c/p\u003e\n\u003cp\u003eFor HER2-targeted cases, while ADCs tested against low HER2-expressing cell lines consistently exhibited IC\u003csub\u003e50\u003c/sub\u003e values \u0026gt;100 nM, no clear distinction was observed between mid and high expression levels and their associated potencies (Fig. 5A). This demonstrates that even for a well-studied antigen like HER2, relationships between expression at a targetable intensity and efficacy remain ambiguous.\u003c/p\u003e\n\u003cp\u003eThere was also a notable overlap in IC\u003csub\u003e50\u003c/sub\u003e values for ADCs tested on cells with low- and mid-level Trop2 expression (Fig. 5B). This further demonstrates the ambiguity between expression and efficacy yet also presents potential broader therapeutic opportunities for Trop2-targeting ADCs. Together, the clinical experience with SG targeting Trop2 and the fact that most ADCs targeting Trop2 in ADCpedia are SG (Suppl. Fig. 1) strongly indicate that alternative ADC designs, such as different linker types, are warranted against this antigen.\u003c/p\u003e\n\u003cp\u003eCD79b is a component of the B-cell receptor complex and critical to the proper endocytosis of bound foreign antigens as part of the immune presentation pathway\u003csup\u003e12\u003c/sup\u003e. CD79b is well documented to be exclusively expressed in immature and mature B cells and to be overexpressed in \u0026ge;80% of B cell-based neoplasms\u003csup\u003e50, 51\u003c/sup\u003e. 
Polatuzumab vedotin (Pola-V) is a clinically approved ADC specifically indicated for the treatment of patients with diffuse large B-cell lymphoma (DLBCL) based on results from the POLARIX trial\u003csup\u003e52\u003c/sup\u003e. Preclinical testing of various Pola-V prototypes incorporating different combinations of linkers, payloads, and conjugation sites on multiple CD79b-positive lymphoma cell lines revealed that cleavable linker-incorporated ADCs had enhanced activity\u003csup\u003e53\u003c/sup\u003e. However, the CD79b expression levels on these cell lines were initially unknown. It was notable that a main reason for selecting the linker-payload design for Pola-V was that ADCs incorporating a non-cleavable linker exhibited poor internalization, and hence, ineffective payload release and anti-tumor activity\u003csup\u003e54\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eIt was subsequently reported that below a geometric mean fluorescence intensity threshold, CD79b-positive lymphoma cell lines were insensitive to anti-CD79b ADCs at a concentration of ~70 nM\u003csup\u003e36\u003c/sup\u003e. Evaluating primary samples from patients with chronic lymphocytic leukemia, marginal zone lymphoma, hairy cell leukemia, follicular lymphoma, mantle cell lymphoma, and DLBCL showed that CD79b expression varied and was marginally higher than on normal B-cells, with a trend for relatively higher expression to correlate with more potent ADC IC\u003csub\u003e50\u003c/sub\u003e values\u003csup\u003e36\u003c/sup\u003e. However, these expression levels were relative, with absolute numbers missing. In this work, GENCEP predictions revealed that the mean CD79b intensity for hematologic malignancies was 3.28. Interestingly, all the tumor types apart from esophageal and colorectal cancers had considerably higher CD79b predicted intensities. However, there were no differences in IC\u003csub\u003e50\u003c/sub\u003e values when stratifying cell lines into high and mid CD79b intensities (Fig. 5C). 
Taken together, CD79b is an attractive target for further ADC development for multiple types of cancer, yet the relationship between expression and cytotoxic potency remains ambiguous.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eii) The contextual relationship between intracellular targets and ADC cytotoxicity\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAlthough DNA and microtubules are common intracellular targets for both ADCs and traditional chemotherapeutics, the notable distinction is the ultra-cytotoxicity of the payloads that ADCs transport. The rationale is that these highly toxic agents can overcome natural insensitivities certain tumor types may have to traditional chemotherapeutic mechanisms of action, the latter having evolved and been optimized over several decades for the treatment of specific tumor types\u003csup\u003e55\u003c/sup\u003e. Analyzing the tumor cell lines in ADCpedia revealed significant variability in IC\u003csub\u003e50\u003c/sub\u003e values across tumor types, highlighting the challenge of matching specific payload structures to tumor biology.\u003c/p\u003e\n\u003cp\u003eFor ADCs delivering microtubule inhibitors, several tumor types were sensitive, with IC\u003csub\u003e50\u003c/sub\u003e values \u0026lt;10 nM. DM1 is a derivative of maytansine, whose \u003cem\u003ein vitro\u003c/em\u003e activity is 100 times and up to 270 times more potent than the traditional chemotherapeutic microtubule inhibitors vinca alkaloids and paclitaxel, respectively\u003csup\u003e56-58\u003c/sup\u003e. Monomethyl auristatin E (MMAE) and monomethyl auristatin F are peptide analogs of dolastatin 10, which showed ultra-cytotoxic activities against human cancer cell lines\u003csup\u003e59, 60\u003c/sup\u003e. As anticipated, breast and blood cancer cell lines were sensitive to microtubule inhibitor-incorporated ADCs with the majority of IC\u003csub\u003e50\u003c/sub\u003e values \u0026le;10 nM (Fig. 6A). 
This aligns with the indications for T-DM1 and Pola-V, as previously described. There was considerable variation for many tumor types, possibly reflecting investigational ADCs that did not achieve the potency of more successful or approved counterparts. Notably, prostate cancer was the most sensitive tumor type, while sarcomas, mesothelial, kidney, and colorectal cancers were less sensitive with ADC mean IC\u003csub\u003e50\u003c/sub\u003e values approaching 100 nM. Interestingly, lung cancers were distinctly divided in sensitivity between small cell and non-small cell types, which explains the large variation in ADC cytotoxic potency. ADCs targeting non-small cell lung cancer (NSCLC) were highly potent, with mean IC\u003csub\u003e50\u003c/sub\u003e values just below 1 nM. In contrast, ADCs targeting small cell lung cancer cell lines displayed IC\u003csub\u003e50\u003c/sub\u003e values approaching 100 nM. Other cancers such as ovarian and oral cavity cancer cell lines had lower mean sensitivities compared to breast cancer and non-Hodgkin\u0026rsquo;s lymphoma cell lines, albeit with significant variability, suggesting these tumor types can be targets for future ADC development. In contrast, colorectal tumor cell lines exhibited the least mean sensitivity, which aligns with colorectal cancer being known as inherently resistant to anti-microtubule chemotherapy\u003csup\u003e61\u003c/sup\u003e. However, the variability is wide and potentially indicates that a microtubule inhibitor-incorporated ADC with a different design format (\u003cem\u003ee.g.,\u003c/em\u003e linker, DAR) may prove otherwise.\u003c/p\u003e\n\u003cp\u003eFor ADCs delivering DNA-damaging payloads, there were 13 tumor types that had been tested (Fig. 6B). 
It is currently thought that ADCs transporting calicheamicin payloads have had limited clinical effectiveness against solid tumors because of poor tumor penetration and/or poor access to the DNA at concentrations below dose-limiting toxicity\u003csup\u003e62, 63\u003c/sup\u003e. GO is indicated for the treatment of adult patients with newly diagnosed or relapsed/refractory CD33-positive acute myeloid leukemia (AML), and in pediatric patients\u003csup\u003e64\u003c/sup\u003e. In general, DNA-damaging agent-incorporated ADCs had a mean IC\u003csub\u003e50\u003c/sub\u003e value approaching 1 nM for hematologic malignancy cell lines, which supports the effectiveness of constructing ADCs against AML. However, the blood cancer subtype, acute lymphoblastic leukemia, displayed much less sensitivity against these ADCs. Melanoma was highly resistant against DNA-damaging agent-incorporated ADCs, suggesting that these ADCs may not be ideal for this tumor type and that the cytotoxic potency itself, and not the systemic dosing issue, should be considered. Nevertheless, such profound contrasts reinforce that ADCs delivering DNA-damaging agents are underexplored.\u003c/p\u003e\n\u003cp\u003eTopoisomerase I inhibitors represent a unique class of agents in the ADC landscape, with potencies that test the definition of \u0026lsquo;ultra-toxic\u0026rsquo;\u003csup\u003e65\u003c/sup\u003e. Topoisomerase I was originally validated as a cancer target when tumor cells died upon treatment with camptothecin, which was limited by its poor solubility and unacceptable toxicity\u003csup\u003e66\u003c/sup\u003e. The payloads SN-38 and Dxd, for the currently approved SG and T-Dxd, respectively, are both camptothecin derivatives. SN-38 is the active component of irinotecan and, importantly, its potency is listed in the mid-to-high nM range, distant from the sub-nM payloads targeting DNA and tubulin\u003csup\u003e65\u003c/sup\u003e. 
Comparatively, Dxd, which is a derivative of exatecan, has been reported to be 10-fold more potent than SN-38\u003csup\u003e67, 68\u003c/sup\u003e. Unsurprisingly, topoisomerase I inhibitor derivatives have been extensively pursued: at least 72 novel ADCs transporting more than 15 different camptothecin-based payloads, in combination with more than 21 different linkers across 24 different targets, have been developed\u003csup\u003e35\u003c/sup\u003e. Recently, several novel topoisomerase I inhibitors were reported, all based on the camptothecin backbone, that interestingly exhibited sub-nM to low nM IC\u003csub\u003e50\u003c/sub\u003e values\u003csup\u003e35\u003c/sup\u003e, and the available structures were integrated in ADCpedia.\u003c/p\u003e\n\u003cp\u003eAn analysis of the ADCs delivering topoisomerase I inhibitors in ADCpedia revealed more uniform potencies with mean values all hovering around 10 nM (Fig. 6C). Only eight tumor types were represented, notably fewer than the tumor systems evaluated with ADCs incorporating microtubule inhibitors and DNA damagers. ADCs delivering topoisomerase I inhibitors were the most potent against thyroid cancer followed by hematologic malignancies and ovarian/endometrial/cervical-grouped cancers. There was wide IC\u003csub\u003e50\u003c/sub\u003e variability for lung cancers since NSCLC tumor cell lines were sensitive while small-cell lung cancer cell lines were less sensitive. Interestingly, breast cancer was the most resistant tumor type with IC\u003csub\u003e50\u003c/sub\u003e values approaching 100 nM. The structural diversity of the topoisomerase I inhibitor-incorporated ADCs was significantly lower than that of the ADCs incorporating the two other payload types, as 129/138 ADCs were SG or T-Dxd or very similar (Suppl. Fig. 1). Additionally, these ADCs were almost exclusively constructed with DARs of 8, eliminating DAR as a confounding variable and narrowing the focus to tumor biology matching. 
This further underscores the importance of identifying the optimal tumor type-payload combination when designing an ADC in early development.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eiii) The contextual relationship between linker types and ADC cytotoxicity\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLinkers play a pivotal role in ADC design, substantially influencing both the stability and release of the cytotoxic payload. Over the past two decades, significant innovation in linker chemistry has been achieved, yielding a rich array of constructs that are captured within ADCpedia (Suppl. Fig. 1). There were 18 tumor systems/types tested with cleavable linker-incorporated ADCs, while only seven tumor systems/types were tested with non-cleavable linker-incorporated ADCs.\u003c/p\u003e\n\u003cp\u003eThe ADCs incorporating cleavable linkers included those susceptible to cysteine-specific cathepsins (\u003cem\u003ei.e\u003c/em\u003e., vc linker), cysteine- and serine-specific cathepsins (\u003cem\u003ee.g\u003c/em\u003e., GGFG linker), glutathione- (\u003cem\u003ei.e\u003c/em\u003e., AcBut) and pH-based (\u003cem\u003ee.g\u003c/em\u003e., CL2A) payload release mechanisms. As a group, these types of ADCs varied greatly in cytotoxic potency across all tumor types (Fig. 7A). The mean IC\u003csub\u003e50\u003c/sub\u003e values indicated that prostate cancer, B-cell lymphomas, breast cancer, NSCLC, AML, and small cell lung cancer cell lines were highly sensitive, albeit with wide variation. In contrast, ovarian and colorectal cancers exhibited notably less sensitivity, also with wide variability. Although mesothelial cancer displayed less sensitivity, there were only two ADC records.\u003c/p\u003e\n\u003cp\u003eThe large IC\u003csub\u003e50\u003c/sub\u003e value differences may be directly due to the diversification of the cleavable linkers contained in ADCpedia. 
For instance, for linkers with novel amino acid combinations and sequence lengths, such as the GGFG linker used in T-Dxd\u003csup\u003e68\u003c/sup\u003e, which is sensitive to both cysteine and serine proteases, it is not known how efficient cleavage and payload release are compared to other cleavable linkers. On the other hand, non-cleavable linkers require efficient lysosomal delivery and digestion of the ADC to liberate the payload. For non-cleavable linker-incorporated ADCs, all seven tumor types displayed sensitivities of \u0026le;10 nM (Fig. 7B).\u003c/p\u003e\n\u003cp\u003eIn ADCpedia, the overall comparison of non-cleavable and cleavable linkers reveals a striking variability in cytotoxic potency that is driven by target expression heterogeneity and evolving linker chemistries. Non-cleavable linker-incorporated ADCs, while showing a narrower range of mean IC\u003csub\u003e50\u003c/sub\u003e potencies compared to cleavable linker-incorporated ADCs, also exhibited wide variability, indicating that the inconsistencies are tied to target antigen expression heterogeneity in tumor tissue since these payloads are trapped inside cells when released. In contrast, cleavable linker-incorporated ADCs showed much greater deviation across and within tumor types, which may be more reflective of the different release mechanisms for these linkers.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eiv) The contextual relationship between DAR and ADC cytotoxicity\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDAR is greatly influential in ADC design as it affects overall potency, stability, and pharmacokinetics. The DARs in ADCpedia ranged from 0.8 to 9.0, with a mean DAR of 3.73. The most frequent DARs were 2.0, 3.4, 4.0, and 8.0 (Fig. 8). Importantly, both ultra-toxic and moderate-toxic payloads have been developed across this DAR spectrum over the past two decades. Interestingly, when DAR values were analyzed across tumor types and target antigens, no consistent patterns emerged. 
For cytotoxicity patterns, ADCs with DARs of 2 and 4 were the most frequent among both high-potency (IC\u003csub\u003e50\u003c/sub\u003e \u0026lt;1 nM) and low-potency (IC\u003csub\u003e50\u003c/sub\u003e \u0026gt;100 nM) ADCs. This suggests that DARs in the 2-4 range did not uniformly correlate with enhanced cytotoxic potency. Notably, higher DARs were not associated with increased cytotoxic potency. This lack of association can be partially attributed to the inherent differences in free payload potency. For example, highly potent payloads such as PBD dimers achieve strong cytotoxic effects even at very low DARs of 1\u003csup\u003e69, 70\u003c/sup\u003e. In contrast, less potent payloads like SN-38 are almost exclusively assembled with DARs of 8 to compensate for reduced potency\u003csup\u003e65\u003c/sup\u003e. Because ADCs with DARs of 8, particularly those delivering topoisomerase I inhibitors, are more recent advancements and are underreported in the current ADC literature, they are not sufficiently represented to reveal emerging trends in this specific type of ADC design.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA multimodal model for ADC activity prediction across tumor cell lines and tumor types\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAMM is a multimodal convolutional neural network model designed to predict the probability of ADC \u003cem\u003ein vitro\u003c/em\u003e activity by analyzing ADC structures and their IC\u003csub\u003e50\u003c/sub\u003e values and connecting this with the tumor cell biological information integrated in ADCpedia (Fig. 9A). Importantly, AMM also learned from the predicted antigen intensity embeddings from GENCEP and the proteomic and transcriptomic cell line embeddings (Fig. 9B). 
To this end, the maximum number of neurons was dedicated to the antigen intensity prediction embeddings as part of AMM\u0026rsquo;s training and overall goal of generating predictions on the likelihood of cytotoxic potencies at defined nM values for given tumor cell lines expressing their individual levels of the target antigen.\u003c/p\u003e\n\u003cp\u003eAs previously mentioned, there are several other tumor cell biological parameters, beyond antigen expression, important for ADC efficacy. Therefore, also embedded were GENCEP-predicted intensities for proteins involved in key areas in the ADC mechanism of action and tumor cell resistance such as the endosomal-lysosomal system (RAB7, CTSB, LAMP1), drug efflux (ABCG1, ABCG2), DNA repair (BRCA1, BRCA2, TP53, RAD51), and survival signaling (PIK3CA, AKT1, mTOR). As the main goal of this study was to evaluate the performance of the ADC design platform based on antigen expression, these other tumor cell protein embeddings were assigned only a single neuron. The AMM model was then trained as an ADC activity classifier using 10 nM, 5 nM, and 1 nM thresholds to define active versus inactive ADCs because the median IC\u003csub\u003e50\u003c/sub\u003e value in ADCpedia was ~2.5 nM, with first and third quartile values of 0.2 nM and 36.1 nM, respectively. By training at these three distinct IC\u003csub\u003e50\u003c/sub\u003e values, it was possible to evaluate predictions within a range reflecting current ADC potencies. The database was split so that ADC structural features were diverse in the training, validation, and test sets. A subsequent extensive hyperparameter optimization was performed, using a loss function that reduced penalties for incorrect predictions within \u0026plusmn; 1 nM of the specific thresholds.\u003c/p\u003e\n\u003cp\u003eThe AMM model was evaluated on internal and external test sets, which consisted of structurally diverse ADCs. 
For the internal test set, the real-world cytotoxicity data was known. To assess generalizability, the IC\u003csub\u003e50\u003c/sub\u003e values of the external test set were blinded from the authors in this study and unavailable until after the predictions were made. These evaluations tested whether the AMM model could effectively predict ADC potency across multiple tumor cell lines from different tumor types, based on the model\u0026rsquo;s training emphasizing antigen expression.\u003c/p\u003e\n\u003cp\u003eThe AMM model successfully distinguished active from inactive ADCs with high confidence across the 1-10 nM range of relevant ADC cytotoxic potencies with the 10 nM threshold performing the best. The internal validation set results for the area under the receiver operating characteristic curve (AUC) scores were 0.85 (10 nM), 0.71 (5 nM), and 0.78 (1 nM), while the AUC scores for the internal test sets were 0.84 (10 nM), 0.87 (5 nM), and 0.87 (1 nM) (Fig. 10A).\u003c/p\u003e\n\u003cp\u003eEvaluations were also performed at a 50% probability classification threshold (active\u0026ge;50% and inactive\u0026lt;50%) to provide further insights into the AMM model\u0026rsquo;s performance. On the internal test set, the AMM model at the 10 nM threshold had a good and balanced performance with high accuracy (0.81), precision (0.70), recall (0.67), and F1-score (0.68) (Suppl. Fig. 6). At 5 nM, the AMM model also performed well on the internal test set, with accuracy (0.72) and F1-score (0.70) comparable to the model\u0026rsquo;s performance at 10 nM. Notably, its precision (0.95) was higher. However, it had poor recall (0.56), indicating it missed many true active ADCs. At 1 nM, the AMM model also performed well. It had good accuracy (0.79), precision (0.85), recall (0.65), and F1-score (0.74) comparable to the model\u0026rsquo;s performance at 10 nM. 
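\u003c/p\u003e\n\u003cp\u003eAs a minimal sketch of how accuracy, precision, recall, and F1-score follow from a confusion matrix at the 50% probability cutoff, the Python below may help. The function names and the example probabilities are hypothetical illustrations and are not taken from the AMM code or data.\u003c/p\u003e

```python
def confusion_counts(probabilities, labels, cutoff=0.5):
    """Count TP, FP, TN, FN, calling a sample active when probability >= cutoff."""
    tp = fp = tn = fn = 0
    for prob, label in zip(probabilities, labels):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical predicted probabilities and true labels (1 = active).
counts = confusion_counts([0.9, 0.6, 0.4, 0.2], [1, 0, 1, 0])
metrics = classification_metrics(*counts)
```

\u003cp\u003eApplied to the precision (0.95) and recall (0.56) reported for the internal test set at 5 nM, the same F1 formula gives approximately 0.70, in agreement with the reported F1-score.\u003c/p\u003e\n\u003cp\u003e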
These results indicated the model successfully distinguished active and inactive ADCs with high confidence, especially at 1 and 10 nM, across a broad range of relevant ADC cytotoxic potencies in the internal test sets.\u003c/p\u003e\n\u003cp\u003eThe blinded, external, and unpublished ADC dataset provided an opportunity to determine whether the AMM model could accurately classify ADC potency in a real-world scenario, a task on which the model performed notably well. This dataset comprised four distinct ADCs (T-DM1, T-Dxd, disitamab-vc-MMAE [D-MMAE], and trastuzumab-pAcF-Amberstatin 269 [T-A269]) targeting HER2 across nine tumor cell lines representing breast, esophageal, and gastric cancers (Fig. 10B). Additionally, the tumor cell lines had varying predicted HER2 expression intensities, and the set covered a total of four different linker and payload types and three different DARs.\u003c/p\u003e\n\u003cp\u003eFor this external set, the AUC scores were 0.95 (10 nM), 0.87 (5 nM), and 0.82 (1 nM) (Fig. 10A). The confusion matrix evaluations further supported the AUC scores, with accuracy values of 0.90 (10 nM), 0.85 (5 nM), and 0.70 (1 nM) (Suppl. Fig. 6). Notably, at 10 nM the AMM model outperformed internal test set ADC predictions with higher precision (0.88 vs. 0.70), recall (0.78 vs. 0.67), and F1-score (0.82 vs. 0.68). At 5 nM, AMM also outperformed its internal test set predictions for accuracy, recall, and F1-score. This highlights its potential robustness in real-world testing. This was particularly evident in cases such as D-MMAE and T-A269 ADCs tested against the gastric SNU-216, breast Hs 578T, and esophageal OE19 cancer cell lines. These ADC-cell line pairings were absent from the original training set as they have not been reported in the literature.\u003c/p\u003e\n\u003cp\u003eFurthermore, there were observable instances where AMM predictions aligned with HER2 expression intensities. 
It was observed that the AMM model\u0026rsquo;s predictive probabilities for ADC activities shifted in response to IC\u003csub\u003e50\u003c/sub\u003e values near specific thresholds. For example, at the 10 nM threshold, AMM predicted high probabilities (0.89 \u0026plusmn; 0.06) of T-DM1 activity in high (\u0026ge;7.5) HER2-expressing cell lines (representing breast and esophageal cancers), where actual IC\u003csub\u003e50\u003c/sub\u003e values confirmed strong potencies (0.50 nM \u0026plusmn; 0.58 nM) (Suppl. Table 1). For the mid-level (5.31) HER2-expressing breast cancer JIMT-1 cells, AMM shifted its probability (0.32) as the T-DM1 IC\u003csub\u003e50\u003c/sub\u003e values were 7.62 and 6.89 nM. For the slightly lower (4.92) HER2-expressing gastric cancer SNU-216 cells, where T-DM1 failed to reach an IC\u003csub\u003e50\u003c/sub\u003e value, AMM predicted an even lower probability of 0.03. In the lowest (3.09) HER2-expressing Hs 578T cell line, where T-DM1 was also ineffective, AMM predicted a minimal activity probability of 0.06. Interestingly, at the 5 nM threshold, AMM\u0026rsquo;s probability for JIMT-1 increased to 0.47, indicating that this threshold better captured T-DM1\u0026rsquo;s activity near the 5 nM IC\u003csub\u003e50\u003c/sub\u003e value (Suppl. Table 2). Similarly, at the 1 nM threshold, AMM assigned a probability of 0.17 to JIMT-1 and 0.01 to Hs 578T (Suppl. Table 3). Although there is room to improve sensitivities, these findings suggest that the AMM model is able to refine predictions at changing HER2 expression levels by adjusting the IC\u003csub\u003e50\u003c/sub\u003e nM thresholds and, therefore, can improve ADC activity classifications. This reflects AMM\u0026rsquo;s capacity to integrate the intricate ADC-tumor cell interconnections into accurate ADC activity predictions.\u003c/p\u003e\n\u003cp\u003eThese AMM predictions also trended with the ADC T-Dxd. 
Notably, at the 10 nM threshold, T-Dxd was ineffective against JIMT-1 cells and AMM assigned a probability of 0.28 (Suppl. Table 1). However, for OE19 cells, AMM assigned a probability of 0.80 when T-Dxd was also cytotoxically ineffective. Shifting to the 1 nM threshold, AMM was able to adjust and assigned probabilities of 0.09 and 0.65 for the JIMT-1 and OE19 cell lines, respectively (Suppl. Table 3), again indicating that the model improves its predictions when the nM threshold is shifted.\u003c/p\u003e\n\u003cp\u003eFurther analyses on specific cell lines, which were selected as they had real IC\u003csub\u003e50\u003c/sub\u003e value data above and below the given nM thresholds, revealed significant additional insights. For example, in the internal test set, AMM achieved perfect accuracies (1.0) for the triple-negative breast cancer MDA-MB-468 and non-Hodgkin\u0026rsquo;s lymphoma NU-DUL-1 cell lines (Suppl. Fig. 7). However, the AMM model was inaccurate for the colorectal carcinoma HT-29 cell line at 10 nM. Interestingly, AMM improved accuracy to 1.00 and 0.50 at the 5 and 1 nM thresholds, respectively.\u003c/p\u003e\n\u003cp\u003eFor cell lines from the blind external test set, AMM had an accuracy value of 0.75 for the 10 nM threshold and values of 0.67 for the 5 and 1 nM thresholds for the high HER2-expressing cell lines. Interestingly, the poor accuracy of 0.50 at the 10 nM threshold for JIMT-1 cells notably improved to 0.75 when the AMM model predicted at the 5 nM threshold. This again indicates the model\u0026rsquo;s ability to shift and adjust when the nM threshold is closer to the real IC\u003csub\u003e50\u003c/sub\u003e values.\u003c/p\u003e\n\u003cp\u003eBased on the above, the AMM model\u0026rsquo;s predictive performance showed strong potential for generalizability. These cases illustrate how the AMM model can shift according to data points lying close to the classification cutoff to improve ADC \u003cem\u003ein vitro\u003c/em\u003e activity probabilities. 
These results indicate an impressive predictive performance for ADC activity across different structural designs and tumor cell lines. Moreover, the AMM model showed a capacity to generalize to unseen data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe platform as an online tool for predicting \u003cem\u003ein vitro\u003c/em\u003e ADC efficacies\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA public online platform (www.adcpedia.com) was developed that allows users to access the AMM model for predicting ADC efficacies on the selected tumor cell lines SK-BR-3, SKOV3, MCF7, OE19, and COLO 205. The platform is divided into two primary interfaces. The Predict interface allows users to select a target antigen, linker, payload, DAR, and one or all of the tumor cell lines indicated above (Suppl. Fig. 8A). Users begin by creating an ADC \u0026lsquo;nickname\u0026rsquo; to open a new file. For linker and payload chemical structures, the two must be conjugated together. The Predict interface offers the option to manually upload the linker-payload chemical structure details encoded in SMILES, SDF, SMI, or CSV formats. Users can also draw the linker-payload using the JSME molecular editor (Suppl. Fig. 8B) if ML-readable formats are not available. Once all parameters are set, users can click on \u0026lsquo;\u003cem\u003ePredict in-vitro\u003c/em\u003e \u003cem\u003eactivity\u003c/em\u003e\u0026rsquo; to generate ADC predictions. The Retrieve interface delivers ADC activity predictions, currently set at the 10 nM threshold.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eADCs are a powerful yet complex class of therapeutics and face formidable challenges to deliver their payloads accurately and efficiently in target tumor cells. 
In the current ADC landscape, it is well known that factors such as the nature of the ADC linker, payload, DAR, target antigen expression, internalization, lysosomal targeting, and intrinsic tumor cell sensitivity directly affect antitumor efficacy and off-target toxicity. Moreover, the field understands that unique and optimized combinations of these factors significantly influence ADC efficacy for certain tumor types but not others. Yet, in the case of strategically and systematically assembling ADC structural factors and connecting them with tumor biology factors, little has been documented with AI-powered approaches.\u003c/p\u003e \u003cp\u003eAlthough rigorous experimental studies evaluating ADC design with specific biological characteristics such as antigen expression have been performed, they are limited and vary in approach and do not universally address this longstanding challenge. As previously described, typical ADC design studies base ‘go/no-go’ advancement decisions on empirical studies varying a specific element of an ADC design, while maintaining the other structural elements constant and evaluating cytotoxicity on only a limited number of cell lines\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e. Alternatively, they use an in-house antigen expression evaluation system on multiple cell lines in parallel with testing a single ADC design\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. 
These approaches, while important, fall short in systematically addressing the complexities of ADC design, particularly in early-stage development.\u003c/p\u003e \u003cp\u003eThis work introduced an ADC design platform based on a multimodal learning approach (Figs.\u0026nbsp;1 and 9) that provides a route for enabling confident ADC designs for target antigens and tumor types at the early yet critical initial stage in ADC development. The platform relied on three foundational components. The first was the ADCpedia database that was constructed and organized to interconnect ADC chemical structure-activity relationships with tumor cell biology. The second was the creation of the GENCEP model for predicting protein intensities, which provided clarity on long-known, but never clearly resolved, complex relationships between ADC structures and activity across target antigens and tumor types. The third was the key integration of the GENCEP model into the AMM model that enabled accurate probability predictions of ADC \u003cem\u003ein vitro\u003c/em\u003e activities at relevant IC\u003csub\u003e50\u003c/sub\u003e values. AMM accurately predicted the activity of a set of external, blinded, real-world ADCs targeting HER2 at clinically relevant IC\u003csub\u003e50\u003c/sub\u003e values and showed instances of adjusting to varying HER2 expression levels. As a result, this platform demonstrated the ability to potentially provide significant advantages for the generalizable selection of effective ADC designs for multiple specific target antigens and tumor types.\u003c/p\u003e \u003cp\u003eBeyond these advancements, the GENCEP model addressed a critical gap in proteomics by imputing missing protein intensity data, a common challenge in mass spectrometry-based proteomic mining. When integrated into the AMM model, GENCEP enhanced the model’s ability to decipher the complex interplay between ADC design and antigen expression. 
Specifically, this work uncovered key insights, such as the variability in antigen expression across the entire Cell Model Passport, thereby identifying potential new targets/tumor systems for ADC development. Furthermore, this work revealed the importance of optimizing ADC designs as even mid- and low-range overexpressed antigens are targetable. Importantly, by integrating a standardized antigen intensity metric system, this study demonstrated that antigens in these newly discovered quantified overexpression ‘grey zones’ must be carefully paired with ADCs optimized for linker, payload, and DAR selection.\u003c/p\u003e \u003cp\u003eThis work also highlighted the variability in ADC cytotoxicity for different payload and linker types, and DARs. Although IC\u003csub\u003e50\u003c/sub\u003e values varied widely for different ADCs targeting the same tumor types, tracking the mean cytotoxicity values revealed important trends. For example, ADCs transporting anti-microtubule or DNA damaging payloads could be very attractive to develop against NSCLC as well as other tumor types where there is very limited ADC activity information. In comparison, most ADCs transporting topoisomerase I inhibitors were very potent across all evaluated tumor types except for breast cancer. However, nearly all these ADCs were T-Dxd or SG. It was recently reported that the CL2A linker of SG is highly unstable and causes premature release of SN38, leading to patients receiving significantly high doses and suffering from serious adverse events\u003csup\u003e\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u003c/sup\u003e. For T-Dxd, a meta-analysis revealed it had a considerably higher systemic free payload concentration compared to other clinical-stage ADCs\u003csup\u003e\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e\u003c/sup\u003e. 
Taken together, there is considerable support to warrant the testing of different linker designs that conjugate topoisomerase I inhibitors to mAbs, which can then benefit from this platform to screen different designs.\u003c/p\u003e \u003cp\u003eML-based approaches are rapidly transforming drug discovery, extending beyond small molecules to encompass complex therapeutic modalities such as proteolysis-targeting chimeras, RNA therapies, peptides, and unconjugated mAbs\u003csup\u003e\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e\u003c/sup\u003e. Among these, ADCs present a particularly formidable challenge due to limited publicly available datasets for model training and their intricate multicomponent structures. Moreover, the therapeutic efficacy of ADCs hinges on a sophisticated, multifaceted mechanism of action involving a series of coordinated steps, commencing with antigen recognition and binding and followed by internalization, lysosomal degradation, payload release, and intracellular target engagement by the payload. Inspired by recent advances in multimodal DL strategies in drug discovery\u003csup\u003e\u003cspan additionalcitationids=\"CR74\" citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e–\u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e\u003c/sup\u003e, we hypothesized that the development of a multimodal modeling framework that specifically captures the complex biological and chemical interactions governing ADC function can significantly streamline yet comprehensively improve ADC development.\u003c/p\u003e \u003cp\u003eTo this end, we developed an integrated platform capable of jointly learning from diverse chemical and biological data streams within a unified model architecture. This platform demonstrated high predictive accuracy for generating probabilities for \u003cem\u003ein vitro\u003c/em\u003e activity of real-world, blinded ADCs. 
Notably, the AMM model exhibited great potential for robust generalizability, accurately predicting bioactivities for ADCs and cell lines absent from the training set. This suggests the model's strong potential for guiding the rational design of novel ADC therapeutics and, to the best of our knowledge, represents the first significant application of artificial intelligence in ADC design.\u003c/p\u003e \u003cp\u003eAfter binding the target antigen on the cell surface, ADCs must be rapidly internalized through clathrin-mediated, caveolin-mediated, or other pathways. After internalization, ADCs must traverse the endosomal system and be delivered to lysosomes, where they must be efficiently degraded so the payload is released and is able to bind to its intracellular target and induce tumor cell death. Unfortunately, the internalization mechanisms and kinetics for target antigens are not well understood\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. The importance of the aberrant expression and activity of Rab proteins and their associated interacting partners is only now becoming apparent, as they greatly influence lysosomal targeting and ADC degradation efficiency and are linked to resistance\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eApart from HER2 and Trop2, the variability observed for most targeted antigens suggests that mid and even low expression levels can support potent ADC activity under the right conditions. Although the ADC community already knows that antigen expression alone is not a marker of ADC success and that understanding the tumor-specific biological conditions is important, the interconnectedness is not yet understood but is critical for ADC design. 
By embedding GENCEP-predicted antigen intensities into ADCpedia, a framework was provided for systematically identifying optimal antigens and tumor types for early ADC development. The authors understand the importance of also emphasizing these additional intracellular elements and, thus, the ADC design platform is already expanding with the end goal to cover the entire antigen-to-lysosome delivery process. Additionally, post-release elements such as protein efflux and anti-apoptosis protein expressions will be further scrutinized. In this current work, a few relevant proteins were incorporated and integrated into the overall training of the AMM model but not assigned multiple neurons. Future work will systematically explore these proteins and the relationship with different linker and payload types for ADCs across tumor types.\u003c/p\u003e \u003cp\u003eIn conclusion, this multimodal ADC design platform interconnecting ADC structural design with tumor cell biology offers a novel, systematic, and universal approach to advance ADC development.\u003c/p\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003cp\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003cp\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003cdiv id=\"Sec25\" class=\"Section3\"\u003e \u003c/div\u003e \u003c/div\u003e "},{"header":"Methods","content":"\u003ch2\u003eLiterature curation and ADCpedia construction\u003c/h2\u003e\u003cp\u003eThe curation process was performed by taking the names of ADCs from the ADC DrugMap and searching for them on PubMed, Google Scholar, and patent office websites for the period from January 2000 to April 15, 2024. 
In addition, ADC structure-activities were included from relevant and very recent publications in 2024 near the drafting and submission of this work. To standardize IC\u003csub\u003e50\u003c/sub\u003e values, values in mass units (e.g., µg/mL) were converted to nM using 150 kDa as the molecular weight of the mAb plus the combined molecular weights of the given linker and payload multiplied by the DAR.\u003c/p\u003e\u003ch2\u003eTumor cell line transcriptomic and proteomic data integration\u003c/h2\u003e\u003cp\u003eThe complete raw read counts of mRNA expression profiles for the human tumor cell lines, totaling 37,603 protein-encoding genes, were integrated into ADCpedia from the Cell Model Passport\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. The cell lines with published ADC activities were then subjected to PCA to reduce dimensionality while retaining 95% of the variance, resulting in 910-dimensional cell line embeddings geared for ML applications (Fig.\u0026nbsp;9). A parallel procedure was performed for all the proteomic data available. Cancer model names from the Cell Model Passport were used to enrich the database with detailed annotations on tissue, cancer type, and sample site for each tested cell line (Fig.\u0026nbsp;1B).\u003c/p\u003e\u003ch2\u003eAntigen sequence embedding generation\u003c/h2\u003e\u003cp\u003eThe mature and canonical amino acid sequences (no alternatively spliced isoforms) from all listed antigens in ADCpedia were retrieved from UniProt and transformed into 1,280-dimensional embeddings using ESM (650M model)\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. PCA was then applied, preserving 95% of the variance and resulting in 356 dimensions. 
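\u003c/p\u003e\u003cp\u003eThe choice of how many principal components to keep for a target variance fraction can be sketched as follows. This is an illustrative helper only: it assumes the per-component explained variances (eigenvalues) have already been computed, the function name is hypothetical, and the example eigenvalues are invented rather than derived from the ESM embeddings.\u003c/p\u003e

```python
def components_for_variance(eigenvalues, target=0.95):
    """Return the smallest number of components whose cumulative
    explained-variance ratio reaches the target fraction."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(ordered)


# Invented eigenvalues for illustration: three components reach 96%.
n_components = components_for_variance([6.0, 3.0, 0.6, 0.3, 0.1])
```

\u003cp\u003eIn the workflow described here, the same criterion reduced the 1,280-dimensional antigen embeddings to 356 dimensions.\u003c/p\u003e\u003cp\u003e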
This enhanced computational efficiency without significant loss of information.\u003c/p\u003e\u003ch2\u003eGENCEP model\u003c/h2\u003e\u003cp\u003eThe GENCEP model took as inputs (i) raw mRNA expression data, (ii) ESM embeddings of all protein sequences, which were both PCA-reduced, and (iii) gene-specific read counts for cell lines (Fig.\u0026nbsp;9B). ESM embeddings were processed through a fully connected neural network with two layers, each containing 512 neurons, with ReLU activations, batch normalization, and a dropout rate of 15%. The mRNA data was processed through two dense layers of 512 neurons each, with ReLU activations, batch normalization, and a dropout rate of 15%. The read counts were passed through a neural network layer with 64 neurons, with ReLU activation, batch normalization, and a dropout rate of 15%. The three outputs were then concatenated (2,112 dimensions) and fed into a network consisting of three fully connected layers with decreasing neuron counts (2,112, 1,024, 512, 64). ReLU activation, batch normalization, and a dropout of 0.15 were applied to each layer. The outputs from the mRNA expression and read count components were also utilized as embedded inputs for the AMM model (described in the next section). The model was trained using Root Mean Squared Error (RMSE) loss to measure the difference between predicted and experimental protein intensities:\u003c/p\u003e\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}{Loss}_{RMSE}=\\sqrt{\\frac{1}{n}\\:\\sum\\:_{i=1}^{n}{\\left({y}_{pred,i}-{y}_{true,i}\\right)}^{2}}\\#\\left(Eq.\\:1\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eOptimization was performed using the Adam optimizer\u003csup\u003e\u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e\u003c/sup\u003e with an initial learning rate of 1E-3 and a weight decay of 1E-5. 
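\u003c/p\u003e\u003cp\u003eAs a plain-Python illustration of Eq. 1, the sketch below computes the RMSE between predicted and experimental intensities. The function name and the example values are hypothetical; the actual training presumably used a tensor-based implementation of the same quantity.\u003c/p\u003e

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences (Eq. 1)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted vs. experimental protein intensities.
example_loss = rmse([0.0, 0.0], [3.0, 4.0])  # sqrt((9 + 16) / 2)
```

\u003cp\u003eBecause the square root is taken after averaging, the loss is expressed in the same units as the protein intensities.\u003c/p\u003e\u003cp\u003e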
Dataset split was performed with the goal of evaluating the applicability of the model to ADC-targeted antigens. Thus, all the antigens in ADCpedia were kept in the hold-out test set, which was further populated via random sampling to a size equal to 10% of the dataset. Training and validation sets were also built via random sampling (80% and 10%, respectively). Early stopping was implemented based on the validation RMSE value with a patience of 10 epochs.\u003c/p\u003e\u003cp\u003e \u003cb\u003eAMM model\u003c/b\u003e \u003c/p\u003e\u003ch2\u003ei) Chemical descriptors\u003c/h2\u003e\u003cp\u003ePhysicochemical features for the payload and linker structures were computed using the RDKit cheminformatics\u003csup\u003e\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e\u003c/sup\u003e and Descriptastorus libraries\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. A set of 200 physicochemical properties was calculated, including log P, molecular weight, hydrogen bond donors and acceptors, rotatable bonds, and topological polar surface areas. Additionally, 2,048-bit molecular fingerprints were generated for the payload and linker chemical structures\u003csup\u003e\u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003ch2\u003eii) Data preprocessing\u003c/h2\u003e\u003cp\u003eAll features were converted to numeric form, with missing values set to zero. Categorical variables, such as the intracellular target classes, were one-hot encoded. A StandardScaler\u003csup\u003e\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e\u003c/sup\u003e was applied for continuous features. For labeling, IC\u003csub\u003e50\u003c/sub\u003e values were binarized at 10 nM (IC\u003csub\u003e50\u003c/sub\u003e \u0026lt; 10 nM assigned label 1; otherwise 0). The same procedure was performed for the 5 and 1 nM thresholds. 
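\u003c/p\u003e\u003cp\u003eThe labeling rule can be sketched in a few lines of Python. The function names are hypothetical, and the example IC50 of 2.5 nM is simply the ADCpedia median reported in the Results, used here for illustration.\u003c/p\u003e

```python
def binarize_ic50(ic50_nm, threshold_nm):
    """Label 1 (active) when IC50 is strictly below the threshold, else 0."""
    return 1 if ic50_nm < threshold_nm else 0


def label_at_thresholds(ic50_nm, thresholds_nm=(10.0, 5.0, 1.0)):
    """Return one binary label per threshold, keyed by the threshold in nM."""
    return {t: binarize_ic50(ic50_nm, t) for t in thresholds_nm}


# An ADC with IC50 = 2.5 nM is active at 10 and 5 nM but inactive at 1 nM.
labels = label_at_thresholds(2.5)
```

\u003cp\u003eNote that, under the strict inequality stated above, an IC50 exactly equal to the threshold is labeled inactive.\u003c/p\u003e\u003cp\u003e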
To prevent data leakage, we performed Butina clustering\u003csup\u003e\u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e\u003c/sup\u003e (80% threshold) on the Tanimoto dissimilarity matrix of the linker-payload fingerprints. The resulting clusters were then partitioned with stratification into training, validation, and testing sets in proportions of 80%, 10%, and 10%, respectively. Additionally, the training was balanced by oversampling the minority class (0).\u003c/p\u003e\u003ch2\u003eiii) Architecture\u003c/h2\u003e\u003cp\u003eThe architecture was a multimodal CNN that processes the following inputs: (i) DAR and intracellular target information, (ii) PCA-reduced ESM embeddings of the target antigen, (iii) predicted antigen expression intensity (from the GENCEP model), (iv) Cell line-specific mRNA embeddings (from the GENCEP model), (v) Gene-specific mRNA read count embeddings (also from the GENCEP model), (vi) physicochemical (200 total) and MACCS fingerprint descriptors (167 bits) of the payload-linker system (Fig.\u0026nbsp;8A).\u003c/p\u003e\u003cp\u003eEach feature stream in the multimodal framework first underwent an initial linear projection to reduce dimensionality. DAR and ESM embeddings were merged into Feature Stream 1, antigen intensities and mRNA embeddings constitute Feature Stream 2, and molecular descriptors plus fingerprints form Feature Stream 3. The linearly projected features within each stream were concatenated and passed to a dedicated 1D convolutional block, whose layers, hidden channels, kernel size, and dropout were tuned via Optuna. All convolutional outputs then went through a global average pooling step and were subsequently concatenated into a combined embedding, which was refined by an attention mechanism. 
Finally, a fully connected (FC) classifier (with ReLU activations, batch normalization, and dropout) output a sigmoid probability for binary classification.\u003c/p\u003e\u003cp\u003eA custom binary cross-entropy (BCE) loss function was designed to emphasize classifications far from the selected nM threshold boundary while down-weighting ambiguous samples near each nM threshold. Specifically:\u003c/p\u003e\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}Penalty\\:Scale=1-{e}^{-penalt{y}_{slope}\\:\\bullet\\:\\left|{IC}_{50}-threshold\\right|}\\#\\left(Eq.\\:2\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}Loss=BCE\\left({y}_{pred},\\:{y}_{true}\\right)\\:\\bullet\\:Penalty\\:Scale\\:\\#\\left(Eq.\\:3\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:BCE\\left({y}_{pred},\\:{y}_{true}\\right)\$\u003c/span\u003e\u003c/span\u003e was the standard binary cross-entropy loss, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:penalt{y}_{slope}\$\u003c/span\u003e\u003c/span\u003e controlled the rate at which the penalty decreased near the boundary (set to 10), \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\text{I}\\text{C}}_{50}\$\u003c/span\u003e\u003c/span\u003e was the IC₅₀ value for each data point, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:threshold\$\u003c/span\u003e\u003c/span\u003e was set to either 1, 5, or 10 nM. Early stopping was implemented based on validation loss, halting training if no improvement was observed over eight consecutive epochs. 
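\u003c/p\u003e\u003cp\u003eEquations 2 and 3 can be illustrated per sample with the following sketch. It is written in plain Python for readability, the function names and the clipping constant eps are assumptions made here, and it is not the training code, which presumably operated on batched tensors.\u003c/p\u003e

```python
import math


def penalty_scale(ic50_nm, threshold_nm, penalty_slope=10.0):
    """Eq. 2: weight near 0 at the threshold, approaching 1 far from it."""
    return 1.0 - math.exp(-penalty_slope * abs(ic50_nm - threshold_nm))


def binary_cross_entropy(y_pred, y_true, eps=1e-7):
    """Standard BCE for one sample; eps clipping avoids log(0)."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def scaled_loss(y_pred, y_true, ic50_nm, threshold_nm, penalty_slope=10.0):
    """Eq. 3: BCE multiplied by the distance-dependent penalty scale."""
    scale = penalty_scale(ic50_nm, threshold_nm, penalty_slope)
    return binary_cross_entropy(y_pred, y_true) * scale


# A sample sitting exactly on the 10 nM boundary contributes no loss,
# while one far from the boundary keeps essentially its full BCE.
on_boundary = scaled_loss(0.5, 1, ic50_nm=10.0, threshold_nm=10.0)
far_away = scaled_loss(0.9, 1, ic50_nm=0.5, threshold_nm=10.0)
```

\u003cp\u003eWith the slope of 10 stated above, a sample 0.1 nM from the threshold is weighted by about 0.63 and a sample 1 nM away by nearly 1, so only samples very close to the boundary are down-weighted appreciably.\u003c/p\u003e\u003cp\u003e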
Learning rate scheduling via ReduceLROnPlateau\u003csup\u003e\u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e81\u003c/span\u003e\u003c/sup\u003e was employed to reduce the learning rate when validation loss plateaued.\u003c/p\u003e\u003ch2\u003eiv) Hyperparameter optimization\u003c/h2\u003e\u003cp\u003eHyperparameter searches were conducted using the default Tree-structured Parzen Estimator within Optuna\u003csup\u003e\u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e82\u003c/span\u003e\u003c/sup\u003e. The exploration covered 1 to 3 CNN layers, hidden dimensions ranging 32–512, kernel sizes up to 7, dropout up to 50%, learning rates in [1E-5, 1E-2], and batch sizes of 32, 64, or 128, each with an early-stopping patience of 10. Decision thresholds of 1, 5, and 10 nM were used for labeling. The best identified hyperparameter combinations are detailed in Suppl. Tables\u0026nbsp;4, 5, and 6.\u003c/p\u003e\u003ch2\u003eComputational infrastructure\u003c/h2\u003e\u003cp\u003eAll computational tasks were performed on high-performance computing clusters provided by the Digital Research Alliance of Canada, including a Béluga virtual machine allocation with 8 virtual CPUs and 30 GB of RAM (p8-30gb configuration). The operating system used was Ubuntu-22.04.4-Jammy-x64-2024-06. For inference and website integration, job scheduling and resource allocation were managed by the SLURM workload manager. The computational environment was also configured to support multi-threaded operations and scale with demand.\u003c/p\u003e\u003ch2\u003eExternal and blinded ADC test set\u003c/h2\u003e\u003cp\u003eThe external ADCs were kindly provided by Dr. Mark Barok (University of Helsinki). ADC structures (antibody name, linker type, payload, and DAR) and the cell lines tested against were provided. The information was processed by the AMM model, and the probabilities for activity were determined and sent back to Dr. Barok for verification. 
A portion of the ADCs has since been published\u003csup\u003e\u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e\u003c/sup\u003e. Chemical structures and activity data for this set are reported in Supplemental Table\u0026nbsp;7.\u003c/p\u003e\u003ch2\u003eADCpedia website implementation\u003c/h2\u003e\u003cp\u003eAllocation and deployment with Digital Research Alliance of Canada involved using a virtual machine allocation for public access, facilitated by a \"Floating IP.\" A subdomain (server.adcpedia.com) was linked to this IP, and the application was developed with WordPress for the frontend and Django for the backend, integrating prediction models and data pipelines. Nginx and Gunicorn were used for deployment, while SSH managed the server. Project files were transferred, and settings were configured to handle cookies, CSRF, and CORS. Security rules for HTTP and HTTPS enabled requests on ports 80 and 443. Nginx was set up to redirect traffic to Gunicorn, serving the Django application. Static files were collected, database migrations executed, and SSL certificates obtained via Certbot for HTTPS security. File permissions ensured proper access, and multiple Gunicorn instances were able to maintain performance under heavy loads. Monitoring of Nginx logs helped detect and resolve issues. CSRF tokens secured POST requests, while CORS was configured to manage requests from different origins, essential for the Digital Research Alliance of Canada deployment.\u003c/p\u003e\u003ch2\u003eStatistical analysis\u003c/h2\u003e\u003cp\u003eMultivariable analytical methods were used to generate associations between ADC components and antigen expressions and IC\u003csub\u003e50\u003c/sub\u003e values. 
Continuous variables were compared using Pearson correlation coefficients and linear regression models, with model performance quantified by the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e). For instance, correlations between scaled mRNA read counts and true protein intensities were assessed by scatter plots overlaid with regression lines, while differences in antigen expression across predefined IC\u003csub\u003e50\u003c/sub\u003e bins were examined using boxplots and strip plots. Outliers were identified and excluded based on Z-score and interquartile range (IQR) criteria. In the boxplots, the box encompasses the 75th IQR, horizontal lines in the boxes indicate the mean, and the whiskers span the 25th IQR. The IC₅₀ values were log-transformed (pIC₅₀ = −log₁₀[IC₅₀]) when appropriate. In addition, classification models for drug sensitivity (using an IC₅₀ threshold of 10 nM) were evaluated by receiver operating characteristic (ROC) analyses, with area under the curve (AUC) values computed for validation, test, and blind test datasets. Confusion matrices were further constructed for cell line–specific analyses. All tests were two-sided, with statistical significance defined as \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05, and analyses were performed using Python (pandas, SciPy, scikit-learn, and seaborn).\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eThis research was funded by the Canadian Institutes of Health Research (J. Leyton; 378389) and by the Natural Sciences and Engineering Research Council of Canada (F. Gentile; RGPIN-2023-04129). The authors thank the Digital Research Alliance of Canada for computational resources and for a Resource Allocation Competition grant awarded to F. Gentile.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eDumontet, C., Reichert, J.M., Senter, P.D., Lambert, J.M. \u0026amp; Beck, A. 
Antibody-drug conjugates come of age in oncology. \u003cem\u003eNat Rev Drug Discov\u003c/em\u003e\u003cstrong\u003e22\u003c/strong\u003e, 641-661 (2023).\u003c/li\u003e\n \u003cli\u003eJin, H. et al. Rab GTPases: Central Coordinators of Membrane Trafficking in Cancer. \u003cem\u003eFront Cell Dev Biol\u003c/em\u003e\u003cstrong\u003e9\u003c/strong\u003e, 648384 (2021).\u003c/li\u003e\n \u003cli\u003eMellman, I. \u0026amp; Yarden, Y. Endocytosis and cancer. \u003cem\u003eCold Spring Harb Perspect Biol\u003c/em\u003e\u003cstrong\u003e5\u003c/strong\u003e, a016949 (2013).\u003c/li\u003e\n \u003cli\u003eMosesson, Y., Mills, G.B. \u0026amp; Yarden, Y. Derailed endocytosis: an emerging feature of cancer. \u003cem\u003eNat Rev Cancer\u003c/em\u003e\u003cstrong\u003e8\u003c/strong\u003e, 835-850 (2008).\u003c/li\u003e\n \u003cli\u003eShen, L. et al. ADCdb: the database of antibody-drug conjugates. \u003cem\u003eNucleic Acids Res\u003c/em\u003e\u003cstrong\u003e52\u003c/strong\u003e, D1097-D1109 (2024).\u003c/li\u003e\n \u003cli\u003eNessler, I., Menezes, B. \u0026amp; Thurber, G.M. Key metrics to expanding the pipeline of successful antibody-drug conjugates. \u003cem\u003eTrends Pharmacol Sci\u003c/em\u003e\u003cstrong\u003e42\u003c/strong\u003e, 803-812 (2021).\u003c/li\u003e\n \u003cli\u003eBlenrep withdrawn for multiple myeloma https://www.gsk.com/en-gb/media/press-releases/gsk-provides-update-on-blenrep-us-marketing-authorisation/. (2022).\u003c/li\u003e\n \u003cli\u003eTrodelvy withdrawn for metastatic urothelial cancer. https://www.gilead.com/company/company-statements/2024/gilead-provides-update-on-us-indication-for-trodelvy-in-metastatic-urothelial-cancer#:~:text=Foster%20City%2C%20Calif.%2C%20October,and%20Drug%20Administration%20(FDA). (2024).\u003c/li\u003e\n \u003cli\u003eFu, Z., Li, S., Han, S., Shi, C. \u0026amp; Zhang, Y. Antibody drug conjugate: the \u0026quot;biological missile\u0026quot; for targeted cancer therapy. 
\u003cem\u003eSignal Transduct Target Ther\u003c/em\u003e\u003cstrong\u003e7\u003c/strong\u003e, 93 (2022).\u003c/li\u003e\n \u003cli\u003eKhongorzul, P., Ling, C.J., Khan, F.U., Ihsan, A.U. \u0026amp; Zhang, J. Antibody-Drug Conjugates: A Comprehensive Review. \u003cem\u003eMol Cancer Res\u003c/em\u003e\u003cstrong\u003e18\u003c/strong\u003e, 3-19 (2020).\u003c/li\u003e\n \u003cli\u003eLiu, T. et al. Overexpression of TROP2 predicts poor prognosis of patients with cervical cancer and promotes the proliferation and invasion of cervical cancer cells by regulating ERK signaling pathway. \u003cem\u003ePLoS One\u003c/em\u003e\u003cstrong\u003e8\u003c/strong\u003e, e75864 (2013).\u003c/li\u003e\n \u003cli\u003eHammood, M., Craig, A.W. \u0026amp; Leyton, J.V. Impact of Endocytosis Mechanisms for the Receptors Targeted by the Currently Approved Antibody-Drug Conjugates (ADCs)-A Necessity for Future ADC Research and Development. \u003cem\u003ePharmaceuticals (Basel)\u003c/em\u003e\u003cstrong\u003e14\u003c/strong\u003e (2021).\u003c/li\u003e\n \u003cli\u003eLeyton, J.V. Improving Receptor-Mediated Intracellular Access and Accumulation of Antibody Therapeutics-The Tale of HER2. \u003cem\u003eAntibodies (Basel)\u003c/em\u003e\u003cstrong\u003e9\u003c/strong\u003e (2020).\u003c/li\u003e\n \u003cli\u003eCullen, P.J. \u0026amp; Steinberg, F. To degrade or not to degrade: mechanisms and significance of endocytic recycling. \u003cem\u003eNat Rev Mol Cell Biol\u003c/em\u003e\u003cstrong\u003e19\u003c/strong\u003e, 679-696 (2018).\u003c/li\u003e\n \u003cli\u003eZerial, M. \u0026amp; McBride, H. Rab proteins as membrane organizers. \u003cem\u003eNat Rev Mol Cell Biol\u003c/em\u003e\u003cstrong\u003e2\u003c/strong\u003e, 107-117 (2001).\u003c/li\u003e\n \u003cli\u003eNadal-Serrano, M. et al. The Second Generation Antibody-Drug Conjugate SYD985 Overcomes Resistances to T-DM1. 
\u003cem\u003eCancers (Basel)\u003c/em\u003e\u003cstrong\u003e12\u003c/strong\u003e (2020).\u003c/li\u003e\n \u003cli\u003eRios-Luci, C. et al. Resistance to the Antibody-Drug Conjugate T-DM1 Is Based in a Reduction in Lysosomal Proteolytic Activity. \u003cem\u003eCancer Res\u003c/em\u003e\u003cstrong\u003e77\u003c/strong\u003e, 4639-4651 (2017).\u003c/li\u003e\n \u003cli\u003eWeng, W. et al. Antibody-Exatecan Conjugates with a Novel Self-immolative Moiety Overcome Resistance in Colon and Lung Cancer. \u003cem\u003eCancer Discov\u003c/em\u003e\u003cstrong\u003e13\u003c/strong\u003e, 950-973 (2023).\u003c/li\u003e\n \u003cli\u003eKostova, V., Desos, P., Starck, J.B. \u0026amp; Kotschy, A. The Chemistry Behind ADCs. \u003cem\u003ePharmaceuticals (Basel)\u003c/em\u003e\u003cstrong\u003e14\u003c/strong\u003e (2021).\u003c/li\u003e\n \u003cli\u003eMaecker, H., Jonnalagadda, V., Bhakta, S., Jammalamadaka, V. \u0026amp; Junutula, J.R. Exploration of the antibody-drug conjugate clinical landscape. \u003cem\u003eMAbs\u003c/em\u003e\u003cstrong\u003e15\u003c/strong\u003e, 2229101 (2023).\u003c/li\u003e\n \u003cli\u003eGoncalves, E. et al. Pan-cancer proteomic map of 949 human cell lines. \u003cem\u003eCancer Cell\u003c/em\u003e\u003cstrong\u003e40\u003c/strong\u003e, 835-849 e838 (2022).\u003c/li\u003e\n \u003cli\u003eFoerderer, J. Should we trust web-scraped data? \u003cem\u003earXiv:2308.02231\u003c/em\u003e (2023).\u003c/li\u003e\n \u003cli\u003eADC Review. www.adcreview.com (accessed 04/12/2024).\u003c/li\u003e\n \u003cli\u003eHamann, P.R. et al. Gemtuzumab ozogamicin, a potent and selective anti-CD33 antibody-calicheamicin conjugate for treatment of acute myeloid leukemia. \u003cem\u003eBioconjug Chem\u003c/em\u003e\u003cstrong\u003e13\u003c/strong\u003e, 47-58 (2002).\u003c/li\u003e\n \u003cli\u003eLewis Phillips, G.D. et al. Targeting HER2-positive breast cancer with trastuzumab-DM1, an antibody-cytotoxic drug conjugate. 
\u003cem\u003eCancer Res\u003c/em\u003e\u003cstrong\u003e68\u003c/strong\u003e, 9280-9290 (2008).\u003c/li\u003e\n \u003cli\u003eDoronina, S.O. et al. Novel peptide linkers for highly potent antibody-auristatin conjugate. \u003cem\u003eBioconjug Chem\u003c/em\u003e\u003cstrong\u003e19\u003c/strong\u003e, 1960-1963 (2008).\u003c/li\u003e\n \u003cli\u003eDoronina, S.O. et al. Development of potent monoclonal antibody auristatin conjugates for cancer therapy. \u003cem\u003eNat Biotechnol\u003c/em\u003e\u003cstrong\u003e21\u003c/strong\u003e, 778-784 (2003).\u003c/li\u003e\n \u003cli\u003eTolcher, A.W. et al. Randomized phase II study of BR96-doxorubicin conjugate in patients with metastatic breast cancer. \u003cem\u003eJ Clin Oncol\u003c/em\u003e\u003cstrong\u003e17\u003c/strong\u003e, 478-484 (1999).\u003c/li\u003e\n \u003cli\u003eDamelin, M., Zhong, W., Myers, J. \u0026amp; Sapra, P. Evolving Strategies for Target Selection for Antibody-Drug Conjugates. \u003cem\u003ePharm Res\u003c/em\u003e\u003cstrong\u003e32\u003c/strong\u003e, 3494-3507 (2015).\u003c/li\u003e\n \u003cli\u003eSamantasinghar, A. et al. A comprehensive review of key factors affecting the efficacy of antibody drug conjugate. \u003cem\u003eBiomed Pharmacother\u003c/em\u003e\u003cstrong\u003e161\u003c/strong\u003e, 114408 (2023).\u003c/li\u003e\n \u003cli\u003ehttps://github.com/bp-kelley/descriptastorus.\u003c/li\u003e\n \u003cli\u003eLin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. \u003cem\u003eScience\u003c/em\u003e\u003cstrong\u003e379\u003c/strong\u003e, 1123-1130 (2023).\u003c/li\u003e\n \u003cli\u003eLeyton, J.V. The endosomal-lysosomal system in ADC design and cancer therapy. \u003cem\u003eExpert Opin Biol Ther\u003c/em\u003e\u003cstrong\u003e23\u003c/strong\u003e, 1067-1076 (2023).\u003c/li\u003e\n \u003cli\u003eErickson, H.K. et al. 
The effect of different linkers on target cell catabolism and pharmacokinetics/pharmacodynamics of trastuzumab maytansinoid conjugates. \u003cem\u003eMol Cancer Ther\u003c/em\u003e\u003cstrong\u003e11\u003c/strong\u003e, 1133-1142 (2012).\u003c/li\u003e\n \u003cli\u003ePetersen, M.E. et al. Design and Evaluation of ZD06519, a Novel Camptothecin Payload for Antibody Drug Conjugates. \u003cem\u003eMol Cancer Ther\u003c/em\u003e\u003cstrong\u003e23\u003c/strong\u003e, 606-618 (2024).\u003c/li\u003e\n \u003cli\u003eDornan, D. et al. Therapeutic potential of an anti-CD79b antibody-drug conjugate, anti-CD79b-vc-MMAE, for the treatment of non-Hodgkin lymphoma. \u003cem\u003eBlood\u003c/em\u003e\u003cstrong\u003e114\u003c/strong\u003e, 2721-2729 (2009).\u003c/li\u003e\n \u003cli\u003eKim, S.B. et al. Relationship between tumor biomarkers and efficacy in TH3RESA, a phase III study of trastuzumab emtansine (T-DM1) vs. treatment of physician\u0026apos;s choice in previously treated HER2-positive advanced breast cancer. \u003cem\u003eInt J Cancer\u003c/em\u003e\u003cstrong\u003e139\u003c/strong\u003e, 2336-2342 (2016).\u003c/li\u003e\n \u003cli\u003eSavage, S.R. et al. Pan-cancer proteogenomics expands the landscape of therapeutic targets. \u003cem\u003eCell\u003c/em\u003e\u003cstrong\u003e187\u003c/strong\u003e, 4389-4407 e4315 (2024).\u003c/li\u003e\n \u003cli\u003eDupree, E.J. et al. A Critical Review of Bottom-Up Proteomics: The Good, the Bad, and the Future of this Field. \u003cem\u003eProteomes\u003c/em\u003e\u003cstrong\u003e8\u003c/strong\u003e (2020).\u003c/li\u003e\n \u003cli\u003eSlamon, D.J. et al. Human breast cancer: correlation of relapse and survival with amplification of the HER-2/neu oncogene. \u003cem\u003eScience\u003c/em\u003e\u003cstrong\u003e235\u003c/strong\u003e, 177-182 (1987).\u003c/li\u003e\n \u003cli\u003eVogel, C.L. et al. Efficacy and safety of trastuzumab as a single agent in first-line treatment of HER2-overexpressing metastatic breast cancer. 
\u003cem\u003eJ Clin Oncol\u003c/em\u003e\u003cstrong\u003e20\u003c/strong\u003e, 719-726 (2002).\u003c/li\u003e\n \u003cli\u003eYu, S.Y., Park, J., Kwon, W.S., Jeong, I., Kang, S.K., Bae, H.J., Kim, T.S., Chung, H.C., Rha, S.Y. Abstract 945: Trastuzumab deruxtecan (T-DXd) sensitivity in various levels of HER2 expressing gastric cancer cells. \u003cem\u003eCancer Res\u003c/em\u003e\u003cstrong\u003e81 (12_Supplement)\u003c/strong\u003e (2021).\u003c/li\u003e\n \u003cli\u003eLacasse, V., Beaudoin, S., Jean, S. \u0026amp; Leyton, J.V. A Novel Proteomic Method Reveals NLS Tagging of T-DM1 Contravenes Classical Nuclear Transport in a Model of HER2-Positive Breast Cancer. \u003cem\u003eMol Ther Methods Clin Dev\u003c/em\u003e\u003cstrong\u003e19\u003c/strong\u003e, 99-119 (2020).\u003c/li\u003e\n \u003cli\u003eLenart, S. et al. Trop2: Jack of All Trades, Master of None. \u003cem\u003eCancers (Basel)\u003c/em\u003e\u003cstrong\u003e12\u003c/strong\u003e (2020).\u003c/li\u003e\n \u003cli\u003eTagawa, S.T. et al. TROPHY-U-01: A Phase II Open-Label Study of Sacituzumab Govitecan in Patients With Metastatic Urothelial Carcinoma Progressing After Platinum-Based Chemotherapy and Checkpoint Inhibitors. \u003cem\u003eJ Clin Oncol\u003c/em\u003e\u003cstrong\u003e39\u003c/strong\u003e, 2474-2485 (2021).\u003c/li\u003e\n \u003cli\u003eBardia, A., Hurvitz, S.A. \u0026amp; Rugo, H.S. Sacituzumab Govitecan in Metastatic Breast Cancer. Reply. \u003cem\u003eN Engl J Med\u003c/em\u003e\u003cstrong\u003e385\u003c/strong\u003e, e12 (2021).\u003c/li\u003e\n \u003cli\u003eLoriot, Y. et al. Sacituzumab Govitecan Demonstrates Efficacy across Tumor Trop-2 Expression Levels in Patients with Advanced Urothelial Cancer. \u003cem\u003eClin Cancer Res\u003c/em\u003e\u003cstrong\u003e30\u003c/strong\u003e, 3179-3188 (2024).\u003c/li\u003e\n \u003cli\u003eSanti, D.V., Ashley, G.W., Cabel, L., Bidard, F.C. Could a Long-Acting Prodrug of SN-38 be Efficacious in Sacituzumab Govitecan-Resistant Tumors? 
\u003cem\u003eBioDrugs\u003c/em\u003e\u003cstrong\u003e38\u003c/strong\u003e, 171-176 (2024).\u003c/li\u003e\n \u003cli\u003ePowles, T. et al. Enfortumab Vedotin in Previously Treated Advanced Urothelial Carcinoma. \u003cem\u003eN Engl J Med\u003c/em\u003e\u003cstrong\u003e384\u003c/strong\u003e, 1125-1135 (2021).\u003c/li\u003e\n \u003cli\u003eChu, P.G. \u0026amp; Arber, D.A. CD79: a review. \u003cem\u003eAppl Immunohistochem Mol Morphol\u003c/em\u003e\u003cstrong\u003e9\u003c/strong\u003e, 97-106 (2001).\u003c/li\u003e\n \u003cli\u003ePfeifer, M. et al. Anti-CD22 and anti-CD79B antibody drug conjugates are active in different molecular diffuse large B-cell lymphoma subtypes. \u003cem\u003eLeukemia\u003c/em\u003e\u003cstrong\u003e29\u003c/strong\u003e, 1578-1586 (2015).\u003c/li\u003e\n \u003cli\u003eTilly, H. et al. Polatuzumab Vedotin in Previously Untreated Diffuse Large B-Cell Lymphoma. \u003cem\u003eN Engl J Med\u003c/em\u003e\u003cstrong\u003e386\u003c/strong\u003e, 351-363 (2022).\u003c/li\u003e\n \u003cli\u003ePolson, A.G. et al. Antibody-drug conjugates targeted to CD79 for the treatment of non-Hodgkin lymphoma. \u003cem\u003eBlood\u003c/em\u003e\u003cstrong\u003e110\u003c/strong\u003e, 616-623 (2007).\u003c/li\u003e\n \u003cli\u003ePolson, A.G. et al. Antibody-drug conjugates for the treatment of non-Hodgkin\u0026apos;s lymphoma: target and linker-drug selection. \u003cem\u003eCancer Res\u003c/em\u003e\u003cstrong\u003e69\u003c/strong\u003e, 2358-2364 (2009).\u003c/li\u003e\n \u003cli\u003eTolcher, A.W. Antibody drug conjugates: lessons from 20 years of clinical experience. \u003cem\u003eAnn Oncol\u003c/em\u003e\u003cstrong\u003e27\u003c/strong\u003e, 2168-2172 (2016).\u003c/li\u003e\n \u003cli\u003eCassady, J.M., Chan, K.K., Floss, H.G. \u0026amp; Leistner, E. Recent developments in the maytansinoid antitumor agents. 
\u003cem\u003eChem Pharm Bull (Tokyo)\u003c/em\u003e\u003cstrong\u003e52\u003c/strong\u003e, 1-26 (2004).\u003c/li\u003e\n \u003cli\u003eJunttila, T.T., Li, G., Parsons, K., Phillips, G.L. \u0026amp; Sliwkowski, M.X. Trastuzumab-DM1 (T-DM1) retains all the mechanisms of action of trastuzumab and efficiently inhibits growth of lapatinib insensitive breast cancer. \u003cem\u003eBreast Cancer Res Treat\u003c/em\u003e\u003cstrong\u003e128\u003c/strong\u003e, 347-356 (2011).\u003c/li\u003e\n \u003cli\u003eRemillard, S., Rebhun, L.I., Howie, G.A. \u0026amp; Kupchan, S.M. Antimitotic activity of the potent tumor inhibitor maytansine. \u003cem\u003eScience\u003c/em\u003e\u003cstrong\u003e189\u003c/strong\u003e, 1002-1005 (1975).\u003c/li\u003e\n \u003cli\u003eSteube, K.G. et al. Dolastatin 10 and dolastatin 15: effects of two natural peptides on growth and differentiation of leukemia cells. \u003cem\u003eLeukemia\u003c/em\u003e\u003cstrong\u003e6\u003c/strong\u003e, 1048-1053 (1992).\u003c/li\u003e\n \u003cli\u003eQuentmeier, H., Brauer, S., Pettit, G.R., Drexler, H.G. Cytostatic effects of dolastatin 10 and dolastatin 15 on human leukemia cell lines. \u003cem\u003eLeuk. Lymphoma\u003c/em\u003e\u003cstrong\u003e6\u003c/strong\u003e, 245-250 (1992).\u003c/li\u003e\n \u003cli\u003eKrause, W. Resistance to anti-tubulin agents: From vinca alkaloids to epothilones. \u003cem\u003eCancer Drug Resist\u003c/em\u003e\u003cstrong\u003e2\u003c/strong\u003e, 82-106 (2019).\u003c/li\u003e\n \u003cli\u003eMorgensztern, D., Ready, N.E., Johnson, M.L., Dowlati, A., Choudhury, N.J., Carbone, D.P., Schaefer, E.S., Arnold, S.M., Puri, S., Piotrowska, Z. First-in-human study of ABBV-011, a seizure-related homolog protein 6 (SEZ6)\u0026ndash;targeting antibody-drug conjugate, in patients with small cell lung cancer. \u003cem\u003eJ. Clin. Oncol.\u003c/em\u003e\u003cstrong\u003e41\u003c/strong\u003e (2023).\u003c/li\u003e\n \u003cli\u003eWiedemeyer, W.R. et al. 
ABBV-011, A Novel, Calicheamicin-Based Antibody-Drug Conjugate, Targets SEZ6 to Eradicate Small Cell Lung Cancer Tumors. \u003cem\u003eMol Cancer Ther\u003c/em\u003e\u003cstrong\u003e21\u003c/strong\u003e, 986-998 (2022).\u003c/li\u003e\n \u003cli\u003eNorsworthy, K.J. et al. FDA Approval Summary: Mylotarg for Treatment of Patients with Relapsed or Refractory CD33-Positive Acute Myeloid Leukemia. \u003cem\u003eOncologist\u003c/em\u003e\u003cstrong\u003e23\u003c/strong\u003e, 1103-1108 (2018).\u003c/li\u003e\n \u003cli\u003eGoldenberg, D.M. \u0026amp; Sharkey, R.M. Antibody-drug conjugates targeting TROP-2 and incorporating SN-38: A case study of anti-TROP-2 sacituzumab govitecan. \u003cem\u003eMAbs\u003c/em\u003e\u003cstrong\u003e11\u003c/strong\u003e, 987-995 (2019).\u003c/li\u003e\n \u003cli\u003eAvendano, C., Menendez, J.C. Medicinal Chemistry of Anticancer Drugs. (Elsevier Science, 2008).\u003c/li\u003e\n \u003cli\u003eNakada, T., Sugihara, K., Jikoh, T., Abe, Y. \u0026amp; Agatsuma, T. The Latest Research and Development into the Antibody-Drug Conjugate, [fam-] Trastuzumab Deruxtecan (DS-8201a), for HER2 Cancer Therapy. \u003cem\u003eChem Pharm Bull (Tokyo)\u003c/em\u003e\u003cstrong\u003e67\u003c/strong\u003e, 173-185 (2019).\u003c/li\u003e\n \u003cli\u003eOgitani, Y. et al. DS-8201a, A Novel HER2-Targeting ADC with a Novel DNA Topoisomerase I Inhibitor, Demonstrates a Promising Antitumor Efficacy with Differentiation from T-DM1. \u003cem\u003eClin Cancer Res\u003c/em\u003e\u003cstrong\u003e22\u003c/strong\u003e, 5097-5108 (2016).\u003c/li\u003e\n \u003cli\u003ede Bever, L. et al. Generation of DAR1 Antibody-Drug Conjugates for Ultrapotent Payloads Using Tailored GlycoConnect Technology. \u003cem\u003eBioconjug Chem\u003c/em\u003e\u003cstrong\u003e34\u003c/strong\u003e, 538-548 (2023).\u003c/li\u003e\n \u003cli\u003eWang, S., Zhang, R., Zhong, K., Guo, W. \u0026amp; Tong, A. 
An Anti-CD7 Antibody-Drug Conjugate Target Showing Potent Antitumor Activity for T-Lymphoblastic Leukemia (T-ALL). \u003cem\u003eBiomolecules\u003c/em\u003e\u003cstrong\u003e14\u003c/strong\u003e (2024).\u003c/li\u003e\n \u003cli\u003eTang, S.C. et al. Influence of antibody-drug conjugate cleavability, drug-to-antibody ratio, and free payload concentration on systemic toxicities: A systematic review and meta-analysis. \u003cem\u003eCancer Metastasis Rev\u003c/em\u003e\u003cstrong\u003e44\u003c/strong\u003e, 18 (2024).\u003c/li\u003e\n \u003cli\u003eNagra, N.S. et al. The company landscape for artificial intelligence in large-molecule drug discovery. \u003cem\u003eNat Rev Drug Discov\u003c/em\u003e\u003cstrong\u003e22\u003c/strong\u003e, 949-950 (2023).\u003c/li\u003e\n \u003cli\u003eLu, X. et al. Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph. \u003cem\u003eComput Struct Biotechnol J\u003c/em\u003e\u003cstrong\u003e23\u003c/strong\u003e, 1666-1679 (2024).\u003c/li\u003e\n \u003cli\u003eLuo, Y. et al. Toward Unified AI Drug Discovery with Multimodal Knowledge. \u003cem\u003eHealth Data Sci\u003c/em\u003e\u003cstrong\u003e4\u003c/strong\u003e, 0113 (2024).\u003c/li\u003e\n \u003cli\u003eSteurer, B., Vanhaelen, Q. \u0026amp; Zhavoronkov, A. Multimodal Transformers and Their Applications in Drug Target Discovery for Aging and Age-Related Diseases. \u003cem\u003eJ Gerontol A Biol Sci Med Sci\u003c/em\u003e\u003cstrong\u003e79\u003c/strong\u003e (2024).\u003c/li\u003e\n \u003cli\u003eKingma, D.P., Ba, J. Adam: A method for stochastic optimization. \u003cem\u003earXiv:1412.6980\u003c/em\u003e (2014).\u003c/li\u003e\n \u003cli\u003eRDKit https://www.rdkit.org/ (accessed 2024-04-26).\u003c/li\u003e\n \u003cli\u003eRogers, D. \u0026amp; Hahn, M. Extended-connectivity fingerprints. 
\u003cem\u003eJ Chem Inf Model\u003c/em\u003e\u003cstrong\u003e50\u003c/strong\u003e, 742-754 (2010).\u003c/li\u003e\n \u003cli\u003ePedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. Scikit-learn: Machine Learning in Python. \u003cem\u003eThe journal of Machine Learning Research\u003c/em\u003e\u003cstrong\u003e12\u003c/strong\u003e, 2825-2830 (2011).\u003c/li\u003e\n \u003cli\u003eButina, D. Unsupervised Data Base Clustering Based on Daylight\u0026apos;s Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets. \u003cem\u003eJournal of Chemical Information and Computer Sciences\u003c/em\u003e\u003cstrong\u003e39\u003c/strong\u003e (1999).\u003c/li\u003e\n \u003cli\u003eDauphin, Y., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., Bengio, Y. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. \u003cem\u003eArXiv:1406.2572\u003c/em\u003e (2014).\u003c/li\u003e\n \u003cli\u003eAkiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M. Optuna: A next-generation hyperparameter optimization framework. \u003cem\u003eArchiv\u003c/em\u003e (2019).\u003c/li\u003e\n \u003cli\u003ePourjamal, N., Le Joncour, V., Vereb, G., Honkamaki, C., Isola, J., Leyton, J.V., Laakkonen, P., Joensuu, H., Barok, M. Disitamab vedotin in preclinical models of HER2-positive breast and gastric cancers resistant to trastuzumab emtansine and trastuzumab deruxtecan. \u003cem\u003eTransl. 
Oncol.\u003c/em\u003e\u003cstrong\u003e53\u003c/strong\u003e (2025).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-6256038/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6256038/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"Antibody-drug conjugates (ADCs) represent a significant advancement in therapeutic oncology, as they precisely deliver cytotoxic drugs to target tumor cells. However, ADC development is complex due to the entangled interplay between chemical design and tumor cell biology. Therefore, a platform was developed consisting of an ADC-tumor cell interconnected multimodal framework for machine learning applications. It contains ADC records from the past two decades that details linkers, payloads, drug-antibody ratios, and cytotoxicity IC50 values. Biological interconnection was achieved through integrating omics data from ~1,400 human tumor cell lines. 
Moreover, a protein intensity prediction tool was developed that further enriched the multifaceted framework by concentrating on cell surface antigens. A deep learning model was trained on the framework and accurately predicted ADC in vitro activity across tumor cell lines at relevant nanomolar thresholds. This work exposes the complexities at the ADC-tumor cell interface and can significantly influence current empirical ADC design decisions.","manuscriptTitle":"A Machine Learning Platform for Interconnecting Antibody-Drug Conjugate Cytotoxic Design with Tumor Cell Biology","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-23 06:49:25","doi":"10.21203/rs.3.rs-6256038/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"65ddda70-5c2f-4536-a80c-f9cf94c67b7c","owner":[],"postedDate":"April 23rd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":45923255,"name":"Biological sciences/Drug discovery/Drug screening/Virtual screening"},{"id":45923256,"name":"Biological sciences/Computational biology and bioinformatics/Data integration"}],"tags":[],"updatedAt":"2025-07-04T11:41:07+00:00","versionOfRecord":[],"versionCreatedAt":"2025-04-23 
06:49:25","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6256038","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6256038","identity":"rs-6256038","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
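The penalty-weighted loss described in the Methods (Eq. 2 and Eq. 3) can be illustrated with a minimal per-sample sketch. This is not the authors' implementation. The function names, the `eps` clipping, and the choice to measure the distance to the threshold directly in nM are assumptions made for illustration; the text does not state how per-sample losses were reduced over a batch, so no reduction is shown.

```python
import math


def penalty_scale(ic50_nm, threshold_nm, penalty_slope=10.0):
    """Eq. 2: equals 0 at the threshold and approaches 1 far from it.

    Assumption: the distance |IC50 - threshold| is taken in nM.
    """
    return 1.0 - math.exp(-penalty_slope * abs(ic50_nm - threshold_nm))


def bce(y_pred, y_true, eps=1e-7):
    """Standard binary cross-entropy for a single sample.

    The eps clipping is an added numerical safeguard, not from the paper.
    """
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def weighted_bce(y_pred, y_true, ic50_nm, threshold_nm, penalty_slope=10.0):
    """Eq. 3: per-sample BCE multiplied by the penalty scale."""
    scale = penalty_scale(ic50_nm, threshold_nm, penalty_slope)
    return bce(y_pred, y_true) * scale
```

Under these assumptions, a sample whose IC₅₀ sits exactly on the threshold contributes no loss, while with a slope of 10 a sample only 0.1 nM away already carries a weight of about 0.63, so the down-weighted band around the boundary is narrow.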

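The log transform and threshold labeling mentioned in the Methods can be made concrete with a short worked sketch. Two points are assumptions rather than statements from the paper: the pIC₅₀ is computed with the IC₅₀ expressed in mol/L (the usual convention, so a value given in nM is first multiplied by 10⁻⁹), and a sample is labeled active when its IC₅₀ is at or below the threshold. Both helper names are hypothetical.

```python
import math


def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50), with IC50 converted from nM to mol/L (assumed convention)."""
    return -math.log10(ic50_nm * 1e-9)


def label_active(ic50_nm, threshold_nm):
    """Hypothetical labeling rule: 1 if IC50 <= threshold, otherwise 0."""
    return 1 if ic50_nm <= threshold_nm else 0


# The three decision thresholds named in the text, in nM.
THRESHOLDS_NM = (1.0, 5.0, 10.0)

# Labels for one example IC50 (5 nM) under each threshold.
labels_for_5nm = [label_active(5.0, t) for t in THRESHOLDS_NM]
```

With this convention the 1 nM and 10 nM thresholds correspond to pIC₅₀ values of 9 and 8, and the same measurement can change class depending on which threshold is selected.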
License: CC-BY-4.0