Circumventing the synthesizability problem in generative molecular design

doi:10.64898/2026.02.18.706722

2026 · doi:10.64898/2026.02.18.706722

preprint OA: closed Public-Domain

Abstract

Generative structure-based drug design (SBDD) models have shown great promise to accelerate our ability to discover novel drug candidates. However, these models have been criticized for producing compounds that are difficult to synthesize and therefore not practically applicable to drug design. In this work, we propose a way to circumvent the synthesizability issue by introducing a model-guided virtual screening (MGVS) pipeline, which pairs SBDD models with efficient chemical similarity search methods to identify synthesizable analogs of generated compounds in existing ultra-large compound databases. Using this approach, we demonstrate that synthesizable analogs of generated compounds with equivalent or better docking scores and similar predicted binding poses can be reliably identified across a wide range of protein targets. We find that MGVS outperforms standard virtual ligand screening (VLS), consistently yielding at least a 25x improvement in screening efficiency across three different SBDD models. As drug-like chemical spaces continue to grow and standard VLS methods focused on exhaustive screening become increasingly impractical, approaches like MGVS that effectively narrow the search space will become critical for advancing drug discovery.

Introduction

Generative deep learning methods for small-molecule drug design have captured widespread attention in both academia and industry. These approaches promise to accelerate early-stage drug discovery by identifying novel chemical structures tailored to specific biological targets, potentially unlocking areas of chemical space unexplored by traditional methods. Despite their promise, these models face a critical challenge that limits their practical utility: the synthesizability of generated compounds. This is a particular issue in the early stages of the drug discovery process, where the ability to rapidly procure and validate potential hit compounds is essential, and custom synthesis of novel chemical structures at scale is impractically slow and expensive. For this reason, traditional virtual ligand screening (VLS) approaches have focused on screening commercially available libraries of compounds with known synthesis routes [1–4]. Historically, such libraries have been limited to only a few million compounds, whereas drug-like chemical space is on the order of 10^60 compounds [5]. This limitation has been partially addressed with the development of virtual chemical spaces [1]: enumerated or reaction-defined collections of make-on-demand compounds that can be synthesized reliably at scale. Public databases now include billions [2] to trillions [6] of compounds, constructed by applying known reactions to commercially available building blocks. While these advances have dramatically expanded the accessible chemical search space, they have also introduced the daunting computational challenge of screening chemical spaces of this scale. Parallel advancements in VLS methods such as synthon-based screening [7,8] have lowered the computational overhead by systematically docking chemical building blocks instead of every reachable compound, thus helping to mitigate the combinatorial explosion.

As virtual spaces continue to grow, however, such knowledge-free approaches that rely on exhaustively screening the chemical space will become increasingly impractical. In recognition of this challenge, recent efforts have turned toward advanced sampling strategies, ranging from traditional heuristics to modern AI-based methods [1,9,10], that narrow the search space by screening a small fraction of the chemical space and using the most promising candidates to guide further search. However, these approaches remain constrained by their dependence on large-scale enumeration and iterative evaluation, which limits their efficiency and scalability as chemical spaces expand.

Generative deep learning models have recently emerged as a compelling alternative. These models can learn distributions of chemical structures and protein–ligand interactions [11–13] from large chemical datasets, offering the potential to navigate chemical space without explicit enumeration [11,13]. When conditioned on structural information from a protein target, these models can generate compounds that fit within the protein pocket and exhibit favorable binding properties. Despite their promise, generative methods have been criticized for producing compounds that are either synthetically inaccessible or chemically implausible [14–16]. In response, there has been a focus on restricting models to explicitly learn distributions over synthesizable subspaces [17–19], but these methods often trade away molecular diversity and controllability. In addition, the concept of a "synthesizable" compound is not well-defined ab initio in terms of chemical structure or properties but instead depends on the ever-changing set of available chemical synthesis ingredients and techniques. Restricting models to such arbitrary subspaces is not straightforward and could detract from the primary goal: generating true target-specific binders.

Alternatively, we hypothesize that these synthesizability issues could be circumvented by pairing generative models with established chemical similarity search techniques to identify synthesizable analogs. In this way, rather than attempting to supplant the traditional discovery pipeline by directly generating ideal, synthesizable drug candidates, generative methods may be more effectively used to steer discovery toward tractable, high-potential subspaces within which synthesizable compounds can be identified. In this work, we introduce a model-guided virtual screening (MGVS) pipeline that uses structure-based generative models to sample compounds conditioned on a target pocket, selects the compounds with the strongest predicted binding affinity, and retrieves similar compounds from existing chemical databases. We show that synthesizable compounds with strong predicted binding affinity and similar docking poses can be reliably identified using this generate-then-retrieve approach, and that the quality of the discovered candidates represents at least a 25x improvement in screening efficiency over standard VLS. These results are consistent across three state-of-the-art generative SBDD models with diverse architectures: DrugHIVE [11] (hierarchical VAE), Pocket2Mol [12] (autoregressive), and DiffSBDD [13] (diffusion), demonstrating the broad applicability of this approach.

In summary, this work demonstrates that: 1) existing generative SBDD models reliably identify high-potential chemical subspaces and thus narrow the search for high-quality candidates, and 2) though not always directly synthesizable, generated compounds tend to have synthesizable analogs identifiable via similarity search. The synthesizability issue therefore does not represent a significant obstacle to their effective application to drug discovery. Improving the capacity of these models to generate high-quality target-specific binders, regardless of synthesizability, could further increase the effectiveness of MGVS over traditional screening methods.

bioRxiv preprint · doi: https://doi.org/10.64898/2026.02.18.706722 · this version posted February 19, 2026. The copyright holder has placed this preprint (which was not certified by peer review) in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors.

Methods and Data

An overview of our MGVS pipeline for de novo molecular generation followed by synthesizable analog search is shown in Figure 1. Compounds are generated using three top SBDD models with a range of architectures: DrugHIVE [11], a density-based hierarchical variational autoencoder model; Pocket2Mol [12], a graph-based equivariant autoregressive model; and DiffSBDD [13], a graph-based equivariant diffusion model. For each model and target, the following five-step MGVS process was used:

1) The target pocket information is provided to the generative SBDD model and used to generate 1000 compounds using default settings.
2) Compounds are docked to the target pocket and scored using QuickVina2 [20].
3) Compounds are filtered to remove those with PAINS [21] patterns, poor drug-like properties, or irregular structures. The top-10 scoring compounds that remain are selected as queries for analog search.
4) For each of the top-10 generated query compounds, hierarchical graph edit distance (GED) similarity search is performed using SmallWorld [22] to identify the 100 most similar compounds from existing ultra-large libraries: Enamine REAL, WuXi GalaXi, and ZINC.
5) These compounds are docked to the target pocket and scored for predicted binding affinity using QuickVina2 [20]. The best compounds, representing synthesizable analogs, are identified for each generated query based on docking score.

Figure 1: Overview of de novo generation and synthesizable analog search pipeline. 1) Target pocket information is input into the generative SBDD model and used to generate 1000 compounds. 2) Generated compounds are docked to the target pocket and scored for predicted binding affinity. 3) Compounds are filtered based on chemical structure and properties, and the top-10 scoring compounds are selected for analog search. 4) For each generated query compound, efficient hierarchical graph edit distance (GED) similarity search is performed to identify the 100 most similar compounds from existing ultra-large libraries (Enamine REAL, ZINC, and WuXi GalaXi). 5) The top-100 most similar search-hits per query are docked to the target pocket and scored for predicted binding affinity. The best synthesizable analogs for each generated query are selected based on docking.

Compound Filtering

To avoid inaccurate docking results, we filter out molecules with irregular configurations that lead to strained geometries, using a procedure similar to that of ref. [11]. We remove molecules with consecutive double bonds, fused ring systems of more than four members or that form loops, and rings other than five- and six-membered rings. For consistency, we apply this filtering to all compound sets.

Synthesizable Analog Search

We perform GED search using SmallWorld [22], which is the default tool used for similarity search in the ZINC database [3]. SmallWorld pre-indexes each library's topological space as anonymous graphs, which can then be used for rapid (sublinear) search of compounds with matching or similar topology. With the topological relationship between two compounds known, the GED calculation is greatly simplified. SmallWorld also calculates the Daylight fingerprint and ECFP4 fingerprint Tanimoto distances.
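The five-step process described in this section can be summarized as a short orchestration sketch. This is only an illustration of the control flow, not the authors' implementation: `generate`, `dock`, `passes_filters`, and `search` are hypothetical callables standing in for the SBDD model, QuickVina2, the PAINS/drug-likeness/structure filters, and SmallWorld, respectively. Ranking queries by raw docking score is also a simplifying assumption.

```python
from typing import Callable, List, Tuple


def mgvs(
    generate: Callable[[int], List[str]],      # hypothetical: SBDD model sampler
    dock: Callable[[str], float],              # hypothetical: docking score, lower is better
    passes_filters: Callable[[str], bool],     # hypothetical: PAINS / property / structure filter
    search: Callable[[str, int], List[str]],   # hypothetical: similarity search over libraries
    n_generate: int = 1000,
    n_queries: int = 10,
    n_hits: int = 100,
) -> List[Tuple[str, str, float]]:
    """Return (query, best_analog, analog_score) for each selected query."""
    # Steps 1-2: generate compounds and dock each one.
    generated = generate(n_generate)
    scored = [(dock(smiles), smiles) for smiles in generated]

    # Step 3: drop filtered compounds, keep the best-scoring ones as queries.
    kept = sorted((score, smiles) for score, smiles in scored if passes_filters(smiles))
    queries = [smiles for _, smiles in kept[:n_queries]]

    results = []
    for query in queries:
        # Step 4: retrieve the most similar library compounds for this query.
        hits = search(query, n_hits)
        if not hits:
            continue
        # Step 5: dock the hits and keep the best-scoring analog.
        best_score, best_hit = min((dock(hit), hit) for hit in hits)
        results.append((query, best_hit, best_score))
    return results
```

With the default arguments this docks 1000 generated compounds plus up to 10 × 100 search-hits per target, which is the 2k-compound budget quoted in the text.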
Conformer Generation

Starting from the SMILES string representation of a molecule, we generate five conformers (3D coordinates) using the ETKDG algorithm in RDKit [23]. To reduce docking of redundant structures, we perform Butina clustering on the five conformers using RMSD as the distance metric, keeping only the centroid of each cluster. We then dock each remaining conformer to the corresponding target and keep the best-scoring conformer as a representative.

Ligand Docking

Docking compounds to target pockets is performed with QuickVina2 [20] (with default settings and a 20 Å wide bounding box), a computationally efficient version of AutoDock Vina [24]. Protein structures are prepared for docking using AutoDockTools [25]. Prior to docking, we perform force field optimization on all molecular structures using the MMFF94 force field [26] in RDKit [23]. This step is especially important for de novo generated structures, which are likely to have unrelaxed conformations. Docking unrelaxed structures could lead to significantly exaggerated docking scores.

Evaluation

For evaluation, we chose 30 protein targets from the PDBbind general set. Our selection process includes: 1) clustering all protein sequences by 30% sequence identity, 2) randomly sampling 30 clusters, and 3) choosing one random target per cluster. We generate 1000 molecules for each of the receptors in the test set using each of the three SBDD models. As a reference set for random virtual screening, we randomly sampled 50k compounds from the ZINC20 drug-like subset. We perform force field optimization on each of the molecules using MMFF94 in RDKit [23] before docking to the corresponding receptor. Vina score, however, is strongly correlated with molecule size [11], which prevents direct comparison between sets of compounds with different size distributions. For this reason, we primarily report Vina efficiency (Vina score divided by the number of heavy atoms), as it corrects for this size bias.
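To make the size correction concrete, the following minimal sketch computes Vina efficiency as defined above and ranks compounds by it. The compound names, scores, and atom counts are made-up illustrative values, not data from this work.

```python
from typing import List, Tuple

# (name, Vina score in kcal/mol, number of heavy atoms)
Compound = Tuple[str, float, int]


def vina_efficiency(vina_score: float, n_heavy_atoms: int) -> float:
    """Vina score divided by the number of heavy atoms (more negative is better)."""
    if n_heavy_atoms <= 0:
        raise ValueError("n_heavy_atoms must be positive")
    return vina_score / n_heavy_atoms


def top_k_by_efficiency(compounds: List[Compound], k: int = 10) -> List[Compound]:
    """Return the k compounds with the best (lowest) Vina efficiency."""
    return sorted(compounds, key=lambda c: vina_efficiency(c[1], c[2]))[:k]


# Illustrative values only: the larger molecule wins on raw score,
# but the smaller one wins once the score is normalized by size.
examples = [("large", -9.0, 36), ("small", -7.0, 20)]
best = top_k_by_efficiency(examples, k=1)[0]
```

Here `best` is the "small" compound (efficiency -0.35 versus -0.25), even though "large" has the better raw Vina score.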
Table 1: Protein-ligand interaction cutoff criteria. For each interaction type, we list the PLIP [27] non-default cutoff criteria that we use for identifying intermolecular interactions. All table values are distance cutoffs, except for hydrogen bonds. For hydrogen bonds, we choose a stricter distance and D-H-A angle cutoff. In the case of π interactions, the table values refer to center–center or center–cation distance.

Interaction type           Cutoff
H-bond                     3.5 Å, 120°
Hydrophobic                3.8 Å
Salt bridge                4.0 Å
π-stack (parallel)         4.0 Å
π-stack (perpendicular)    5.5 Å
π-cation                   4.0 Å

Evaluation Properties and Metrics

Binding affinity is estimated for each ligand with the Vina docking score calculated with QuickVina2 [20], a speed-optimized version of AutoDock Vina [24]. We define ΔVina as the difference between the Vina score of a generated ligand and a reference ligand (e.g., query ligand or ligand from crystal structure). Vina efficiency score (or ligand efficiency) is calculated as the Vina score divided by the number of heavy atoms. Graph Edit Distance (GED) is the number of edits that must be made to the chemical structure of a compound to make it identical to a reference compound, and in this work is calculated using SmallWorld [22]. Daylight distance is calculated as one minus the Tanimoto similarity between the Daylight fingerprints of a pair of molecules. ECFP4 distance is calculated as one minus the Tanimoto similarity between the ECFP4 fingerprints of a pair of molecules. The Quantitative Estimate of Drug-Likeness (QED) score estimates the drug-likeness of a molecule by combining a set of molecular properties [28]. The Synthetic Accessibility (SA) score estimates the ease of synthesis of a molecule [29].
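The fingerprint distances and ΔVina defined above reduce to a few lines of arithmetic. The sketch below assumes a fingerprint is represented as the set of its on-bit indices, which is an illustrative simplification; in practice the Daylight and ECFP4 fingerprints would come from SmallWorld or a cheminformatics toolkit. Treating two empty fingerprints as identical is likewise an assumption.

```python
from typing import Iterable


def tanimoto_similarity(fp_a: Iterable[int], fp_b: Iterable[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for fingerprints given as on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / len(union)


def tanimoto_distance(fp_a: Iterable[int], fp_b: Iterable[int]) -> float:
    """Distance used for the Daylight and ECFP4 metrics: one minus the similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


def delta_vina(vina_ligand: float, vina_reference: float) -> float:
    """Delta Vina = Vina(ligand) - Vina(reference); negative means the ligand scores better."""
    return vina_ligand - vina_reference
```

For example, two fingerprints sharing 2 of 6 distinct on-bits have a similarity of 1/3 and therefore a distance of 2/3.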

Results

Generated compounds tend to have synthesizable analogs

We start by generating approximately one thousand molecules for each of our test target pockets using three SBDD generative models: DrugHIVE [11], DiffSBDD [13], and Pocket2Mol [12]. We then dock each generated molecule into the corresponding target pocket using QuickVina2 [20]. To select our final generated set, we filter out molecules with PAINS patterns, unusual structures that could cause inaccurate docking results (see Compound Filtering), or properties outside of the typical drug-like range [30]. We then rank each compound by normalized Vina score and take the 10 best compounds for each model and target. The properties of all generated molecules compared to the final set can be seen in Figure S1.

For each compound in the final generated query set, we perform a similarity search over existing commercially available and readily synthesizable compound libraries. For each query, we use the SmallWorld [22] API to search for up to 1000 closest compounds by GED, within a maximum GED of 12, in each of the ZINC [3], Enamine REAL [2], and WuXi GalaXi [3] libraries. We then rank the returned compounds by GED and then by Daylight distance, selecting the top 100 as our final search-hit set. We then generate 3D conformers and dock each final search-hit compound to the corresponding protein pocket.

The resulting top search-hit compounds are highly synthesizable and even tend to have improved predicted binding scores compared to the corresponding generated query compounds. As expected, we find a drastic improvement in predicted synthesizability of the top search-hit compounds when compared to the generated query compounds (Figure 2a). We also find an average improvement in predicted binding affinity compared to the query compounds (Figure 2b), with virtually all (98.7%) search-hit compounds within the Vina estimated margin of error of ±1.5 kcal/mol (Figure 2c).
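The search-hit selection rule described above (maximum GED of 12, rank by GED and then by Daylight distance, keep the top 100) can be sketched as follows. The `SearchHit` record and its field names are hypothetical and do not reflect the SmallWorld API's actual response format; the sketch also assumes results from the three libraries are pooled before ranking, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical record for one compound returned by a similarity search."""
    smiles: str
    library: str              # e.g. "ZINC", "Enamine REAL", "WuXi GalaXi"
    ged: int                  # graph edit distance to the query
    daylight_distance: float  # 1 - Tanimoto similarity of Daylight fingerprints


def select_search_hits(
    hits: Iterable[SearchHit], max_ged: int = 12, top_n: int = 100
) -> List[SearchHit]:
    """Keep hits within max_ged, ranked by GED and then by Daylight distance."""
    eligible = [h for h in hits if h.ged <= max_ged]
    ranked = sorted(eligible, key=lambda h: (h.ged, h.daylight_distance))
    return ranked[:top_n]
```

Using a tuple sort key makes Daylight distance a pure tie-breaker: it only matters among hits with equal GED.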
We also see that virtually all of the top search-hit compounds have equivalent or better predicted binding affinity compared to the corresponding PDB co-crystal ligand (Figure 2d), with a large number (38.8%) achieving significantly better scores (ΔVina < -1.5 kcal/mol). For all three models tested, we see a similar distribution shift toward improved predicted binding efficiency in the search-hit compounds, with an average improvement of the median Vina efficiency (Figure 2e). In Figure 2f, we show that GED is positively correlated with ΔVina, i.e., larger edit distances are associated with worse binding affinity scores (Spearman ρ=0.44). We observe similar, though weaker, correlations for ECFP4 distance (ρ=0.31) and Daylight distance (ρ=0.14). Since lower-similarity search-hits from these databases tend to have worse binding scores than the query, this suggests that the generated query compounds indeed represent good templates for a successful predicted binder. Further, GED appears to be a better predictor of binding score than the two fingerprint-based similarity measures. This is likely due to GED being a better measure of topological and structural similarity, which are key determinants of shape complementarity and interaction geometry within the binding pocket.

Figure 2: Generated compounds vs. search-hit compounds. (a) Histogram showing distribution of Synthetic Accessibility (SA) score for generated query and search-hit compounds. (b) Histogram of ΔVina* scores for top-1 search-hit compounds with mean (black dashed line) and Vina uncertainty (shaded). (c) Vina scores for top-1 search-hit compounds versus the corresponding query compound for each target (colors) with Vina uncertainty (shaded). (d) Vina scores for top-1 search-hit compounds versus the corresponding crystal ligand for each target (colors) with Vina uncertainty (shaded). (e) Distributions of Vina efficiency for queries (dark) and top-1 search-hits (light) for each model (colors). (f) Distribution of ΔVina scores for all (top-100) search-hits with respect to graph edit distance (GED), Daylight distance, and ECFP4 distance. Linear regression fit (grey line) shown along with computed Spearman (ρ) correlation coefficient. *ΔVina = Vina(search-hit) – Vina(query)

Generative SBDD paired with analog search outperforms random screening

To be practically useful, a generative approach combined with similarity search needs to outperform random virtual screening. To assess whether this is the case, we compare our generative approach with screening of random compounds from the ZINC [31] database. We randomly sampled 50k compounds from the ZINC drug-like compound subset and docked all compounds into each of our test pockets. In Figure 3a-b, we compare the Vina efficiency of the top-10 compounds from the generated queries, search-hits, and random ZINC subsets of different sizes (ranging from 1k to 50k). The query compound distributions, which were selected from 1k generated compounds for each target, show significantly better Vina efficiency (Welch's t-test p<0.05) than screening 10k (Pocket2Mol), 30k (DrugHIVE), and 50k (DiffSBDD) ZINC compounds. The search-hit compound distributions for all models show significant improvement (Welch's t-test p<0.05) compared to their respective query compound distributions. Across all models, the top-10 similarity search-hits score better than those from screening up to 50k random ZINC compounds, despite only screening 2k compounds (including queries) per target, representing a 25x improvement in screening efficiency.

These results represent selection from a docked candidate pool of the 100 most similar search-hits per query compound. However, docking and ranking fewer search-hits could lower the computational cost while still achieving favorable results. To assess this, we re-ran selection using search-hit candidate pools (docked search-hits) of different sizes, ranging between 1 and 100. In Figure 3c, we show that the resulting ΔVina, averaged across all models and targets, ranges from -0.1 to 0.2, with parity (ΔVina=0) at approximately 40 search-hits docked per query. Surprisingly, even when using only a single search-hit, the upper limit of the ΔVina 95% confidence interval remains at a modest +0.3 kcal/mol, well within the estimated ±1.5 kcal/mol Vina uncertainty [24]. This demonstrates that synthesizable analogs with comparable docking scores can be reliably identified for generated compounds even with relatively few search-hits screened per query.

Figure 3: Comparison with random ZINC compounds. (a) Median Vina efficiency of the top-10 compounds for different size subsets of the random ZINC set relative to query (dashed) and search-hit (solid) compounds for each model (color). Calculations for ZINC subsets are repeated a total of n=20 times, with the average median and std. dev. shown. (b) Boxplots show the distributions of top-10 Vina efficiency for generated query compounds, search-hits, and ZINC (random) subsets of varying size. For each query or search-hit distribution, the largest ZINC subset distribution with a significant Welch's t-test result is indicated. Significance denoted as: p < 0.05 (∗), p < 0.01 (∗∗), p < 0.001 (∗∗∗), p < 0.0001 (∗∗∗∗). (c) Average ΔVina between top-10 scoring search-hit compound and query compound vs. number of total ranked search-hits that were docked to the target pocket, with 95% confidence interval (shaded).

Shared intermolecular interactions between predicted binding poses

Our results so far have shown that using similarity search to identify synthesizable analogs of de novo generated compounds reliably results in highly similar compounds with strong predicted binding affinity. However, this does not guarantee that the search-hit compounds have a similar predicted binding pose within the target pocket. To assess this, we compared the docked intermolecular interactions made by the top predicted binding pose of the search-hit compounds to those made by the corresponding generated query compound.
Interactions were identified using the Protein-Ligand Interaction Profiler (PLIP) [27] on the top-ranking docked pose from QuickVina2 [20] for each compound. We consider matching interactions between query compound and search-hit compound at two levels of specificity: 1) same target residue and 2) same specific target atom. In Table 2, we show a summary of the proportion of search-hits having shared interactions with the generated query compound for each interaction type (H-bond, hydrophobic, salt bridge, π-cation, and π-stacking) across all models and targets. We observe that a majority of search-hits share at least one interaction with the generated query across all interaction types. We also see that search-hits are less likely to share the more frequent H-bond and hydrophobic interactions than the less frequent ones, except for π-stacking, which is both frequent and highly conserved.

Table 2: Shared intermolecular interactions by interaction type. Proportion of search-hits having shared interactions with the generated query compound across all models and targets. Statistics are reported for search-hits that share all or at least 1 of the query interactions, as well as for different positive match types: matching interactions share the same 1) target atom or 2) target residue. For each interaction type, the total number (n) of search-hit compounds with a query compound predicted to make the corresponding interaction with the target is reported.

# matched    Match type   H-bond      Hydrophobic   Salt bridge   π-cation   π-stacking
                          (n=36404)   (n=62582)     (n=1332)      (n=474)    (n=29830)
all          atom         0.17        0.12          0.58          0.79       0.57
all          residue      0.27        0.30          0.58          0.79       0.83
at least 1   atom         0.65        0.75          0.73          0.79       0.90
at least 1   residue      0.69        0.86          0.73          0.79       0.92

In Figure 4, we show the proportion of docked search-hit compounds that share at least one or all specific query compound interactions for each generative model. Here we choose to focus on specific non-hydrophobic interactions (hydrogen bonds, π interactions, and salt bridges), because we are primarily interested in assessing how often search-hits exhibit the same type of target specificity as the query. The same statistics for all interactions can be found in Figure S2. In Figure S3, we show the distribution of intermolecular interactions made by all generated query compounds. The average number of intermolecular interactions made by query compounds is 2.5 (specific) and 5.2 (all). Across all query compounds, models, and targets, an average of 19.5% (atom match) to 31.4% (residue match) of search-hits (top-100) share all specific interactions with the query and 76.7% (atom match) to 79.5% (residue match) share at least one (Figure 4a). For top-1 search-hits (Figure 4b), we see that a majority of queries have at least one search-hit sharing all specific interactions (50.9% and 80.7% for atom and residue match, respectively) and nearly all queries have at least one search-hit that shares an interaction (99.2% for both atom and residue match). In Figure 4c, we show that there is some variation in the proportion of queries with search-hit compounds matching all interactions across the different protein targets. However, even for the lowest-performing target (PDB ID 5G1P), 10.5% (atom match) to 20.0% (residue match) of query compounds have an exact interaction analog. For all targets, nearly every query compound has a search-hit with at least one interaction.
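The two match levels and the "all" versus "at least 1" criteria used above can be illustrated with simple set operations. In this sketch, an interaction is a hypothetical `(kind, residue, atom)` record; the field names and string labels are assumptions for illustration and are not PLIP's output format.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Interaction(NamedTuple):
    kind: str     # e.g. "hbond", "hydrophobic", "saltbridge", "pication", "pistack"
    residue: str  # target residue, e.g. "ASP189"
    atom: str     # target atom (or ring/group label), e.g. "OD1"


def interaction_keys(
    interactions: Iterable[Interaction],
    level: str,
    exclude_kinds: Tuple[str, ...] = (),
) -> Set[tuple]:
    """Reduce interactions to comparable keys at 'atom' or 'residue' level."""
    if level not in ("atom", "residue"):
        raise ValueError("level must be 'atom' or 'residue'")
    keys = set()
    for i in interactions:
        if i.kind in exclude_kinds:
            continue
        keys.add((i.kind, i.residue, i.atom) if level == "atom" else (i.kind, i.residue))
    return keys


def shared_interactions(
    query: Iterable[Interaction],
    hit: Iterable[Interaction],
    level: str = "residue",
    exclude_kinds: Tuple[str, ...] = ("hydrophobic",),  # "specific" interactions only
) -> Dict[str, object]:
    """Compare a search-hit's interactions with the query's at the given level."""
    q = interaction_keys(query, level, exclude_kinds)
    h = interaction_keys(hit, level, exclude_kinds)
    shared = q & h
    return {
        "n_shared": len(shared),
        "at_least_one": bool(shared),
        "all": bool(q) and q <= h,  # every query interaction is reproduced by the hit
    }
```

A hit that hydrogen-bonds to the same residue through a different atom counts as a residue match but not an atom match, which is why the residue-level proportions reported above are always at least as high as the atom-level ones.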
To give a better sense of what the docked poses of interaction analogs look like, in Figure 4d we show a selection of docked poses with highlighted intermolecular interactions for query and search-hit pairs with all interactions shared (atom match) for four of the targets. We report the Vina scores of each compound, the number of shared interactions, the database of the search-hit, and the molecular distance metrics. We observe that search-hits with many shared query contacts also tend to have similar predicted binding poses and a relatively low GED. Interestingly, fingerprint-based distance metrics do not always agree with GED (or with each other). For example, the pair of compounds displayed for target PDB ID 3JUZ have clear structural similarities and a relatively low GED (3) but high Daylight distance and ECFP4 distance (0.74 for both). For the pair of compounds displayed for target PDB ID 6I18, GED (3) and Daylight distance (0.29) are low while ECFP4 distance (0.67) is high. To quantify this, we calculated the Spearman correlation between each of the molecular distance metrics and the number of shared interactions (atom match). We found that GED has the highest correlation, especially as the number of protein–ligand interactions made by the query compound increases (Figure S4). These results are consistent with the earlier findings (Figure 2f) that GED is a better predictor of binding affinity score than the fingerprint-based similarity metrics.

Figure 4: Shared intermolecular interactions between generated queries and search-hits. (a-c) Bar plots showing proportion of search-hits that share all or at least 1 of query specific protein–ligand interactions. Proportions are shown for both exact residue match (residue) and exact atom match (atom) criteria. (a) Proportion of top-100 search-hits for each query with matching specific interactions for each model. (b) Proportion of top-1 search-hits for each query with matching specific interactions for each model. (c) Proportion of top-1 search-hits for each query with matching specific interactions by protein target across all models. (d) Docked poses of search-hit (yellow carbon) and query (cyan carbon) compound pairs with protein-ligand interactions shown. Vina docking score, number of shared interactions, search-hit database, and chemical distance metrics (GED, ECFP4 distance, Daylight distance) are shown for each pair.
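For readers unfamiliar with the statistic, the Spearman correlation used in this section is the Pearson correlation of the ranks, with tied values sharing their average rank. The following standard-library sketch shows the computation; the GED and shared-interaction values at the end are invented for illustration and are not data from this work.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values assigned the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only: GED of five search-hits and the number of
# query interactions each one shares (atom match).
ged = [1, 2, 3, 5, 8]
n_shared = [4, 4, 3, 2, 0]
rho = spearman(ged, n_shared)  # negative: larger GED, fewer shared interactions
```

Because only ranks enter the calculation, the statistic is insensitive to the different scales of GED (integer edit counts) and the fingerprint distances (values between 0 and 1), which makes it a reasonable choice for comparing the three metrics.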

Discussion

In this work, we have shown that current generative structure-based drug design (SBDD) models paired with similarity-based search can be used to discover synthesizable compounds with high predicted binding affinity for a wide range of protein targets. Generative models tend to produce compounds with low synthetic accessibility scores [14–16], which is a key limitation to practical drug discovery [4]. To address this, we show that model-guided virtual screening (MGVS), which uses generated compounds to guide the search for synthesizable analogs, effectively overcomes this limitation. We apply MGVS across 30 diverse protein targets and observe that, for three separate SBDD models (DiffSBDD [13], DrugHIVE [11], and Pocket2Mol [12]), screening only 2k compounds leads to better-quality candidates than standard VLS screening of 50k ZINC compounds, representing a 25x improvement in efficiency.

Although current generative models tend to produce compounds with low synthesizability, they are effective at identifying compounds located in promising regions of chemical space. Once these regions have been identified, similarity search can be used to find adjacent synthesizable analogs. Across all models and targets, we find that virtually all (98.7%) generated compounds selected for analog search yielded a search-hit with equivalent or better docking scores. We also found that search-hits commonly have similar docking poses to the query compound, indicating that the compounds produced by generative models help identify promising potential binding modes. We observe that ~50% of query compounds yielded a search-hit with a docking pose sharing all specific (non-hydrophobic) residue-level protein-ligand interactions, and virtually all (~99%) yielded a search-hit sharing at least one interaction (Figure 4b).
Assuming that the number of shared intermolecular interactions is an indicator of docking pose similarity (Figure 4d), this demonstrates consistent success in identifying predicted binding analogs. Altogether, these results demonstrate that current generative models, despite their limitations, can already be used to improve virtual screening efficiency through MGVS. We show that identifying compounds with strong predicted affinity for a protein target, regardless of synthesizability, is useful for guiding the screening of high-potential subsets of existing ultra-large chemical databases. To be successful, this approach relies on search methods efficient enough to operate on these large and growing chemical spaces1, which could be viewed as a limitation. However, improving chemical search is an active area of research32,33 that should enable efficient querying of expanded chemical databases. As chemical databases grow and search methods improve, we would expect MGVS to become even more effective due to improved retrieval. In contrast, the expansion of chemical spaces poses a real challenge to existing VLS methods that rely on exhaustive screening9. Even for the most efficient synthon-based VLS approaches used to screen current ultra-large libraries, scaling to chemical spaces orders of magnitude larger will require innovations that surpass what current methods can support.
There are efforts to overcome the generative SBDD synthesizability issue by designing models restricted to synthesizable subspaces17–19. However, we observe that although the current (unrestricted) models that we tested are able to generate high-quality binding candidates, the set of synthesizable analogs identified from these initial queries often contains compounds with even better predicted binding. This suggests that current models cannot consistently generate optimal compounds within these localized regions of chemical space, and that restricting them further could lead to even worse performance. Efforts to improve the ability of models to generate ideal binders, regardless of synthesizability, seem to be warranted. Applying current models in MGVS already achieves a ~25x improvement in screening efficiency over standard virtual screening, so further improvement to models that can be utilized in this paradigm could have a significant impact on early drug discovery.
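The two headline quantities in this discussion, the screening-efficiency ratio and the fraction of queries that yield an equal-or-better search-hit, reduce to simple arithmetic. The sketch below is illustrative only: the function names and the toy docking scores are assumptions, and it assumes the Vina-style convention that lower (more negative) docking scores indicate stronger predicted binding.

```python
def efficiency_ratio(baseline_screened, guided_screened):
    """How many times fewer compounds the guided screen had to evaluate."""
    return baseline_screened / guided_screened


def query_success_rate(results):
    """Fraction of queries whose best search-hit scores at least as well.

    `results` is a list of (query_score, [hit_scores]); lower is better
    (assumed Vina-style convention).
    """
    successes = sum(1 for q, hits in results if hits and min(hits) <= q)
    return successes / len(results)


# Numbers quoted in the text: 50k compounds for standard VLS vs. 2k for MGVS.
ratio = efficiency_ratio(50_000, 2_000)

# Hypothetical toy docking scores in kcal/mol (NOT from the paper).
results = [
    (-8.1, [-7.9, -8.4]),  # a hit beats the query
    (-9.0, [-8.2, -8.8]),  # no hit matches the query
    (-7.5, [-7.5]),        # a hit ties the query, counted as success
]
rate = query_success_rate(results)

print(f"efficiency ratio: {ratio:.0f}x, query success rate: {rate:.2f}")
```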

ASSOCIATED CONTENT

Supporting Information

Property distributions of generated compounds; shared intermolecular interactions between search queries and hits for all interaction types (including hydrophobic); distributions of number of protein-ligand interactions for docked poses across all generated query compounds; correlation of molecular similarity metrics with number of shared interactions between query and search-hit compounds (PDF)

AUTHOR INFORMATION

Corresponding Authors

Remo Rohs - Department of Quantitative and Computational Biology, Department of Chemistry, Department of Physics & Astronomy, Thomas Lord Department of Computer Science, Division of Medical Oncology in the Department of Medicine, Alfred E. Mann Department of Biomedical Engineering, University of Southern California, Los Angeles, CA 90089, USA; https://orcid.org/0000-0003-1752-1884; Email: [email protected]

Authors

Jesse A. Weller - Department of Quantitative and Computational Biology, Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA; https://orcid.org/0000-0001-7184-4241
Jinsen Li - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA; https://orcid.org/0000-0002-1015-5263
Yibei Jiang - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA; https://orcid.org/0009-0002-9785-3343

Author Contributions

Project conception (J.A.W., R.R.), lead project design (J.A.W.), supporting project design (J.L., Y.J., R.R.), data generation and analysis (J.A.W.), manuscript writing (J.A.W.), manuscript edits (J.A.W., J.L., Y.J., R.R.), project supervision and funding (R.R.).

Funding Sources

This work was supported by the National Institutes of Health [grant R35GM130376 to R.R.] and a University of Southern California Office of Research and Innovation SBIR/STTR Planning Award [to R.R.].

Acknowledgement

The authors acknowledge helpful discussions with other members of the Rohs lab.

ABBREVIATIONS

ALogP, octanol-water partition coefficient (Crippen); HBA, hydrogen-bond acceptors; HBD, hydrogen-bond donors; HTS, high-throughput screening; MGVS, model-guided virtual screening; PAINS, Pan-Assay Interference Compounds; PDB, Protein Data Bank; QED, Quantitative Estimate of Drug-likeness; RMSD, root mean square distance; SA, Synthetic Accessibility; SAR, structure-activity relationship; SBDD, structure-based drug design; SMILES, Simplified Molecular Input Line Entry System; TPSA, topological polar surface area; VAE, variational autoencoder; VLS, virtual ligand screening.

DATA AND SOFTWARE AVAILABILITY

Data used in evaluation, including molecules, molecular properties, and docking scores, are available in supplementary information.

References

1. Sadybekov, A. V. & Katritch, V. Computational approaches streamlining drug discovery. Nature 616, 673–685 (2023).
2. Grygorenko, O. O. et al. Generating Multibillion Chemical Space of Readily Accessible Screening Compounds. iScience 23, 101681 (2020).
3. Tingle, B. I. et al. ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).
4. Lam, J. H. & Katritch, V. Navigating structure-based drug discovery with emerging innovations in physics- and knowledge-based approaches. npj Drug Discov. 2, 29 (2025).
5. Bohacek, R. S., McMartin, C. & Guida, W. C. The art and practice of structure-based drug design: A molecular modeling perspective. Med. Res. Rev. 16, 3–50 (1996).
6. Neumann, A., Marrison, L. & Klein, R. Relevance of the Trillion-Sized Chemical Space “eXplore” as a Source for Drug Discovery. ACS Med. Chem. Lett. 14, 466–472 (2023).
7. Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 1–8 (2021) doi:10.1038/s41586-021-04220-9.
8. Beroza, P. et al. Chemical space docking enables large-scale structure-based virtual screening to discover ROCK1 kinase inhibitors. Nat. Commun. 13, 6447 (2022).
9. Klarich, K., Goldman, B., Kramer, T., Riley, P. & Walters, W. P. Thompson Sampling─An Efficient Method for Searching Ultralarge Synthesis on Demand Databases. J. Chem. Inf. Model. 64, 1158–1171 (2024).
10. Cavasotto, C. N. & Di Filippo, J. I. The Impact of Supervised Learning Methods in Ultralarge High-Throughput Docking. J. Chem. Inf. Model. 63, 2267–2280 (2023).
11. Weller, J. A. & Rohs, R. Structure-Based Drug Design with a Deep Hierarchical Generative Model. J. Chem. Inf. Model. 64, 6450–6463 (2024).
12. Peng, X. et al. Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets. in ICML 2022 17644–17655 (PMLR, 2022).
13. Schneuing, A. et al. Structure-based Drug Design with Equivariant Diffusion Models. Preprint at https://doi.org/10.48550/arXiv.2210.13695 (2023).
14. Gao, W. & Coley, C. W. The Synthesizability of Molecules Proposed by Generative Models. J. Chem. Inf. Model. 60, 5714–5723 (2020).
15. Gao, W., Luo, S. & Coley, C. W. Generative Artificial Intelligence for Navigating Synthesizable Chemical Space. Preprint at https://doi.org/10.48550/arXiv.2410.03494 (2024).
16. Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 (2024).
17. Cretu, M. et al. SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints. Preprint at https://doi.org/10.48550/arXiv.2405.01155 (2025).
18. Koziarski, M. et al. RGFN: Synthesizable Molecular Generation Using GFlowNets. Preprint at https://doi.org/10.48550/arXiv.2406.08506 (2024).
19. Seo, S. et al. Generative Flows on Synthetic Pathway for Drug Design. Preprint at https://doi.org/10.48550/arXiv.2410.04542 (2025).
20. Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
21. Baell, J. B. & Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 53, 2719–2740 (2010).
22. SmallWorld. NextMove Software. https://www.nextmovesoftware.com/smallworld.html (2025).
23. RDKit: Open-source cheminformatics. https://www.rdkit.org.
24. Trott, O. & Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
25. Morris, G. M. et al. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791 (2009).
26. Halgren, T. A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 17, 490–519 (1996).
27. Adasme, M. F. et al. PLIP 2021: expanding the scope of the protein–ligand interaction profiler to DNA and RNA. NAR 49, W530–W534 (2021).
28. Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
29. Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).
30. Lipinski, C. A. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337–341 (2004).
31. ZINC20—A Free Ultralarge-Scale Chemical Database for Ligand Discovery. Journal of Chemical Information and Modeling http://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00675.
32. Warr, W. A., Nicklaus, M. C., Nicolaou, C. A. & Rarey, M. Exploration of Ultralarge Compound Collections for Drug Discovery. J. Chem. Inf. Model. 62, 2021–2034 (2022).
33. Bellmann, L., Penner, P., Gastreich, M. & Rarey, M. Comparison of Combinatorial Fragment Spaces and Its Application to Ultralarge Make-on-Demand Compound Catalogs. J. Chem. Inf. Model. 62, 553–566 (2022).


License: Public-Domain