{"paper_id":"187ea3cd-ea61-40a5-a9be-9e63db058ca2","body_text":"Circumventing the synthesizability problem in generative molecular design \nJesse A. Weller1,2, Jinsen Li1, Yibei Jiang1,†, and Remo Rohs1,2,3,4,5,6,* \n1Department of Quantitative and Computational Biology, University of Southern California, Los \nAngeles, CA 90089, USA \n2Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, \nUSA \n3Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA \n4Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA \n90089, USA \n5Division of Medical Oncology, Department of Medicine, University of Southern California, Los \nAngeles, CA 90033, USA \n6Alfred E. Mann Department of Biomedical Engineering, University of Southern California, Los \nAngeles, CA 90089, USA \n†Present address: Illumina, Inc., 5200 Illumina Way, San Diego, CA 92122, USA \n \n*Correspondence: Tel: +1 213 740 0552; Email: rohs@usc.edu \n \nbioRxiv preprint doi: https://doi.org/10.64898/2026.02.18.706722; this version posted February 19, 2026. The copyright holder has placed this \npreprint (which was not certified by peer review) in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, \nremix, or adapt this material for any purpose without crediting the original authors. \n \nABSTRACT \n \nGenerative structure-based drug design (SBDD) models have shown great promise to accelerate our \nability to discover novel drug candidates. However, these models have been criticized for producing \ncompounds that are not very synthesizable, and therefore not practically applicable to drug design. 
In \nthis work, we propose a way to circumvent the synthesizability issue by introducing a model-guided \nvirtual screening (MGVS) pipeline which pairs SBDD models with efficient chemical similarity search \nmethods to identify synthesizable analogs of generated compounds in existing ultra-large compound \ndatabases. Using this approach, we demonstrate that synthesizable analogs of generated compounds \nwith equivalent or better docking scores and similar predicted binding poses can be reliably identified \nacross a wide range of protein targets. We find that MGVS outperforms standard virtual ligand screening \n(VLS), consistently yielding at least a 25x improvement in screening efficiency across three different \nSBDD models. As drug-like chemical spaces continue to grow and standard VLS methods focused on \nexhaustive screening become increasingly impractical, approaches like MGVS that effectively narrow \nthe search space will become critical for advancing drug discovery. \nINTRODUCTION \nGenerative deep learning methods for small-molecule drug design have captured widespread attention \nin both academia and industry. These approaches promise to accelerate early-stage drug discovery by \nidentifying novel chemical structures tailored to specific biological targets, potentially unlocking areas of \nchemical space unexplored by traditional methods. Despite their promise, these models face a critical \nchallenge that limits their practical utility: the synthesizability of generated compounds. This is a \nparticular issue in the early stages of the drug discovery process, where the ability to rapidly procure \nand validate potential hit compounds is essential, and custom synthesis of novel chemical structures at \nscale is impractically slow and expensive. \n \nFor this reason, traditional virtual ligand screening (VLS) approaches have focused on screening \ncommercially available libraries of compounds with known synthesis routes1–4. 
Historically, such libraries \nhave been limited to only a few million compounds, whereas drug-like chemical space is on the order \nof 10^60 compounds5. This limitation has been partially addressed with the development of virtual \nchemical spaces1, enumerated or reaction-defined collections of make-on-demand compounds that \ncan be synthesized reliably at scale. Public databases now include billions2 to trillions6 of compounds, \nconstructed by applying known reactions to commercially available building blocks. \n \nWhile these advances have dramatically expanded the accessible chemical search space, they have \nalso introduced the daunting computational challenge of screening chemical spaces of this scale. \nParallel advancements in VLS methods such as synthon-based screening7,8 have lowered the \ncomputational overhead by systematically docking chemical building blocks instead of every reachable \ncompound, thus helping to mitigate the combinatorial explosion. As virtual spaces continue to grow, \nhowever, such knowledge-free approaches that rely on exhaustively screening the chemical space will \nbecome increasingly impractical. In recognition of this challenge, recent efforts have turned toward \nadvanced sampling strategies, ranging from traditional heuristics to modern AI-based methods1,9,10, that \nnarrow the search space by screening a small fraction of the chemical space and using the most \npromising candidates to guide further search. 
However, these approaches remain constrained by their \ndependence on large-scale enumeration and iterative evaluation, which limits their efficiency and \nscalability as chemical spaces expand. \n \nGenerative deep learning models have recently emerged as a compelling alternative. These models \ncan learn distributions of chemical structures and protein–ligand interactions11–13 from large chemical \ndatasets, offering the potential to navigate chemical space without explicit enumeration11,13. When \nconditioned on structural information from a protein target, these models can generate compounds that \nfit within the protein pocket and exhibit favorable binding properties. Despite their promise, generative \nmethods have been criticized for producing compounds that are either synthetically inaccessible or \nchemically implausible14–16. In response, there has been a focus on restricting models to explicitly learn \ndistributions over synthesizable subspaces17–19, but these methods often sacrifice molecular diversity \nand controllability. In addition, the concept of a “synthesizable” compound \nis not well-defined ab initio in terms of chemical structure or properties but instead depends on the ever-\nchanging set of available chemical synthesis ingredients and techniques. Restricting models to such \narbitrary subspaces is not straightforward and could detract from the primary goal: generating true \ntarget-specific binders. As an alternative, we hypothesize that these synthesizability issues could be \ncircumvented by pairing generative models with established chemical similarity search techniques to \nidentify synthesizable analogs. 
In this way, rather than attempting to supplant the traditional discovery \npipeline by directly generating ideal, synthesizable drug candidates, generative methods may be more \neffectively used to steer discovery toward tractable, high-potential subspaces within which \nsynthesizable compounds can be identified. \n \nIn this work, we introduce a model-guided virtual screening (MGVS) pipeline that involves using \nstructure-based generative models to sample compounds conditioned on a target pocket, selecting the \ncompounds with the strongest predicted binding affinity, and retrieving similar compounds from existing chemical \ndatabases. We show that synthesizable compounds with strong predicted binding affinity and similar \ndocking poses can be reliably identified using this generate-then-retrieve approach, and that the quality \nof the discovered candidates represents at least a 25x improvement in screening efficiency over \nstandard VLS. These results are consistent across three state-of-the-art generative SBDD models with \ndiverse architectures: DrugHIVE11 (hierarchical VAE), Pocket2Mol12 (autoregressive), and DiffSBDD13 \n(diffusion), demonstrating the broad applicability of this approach. In summary, this work demonstrates that: \n1) existing generative SBDD models reliably identify high-potential chemical subspaces and thus narrow \nthe search for high-quality candidates and 2) though not always directly synthesizable, generated \ncompounds tend to have synthesizable analogs identifiable via similarity search. The synthesizability \nissue therefore does not represent a significant obstacle to their effective application to drug discovery. \nImproving the capacity of these models to generate high-quality target-specific binders, regardless of \nsynthesizability, could further increase the effectiveness of MGVS over traditional screening methods. 
\nMETHODS AND DATA \nAn overview of our MGVS pipeline for de novo molecular generation followed by synthesizable analog \nsearch is shown in Figure 1. Compounds are generated using three top SBDD models with a range of \narchitectures: DrugHIVE11, a density-based hierarchical variational autoencoder model; Pocket2Mol12, \na graph-based equivariant autoregressive model; and DiffSBDD13, a graph-based equivariant diffusion \nmodel. For each model and target, the following five-step MGVS process was used: 1) The target pocket \ninformation is provided to the generative SBDD model and used to generate 1000 compounds using \ndefault settings. 2) Compounds are then docked to the target pocket and scored using QuickVina220. 3) \nCompounds are filtered to remove those with PAINS21 patterns, poor drug-like properties, or irregular \nstructures. The top-10 scoring compounds that remain are selected as queries for analog search. 4) For \neach of the top-10 generated query compounds, hierarchical graph edit distance (GED) similarity search \nis performed using SmallWorld22 to identify the 100 most similar compounds from existing ultra-large \nlibraries: Enamine REAL, WuXi GalaXi, and ZINC. 5) These compounds are docked to the target pocket \nand scored for predicted binding affinity using QuickVina220. The best compounds, representing \nsynthesizable analogs, are identified for each generated query based on docking score. 
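The selection in steps 2 and 3 can be illustrated with a minimal Python sketch. It assumes each docked compound is a dictionary with hypothetical keys `vina` (docking score in kcal/mol, more negative is better), `heavy_atoms`, and `passes_filters`; none of these names come from the authors' code. It also assumes that ranking uses the score per heavy atom, which is only one plausible reading of the size-normalized score used for selection.

```python
# Sketch of query selection (pipeline steps 2-3).
# All field names are hypothetical and chosen for illustration only.

def vina_efficiency(vina_score, heavy_atoms):
    # Vina score divided by the number of heavy atoms; more negative is better.
    return vina_score / heavy_atoms

def select_queries(compounds, k=10):
    # Keep compounds that passed the structure and property filters,
    # then return the k compounds with the lowest (best) Vina efficiency.
    kept = [c for c in compounds if c['passes_filters']]
    kept.sort(key=lambda c: vina_efficiency(c['vina'], c['heavy_atoms']))
    return kept[:k]

docked = [
    {'id': 'a', 'vina': -9.0, 'heavy_atoms': 30, 'passes_filters': True},
    {'id': 'b', 'vina': -8.0, 'heavy_atoms': 20, 'passes_filters': True},
    {'id': 'c', 'vina': -12.0, 'heavy_atoms': 24, 'passes_filters': False},
]
top = select_queries(docked, k=2)
print([c['id'] for c in top])  # prints ['b', 'a']
```

Compound `c` has the best raw score but is dropped by the filter, and `b` outranks `a` once the score is normalized by size.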
\nCompound Filtering \nTo avoid inaccurate docking results, we filter out molecules with irregular configurations that lead to \nstrained geometries, similar to ref. 11. We remove molecules with consecutive double bonds, fused ring \nsystems of more than four members or that form loops, and rings other than five- and six-membered \nrings. For consistency, we apply this filtering to all compound sets. \nFigure 1: Overview of de novo generation and synthesizable analog search pipeline. 1) Target pocket \ninformation is input into the generative SBDD model and used to generate 1000 compounds. 2) Generated \ncompounds are docked to the target pocket and scored for predicted binding affinity. 3) Compounds are \nfiltered based on chemical structure and properties, and top-10 scoring compounds are selected for analog \nsearch. 4) For each generated query compound, efficient hierarchical graph edit distance (GED) similarity \nsearch is performed to identify the 100 most similar compounds from existing ultra-large libraries (Enamine \nREAL, ZINC, and WuXi GalaXi). 5) Top-100 most similar search-hits per query are docked to the target pocket and \nscored for predicted binding affinity. The best synthesizable analogs for each generated query are selected \nbased on docking. \nSynthesizable Analog Search \nWe perform GED search using SmallWorld22, which is the default tool used for similarity search in the \nZINC database3. 
SmallWorld uses pre-indexing of each library’s topological space as anonymous \ngraphs, which can then be used for rapid (sublinear) search of compounds with matching or similar \ntopology. With the topological relationship between two compounds known, the GED calculation is \ngreatly simplified. SmallWorld also calculates the Daylight fingerprint and ECFP4 fingerprint Tanimoto \ndistances. \nConformer Generation \nStarting from the SMILES string representation of a molecule, we generate five conformers (3D \ncoordinates) using the ETKDG algorithm in RDKit23. To reduce docking of redundant structures, we \nperform Butina clustering using RMSD as the distance metric on the five conformers, keeping only the \ncentroid for each cluster. We then dock each remaining conformer to the corresponding target and keep \nthe best scoring conformer as a representative. \nLigand Docking \nDocking compounds to target pockets is performed with QuickVina220 (with default settings and a 20 Å \nwide bounding box), a computationally efficient version of AutoDock Vina24. Protein structures are \nprepared for docking using AutoDockTools25. Prior to docking, we perform force field optimization on all \nmolecular structures using the MMFF94 force field26 in RDKit23. This step is especially important with \nde novo generated structures, which are likely to have unrelaxed conformations. Using unrelaxed \nstructures before docking could lead to significantly exaggerated docking scores. \nEvaluation \nFor evaluation, we chose 30 protein targets from the PDBbind general set. Our selection process \nincludes: 1) clustering all protein sequences by 30% sequence identity, 2) randomly sampling 30 \nclusters, and 3) choosing one random target per cluster. We generate 1000 molecules for each of the \nreceptors in the test set using each of the three SBDD models. As a reference set for random virtual \nscreening, we randomly sampled 50k compounds from the ZINC20 drug-like subset. 
We perform force \nfield optimization on each of the molecules using MMFF94 in RDKit23 before virtually docking to the \ncorresponding receptor. Vina score is strongly correlated with molecule size11, which prevents \ndirect comparison between sets of compounds with different size distributions. For this reason, we \nprimarily report Vina efficiency, or Vina score divided by number of heavy atoms, as it corrects for this \nsize bias. \n \nTable 1: Protein-ligand interaction cutoff criteria. For each interaction type, we list the non-default PLIP27 \ncutoff criteria that we use for identifying intermolecular interactions. All table values are distance cutoffs, \nexcept for hydrogen bonds. For hydrogen bonds, we choose a stricter distance and D-H-A angle cutoff. In \nthe case of π interactions, the table values refer to center–center or center–cation distance. \nH-bond | Hydrophobic | Salt bridge | π-stack (parallel) | π-stack (perpendicular) | π-cation \n3.5 Å, 120° | 3.8 Å | 4.0 Å | 4.0 Å | 5.5 Å | 4.0 Å \nEvaluation Properties and Metrics \nBinding affinity is estimated for each ligand with the Vina docking score calculated with QuickVina220, a \nspeed-optimized version of AutoDock Vina24. We define ΔVina as the difference between the Vina score \nof a generated ligand and a reference ligand (e.g., query ligand or ligand from crystal structure). Vina \nefficiency score (or ligand efficiency) is calculated as the Vina score divided by the number of heavy \natoms. 
Graph Edit Distance (GED) is the number of edits that must be made to the chemical structure \nof a compound to make it identical to a reference compound, and in this work is calculated using \nSmallWorld22. Daylight distance is calculated as one minus the Tanimoto similarity between the Daylight \nfingerprints of a pair of molecules. ECFP4 distance is calculated as one minus the Tanimoto similarity \nbetween the ECFP4 fingerprints of a pair of molecules. The Quantitative Estimate of Drug-Likeness \n(QED) score estimates the drug-likeness of a molecule by combining a set of molecular properties28. \nThe Synthetic Accessibility (SA) score estimates the ease of synthesis of a molecule29. \nRESULTS \nGenerated compounds tend to have synthesizable analogs \nWe start by generating approximately one thousand molecules for each of our test target pockets using \nthree SBDD generative models: DrugHIVE11, DiffSBDD13, and Pocket2Mol12. We then dock each \ngenerated molecule into the corresponding target pocket using QuickVina220. To select our final \ngenerated set, we filter out molecules with PAINS patterns, unusual structures that could cause \ninaccurate docking results (see Compound Filtering), or properties outside of the typical drug-like \nrange30. We then rank each compound by normalized Vina score and take the 10 best compounds for \neach model and target. The properties of all generated molecules compared to the final set can be seen \nin Figure S1. \n \nFor each compound in the final generated query set, we perform a similarity search over existing \ncommercially available and readily synthesizable compound libraries. For each generated query \ncompound, we use the SmallWorld22 API to search for up to 1000 closest compounds by GED within a \nmaximum GED of 12, for each of the ZINC3, Enamine REAL2, and WuXi GalaXi3 libraries. 
We then rank \nthe returned search compounds by GED and then Daylight distance, selecting the top-100 for our final \nsearch-hit set. We then generate 3D conformers and dock each final search-hit compound to the \ncorresponding protein pocket. \n \nThe resulting top search-hit compounds are highly synthesizable and even tend to have improved \npredicted binding scores compared to the corresponding generated query compounds. As expected, \nwe find a drastic improvement in predicted synthesizability of the top search-hit compounds when \ncompared to the generated query compounds (Figure 2a). We also find an average improvement in \npredicted binding affinity compared to the query compounds (Figure 2b), with virtually all (98.7%) \nsearch-hit compounds within the Vina estimated margin of error of ±1.5 kcal/mol (Figure 2c). We also \nsee that virtually all of the top search-hit compounds have equivalent or better predicted binding affinity \ncompared to the corresponding PDB co-crystal ligand (Figure 2d), with a large number (38.8%) \nachieving significantly better scores (ΔVina < –1.5 kcal/mol). For all three models tested, we see a \nsimilar distribution shift toward improved predicted binding efficiency in the search-hit compounds, with \nan average improvement of the median Vina efficiency (Figure 2e). \n \nIn Figure 2f, we show a positive Spearman correlation between GED and ΔVina (ρ=0.44), meaning that \nless similar search-hits tend to have worse binding affinity scores. We observe similar, though weaker, correlations for ECFP4 distance (ρ=0.31) and Daylight \ndistance (ρ=0.14). 
Since lower similarity search-hits from these databases tend to have worse binding \nscores than the query, this suggests that the generated query compounds indeed represent good \ntemplates for a successful predicted binder. Further, GED appears to be a better predictor of binding \nscore than the two fingerprint-based similarity measures. This is likely due to GED being a better \nmeasure of topological and structural similarity, which are key determinants of shape complementarity \nand interaction geometry within the binding pocket. \nFigure 2: Generated compounds vs. search-hit compounds. (a) Histogram showing distribution of Synthetic \nAccessibility (SA) score for generated query and search-hit compounds. (b) Histogram of ΔVina* scores for \ntop-1 search-hit compounds with mean (black dashed line) and Vina uncertainty (shaded). (c) Vina scores \nfor top-1 search-hit compounds versus the corresponding query compound for each target (colors) with Vina \nuncertainty (shaded). (d) Vina scores for top-1 search-hit compounds versus the corresponding crystal ligand \nfor each target (colors) with Vina uncertainty (shaded). (e) Distributions of Vina efficiency for queries (dark) \nand top-1 search-hits (light) for each model (colors). (f) Distribution of ΔVina scores for all (top-100) search-\nhits with respect to graph edit distance (GED), Daylight distance, and ECFP4 distance. Linear regression fit \n(grey line) shown along with computed Spearman (ρ) correlation coefficient. \n*ΔVina = Vina(search-hit) – Vina(query) 
\nGenerative SBDD paired with analog search outperforms random screening \nTo be practically useful, a generative approach combined with similarity search needs to outperform \nrandom virtual screening. To assess whether this is the case, we compare our generative approach \nresults with screening of random compounds from the ZINC31 database. We randomly sampled 50k \ncompounds from the ZINC drug-like compound subset and docked all compounds into each of our test \npockets. In Figure 3a-b, we compare the Vina efficiency of the top-10 compounds from the generated \nqueries, search-hits, and random ZINC subsets of different sizes (ranging from 1k to 50k). The query \ncompound distributions, which were selected from 1k generated compounds for each target, show \nsignificantly better Vina efficiency (Welch’s t-test p<0.05) than screening 10k (Pocket2Mol), 30k \n(DrugHIVE), and 50k (DiffSBDD) ZINC compounds. The search-hit compound distributions for all models \nshow significant improvement (Welch’s t-test p<0.05) compared to their respective query compound \ndistributions. Across all models, the top-10 similarity search-hits score better than those from screening \nup to 50k random ZINC compounds, despite only screening 2k compounds (including queries) per \ntarget, representing a 25x improvement in screening efficiency. \n \nThese results represent selection from a docked candidate pool of the 100 most similar search-hits \nper query compound; however, docking and ranking fewer search-hits could lower the computational \ncost while still achieving favorable results. To assess this, we re-ran selection using search-hit \ncandidate pools (docked search-hits) of different sizes, ranging between 1 and 100. 
In Figure 3c, we show that the \nresulting ΔVina, averaged across all models and targets, ranges from -0.1 to 0.2 kcal/mol, with parity (ΔVina=0) \nat approximately 40 search-hits docked per query. Surprisingly, even when using only a single search-\nhit, the upper limit of the ΔVina 95% confidence interval remains at a modest +0.3 kcal/mol, well within \nthe estimated ±1.5 kcal/mol Vina uncertainty24. This demonstrates that synthesizable analogs with \ncomparable docking scores can be reliably identified for generated compounds even with relatively few \nsearch-hits screened per query. \nFigure 3: Comparison with random ZINC compounds. (a) Median Vina efficiency of the top-10 compounds for \ndifferent size subsets of the random ZINC set relative to query (dashed) and search (solid) hit compounds for \neach model (color). Calculations for ZINC subsets are repeated a total of n=20 times, with median average \nand std. dev. shown. (b) Boxplots show the distributions of top-10 Vina efficiency for generated query \ncompounds, search-hits, and ZINC (random) subsets of varying size. For each query or search-hit distribution, \nthe largest ZINC subset distribution with significant Welch’s t-test result is indicated. Significance denoted as: \np < 0.05 (∗), p < 0.01 (∗∗), p < 0.001 (∗∗∗), p < 0.0001 (∗∗∗∗). (c) Average ΔVina between top-10 scoring \nsearch-hit compound and query compound vs. number of total ranked search-hits that were docked to the \ntarget pocket, with 95% confidence interval (shaded). 
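The 25x figure follows from the docking budget per target: 1000 generated compounds plus 100 search-hits for each of the 10 queries, compared against 50k randomly screened compounds. A small Python sketch of this arithmetic, with function and parameter names that are illustrative only:

```python
# Docking budget per target for the generate-then-retrieve pipeline.
# Function and parameter names are illustrative assumptions.

def mgvs_docked(n_generated=1000, n_queries=10, hits_per_query=100):
    # Every generated compound is docked once; each selected query
    # then adds its own pool of docked search-hits.
    return n_generated + n_queries * hits_per_query

def efficiency_gain(n_random_screened, n_mgvs_docked):
    # Ratio of randomly screened compounds to compounds docked by MGVS.
    return n_random_screened / n_mgvs_docked

budget = mgvs_docked()
print(budget, efficiency_gain(50_000, budget))  # prints 2000 25.0
```

Shrinking the candidate pool lowers the budget further; for example, `mgvs_docked(hits_per_query=40)` gives 1400 docked compounds per target.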
\nShared intermolecular interactions between predicted binding poses \nOur results so far have shown that using similarity search to identify synthesizable analogs to de novo \ngenerated compounds reliably results in highly similar compounds with strong predicted binding affinity. \nHowever, this does not guarantee that the search-hit compounds have a similar predicted binding pose \nwithin the target pocket. To assess this, we compared the docked intermolecular interactions made by \nthe top predicted binding pose of the search-hit compounds to those made by the corresponding generated \nquery compound. Interactions were identified using the Protein-Ligand Interaction Profiler (PLIP)27 on \nthe top-ranking docked pose from QuickVina220 for each compound. We consider matching interactions \nbetween query compound and search-hit compound at two levels of specificity: 1) same target residue \nand 2) same specific target atom. In Table 2, we show a summary of the proportion of search-hits having \nshared interactions with the generated query compound for each interaction type (H-bond, hydrophobic, \nsalt bridge, π-cation, and π-stacking) across all models and targets. We observe that a majority of \nsearch-hits share at least one interaction with the generated query across all interaction types. We also \nsee that search-hits are less likely to share the more frequent H-bond and hydrophobic interactions than \nthe less frequent ones, except for π-stacking, which is both frequent and highly conserved. \nTable 2: Shared intermolecular interactions by interaction type. Proportion of search-hits having shared \ninteractions with the generated query compound across all models and targets. Statistics are reported for \nsearch-hits that share all or at least 1 of the query interactions, as well as for different positive match types: \nmatching interactions share the same 1) target atom or 2) target residue. 
For each interaction type the total \nnumber (n) of search-hit compounds with a query compound predicted to make the corresponding interaction \nwith the target is reported. \n# matched | Match type | H-bond (n=36404) | Hydrophobic (n=62582) | Salt bridge (n=1332) | π-cation (n=474) | π-stacking (n=29830) \nall | atom | 0.17 | 0.12 | 0.58 | 0.79 | 0.57 \nall | residue | 0.27 | 0.30 | 0.58 | 0.79 | 0.83 \nat least 1 | atom | 0.65 | 0.75 | 0.73 | 0.79 | 0.90 \nat least 1 | residue | 0.69 | 0.86 | 0.73 | 0.79 | 0.92 \n \nIn Figure 4, we show the proportion of docked search-hit compounds that share at least one or all \nspecific query compound interactions for each generative model. Here we choose to focus on specific \nnon-hydrophobic interactions (hydrogen bonds, π interactions, and salt bridges), because we are \nprimarily interested in assessing how often search-hits exhibit the same type of target specificity as the \nquery. The same statistics for all interactions can be found in Figure S2. In Figure S3, we show the \ndistribution of intermolecular interactions made by all generated query compounds. The average \nnumber of intermolecular interactions made by query compounds is 2.5 (specific) and 5.2 (all). Across \nall query compounds, models and targets, an average of 19.5% (atom match) to 31.4% (residue match) \nof search-hits (top-100) share all specific interactions with the query and 76.7% (atom match) to 79.5% \n(residue match) share at least one (Figure 4a). For top-1 search-hits (Figure 4b), we see that a majority \nof queries have at least one search-hit sharing all specific interactions (50.9% and 80.7% for atom and \nresidue match, respectively) and nearly all queries have at least one search-hit that shares an interaction \n(99.2% for both atom and residue match). In Figure 4c, we show that there is some variation in the \nproportion of queries with search-hit compounds matching all interactions across the different protein \ntargets. However, even for the lowest performing target (PDB ID 5G1P), 10.5% (atom match) to 20.0% \n(residue match) of query compounds have an exact interaction analog. For all targets, nearly every \nquery compound has a search-hit with at least one shared interaction. \n 
\nFigure 4: Shared intermolecular interactions between generated queries and search-hits. (a-c) Bar plots \nshowing proportion of search-hits that share all or at least 1 of query specific protein–ligand interactions. \nProportions are shown for both exact residue match (residue) and exact atom match (atom) criteria. \n(a) Proportion of top-100 search-hits for each query with matching specific interactions for each model. \n(b) Proportion of top-1 search-hits for each query with matching specific interactions for each model. (c) \nProportion of top-1 search-hits for each query with matching specific interactions by protein target across all \nmodels. (d) Docked poses of search-hit (yellow carbon) and query (cyan carbon) compound pairs with \nprotein-ligand interactions shown. Vina docking score, number of shared interactions, search-hit database, \nand chemical distance metrics (GED, ECFP4 distance, Daylight distance) are shown for each pair. \n 
\nTo give a better sense of what the docked poses of interaction analogs look like, in Figure 4d we show \na selection of docked poses with highlighted intermolecular interactions for query and search-hit pairs \nwith all interactions shared (atom match) for four of the targets. We report the Vina scores of each \ncompound, the number of shared interactions, database of the search-hit, and the molecular distance \nmetrics. We observe that search-hits with many shared query contacts tend to also have similar \npredicted binding poses and a relatively low GED. Interestingly, fingerprint-based distance metrics do \nnot always agree with GED (or with each other). For example, the pair of compounds displayed for \ntarget PDB ID 3JUZ has clear structural similarities and a relatively low GED (3) but high Daylight \ndistance and ECFP4 distance (0.74 for both). For the pair of compounds displayed for target PDB ID \n6I18, GED (3) and Daylight distance (0.29) are low while ECFP4 distance (0.67) is high. To quantify \nthis, we calculated the Spearman correlation between each of the molecular distance metrics and the \nnumber of shared interactions (atom match). We found that GED has the highest correlation, especially \nas the number of protein–ligand interactions made by the query compound increases (Figure S4). These \nresults are consistent with the earlier findings (Figure 2f) that GED is a better predictor of binding affinity \nscore than the fingerprint-based similarity metrics. 
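The two match levels used in this section (same target residue versus same target atom) can be made concrete with a short Python sketch. It assumes each interaction is a hypothetical tuple of interaction type, target residue, and target atom; this layout, and the residue and atom labels in the example, are illustrative and do not reflect the PLIP output format or the authors' code.

```python
# Sketch of atom-level vs. residue-level interaction matching.
# The (type, residue, atom) tuple layout is an assumption for illustration.

def interaction_keys(interactions, level):
    # Atom-level keys keep the full tuple; residue-level keys drop the atom.
    if level == 'atom':
        return set(interactions)
    return {(kind, residue) for kind, residue, _atom in interactions}

def match_summary(query, hit, level='atom'):
    # Compare the interactions of a search-hit against those of its query.
    q = interaction_keys(query, level)
    shared = q & interaction_keys(hit, level)
    return {
        'n_shared': len(shared),
        'at_least_one': len(shared) >= 1,
        'all': len(q) > 0 and shared == q,
    }

# Hypothetical example: the hydrogen bond moves to a different atom
# of the same residue, while the pi-stacking contact is unchanged.
query = [('hbond', 'ASP86', 'OD1'), ('pi_stack', 'PHE80', 'ring')]
hit = [('hbond', 'ASP86', 'OD2'), ('pi_stack', 'PHE80', 'ring')]
print(match_summary(query, hit, 'atom'))
print(match_summary(query, hit, 'residue'))
```

In this example the search-hit shares one of two query interactions at the atom level but both at the residue level, which is why residue-match proportions can only equal or exceed atom-match proportions.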
\nDISCUSSION \nIn this work, we have shown that current generative structure-based drug design (SBDD) models paired \nwith similarity-based search can be used to discover synthesizable compounds with high predicted \nbinding affinity for a wide range of protein targets. Generative models tend to produce compounds with \npoor synthetic accessibility14–16, which is a key limitation to practical drug discovery4. We show that \nmodel-guided virtual screening (MGVS), which uses generated compounds to guide the \nsearch for synthesizable analogs, effectively overcomes this limitation. We apply MGVS across 30 \ndiverse protein targets and observe that, for three separate SBDD models (DiffSBDD13, DrugHIVE11, and \nPocket2Mol12), screening only 2k compounds leads to better-quality candidates than standard VLS \nscreening of 50k ZINC compounds, representing a 25x improvement in efficiency. \n \nAlthough current generative models tend to produce compounds with low synthesizability, they are \neffective at identifying compounds located in promising regions of chemical space. Once these regions \nhave been identified, similarity search can be used to find adjacent synthesizable analogs. Across all \nmodels and targets, we find that virtually all (98.7%) generated compounds selected for analog search \nyielded a search-hit with an equivalent or better docking score. We also found that search-hits commonly \nhave similar docking poses to the query compound, indicating that the compounds produced by \ngenerative models help identify promising potential binding modes. We observe that ~50% of query \ncompounds yielded a search-hit with a docking pose sharing all specific (non-hydrophobic) residue-level \nprotein–ligand interactions, and virtually all (~99%) yielded a search-hit sharing at least one interaction \n(Figure 4b). 
Assuming that the number of shared intermolecular interactions is an indicator of docking \npose similarity (Figure 4d), this demonstrates consistent success in identifying predicted binding \nanalogs. \n \nAltogether, these results demonstrate that current generative models, despite their limitations, can \nalready be used to improve virtual screening efficiency through MGVS. We show that identifying \ncompounds with strong predicted affinity for a protein target, regardless of synthesizability, is useful for \nguiding the screening of high-potential subsets of existing ultra-large chemical databases. To be \nsuccessful, this approach relies on search methods efficient enough to operate on these large and \ngrowing chemical spaces1, which could be viewed as a limitation. However, improving chemical search \nmethods is an active area of research32,33 that should enable efficient querying of expanded chemical \ndatabases. As chemical databases grow and search methods improve, we would expect MGVS to \nbecome even more effective due to improved retrieval. In contrast, the expansion of chemical \nspaces poses a real challenge to existing VLS methods that rely on exhaustive screening9. Even for the \nmost efficient synthon-based VLS approaches used to screen current ultra-large libraries, scaling to \nchemical spaces orders of magnitude larger will require innovations that surpass what current methods \ncan support. \n \nThere are efforts to overcome the generative SBDD synthesizability issue by designing models \nrestricted to synthesizable subspaces17–19. 
However, we observe that although the current (unrestricted) \nmodels that we tested are able to generate high-quality binding candidates, the set of \nsynthesizable analogs identified based on these initial queries often contains compounds with even better \npredicted binding. This suggests that current models cannot consistently generate the optimal \ncompounds within these localized regions of chemical space and that restricting them further could lead to \neven worse performance. Efforts to improve the ability of models to generate ideal binders, regardless \nof synthesizability, seem to be warranted. Applying current models in MGVS already achieves a ~25x \nimprovement in screening efficiency over standard virtual screening, so further improvement to models \nthat can be utilized in this paradigm could have significant impact on early drug discovery. \nASSOCIATED CONTENT \nSupporting Information \nProperty distributions of generated compounds; shared intermolecular interactions between search \nqueries and hits for all interaction types (including hydrophobic); distributions of number of protein–ligand \ninteractions for docked poses across all generated query compounds; correlation of molecular similarity \nmetrics with number of shared interactions between query and search-hit compounds (PDF) \nAUTHOR INFORMATION \nCorresponding Author \nRemo Rohs - Department of Quantitative and Computational Biology, Department of Chemistry, \nDepartment of Physics & Astronomy, Thomas Lord Department of Computer Science, Division of \nMedical Oncology in the Department of Medicine, Alfred E. Mann Department of Biomedical \nEngineering, University of Southern California, Los Angeles, CA 90089, USA; \nhttps://orcid.org/0000-0003-1752-1884; Email: rohs@usc.edu \nAuthors \nJesse A. 
Weller - Department of Quantitative and Computational Biology, Department of Physics & \nAstronomy, University of Southern California, Los Angeles, CA 90089, USA; \nhttps://orcid.org/0000-0001-7184-4241 \nJinsen Li - Department of Quantitative and Computational Biology, University of Southern California, \nLos Angeles, CA 90089, USA; https://orcid.org/0000-0002-1015-5263 \nYibei Jiang - Department of Quantitative and Computational Biology, University of Southern California, \nLos Angeles, CA 90089, USA; https://orcid.org/0009-0002-9785-3343 \n \nAuthor Contributions \nProject conception (J.A.W., R.R.), lead project design (J.A.W.), supporting project design (J.L., Y.J., \nR.R.), data generation and analysis (J.A.W.), manuscript writing (J.A.W.), manuscript edits (J.A.W., \nJ.L., Y.J., R.R.), project supervision and funding (R.R.). \n \nFunding Sources \nThis work was supported by the National Institutes of Health [grant R35GM130376 to R.R.] and a \nUniversity of Southern California Office of Research and Innovation SBIR/STTR Planning Award [to \nR.R.]. \nACKNOWLEDGEMENT \nThe authors acknowledge helpful discussions with other members of the Rohs lab. 
\nABBREVIATIONS \nALogP, octanol-water partition coefficient (Crippen); HBA, hydrogen-bond acceptors; HBD, hydrogen-bond \ndonors; HTS, high-throughput screening; MGVS, model-guided virtual screening; PAINS, Pan-Assay \nInterference Compounds; PDB, Protein Data Bank; QED, Quantitative Estimate of Drug-likeness; \nRMSD, root-mean-square deviation; SA, Synthetic Accessibility; SAR, structure-activity relationship; \nSBDD, structure-based drug design; SMILES, Simplified Molecular Input Line Entry System; TPSA, \ntopological polar surface area; VAE, variational autoencoder; VLS, virtual ligand screening. \nDATA AND SOFTWARE AVAILABILITY \nData used in evaluation, including molecules, molecular properties, and docking scores, are available in \nthe supplementary information. \nREFERENCES \n1. Sadybekov, A. V. & Katritch, V. Computational approaches streamlining drug discovery. Nature \n616, 673–685 (2023). \n2. Grygorenko, O. O. et al. Generating Multibillion Chemical Space of Readily Accessible Screening \nCompounds. iScience 23, 101681 (2020). \n3. Tingle, B. I. et al. ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand \nDiscovery. J. Chem. Inf. Model. 63, 1166–1176 (2023). \n4. Lam, J. H. & Katritch, V. Navigating structure-based drug discovery with emerging innovations in \nphysics- and knowledge-based approaches. npj Drug Discov. 2, 29 (2025). \n5. Bohacek, R. S., McMartin, C. & Guida, W. C. The art and practice of structure-based drug design: \nA molecular modeling perspective. Med. Res. Rev. 16, 3–50 (1996). \n6. Neumann, A., Marrison, L. 
& Klein, R. Relevance of the Trillion-Sized Chemical Space “eXplore” \nas a Source for Drug Discovery. ACS Med. Chem. Lett. 14, 466–472 (2023). \n7. Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion \ncompounds. Nature 1–8 (2021) doi:10.1038/s41586-021-04220-9. \n8. Beroza, P. et al. Chemical space docking enables large-scale structure-based virtual screening to \ndiscover ROCK1 kinase inhibitors. Nat. Commun. 13, 6447 (2022). \n9. Klarich, K., Goldman, B., Kramer, T., Riley, P. & Walters, W. P. Thompson Sampling─An Efficient \nMethod for Searching Ultralarge Synthesis on Demand Databases. J. Chem. Inf. Model. 64, \n1158–1171 (2024). \n10. Cavasotto, C. N. & Di Filippo, J. I. The Impact of Supervised Learning Methods in Ultralarge High-\nThroughput Docking. J. Chem. Inf. Model. 63, 2267–2280 (2023). \n11. Weller, J. A. & Rohs, R. Structure-Based Drug Design with a Deep Hierarchical Generative Model. \nJ. Chem. Inf. Model. 64, 6450–6463 (2024). \n12. Peng, X. et al. Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets. in ICML \n2022 17644–17655 (PMLR, 2022). \n13. Schneuing, A. et al. Structure-based Drug Design with Equivariant Diffusion Models. Preprint at \nhttps://doi.org/10.48550/arXiv.2210.13695 (2023). \n14. Gao, W. & Coley, C. W. The Synthesizability of Molecules Proposed by Generative Models. J. \nChem. Inf. Model. 60, 5714–5723 (2020). \n15. Gao, W., Luo, S. & Coley, C. W. Generative Artificial Intelligence for Navigating Synthesizable \nChemical Space. Preprint at https://doi.org/10.48550/arXiv.2410.03494 (2024). \n16. Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods fail to \ngenerate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 \n(2024). \n17. Cretu, M. et al. SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints. \nPreprint at https://doi.org/10.48550/arXiv.2405.01155 (2025). \n18. Koziarski, M. 
et al. RGFN: Synthesizable Molecular Generation Using GFlowNets. Preprint at \nhttps://doi.org/10.48550/arXiv.2406.08506 (2024). \n19. Seo, S. et al. Generative Flows on Synthetic Pathway for Drug Design. Preprint at \nhttps://doi.org/10.48550/arXiv.2410.04542 (2025). \n20. Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate, and reliable molecular \ndocking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015). \n21. Baell, J. B. & Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference \nCompounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. \nChem. 53, 2719–2740 (2010). \n22. SmallWorld. NextMove Software. https://www.nextmovesoftware.com/smallworld.html (2025). \n23. RDKit: Open-source cheminformatics. https://www.rdkit.org. \n24. Trott, O. & Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new \nscoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010). \n25. Morris, G. M. et al. AutoDock4 and AutoDockTools4: Automated docking with selective receptor \nflexibility. J. Comput. Chem. 30, 2785–2791 (2009). \n26. Halgren, T. A. Merck molecular force field. I. Basis, form, scope, parameterization, and \nperformance of MMFF94. J. Comput. Chem. 17, 490–519 (1996). \n27. Adasme, M. F. et al. PLIP 2021: expanding the scope of the protein–ligand interaction profiler to \nDNA and RNA. Nucleic Acids Res. 49, W530–W534 (2021). \n28. Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. 
& Hopkins, A. L. Quantifying the chemical \nbeauty of drugs. Nat. Chem. 4, 90–98 (2012). \n29. Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules \nbased on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009). \n30. Lipinski, C. A. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today \nTechnol. 1, 337–341 (2004). \n31. ZINC20—A Free Ultralarge-Scale Chemical Database for Ligand Discovery. Journal of Chemical \nInformation and Modeling http://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00675. \n32. Warr, W. A., Nicklaus, M. C., Nicolaou, C. A. & Rarey, M. Exploration of Ultralarge Compound \nCollections for Drug Discovery. J. Chem. Inf. Model. 62, 2021–2034 (2022). \n33. Bellmann, L., Penner, P., Gastreich, M. & Rarey, M. Comparison of Combinatorial Fragment \nSpaces and Its Application to Ultralarge Make-on-Demand Compound Catalogs. J. Chem. Inf. \nModel. 62, 553–566 (2022). \n","source_license":"Public-Domain","license_restricted":false}