Challenge #6

Hit Identification
Method type (check all that apply)
De novo design
Deep learning
High-throughput docking
Machine learning
Physics-based
Description of your approach (min 200 and max 800 words)

We will employ a structure-guided drug discovery approach based on a molecular generative model that we recently developed. The model performs de novo design of ligands against the 3D structure of an input protein binding pocket and is fine-tuned to produce molecules from Enamine REAL Space. Compared to traditional virtual screening, this approach is fast, enabling us to sample from the entire Enamine REAL Space (30 billion compounds) in a structure-guided way. This efficiency lets us generate ligands against multiple receptor conformations, rather than the single docking structure typically used in virtual screening, which should increase the diversity of hits.

Our approach consists of three steps: (1) modeling multiple protein conformations; (2) sampling Enamine ligands for each structure with our generative model; and (3) scoring and ranking the generated ligands using the interactions of known actives.

We will generate up to 15 structural models of the Tudor domain of SETDB1, starting from the solved structures available in the PDB. To expand this set of conformational models, we will run restrained molecular dynamics (MD) simulations in Amber. The conformations obtained from MD will be clustered with the PENSA (Python Ensemble Analysis) library to yield approximately 10 unique protein conformations, in addition to the solved PDB structures.
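To illustrate the clustering step, the following is a minimal sketch of how representative conformations could be picked from MD frames. It is not the PENSA API: it assumes each frame has already been reduced to a feature vector (for example, pocket torsions or distances), uses a plain k-means as a stand-in for the actual clustering, and the function name `kmeans_representatives` is hypothetical.

```python
import math


def kmeans_representatives(frames, k, n_iter=50):
    """Cluster per-frame feature vectors with k-means and return, for each
    cluster, the index of the real frame closest to the cluster centroid.

    frames: list of equal-length numeric tuples (assumed precomputed features).
    """
    if not 1 <= k <= len(frames):
        raise ValueError("k must be between 1 and the number of frames")

    def assign(centroids):
        # Label each frame with the index of its nearest centroid.
        return [min(range(k), key=lambda j: math.dist(f, centroids[j]))
                for f in frames]

    # Deterministic farthest-point initialization, starting from frame 0.
    centroids = [list(frames[0])]
    while len(centroids) < k:
        far = max(frames, key=lambda f: min(math.dist(f, c) for c in centroids))
        centroids.append(list(far))

    for _ in range(n_iter):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments are stable
        centroids = updated

    labels = assign(centroids)
    representatives = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            representatives.append(
                min(members, key=lambda i: math.dist(frames[i], centroids[j])))
    return sorted(representatives)
```

Returning actual frame indices rather than centroids matters here, because an averaged centroid is generally not a physically valid protein structure.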

Next, we will use our generative model to generate 10,000 Enamine ligands for each input protein structure (approximately 100,000 molecules in total). The model uses geometric deep learning to position and join chemical fragments directly in the binding site. It takes a protein pocket as input; we will define several variations of the binding pocket using the known ligands in the PDB structures. We will ensure that a variety of ligand sizes is sampled, and we will remove any molecules that share a scaffold with known active ligands.
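The scaffold filter can be sketched as follows. This is an illustrative outline only: `drop_known_scaffolds` and the `scaffold_fn` hook are hypothetical names, and the scaffold definition is injected so the example stays self-contained. In practice one plausible choice for `scaffold_fn` would be a Bemis-Murcko scaffold computed with RDKit.

```python
def drop_known_scaffolds(generated, known_actives, scaffold_fn):
    """Return the generated molecules whose scaffold does not occur among
    the known actives, preserving the input order.

    scaffold_fn: callable mapping a molecule to a hashable scaffold key.
    It is an assumption of this sketch; with RDKit installed, something like
    rdkit.Chem.Scaffolds.MurckoScaffold.MurckoScaffoldSmiles could be used
    on SMILES strings.
    """
    known_scaffolds = {scaffold_fn(mol) for mol in known_actives}
    return [mol for mol in generated if scaffold_fn(mol) not in known_scaffolds]
```

For example, with molecules represented as dictionaries carrying a precomputed `"scaffold"` field, `drop_known_scaffolds(generated, actives, lambda m: m["scaffold"])` keeps only the generated molecules with novel scaffolds.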

Finally, we will rank the generated ligands with a custom scoring approach that rewards interactions (e.g., hydrogen bonds and salt bridges) similar to those formed by known ligands. We observed that several known ligands of this SETDB1 domain form salt bridges that mimic the lysines of the natural substrate; our scoring function will therefore factor in these interactions. This interaction similarity score is combined with a standard docking score (Schrödinger’s Glide) to produce the final ranking of the compounds. We will cluster the top 2,000 molecules by chemical similarity (3D ECFP-style fingerprints computed with e3fp) and select the top 100 representative ligands from these clusters.
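A rough sketch of the ranking and selection logic is given below. It rests on several assumptions that are not part of the method description: interactions and fingerprints are represented as plain sets, interaction similarity is measured with a Tanimoto coefficient, the two scores are combined by a weighted sum with an arbitrary `weight`, and diverse representatives are chosen by a greedy leader-style pass rather than a specific clustering algorithm. All function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two collections of features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_score(docking_score, interactions, reference_interactions, weight=5.0):
    """Combine a docking score (more negative is better, as with Glide) with
    a bonus for reproducing the interactions of known actives.
    The weighted-sum form and the default weight are illustrative assumptions."""
    return docking_score - weight * tanimoto(interactions, reference_interactions)


def select_diverse(ligands, n_pick, sim_cutoff=0.6):
    """Greedy leader-style selection: walk the ligands from best to worst
    score and keep one only if it is dissimilar to everything kept so far.

    ligands: list of (name, score, fingerprint_bits) tuples, lower score = better.
    """
    picked = []
    for name, score, fp in sorted(ligands, key=lambda item: item[1]):
        if all(tanimoto(fp, kept_fp) < sim_cutoff for _, _, kept_fp in picked):
            picked.append((name, score, fp))
        if len(picked) == n_pick:
            break
    return [name for name, _, _ in picked]
```

In this sketch a ligand that reproduces the reference salt bridges can outrank one with a better raw docking score, which is the intended effect of the interaction term.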

What makes your approach stand out from the community? (<100 words)

Our approach is built on a newly developed molecular generative AI model that enables rapid exploration of the expansive Enamine REAL Space. Because generation is structure-guided, we can produce scaffolds distinct from those of existing ligands. The efficiency of the approach also lets us explore multiple receptor conformations, increasing the diversity of hits. Finally, our scoring approach integrates interaction information from known actives to increase hit rates.

Method Name
SAGE
Commercial software packages used

Schrödinger (Glide, Maestro)

Free software packages used

Amber (MD simulations), MDAnalysis, PENSA, e3fp, PyTorch, RDKit

Relevant publications of previous uses by your group of this software/method

Paggi, Joseph M., et al. "Leveraging nonstructural data to predict structures and affinities of protein–ligand complexes." Proceedings of the National Academy of Sciences 118.51 (2021): e2112621118.

Vögele, Martin, et al. "Systematic analysis of biomolecular conformational ensembles with PENSA." arXiv preprint arXiv:2212.02714 (2022).