Challenge #6 – COMPUTATIONAL METHODS

Here is a list of all computational methods used for hit identification in CACHE Challenge #6. Click on the Description for more details. Some participants preferred not to release their publications to stay anonymous at this time.

Description

Method name

Commercial software

Free software

We will use our combined expertise in cheminformatics, molecular dynamics (MD), structure-based drug design (SBDD), pharmacophore modeling, and medicinal chemistry to generate hits for multiple subcavities of the histone binding groove of the SETDB1 triple Tudor domain (TTD).

FORECASTER (proprietary software); ATOMFORGE (proprietary software)

RDKit, OpenBabel, DataWarrior, BIOVIA Discovery Studio

We will employ a structure-guided drug discovery approach based on a unique molecular generative model recently developed by us. This generative model performs de novo design of ligands targeting the 3D structure of an input protein binding pocket. The method is fine-tuned to produce Enamine REAL Space molecules.

SAGE

Schrodinger, Glide, Maestro

Amber (MD simulations), MDAnalysis, PENSA, e3fp, PyTorch, rdkit

The design of molecules targeting the histone binding groove of the SETDB1 triple Tudor domain (TTD) presents a unique opportunity to discover novel therapeutics. We propose two complementary strategies, active learning and pharmacophore modeling, to identify and optimize potential inhibitors of SETDB1 TTD.

Active learning, and pharmacophore driven molecular design

1. Molpal: Graff DE, Shakhnovich EI, Coley CW. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem Sci. 2021 Apr 29;12(22):7866-7881. doi: 10.1039/d0sc06805e. PMID: 34168840; PMCID: PMC8188596. https://github.com/coleygroup/molpal/tree/main
2. Pharmit: Sunseri J, Koes DR. Pharmit: interactive exploration of chemical space. Nucleic Acids Res. 2016 Jul 8;44(W1):W442-8. doi: 10.1093/nar/gkw287. Epub 2016 Apr 19. PMID: 27095195; PMCID: PMC4987880. https://github.com/dkoes/pharmit
3. SMINA/AUTODOCK: Koes DR, Baumgartner MP, Camacho CJ. Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J Chem Inf Model. 2013 Aug 26;53(8):1893-904. doi: 10.1021/ci300604z. Epub 2013 Feb 12. PMID: 23379370; PMCID: PMC3726561. https://sourceforge.net/projects/smina/

A computer-implemented method for screening ligand candidates for a target protein. This is done through an in-house developed, integrated ensemble machine learning (ML) model for predicting binding affinity with very high speed and precision.

i-TripleD

none

F-Pocket, D-Pocket, RDKit

The proposed approach is Reaction-GFlowNet (RGFN), a recently developed generative small molecule design algorithm. It is an extension of the GFlowNet framework that operates directly in the space of chemical reactions, allowing for out-of-the-box synthesizability while maintaining the quality of generated candidates.

RGFN

None

PyTorch, Python, RDKit, Vina-GPU

Our approach combines the expertise of the Kozakov Lab at Stony Brook and the Tropsha Lab at UNC. Our workflow uses several complementary modules to identify high-affinity hits for a given protein target with a known 3D structure. Binding-site hot-spot identification, together with conventional structure-based virtual screening methods enhanced by generative modeling, is the key enabling component of our hit selection approach.

Frag2Hits

FTMap server (https://ftmap.bu.edu/), RDKit; HIDDEN GEM (https://github.com/molecularmodelinglab/HIDDEN-GEM)

Our approach is a combination of active-learning techniques and a state-of-the-art physics-based virtual screening method to screen ultra-large chemical compound libraries for hit discovery. Concretely, we will use the Virtual Screening Express (VSX) mode in RosettaVS and the OpenVS platform. The aim is to screen either the Enamine REAL library (~4 billion compounds) or the ZINC22 library (~4 billion compounds) against multiple conformations of the target structure.

RosettaVS

Rosetta software suite (free for academic and non-commercial purposes), OpenVS, CSD, RDKit, Openbabel, dimorphite_dl

In summary, we employ a contrastive virtual screening model to sift through extensive chemical libraries and identify the top 1% of molecules. These high-ranking molecules are then grouped using molecular fingerprints such as ECFP4 or MACCS. Subsequently, the clustered molecules, typically around 200, are docked to the target pocket, and all docking poses are assessed based on docking scores, RMSD, and expert evaluation.

DrugCLIP+

Schrodinger suite and CCDC-GOLD

Python, rdkit, obabel, autodock-vina, biopython, pytorch and unicore.
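The selection and grouping steps in the DrugCLIP+ description can be illustrated with a minimal sketch. It is not the participants' code: fingerprints are represented as plain sets of on-bits (in practice they would come from a toolkit such as RDKit), the function names are hypothetical, and greedy leader clustering with a Tanimoto threshold is an assumed stand-in for whatever clustering the team actually used.

```python
import math

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def top_fraction(scores: dict, fraction: float = 0.01) -> list:
    """Return the IDs of the highest-scoring fraction of molecules (at least one)."""
    keep = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:keep]

def leader_cluster(ids, fingerprints, threshold=0.6):
    """Greedy leader clustering (assumed method, not from the source).

    Each molecule joins the first cluster whose leader is at least
    `threshold` similar; otherwise it starts a new cluster.
    The first member of each cluster is its leader.
    """
    clusters = []
    for mol_id in ids:
        for cluster in clusters:
            leader = cluster[0]
            if tanimoto(fingerprints[mol_id], fingerprints[leader]) >= threshold:
                cluster.append(mol_id)
                break
        else:
            clusters.append([mol_id])
    return clusters

# Toy data: hypothetical model scores and on-bit sets for four molecules.
toy_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
toy_fps = {
    "a": frozenset({1, 2, 3, 4}),
    "b": frozenset({1, 2, 3, 5}),
    "c": frozenset({10, 11, 12}),
    "d": frozenset({1, 2}),
}
selected = top_fraction(toy_scores, fraction=0.75)
groups = leader_cluster(selected, toy_fps, threshold=0.5)
representatives = [group[0] for group in groups]  # one candidate per cluster to dock
print(groups, representatives)
```

Because the input to `leader_cluster` is already sorted by score, each cluster leader is the best-scoring member of its cluster, which makes the leaders a natural shortlist for docking.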

We will use structure-based ultra-large virtual screenings using VirtualFlow 2.0 [Gorgulla 2023]. The procedure will consist of four steps.

VirtualFlow 2

Maestro (protein preparation)

VirtualFlow 2, AutoDock Vina, QuickVina, GWOVina

BIOPTIC is a target-agnostic, potency-based molecule search model for finding structurally dissimilar molecules with similar biological activities. We used best practices to design a fast retrieval system, based on processor-optimized SIMD instructions, to screen 40B Enamine REAL Space with 100% recall rate.

1. Modeling

BIOPTIC: A TARGET-AGNOSTIC POTENCY-BASED SEARCH ENGINE FOR SMALL MOLECULES

N/A

RDKit, REOS, RoBERTa

We propose a detailed computational strategy aimed at discovering and optimizing novel ligands that target the histone binding groove of the SETDB1 triple Tudor domain (TTD). By focusing on the aromatic cages and the acetylated lysine (Kac) binding pocket, our methodology involves a comprehensive exploration of the histone binding groove's sub-cavities to identify ligands with high affinity.

pharmacophore modeling for virtual screening

Schrödinger Suite: Maestro, Glide, Desmond, Phase

Amber

Open Babel, RDKit, PyMOL, NAMD

We developed a structure-based molecular generative model named Topology Molecular Type assignment (TopMT) that generates highly potent molecules while addressing synthetic feasibility, ensuring all generated molecules are achievable through combinatorial parallel synthesis with fragments in the Enamine REAL space. TopMT features two modules: a GAN module and a Matching module.

Topology Molecular Type assignment (TopMT)

Schrodinger Molecular Modelling Suite (Glide, QikProp, LigPrep, and Epik modules)

RDKit, Autodock Vina

Our modeling approach integrates advanced deep learning (DL) techniques with physics-based methods to enhance molecular docking accuracy and efficiency. We leverage the state-of-the-art DiffDock system, which treats molecular docking as a learning problem for predicting ligand poses.

Compass

None

DiffDock, PoseCheck, AA-score, Openbabel, RDKit, py3dmol, biopandas, esm2, prolif, datamol, PyTorch, e3nn, fair-esm, nvidia-cuda, prody, pybel, pytorch-lightning, torch-geometric

Our plan combines structure-based computational biology methods and empirical knowledge with artificial intelligence (AI) techniques. By leveraging the predictive capabilities of AI algorithms and the computational speed of GPUs, we aim to screen large molecular libraries efficiently. High-precision computational chemistry methods will enhance the hit rate of active molecules.

Multiscale drug screening and design methods

We don't need to use commercial software

Amber, PyMOL (academic version), and Schrödinger (academic version) will be used in this drug screening

Our strategy for finding hit compounds is based on de novo design of compounds using a generative AI model we developed (Logiston). We made use of both conventional binding structure prediction models and deep learning-based binding structure generation models, including AutoDock Vina [1], DiffDock [2], and FABind [3]. However, these methods are well known to be over-confident about target-ligand binding, so many compounds that are unlikely to bind remain after the screening process.

MIN-T (Molecules Inventing Network for Target binding)

Not applicable

PyTorch, RDKit, OpenBabel, fpocket, P2Rank, AutoDock Vina, DiffDock, FABind

The available PDB crystal structures will be used as the targets of a high throughput docking protocol using an ensemble docking approach. To effectively screen ENAMINE in the given time frame, a “deep docking” approach will be used where a surrogate model of docking scores is iteratively trained to select compounds for docking.

GNINA FTW

None

GNINA, AMBER (partially free)
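The "deep docking" idea described for this entry, where a surrogate of the docking score is retrained each round and used to choose what to dock next, can be sketched as a toy loop. Everything here is an assumption for illustration: compounds are integers, `toy_docking_score` stands in for a real docking run, the initial sample is evenly spaced for reproducibility, and a nearest-neighbor lookup replaces the neural-network surrogate a real deep docking workflow would train.

```python
def toy_docking_score(compound: int) -> int:
    """Hypothetical stand-in for an expensive docking run (lower is better)."""
    return (compound - 327) ** 2

def surrogate(compound, labeled):
    """Nearest-neighbor surrogate (assumed, not the participants' model).

    Returns (predicted score, distance to the nearest docked compound);
    the distance breaks ties in favor of the most confident predictions.
    """
    nearest = min(labeled, key=lambda known: abs(known - compound))
    return labeled[nearest], abs(nearest - compound)

def surrogate_guided_screen(library, dock, seed_stride=50, batch_size=20, rounds=3):
    """Dock a small seed set, then repeatedly dock the compounds that the
    surrogate, refit on all scores so far, predicts to be best."""
    labeled = {c: dock(c) for c in library[::seed_stride]}
    for _ in range(rounds):
        pool = [c for c in library if c not in labeled]
        if not pool:
            break
        # "Training" is implicit: the surrogate reads the current labeled set.
        pool.sort(key=lambda c: surrogate(c, labeled))
        for c in pool[:batch_size]:
            labeled[c] = dock(c)
    return labeled

library = list(range(1000))
docked = surrogate_guided_screen(library, toy_docking_score)
best = min(docked, key=docked.get)
print(f"docked {len(docked)} of {len(library)} compounds; best = {best}")
```

In this toy setting the loop docks 80 of 1000 compounds and still reaches the optimum, which is the point of the approach: the surrogate concentrates the docking budget on the promising region of the library.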

Tailored Approach for CACHE Challenge Campaign

Our CACHE challenge campaign integrates several advanced methodologies to create a comprehensive and dynamic drug discovery process. Here’s how we plan to execute this:

Molecular Dynamics (MD) Simulation

EntelliMix (Enamine Intelligent Mix of CADD tools)

Schrodinger Suite 2024-1 (Prime, Glide, Induced Fit Docking, Desmond, Phase, Canvas, QSAR, LigPrep, etc.)

PyMOL

Chemaxon (JChem Engines)

RDKit Library, AutoDock, R, Python, Bash (shell), Enamine in-house scripts

To identify novel ligands for the Triple Tudor Domain of SETDB1, we will consider two starting PDB structures: 8UWP (complexed with the MR46747 ligand) and 7CJT (complexed with the (R,R)-59 ligand).

Deep Docking

OpenEye for ligand preparation (QUACPAC, OEOMEGA)

Gromacs, Autodock-GPU, gnina, Deep Docking, rdkit

We aim to extend quantum-level accuracy and insight to high-throughput scales. To that end, ab initio and semi-empirical methods will be combined with machine learning (ML) approaches that generalize the accuracy of these tools to scale. We have recently demonstrated fully scalable QM-accurate molecular dynamics of proteins in explicit water [1]. In the context of the CACHE6 challenge, we aim to extend our previous work to predict ligand-protein binding affinities at QM accuracy.

QCACHE

FHI-aims

AutoDock Vina, MGLTools, Psi4, Atomic Simulation Environment (ASE), VMD, RDKit, SO3KRATES, MDAnalysis, UCSF Chimera, DFTB+