Challenge #6

Hit Identification
Method type (check all that apply)
De novo design
Deep learning
High-throughput docking
Machine learning
Description of your approach (min 200 and max 800 words)

The proposed approach is Reaction-GFlowNet (RGFN), a recently developed generative algorithm for small-molecule design. It extends the GFlowNet framework to operate directly in the space of chemical reactions, which provides out-of-the-box synthesizability while maintaining the quality of the generated candidates. GFlowNets (GFNs) are amortized variational inference algorithms trained to sample from an unnormalized target distribution over compositional objects (in this case, small molecules composed of chemical building blocks). Concretely, they aim to sample objects from a set of terminal states with probability proportional to a reward function (a measure of an object's quality). GFNs can learn the 'structure' of a high-dimensional property space (here, the 'small molecule-SETDB1 interaction space') and generalize systematically across it. This results in an orders-of-magnitude speed-up and the ability to search a much larger fraction of chemical space for de novo SETDB1 binders. Unlike canonical reinforcement learning (RL) techniques, which maximize expected reward, GFlowNets generate a diverse set of high-scoring candidate solutions.
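To make the sampling target concrete, the following is a minimal sketch in plain Python of what "proportional to a reward" means, together with the trajectory balance residual that is one common GFlowNet training objective. The molecule names and reward values are hypothetical toy numbers, and the choice of trajectory balance is an assumption made for illustration; the text above does not state which objective RGFN uses.

```python
import math

# Hypothetical toy rewards for four terminal objects (not real docking data).
rewards = {"mol_a": 1.0, "mol_b": 3.0, "mol_c": 4.0, "mol_d": 2.0}


def target_distribution(rewards):
    """Return p(x) = R(x) / Z, the distribution a trained GFlowNet aims to match."""
    z = sum(rewards.values())
    return {x: r / z for x, r in rewards.items()}


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared trajectory balance residual for a single trajectory.

    log_z        : learned estimate of log Z (log of the total reward)
    log_pf_steps : log-probabilities of the forward actions along the trajectory
    log_pb_steps : log-probabilities of the corresponding backward actions
    reward       : reward R(x) > 0 of the terminal object
    """
    residual = log_z + sum(log_pf_steps) - math.log(reward) - sum(log_pb_steps)
    return residual ** 2


probs = target_distribution(rewards)

# A one-step trajectory that already samples mol_c with probability R/Z
# has zero residual (up to floating-point error).
loss_at_optimum = trajectory_balance_loss(
    log_z=math.log(sum(rewards.values())),
    log_pf_steps=[math.log(probs["mol_c"])],
    log_pb_steps=[0.0],
    reward=rewards["mol_c"],
)
```

In this toy case the higher-reward objects are sampled more often but the lower-reward ones are not excluded, which is the source of the diversity described above.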

The proposed approach uses a trainable transformer model to learn a distribution over possible actions (choosing an initial precursor building block, a compatible chemical reaction, and additional molecular building blocks) and constructs final ligand candidates by combining a sequence of these actions. The model is optimized to generate molecules with probability proportional to the reward, which in this case is derived from a docking score calculated by a GPU-accelerated docking algorithm. By doing so, it learns to generate ligand candidates that are both high-quality and diverse.
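The action sequence described above can be sketched as follows. This is an illustrative toy only: the building-block and reaction names are hypothetical labels, `uniform_policy` is a stand-in for the trained transformer, and the exponential mapping from docking score to reward is an assumed transformation, not one specified in this description.

```python
import math
import random

# Hypothetical labels; a real run would use actual building blocks and
# reaction templates.
BUILDING_BLOCKS = ["amine_1", "acid_2", "boronate_3", "halide_4"]
REACTIONS = ["amide_coupling", "suzuki_coupling"]
STOP = "stop"


def docking_score_to_reward(score, beta=1.0):
    """Map a docking score (more negative = better) to a positive reward.

    The exponential form and the beta parameter are assumptions.
    """
    return math.exp(-beta * score)


def sample_trajectory(policy, max_reactions=2, rng=random):
    """Build a molecule as [block, reaction, block, reaction, block, ...].

    policy(state, candidates) must return one non-negative weight per candidate.
    """
    state = []
    first = rng.choices(BUILDING_BLOCKS, weights=policy(state, BUILDING_BLOCKS))[0]
    state.append(first)
    for _ in range(max_reactions):
        options = REACTIONS + [STOP]
        reaction = rng.choices(options, weights=policy(state, options))[0]
        if reaction == STOP:
            break
        block = rng.choices(BUILDING_BLOCKS, weights=policy(state, BUILDING_BLOCKS))[0]
        state += [reaction, block]
    return state


def uniform_policy(state, candidates):
    """Stand-in for the learned model: every candidate is equally likely."""
    return [1.0] * len(candidates)


trajectory = sample_trajectory(uniform_policy, rng=random.Random(0))
```

In a trained model the policy weights would depend on the partially built molecule, and only reactions and building blocks compatible with the current state would be offered as candidates.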

Although previous work has demonstrated that potent inhibitors can be obtained by comprehensively screening large virtual compound libraries with docking alone, or by generative design algorithms that use docking as the sole oracle, we also plan to augment our pipeline with a higher-confidence method for calculating binding energy, such as molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) or free energy perturbation (FEP). This will be implemented in a multi-fidelity context: most calls will go to the computationally cheap, low-fidelity docking algorithm, and a limited number of calls will go to the computationally expensive, high-fidelity oracle when deemed appropriate by the GFN sampler. We expect this part of the pipeline to be available for the hit optimization stage.
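One way to picture the multi-fidelity setup is a dispatcher that honours a request for the expensive oracle only while a call budget remains. The sketch below is a hypothetical illustration: the class name, the budget mechanism, and the two placeholder scoring functions are all assumptions, and the placeholders do not perform docking, MM-PBSA, or FEP.

```python
class MultiFidelityOracle:
    """Route scoring requests to a cheap or an expensive oracle (illustrative)."""

    def __init__(self, cheap_fn, expensive_fn, expensive_budget):
        self.cheap_fn = cheap_fn
        self.expensive_fn = expensive_fn
        self.expensive_budget = expensive_budget
        self.cheap_calls = 0
        self.expensive_calls = 0

    def score(self, molecule, want_high_fidelity=False):
        """Return (score, fidelity_label) for one molecule.

        want_high_fidelity represents the sampler's request; it is granted
        only while the expensive-call budget has not been used up.
        """
        if want_high_fidelity and self.expensive_calls < self.expensive_budget:
            self.expensive_calls += 1
            return self.expensive_fn(molecule), "high"
        self.cheap_calls += 1
        return self.cheap_fn(molecule), "low"


def placeholder_cheap(molecule):
    # Placeholder only: stands in for a fast docking score.
    return -float(len(molecule))


def placeholder_expensive(molecule):
    # Placeholder only: stands in for a slower, higher-confidence estimate.
    return -float(len(molecule)) - 0.5


oracle = MultiFidelityOracle(placeholder_cheap, placeholder_expensive, expensive_budget=1)
results = [
    oracle.score("CCO", want_high_fidelity=True),   # budget available
    oracle.score("CCN", want_high_fidelity=True),   # budget exhausted, falls back
    oracle.score("CC", want_high_fidelity=False),   # cheap by request
]
```

Here the fallback to the cheap oracle is a design choice of the sketch; an actual implementation could instead let the sampler learn when a high-fidelity call is worth its cost.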

We use a set of commercially available building blocks combined with a set of well-validated and robust chemical reactions. This ensures not only the generation of high-quality and chemically realistic ligands but also a high likelihood that these can be synthesized in a few steps and at high yield. The final sets of de novo SETDB1 ligand designs will be evaluated by experienced chemists.

More details about the proposed approach and experimental results demonstrating its usefulness can be found in the provided paper preprint.

What makes your approach stand out from the community? (<100 words)

We use commercially available building blocks, sourced from collections such as Enamine REAL Space, and robust chemical reactions to ensure successful chemical synthesis. Combined with the GFlowNet deep learning algorithm, this approach efficiently navigates the space of potential SETDB1 ligands, which is embedded in a vast space of non-binders. Unlike many current generative design algorithms that ignore synthetic feasibility, our method generates synthesizable molecules by construction. Traditional scoring functions for synthetic feasibility often still lead to complex synthesis schemes, while methods that enforce synthesizability through chemical reactions typically lack deep learning components for efficiently exploring the vast search space.

Method Name
RGFN
Commercial software packages used

None

Free software packages used

PyTorch, Python, RDKit, Vina-GPU

Relevant publications of previous uses by your group of this software/method

Koziarski, M., Rekesh, A., Shevchuk, D., van der Sloot, A., Gaiński, P., Bengio, Y., Liu, C.-H., Tyers, M., & Batey, R. A. (2024). RGFN: Synthesizable Molecular Generation Using GFlowNets. arXiv [physics.chem-ph]. http://arxiv.org/abs/2406.08506