Our approach combines active-learning techniques with a state-of-the-art physics-based virtual screening method to screen ultra-large chemical compound libraries for hit discovery. Concretely, we will use the Virtual Screening Express (VSX) mode in RosettaVS and the OpenVS platform. We aim to screen either the Enamine REAL library or the ZINC22 library (each ~4 billion compounds) against multiple conformations of the target structure.
Active learning lets us explore the chemical space efficiently without docking every compound in the ultra-large library. We will perform around ten iterations of docking. In each iteration, half a million compounds will be docked, and a surrogate model will be trained on the binding affinities predicted by ligand docking. At this rate, roughly five million compounds in total, about 0.1% of a ~4-billion-compound library, are docked explicitly.
The surrogate model will then be used to predict binding affinities across the entire library and to select another half a million compounds for the next iteration of docking. The iterative process terminates when the predicted binding affinities of the top-ranked compounds converge or when the pre-specified maximum number of iterations (usually ten) is reached.
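The iterative procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RosettaVS or OpenVS implementation: the `dock` callable stands in for a docking run, each compound is reduced to a single numeric feature, the surrogate is an ordinary least-squares line, lower scores are assumed to mean stronger predicted binding, and convergence is judged on the mean docked score of the top-ranked compounds. All names (`fit_linear_surrogate`, `active_learning_screen`, `batch_size`, `top_k`, `tol`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def fit_linear_surrogate(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Least-squares fit y = a*x + b; a toy stand-in for the surrogate model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def active_learning_screen(
    features: Sequence[float],
    dock: Callable[[float], float],   # hypothetical docking call; lower score = better
    batch_size: int,
    max_iters: int = 10,
    top_k: int = 10,
    tol: float = 1e-3,
    seed: int = 0,
) -> Tuple[Dict[int, float], int]:
    """Return the docked scores (index -> score) and the number of iterations run."""
    rng = random.Random(seed)
    n = len(features)
    docked: Dict[int, float] = {}
    batch: List[int] = rng.sample(range(n), batch_size)  # first batch is random
    prev_top_mean = None
    n_iter = 0

    for _ in range(max_iters):
        n_iter += 1

        # 1. Dock the current batch.
        for i in batch:
            docked[i] = dock(features[i])

        # 2. Train the surrogate on everything docked so far.
        idx = list(docked)
        model = fit_linear_surrogate([features[i] for i in idx], [docked[i] for i in idx])

        # 3. Stop once the top-ranked scores no longer change.
        top = sorted(docked.values())[:top_k]
        top_mean = sum(top) / len(top)
        if prev_top_mean is not None and abs(prev_top_mean - top_mean) < tol:
            break
        prev_top_mean = top_mean

        # 4. Score the undocked compounds and pick the best-predicted batch.
        candidates = [i for i in range(n) if i not in docked]
        if not candidates:
            break
        candidates.sort(key=lambda i: model(features[i]))
        batch = candidates[:batch_size]

    return docked, n_iter
```

Calling `active_learning_screen(features, dock, batch_size=500_000)` on a real library would need a far more expressive surrogate and batched inference, but the control flow (dock, retrain, predict, select, check convergence) is the same.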
A flexible docking protocol in RosettaVS will then be used to re-dock the top-ranked compounds from the initial screen, accounting for the flexibility of the binding pocket. Finally, a set of filters, such as the number of unsatisfied hydrogen bonds and the number of torsion-angle outliers, will be applied to select the final compounds for experimental validation.
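The final filtering step might look like the following sketch. The record fields, the threshold values, and the function name `select_final` are illustrative assumptions and are not taken from RosettaVS; the only elements drawn from the text are the two filter criteria, and lower scores are again assumed to be better.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DockedCompound:
    name: str
    score: float              # re-docking score; lower is assumed better
    unsatisfied_hbonds: int   # number of unsatisfied hydrogen bonds
    torsion_outliers: int     # number of torsion-angle outliers


def select_final(
    compounds: Sequence[DockedCompound],
    max_unsatisfied_hbonds: int = 1,   # hypothetical threshold
    max_torsion_outliers: int = 0,     # hypothetical threshold
    n_final: int = 2,
) -> List[DockedCompound]:
    """Keep compounds that pass both filters, then return the best-scoring ones."""
    passed = [
        c for c in compounds
        if c.unsatisfied_hbonds <= max_unsatisfied_hbonds
        and c.torsion_outliers <= max_torsion_outliers
    ]
    return sorted(passed, key=lambda c: c.score)[:n_final]
```

Filtering before ranking means a compound with an excellent score but a strained or poorly satisfied pose never reaches the list sent for experimental validation.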