Copyright © 2020, Molsoft LLC Jun 5 2024
[ VLS Overview ] by Max Totrov and Ruben Abagyan
This section concerns the prediction of interactions between drugs or small biological substrates (less than about 600-700 dalton) and the pockets of larger, more rigid receptors (typically protein molecules, DNA or RNA). There are five major steps in docking and screening.
The goal here is to have an adequate three-dimensional model of the receptor pocket you are planning to dock ligands to. The pitfalls are that your model is not accurate overall, does not reflect the induced fit, or misses alternative conformations of the receptor binding pocket.

Receptor from PDB
If you have only a single entry with your receptor, convert the protein with convertObject yes yes no no, after deleting water molecules and irrelevant chains (e.g. delete a_!1), or use menus as in the ligand docking section. However, if you have a choice between several templates, take the following into account:

Receptor from homology modeling
A model by homology can be built with the build model command (menu Homology/Build_Model) followed by the refineModel macro.

Identifying pockets
If a binding pocket is not known in advance, use the icmPocketFinder or icmCavityFinder (for closed pockets) macros. icmPocketFinder can also be accessed from menu Docking/Receptor Setup, submenu Identify_Binding_Sites.
Usually a good start is to try to dock the known ligand(s) to the receptor model. You may also want to dock a library of compounds in order to identify lead candidates. In this case the main pitfall is that the library is too restricted, or that the molecules are not chemically feasible or not drug-like.

Ligand from PDB
To dock a ligand from a PDB entry, go through the procedure described in the ligand docking section.

Ligand(s) from a mol/mol2 file, or SMILES strings
The main prerequisite is that the formal charges and the bond types are correct. If they are not correct, you need to process each molecule manually as described in the ligand docking section. From a command line you may use the build smiles command or the convert2Dto3D macro.
After the receptor maps are built, you will start a docking simulation. The goal of the flexible docking calculation is to predict the correct binding geometry for each binder. The ICM stochastic global optimization algorithm attempts to find the global minimum of the energy function, which includes five grid potentials describing the interaction of the flexible ligand with the receptor and the internal conformational energy of the ligand. During this process a stack of alternative low-energy conformations is saved (one of the choices in the Docking menu). Some facts about ICM docking:
Pitfalls. An inaccurate receptor model, incorrectly converted ligands, or insufficient optimization effort may lead to incorrect predictions.
The goal of scoring in virtual ligand screening is to ensure maximal separation between binders and non-binders, not to rank a small number of binders according to their binding energies. The scores can be linearly related to binding energy estimates, but the transformation parameters need to be calculated from several reference points (see the learn command).
The vls module allows you to access a good scoring function.
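The linear relation between scores and binding energy estimates mentioned above can be illustrated with an ordinary least-squares fit. This is only a sketch of the idea, not a description of what the learn command does internally; the function name fit_linear and the reference points are hypothetical, and the numbers are synthetic values placed on a known line so the result is easy to check.

```python
def fit_linear(scores, energies):
    """Ordinary least-squares fit of energies = a * scores + b."""
    n = len(scores)
    if n < 2 or n != len(energies):
        raise ValueError("need at least two (score, energy) reference points")
    mean_s = sum(scores) / n
    mean_e = sum(energies) / n
    var_s = sum((s - mean_s) ** 2 for s in scores)
    if var_s == 0:
        raise ValueError("reference scores must not all be identical")
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(scores, energies))
    a = cov / var_s            # slope of the transformation
    b = mean_e - a * mean_s    # intercept of the transformation
    return a, b


# Synthetic reference points lying exactly on energy = 0.25 * score - 1.0
# (made-up numbers for illustration, not measured data).
ref_scores = [-40.0, -32.0, -24.0, -16.0]
ref_energies = [0.25 * s - 1.0 for s in ref_scores]

a, b = fit_linear(ref_scores, ref_energies)
print(f"energy ~ {a:.3f} * score + {b:.3f}")
```

With real data the reference points would be known binders with measured affinities, and the quality of the fit would tell you how far the linear assumption can be trusted for your target.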
The ICM ligand docking procedure performs docking of a fully flexible small-molecule ligand to a known receptor 3D structure. Before setting up the docking project, an ICM object of the receptor has to be created. In most cases, the x-ray structure of the receptor is initially in the PDB format and has to be converted to the ICM format. This process involves addition of hydrogen atoms, assignment of atom types and charges from the residue templates (icm.res) and imposition of the internal coordinates tree (icm-tree) on the original pdb coordinates. The easiest way to convert a pdb structure into an icm object is through the GUI as described in the GUI Manual, or you can automate the setup on the command line using _dockBatch.
Start project setup by defining the project name (GUI menu Docking/New Project). Avoid spaces and leading digits in the name. All files related to the docking project will be stored under names that start with the project name. Most customized parameters will be saved in the table file under the project name (dockProjName.dtb) as well:
    DOCK1.dtb        # control table
    DOCK1_rec.ob     # receptor object
    DOCK1_gb.map     # 3D potential grids, or 'maps'
    DOCK1_gc.map
    DOCK1_ge.map
    DOCK1_gh.map
    DOCK1_gl.map
    DOCK1_gs.map
    DOCK1_probe.ob   # 4 atom probe for initial superposition (or)
    DOCK1_tmplt.ob   # template ligand (optional)
    etc.
The next step is to set up the receptor (GUI menu Docking/New Project). Select the receptor molecules; in most cases a_* will do, which includes all molecules in the current object. Define binding site residues, either manually (e.g. a_/123,144,152 for selection by residue numbers) or graphically using the lasso tool (don't forget to set the selection level to residue). This selection is used solely to define the boundaries of the docking search and the size of the grids and doesn't have to be complete; selecting some 4 residues delimiting the binding site is sufficient. The receptor setup dialog also lets you run the binding site identification routine to quickly locate putative binding sites on your receptor.
The receptor setup procedure will first display the grid box, allowing you to adjust the box dimensions, and
then the 'probe' which defines the initial positioning of the ligand's center of mass and
long/short axis. The probe can be moved/rotated. While its positioning has only minor
influence on the results as long as it remains inside the binding site, it may help the
procedure to find the correct docked orientation more reliably and/or in shorter time.
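Because every project file name starts with the project name, it is easy to check from a script whether a project directory looks complete before launching a job. The sketch below is an assumption-laden illustration: it only knows the file names listed above (control table, receptor object and the six map files), treats the probe and template objects as optional, and the helper names expected_project_files and missing_project_files are hypothetical.

```python
from pathlib import Path

# Map suffixes taken from the file listing above (gb, gc, ge, gh, gl, gs).
MAP_TYPES = ["gb", "gc", "ge", "gh", "gl", "gs"]


def expected_project_files(project):
    """File names a docking project is expected to contain, per the listing above."""
    names = [f"{project}.dtb", f"{project}_rec.ob"]
    names += [f"{project}_{m}.map" for m in MAP_TYPES]
    return names


def missing_project_files(project, directory="."):
    """Return the expected files that are not present in `directory`."""
    base = Path(directory)
    return [n for n in expected_project_files(project) if not (base / n).exists()]


if __name__ == "__main__":
    missing = missing_project_files("DOCK1")
    if missing:
        print("Missing project files:", ", ".join(missing))
    else:
        print("All expected DOCK1 files are present.")
```

The map files only exist after the map calculation step, so a report of missing *.map files right after receptor setup is expected.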
To run the docking job from a unix shell, use the _dockScan macro with appropriate parameters, e.g.
    $ICMHOME/icm -c /icm/_dockScan abl -a -S confs=10 effort=2. from=10 to=20 outdir=/tmp/ name=ou >& abl10.ou &
You can automate docking setup using _dockBatch and/or immediately start the screen, e.g.
    $ICMHOME/icm $ICMHOME/_dockBatch 1xbb.pdb ligmol=asti
Arguments:
Fully Automated
Optionally, you can run screening automatically by giving the sdf file at the end, e.g.
    $ICMHOME/icm64 $ICMHOME/_dockBatch 1xbb.pdb ligmol=asti chem_database.sdf
Note about the ionization state of ligands set up using _dockBatch
There is a setting within the docking project that controls protonation state handling. When a project is set up by _dockBatch, the default is to apply the built-in pKa model to determine the predominant protonation state. To keep the protonation as in the input sdf, you can turn this off with the _dockBatch -u option (or change it in the .dtb file later). If you just want to see the results of automatic protonation state assignment, you can load your sdf in icm and use the dialog under Chemistry/Set Formal Charges.
The Protein Data Bank, due to unprecedented ignorance, for the last 15 years has not been storing any information about covalent bond types and formal charges of the chemical compounds interacting with proteins! This oversight makes it impossible to automatically convert those molecules to anything sensible and requires your manual interactive assignment of bond types and formal charges for each compound in a pdb-entry. Therefore, if you apply the convert command to a pdb-entry with ligands, the ligands will just become crippled, incomplete molecules which cannot be further conformationally optimized. Follow these steps to properly convert a chemical from its pdb form to a correct icm object.
Setting up a ligand or a set of ligands
Let's now consider the situation when the icm object of the ligand is loaded. An ICM object of the ligand can also be prepared, for instance, by reading the structure from an SD file (menu File/Read Molecule/Mol/SDF) and converting it to ICM (menu MolMechanics/ICM-Convert/Chemical). Once the icm object of the ligand is ready, proceed to docking ligand setup (menu Docking/Ligand Setup/From Loaded ICM object). The ligand setup procedure can be run repeatedly to change the ligand source within the same docking project. The box size and probe position can also be changed later (menu Review/Adjust ligand/box).

At this point, the project is ready for the calculation of maps (menu docking/Make receptor maps). The calculation of the maps typically requires less than 1 minute. While the map calculation dialog allows changing the grid step, we do not recommend altering the default value of 0.5, which was found optimal for a large number of test cases. The maximum van der Waals repulsion parameter can be increased if more rigorous enforcement of steric exclusion is desired. With the map calculations completed, everything is ready to start the actual docking simulation.

A larger set of ligands in a mol file can be considered as a database and indexed with the ICM indexing tool (menu Docking/Tools/Index Mol,Mol2 Database) for fast access. Ligand structures from a mol/mol2 file can be converted to ICM on the fly and do not require the manual preparation necessary in the case of PDB structures.
Use menu docking/Small Set Docking Batch to start docking of one or a few ligands in the background. You can also view the process interactively (menu docking/Interactive Docking), although it is much slower due to the time spent on drawing the molecules. The results of the batch docking job are saved in the following files:
    PROJECTNAME_answers*.ob   # icm-object file with best solutions for each ligand
    PROJECTNAME_*.cnf         # icm conformational stack files with multiple docked conformations
    PROJECTNAME_*.ou          # output file where various messages are stored
Multiple conformations accumulated during the docking of the ligand can be visualized and browsed in ICM (menu Docking/Browse Stack Conformations). Use menu Docking/Display/Preferences to change the default graphic representation of ligand/receptor.
Virtual Ligand Screening (VLS) in ICM is performed by docking each ligand in the database to the receptor structure and then evaluating the docked conformation with a binding-score function. The best-scoring ligands are then stored in a multiple icm-object file. The set-up of VLS is largely identical to the set-up of the docking simulation (see How-to: Ligand Docking Simulations). In most cases the ligand input file will be an SDF or MOL2 file. These files need to be indexed by ICM before they can be used in VLS runs. The index allows fast access to an arbitrary molecular record in a large file. Use menu Docking/Tools/Index Mol/Mol2 File/Database to generate the index, then set up the SDF/MOL2 file as a ligand source (menu Docking/Ligand Setup/From Database). As in docking, the _dockScan ICM script can be run directly from the UNIX shell/command line to start simulations.
An important parameter of the VLS run is the score threshold. The docked conformation for a particular ligand will only be stored by the ICM VLS procedure if its binding score is below the threshold. Edit the project .dtb file to adjust this value:
    #>r DOCK1.r_ScoreThreshold -35.
The choice of the threshold can be made in two ways:
The potential of mean force calculation (pmf) provides an independent score of the strength of the ligand-receptor interaction. The pmf parameters are stored in the icm.pmf file and read with the read pmf s_pmfFile command. There are two types of mf calculation: all-to-all atoms and intermolecular mode. The mode is switched with the mfMethod preference. To enable calculation of the pmf score, add the PROJECTNAME.r_mfScoreThreshold threshold parameter to the table:
    #>r PROJECTNAME.r_mfScoreThreshold 999.
ICM VLS uses a number of criteria to pre-select compounds before docking. Edit the project .dtb file to change their defaults. A full description of each field in the table file can be found here: ftp://ftp.molsoft.com/man/dockTable.pdf
    #>i DOCK1.i_maxHdonors 5
    #>i DOCK1.i_maxLigSize 500
    #>i DOCK1.i_maxNO 10
    #>i DOCK1.i_maxTorsion 10
    #>i DOCK1.i_minLigSize 100
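The parameter lines shown in this document all follow the same visible pattern: a '#>i' or '#>r' tag, a PROJECT.name identifier and a value. A small script can collect them for review before a run. This is a minimal sketch that assumes exactly that three-field layout and nothing more about the .dtb format, which may contain other content the sketch simply skips; parse_table_params is a hypothetical helper, not part of ICM.

```python
def parse_table_params(text):
    """Collect '#>i' (integer) and '#>r' (real) lines into a {name: value} dict.

    Assumes each parameter line has exactly three whitespace-separated fields,
    as in the examples in this document; all other lines are ignored.
    """
    params = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] not in ("#>i", "#>r"):
            continue
        kind, name, raw = parts
        params[name] = int(raw) if kind == "#>i" else float(raw)
    return params


# The lines below are copied from the examples in this document.
sample = """\
#>r DOCK1.r_ScoreThreshold -35.
#>i DOCK1.i_maxHdonors 5
#>i DOCK1.i_maxLigSize 500
#>i DOCK1.i_maxNO 10
#>i DOCK1.i_maxTorsion 10
#>i DOCK1.i_minLigSize 100
"""

for name, value in sorted(parse_table_params(sample).items()):
    print(f"{name} = {value}")
```

To use it on a real project, read the file first, for example parse_table_params(open("DOCK1.dtb").read()), and compare the values against what you intended to set.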
If the database size exceeds several thousand compounds, it is desirable to run a number of VLS jobs in parallel to speed up calculations. Use the from= and to= options of _dockScan to start multiple jobs on different slices of the database, e.g.
    icm _dockScan from=1 to=10000 MYPROJECT
    icm _dockScan from=10001 to=20000 MYPROJECT
    icm _dockScan from=20001 to=30000 MYPROJECT
    ..
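The from=/to= slicing can be generated rather than typed by hand. The following is a minimal sketch that only prints command lines in the form shown above; it does not launch anything, and the helper names stripes and dockscan_commands are hypothetical.

```python
def stripes(first, last, step):
    """Split the inclusive record range first..last into (from, to) chunks of `step` records."""
    if step <= 0 or last < first:
        raise ValueError("need step > 0 and last >= first")
    chunks = []
    start = first
    while start <= last:
        # The final chunk is shortened so it never runs past `last`.
        chunks.append((start, min(start + step - 1, last)))
        start += step
    return chunks


def dockscan_commands(project, first, last, step):
    """Build one _dockScan command line per database slice."""
    return [
        f"icm _dockScan from={lo} to={hi} {project}"
        for lo, hi in stripes(first, last, step)
    ]


for command in dockscan_commands("MYPROJECT", 1, 30000, 10000):
    print(command)
```

The printed lines can be pasted into a shell script or handed to a queueing system; how many slices are worth running at once depends on the number of cores and licenses available.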
Jobs on the Linux cluster are run through the PBS queuing system. Several scripts are provided to facilitate submission of VLS jobs. To submit a single job, use the PBS script 'pbsrun', which is a PBS wrapper for rundock:
    qsub $ICMHOME/pbsrun -v"JOBARGS=-f 1 -t 1000 -o MYPROJECT"
Note that the rundock arguments go in the quotes after JOBARGS= . The qsub command is a part of PBS. To submit multiple jobs, there is a simple shell script 'pbsscan' which executes multiple qsub's for database stripes:
    $ICMHOME/pbsscan MYPROJECT 1 6000 1000
This submits 6 jobs: 1 to 1000, 1001 to 2000, ..., 5001 to 6000. Currently this script only supports the default rundock arguments; copy and edit it to change them. The command qstat is a part of PBS and can be used to check the status of the jobs. In addition, the $ICMHOME/scanstat script can be used to monitor the progress of the VLS jobs. It analyses the *.ou rundock output files:
    $ICMHOME/scanstat *.ou
To delete the jobs, use the PBS command qdel:
    qdel 1234   # deletes job number 1234
Once the compounds are docked, if the VLS option is installed, the procedure evaluates the score and stores it in the 'comment' of the ligand object. When browsing scan answers, the SCORE>... line appears for each object viewed, containing the value of the score and its component terms. It can also be extracted from the icm object in the shell using the Namex( a_1. ) function, and Field() can be used to get a particular component or the total: Field( Namex( a_1. ) "Score=" 1). The SCORE lines also appear in the output file and can be extracted with a simple unix grep command:
    grep SCORE *.ou
The MFScore is calculated if the r_mfScoreThreshold variable is defined in the project .dtb file. It can be added manually:
    #>r PROJECTNAME.r_mfScoreThreshold 999.
The hitlist can also be prepared by a macro. In this case the scores will be extracted.
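When the output files are spread over many jobs, the grep step can also be done from a script that keeps track of which file each SCORE line came from. The sketch below is a plain Python equivalent of grep SCORE *.ou; it makes no assumption about the layout of the SCORE line itself, and the helper names grep_score and score_lines are hypothetical.

```python
import glob


def grep_score(lines):
    """Return the lines that contain the substring 'SCORE', without trailing newlines."""
    return [line.rstrip("\n") for line in lines if "SCORE" in line]


def score_lines(pattern="*.ou"):
    """Return (filename, line) pairs for every SCORE line in files matching `pattern`."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if an output file has odd bytes.
        with open(path, errors="replace") as handle:
            hits.extend((path, line) for line in grep_score(handle))
    return hits


if __name__ == "__main__":
    for filename, line in score_lines("*.ou"):
        print(f"{filename}: {line}")
```

Parsing individual score components out of each line is left out on purpose, since the exact field layout should be taken from your own output files.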
The hits found by the screening procedure and stored in *answers*.ob files can be processed into a hitlist spreadsheet which is browsable and exportable as SDF (menu Docking/Make Hit List...). An older way to export hits as an SD file is menu Docking/Tools/Export scan answers as mol. The score and its components are stored in the resulting SD file as well. Simple analysis of the score distribution can be performed by making a histogram (menu Docking/Tools/Scan results histogram). On the command line, a hitlist can be made with _scanMakeHitlist:
scanMakeHitList "DOCK1" ""//vls/DOCK1_answers*.ob" Name(Name( "//vls/DOCK1_answers*.ob"")) no no yes 0 The logical arguments at the end are:
Please see the GUI Manual for a description of the physics-based score (Score) and the Neural Network score (RTCCN).
After the project, the project directory and the maps have been created, you can start docking different sets of ligands into this receptor. To run docking directly from ICM instead of through an intermediate Unix shell script, use the _dockScan script: run ICM with the script as the first argument and provide all _dockScan arguments after it. Prerequisites: complete these steps of the Docking menu:
The full syntax of the _dockScan script is the following:
    icm _dockScan [ optional arguments ] projName
The arguments can be the following:
Example: Docking an sdf file (first configure the receptor, make the grid maps and set up the ligand input source in the GUI).
    icm _dockScan /home/gpcr/PROJECTNAME -a -S confs=10 effort=3.
This will dock all compounds with 3-fold longer (more thorough) simulations, and rescore up to 10 conformations per ligand. If you have a cluster license without graphics you will need to use the -vlscluster flag after calling icm:
    icm -vlscluster _dockScan /home/gpcr/PROJECTNAME -a -S confs=10 effort=3.
In the /bin directory you will find a script called docksub.icm. This script prepares your docking run and distributes it via the SGE or SLURM job queueing system on your cluster or cloud.
Usage:
    $ICMHOME/icm docksub.icm -vlscluster <chemTable.sdf|.inx> {<dockProj>|<APF_template.mol>} [jobs=100] [-sub] [-apf] [qtype=sge|slurm] [<dockProj_and_dockScan_options>|<APF_options>]
e.g. submit 100 slurm jobs
    icm64 docksub.icm qtype=slurm chemTable.inx DOCK1 jobs=100 -sub
or submit 2 slurm jobs, each of which will use 18 cores
    icm64 docksub.icm input.inx DOCK1 jobs=2 proc=18
GINGER - Graph Internal-coordinate Neural-network conformer Generator with Energy Refinement
Overview: Conformer generation is an essential step of a variety of molecular modeling and computer-assisted drug discovery workflows such as 3D ligand-based virtual screening or fast GPU docking. GINGER (Graph Internal-coordinate Neural-network conformer Generator with Energy Refinement) is Molsoft's new cutting-edge software designed for ultra-rapid high quality conformer library generation on GPUs.
Usage> icm64 _ginger input.sdf|.tsv|.csv header=no smicol=A idcol=B output.sdf|.molt
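As an illustration of the tabular input named in the usage line, the sketch below writes a header-less CSV with SMILES strings in the first column and identifiers in the second. It assumes that smicol=A and idcol=B refer to spreadsheet-style column letters (first and second column); that reading is an assumption, not something this page states. The helper name ginger_csv and the output file name input.csv are hypothetical.

```python
import csv
import io


def ginger_csv(records):
    """Return header-less CSV text with SMILES in column 1 and an identifier in column 2.

    `records` is an iterable of (smiles, identifier) pairs.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for smiles, identifier in records:
        writer.writerow([smiles, identifier])
    return buffer.getvalue()


# Two well-known molecules as example input.
text = ginger_csv([("CCO", "ethanol"), ("c1ccccc1", "benzene")])
print(text, end="")

if __name__ == "__main__":
    # Write the table to disk so it can be passed to the command above.
    with open("input.csv", "w", newline="") as handle:
        handle.write(text)
```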
Example:
RIDGE is a very fast GPU accelerated structure-based docking method.
Options:
We recommend using the most recent NVIDIA GPU available. For optimal performance use a GeForce 4090; RIDGE will also run on older cards (e.g. 3090, 2080).
Please note that multiple .molt files can be screened by giving comma-separated filenames.
Example
Dock 1000 random compounds from /path/to/conf_db.molt with Cartesian minimization (-C)
Unpack the files:
Create a link to MEL library and Markush
Create your docking Project
You can setup your docking project files in the GUI or fully automated in the command line using _dockBatch
STEP 1 - Minimal Enumeration Library (MEL) Docking
Copy docking project files to the 'run' directory
Submit docking job(s) in 'run' directory
Single Machine
Cluster using docksub.icm script in /bin directory
At the end of this stage you expect multiple answer files in the 'run' directory.
STEP 2.1 (Load and Process Hits)
STEP 2.2 (Enumerate)
STEP 3 (Dock final hits)
The GigaScreen method combines machine learning and deep learning tools to tackle the computational intensity of screening very large chemical databases. To overcome these challenges several protocols are employed:
Read more here about GigaScreen.
System Requirements
How to run GigaScreen:
1) Download and install the icm-mxnet package. Contact MolSoft to download this package; a RIDGE GPU license is also required.
Make sure you can run the binary (no missing dependencies)
1.1) Download compressed conformation DB - contact MolSoft to obtain these databases.
2) gigaScreen.icm script is provided with distribution ($ICMHOME/bin/gigaScreen.icm)
2.1) create working directory and copy docking project files
# docking project consists of following files (copy them into working directory)
2.2) run gigaScreen.icm inside working directory
That should perform 5 iterations of RIDGE/Build Model/Predict + Final Docking
All intermediate results will be stored inside 'screen_out' directory
Each stage will create the following files:
* ridge_out_
Final Docking will be saved as screen_out/ridge_final.sdf
Copyright© 1989-2024, Molsoft,LLC - All Rights Reserved. This document contains proprietary and confidential information of Molsoft, LLC. The content of this document may not be disclosed to third parties, copied or duplicated in any form, in whole or in part, without the prior written permission from Molsoft, LLC.