[ Vls overview | Ligand dock faq | Vls faq ]
by Max Totrov and Ruben Abagyan
Docking and virtual ligand screening. Overview. |
[ Receptor | Choosing ligands | Docking timing | Scoring ]
This section concerns with predictions of interactions of drugs
or small biological substrates (less than about 600-700 dalton) to
pockets of larger, more rigid, receptors (typically, protein molecules,
DNA or RNA). There are five major steps in docking and screening.
1. Where to dock. Building Receptor and Pocket Model |
The goal here is to have an adequate three-dimensional model of the
receptor pocket you are planning to dock ligands to. And the pitfalls are that
your model is not accurate overall, or does not reflect the induced fit, or alternative conformations
of the receptor binding pocket are missed.
Receptor from PDB
If you have only a single entry with your receptor, convert the protein with
convertObject yes yes no no , after deleting water molecules and irrelevant chains
(e.g. delete a_!1 ), or use menus as in the ligand docking section.
However, if you have a choice between several templates, take the following into account:
Receptor from homology modeling
A model by homology can be built with the build model command (menu Homology/Build_Model)
followed macro refineModel .
Identifying pockets
If a binding pocket is not known in advance, use icmPocketFinder or icmCavityFinder (for closed pockets) macros.
icmPocketFinder can also be accessed from menu Docking/Receptor Setup , submenu Identify_Binding_Sites
2. What to dock. Ligand, ligands, a database or a library. |
Usually a good start is to try to dock the known ligand(s) to the receptor model.
You may also want to dock a library of compounds in order to identify lead candidates.
In this case the main pitfall is that the library is too restricted, molecules are not chemically
feasible or not drug-like.
Ligand from PDB
Then to dock a ligand from pdb, go through the procedure described in the ligand docking section.
Ligand(s) from a mol/mol2- file, or SMILES strings.
The main prerequisite is that the formal charges and the bond types are correct.
If they are not correct, you need to process each molecule manually as described in the ligand docking section.
From a command line you may use the build smiles or convert2Dto3D macro.
Flexible docking considerations. |
After the receptor maps are built, you will start a docking simulation.
The goal of the flexible docking calculation is prediction of correct binding geometry for each binder.
ICM stochastic global optimization algorithm attempts to find the global minimum of the energy function which
includes five grid potentials describing interaction of the flexible ligand with the receptor
and internal conformational energy of the ligand. During this process a stack of alternative low energy conformations is
saved ( one of the choices in the Docking menu ).
Some facts about ICM docking:
- an average docking time is 1 - 3 minutes per ligand per processor
- ICM docking performed very well in predicting the binding geometry in several comparative benchmarks.
- the time per ligand was chosen to be the smallest possible to allow screening of very large data sets.
To increase the time spent per ligand, change the Docking_thoroughness parameter from the Docking.Small Set Docking Batch
menu to 3. or 5., or supply this parameter to the rundock script directly.
Pitfalls. Inaccurate receptor model, or incorrectly converted ligands, or insufficient optimization effort may
lead to incorrect predictions.
The goal of scoring in virtual ligand screening is to ensure maximal separation
between binders and non-binders , and not to rank a small number of binders according
to their binding energies. The scores can be linearly related to binding energy estimates,
but the transformation parameters need to be calculated from several reference points (see the learn command).
The vls module allows you to access a good scoring function.
How-to: Ligand docking simulations. |
[ Docking intro | Project setup | _confGen | Converting chemicals | Running dock job | DockScan | Template docking ]
Introduction and pre-requisites |
ICM ligand docking procedure performs docking of the fully flexible small-molecule ligand to
a known receptor 3D structure.
Before setting up the docking project, ICM object of the receptor has to be created. In most
cases, x-ray structure of the receptor is initially in the PDB format. Thus, it has to be
converted to the ICM format. This process involves addition of the hydrogen atoms, assignment of
atom types and charges from the residue templates (icm.res) and imposition of internal
coordinates tree (icm-tree) on the original pdb coordinates. The easiest way to convert pdb
structure into icm object is through GUI as follows:
- load pdb file into icm ( menu File/Read Molecule/PDB )
- convert loaded structure into icm object (menu MolMechanics/ICM-convert/Protein ).
It is recommended that "optimize hydrogens" option is selected. To accelerate the procedure,
disable the 3D graphics window (side menu Clear/No Graphics )
When the procedure finishes, converted object is the 'current' object in icm. You can check
the results by displaying the converted structure.
Start project setup by defining the project name (menu Docking/Set project name ).
Avoid spaces and leading digits in the name. All files related to the docking project will be
stored under names, which start from the project name. Most customized parameters will be
saved in the table file under the project name as well:
DOCK1.tab # control table
DOCK1_gb.map # 3D potential grids, or 'maps'
DOCK1_gc.map
DOCK1_ge.map
DOCK1_gh.map
DOCK1_gs.map
DOCK1_rec.ob # receptor object
etc..
The next step is to set up the receptor (menu Docking/Receptor setup ). Select the receptor
molecules, in most cases a_* will do - all molecules in the current object will be included.
Define binding site residues, either manually e.g. a_/123,144,152 for selection by residue
numbers, or graphically using lasso tool (don't forget to set selection level to residue).
This selection is used solely to define boundaries of the docking search and the size of the
grids and doesn't have to be complete, selecting some 4 residues delimiting the binding site
is sufficient. Receptor setup dialog also lets you run binding site identification routine to
quickly locate putative binding sites on your receptor.
After the receptor setup is complete, the program normally displays the receptor with the
selected binding site residues highlighted in yellow xstick representation.
Ligand setup offers a number of ways (submenu docking/ligand setup ) to define the ligand,
depending on the source of the ligand structure(s).
A chemical table or an .sdf file can be converted to 3D and sampled using the conformational generator with
the _confGen script provided with the ICM distribution.
_confGen [-f] [-I] [-s] input.sdf output.sdf .. [thoroughness=1.] [mnconf=50] [vicinity=30]
- thoroughness : the relative mc sampling effort. Increase for more rigorous sampling
- mnconf : the maximal number of conformers per compound
- vicinity : the root-mean-square deviation threshold for the rotatable torsions
- auto= max : number of rotatable bonds to auto switch between systematic and MC
- -q : quiet (suppress warnings)
- -f : force overwriting of the output file
- -s : use systematic search instead of MC
- -r : sample flexible ring systems
- -d : sample cis/trans for double bonds
- -c : improve geometries and energies with cartesian MMFF minimization
- -hydrogen : keep hydrogens in the result table
- -molcart= connect_string : host,user,pass,database
Converting a chemical from pdb. |
The Protein data bank, due to unprecedented ignorance, for the last 15 years
has not been storing any information about covalent bond types and formal charges
of the chemical compounds interacting with proteins! This oversight makes it impossible
to automatically convert those molecules to anything sensible and requires your
manual interactive assignment of bond types and formal charges for each
compound in a pdb-entry. Therefore, if you apply the convert command to
a pdb-entry with ligands, the ligands will just become some crippled incomplete molecules
which can not be further conformationally optimized.
Follow these steps to convert a chemical properly from a pdb form to
an a correct icm object.
- display the molecule, set wireStyle=2 (or via top-left gui-menu),
and selection type to GRAPHICS.selectionMode=1 (the first item of the gui-selection-mode menu)
- invoke MolMechanics.Structure.SetBondType menu item
- graphically select groups of atoms (e.g. a ring) and set appropriate bond type
- invoke the next menu item, MolMechanics.Structure.SetFormalCharge
and set formal charges
- proceed to the MolMechanics.ICM-Convert.Chemical menu (see below)
Setting up a ligand or a set of ligands
Let's now consider the situation when icm object of the ligand loaded.
ICM object of the ligand can also be prepared, for instance, by
reading structure from SD file (menu File/Read Molecule/Mol/SDF ) and converting it to ICM
(menu MolMechanics/ICM-Convert/Chemical ).
Once the icm object of the ligand is ready, proceed
to docking ligand setup (menu Docking/Ligand Setup/From Loaded ICM object ). The ligand setup
procedure will first display the grid box, allowing you to adjust the box dimensions, and
then the 'probe' which defines the initial positioning of the ligand's center of mass and
long/short axis. The probe can be moved/rotated. While its positioning has only minor
influence on the results as long as it remains inside the binding site, it may help the
procedure to find the correct docked orientation more reliably and/or in shorter time.
Ligand setup procedure can be ran repeatedly to change the ligand within the same docking
project. Also box size and probe position can be changed later (menu Review/Adjust ligand/box ).
At this point, the project is ready for the calculation of maps (menu docking/Make receptor maps ).
The calculations generally take several minutes to prepare the maps.
While the dialog allows changing the grid step, we do not recommend altering
the default value of 0.5 which was found optimal for a large number of test cases.
With the map calculations completed, everything is ready to start the actual docking
simulation.
A larger set of ligands in a mol file can be considered as a database and indexed with the ICM indexing tool
(menu Docking/Tools/Index Mol,Mol2 Database ) for fast access. Ligand structures from mol/mol2 file
can be converted to ICM on the fly and do not require manual preparations necessary in the case of PDB structures.
Running docking simulations |
Use menu docking/Small Set Docking Batch to start docking of one or few ligands in the
background. You can also view the process interactively (menu docking/Interactive Docking )
although it is much slower due to the time spent on drawing the molecules. The results of the
batch docking job are saved in the
PROJECTNAME_answers*.ob #icm-object file with best solutions for each ligand
PROJECTNAME_*.cnf # icm conformational stack files with multiple docked conf.
PROJECTNAME_*.ou # output file were various messages are stored.
Multiple conformations accumulated during the docking of the ligand can be visualized and
browsed in ICM (menu Docking/Browse Stack Conformations ). Use menu
Docking/Display/Preferences to change default graphic representation of ligand/receptor.
Running ICM batch docking script with _dockScan |
After the project, the project directory and the maps have been created, you can start docking
different sets of ligands into this receptor. To run it directly by ICM instead of through
an intermediate Unix shell script, use the _dockScan script.
To run the _dockScan script just run ICM and provide the script as the first argument.
All _dockScan arguments need to be provided after it.
Prerequisites:Complete these steps of the Docking menu:
- Receptor Setup
- Make receptor maps
- Ligand setup ( Note: you do not need this step if you dock directly from an .sdf file)
Modify the resulting projFile (e.g. DOCK1.tab ) to your liking if necessary.
The full syntax of the _dockScan script is the following.
icm _dockScan projFile [ optional arguments ]
The arguments could be the following
argument | comment | example
|
---|
| dock according to parameters and ligand source settings from this file
|
input= | dock directly from an sdf file (other modes require modifications of the project.tab file)
|
-a | dock ALL molecules, ignore filters | _dockScan /home/dock/PROJECTNAME -a
|
-d | docking only, NO scoring |
|
-E | evaluates binding score for several poses, resorts them but does save all the stacks in files (convenient for screening) -S |
|
-p | dock probes from the input sdf file, allow overlaps and encourage coverage |
|
-r | RIGID docking |
|
-s | keep multiple docking poses of the ligand |
|
-S | keep the poses and SCORE ALL of them, plus keep the stack conformation file, see also -E |
|
confs= | score only up to poses |
|
from= to= | range of molecules from an indexed .sdf file | from=1 to=10000
|
jobs= | spaws n processes, default is 1 |
|
name= | the result file with poses will be named accordingly (the default is 'answers') |
|
outdir= | directory for the output files |
|
output= | makes a hitlist sdf file |
|
seed=|random seed from the previous docking to reproduce the results exactly |
|
thorough=|a number from 0.5 to 20. (default is 1.) that allows to extend the docking effort |
|
Example:
Docking an sdf file (requires to configure the receptor and make the grid maps).
icm _dockScan /home/clayton/gpcr/PROJECTNAME -S confs=10 thorough=3.
Template constrained docking |
It is possible to use a template object to tether part of the ligand structure
to a preferred position during the docking simulation. Prepare an object file
containing the group of atoms to be tethered to. Edit the .tab docking setup
file, setting the s_templateObj field to the name of the template object file
(it is 'none' by default). The variable l_superByName controls the way
correspondence is established between the ligand and template atoms. If it is
'no', chemical substructure search is performed and tethers imposed according to
the substructure match. If l_superByName is 'yes', simple matching according to
atom names is performed. Tethers can be individually weighted by assigning
b-factor values to the template atoms. Weights are reversely proportional to b-factor,
default b-factor of 20. corresponds to the weight of 1.
How-to: Virtual Ligand Screening |
[ Introduction | Vls threshold | Mf score | Admet selection | Parallelization | Vls cluster | Vls scores storage | Vls results ]
Virtual Ligand Screening (VLS) in ICM is performed via docking of each ligand in the database
to the receptor structure, with subsequent evaluation of the docked conformation with a
binding-score function. Best-scoring ligands are then stored in the multiple icm-object file.
The set-up of the VLS is largely identical to the set-up of the docking simulation (see
How-to: Ligand Docking Simulations). In most cases the ligand input file will be an SDF or
MOL2 file. These files need to be indexed by ICM before they can be used in VLS runs. The
index is used to allow fast access to an arbitrary molecular record in a large file. Use menu
Docking/Tools/Index Mol/Mol2 File/Database to generate the index, then set up the SDF/MOL2
file as a ligand source (menu Docking/Ligand Setup/From Database ). As in docking, rundock
UNIX shell script can be used to start simulations.
An important parameter of the VLS run is the score threshold. Docked conformation for a
particular ligand will only be stored by ICM VLS procedure if its binding-score is below the
threshold. Edit the project .tab file to adjust this value:
#>r DOCK1.r_ScoreThreshold
-35.
The choice of the threshold can be done in two ways:
- based on the scores calculated by docking known ligands.
Generally, a value somewhat above typical score observed for known ligands is a good guess.
- if no ligands are known, a pre-simulation can be run using ~1000 compounds from the target database.
Using the resulting statistics for the scores, the threshold should be set to retain ~1% of the ligands.
Potential of mean force score |
Potential of mean force calculation ( pmf ) provides an independent score of the strength of
ligand-receptor interaction. The pmf-parameters are stored in the icm.pmf file
and read with the read pmf s_pmfFile command.
There are two types of the mf-calculation: all-to-all atoms and intermolecular mode.
The mode is switched with the mfMethod preference.
To enable calculation of the pmf-score, define the PROJECTNAME.r_mfScoreThreshold
threshold paramter to the table:
#>r PROJECTNAME.r_mfScoreThreshold
999.
ICM VLS uses a number of criteria to pre-select compounds before docking. Edit the project .tab file to
change their defaults:
#>i DOCK1.i_maxHdonors
5
#>i DOCK1.i_maxLigSize
500
#>i DOCK1.i_maxNO
10
#>i DOCK1.i_maxTorsion
10
#>i DOCK1.i_minLigSize
100
If the database size exceeds several thousand compounds, it is desirable to run a number of VLS jobs in
parallel to speed up calculations. Use -f and -t options of rundock to start multiple jobs on different
parts of the database, e.g.
rundock -f 1 -t 10000 -o
rundock -f 10001 -t 20000 -o
rundock -f 20001 -t 30000 -o
..
Running VLS jobs in PBS UNIX cluster environment |
Jobs on the Linux cluster are run through PBS queuing system. Several
scripts are provided to facilitate submission of vls jobs. To submit a
single job, use pbs script 'pbsrun', which is a pbs wrapper for rundock
qsub $ICMHOME/pbsrun -v"JOBARGS=-f 1 -t 1000 -o MYPROJECT"
Note that the rundock arguments go in the quotes after JOBARGS= .
The qsub command is a part of PBS.
To submit multiple jobs, there is a simple shell script 'pbsscan' which
executes multiple qsub's for database stripes:
$ICMHOME/pbsscan MYPROJECT 1 6000 1000
-submits 6 jobs, 1 to 1000; 1001 to 2000 ... 5001 to 6000. Currently this
script only supports default rundock arguments, copy/edit to change.
The command qstat is a part of PBS and can be used to check the status of
the jobs. In addition, $ICMHOME/scanstat script can be used to monitor the
progress of the VLS jobs. It analyses the *.ou rundock output files.
$ICMHOME/scanstat *.ou
To delete the jobs, use PBS command qdel:
qdel 1234 # delets job number 1234
Where to find the scores of the compounds |
Once the compounds are docked, if VLS option is installed, the procedure
evaluates the score and stores it in the 'comment' of the ligand object.
When browsing scan answers, the SCORE>... line appears for each object
viewed, containing the value of the score and it's component terms. It can
also be extracted from the icm object in shell using Namex( a_1. )
function, and Field() can be used to get particular component or the
total: Field( Namex( a_1. ) "Score=" 1).
The SCORE lines also appear in the output file and can be extracted by
simple unix grep command
grep SCORE *.ou
The MFScore was recently added, it's calculated if r_mfScoreThreshold
variable is defined in the project .tab file. It can be added manually:
#>r PROJECTNAME.r_mfScoreThreshold
999.
The hitlist can also be prepared by a macro. In this case the scores will be extracted.
The hits found by the screening procedure and stored in *answers*.ob files can be visualized in ICM
(menu Docking/Browse Scan Solutions ).
If necessary, they can also be exported as SD file using
(menu Docking/Tools/Export scan answers as mol ).
The score and its components are stored in the resulting SD file as well.
Simple analysis of the score distribution can be performed by making a histogram
(menu Docking/Tools/Scan results histogram ).