Jul 1 2004
3.4 Sequence, searches and alignments


3.4.1 How to search all Prosite patterns in your sequence

Use the searchSeqProsite macro. The following example runs the find prosite command on a sequence extracted from a PDB structure:
 
  read pdb "2dhf" 
  make sequence a_1.1        # sequence of a PDB structure  
  show sequence 
  find prosite 2dhf_a        # 2dhf_a is the sequence of the protein  
See also find prosite, find pattern and read prosite.
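
To illustrate what a Prosite pattern search does, here is a minimal Python sketch that runs outside ICM. It is not the implementation behind find prosite; it only translates the basic PROSITE syntax (x, [..], {..}, repeat counts, and the < and > anchors) into a regular expression. The function names prosite_to_regex and find_prosite_matches are hypothetical, and rarer syntax variants are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern, e.g. 'C-x(2,4)-C', into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m) count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        # A single letter or an [ABC] set is already valid regex syntax.
        count = ""
        if lo is not None:
            count = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


def find_prosite_matches(pattern: str, sequence: str):
    """Return (1-based start, matched fragment) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]
```

For example, find_prosite_matches("N-{P}-[ST]-{P}", seq) scans a sequence string for the N-glycosylation motif. Overlapping occurrences are not reported by this sketch.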

3.4.2 How to find a fragment in the PDB database ( obsolete )

First, make sure that you have a library of representative ICM objects. The string variable s_qsearchDir should contain the path of this directory relative to the s_dataDir directory. The library may be created and updated with the provided _mkQsearchLib script. Use the qsearch or iqsearch macros: load the object and type qsearch or iqsearch followed by the arguments. You will be prompted for any arguments you omit.
To understand the meaning of the arguments, see the find pdb command.
Examples:
 
 read object s_icmhome+"crn" 
 call s_icmhome+"_qsearch" 
                 # no graphics, just the list of solutions 
# qsearch a_/2:6,14:18                                        
                 # interactive 
 iqsearch a_1crn./2:6,14:18 "xxxxx------xxxxxx" "*" "*" .7 


3.4.3 How to identify binding pockets


There are three algorithms (A, B, and C) with ICM which can identify pockets:
 option  target                                      macro
 A       closed pockets                              icmCavityFinder
 B       almost closed pockets                       make map potential , etc., see below
 C       pockets with good ligand-binding potential  icmPocketFinder
For the areas of space attracting ligands (option C), use two macros:
Example:
 
 read pdb "1a28" 
 delete a_!1,2 
 convert  
 delete a_2 
 icmPocketFinder a_ 3.  

In the following example we find an almost closed pocket that cannot be identified with icmCavityFinder.
 
 read pdb "1fm6"  # read the 'a' chain of RXR 
 delete a_!1,9    # keep the RXR and its ligand only 
 make map potential a_1 Box( a_ 1. ) 1.  # grid size 1. A 
 make grob m_atoms exact 0.1 solid  
 split g_atoms 
 cool a_ 
 display g_atoms2 reverse 
If you have problems identifying pockets, change the grid size or the threshold level for make grob m_atoms , or try converting the object to the ICM type (the conversion adds hydrogens and makes the object denser).



3.4.4 How to find a similar fold or topological motif in the PDB database

Use the searchObjSegment macro, for example:
 
 read object s_icmhome+"crn" 
 searchObjSegment a_1.1 30 3. 
# or 
 read pdb "1pxt" 
 delete a_!1 
 convert 
 searchObjSegment a_1.1 24 6. 
You may need to adjust the seed fragment length and the RMSD parameters for a cleaner list.
The database foldbank.seg is provided and may be recompiled, customized and updated by the supplied _mkSegmentLib script.
See also segment, find segment, write segment, foldbank.seg, How to extract a diverse set of PDB entries, and How to compile a database of protein secondary structures and their folds.

3.4.5 How to generate a non-redundant list of PDB sequences

The following script is a skeleton of the provided script _mkUniqPdbSeqs which is somewhat more automated.
 
 l_commands=no 
 errorAction="none"        # if something goes wrong do not  
                           # interrupt the loop 
 s_pdbDir = "/data/pdb/"   # make sure you have correct path 
 pdbDirStyle = 4           #  
 read sarray s_pdbDir+"/derived_data/index/source.idx" 
                          # you need a list of all pdb-entries  
                          # (4 char. code per line will do) 
 source = Tolower(Trim(Field(source,1))) 
 n=Nof(source) 
 for i=1,n 
   read pdb sequence resolution source[i] 
        # append resolution to the chain name (like 9lyz_a19) 
 endfor 
 group sequence "*" uniqSeqs unique 0.1  
        # cutoff inter-sequence  
        # distance 0.1 (dissimilar by more than 10%) 
# 
# Other possibilities 
# 
#  group sequence uniqSeqs unique 5     # if two seqs differ by more 
#                                       # than 5 mutations  
#  group sequence uniqSeqs unique       # throw away only identical  
#                                       # sequences  
# 
 delete sequences                     # get rid of sequences not  
                                      # included in uniqSeqs 
 
 write sequence s_inxDir + "/pdb1.seq" 
                               # actual sequences for searches 
 write Name(uniqSeqs) "chainList"               
                               # list of protein chains if you need it 
 quit 
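
The idea of keeping only sequences that differ from each other by more than a cutoff can be sketched outside ICM in Python. This is an illustration under stated assumptions, not the algorithm used by group sequence unique: the distance function is a crude stand-in based on difflib rather than a real alignment, the sequences are visited in input order, and the names distance and unique_sequences are hypothetical.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    """Crude stand-in for an alignment-based distance: 1 - similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def unique_sequences(seqs: dict, cutoff: float = 0.1) -> dict:
    """Greedy filter: keep a sequence only if it is farther than `cutoff`
    from every sequence kept so far. `seqs` maps chain name -> sequence."""
    kept = {}
    for name, seq in seqs.items():
        if all(distance(seq, other) > cutoff for other in kept.values()):
            kept[name] = seq
    return kept
```

With cutoff=0.0 only exact duplicates are discarded. Because the filter is greedy, the result depends on input order, so one might sort the chains (for example, best resolution first) before filtering.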


3.4.6 How to merge several pdb files

The simplest way to merge two pdb files is to read them as separate objects and then use the move command (e.g. move a_2. a_1.). Example:
 
 read pdb "1crn" 
 read pdb "1d48" 
 move a_2. a_1.        # merges objects 
 write pdb a_1. "both" # saves both files in pdb format 
 write object a_1.     # saves merged object in compact binary form 

Before or after merging, the objects can also be edited: molecules can be translated to a new position, chains renamed, residue numbers changed, etc. Example:
 
 read pdb "1d48" 
 delete a_w* 
 delete a_2       # delete the second chain 
 read pdb "1crn" 
 delete a_/33:99  # delete a C-term. part of crambin 
 move a_1. a_2.   # merge the remains 
 write object a_   

If you want to re-engineer a polypeptide chain of a protein, using two pdb-files, e.g. to transplant one part of a protein to another and restore the bonding connectivity, you may use the modify command:
 
 read pdb "1crn"  # one pdb 
 read pdb "1cbn"  # similar protein 
 modify a_1./20:25 a_2./20:25   
   # transplants a loop from 2nd object to the 1st one 
 write pdb a_1. "combo" 
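
For readers who need the simple merge from the start of this section without ICM, the following Python sketch concatenates the coordinate records of several PDB-format texts and renumbers the atom serial numbers. It is an assumption-laden illustration, not what the move command does internally: it keeps only ATOM and HETATM records, drops everything else (headers, CONECT, TER), and does not resolve clashing chain identifiers. The name merge_pdb is hypothetical.

```python
def merge_pdb(texts):
    """Concatenate ATOM/HETATM records from several PDB-format strings.

    Atom serial numbers (columns 7-11 of each record) are renumbered
    consecutively; all other columns are copied unchanged.
    """
    out = []
    serial = 0
    for text in texts:
        for line in text.splitlines():
            if line.startswith(("ATOM  ", "HETATM")):
                serial += 1
                out.append(f"{line[:6]}{serial:>5}{line[11:]}")
    out.append("END")
    return "\n".join(out) + "\n"
```

A typical use would be to read two files into strings, pass them as merge_pdb([first, second]), and write the result to a new file.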

3.4.7 How to compile a database of protein secondary structures and their folds

The following script uses the previously compiled list of unique PDB chains and creates two files: foldbank.db, containing the sequences, resolutions, and both the deposited and the automatically assigned secondary structures of the nonredundant set, and foldbank.seg, containing quantitative topology descriptions of the folds. The GAP (Gly-Ala-Pro) residue library builds only the backbones needed by the secondary structure prediction algorithm, which speeds up the PDB->ICM conversion. The foldbank.db file is in the ICM database format, so you can read it into an ICM table shell object, sort the entries, and run searches to create subsets.
 
 l_commands    =no 
 l_info        =no 
 l_confirm     =no 
 errorAction="none" 
 segMinLength  =3 
 mncalls       =300 
 s_icmhome     ="./" 
 s_reslib      ="icmGAP" # Gly-Ala-Pro residue library 
 read library 
       # ...getting the representative list of chains... 
 read sequences s_pdbDir+"/derived_data/pdb_seqres.txt"  
       #make sure to have  _mkUniqPdbSeqs  executed recently 
 li=Name(sequence) 
 delete sequences 
       #...you may modify the method or create your own list... 
 if (Error) quit 
 
 unix mv foldbank.db  foldbank.db.OLD 
 unix mv foldbank.seg foldbank.seg.OLD 
 for i=1,Nof(li) 
   lii=Tolower(li[i]) 
   read pdb lii[1:4]+"."+lii[6]+"/" 
   delete !Mol(a_*/A)                    # delete HET-molecules 
   convert 
   er=r_out 
   rz=Resolution(a_1.) 
   if(rz < 0.01)rz=9.99 
   sx=Sstructure(a_*) 
   assign sstructure 
                      # uncomment the following line, if you'd like 
                      # to save GAP objects. requires GAP subdirectory 
                      # write object "GAP/"+lii[1:4]+lii[6]  
   sprintf "# %d\nNA %s.%s\nRZ %.2f\nER %.3f\nSE %s\nSX %s\nSS %s\n" \ 
            i lii[1:4] lii[6] rz er String(Sequence(a_*)) sx Sstructure(a_*) 
   write append s_out "foldbank.db" 
   assign sstructure segment 
   rename a_2. lii[1:4]  # restore the original pdb-name  
   write append segment "foldbank.seg" 
   delete a_*. 
 endfor 
 quit 
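
The records appended to foldbank.db by the sprintf line above have a simple tagged layout: a "# index" line followed by two-letter fields (NA, RZ, ER, SE, SX, SS). As a hedged illustration, the Python sketch below parses that layout outside ICM. It assumes the file contains only records written exactly as in the script, and the function name parse_foldbank is hypothetical.

```python
def parse_foldbank(text: str):
    """Parse records written as '# i', 'NA ..', 'RZ ..', 'ER ..', 'SE ..', 'SX ..', 'SS ..'."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            # A '# <number>' line opens a new record.
            current = {"index": int(line[2:])}
            records.append(current)
        elif current is not None and len(line) >= 3 and line[2] == " ":
            # Two-letter tag, one space, then the value.
            current[line[:2]] = line[3:]
    for rec in records:
        for key in ("RZ", "ER"):         # numeric fields
            if key in rec:
                rec[key] = float(rec[key])
    return records
```

Once parsed, subsets are easy to select, for example [r["NA"] for r in records if r["RZ"] <= 2.0]. Entries with unknown resolution carry the placeholder value 9.99 set by the script.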


3.4.8 How to search headers of the PDB entries

There is a PDB.tab file that contains one-line header descriptions of all the entries. There are three ways to search it:
  • In unix: grep -i kinase PDB.tab
  • From ICM: you have more possibilities:
     
    read table s_icmhome+"data/inx/PDB.tab"  # or in s_userDir+"inx/" 
    show PDB.head ~ "kinase"   # or 
    show PDB.head ~ "*kinase*" # or 
    show PDB.comp ~ "kinase*"  # regular expressions 
    # You can also 
    a=PDB.head~"kinase" 
    for i=1,Nof(a) 
      nice a.ID[i]             # display the i-th matching entry 
      pause 
      delete a_*. 
    endfor 
    
  • Use the gui (Find.In PDB.By Keyword..)
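
The grep-style search in the first option can also be written as a short Python sketch. This is only an equivalent of a case-insensitive line filter; it assumes nothing about the column layout of PDB.tab beyond one entry per line, and the name search_headers is hypothetical.

```python
def search_headers(lines, keyword: str):
    """Return the lines containing `keyword`, ignoring case (like grep -i)."""
    needle = keyword.lower()
    return [line for line in lines if needle in line.lower()]
```

Any iterable of lines works, so an open file handle can be passed directly, e.g. search_headers(open("PDB.tab"), "kinase").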



Copyright© 1989-2004, Molsoft,LLC - All Rights Reserved.
This document contains proprietary and confidential information of Molsoft, LLC.
The content of this document may not be disclosed to third parties, copied or duplicated in any form,
in whole or in part, without the prior written permission from Molsoft, LLC.