ICM Manual v.3.9
by Ruben Abagyan, Eugene Raush and Max Totrov. Copyright © 2020, Molsoft LLC. Jun 5 2024
[ sarray | script | sequence | segment | (ICM)-shell | site | skin | sln | stack | stick | string | Svariable | Factor | Slide | surface area ]
sarray: an array of strings, e.g. {"one","two","five","minus ten"}
To show an sarray use:
show sarray [simple]
(the simple option skips the header information), or simply type its name and hit 'Enter'.
See also: Sarray , Tostring , read sarray
script (or ICM script): a collection of ICM commands stored in a file (or string) which can be called from the ICM-shell.
Example:
call _demo_fold # find demo_fold file and start the script
ICM scripts, like any other scripts, can also be called directly from the Unix shell if they have a correct
header, e.g.
> cat icmscript
#!/home/icm/icm -s
print "This script name is " + File(last)
quit
> icmscript
This script name is icmscript
Scripts as ICM shell strings
A script can also be stored in a string and run from the GUI. Use the newline character "\n" to separate several commands.
Example:
scr = "print 333\nprint 444"
call string scr
333
444
To make it executable from the GUI by double click, use:
set property command scr # now scr will be visible in Workspace as a script. Double-click on it or right-click for the menu
Script files with Arguments
ICM scripts also understand arbitrary name=value arguments. Some of them can be used
in the script itself and others automatically passed to other scripts. For example:
icm n=2 t="01:20"
These arguments can be extracted inside a script with the Getarg function.
After certain arguments are extracted, all remaining arguments can be passed along
to another script. See this example:
#!/home/ruben/icm/icm -s
#
macro bye
print " Help: "+ File(last) +" a1=.. file=.. [a2=s (3.14)]"
print " a sample script to get three args and pass along remaining arguments"
quit
endmacro
if Nof(Getarg(name))==0 bye
if Nof(NotInList(Getarg(name),{"a1","file"}))!=0 bye # must have a1 and file
a1=Getarg("a1" delete) # a must-have string argument, no defaults
file=Getarg("file" delete )
if !Exist(file) then
print " Error> File "+file+" not found"
bye
endif
a2=Getarg("a2",3.14, delete) # an optional real argument
#
# check the sanity of values and exit if anything is wrong: if .. bye
#
show "These arguments have been extracted and tested: " a1,a2,file
show " unix IcmScript " + Getarg() # all other arguments are passed along
quit
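The extract-and-pass-along pattern in the script above can also be illustrated outside ICM. The Python sketch below is only an analogy: it splits name=value tokens, removes the required and optional arguments, and keeps the rest for forwarding. The helpers split_args and take are hypothetical names invented for this illustration; they are not ICM functions, and Getarg may differ in its details.

```python
def split_args(tokens):
    """Turn ['n=2', 't=01:20'] into {'n': '2', 't': '01:20'}."""
    args = {}
    for tok in tokens:
        name, sep, value = tok.partition("=")
        if not sep or not name:
            raise ValueError("expected name=value, got %r" % tok)
        args[name] = value
    return args


def take(args, name, default=None, convert=str):
    """Remove `name` from args and return its converted value.

    With no default the argument is treated as required.
    """
    if name in args:
        return convert(args.pop(name))
    if default is None:
        raise KeyError("missing required argument: " + name)
    return default


# Hypothetical command line: a1 and file are required, a2 is optional.
args = split_args(["a1=abc", "file=x.ob", "n=2", "t=01:20"])
a1 = take(args, "a1")
file_name = take(args, "file")
a2 = take(args, "a2", default=3.14, convert=float)

# Whatever was not extracted is forwarded unchanged.
passed_along = " ".join("%s=%s" % item for item in args.items())
print(a1, file_name, a2, passed_along)
```

Removing each argument as it is read is what makes the leftover set well defined, which mirrors the role of the delete option in the ICM example.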
If the script runs during an interactive graphics session, the graphics can be
used concurrently with the script execution.
Teaching your vim editor to highlight ICM syntax
Linux/Mac:
1. create ~/.vim/syntax in your HOME
mkdir -p ~/.vim/syntax
2. create a link or copy icm.vim file there
ln -s $ICMHOME/icm.vim ~/.vim/syntax/icm.vim
3. paste the line below into ~/.vimrc (create one if it's not there)
au BufRead,BufNewFile *.icm,_* set filetype=icm
vi/vim/gvim myscript.icm # enjoy syntax highlighting
Vim syntax highlighting for Windows:
1. In your home folder ( '\Documents and Settings\' on XP or '\Users\' on Vista and Win7 )
create folder 'vimfiles' and folder 'syntax' inside it.
2. copy icm.vim (\Program Files\Molsoft LLC\ICM-Pro\icm.vim) there
3. In your home folder create (or edit the existing) file '_vimrc' and paste the line below:
au BufRead,BufNewFile *.icm,_* set filetype=icm
gvim myscript.icm # enjoy syntax highlighting
Running Scripts from ICM molecular documents
To call a script from an html-document inside ICM, use the #icm/script/scriptName path in href :
script1 = "print 1"
set property command script1
MyDocWithLink2Script = "<html><a href=\"#icm/script/script1\">click to run script1</a></html>"
set property html MyDocWithLink2Script
Now the document will contain a highlighted link. Clicking on that link will run script1 .
sequence: an ICM-shell object containing an amino-acid or DNA sequence.
The ICM-shell is tuned to work with very large
sets of millions of genomic sequences at once.
One can read a sequence from a sequence file in different formats,
create it with the Sequence() function or the make sequence
command, or create it by assignment (e.g., aseq = bseq [2:18], new sequence aseq is
a 2:18 fragment of sequence bseq).
A valid amino-acid sequence is an uppercase string of one-character amino-acid codes.
Please distinguish this ICM-shell object from the "sequence" in the ICM-sequence file,
which contains detailed 3 (or 4)-character notations of residues from the icm residue library.
One can concatenate two sequences ( seq1 // seq2 ) and extract a part of a sequence ( seq[15:67] ).
A sequence object may contain a secondary structure
string (e.g. EEE___HHH_) of the same length as the sequence. It is automatically
created by the make sequence command and the Sequence( ) function, or can be set directly
with the set sstructure command. If the logical l_showSstructure
is set to yes, the secondary structure string will be shown in alignments.
Examples:
aseq=Sequence("ASSAARTYIP")
read sequences "aa.seq"
aseq[3:4]="WW"
read object "crn"
crn_seq = Sequence(a_/*)
Resetting sequence type
ICM tries to guess the sequence type. To set the sequence type explicitly, use the
set type [protein|nucleotide] command. E.g.
a=Sequence("AAAATAAAA")
set type a protein # or if you change your mind
set type a nucleotide
Properties of a sequence can be projected to an alignment in which
the sequence participates with the property-transfer form of the Rarray function:
Rarray( R_property,seq_,ali_,r_gapDefault )
The opposite action, i.e. projecting from an alignment to a particular sequence,
can be achieved with another form of the Rarray function:
Rarray( R_ali ali_from seq_ | i_seqNumber )
Functions returning sequences, operating on sequences, or related to sequences:
- Align ( seq seq ) : returns alignment
- Area ( seq ) : returns standard accessible area of a linear chain of those residues
- Distance( seq seq .. ) and Score : derived from pairwise alignment
- IcmSequence ( seq [ s_nter ] [ s_cter ] ) : returns 3-letter sequence for the icm build command.
- Index( seq s_substr ) : position of a subsequence
- Length( seq )
- Mass( seq ) : array of amino acid masses.
- Mol ( seq ) -> ms_ : molecular selection of 3d molecules with identical sequence
- Namex( seq ) : sequence comment/description
- Pattern( seq disulfide ) : Cys pattern eg "ACAACFW" -> "C??C?"
- Reference( seq ) : returns the swissprot database reference
- Rarray( seq , R_26aa_prop ) → R : use Smooth( R window ) if needed.
- Sequence( rs ) : sequence of selected residues
- Sequence( seq reverse ) : sequence in the opposite direction
- Sequence( rs_ ) returns seq_
- Sequence( seq [reverse] ) returns seq_ of DNA reverse complement
- Sequence( ali_Mult ) seq_
- Sequence( ali_Mult i_seq ) = seq_
- Sequence( profile ) = seq_
- Sequence( i_len R26_resProb ) = seq_
- Sequence( I[n]_len R26_resProb ) = seqArr[n]_
- Sstructure ( seq ) : returns or predicts the secondary structure
- Shuffle ( seq ) : randomized sequence of the same length
- Temperature( dnaSeq [ r_DNA_C_nM (0.25) r_Salt_C_mM(50.)] ) ⇒ r_meltingT
- Tostring( seq )
- Tr123( seq | s ) : translate one letter code to three letter code
- Tr321( seq | s ) : the inverse to Tr123
- Trans( seq_nucl [all|frame] ) : T_with_seq_translated_protein_seq
- Trim( seq S_proteinTags ) : extract protein tags described in S_proteinTags sarray.
- Turn ( seq | s ) : R_n_predictedProbOfTurn
- Type ( seq 1) : sequence type
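Two entries in the list above, the sequence in the opposite direction and the DNA reverse complement, are easy to confuse. As a small illustration outside ICM, the Python sketch below shows the difference on a plain string. The function names are hypothetical, and only the four standard bases are handled; how ICM treats ambiguity codes is not covered here.

```python
# Watson-Crick pairing for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse(seq):
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def reverse_complement(dna):
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


dna = "AAAATGC"
print(reverse(dna))             # CGTAAAA
print(reverse_complement(dna))  # GCATTTT
```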
segment: an element of the simplified representation of a protein topology in
terms of its secondary structure elements (Abagyan and Maiorov, 1988).
One element (referred to as a segment) is a vector of the best axis of
the element. Loop segments are represented by a straight line between
the end of the previous segment and the beginning of the next one.
This representation can be used for a fold search through a library
of pre-calculated segment descriptions of the protein topologies (foldbank.seg).
See also ribbonStyle.
(ICM)-shell: a user-friendly, high-level command interpreter combined with a collection of tools
allowing you to interact conveniently with the kernel of the ICM software.
The shell input consists of commands separated by a carriage return or a semicolon, e.g.
read pdb "2ins"
a=1;b=2
[ Site table ]
ICM sequences and molecular objects may contain specific information about
local sequence features, such as the location of binding sites, disulfide bonds, etc.
This information is stored in the feature table (FT)
section of the Swissprot protein sequence entries or after the SITE fields of
pdb files. The sites in the feature table may look like this:
FT ACT_SITE 15 15 ACTIVE SiTE HIS
FT TRANSMEM 309 332 PROBABLE
FT DOMAIN 333 362 CYTOPLASMIC TAIL.
FT DISULFID 125 188 BY SIMILARITY.
We use a one-letter code (the second column) to specify the site type.
The first column shows the priority value, which is used by the display site
command and by the residue selection by site (e.g. a_/F ).
| Priority | Char | SWISSPROT def. | Description
| 4 | A | ACT_SITE | Amino acid(s) involved in the Activity of an enzyme.
| 2 | B | BINDING | Binding site for any chem.group(co-enzyme,prosthetic group...)
| 5 | C | CA_BIND | Extent of a Calcium-binding region.
| 5 | D | DNA_BIND | Extent of a DNA-binding region.
| 4 | F | SITE | Any other Feature on the sequence (i.e. SITE records in PDB).
| 2 | G | CARBOHYD | Glycosylation site.
| 7 | I | INIT_MET | The sequence is known to start with an initiator methionine.
| 2 | L | LIPID | Covalent binding of a Lipidic moiety
| 2 | M | METAL | Binding site for a Metal ion.
| 5 | N | NP_BIND | Extent of a Nucleotide phosphate binding region.
| 6 | O | PROPEP | Extent of a prOpeptide.
| 6 | P | PEPTIDE | Extent of a released active Peptide.
| 5 | R | REPEAT | Extent of an internal sequence Repetition.
| 6 | S | SIGNAL | Extent of a Signal sequence (prepeptide).
| 5 | T | TRANSMEM | Extent of a Transmembrane region.
| 1 | V | VARIANT | Authors report that sequence Variants exist.
| 1 | X | CONFLICT | Different papers report differing sequences.
| 5 | Z | ZN_FING | Extent of a Zinc finger region.
| 6 | c | CHAIN | Extent of a polypeptide Chain in the mature protein.
| 5 | d | DOMAIN | Extent of a Domain of interest on the sequence.
| 3 | e | THIOLEST | ThiolEster bond.
| 1 | m | MUTAGEN | Site which has been experimentally altered.
| 2 | p | MOD_RES | Post-translational modification of a residue.
| 3 | s | DISULFID | DiSulfide bond.
| 3 | t | THIOETH | Thioether bond.
| 1 | v | VARSPLIC | Sequence Variants produced by alternative splicing.
| 6 | z | TRANSIT | Transit peptide(mitochondrial,chloroplastic,cyanelle,microbody)
| 5 | ~ | SIMILAR | Extent of a similarity with another protein sequence.
| 4 | - | NON_CONS | Non consecutive residues.
| 7 | + | NON_TER | The residue at an extremity of seq.is not the terminal res.
| 4 | ? | UNSURE | Uncertainties in the sequence
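To make the mapping between feature-table lines and the one-letter codes concrete, here is a minimal Python sketch, written outside ICM, that parses the sample FT lines shown earlier and looks up the code and priority. The helper parse_ft_line is hypothetical, the lookup covers only four rows copied from the table, and whitespace splitting is an assumption that works for these sample lines; real Swissprot entries use fixed columns.

```python
# One-letter codes and priorities copied from four rows of the table above.
SITE_CODES = {
    "ACT_SITE": ("A", 4),
    "TRANSMEM": ("T", 5),
    "DOMAIN": ("d", 5),
    "DISULFID": ("s", 3),
}


def parse_ft_line(line):
    """Split one feature-table line into its fields (hypothetical helper)."""
    fields = line.split(None, 4)  # FT, key, from, to, free-text description
    if len(fields) < 4 or fields[0] != "FT":
        raise ValueError("not a feature-table line: %r" % line)
    key = fields[1]
    # (None, None) means the key is not in this four-row excerpt.
    char, priority = SITE_CODES.get(key, (None, None))
    return {
        "key": key,
        "from": int(fields[2]),
        "to": int(fields[3]),
        "desc": fields[4].strip() if len(fields) > 4 else "",
        "char": char,
        "priority": priority,
    }


sites = [parse_ft_line(s) for s in (
    "FT ACT_SITE 15 15 ACTIVE SiTE HIS",
    "FT TRANSMEM 309 332 PROBABLE",
    "FT DISULFID 125 188 BY SIMILARITY.",
)]
for site in sites:
    print(site["char"], site["priority"], site["from"], site["to"], site["desc"])
```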
The sites can be
- read from a swissprot entry with the read sequence swiss command
- set to a sequence or a molecular object with the
set site [seq_from [ali_] {seq_|ms_} [only]
command, or with the copy site command
- added one at a time with the
set site s_siteString {seq_|ms_} [only]
command (e.g. set site a_1.1 "FT SITE 15 15 important residue")
- deleted with the
delete site {seq_|ms_} i_siteNumber
command (e.g. delete site a_mol1 1)
- shown with the show sequence swiss command for sequences, and with the
show site {seq_|ms_}
command for objects
- selected (and thereby visualized), when assigned to molecular objects, with the
a_/ F SiteString selection
- written to an object and restored upon reading under the OBJECT.site or OBJECT.auto preference.
The ICM-shell variable l_showSites toggles the appearance of
the site information in the show sequence command.
The sites can be colored with the color site rs_ command, e.g.
color site a_/FA red # features/sites from the active site
Example:
read pdb "1hla" # this object contains Ca atoms of 2 molecules
make bond chain a_//ca # link them into a chain
read sequence swiss web "1A02_HUMAN"
read sequence swiss web "B2MG_HUMAN"
set site a_1 1A02_HUMAN
set site a_2 B2MG_HUMAN
show site B2MG_HUMAN
ds wire a_
ds cpk magenta a_/FV # display variants
ds cpk yellow a_/Fs # display disulfides
The following functions work with the sites of sequences:
Table( seq site )
gives you a table with all sites, e.g.
#>T
#>-key---------fr----------to----------list--------desc-------
TRANSMEM 10 20 "" "predicted tm"
Nof(site seq )
returns the number of sites associated with the sequence. The same number
can be returned by Nof(Table( seq , site ))
Index( site seq iSiteNum )
returns an iarray of the site limits, e.g. Index(site,a,1) returns {10,20}
Retrieving sequence site information
Table( site seq ) → T_allSites
Example:
read pdb "1f88"
make sequence
Table( 1f88_a site)
#>T
#>-key---------fr----------to----------list--------desc-------
CARBOHYD 15 15 "" "glycosylation site "
CARBOHYD 2 2 "" "glycosylation site "
skin: a solid graphical representation of the molecular surface, also referred to as the
Connolly surface.
It is a smooth envelope touching the van der Waals surface of atoms as the solvent probe of the
waterRadius size rolls over the molecule.
"Skin" is important for analysis of recognition, electrostatics, energetics, ligand binding and protein
cavities.
The surface is calculated with a fast analytical contour-buildup
algorithm ( Totrov and Abagyan, 1996) and can be generated as a general
graphics object with the make grob skin command.
'Skin' consists of three types of elements: convex spherical elements, concave
spherical elements, and torus-shaped elements.
ICM allows the calculation of the volume
confined by the 'skin' and its surface area. In the general case skin is defined
by two atom selections:
- atoms the skin is calculated for
- atoms surrounding the atoms from the previous selection
One can calculate/display only a patch within the context of the rest (as_part a_*),
or the skin around one molecule as if the rest did not exist (as_part as_part):
read object "complex"
display a_//ca,c,n
pocket = a_1//!h* & Sphere(a_2//!h*)
display skin pocket a_1//!h* # 5A sphere around the second subunit
set plane 2 # or F2 : to avoid deletion of the previous patch
display skin a_2//!h* a_2//!h* green # ignore everything but the second molecule
Colored molecular surface can be saved as:
ICM can also generate smooth Gaussian surfaces with the following commands:
make map potential Box( a_ 3. ) # build Gaussian map
make grob m_atoms solid exact 0.5 # contour it
display g_atoms # display the envelope grob
sln: Sybyl line notation, a string representation of molecular structure
similar to Smiles. The sln string is returned by the
String( as_ sln ) function.
stack: a set of conformations of a particular object.
Two types of stacks are supported in ICM:
- stacks of conformations of an ICM object stored as sets of internal coordinates
(this type has a subtype in which information about atom masks is added with the store conf atom command)
- stacks of conformations of a non-ICM (PDB) object stored as a set of cartesian coordinates.
The stack can be just a place to store (with the store conf command)
a number of complete descriptions of different conformations
regardless of the way they have been created.
The properties of stack conformations are either set by the search procedure or can
be set manually with the set stack energy|number|all|align Array commands.
The maximal number of stack conformations is determined by the mnconf parameter.
The stack conformations can be created manually in the course of an
interactive procedure, or created automatically as a result of a montecarlo run.
The energies of stack conformations can be shown with the
show stack [all] command.
The stack can be saved
into a .cnf file and read back with the read stack command.
In the Biased Probability Monte Carlo procedure the stack holds the best-energy
representatives of different conformational families (see
Abagyan and Argos, 1992).
The measure of difference (or distance) is defined by the
compare command and the vicinity parameter.
Stack can influence the search via the following variables:
mnvisits, mnhighEnergy, mnreject, visitsAction, highEnergyAction and rejectAction .
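The idea of keeping one best-energy representative per conformational family can be sketched as follows. This Python sketch is a deliberately simplified illustration under stated assumptions, not ICM's actual bookkeeping: a conformation belongs to an existing family if its distance to the stored representative is within vicinity, the stack holds at most max_size entries (playing the role of mnconf), and a full stack replaces its worst entry. The visit and rejection counters controlled by the variables above are ignored, and the function name update_stack is hypothetical.

```python
def update_stack(stack, conf, energy, distance, vicinity, max_size):
    """Offer one conformation to the stack; return the action taken.

    stack is a list of (conformation, energy) pairs, modified in place.
    """
    # Same family as a stored representative: keep the lower-energy one.
    for i, (other, other_energy) in enumerate(stack):
        if distance(conf, other) <= vicinity:
            if energy < other_energy:
                stack[i] = (conf, energy)
                return "replaced"
            return "rejected"
    # New family: add it if there is room.
    if len(stack) < max_size:
        stack.append((conf, energy))
        return "added"
    # Assumption: a full stack gives up its highest-energy entry.
    worst = max(range(len(stack)), key=lambda i: stack[i][1])
    if energy < stack[worst][1]:
        stack[worst] = (conf, energy)
        return "replaced worst"
    return "rejected"


def dist(a, b):
    """Toy distance: a conformation is a single torsion-like number."""
    return abs(a - b)


stack = []
actions = [
    update_stack(stack, 0.0, -1.0, dist, vicinity=0.5, max_size=2),
    update_stack(stack, 0.1, -2.0, dist, vicinity=0.5, max_size=2),
    update_stack(stack, 3.0, -0.5, dist, vicinity=0.5, max_size=2),
    update_stack(stack, 3.2, -0.1, dist, vicinity=0.5, max_size=2),
]
print(actions)
print(stack)
```

In this toy run the second conformation replaces the first because it lies within vicinity and has lower energy, while the fourth is rejected because its family already has a better representative.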
Stack stored in an object
The stack can be assigned to an object and saved/retrieved with the object with the
store stack object and load stack object commands, copied with copy os stack command and deleted with the delete stack object command.
Cartesian stacks
The sets of coordinates from multiple models can be also stored in a special stack with the
read pdb all stack s_multiModelPdbFile command.
The stack conformations can be pushed to a trajectory file with the store frame [ append ] command. Then the trajectory can be displayed in interpolated smooth fashion with the display trajectory command.
stick: a graphical representation of a covalent bond as a solid cylinder.
Its radius is defined by the GRAPHICS.stickRadius ICM-shell variable.
string: may exist in the ICM-shell as a named variable or a constant
(e.g. "1crn", "A b\n c" ). There are a number of predefined string variables
in the ICM-shell. You can concatenate strings ( "aaa" +"bbb" or
"aaa" //"bbb" -> "aaabbb"), sum a string and a number
("aaa"+4.5 -> "aaa4.5" ), compare them ( if ( s_pdbDir ==
"/data/pdb/" ), or if ( s1 > s2 ) ). Strings may be used in
arithmetic expressions, commands and functions.
Examples:
s = "1crn"
s1 = s + ".brk"
if (s != "2ins") print "wrong protein"
Converting a string into an executable command-file
To make an internally stored script executable:
s = "print 'hello'"
set property command s
# or
set property command s auto # for autoexec status
svariable, or ICM-shell variable
a named object stored in the program memory of one of the following types:
integer (i), real (r), string (s), logical (l), preference (p), iarray (I), rarray (R),
sarray (S), matrix (M), sequence (seq), profile (prf), alignment (ali), map (m),
graphics object (grob) (g).
They can be created by direct assignment of a constant
(e.g. a={1 4 3 8} ), of a function result (e.g. a=Iarray(4) ) or
read from a file (e.g. read iarray "a" ).
Most ICM-shell variables can also be written to a file and shown.
They can take part in arithmetic and logical expressions.
For some of the variable types, subsets are defined (e.g. a[2:4]).
structure factor (factor)
a named ICM-shell table containing information about reflections.
A structure factor table header may contain the
maximal absolute values of h, k and l.
#>I igd.HKL
31 36 37
It will be calculated on the fly if absent and
is important for Fourier transformation.
You may also have any number of additional members in the header section for
your convenience, for example, real values for the minimal and maximal
resolution.
The "column" part of a table contains mandatory
integer arrays of h, k and l. Some of the other arrays with fixed names
may be necessary for specific operations. They are:
- fo : real array of observed amplitudes (used by the "xr" term)
- fc : real array of calculated amplitudes.
It is added and updated automatically by the "xr" term calculations.
- ac and bc : real arrays of the Real and Imaginary components of calculated structure factors.
ac and bc may be read from a file, calculated in the ICM-session, and/or
added and updated automatically by the "xr" term calculations.
These two arrays are used as the input arrays for the make map factor command.
- w : real array of weights of individual reflections, which are used, if defined, in the
"xr" term calculations. Note that multiplicity will be taken into account automatically;
do not multiply your weights by it, to avoid double counting.
- free : integer array of zeros and non-zeros to mark reflections for R-free calculations.
Reflections marked with non-zeros will not be used in the
"xr" term calculations. They will be used instead by the Rfree( T_factor) function.
One can add any number of additional arrays to the factor-table.
The table can be read, written, sorted, shown, etc.
You may also use table arithmetics and expressions to
generate new columns and specify subsets.
Examples:
# new columns
group table append F Sqrt(F.ac*F.ac+F.bc*F.bc) \
"fc" Atan2(F.bc,F.ac) "ph_calc"
F.ac = (2*F.fo-F.fc)*Cos(F.ph_calc)
F.bc = (2*F.fo-F.fc)*Sin(F.ph_calc)
make map factor F # 2Fo - Fc map is ready
F1= F.fc > 1. # another table of strong reflections
F2= F.h < 20 & F.k < 30 & F.l < 20 # another subset
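The arithmetic behind the "new columns" example can be checked on a single reflection. The Python sketch below, written outside ICM, derives the calculated amplitude and phase from the real and imaginary components and then forms the 2Fo - Fc coefficients, mirroring the Sqrt, Atan2, Cos and Sin expressions above. The function names and the sample numbers are made up for illustration.

```python
import math


def amplitude_phase(ac, bc):
    """Amplitude and phase (radians) of a calculated structure factor."""
    return math.hypot(ac, bc), math.atan2(bc, ac)


def two_fo_minus_fc(fo, ac, bc):
    """Real and imaginary parts of the (2Fo - Fc) coefficient.

    The calculated phase is kept; only the amplitude is replaced.
    """
    fc, phase = amplitude_phase(ac, bc)
    coeff = 2.0 * fo - fc
    return coeff * math.cos(phase), coeff * math.sin(phase)


# One made-up reflection: Fo = 10, calculated components (3, 4), so Fc = 5.
fc, phase = amplitude_phase(3.0, 4.0)
new_ac, new_bc = two_fo_minus_fc(10.0, 3.0, 4.0)
print(fc, new_ac, new_bc)  # Fc is 5; new components are close to 9 and 12
```

Because the phase is unchanged, the new components are the old ones scaled by (2Fo - Fc)/Fc, here a factor of 3.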
See also:
How to manipulate with structure factors
The command word "factor" serves to read/write the XPLOR formatted
structure-factor-files.
Slide is a recorded state of the Graphics window and other GUI windows.
The slides are added to a table with a single array called slides. Each slide becomes an element:
slideshow.slides[1]
slideshow.slides[2]
..
The slides contain only the display attributes and need the full objects compatible with them
to be present in the ICM shell. The matching occurs by name and number of elements in the object.
Here are the main operations on slides and related parameters.
add slide arguments # add new slide
store slide # replace the current slide
display slide [ i_slideNumber ]
set slide name slideArray s_oldname s_newname
SLIDE.ignoreBackgroundColor - a user preference to ignore the slide background color or background image. May be useful if
you do not like the background of somebody else's slideshow.
SLIDE.ignoreFog - a user preference to skip enforcement of the fog setting (is useful for some graphics cards).
Changing object names in slides. This may be needed if a molecular object, a grob, a table, etc. changed their names in the shell.
set slide name oldname newname
E.g. set slide name "1crn" "1abc"
Replacing other properties from a script.
The simplest way to replace properties of a slide is to run a for loop like this:
for i=1,Nof(slideshow.slides)
display slide i
color residue label black
endfor
Compressing the data and the file for fast network transfer
Do the following:
- compress your grobs (meshes/surfaces) with the compress grob command.
- delete unnecessary objects from the session with the delete all compress command.
- write binary # to gzip the .icb file
- compress the files externally with gzip and rename them from .icb.gz to .icb (ICM will recognize the need to uncompress on the fly).
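The last step, external gzip followed by renaming back to .icb, can be scripted. The Python sketch below is one possible way to do it with the standard library; the function name gzip_in_place and the demo file name are hypothetical, and the demo writes placeholder bytes rather than a real ICM binary.

```python
import gzip
import os
import shutil
import tempfile


def gzip_in_place(path):
    """Gzip the file and keep its original name (e.g. session.icb)."""
    tmp = path + ".gz"
    with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp, path)  # session.icb.gz takes over the name session.icb


# Demo on a throwaway file with placeholder content.
payload = b"placeholder bytes, not a real ICM binary\n" * 100
with tempfile.TemporaryDirectory() as folder:
    demo = os.path.join(folder, "demo.icb")
    with open(demo, "wb") as handle:
        handle.write(payload)
    gzip_in_place(demo)
    with open(demo, "rb") as handle:
        magic = handle.read(2)        # gzip files start with bytes 1f 8b
    with gzip.open(demo, "rb") as handle:
        restored = handle.read()
    compressed_size = os.path.getsize(demo)

print(magic, compressed_size, len(restored))
```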
See also: Image
surface: in the ICM-shell means a solvent-accessible surface (the surface traced by the center of the water sphere).
Important: Do not confuse this surface with the molecular or Connolly surface,
which is referred to as skin.
(see also Acc function, Area function, display skin, display surface,
show area surface, show area skin, show volume surface, "sf" term, set color surface).
Important: There are two ways to calculate the surface area: via the
show area surface or the show energy "sf"
commands. In both cases individual atomic accessibilities are calculated
and assigned to individual atoms. These accessibilities can be shown with the
show as_ command, or can be accessed with the Area( as_) function.
However, the two commands use different atomic radii:
- show area surface
  - uses van der Waals radii as defined in the icm.vwt file
  - calculates areas for all atoms including hydrogens
- show energy "sf"
  - uses special radii designed for calculations of the solvation energy.
    The radii are defined in the icm.hdt file;
  - employs a united atom model, in which hydrogens are ignored and radii increased accordingly;
  - calculates areas only for non-hydrogen atoms.
Examples:
# dipeptide
build string "se nter ala his cooh"
# fill out individual accessibilities
# (incl. hydrogens)
show area surface # takes all atoms w. vdWaals radii into account
show a_//* # look at the accessibilities
show Area(a_//n*) # extract atomic accessibilities for all nitrogens
#
show energy "sf" # only heavy atom accessibilities used in energy calc.
show a_//* # look at these new accessibilities
show Area(a_//n*) # "energy" accessibilities for nitrogens