ICM Manual v.3.9
by Ruben Abagyan,Eugene Raush and Max Totrov
Copyright © 2020, Molsoft LLC
Nov 14 2024

SMILES and SMARTS

SMILES (Simplified Molecular Input Line Entry Specification) is a string notation for chemical structures that stems from traditional string notations for graphs and trees, e.g. the Newick notation. The acronym was introduced by David Weininger to represent the chemical valence model as a string (e.g. CC=O). It can also be used as an exchange format for chemical data. The algorithm was published in 1988 and is described in detail on the web site of Daylight Chemical Information Systems, Inc.: http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html . Another description can be found here: http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification

The SMILES notation allows one to represent a 2D chemical drawing as a string (e.g. "C1CCCCC1" for cyclohexane). The SMARTS notation is an extension of SMILES that allows one to specify chemical patterns with wildcards for atoms or bonds, e.g. "[C,N,O]?" . These chemical patterns can be used in chemical queries and are described here: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
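To make the "string as a graph" idea concrete, here is a minimal Python sketch that turns a small SMILES subset into a list of atoms and bonds. It is an illustration only, not ICM's parser: the function name `parse_simple_smiles`, the supported subset (one-letter atoms, the bond symbols - = # :, branches, single-digit ring closures), and the use of 1.5 as the order of an aromatic bond are all assumptions made for this example.

```python
def parse_simple_smiles(smiles):
    """Parse a small SMILES subset into (atoms, bonds).

    Hypothetical helper for illustration. Supported: one-letter atoms
    (B C N O P S F I and aromatic b c n o p s), bond symbols - = # :,
    branches ( ), and single-digit ring closures.
    """
    bond_orders = {"-": 1, "=": 2, "#": 3, ":": 1.5}
    atoms = []         # atom symbols in order of appearance
    bonds = []         # (atom_index_a, atom_index_b, bond_order)
    branch_stack = []  # atoms to return to when a branch closes
    open_rings = {}    # ring digit -> (atom index, bond order given at opening)
    previous = None    # index of the atom the next atom will bond to
    pending = None     # explicit bond order waiting for its second atom

    def default_order(a, b):
        # Two adjacent aromatic (lowercase) atoms share an aromatic bond.
        return 1.5 if atoms[a].islower() and atoms[b].islower() else 1

    for ch in smiles:
        if ch in bond_orders:
            pending = bond_orders[ch]
        elif ch == "(":
            branch_stack.append(previous)
        elif ch == ")":
            if not branch_stack:
                raise ValueError("unbalanced ')' in " + smiles)
            previous = branch_stack.pop()
        elif ch.isdigit():
            if previous is None:
                raise ValueError("ring closure before any atom")
            if ch in open_rings:
                # Second occurrence of the digit closes the ring.
                start, start_order = open_rings.pop(ch)
                order = pending or start_order or default_order(start, previous)
                bonds.append((start, previous, order))
            else:
                # First occurrence only remembers where the ring opened.
                open_rings[ch] = (previous, pending)
            pending = None
        elif ch in "BCNOPSFI" or ch in "bcnops":
            atoms.append(ch)
            index = len(atoms) - 1
            if previous is not None:
                order = pending or default_order(previous, index)
                bonds.append((previous, index, order))
            previous = index
            pending = None
        else:
            raise ValueError("unsupported character: " + ch)

    if open_rings:
        raise ValueError("unclosed ring in " + smiles)
    return atoms, bonds


atoms, bonds = parse_simple_smiles("C1CCCCC1")  # cyclohexane
print(len(atoms), "atoms,", len(bonds), "bonds")
print(parse_simple_smiles("CC=O"))
```

For cyclohexane the ring-closure digit contributes the sixth bond, joining the last atom back to the first, which is how a linear string encodes a cycle.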

The primitives supported in ICM include the following (note that atom primitives are generally written in brackets, e.g. [Cl] for a chlorine atom):

Symbol        Description                                            Examples
*             any atom                                               *
a             aromatic atom                                          aN(=O)O
A             aliphatic atom                                         AAA
C             aliphatic carbon
c             aromatic carbon
[#6]          any carbon
Dn            number of heavy (non-hydrogen) neighbors               [*;D2] any atom with two non-H connections
Hn            number of attached hydrogens                           [*;H2] atom with two hydrogens (see also Yn)
Rn            number of rings the atom belongs to                    [#6;R2] any carbon in two rings
rn            size of the smallest ring the atom belongs to          [*;r6]
vn            valence: sum of bond orders to all neighbors
Xn            number of all neighbors, including hydrogens
-n            negative charge                                        [--], [-2]
+n            positive charge                                        [++], [+2]
^n            hybridization sp1, sp2, sp3 (ICM extension)            [C;^2] sp2 carbon
yn            ring number in SSSR (ICM extension)                    [*;y1] any atom that belongs to the first ring
Yn            minimum number of attached hydrogens (ICM extension)   [*;Y2] atom with two or more hydrogens
**            attachment point (ICM extension)                       [C**]
#n            atomic number                                          [#7]
@             anticlockwise chirality                                C[C@H](F)O
@@            clockwise chirality
~             any bond                                               C~C
:             aromatic bond                                          c:c
-, =, #       single, double, and triple bonds                       C#C
=&!@          SMARTS bond expression: double bond, not in a ring     acC=&!@Cca
!primitive    negation                                               [!C] any atom that is not an aliphatic carbon, [*;!R] any atom not in a ring
expr1&expr2   logical AND (high precedence)                          [c,n&H1] any aromatic carbon OR an aromatic (pyrrole-type) nitrogen with one hydrogen
expr1,expr2   logical OR                                             [C,N,O] C or N or O
expr1;expr2   logical AND (low precedence)                           [c,n;H1] (aromatic carbon OR aromatic nitrogen) with one hydrogen
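The difference between the two AND operators is easiest to see by evaluating the same expression against a few atoms. The Python sketch below is an assumption-laden illustration, not ICM's matcher: atoms are plain dictionaries, only element symbols, `*`, `Hn`, and `!` are handled, and the names `match_primitive` and `match_atom_expression` are hypothetical. It applies the precedence listed above: `&` binds tightest, then `,`, then `;`.

```python
def match_primitive(primitive, atom):
    """Match one primitive (e.g. 'c', 'N', 'H1', '!C', '*') against an atom dict."""
    if primitive.startswith("!"):
        return not match_primitive(primitive[1:], atom)
    if primitive == "*":
        return True
    if primitive[0] == "H" and primitive[1:].isdigit():
        return atom["hydrogens"] == int(primitive[1:])
    if primitive.islower():
        # Lowercase symbol: aromatic atom of that element.
        return atom["aromatic"] and atom["element"] == primitive.upper()
    # Uppercase symbol: aliphatic atom of that element.
    return (not atom["aromatic"]) and atom["element"] == primitive


def match_atom_expression(expression, atom):
    """Evaluate the text inside [...]: ';' is the outer AND, ',' is OR, '&' is the inner AND."""
    return all(
        any(
            all(match_primitive(p, atom) for p in and_group.split("&"))
            for and_group in or_group.split(",")
        )
        for or_group in expression.split(";")
    )


# Hand-written example atoms (assumed properties, not computed from a structure).
substituted_aromatic_c = {"element": "C", "aromatic": True, "hydrogens": 0}
pyrrole_n = {"element": "N", "aromatic": True, "hydrogens": 1}
pyridine_n = {"element": "N", "aromatic": True, "hydrogens": 0}

for name, atom in [("aromatic C, no H", substituted_aromatic_c),
                   ("pyrrole N", pyrrole_n),
                   ("pyridine N", pyridine_n)]:
    print(name,
          "[c,n&H1]:", match_atom_expression("c,n&H1", atom),
          "[c,n;H1]:", match_atom_expression("c,n;H1", atom))
```

In this sketch the aromatic carbon without hydrogens matches [c,n&H1] (the H1 condition applies only to n) but not [c,n;H1] (the H1 condition applies to the whole OR group).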

Aromatic vs aliphatic

Note that uppercase atom symbols in SMARTS match only aliphatic (not aromatic) atoms. For example, "C" will not match any atom in the "c1ccccc1" (or "C1=CC=CC=C1") ring. To match both cases, use the atomic-number notation [#n]. For example, "[#6]" will match both aliphatic and aromatic carbons.
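The rule can be summarized in a few lines of Python. This is a sketch under stated assumptions, not ICM code: the helper `matches`, the small `ATOMIC_NUMBERS` table, and the precomputed `aromatic` flag are all hypothetical, and aromaticity perception (which is what makes "C1=CC=CC=C1" aromatic) is taken as already done.

```python
ATOMIC_NUMBERS = {"C": 6, "N": 7, "O": 8}  # only the elements used below


def matches(pattern, element, aromatic):
    """Match one atom against 'C', 'c', or '[#n]' style patterns."""
    if pattern.startswith("[#") and pattern.endswith("]"):
        # Atomic-number form ignores the aromatic/aliphatic distinction.
        return ATOMIC_NUMBERS[element] == int(pattern[2:-1])
    if pattern.islower():
        return aromatic and pattern.upper() == element
    return (not aromatic) and pattern == element


# A benzene ring carbon (aromatic) and a methyl carbon (aliphatic).
for pattern in ("C", "c", "[#6]"):
    print(pattern,
          "ring carbon:", matches(pattern, "C", aromatic=True),
          "methyl carbon:", matches(pattern, "C", aromatic=False))
```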

Recursive SMARTS

This SMARTS feature allows you to define an "atomic environment" when matching. "Environment" atoms are not included in the resulting match.

For example [C&!$(C=O)&!$(C#N)] will match any aliphatic carbon not double bonded to an oxygen and not triple bonded to a nitrogen.

Example:


build smiles "CN(CCN(CC(C=CC(=C1)C(=O)NC(C=CC(C)=C2NC(N=CC=C3C(C=CC=N4)=C4)=N3)=C2)=C1)C1)C1" name="gleevec"
display xstick
find chemical a_ "[O,S&v2,N&^2&X2,N&^1&X1,N&^3&X3]" all # find hydrogen bond acceptors
color xstick as_out rgb = { 152 251 152 }
find chemical a_ "[!#6;!H0]" all # find hydrogen bond donors
color xstick as_out rgb = { 238 130 238 }
find chemical a_ "a" all        # aromatic
color xstick as_out rgb = { 255 165 0 }
find chemical a_ "[C&!$(C=O)&!$(C#N),S&^3,#17,#15,#35,#53]" all # hydrophobic
color xstick as_out rgb = { 224 255 255 }
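The recursive pattern [C&!$(C=O)&!$(C#N)] described above can also be read as a predicate on a single atom plus a check of its surroundings. The following Python sketch hand-builds a small molecule graph and applies that reading. It is an illustration under assumptions, not ICM's matching code: the helper names are hypothetical, the molecule (CC(=O)CC#N) is chosen for the example, and every atom is taken to be aliphatic.

```python
def has_neighbor(index, atoms, bonds, element, order):
    """True if atom `index` is bonded to an `element` atom with the given bond order."""
    for a, b, bond_order in bonds:
        if index in (a, b) and bond_order == order:
            other = b if a == index else a
            if atoms[other] == element:
                return True
    return False


def matches_recursive_pattern(index, atoms, bonds):
    """Emulate [C&!$(C=O)&!$(C#N)] for an all-aliphatic molecule."""
    return (atoms[index] == "C"
            and not has_neighbor(index, atoms, bonds, "O", 2)   # !$(C=O)
            and not has_neighbor(index, atoms, bonds, "N", 3))  # !$(C#N)


# CC(=O)CC#N written out by hand: atom symbols and (a, b, order) bonds.
atoms = ["C", "C", "O", "C", "C", "N"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1), (4, 5, 3)]

hits = [i for i in range(len(atoms)) if matches_recursive_pattern(i, atoms, bonds)]
print(hits)  # indices of the matching carbons
```

Only the carbon itself is reported for each hit. The oxygen and nitrogen are consulted to accept or reject a carbon but never appear in the result, which is the sense in which "environment" atoms are excluded from the match.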

R-groups, attachment points, and chemical searches.

In reactions, Markush structures, and building blocks, two additional wildcards are used:

Group                     Example
R1, R2, ... groups        [R1]N(=O)O
[C*] attachment points

Do not confuse the any-atom wildcard, e.g. [*], with an attachment point, where the asterisk follows an atom symbol in square brackets, e.g. [C*]. In the following example an attachment point is added to a carbon attached to a ring:


Replace( Chemical( "C(=CC=CC1)C=1C" ), "Cc(:c):c" "[C*]c(c)c" exact ) 

See also:

  • Smiles function and the build smiles command.
  • Nof( chemarray s_smart ) # the number of found fragments
  • Index( chemarray s_smart )
  • Replace( chemarray s_smartFrom s_smartReplace [ exact ] )
  • modify chemarray s_smartFrom s_smartTo
  • find molcart table= s_tableName s_smart [ name= s_outputTable ]



Copyright© 1989-2024, Molsoft,LLC - All Rights Reserved. This document contains proprietary and confidential information of Molsoft, LLC. The content of this document may not be disclosed to third parties, copied or duplicated in any form, in whole or in part, without the prior written permission from Molsoft, LLC.