ICM Language Reference
SMILES and SMARTS

SMILES (Simplified Molecular Input Line Entry Specification) stems from traditional string notations for graphs and trees, such as the Newick notation. It was introduced by David Weininger to represent a chemical valence model as a string (e.g. CC=O for acetaldehyde), and it is also used as an exchange format for chemical data. The specification was published in 1988 and is described in detail on the web site of Daylight Chemical Information Systems, Inc. (http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html). Another description can be found here: http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification

The SMILES notation represents a 2D chemical drawing as a string (e.g. "C1CCCCC1" for cyclohexane). SMARTS is an extension of SMILES that adds wildcards for atoms and bonds, so that it can specify chemical patterns, e.g. "[C,N,O]" (an aliphatic C, N or O atom). Such patterns can be used in chemical queries; SMARTS is described here: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

The primitives supported in ICM include the following (note that atom primitives are generally written in square brackets, e.g. [Cl] for a chlorine atom):
Symbol         Description                                    Example
*              any atom                                       *
a              aromatic atom                                  aN(=O)O
A              aliphatic atom                                 AAA
C              aliphatic carbon
c              aromatic carbon
[#6]           any carbon, aliphatic or aromatic
Dn             number of heavy-atom (non-H) neighbors         [*;D2] any atom with two non-H connections
Hn             number of attached hydrogens                   [*;H2] atom with exactly two hydrogens
Yn             at least n attached hydrogens                  [*;Y2] atom with two or more hydrogens (ICM extension)
Rn             number of rings the atom belongs to            [#6;R2] any carbon in two rings
rn             size of the smallest ring the atom belongs to  [*;r6] atom whose smallest ring is six-membered
vn             valence (sum of the bond orders of all bonds to the atom)
Xn             number of all neighbors, including hydrogens
-n             negative charge                                [--], [-2]
+n             positive charge                                [++], [+2]
^n             hybridization: 1 = sp, 2 = sp2, 3 = sp3        [C;^2] sp2 carbon (ICM extension)
yn             index of a ring in the SSSR                    [*;y1] any atom in the first SSSR ring (ICM extension)
#n             atomic number                                  [#7]
@              anticlockwise chirality                        C[C@H](F)O
@@             clockwise chirality
~              any bond                                       C~C
:              aromatic bond                                  c:c
-, =, #        single, double and triple bonds                C#C
!primitive     negation                                       [!C] any atom except an aliphatic carbon; [*;!R] any atom not in a ring
expr1&expr2    logical AND (high precedence)                  [c,n&H1] aromatic carbon, or aromatic nitrogen with one H (e.g. pyrrole NH)
expr1,expr2    logical OR                                     [C,N,O] C or N or O
expr1;expr2    logical AND (low precedence)                   [c,n;H1] aromatic carbon or aromatic nitrogen, in either case with one H
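To make the operator precedence concrete, here is a minimal Python sketch of how a bracketed atom expression could be evaluated against a single atom. It is a toy, not ICM's matcher: the `Atom` record and the `match_primitive` and `match_expr` helpers are hypothetical, only a few primitives (`*`, `a`, `A`, element symbols, `#n`, `Dn`, `Hn`, `Xn` and `!`) are supported, primitives must be joined by explicit operators, recursive `$(...)` is not handled, and aromaticity is assumed to be already perceived.

```python
import re
from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str        # element symbol, e.g. "C" or "N"
    number: int        # atomic number
    aromatic: bool     # aromaticity, assumed already perceived
    heavy_degree: int  # D: number of non-hydrogen neighbors
    h_count: int       # H: number of attached hydrogens


def match_primitive(atom: Atom, prim: str) -> bool:
    """Evaluate one primitive, optionally prefixed by '!' (negation)."""
    if not prim:
        raise ValueError("empty primitive")
    if prim.startswith("!"):
        return not match_primitive(atom, prim[1:])
    if prim == "*":
        return True
    if prim == "a":
        return atom.aromatic
    if prim == "A":
        return not atom.aromatic
    m = re.fullmatch(r"#(\d+)", prim)
    if m:
        return atom.number == int(m.group(1))
    m = re.fullmatch(r"([DHX])(\d*)", prim)
    if m:
        n = int(m.group(2)) if m.group(2) else 1  # a bare "H" is read as H1
        counts = {"D": atom.heavy_degree,
                  "H": atom.h_count,
                  "X": atom.heavy_degree + atom.h_count}
        return counts[m.group(1)] == n
    # Element symbol: uppercase matches aliphatic atoms, lowercase aromatic ones
    if prim[0].isupper():
        return not atom.aromatic and atom.symbol == prim
    return atom.aromatic and atom.symbol == prim.capitalize()


def match_expr(atom: Atom, expr: str) -> bool:
    """';' binds loosest, then ',', then '&' (tightest binary operator)."""
    body = expr.strip("[]")
    return all(
        any(all(match_primitive(atom, p) for p in alt.split("&"))
            for alt in conj.split(","))
        for conj in body.split(";")
    )


# Three hypothetical aromatic atoms
substituted_c = Atom("C", 6, True, 3, 0)  # ring carbon bearing a substituent
pyridine_n = Atom("N", 7, True, 2, 0)
pyrrole_nh = Atom("N", 7, True, 2, 1)

for name, atom in [("substituted c", substituted_c),
                   ("pyridine n", pyridine_n),
                   ("pyrrole nH", pyrrole_nh)]:
    print(name, match_expr(atom, "[c,n&H1]"), match_expr(atom, "[c,n;H1]"))
```

This prints `substituted c True False`, `pyridine n False False` and `pyrrole nH True True`: the two expressions differ only for the substituted aromatic carbon, which `[c,n&H1]` accepts through its `c` branch, while in `[c,n;H1]` the low-precedence `;H1` applies to both alternatives and rejects it.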

Aromatic vs aliphatic

Note that uppercase atom symbols in SMARTS match only aliphatic (non-aromatic) atoms. For example, "C" will not match any atom of the benzene ring "c1ccccc1" (also written "C1=CC=CC=C1"). To match both cases, use the atomic-number notation [#n]: for example, "[#6]" matches both aliphatic and aromatic carbons.
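As a toy illustration (a hypothetical Python sketch, not ICM code), the three carbon patterns can be compared on toluene, Cc1ccccc1, assuming aromaticity has already been perceived so that each atom carries an aromatic flag. The `matches` helper below is illustrative and understands only these three patterns.

```python
from collections import namedtuple

# Toy atom record: element symbol plus an aromaticity flag (assumed already perceived)
Atom = namedtuple("Atom", "symbol aromatic")

# Toluene, Cc1ccccc1: one aliphatic methyl carbon and six aromatic ring carbons
toluene = [Atom("C", False)] + [Atom("C", True)] * 6


def matches(atom, pattern):
    """Hypothetical single-atom matcher for the three carbon patterns only."""
    if pattern == "C":     # uppercase: aliphatic carbon only
        return atom.symbol == "C" and not atom.aromatic
    if pattern == "c":     # lowercase: aromatic carbon only
        return atom.symbol == "C" and atom.aromatic
    if pattern == "[#6]":  # atomic number 6: aromaticity is ignored
        return atom.symbol == "C"
    raise ValueError(f"unsupported pattern: {pattern}")


for pattern in ("C", "c", "[#6]"):
    print(pattern, sum(matches(a, pattern) for a in toluene))
```

The counts are 1 for "C" (only the methyl carbon), 6 for "c" and 7 for "[#6]".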

Recursive SMARTS

Recursive SMARTS, written $(...), lets you define the "atomic environment" an atom must have in order to match. The environment atoms themselves are not included in the resulting match.

For example, [C&!$(C=O)&!$(C#N)] matches any aliphatic carbon that is neither double-bonded to an oxygen nor triple-bonded to a nitrogen.
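The following Python sketch mimics how this pattern behaves on a small hypothetical molecule, N#CCC=O (3-oxopropanenitrile), stored as an explicit graph. Each $(...) clause is hand-translated into a neighbor check (the helper names are illustrative, not ICM functions), and only the root carbons are reported, mirroring the rule that environment atoms are not part of the match.

```python
# Hypothetical toy molecule N#CCC=O (3-oxopropanenitrile), stored as a graph.
atoms = ["N", "C", "C", "C", "O"]                     # index -> element (all aliphatic)
bonds = [(0, 1, 3), (1, 2, 1), (2, 3, 1), (3, 4, 2)]  # (atom, atom, bond order)


def neighbors(i):
    """Yield (neighbor index, bond order) pairs for atom i."""
    for a, b, order in bonds:
        if a == i:
            yield b, order
        elif b == i:
            yield a, order


def has_environment(i, order, element):
    """Stand-in for $(C<bond>X): atom i has an `element` neighbor bonded with `order`."""
    return any(o == order and atoms[j] == element for j, o in neighbors(i))


def match_c_not_carbonyl_not_nitrile():
    """Evaluate [C&!$(C=O)&!$(C#N)], returning only the matching root atoms."""
    return [i for i, element in enumerate(atoms)
            if element == "C"
            and not has_environment(i, 2, "O")
            and not has_environment(i, 3, "N")]


print(match_c_not_carbonyl_not_nitrile())  # -> [2]
```

The nitrile carbon (index 1) and the aldehyde carbon (index 3) are rejected by the recursive clauses, and the O and N atoms that were inspected do not appear in the result.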

Example: highlighting hydrogen bond acceptors, donors, aromatic and hydrophobic atoms of Gleevec:


build smiles "CN(CCN(CC(C=CC(=C1)C(=O)NC(C=CC(C)=C2NC(N=CC=C3C(C=CC=N4)=C4)=N3)=C2)=C1)C1)C1" name="gleevec"
display xstick
find chemical a_ "[O,S&v2,N&^2&X2,N&^1&X1,N&^3&X3]" all # find hydrogen bond acceptors
color xstick as_out rgb = { 152 251 152 }
find chemical a_ "[!#6;!H0]" all # find hydrogen bond donors
color xstick as_out rgb = { 238 130 238 }
find chemical a_ "a" all        # aromatic
color xstick as_out rgb = { 255 165 0 }
find chemical a_ "[C&!$(C=O)&!$(C#N),S&^3,#17,#15,#35,#53]" all # hydrophobic
color xstick as_out rgb = { 224 255 255 }
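The acceptor and hydrophobic patterns above pack several alternatives into one bracket. As an illustration only (this is not how ICM parses SMARTS), the Python sketch below, with hypothetical helpers `split_top_level` and `grouping`, splits a bracketed atom expression by operator precedence, treating `;` as the loosest, then `,`, then `&`, and never splitting inside a recursive `$(...)` group.

```python
def split_top_level(expr, sep):
    """Split expr on sep, ignoring separators nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return parts


def grouping(atom_smarts):
    """Return the ';' -> ',' -> '&' nesting of a bracketed SMARTS atom."""
    body = atom_smarts[1:-1]  # drop the enclosing square brackets
    return [[split_top_level(alt, "&") for alt in split_top_level(conj, ",")]
            for conj in split_top_level(body, ";")]


acceptor = "[O,S&v2,N&^2&X2,N&^1&X1,N&^3&X3]"
for alternative in grouping(acceptor)[0]:
    print(" AND ".join(alternative))
```

Read this way, the acceptor definition is: an aliphatic oxygen; or an aliphatic sulfur with valence 2; or an aliphatic nitrogen that is sp2 with two neighbors, sp with one neighbor, or sp3 with three neighbors (hydrogens included in the neighbor count X).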

R-groups, attachment points, and chemical searches

In reactions, Markush structures and building blocks, two additional wildcards are used:
Symbol            Description          Example
[R1], [R2], ...   R-groups             [R1]N(=O)O
[C*]              attachment point     [C*]c(c)c
Do not confuse the any-atom wildcard, [*], with an attachment point, in which the asterisk follows an atom symbol inside the square brackets, e.g. [C*]. In the following example, an attachment point is added to a carbon attached to a ring:


Replace( Chemical( "C(=CC=CC1)C=1C" ), "Cc(:c):c" "[C*]c(c)c" exact ) 
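To make the distinction concrete, the following Python sketch classifies bracketed atoms lexically into the three cases discussed here. It is a hypothetical helper, not part of ICM and not a SMARTS parser; note that it assumes the reaction/Markush context, because in an ordinary SMARTS query a bracket such as [R2] means "atom in two rings" rather than an R-group.

```python
import re

# Matches one bracketed atom and captures its contents
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")


def classify(bracket_body):
    """Hypothetical lexical classifier, assuming a reaction/Markush context."""
    if re.fullmatch(r"R\d+", bracket_body):
        return "R-group"
    if re.fullmatch(r"[A-Z][a-z]?\*", bracket_body):
        return "attachment point on " + bracket_body[:-1]
    if bracket_body.startswith("*"):
        return "any atom"
    return "other"


def bracket_atoms(smarts):
    """List (bracket text, classification) pairs for every bracketed atom."""
    return [(m.group(0), classify(m.group(1))) for m in BRACKET_ATOM.finditer(smarts)]


for s in ["[R1]N(=O)O", "[C*]c(c)c", "[*;D2]"]:
    print(s, bracket_atoms(s))
```

The asterisk alone (optionally followed by further primitives, as in [*;D2]) is the any-atom wildcard, whereas an element symbol immediately followed by an asterisk, as in [C*], marks an attachment point on that atom.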
