ICM Language Reference: SMILES and SMARTS
SMILES (Simplified Molecular Input Line Entry Specification) is a string notation for molecules that stems from traditional string notations for graphs and trees, e.g. the Newick notation. It was introduced by David Weininger to represent the chemical valence model as a string (e.g. CC=O). It can also be used as an exchange format for chemical data. The algorithm was published in 1988 and is described in detail on the website of Daylight Chemical Information Systems, Inc.: http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html . Another description can be found here: http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification
The SMILES notation represents a 2D chemical drawing as a string (e.g. "C1CCCCC1" for cyclohexane). The SMARTS notation is an extension of SMILES that adds wildcards for atoms and bonds so that chemical patterns can be specified, e.g. "[C,N,O]?" . These chemical patterns can be used in chemical queries. SMARTS is described here: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
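To make the ring notation in "C1CCCCC1" concrete, here is a minimal Python sketch (not ICM code) that pairs up ring-closure digits in a SMILES string. The function name `ring_closures` is hypothetical, and the sketch assumes single-digit ring labels only: it skips digits inside square brackets and does not handle the two-digit `%nn` form.

```python
def ring_closures(smiles):
    """Return (open_pos, close_pos) character positions for each ring-closure pair.

    Assumption: single-digit ring labels only; '%nn' labels are not handled.
    """
    open_digits = {}      # ring label -> position where the ring was opened
    pairs = []
    in_bracket = False    # digits inside [...] are isotopes, H counts or charges
    for pos, ch in enumerate(smiles):
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif ch.isdigit() and not in_bracket:
            if ch in open_digits:
                # Second occurrence of the label closes the ring.
                pairs.append((open_digits.pop(ch), pos))
            else:
                open_digits[ch] = pos
    if open_digits:
        raise ValueError("unclosed ring label(s): " + ", ".join(sorted(open_digits)))
    return pairs


print(ring_closures("C1CCCCC1"))  # cyclohexane: one ring, label 1
print(ring_closures("CC=O"))      # acetaldehyde: no rings
```

Each pair marks the two atoms joined by the ring-closing bond: the digit follows the atom it belongs to, so in "C1CCCCC1" the first and the last carbon are bonded.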
The primitives supported in ICM include the following (note that the atom primitives in general are in brackets, e.g. [Cl] for a chlorine atom):
Symbol | Description | Examples |
---|---|---|
* | any atom | * |
a | aromatic | aN(=O)O |
A | aliphatic | AAA |
C | aliphatic carbon | |
c | aromatic carbon | |
[#6] | any carbon | |
Dn | the number of heavy neighbors | [*;D2] any atom with two non-H connections |
Hn | number of attached hydrogens | [*;H2] atom with two hydrogens |
Yn | minimum number of attached hydrogens (n or more) | [*;Y2] atom with two or more hydrogens # ICM extension |
Rn | the number of rings the atom belongs to | [#6;R2] any carbon in two rings |
rn | the size of smallest ring the atom belongs to | [*;r6] |
vn | valence: the sum of bond orders to all neighbors | |
Xn | the number of all neighbors including hydrogens | |
-n | negative charge | [--], [-2] |
+n | positive charge | [++], [+2] |
^n | sp1,sp2,sp3 hybridization | e.g. [C;^2] sp2 carbon # ICM extension |
yn | ring number in SSSR | e.g. [*;y1] any atom which belongs to the first ring # ICM extension |
#n | atomic number | [#7] |
@ | anticlockwise chirality | C[C@H](F)O |
@@ | clockwise chirality | |
~ | any bond | C~C |
: | aromatic bond | c:c |
-,=,# | single, double and triple bonds | C#C |
!primitive | negation | [!C] any atom that is not an aliphatic carbon, [*;!R] any atom not in a ring |
expr1&expr2 | logical and (high precedence) | [c,n&H1] any aromatic carbon OR H-pyrrole nitrogen |
expr1,expr2 | logical or | [C,N,O] C or N or O |
expr1;expr2 | logical and (low precedence) | [c,n;H1] (aromatic carbon OR aromatic nitrogen) with one hydrogen |
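
The difference between the high-precedence `&` and the low-precedence `;` is easiest to see by evaluating both forms against the same atom. The following Python sketch is a toy evaluator, not ICM's matcher: the function names and the atom dictionary layout are assumptions made for illustration, and only a few primitives (`*`, `Hn`, `#n`, `C`/`N`/`O`, `c`/`n`/`o`, `!`) are supported. Implicit "and" between adjacent primitives is not handled.

```python
def match_primitive(prim, atom):
    """Test one primitive against an atom described as a dict (hypothetical layout)."""
    if prim.startswith("!"):                       # negation binds tightest
        return not match_primitive(prim[1:], atom)
    if prim == "*":
        return True
    if prim[0] == "H" and prim[1:].isdigit():      # Hn: attached hydrogens
        return atom["hydrogens"] == int(prim[1:])
    if prim[0] == "#" and prim[1:].isdigit():      # #n: atomic number
        return atom["atomic_number"] == int(prim[1:])
    if prim in ("c", "n", "o"):                    # lowercase: aromatic only
        return atom["aromatic"] and atom["symbol"] == prim.upper()
    if prim in ("C", "N", "O"):                    # uppercase: aliphatic only
        return (not atom["aromatic"]) and atom["symbol"] == prim
    raise ValueError("unsupported primitive: " + prim)


def match_atom_expr(expr, atom):
    """Evaluate ';' (loosest), then ',', then '&' (tightest)."""
    expr = expr.strip("[]")
    return all(
        any(
            all(match_primitive(p, atom) for p in and_term.split("&"))
            for and_term in or_term.split(",")
        )
        for or_term in expr.split(";")
    )


# An aromatic carbon with no hydrogens (e.g. a substituted ring carbon).
bare_c = {"symbol": "C", "aromatic": True, "hydrogens": 0, "atomic_number": 6}

print(match_atom_expr("[c,n&H1]", bare_c))  # True: 'c' alone satisfies the OR
print(match_atom_expr("[c,n;H1]", bare_c))  # False: H1 applies to both alternatives
```

With `&`, the hydrogen count constrains only the nitrogen alternative; with `;`, it constrains the whole OR group.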
Aromatic vs aliphatic
Note that uppercase atom symbols in SMARTS match only aliphatic (not aromatic) atoms.
For example, "C" will not match any atom in the benzene ring "c1ccccc1" (or its Kekulé form "C1=CC=CC=C1").
If you want to match both cases you should use [#
Recursive SMARTS
This SMARTS feature allows you to define an "atomic environment" when matching. "Environment" atoms
are not included in the resulting match.
For example, [C&!$(C=O)&!$(C#N)] will match any aliphatic carbon that is not double bonded to an oxygen and
not triple bonded to a nitrogen.
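
The rule expressed by [C&!$(C=O)&!$(C#N)] can be sketched outside ICM in plain Python. This is an illustration only, not how ICM evaluates recursive SMARTS: the molecule is a hand-built graph, the function name is hypothetical, and every atom is assumed to be aliphatic. Note that the result lists only the matched carbons; the oxygen or nitrogen that defines the environment never appears in it.

```python
def carbons_not_carbonyl_or_nitrile(atoms, bonds):
    """Indices of carbons that are neither double bonded to O nor triple bonded to N.

    atoms: list of element symbols (all assumed aliphatic)
    bonds: list of (i, j, order) tuples with 0-based atom indices
    """
    excluded = set()
    for i, j, order in bonds:
        # Check the bond from both ends, since either atom may be the carbon.
        for a, b in ((i, j), (j, i)):
            if atoms[a] != "C":
                continue
            if (order == 2 and atoms[b] == "O") or (order == 3 and atoms[b] == "N"):
                excluded.add(a)
    return [idx for idx, sym in enumerate(atoms) if sym == "C" and idx not in excluded]


# Acetaldehyde, CC=O: only the methyl carbon (index 0) matches.
print(carbons_not_carbonyl_or_nitrile(["C", "C", "O"], [(0, 1, 1), (1, 2, 2)]))

# Acetonitrile, CC#N: again only the methyl carbon matches.
print(carbons_not_carbonyl_or_nitrile(["C", "C", "N"], [(0, 1, 1), (1, 2, 3)]))
```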
Example:
build smiles "CN(CCN(CC(C=CC(=C1)C(=O)NC(C=CC(C)=C2NC(N=CC=C3C(C=CC=N4)=C4)=N3)=C2)=C1)C1)C1" name="gleevec"
display xstick
find chemical a_ "[O,S&v2,N&^2&X2,N&^1&X1,N&^3&X3]" all # find hydrogen bond acceptors
color xstick as_out rgb = { 152 251 152 }
find chemical a_ "[!#6;!H0]" all # find hydrogen bond donors
color xstick as_out rgb = { 238 130 238 }
find chemical a_ "a" all # aromatic
color xstick as_out rgb = { 255 165 0 }
find chemical a_ "[C&!$(C=O)&!$(C#N),S&^3,#17,#15,#35,#53]" all # hydrophobic
color xstick as_out rgb = { 224 255 255 }
R-groups, attachment points, and chemical searches.
In reactions, Markush structures and building blocks, two additional wildcards
are used:
Group | Example |
---|---|
R1, R2, ... groups | [R1]N(=O)O |
attachment points | [C*] |
Do not confuse the any-atom wildcard, e.g. [*], with an attachment point, where the
asterisk follows an atom symbol in square brackets, e.g. [C*].
Example in which an attachment point is added to a carbon attached to a ring:
Replace( Chemical( "C(=CC=CC1)C=1C" ), "Cc(:c):c" "[C*]c(c)c" exact )