ICM Language Reference
Regular expressions (regexp)


Functions supporting regular expressions:

ICM regular expression syntax



Simple expressions


Inline modifiers of regular expressions:

Shortcuts


Repetitions and back-references

( a and b are simple regular expressions, e.g. a DNA base [ACTG], or ([hp]anky.*) ):
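As a rough illustration of repetitions and back-references on the two sample expressions above, here is a minimal sketch in Python's re module. This is an assumption for illustration only: Python is not ICM, and the {3} and \1 notations shown are Python's, which may differ from the notation ICM accepts.

```python
import re

dna = "ACGGTTTA"  # hypothetical sample sequence

# Repetition: exactly three DNA bases in a row (first such run).
triplet = re.search(r"[ACTG]{3}", dna).group(0)

# Back-reference: \1 must repeat exactly what group 1 captured,
# so this finds bases that occur twice in a row.
doubled = re.findall(r"([ACTG])\1", dna)

# The simple expression [hp]anky matches both halves of "hanky-panky".
words = re.findall(r"[hp]anky", "hanky-panky")

print(triplet)  # ACG
print(doubled)  # ['G', 'T']
print(words)    # ['hanky', 'panky']
```

Note that the back-reference matches a repeat of the captured text, not a second match of the pattern: ([ACTG])\1 finds GG and TT but not GT.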

A problem with the POSIX repetitions


Imagine that you want to match the text between two tags, e.g. <i>one</i>, in a text that has two items of the same kind ( <i>one</i> and <i>two</i> ). Unfortunately, we cannot just use <i>.*</i> to match <i>one</i>, since the POSIX standard looks for the MAXIMAL LENGTH expression between the italic tags: the match starts at the first <i> and ends at the last </i>, so it covers the whole of <i>one</i> and <i>two</i>.
A straightforward solution to this problem is to define the word between the tags more precisely, by saying that the italicized word should not contain the '<' symbol, e.g. <i>[^<]*</i>.
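The difference between the two patterns can be sketched in Python's re module. This is an assumption for illustration, not ICM code: Python uses a backtracking engine rather than POSIX maximal-length matching, but for this particular pattern the greedy result is the same.

```python
import re

text = "<i>one</i> and <i>two</i>"

# Greedy repetition: .* runs on to the LAST closing tag,
# so both items end up inside a single match.
greedy = re.search(r"<i>.*</i>", text).group(0)

# Negated character class: the tagged word may not contain '<',
# so the match has to stop at the first closing tag.
bounded = re.search(r"<i>[^<]*</i>", text).group(0)

print(greedy)   # <i>one</i> and <i>two</i>
print(bounded)  # <i>one</i>
```

The negated class only works when the tagged text itself never contains '<'; nested tags would need a different approach.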

ICM followed Perl in using the question mark (?) after the repetition symbol to enforce the minimal match. The minimal match expressions will look like this (a is a simple regular expression, like a character or a string in parentheses ):

Therefore:
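A minimal sketch of the minimal match, again in Python's re module as a stand-in (an assumption: Python shares the Perl-style ? suffix described above, but this is not ICM code):

```python
import re

text = "<i>one</i> and <i>two</i>"

# .*? is the minimal (non-greedy) form of .* and stops at the
# first closing tag it can reach.
first = re.search(r"<i>.*?</i>", text).group(0)

# With a capturing group, every tagged item is picked up separately.
items = re.findall(r"<i>(.*?)</i>", text)

print(first)  # <i>one</i>
print(items)  # ['one', 'two']
```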

