|   
	ICM Manual v.3.9
	by Ruben Abagyan, Eugene Raush and Max Totrov. Copyright © 2025, Molsoft LLC
 Oct 13 2025
 
 | 
 [ Regexp syntax ] 
Functions supporting regular expressions:
 
See regexp syntax.
 | ICM regular expression syntax | 
 
 [ Simple expressions | Shortcuts | Regexp back references | Greedy matching ] 
 
 
 
 .         any character except a newline (to match any character at all, use (.|\n)
           or put (?n) at the beginning of the expression)
 ^         the beginning of the line
 $         the end of the line
 [abc]     any character from the list
 [^abc]    any character NOT in the list
 [a-z]     a range, e.g. [0-9] or [0-9A-Z]
 \c        backslash suppresses the special meaning of the character c
 \\        the backslash itself
 (string)  enclose a simple expression in parentheses to write
           repetitions, back-references, or field=number expressions in
           the Split, Match and Replace functions.
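
The simple expressions listed above can be tried out in a short sketch. It uses Python's `re` module purely as a stand-in, on the assumption that these basic constructs behave the same way in most regular expression engines. It does not call any ICM function, and the sample string is made up for illustration.

```python
import re

# Python's re module is a stand-in here; this is not ICM code.
line = "chain A: 3 atoms [CA]"  # hypothetical sample text

# '.' matches any single character except a newline
dot_match = re.search(r"c.ain", line).group(0)

# '^' and '$' anchor the match to the beginning and the end of the line
starts_with_chain = re.search(r"^chain", line) is not None
ends_with_bracket = re.search(r"\]$", line) is not None

# [0-9] is a range, [^a-zA-Z] is a negated list
first_digit = re.search(r"[0-9]", line).group(0)
first_non_letter = re.search(r"[^a-zA-Z]", line).group(0)

# A backslash removes the special meaning of '[' and ']';
# the parentheses capture the two characters between them
bracketed = re.search(r"\[(..)\]", line).group(1)

print(dot_match, starts_with_chain, ends_with_bracket)
print(first_digit, repr(first_non_letter), bracketed)
```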
 
Inline modifiers of regular expressions:
 
 (?i)   ignore case until the end of the same enclosing group, e.g. 'aBc' ~ '(?i)abc';
        'a((?i)bc)d' matches 'aBCd', 'abcd', 'aBcd', but not 'Abcd' or 'abcD'
 (?-i)  match case-sensitively until the end of the same enclosing group, e.g.
        'a(?i)bc(?-i)d' matches 'aBCd', but not 'Abcd' or 'abcD'
 (?n)   make the dot '.' match the newline character as well: "1bc\nd2" ~ '(?n)1.*2'
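
As a rough illustration of the modifiers above, here is a sketch in Python's `re` module, which is not ICM and spells two of these modifiers differently. In Python a case-insensitive group is written `(?i:...)`, and the flag that lets the dot match a newline is `(?s)` rather than `(?n)`. The mapping between the two syntaxes is an assumption made for this example.

```python
import re

# Python stand-in, not ICM. Python writes a scoped modifier as (?i:...)
# where the manual writes ((?i)...).
scoped = re.compile(r"a(?i:bc)d")
candidates = ["aBCd", "abcd", "aBcd", "Abcd", "abcD"]
accepted = [s for s in candidates if scoped.fullmatch(s)]

# Case-insensitivity for the whole pattern, as in 'aBc' ~ '(?i)abc'
whole = re.fullmatch(r"(?i)abc", "aBc") is not None

# Python's (?s) plays the role the manual gives to (?n):
# with it, '.' also matches the newline character
text = "1bc\nd2"
without_flag = re.search(r"1.*2", text) is not None
with_flag = re.search(r"(?s)1.*2", text) is not None

print(accepted, whole, without_flag, with_flag)
```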
 
 
 
 
 \d   matches a digit ( '[0-9]' ). '\d+' matches one or more digits.
 \D   matches a NON-digit. '\D+' matches the text between numbers
 \w   matches a character in a word ( [a-zA-Z_] ). '\w+' matches a word
 \W   matches a NON-word character. '\W+' matches the interword space
 \s   matches a whitespace character, or a separator ( [ \r\t\n\f] )
 \S   matches a non-separator symbol
 \b   matches a word boundary, i.e. a boundary between \w and \W symbols; for example,
      '\bedge\b' matches inside 'the edge' and does not match inside 'the hedge'
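
The shortcuts above can be checked with a small sketch in Python's `re` module, used here as a stand-in rather than as ICM code. One known difference: Python's `\w` also matches digits and non-ASCII letters, so the sample strings below are chosen to avoid those cases.

```python
import re

# Python stand-in, not ICM. Sample strings are made up for illustration.
numbers = re.findall(r"\d+", "x=12, y=345")      # runs of digits
gaps = re.findall(r"\D+", "12, 345;6")           # text between the numbers
words = re.findall(r"\w+", "the hedge row")      # words
fields = re.split(r"\s+", "a  b\tc\nd")          # split on any whitespace run

# \b is a word boundary: 'edge' is a whole word in the first string,
# but only the tail of 'hedge' in the second
in_edge = re.search(r"\bedge\b", "the edge") is not None
in_hedge = re.search(r"\bedge\b", "the hedge") is not None

print(numbers, gaps, words, fields, in_edge, in_hedge)
```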
 
 | Repetitions and back-references ( a and b are simple regular expressions, e.g. a DNA base [ACTG], or ([hp]anky.*) ): | 
 
 
 
a? -     nothing or a single occurrence of a
a* -     nothing or any number of repetitions of a
a+ -     one or more repetitions of a
a{n,m} - from n to m repetitions of a
a|b -    matches a or b
ab -     matches a followed by b
(a)\1 -  \1 is a back-reference: matches a, then matches exactly the same string again.
Back-references can go from \1 to \9.
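
To make the repetition and back-reference rules above concrete, here is a sketch in Python's `re` module (a stand-in, not ICM). The DNA string and the word lists are invented for the example.

```python
import re

# Python stand-in, not ICM.

# a? : 'u' is optional, so both spellings are accepted, but not two 'u's
optional = [bool(re.fullmatch(r"colou?r", w)) for w in ["color", "colour", "colouur"]]

# a{n,m} : between 2 and 4 DNA bases
bounded = [bool(re.fullmatch(r"[ACTG]{2,4}", s)) for s in ["A", "ACT", "ACTGA"]]

# a|b : either alternative
either = re.findall(r"hanky|panky", "hanky panky")

# (a)\1 : two bases followed by exactly the same two bases again
dna = "TTACACGG"
repeat = re.search(r"([ACTG]{2})\1", dna)

print(optional, bounded, either)
print(repeat.group(0), repeat.group(1), repeat.start())
```

Note that the back-reference repeats the matched text, not the pattern: `([ACTG]{2})\1` accepts `ACAC` but not `ACTG`.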
 
 | A problem with the POSIX repetitions | 
 
 
Imagine that you want to match the text between 
two tags, e.g.  <i>one</i> , in a text which has
two items of the same kind (  <i>one</i> and <i>two</i> ). 
Unfortunately, we cannot just use <i>.*</i> to match  <i>one</i> ,
since the POSIX standard tries to match the MAXIMAL LENGTH expression
between the italic tags: the match starts at the first <i> and ends at the last </i>,
so it covers the whole of <i>one</i> and <i>two</i>.
A straightforward solution of this problem is to make a more complex
definition of the word between the tags, by saying that the italicized word should not contain the '<' symbol.
 
ICM follows Perl in using the question mark (?) after the repetition symbol to enforce the minimal match. 
The minimal match expressions look like this (a is a simple regular expression, like a character or a string in parentheses):
 
a?? -  nothing or a single occurrence of a, preferring nothing
a*? -  nothing or any number of repetitions of a, as few as possible
     (e.g. Match(s,'tag(.*?)endtag':n))
a+? -  one or more repetitions of a, as few as possible

Therefore:

  '<i>.*</i>' - the '.*' part matches the entire 'one</i> and <i>two'
  '<i>[^<]*</i>' - explicitly prohibits a tag inside; matches only the first word
  '<i>.*?</i>' - the '*?' expression enforces the smallest match 
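
The three patterns above can be compared side by side. This sketch uses Python's `re` module as a stand-in (not ICM), on the assumption that its Perl-style `*?` behaves like the minimal match described here. The capturing parentheses are added only so that the text between the tags can be printed.

```python
import re

# Python stand-in, not ICM.
text = "<i>one</i> and <i>two</i>"

# Maximal match: '.*' runs from the first <i> to the last </i>
greedy = re.search(r"<i>(.*)</i>", text).group(1)

# Excluding '<' from the word stops the match at the first closing tag
no_tag_inside = re.search(r"<i>([^<]*)</i>", text).group(1)

# Minimal match: '*?' takes as few characters as possible
minimal = re.search(r"<i>(.*?)</i>", text).group(1)

# With the minimal match, each tagged item is found separately
all_items = re.findall(r"<i>(.*?)</i>", text)

print(repr(greedy))
print(repr(no_tag_inside), repr(minimal), all_items)
```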
 
 
 
 
 |