Parse regular expression


 
Regexp::Parse
 
void Parse(const STRCHAR *re, BOOL bCaseSensitive = TRUE)
Requires regular expression support to be enabled by defining STR_USE_REGEX.

This method is typically called just after a Regexp object has been constructed. It parses the regular expression string re and creates an internal representation (the "program"), which Match later uses to perform searches. As the parameter name indicates, bCaseSensitive selects case-sensitive matching and defaults to TRUE.
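The parse-once, match-many workflow described above can be sketched as follows. This is only an illustration: RegexpSketch is a hypothetical stand-in built on the standard library's std::regex, not the library's Regexp class. It accepts ECMAScript syntax rather than the metacharacters documented on this page, and the Match signature shown is an assumption.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the library's Regexp class, built on std::regex.
// It accepts ECMAScript syntax, NOT the metacharacter syntax of this page.
class RegexpSketch {
public:
    // Mirrors the documented Parse parameters: pattern plus case flag.
    void Parse(const char *re, bool bCaseSensitive = true) {
        auto flags = std::regex::ECMAScript;
        if (!bCaseSensitive)
            flags |= std::regex::icase;
        m_program.assign(re, flags);  // compile into the internal "program"
        m_parsed = true;
    }

    // Assumed search call: true if the pattern occurs anywhere in the input.
    // Returns false if Parse has not been called yet.
    bool Match(const std::string &input) const {
        return m_parsed && std::regex_search(input, m_program);
    }

private:
    std::regex m_program;   // compiled form of the last parsed pattern
    bool m_parsed = false;  // set once Parse has succeeded
};
```

With this stand-in, a caller constructs the object, calls Parse("[0-9]+") once, and then calls Match on as many inputs as needed without re-parsing the pattern. The real Match may take different arguments and report match groups; see its own reference page.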

The passed regular expression consists of metacharacters and regular characters. The latter are matched directly; the former have a meaning according to the following table:

Metacharacter  Meaning

. Match any single character.
[ ] Defines a character class. Matches any single character inside the brackets (for example, [abc] matches "a", "b", or "c").
^ If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c").

If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c").

- In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9").
? Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12").
+ Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "666", and so on).
* Indicates that the preceding expression matches zero or more times.
??, +?, *? Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions which match as much as possible. Example: given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>".
( ) Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (such as "1" or "1,23,456").
{ } Indicates a match group. See class RegexpMatch for a more detailed explanation.
\ Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see table below).

If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>".

$ At the end of a regular expression, this character matches the end of the input. Example: [0-9]$ matches a digit at the end of the input.
| Alternation operator: separates two expressions, exactly one of which matches (for example, T|the matches "The" or "the").
! Negation operator: the expression following ! does not match the input. Example: a!b matches "a" not followed by "b".

Abbreviation  Meaning

\a Any alphanumeric character. Shortcut for ([a-zA-Z0-9])
\b White space (blank). Shortcut for ([ \t])
\c Any alphabetic character. Shortcut for ([a-zA-Z])
\d Any decimal digit. Shortcut for ([0-9])
\h Any hexadecimal digit. Shortcut for ([0-9a-fA-F])
\n Newline. Shortcut for (\r|(\r?\n))
\q A quoted string. Shortcut for (\"[^\"]*\")|(\'[^\']*\')
\w A simple word. Shortcut for ([a-zA-Z]+)
\z An unsigned integer. Shortcut for ([0-9]+)
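Some of the entries above can be tried out with a short sketch. It uses std::regex (ECMAScript syntax) as a stand-in, not this library's parser, so patterns are written in ECMAScript form: the match group { } becomes ( ), and the back-reference \0 becomes \1. FirstMatch and ExpandAbbreviation are hypothetical helpers introduced only for this illustration. The expansions are taken from the table, with the quote escapes in \q dropped because ECMAScript does not need them.

```cpp
#include <regex>
#include <string>

// Returns the first substring of `input` matched by `pattern`, or "" if none.
// `pattern` uses ECMAScript syntax (std::regex default), not this library's.
std::string FirstMatch(const std::string &pattern, const std::string &input) {
    std::smatch m;
    if (std::regex_search(input, m, std::regex(pattern)))
        return m.str(0);
    return "";
}

// Expansion of one abbreviation letter, following the table above.
// Needed because \b, \w and \n mean something else in ECMAScript.
// Returns "" for letters the table does not list.
std::string ExpandAbbreviation(char letter) {
    switch (letter) {
    case 'a': return "([a-zA-Z0-9])";
    case 'b': return "([ \\t])";
    case 'c': return "([a-zA-Z])";
    case 'd': return "([0-9])";
    case 'h': return "([0-9a-fA-F])";
    case 'n': return "(\\r|(\\r?\\n))";
    case 'q': return "(\"[^\"]*\")|('[^']*')";
    case 'w': return "([a-zA-Z]+)";
    case 'z': return "([0-9]+)";
    default:  return "";
    }
}
```

For the input "<abc><def>", FirstMatch("<.*?>", ...) returns "<abc>" while FirstMatch("<.*>", ...) returns "<abc><def>", which is the greedy versus non-greedy difference described in the table. FirstMatch("<(.*?)>.*?</\\1>", "<head>Contents</head>") returns the whole input, mirroring the back-reference example.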

 

See also: Regular expression classes, Regular expressions overview