14.3.1.  Regular Expression Syntax


Table 14.3.  Examples of Regular Expressions

Pattern                             Meaning
hello                               Matches the literal string hello.
c*at                                Quantifier: zero or more occurrences of c, followed by at: at, cat, ccat, etc.
c?at                                Zero or one occurrence of c, followed by at: at or cat only.
c.t                                 c followed by any single character, followed by t: cat, cot, c3t, c%t, etc.
c.*t                                c followed by zero or more of any character, followed by t: ct, caaatttt, carsdf$#S8ft, etc.
ca+t                                + means one or more of the preceding item, so this matches cat, caat, caaaat, etc., but not ct.
c\.\*t                              A backslash before a special character escapes it, so this matches only the string c.*t
c\\\.t                              Matches only the string c\.t
c[0-9a-c]+z                         Between the c and the z, one or more characters from the set [0-9a-c]: matches strings like c312abbaz and caa211bac2z
the (cat|dog) ate the (fish|mouse)  Alternation: matches "the cat ate the fish", "the dog ate the mouse", "the dog ate the fish", or "the cat ate the mouse".
\w+                                 A sequence of one or more word characters (letters, digits, or underscore); for ASCII text, the same as [a-zA-Z0-9_]+
\W                                  A character that is not part of a word (punctuation, whitespace, etc.)
\s{5}                               Exactly 5 whitespace characters (tabs, spaces, or newlines)
^\s+                                One or more whitespace characters at the beginning of the string.
\s+$                                One or more whitespace characters at the end of the string.
^Help                               Matches Help if it occurs at the beginning of the string.
[^Help]                             Any single character other than H, e, l, or p, anywhere in the string (inside brackets, the metacharacter ^ means negation).
\S{1,5}                             At least 1 and at most 5 non-whitespace characters
\d                                  A digit, [0-9] (and \D is a non-digit, i.e., [^0-9])
\d{3}-\d{4}                         7-digit phone numbers, such as 555-1234
\bm[A-Z]\w+                         \b means word boundary: matches mBuffer but not StreamBuffer
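
A few rows of the table can be spot-checked in code. The following is a minimal sketch that assumes the standard library's std::regex (with its default ECMAScript grammar) rather than any particular regex class used elsewhere in this chapter; the helper names matchesAll, matchesSomewhere, and checkTableExamples are hypothetical and exist only for this illustration. Backslashes are doubled because the patterns are written as ordinary C++ string literals.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of text matches pattern.
bool matchesAll(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: true if pattern matches somewhere inside text.
bool matchesSomewhere(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

// Spot-checks a handful of rows from the table (assumes std::regex semantics).
void checkTableExamples() {
    assert(matchesAll("c*at", "ccat"));              // zero or more c
    assert(matchesAll("c?at", "at"));                // zero c is allowed
    assert(!matchesAll("c?at", "ccat"));             // but at most one c
    assert(!matchesAll("ca+t", "ct"));               // + needs at least one a
    assert(matchesAll("c[0-9a-c]+z", "c312abbaz"));  // character set
    assert(matchesAll("\\d{3}-\\d{4}", "555-1234")); // 7-digit phone number
    assert(matchesSomewhere("^Help", "Help me"));    // anchored at the start
    assert(!matchesSomewhere("^Help", "No Help"));
    assert(matchesSomewhere("\\bm[A-Z]\\w+", "mBuffer"));
    assert(!matchesSomewhere("\\bm[A-Z]\\w+", "StreamBuffer")); // no boundary before m
}
```

Calling checkTableExamples() from main should complete silently if the table's claims hold under this grammar. Whole-string matching (regex_match) and substring matching (regex_search) are kept separate because the anchors ^ and $ and the boundary \b only become interesting when the pattern is allowed to match inside a longer string.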

[Note] Backslashes and C++ Strings

Backslashes also escape special characters in C++ string literals, so a regular expression written as a C++ string literal must be "double-backslashed": every \ in the pattern becomes \\ in the source code, and to match the backslash character itself you need four: \\\\.
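
The two layers of escaping can be made visible by checking what the compiler actually stores. This is a small sketch, again assuming std::regex as a stand-in regex engine; the function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the table row c\.\*t written as a C++ string literal.
std::string escapedDotStarPattern() {
    return "c\\.\\*t";   // the compiler collapses each \\ to one backslash
}

// Hypothetical helper: a pattern that matches exactly one backslash.
std::string singleBackslashPattern() {
    return "\\\\";       // four in the source, two in the stored string
}

void checkDoubleBackslash() {
    // The stored pattern is the 6 characters c \ . \ * t
    assert(escapedDotStarPattern().size() == 6);
    assert(std::regex_match("c.*t", std::regex(escapedDotStarPattern())));
    assert(!std::regex_match("cat", std::regex(escapedDotStarPattern())));

    // The stored pattern is 2 backslashes, which the regex engine reads
    // as one escaped (literal) backslash.
    assert(singleBackslashPattern().size() == 2);
    const std::string oneBackslash = "\\";   // a one-character string
    assert(std::regex_match(oneBackslash, std::regex(singleBackslashPattern())));
}
```

The compiler removes one level of backslashes when it processes the literal, and the regex engine removes the second level when it parses the pattern.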

[Note] C++0x Users

If your compiler supports C++0x (since standardized as C++11), you may want to write regular expressions as raw string literals, which avoids the need to double-escape backslashes.

R"(The String Data \ Stuff " )"
R"delimiter(The String Data \ Stuff " )delimiter"
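
As an illustration of both forms applied to regular expressions, here is a sketch that assumes a C++11 compiler and std::regex; the function names and the delimiter re are arbitrary choices made for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the phone-number pattern as a raw string literal.
std::string phonePattern() {
    return R"(\d{3}-\d{4})";   // no doubled backslashes needed
}

// Hypothetical helper: a pattern for a double-quoted word. A delimiter
// (here "re") is required because the pattern text itself contains a
// closing parenthesis followed by a double quote.
std::string quotedWordPattern() {
    return R"re("(\w+)")re";
}

void checkRawStringLiterals() {
    // The raw literal holds the same characters as the escaped literal.
    assert(phonePattern() == "\\d{3}-\\d{4}");
    assert(std::regex_match("555-1234", std::regex(phonePattern())));

    assert(quotedWordPattern() == "\"(\\w+)\"");
    const std::string text = "\"hello\"";
    std::smatch match;
    const bool ok = std::regex_match(text, match, std::regex(quotedWordPattern()));
    (void)ok;                  // silences the unused warning when NDEBUG is set
    assert(ok);
    assert(match[1].str() == "hello");   // the captured word, without quotes
}
```

The raw form keeps the pattern in source code identical to the pattern shown in the table, which makes it easier to compare the two by eye.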

Figure 14.3.  Regular Expression Tester
