Regular Expression

Regular expressions are a powerful tool that developers use to manipulate text and data. You can find regular expressions in most scripting languages, editors, programming environments, and specialized tools. The concept is rather simple. An expression is any sequence of identifiers, keywords, and/or operators that evaluates to some value. A regular expression is a formula for matching strings that follow a pattern. A familiar analogy is typing "*.html" in Windows search, which returns a list of all files that end in ".html". Strictly speaking, "*.html" is a wildcard pattern rather than a regular expression (in a regular expression, * means "zero or more of the preceding character"), but the idea is the same: characters and symbols are combined to define a pattern of text.

Regular expressions can be useful if you're trying to replace a substring in a piece of text or to validate user-entered data. By combining a sequence of characters and symbols in a regular expression, you can find matching patterns in string values and take an action with them. For example, regular expressions are very helpful for doing a search and replace in a string. This tip, for experienced JavaScript programmers, will show you how to start using regular expressions in your JavaScript code.

To learn more about how to use JavaScript in Notes Domino 6 check out TLCC's JavaScript in Notes Domino 6 course.

To learn more about developing web applications using Domino check out TLCC's Developing Domino 6 Web Applications course.








Regular Expression Basics

Regular expressions are made up of normal characters and metacharacters. Normal characters, such as uppercase and lowercase letters and digits, simply match themselves.

A simple regular expression to match the string "and" would be:

/and/

This would match the first occurrence of "and" in the following sentence, which is the "and" inside the name "Brandy":

Brandy and Alex went to the beach.

Metacharacters are symbols that have special meaning. They are the power behind regular expressions, and you use them to modify the search. The following expression matches "and" only when it appears between word boundaries, so in the sentence above it matches the standalone word "and" rather than the "and" in "Brandy".

/\band\b/
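
The difference between the two expressions can be checked with a minimal sketch such as the following. The variable names are only illustrative; the sentence is the one used above, and String.search( ) returns the position of the first match.

```javascript
// Sentence taken from the example above.
const sentence = "Brandy and Alex went to the beach.";

// Without word boundaries, the first match is the "and" inside "Brandy".
const plainIndex = sentence.search(/and/);      // 2

// With \b on both sides, only the standalone word "and" matches.
const wordIndex = sentence.search(/\band\b/);   // 7

console.log(plainIndex, wordIndex);
```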

The following table provides a listing and description of metacharacters.
Metacharacter
Description
\
For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character 'b'. By placing a backslash in front of b, that is by using /\b/, the character becomes special to mean match a word boundary.
-or-
For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, * is a special character that means 0 or more occurrences of the preceding character should be matched; for example, /a*/ means match 0 or more a's. To match * literally, precede it with a backslash; for example, /a\*/ matches 'a*'.
^
Matches beginning of input or line. For example, /^A/ does not match the 'A' in "an A," but does match it in "An A."
$
Matches end of input or line. For example, /t$/ does not match the 't' in "eater", but does match it in "eat"
*
Matches the preceding character 0 or more times. For example, /bo*/ matches 'boooo' in "A ghost booooed", but nothing in "A goat grunted".
+
Matches the preceding character 1 or more times. Equivalent to {1,}. For example, /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy."
?
Matches the preceding character 0 or 1 time. For example, /e?le?/ matches the 'el' in "angel" and the 'le' in "angle."
.
(The decimal point) matches any single character except the newline character. For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
(x)
Matches 'x' and remembers the match. For example, /(foo)/ matches and remembers 'foo' in "foo bar." The matched substring can be recalled from the resulting array's elements [1], ..., [n], or from the predefined RegExp object's properties $1, ..., $9.
x|y
Matches either 'x' or 'y'. For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple."
{n}
Where n is a positive integer. Matches exactly n occurrences of the preceding character. For example, /a{2}/ doesn't match the 'a' in "candy," but it matches all of the a's in "caandy," and the first two a's in "caaandy."
{n,}
Where n is a positive integer. Matches at least n occurrences of the preceding character. For example, /a{2,}/ doesn't match the 'a' in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy."
{n,m}
Where n and m are positive integers. Matches at least n and at most m occurrences of the preceding character. For example, /a{1,3}/ matches nothing in "cndy", the 'a' in "candy," the first two a's in "caandy," and the first three a's in "caaaaaaandy." Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more a's in it.
[xyz]
A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen. For example, [abcd] is the same as [a-d]. They match the 'b' in "brisket" and the 'c' in "ache".
[^xyz]
A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen. For example, [^abc] is the same as [^a-c]. They initially match 'r' in "brisket" and 'h' in "chop."
[\b]
Matches a backspace. Not to be confused with \b.
\b
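Matches a word boundary, that is, the position between a word character and a non-word character (such as a space) or the start or end of the string. Not to be confused with [\b]. For example, /\bn\w/ matches the 'no' in "noonday"; /\wy\b/ matches the 'ly' in "possibly yesterday."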
\B
Matches a non-word boundary. For example, /\w\Bn/ matches 'on' in "noonday", and /y\B\w/ matches 'ye' in "possibly yesterday."
\cX
Where X is a control character. Matches a control character in a string. For example, /\cM/ matches control-M in a string.
\d
Matches a digit character. Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."
\D
Matches any non-digit character. Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."
\f
Matches a form feed.
\n
Matches a line feed.
\r
Matches a carriage return.
\s
Matches a single white space character, including space, tab, form feed, and line feed. Equivalent to [ \f\n\r\t\v]. For example, /\s\w*/ matches ' bar' in "foo bar."
\S
Matches a single character other than white space. Equivalent to [^ \f\n\r\t\v]. For example, /\S\w*/ matches 'foo' in "foo bar."
\t
Matches a tab.
\v
Matches a vertical tab.
\w
Matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
\W
Matches any non-word character. Equivalent to [^A-Za-z0-9_]. For example, /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."
\n
Where n is a positive integer. A back reference to the last substring matching the nth parenthetical in the regular expression (counting left parentheses). For example, /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach."

Note: If the number of left parentheses is less than the number specified in \n, the \n is taken as an octal escape as described in the next row.
\octal, \xhex
Where \octal is an octal escape value or \xhex is a hexadecimal escape value. Allows you to embed ASCII codes into regular expressions.
The above table was recreated from the Netscape JavaScript Core 1.4 Language reference at:
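
A few rows of the table can be tried directly in JavaScript. The following is a minimal sketch that reuses the sample strings from the table; the variable names are illustrative only, and match( ) without the g flag returns the first match in element [0].

```javascript
// Quantifiers: *, + and {n,m}
const star = "A ghost booooed".match(/bo*/)[0];      // "boooo"
const plus = "caaaaaaandy".match(/a+/)[0];           // the whole run of a's
const range = "caaaaaaandy".match(/a{1,3}/)[0];      // "aaa"

// Optional characters: ?
const angel = "angel".match(/e?le?/)[0];             // "el"
const angle = "angle".match(/e?le?/)[0];             // "le"

// Character classes: \d and \W
const digit = "B2 is the suite number.".match(/\d/)[0];   // "2"
const nonWord = "50%".match(/\W/)[0];                     // "%"

// Back reference: \1 repeats whatever the first parenthetical matched
const backref = "apple, orange, cherry, peach."
  .match(/apple(,)\sorange\1/)[0];                   // "apple, orange,"

console.log(star, plus, range, angel, angle, digit, nonWord, backref);
```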


JavaScript programmers typically use the indexOf( ), lastIndexOf( ), and substring( ) String object methods to replace substrings within a string value. These methods only test for exact matches; you cannot test for variations of the exact string passed to indexOf( ). Regular expressions, in combination with the metacharacters above, can be used to search for string variations and to match patterns.
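
To make the contrast concrete, here is a small sketch comparing an exact indexOf( ) lookup with a regular expression. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample text with two spellings of the same word.
const text = "Hello world, hello Domino";

// indexOf( ) only finds the exact, case-sensitive substring.
const exactIndex = text.indexOf("hello");          // 13

// A regular expression with the i flag also finds the capitalized variation.
const patternIndex = text.search(/hello/i);        // 0

// One replace( ) call with the g and i flags handles every variation.
const replaced = text.replace(/hello/gi, "Hi");    // "Hi world, Hi Domino"

console.log(exactIndex, patternIndex, replaced);
```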

Working with regular expressions is a two-part process: first you create the regular expression, and then you use it against a string.







Creating a Regular Expression

The basic syntax for regular expressions is to embed the pattern of characters and symbols within forward slashes.

A simple regular expression would be as follows:
var myRegExp = /hello/;

This regular expression pattern matches the first instance of the string "hello" wherever the series of letters appears in a larger string. So if you search the following string using the regular expression above:
"Othello said hello"

The match will be the "hello" inside "Othello". That might not be what you really wanted, so you could use the \b word boundary metacharacter in the regular expression to isolate the word "hello":
var myRegExp = /\bhello\b/;

A search using this regular expression will only have one match in the string, the word "hello".
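
As a quick check of the two expressions above, the following sketch searches the same string with and without the word boundaries. The variable names are illustrative.

```javascript
const phrase = "Othello said hello";

// Without \b the first match is the "hello" inside "Othello".
const loose = phrase.search(/hello/);           // 2

// With \b only the standalone word "hello" matches.
const strict = phrase.search(/\bhello\b/);      // 13

// Counting all whole-word matches confirms there is exactly one.
const wholeWordCount = phrase.match(/\bhello\b/g).length;   // 1

console.log(loose, strict, wholeWordCount);
```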

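There are two useful flags to use when working with regular expressions. They are placed after the closing slash: g (global) finds every match in the string instead of stopping after the first one, and i (ignore case) makes the match case-insensitive.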

For example, if you have the string,
"Othello said Hello"

using the regular expression:
var myRegExp = /hello/gi;

The result will match both the "hello" in "Othello" and the word "Hello".
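
The effect of each flag can be seen by running the same pattern with different flag combinations. This is a minimal sketch using the string above; the variable names are illustrative.

```javascript
const greeting = "Othello said Hello";

// No flags: case-sensitive, and only the first match is reported.
const first = greeting.match(/hello/)[0];        // "hello" (inside "Othello")

// g only: all matches, but still case-sensitive, so "Hello" is skipped.
const globalOnly = greeting.match(/hello/g);     // ["hello"]

// g and i together: all matches regardless of case.
const globalIgnoreCase = greeting.match(/hello/gi);   // ["hello", "Hello"]

console.log(first, globalOnly, globalIgnoreCase);
```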

Note: Using Parentheses in Regular Expressions

Parentheses play a dual role in regular expressions. Any set of matching open and close parentheses, ( ), controls the order of precedence for operators. Parentheses also store in memory the part of a found match that corresponds to the portion of the regular expression they enclose. It is possible to nest parentheses. The storage of the results is automatic, and the results are stored in an array that you can access in your scripts.
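
Both roles can be illustrated with a short sketch. The sample strings and variable names below are hypothetical and are not taken from the original tip.

```javascript
// Role 1: grouping changes operator precedence.
// Without parentheses, | splits the whole pattern in two: "^green" OR "red apple$".
const ungrouped = /^green|red apple$/.test("green pear");     // true
// With parentheses, only the colour alternates: "green apple" OR "red apple".
const grouped = /^(green|red) apple$/.test("green pear");     // false

// Role 2: each pair of parentheses remembers what it matched.
const dateText = "Released on 2003-10-15";                    // hypothetical sample
const parts = dateText.match(/(\d{4})-(\d{2})-(\d{2})/);
const year = parts[1];     // "2003"
const month = parts[2];    // "10"
const day = parts[3];      // "15"

// Remembered matches can be reused in a replacement string as $1, $2, ...
const swapped = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1"); // "John Doe"

console.log(ungrouped, grouped, year, month, day, swapped);
```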






Using Regular Expressions

The implementation of regular expressions in JavaScript was modeled after the regular expression object model in the Perl programming language. There are two ways to work with regular expressions in JavaScript:

The first is where you use String object methods and regular expressions to match, search and replace text in a String object. The other is a way to create objects out of regular expressions and apply them to strings. This tip will cover how to use the String methods with regular expressions.
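
For orientation only, the two approaches can be sketched side by side. The tip itself covers the String methods; the regular expression object calls shown here (test( ) and exec( )) are standard JavaScript, but the strings and variable names are illustrative.

```javascript
const re = /hello/i;

// Approach 1: call a String method and pass it the regular expression.
const viaString = "Hello World".search(re) !== -1;    // true

// Approach 2: call a method of the regular expression object itself.
const viaObject = re.test("Hello World");             // true

// A regular expression object can also be built from strings at run time.
const constructed = new RegExp("hello", "i");
const execResult = constructed.exec("Othello said Hello");
// execResult[0] is the matched text, execResult.index is its position.

console.log(viaString, viaObject, execResult[0], execResult.index);
```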



String Object Methods and Regular Expressions

The following table lists the three String object methods that use regular expressions.
Method: s.match(re)
Sample:
myRegExp = /hello/gi;
myString.match(myRegExp)
Returned value:
["Hello"] - if myString is "Hello World!"
["Hello", "hello"] - if myString is "Hello World, Othello!"
Description: Used to match a regular expression against a string. With the g flag, returns an array of all the matched values, or null if there is no match.

Method: s.replace(re, s)
Sample:
myRegExp = /hello/gi;
newVal = "Hi";
myString.replace(myRegExp, newVal)
Returned value:
"Hi World!" - if myString is "Hello World!"
Description: Used to find a match between a regular expression and a string, and to replace the matched substring with a new substring.

Method: s.search(re)
Sample:
myRegExp = /hello/gi;
myString.search(myRegExp)
Returned value:
0 - if myString is "Hello World"
-1 - if myString is "My dog has fleas."
Description: Executes the search for a match between a regular expression and a specified string. If successful, search returns the index of the match inside the string (the first character is at position 0). Otherwise, it returns -1. Use search when you want to know whether a pattern is found in a string (similar to the regular expression test method).
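
The three methods can be exercised together in a short script. This sketch reuses the sample strings from the table; the variable names are illustrative.

```javascript
const myRegExp = /hello/gi;

// match( ): with the g flag, returns every matched value as it appears in the string.
const matches = "Hello World, Othello!".match(myRegExp);   // ["Hello", "hello"]
const noMatch = "My dog has fleas.".match(myRegExp);       // null

// replace( ): returns a new string; the original string is not modified.
const newVal = "Hi";
const replaced = "Hello World!".replace(myRegExp, newVal); // "Hi World!"

// search( ): returns the index of the first match, or -1 if there is none.
const found = "Hello World".search(myRegExp);              // 0
const missing = "My dog has fleas.".search(myRegExp);      // -1

console.log(matches, noMatch, replaced, found, missing);
```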


