About Pattern Expressions

Expressions entered for Match Expression and Transformation must use valid regular expression syntax as defined in:

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
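One way to check an expression before entering it is to compile it with java.util.regex.Pattern, which rejects invalid syntax such as an unclosed group. The following is a minimal sketch; the class name PatternCheck and the method isValid are illustrative and are not part of the product.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternCheck {
    // Illustrative helper: returns true if the expression compiles
    // under java.util.regex rules, false otherwise.
    static boolean isValid(String expression) {
        try {
            Pattern.compile(expression);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Balanced groups: compiles. Prints "true".
        System.out.println(isValid("(\\d\\d\\d)-(\\d\\d\\d)-(\\d\\d\\d\\d)"));
        // The first group is never closed: rejected. Prints "false".
        System.out.println(isValid("(\\d\\d\\d-(\\d\\d\\d)"));
    }
}
```

Note that backslashes are doubled here only because the expressions appear inside Java string literals; the expression itself contains single backslashes.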

Consider the following example:

Match Expression: \((\d\d\d)\)\s*(\d\d\d)-(\d\d\d\d)

Transformation: \1\2\3

The Match Expression in the example would match a phone number in the format:

(333) 123-1234

Unescaped parentheses surround the groups that you want to reuse in the Transformation expression. A backslash before a parenthesis marks it as a literal character in the string to match. In the Transformation expression, \1 places the text captured by the first group (here, the first three digits) in the transformed string; \2 and \3 do the same for the second and third groups. The resulting transformed phone number is 3331231234. This canonical form is applied to all phone numbers so that they can be compared.
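The effect of a Match Expression and Transformation pair can be reproduced outside the product with java.util.regex. The sketch below rests on two assumptions: that a value is transformed only when the whole value matches, and that the \1 notation corresponds to Java's $1 replacement syntax. The class TransformSketch and its methods are hypothetical names used for illustration; how the product applies the expressions internally is not described here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TransformSketch {
    // Hypothetical helper: rewrites \1 ... \9 into Java's $1 ... $9.
    // Java's own replacement syntax uses $n rather than \n.
    static String toJavaReplacement(String transformation) {
        return transformation.replaceAll("\\\\(\\d)", "\\$$1");
    }

    // Applies the transformation if the whole input matches the
    // match expression; otherwise returns the input unchanged.
    // Whole-value matching is an assumption of this sketch.
    static String transform(String matchExpression, String transformation, String input) {
        Matcher m = Pattern.compile(matchExpression).matcher(input);
        if (!m.matches()) {
            return input;
        }
        return m.replaceFirst(toJavaReplacement(transformation));
    }

    public static void main(String[] args) {
        // \( and \) match literal parentheses; ( ) capture groups 1 to 3.
        String match = "\\((\\d\\d\\d)\\)\\s*(\\d\\d\\d)-(\\d\\d\\d\\d)";
        String transformation = "\\1\\2\\3";
        // Prints 3331231234
        System.out.println(transform(match, transformation, "(333) 123-1234"));
    }
}
```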

The table below shows how you might define telephone number patterns:

Phone Number Storage Pattern  Match Expression                    Transformation  Canonical Form Used by Discovery
(333) 123-1234                \((\d\d\d)\)\s*(\d\d\d)-(\d\d\d\d)  \1\2\3          3331231234
333.123.1234                  (\d\d\d)\.(\d\d\d)\.(\d\d\d\d)      \1\2\3          3331231234
333 123-1234                  (\d\d\d)\s(\d\d\d)-(\d\d\d\d)       \1\2\3          3331231234
333-123-1234                  (\d\d\d)-(\d\d\d)-(\d\d\d\d)        \1\2\3          3331231234

The formats of all phone numbers in your databases can thus be normalized in an intermediate step during indexing and comparison, so that cells that contain the same phone number match.
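The normalization step can be illustrated with a short sketch that tries a pattern for each of the four storage formats in turn and emits the canonical form for the first one that matches. The class PhoneNormalizer is a hypothetical name, and trying the patterns in order against the whole value is an assumption made for this example rather than a description of the product's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneNormalizer {
    // One pattern per storage format: (333) 123-1234, 333.123.1234,
    // 333 123-1234, and 333-123-1234.
    private static final Pattern[] MATCH_EXPRESSIONS = {
        Pattern.compile("\\((\\d\\d\\d)\\)\\s*(\\d\\d\\d)-(\\d\\d\\d\\d)"),
        Pattern.compile("(\\d\\d\\d)\\.(\\d\\d\\d)\\.(\\d\\d\\d\\d)"),
        Pattern.compile("(\\d\\d\\d)\\s(\\d\\d\\d)-(\\d\\d\\d\\d)"),
        Pattern.compile("(\\d\\d\\d)-(\\d\\d\\d)-(\\d\\d\\d\\d)")
    };

    // Java's equivalent of the \1\2\3 transformation.
    private static final String TRANSFORMATION = "$1$2$3";

    // Returns the canonical form if any pattern matches the whole value;
    // otherwise returns the value unchanged.
    static String normalize(String value) {
        for (Pattern p : MATCH_EXPRESSIONS) {
            Matcher m = p.matcher(value);
            if (m.matches()) {
                return m.replaceFirst(TRANSFORMATION);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        String[] stored = {
            "(333) 123-1234", "333.123.1234", "333 123-1234", "333-123-1234"
        };
        // Each line prints 3331231234, so the four cells compare as equal.
        for (String s : stored) {
            System.out.println(normalize(s));
        }
    }
}
```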