About Pattern Expressions
The pattern expressions that you enter for Match Expression and Transformation must be valid regular expressions as defined in:
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
Consider the example below:
The Match Expression in the example would match a phone number in the format:
(333) 123-1234
Parentheses surround the groups that you want to reuse in the Transformation expression. A backslash precedes any parenthesis that is a literal part of the string to match. In the Transformation expression, \1 places the first group (here, the first three digits) in the output string; \2 and \3 do the same for the second and third groups. The resulting transformed phone number is 3331231234. This canonical form is applied to all phone numbers so that they can be compared.
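The group mechanics described above can be tried out directly with java.util.regex. The following is a minimal sketch, not Discovery's own implementation: the class and method names are hypothetical, and requiring the whole value to match is an assumption. Backslashes are doubled only because the expression is written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Discovery.
class PhonePatternSketch {
    // Match Expression for the "(333) 123-1234" format.
    // \( and \) match literal parentheses; ( ) capture groups 1, 2, and 3.
    static final Pattern MATCH =
        Pattern.compile("\\((\\d\\d\\d)\\)\\s*(\\d\\d\\d)-(\\d\\d\\d\\d)");

    // Returns the canonical form, or null if the whole input does not match.
    // Assumption: the entire value must match the expression.
    static String canonicalize(String input) {
        Matcher m = MATCH.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Same effect as the \1\2\3 transformation: groups 1 to 3 in order.
        return m.group(1) + m.group(2) + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("(333) 123-1234")); // prints 3331231234
    }
}
```

A value in another format, such as 333-123-1234, does not match this expression, which is why each storage format needs its own Match Expression.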
The table below shows how you might define telephone number patterns:
Phone Number Storage Pattern    Match Expression                      Transformation    Canonical Form Used by Discovery
----------------------------    ----------------------------------    --------------    --------------------------------
(333) 123-1234                  \((\d\d\d)\)\s*(\d\d\d)-(\d\d\d\d)    \1\2\3            3331231234
333.123.1234                    (\d\d\d)\.(\d\d\d)\.(\d\d\d\d)        \1\2\3            3331231234
333 123-1234                    (\d\d\d)\s(\d\d\d)-(\d\d\d\d)         \1\2\3            3331231234
333-123-1234                    (\d\d\d)-(\d\d\d)-(\d\d\d\d)          \1\2\3            3331231234
The formats of all phone numbers in your databases can thus be normalized in an intermediate step during indexing and comparison, so that cells that contain the same phone number match.
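The normalization step can be sketched in Java as a list of Match Expression and Transformation pairs that are tried in order. This is an illustrative sketch under stated assumptions, not how Discovery is implemented: the class and method names are hypothetical, the pairs correspond to the four storage formats in the table, and unmatched values are returned unchanged. Note that java.util.regex writes group references in a replacement string as $1 rather than \1, so the sketch converts the notation before applying it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Discovery.
class PhoneNormalizerSketch {
    // { Match Expression, Transformation } for each storage format.
    static final String[][] RULES = {
        {"\\((\\d\\d\\d)\\)\\s*(\\d\\d\\d)-(\\d\\d\\d\\d)", "\\1\\2\\3"},
        {"(\\d\\d\\d)\\.(\\d\\d\\d)\\.(\\d\\d\\d\\d)", "\\1\\2\\3"},
        {"(\\d\\d\\d)\\s(\\d\\d\\d)-(\\d\\d\\d\\d)", "\\1\\2\\3"},
        {"(\\d\\d\\d)-(\\d\\d\\d)-(\\d\\d\\d\\d)", "\\1\\2\\3"},
    };

    // Returns the canonical form from the first rule whose Match Expression
    // matches the whole cell, or the cell unchanged if no rule matches
    // (the fallback behavior is an assumption).
    static String normalize(String cell) {
        for (String[] rule : RULES) {
            Matcher m = Pattern.compile(rule[0]).matcher(cell);
            if (m.matches()) {
                // Convert \1-style references to Java's $1-style references.
                String replacement = rule[1].replaceAll("\\\\(\\d)", "\\$$1");
                return m.replaceAll(replacement);
            }
        }
        return cell;
    }

    public static void main(String[] args) {
        String a = normalize("(333) 123-1234");
        String b = normalize("333.123.1234");
        System.out.println(a.equals(b)); // prints true
    }
}
```

Because both cells reduce to the same canonical string, a plain equality check is enough to treat them as the same phone number.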