Using Data Domains
Discovery lets you index and discover related data that is stored in various patterns. A data domain is defined as a type of data such as phone number, social security number, or date. In each of these examples, the data might be stored in a variety of patterns. For example, if you have phone number data that is stored in different data sources in different patterns such as 800-123-4567 and (800) 123-4567, you can define a phone number domain that lists all the possible patterns in which a phone number might be stored. This enables Discovery to find and match all columns that contain related phone numbers even when their patterns are different.
You can define one or more data domains and a set of patterns for each. If a data domain is enabled during indexing and relationship discovery, Discovery finds and matches data that conforms to the patterns in that domain.
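As a rough illustration of the idea (not Discovery's actual implementation), the following Python sketch models a phone number domain with two patterns and checks whether values conform to either one. The names PHONE_PATTERNS, matching_pattern, and column_fits_domain are hypothetical, the expressions use Python's regular expression syntax, which may differ from the syntax Discovery accepts, and the rule that every non-empty value must match is an assumption made for the example.

```python
import re

# Hypothetical phone number domain with one pattern per storage format.
# Python regex syntax is assumed; Discovery's dialect may differ.
PHONE_PATTERNS = {
    "dashed": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "parenthesized": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}


def matching_pattern(value):
    """Return the name of the first pattern that matches the whole value, or None."""
    for name, pattern in PHONE_PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


def column_fits_domain(values):
    """Return True if every non-empty value matches some pattern in the domain.

    Requiring every value to match is an assumption made for this sketch.
    """
    non_empty = [v for v in values if v]
    return bool(non_empty) and all(matching_pattern(v) for v in non_empty)
```

With these definitions, a column holding 800-123-4567 and a column holding (800) 123-4567 both fit the same domain, even though the two values match different patterns.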
If data domains were enabled during relationship discovery, the model indicates which columns were discovered as a result of domain analysis.
To use domain analysis for relationship discovery, you must:
Enable domain analysis. See Enabling Data Domains for information about the configuration parameters that control domain analysis.
Define data domains and their patterns. See Defining Data Domains and Patterns.
After you define domains, you can also maintain them. See:
Editing Domains and Patterns.
Deleting Domains and Patterns.
Saving and Loading Domain Definitions.
Enabling Data Domains
To use data domains during indexing and discovery, you must first enable data domains.
Note: If you have already created indexes or run relationship discovery, you must delete all indexes before you run indexing and relationship discovery with domain analysis enabled.
To enable data domains
1. From the Studio Administration menu, choose Configuration.
2. Under Discovery, expand the Indexing branch.
3. Set the Use Data Domains for Discovery value to True.
4. Click Apply and then OK to apply your configuration changes.
See Configuring Data Domains for information about domain analysis configuration parameters.
Defining Data Domains and Patterns
For each type of data that might be stored in a variety of patterns, you must define a domain and then define one or more patterns that describe the various ways it might be represented.
To define a data domain
1. In Studio, click on the Discovery tab at the left side of the window.
2. Click the Domains tab.
Discovery displays the Domains tab, where you can define the data domains and their patterns.
3. Optionally, click Load from File if you have created a domain definitions file. See Saving and Loading Domain Definitions.
4. Under Data Domains, click New.
Note: The configuration parameter Use Data Domains for Discovery must be set to True before you can define data domains. If it is not, Discovery reminds you to enable data domains.
Discovery displays the New Data Domain dialog in which you can define the data domain.
5. Under Data Domain, enter a name and description for this data domain.
6. Under First Pattern, you must define the first pattern for the new domain:
Name—Enter a pattern name or number.
Match Expression—Enter a regular expression that is expected to match the string as it is found in the database.
Transformation—Enter the replacement string as it would be declared if doing a typical programmatic find/replace operation with regular expressions, where the Match Expression is the “find” expression and the Transformation expression is the “replace” expression.
See About Pattern Expressions for more information about how Discovery interprets and uses pattern expressions.
This example illustrates a new domain and its first pattern.
7. Under First Pattern, check Enabled to make Discovery search for data that fits this pattern.
8. Click OK.
Discovery displays the new data domain on the Domains tab.
9. Optionally, click New under Patterns to define another pattern for this domain and repeat Steps 6 through 8.
You can define as many patterns as needed. You can also disable or enable individual patterns by checking or unchecking the adjacent Enabled checkboxes.
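The Match Expression and Transformation fields in step 6 work like the find and replace halves of a regular expression find/replace operation. The Python sketch below tries out two hypothetical phone number patterns that way. It assumes that each transformation rewrites a matched value into one common form (digits only) so that values stored in different patterns become comparable; that choice, the PATTERNS list, and the transform function are illustrative only. Python writes group references as \1, \2, and so on; the reference syntax that Discovery expects may differ, so check About Pattern Expressions before copying an expression.

```python
import re

# Hypothetical patterns for a phone number domain. Each pair is
# (match expression, transformation). The transformation is the
# replacement string and rewrites the value as ten digits.
PATTERNS = [
    (r"^(\d{3})-(\d{3})-(\d{4})$", r"\1\2\3"),
    (r"^\((\d{3})\) (\d{3})-(\d{4})$", r"\1\2\3"),
]


def transform(value):
    """Apply the first pattern whose match expression fits the value.

    Returns the transformed string, or None if no pattern matches.
    """
    for match_expression, transformation in PATTERNS:
        if re.search(match_expression, value):
            return re.sub(match_expression, transformation, value)
    return None
```

Here both 800-123-4567 and (800) 123-4567 transform to the same string, while a value that matches neither expression is left out of the domain.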
Editing Domains and Patterns
You can edit any existing domain or pattern. See Defining Data Domains and Patterns for more information about the fields you can change.
To edit a data domain or pattern
1. In Studio, click on the Discovery tab at the left side of the window.
2. Click the Domains tab.
3. Select the domain or pattern you want to edit.
4. Click the Edit button under Data Domains or Patterns.
5. Edit the domain or pattern and click OK to save your changes.
Deleting Domains and Patterns
You can delete any existing domain or pattern.
To delete a data domain or pattern
1. In Studio, click on the Discovery tab at the left side of the window.
2. Click the Domains tab.
3. Select the domain or pattern you want to delete.
4. Click the Delete button under Data Domains or Patterns.
5. Confirm the deletion.
Saving and Loading Domain Definitions
You can save domain definitions that you created in Discovery to a comma-separated values (.csv) file. You can also create a .csv file containing domain definitions and load them into Discovery. The format of the domain definitions file is:
Each pattern occupies one row in the table. The domain name and domain description are repeated on every row for each pattern in that domain. Domains do not have separate rows; a domain is implied by the pattern rows that share its domain name. Every row should be unique when the domain name and the pattern name are taken together as the key.
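Before loading a hand-written definitions file, it can help to check the uniqueness rule described above. The Python sketch below does this for CSV text. The column names domain_name and pattern_name, the presence of a header row, and the sample rows are all assumptions made for illustration; adjust them to the actual file layout.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real file layout may differ.
DOMAIN_KEY = "domain_name"
PATTERN_KEY = "pattern_name"


def duplicate_keys(csv_text):
    """Return (domain name, pattern name) pairs that appear on more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row[DOMAIN_KEY], row[PATTERN_KEY]) for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)


def implied_domains(csv_text):
    """Return the distinct domain names implied by the pattern rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[DOMAIN_KEY] for row in reader})


# Made-up sample: the third row reuses the key (Phone, dashed).
SAMPLE = r"""domain_name,domain_description,pattern_name,match_expression
Phone,US phone numbers,dashed,^\d{3}-\d{3}-\d{4}$
Phone,US phone numbers,parenthesized,^\(\d{3}\) \d{3}-\d{4}$
Phone,US phone numbers,dashed,^\d{10}$
"""
```

Calling duplicate_keys(SAMPLE) reports the repeated (Phone, dashed) key, and implied_domains(SAMPLE) shows that the three rows define a single domain.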