Setting Up the Data Quality Process
Data Quality is the process used to derive unique, standardized, and complete master data. Data Quality routines ensure that the data entered in a repository is "golden" (a single, trusted version of each record) so that it can be managed appropriately. If data quality is low, the repository contains two or more records for the same logical item; that is, the data is duplicated.
To avoid duplicate records, you must first bring each logical record into a standardized form. Only after the data is standardized can you reliably check for duplicates. This standardization process is also known as data cleansing.
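The following Java sketch illustrates why standardization must come before the duplicate check. It is a minimal illustration, not TIBCO MDM code: the class `RecordCleanser` and its methods are hypothetical, and the cleansing rules (trimming, collapsing whitespace, stripping accents, lower-casing values, upper-casing attribute names) are assumed examples of what a cleansing routine might do.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical cleansing step: standardizes attribute values so that
 * logically identical records compare equal. Not the TIBCO MDM API.
 */
final class RecordCleanser {

    private RecordCleanser() {}

    /** Strips accents, trims, collapses internal whitespace, and lower-cases a value. */
    static String standardize(String value) {
        if (value == null) {
            return "";
        }
        // Decompose accented characters (e.g. "é" -> "e" + combining mark), then drop the marks.
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        String noAccents = decomposed.replaceAll("\\p{M}", "");
        // Collapse runs of whitespace to a single space and normalize case.
        return noAccents.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Standardizes every attribute of a record; attribute names are trimmed and upper-cased. */
    static Map<String, String> cleanse(Map<String, String> record) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            result.put(e.getKey().trim().toUpperCase(Locale.ROOT), standardize(e.getValue()));
        }
        return result;
    }

    /** Exact duplicate check; meaningful only because both records are cleansed first. */
    static boolean isDuplicate(Map<String, String> a, Map<String, String> b) {
        return cleanse(a).equals(cleanse(b));
    }
}
```

With these assumed rules, `isDuplicate(Map.of("FIRST_NAME", "  Tom "), Map.of("first_name", "tom"))` returns `true`, whereas comparing the raw maps would miss the duplicate. Real cleansing rules depend on the data, for example on address or phone-number formats.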
Even after data cleansing, it can be difficult to determine whether a record is really new or is actually a variation (that is, a version) of an existing record. Deciding typically requires a mix of automated decisions (for most records) and human intervention (for the remainder). This is often not a simple decision. For example, deciding whether two persons are the same is difficult when a reliable or unique ID is missing. The outcome typically depends on the nature of the data and, in particular, on which attributes are needed for identification. If a reliable or unique ID is supplied, deduplication is not required. However, this is an ideal scenario that rarely occurs in practice.
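One common way to combine automated and human decisions is to score each candidate match and route it by threshold. The Java sketch below assumes a normalized Levenshtein similarity averaged over the identifying attributes of records that have already been cleansed; the class `MatchTriage`, this scoring method, and the 0.90 and 0.70 thresholds are hypothetical and are not taken from TIBCO MDM.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical triage of a candidate match: high scores are merged automatically,
 * a middle band is routed to a data steward, and low scores create a new record.
 * Thresholds are illustrative assumptions, not product defaults.
 */
final class MatchTriage {

    enum Decision { AUTO_MERGE, HUMAN_REVIEW, NEW_RECORD }

    static final double MERGE_THRESHOLD = 0.90;  // assumed value
    static final double REVIEW_THRESHOLD = 0.70; // assumed value

    private MatchTriage() {}

    /** Classic dynamic-programming edit distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Normalized similarity in [0, 1]; 1.0 means identical strings. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** Averages per-attribute similarity over the identifying attributes, then triages. */
    static Decision decide(Map<String, String> incoming, Map<String, String> existing,
                           List<String> identifyingAttributes) {
        if (identifyingAttributes.isEmpty()) {
            return Decision.NEW_RECORD;
        }
        double total = 0.0;
        for (String attr : identifyingAttributes) {
            String a = incoming.get(attr);
            String b = existing.get(attr);
            // An attribute missing from either record contributes no similarity.
            total += (a == null || b == null) ? 0.0 : similarity(a, b);
        }
        double score = total / identifyingAttributes.size();
        if (score >= MERGE_THRESHOLD) {
            return Decision.AUTO_MERGE;
        }
        if (score >= REVIEW_THRESHOLD) {
            return Decision.HUMAN_REVIEW;
        }
        return Decision.NEW_RECORD;
    }
}
```

For example, "tom"/"campbel" against "tom"/"campbell" scores 0.9375 and is merged automatically, while "tim"/"campbell" against "tom"/"campbell" scores about 0.83 and is sent for review. Tuning the two thresholds trades manual review effort against the risk of incorrect merges.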
Duplicates can be detected in two ways:
- Out-of-the-box Matching: Performs matching by combining all the specified attribute names with the AND operator. You can run similarity (fuzzy) searches based on certain identifying attributes, detect duplicates in the data, and merge them when processing a single record or records in bulk. For example, "FIRST_NAME" = "tom" AND "LAST_NAME" = "Campbell". See Record Duplicate Detection Process.
- Custom Matching: Integrates a custom matcher. Provides a hook to get the Netrics query and run the search on the IndexEntities. With this approach, you write the matching code in Java, compile it, and merge it into ECM.ear. For example, ("FIRST_NAME" = "tom" AND "LAST_NAME" = "Campbell") OR ("FIRST_NAME" = "tom" AND "SSN_ID" = "987 XX XXXX"). See Custom Netrics Query.
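To make the difference between the two rule styles concrete, the following Java sketch evaluates them in memory. It is a simplification under stated assumptions: the class `MatchRules` is hypothetical, values are compared by case-insensitive exact equality rather than by a Netrics fuzzy search on the index, and a custom query is modeled as an OR over AND-groups of attribute names.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical in-memory evaluation of the two matching styles.
 * Not the TIBCO MDM or Netrics API: a real custom matcher builds a Netrics
 * query and searches the index, whereas this sketch compares values directly.
 */
final class MatchRules {

    private MatchRules() {}

    /** Case-insensitive, whitespace-trimmed equality; a missing value never matches. */
    static boolean valueMatches(String a, String b) {
        return a != null && b != null
                && a.trim().toLowerCase(Locale.ROOT).equals(b.trim().toLowerCase(Locale.ROOT));
    }

    /** Out-of-the-box style: every listed attribute must match (AND). */
    static boolean matchesAll(Map<String, String> candidate, Map<String, String> existing,
                              List<String> attributes) {
        for (String attr : attributes) {
            if (!valueMatches(candidate.get(attr), existing.get(attr))) {
                return false;
            }
        }
        // An empty rule matches nothing rather than everything.
        return !attributes.isEmpty();
    }

    /** Custom style: the record matches if any AND-group matches (OR of ANDs). */
    static boolean matchesAny(Map<String, String> candidate, Map<String, String> existing,
                              List<List<String>> groups) {
        for (List<String> group : groups) {
            if (matchesAll(candidate, existing, group)) {
                return true;
            }
        }
        return false;
    }
}
```

With an existing record `FIRST_NAME=Tom, LAST_NAME=Campbell, SSN_ID=987 XX XXXX` and a candidate `FIRST_NAME=tom, LAST_NAME=Campbell-Smith, SSN_ID=987 XX XXXX`, the single AND rule on first and last name fails, but the OR query succeeds through the first-name and SSN group.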