Monitoring Data Quality Metrics

The ibi Data Quality WatchDog application provides users with meaningful insights on their Data Quality (DQ) metrics. Watchdog users can create new panels and dashboards with pre-configured metrics and KPIs, setup notification channels, and create alerts for monitoring metrics against user defined thresholds.

To learn more about creating reports, dashboards, and notification channels on Grafana, see the following website:

https://grafana.com/docs/grafana/latest/whatsnew/whats-new-in-v7-5/

DQ Metrics Views

This section describes all of the DQ metric views that are accessible through the Watchdog app.

Profile Summary

View Name

watchdog_pstats_summ

Description

This view contains the summary stats and metrics generated by the Profiler for an input data set. This view will have one row for each data set profiled in ibi Data Quality.

Column

Description

id

Unique identifier for the profile summary record.

dataset_id

Unique identifier for the input data set.

created_by

User account that executed the data profile.

created_date

Date when the data profile report is generated.

source_name

User assigned source name for the data set.

source_type

User assigned source type for the data set.

app_name

User assigned app name for the data set.

industry

User assigned industry associated with the data set.

entity

User assigned entity associated with the data set.

count_variables

Total number of data attributes in the data set.

count_observations

Total number of observations in the data set (row times column).

count_rows

Total number of rows in the data set.

count_dup_rows

Total number of duplicate rows in the data set.

pct_dup_rows

Percentage of duplicate rows in the data set.

count_dedup_rows

Total number of rows that were removed during data dedup.

pct_dedup_rows

Percentage of of rows that were removed during data dedup.

count_missing

Total number of observations that are empty, blank or nulls.

pct_missing

Percentage of observations that are empty, blank or nulls.

profile_score

Profile score of the data set calculated using the profiling stats and user’s expectations of the data.

Profile Details

View Name

watchdog_pstats_dtl

Description

This view contains the detailed stats and metrics generated by the Profiler for an input data set. This view will have one row for each data attribute in the data set that is profiled in ibi Data Quality.

Column

Description

id

Unique identifier for the profile detail record.

pstats_summ_id

Unique identifier for the profile summary record.

dataset_id

Unique identifier for the input data set.

created_by

User account that executed the data profile.

created_date

Date when the data profile report is generated.

source_name

User assigned source name for the data set.

source_type

User assigned source type for the data set.

app_name

User assigned app name for the data set.

industry

User assigned industry associated with the data set.

entity

User assigned entity associated with the data set.

variable_name

Name of the data attribute.

type

Data type of the attribute.

content_type

Data class of the attribute.

sensitive_flag

True if the data attribute is classified a Sensitive data, otherwise False.

expect_nulls

User assigned value set to True if values in the data attribute can be NULL.

expect_unique

User assigned value set to True if values in the data attribute is expected to be unique.

business_impact

User assigned value set to HIGH, MEDIUM, or LOW indicating business impact of the data attribute.

profile_score

Profile score of the data attribute calculated using the profiling stats and user’s expectations of the data.

count_total

Total number of observations for the data attribute.

count_unique

Total number of unique observations for the data attribute.

count_non_unique

Total number of non-unique observations for the data attribute.

count_distinct

Total number of distinct observations for the data attribute.

count_blanks

Total number of blank values or empty strings in the data attribute.

count_nulls

Total number of NULL values in the data attribute.

count_not_nulls

Total number of non-NULL values in the data attribute.

length_min

Minimum character length for the values in the data attribute.

length_max

Maximum character length for the values in the data attribute.

length_avg

Average character length for the values in the data attribute.

length_median

Median character length for the values in the data attribute.

most_common_pattern

Most common pattern for the values in the data attribute.

count_pattern

Total count of the values with the most common pattern in the data attribute.

pct_pattern

Percentage of the values with the most common pattern in the data attribute.

most_common_mask

Most common mask for the values in the data attribute.

count_mask

Total count of the values with the most common mask in the data attribute.

pct_mask

Percentage of the values with the most common mask in the data attribute.

num_avg

Average value calculated from all the numeric values in the data attribute.

num_min

Minimum value calculated from all the numeric values in the data attribute.

num_max

Maximum value calculated from all the numeric values in the data attribute.

num_std_dev

Standard deviation value calculated from all the numeric values in the data attribute.

num_pctl_25

25th percentile value calculated from all the numeric values in the data attribute.

num_pctl_50

50th percentile value calculated from all the numeric values in the data attribute.

num_pctl_75

75th percentile value calculated from all the numeric values in the data attribute.

DQ Summary

View Name

watchdog_dqstats_summ

Description

This view contains the summary stats and metrics generated by executing Rules against an input data set. This view will have one row for each set of Rules analyzed against an input data set in ibi Data Quality.

Column

Description

dq_summ_id

Unique identifier for the data quality summary record.

input_name

Unique identifier for the input data set.

created_by

User account that executed the Rules against the input data set.

created_date

Date when the data quality report is generated.

source_name

User assigned source name for the data set.

source_type

User assigned source type for the data set.

app_name

User assigned app name for the data set.

industry

User assigned industry associated with the data set.

entity

User assigned entity associated with the data set.

count_processed

Total number of observations processed.

count_invalid

Total number of observations that failed validation and could not be fixed.

count_valid

Total number of observations that contain valid output values.

count_cleansed

Total number of observations that were fixed - standardized, normalized, or enriched, number of cleansed observations are also counted under number of valid observations.

count_missing

Total number of observations that are empty or missing.

pct_valid

Percentage of total observations that failed validation and could not be fixed.

pct_invalid

Percentage of total observations that contain valid output values.

pct_cleansed

Percentage of total observations that were fixed - standardized, normalized, or enriched, number of cleansed observations are also counted under number of valid observations.

pct_missing

Percentage of total observations that are empty or missing.

dq_score

Data quality score of the input data set calculated using the data quality stats and user’s expectations of the data.

DQ Details

View Name

watchdog_dqstats_dtl

Description

This view contains the detailed stats and metrics generated by executing Rules against an input data set. This view will have one row for each Rule executed against an input data set in ibi Data Quality.

Column

Description

dq_dtl_id

Unique identifier for the data quality analysis detail record.

dq_summ_id

Unique identifier for the data quality summary record.

variable_group_name

Name of the data attribute or group of data attributes analyzed by a Rule.

input_name

Name of the data set.

created_by

User account that executed the Rules against the input data set.

created_date

Date when the data quality report is generated.

source_name

User assigned source name for the data set.

source_type

User assigned source type for the data set.

app_name

User assigned app name for the data set.

industry

User assigned industry associated with the data set.

entity

User assigned entity associated with the data set.

rule_id

Unique identifier of the Rule.

count_processed

Total number of observations processed.

count_invalid

Total number of observations that failed validation and could not be fixed.

count_valid

Total number of observations that contain valid output values.

count_cleansed

Total number of observations that were fixed - standardized, normalized, or enriched, number of cleansed observations are also counted under number of valid observations.

count_missing

Total number of observations that are empty or missing.

dq_score

Data quality score of the data attribute or group of data attributes calculated using the data quality stats and user’s expectations of the data.

DQ Transaction Summary

View Name

watchdog_dqstats_summ

Description

This view contains the summary stats and metrics generated by executing Rules against an input data set. This view will have one row for each set of Rules analyzed against an input data set in ibi Data Quality.

Column

Description

dq_summ_id

Unique identifier for the data quality summary record.

input_name

Unique identifier for the input data set.

created_by

User account that executed the Rules against the input data set.

created_date

Date when the data quality report is generated.

source_name

User assigned source name for the data set.

source_type

User assigned source type for the data set.

app_name

User assigned app name for the data set.

industry

User assigned industry associated with the data set.

entity

User assigned entity associated with the data set.

count_processed

Total number of observations processed.

count_invalid

Total number of observations that failed validation and could not be fixed.

count_valid

Total number of observations that contain valid output values.

count_cleansed

Total number of observations that were fixed - standardized, normalized, or enriched, number of cleansed observations are also counted under number of valid observations.

count_missing

Total number of observations that are empty or missing.

pct_valid

Percentage of total observations that failed validation and could not be fixed.

pct_invalid

Percentage of total observations that contain valid output values.

pct_cleansed

Percentage of total observations that were fixed - standardized, normalized, or enriched, number of cleansed observations are also counted under number of valid observations.

pct_missing

Percentage of total observations that are empty or missing.

dq_score

Data quality score of the input data set calculated using the data quality stats and user’s expectations of the data.