High Level Design

ibi Data Quality has several application components that provide a rich set of services to both the consumers and the producers of data quality services.

Producers

Producers are users who play the role of citizen developer or advanced developer. These users will interact with the application to create new Services, define new Data Classes, or author new Rules. For more information on Data Classes, Rules, and Services, see Concepts.

ibi Data Quality has an open Service authoring architecture, which gives developers several options for creating new Services:

Option 1. Use built-in Services to author new Rules.
Option 2. Use ibi Omni-Gen Data Quality Server (DQS) as an authoring tool to create new or custom cleanse plans.
Option 3. Create and register a RESTful Service using any other tool or language, such as Python, Java, etc.
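
As an illustration of Option 3, the following is a minimal sketch of a RESTful Service written in Python using only the standard library. The request and response shapes (a JSON object with a "values" list in, a "results" list out), the check_not_blank function, and the port are all hypothetical and chosen for illustration only. The contract that ibi Data Quality expects from a registered Service is not described in this section, so consult the Service Registry documentation before adapting this.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_not_blank(values):
    """Hypothetical rule logic: mark each value valid if it is a non-blank string."""
    return [{"value": v, "valid": bool(v and v.strip())} for v in values]


class ServiceHandler(BaseHTTPRequestHandler):
    """Accepts POST requests carrying {"values": [...]} (an assumed payload shape)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = {"results": check_not_blank(payload.get("values", []))}
            status = 200
        except (ValueError, AttributeError, TypeError):
            # Malformed JSON or an unexpected payload shape.
            body = {"error": "expected a JSON object with a 'values' list of strings"}
            status = 400
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def build_server(host="127.0.0.1", port=8080):
    """Create the HTTP server; the caller decides when to start serving."""
    return HTTPServer((host, port), ServiceHandler)
```

To try the sketch locally, call build_server().serve_forever() and send a POST request with a JSON body to the chosen port.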

Consumers

Consumers are users who play the role of citizen user, data engineer, data analyst, data scientist, data steward, and so on. These users interact with the application to discover data attributes and other technical characteristics, discover attribute relationships, detect outliers and anomalies, embed data quality analysis in data pipelines and data integration jobs, evaluate and review data quality scores, and monitor and govern their data assets across different data quality dimensions.

Consumer-Focused Components

Valet API

This is the main pillar of ibi Data Quality. It powers all the other components of the solution. It also provides API services for programmatic interaction with ibi Data Quality.

Valet UI

This is a no-code user interface that enables citizen users to upload data, run profiling analysis, associate Rules with the data to execute data quality analysis, and download the results.

Rules Catalog

This is a discovery-focused user interface that enables data analysts to search for Rules applicable to their data.

Watchdog

This is a reporting user interface that allows data stewards to track, monitor, and report data quality metrics.

Producer-Focused Components

Rules Editor

This is a no-code user interface that allows data experts and data owners to author and publish new Rules from existing Services.

Data Class Editor

This is a no-code user interface that allows data experts and data owners to define new Data Classes using character masks/patterns, regular expressions, and lookups.
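
To make the three definition styles concrete, here is a small Python sketch that classifies a value using a character mask, a regular expression, and a lookup list. It is only an illustration of the general techniques, not the Data Class Editor's implementation. The mask convention (9 for a digit, A for a letter), the class names, and the sample patterns are assumptions.

```python
import re


def to_mask(value):
    """Build a character mask: digits become 9, letters become A, other characters are kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical definitions, one per technique.
US_ZIP_MASKS = {"99999", "99999-9999"}                 # character masks
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # regular expression
US_STATE_CODES = {"NY", "CA", "TX"}                    # lookup (truncated sample)


def classify(value):
    """Return the first matching hypothetical class name, or None if nothing matches."""
    v = value.strip()
    if to_mask(v) in US_ZIP_MASKS:
        return "us_zip"
    if EMAIL_RE.match(v):
        return "email"
    if v.upper() in US_STATE_CODES:
        return "us_state"
    return None
```

Masks suit fixed-shape values such as postal codes, regular expressions suit values with variable-length parts, and lookups suit closed sets of known values.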

Service Registry

This is a set of API services that allows developers to test new Services and register those Services in their ibi Data Quality instance.

DQ Studio

This is a Service authoring tool that allows developers to build new Services.

Internal Components

Knowledge Hub

This is the central repository of definitions and related artifacts for Data Classes, Services, and Rules.

Profiler

This is a scalable engine that performs comprehensive technical analysis of data.

Butler

This is an orchestration engine that interprets incoming requests and routes those requests to Service Providers.

Service Providers (Core, Address, DQS)

This is a set of Service Providers that accept requests from the Butler, execute data quality analyses, and respond with output results.
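
The routing relationship between the Butler and the Service Providers can be pictured as a dispatch table. The Python sketch below is a conceptual illustration only: the Router class, the request fields ("provider", "rule"), and the provider function are hypothetical and do not reflect the Butler's actual interfaces.

```python
from typing import Callable, Dict

Provider = Callable[[dict], dict]


def core_provider(request: dict) -> dict:
    """Hypothetical provider: pretends to run the named rule and reports success."""
    return {"provider": "core", "rule": request["rule"], "status": "ok"}


class Router:
    """Maps a provider name to the callable that handles requests for it."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, name: str, provider: Provider) -> None:
        self._providers[name] = provider

    def route(self, request: dict) -> dict:
        # Look up the target provider named in the request and forward the request to it.
        name = request.get("provider")
        provider = self._providers.get(name)
        if provider is None:
            raise LookupError(f"no provider registered for {name!r}")
        return provider(request)
```

In this sketch, adding a provider only requires another register call, which mirrors how separate providers (Core, Address, DQS) can sit behind a single orchestrator.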

Analytics Repository

This is a relational data store used for persisting profiling and data quality stats, metrics, and scores.

Results Repository

This is a persistent file storage system used for storing profiling and data analysis results (input data, deduplicated data, cleansed output, detailed results, and summarized reports).

Containers

The following table lists and describes the containers that comprise ibi Data Quality.

Container Name		Description
address-server		Performs address cleansing using Loqate.
butler		Orchestrates DQ analysis by interpreting Rules and routing requests to underlying services (Python or DQS).
dqs		(Optional) Runs DQS workflows if the customer opts to use DQS to author and run data quality workflows.
dsml-classifier		Provides DSML models to classify data.
grafana		Hosts the Watchdog app that provides reports and dashboards on DQ metrics.
ksource		Provides an API to manage Data Classes, Rules, and Services.
nginx		Provides main ingress for ibi Data Quality.
postgres		Provides a relational data store for shared application configuration and operational data.
profiler		Performs technical analysis on input data to generate a data profile.
python-services		Performs data quality analyses on input data to generate data quality results.
redis		Provides a simple queue service for the profiler.
valet-services		Provides back-end services for the Valet UI and APIs for programmatic interactions with ibi Data Quality.
valet-ui		Provides a user interface for citizen users to interact with ibi Data Quality via a web browser.

Security Features

All external interactions with ibi Data Quality use HTTPS. For the steps to change the default certificate for the Kubernetes ingress or for the nginx proxy used by Docker Compose, see Deploying ibi Data Quality.

ibi Data Quality requires that valet-ui, valet-services, and watchdog be exposed to the external network, along with the WSO2 Identity Server endpoints that are required for OAuth2 and OpenID Connect. For convenience when running in trusted environments, Postgres and the WSO2 Identity Server console are also exposed by default. In production or untrusted environments, you can disable access to these services by following the steps described in Deploying ibi Data Quality.

Authentication and authorization in ibi Data Quality use the OAuth2 and OpenID Connect protocols, with signed JSON Web Tokens. Any access to the application requires authentication, and specific operations require roles or permissions as described in Managing Users.

The valet-services API provides an authentication service that allows an API user to acquire a bearer token. You must supply this token in the HTTP Authorization header for all requests made through the API. The bearer token expires after one hour. For more information, see Authorize.
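
The following Python sketch shows one way a client might attach the bearer token to requests and refresh it before the one-hour expiry. The TokenCache class, the fetch_token callable, the 60-second safety margin, and the example URL are hypothetical. The actual endpoint and payload for acquiring a token are described in Authorize and are not reproduced here.

```python
import time
import urllib.request

TOKEN_LIFETIME_SECONDS = 3600  # from the text above: the token expires after one hour


class TokenCache:
    """Caches a bearer token and fetches a new one shortly before it expires."""

    def __init__(self, fetch_token, lifetime=TOKEN_LIFETIME_SECONDS, margin=60,
                 clock=time.monotonic):
        self._fetch_token = fetch_token  # caller-supplied function that returns a token string
        self._lifetime = lifetime
        self._margin = margin            # assumed safety margin, in seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime - self._margin
        return self._token


def authorized_request(url, token, data=None):
    """Build a request (without sending it) that carries the token in the Authorization header."""
    req = urllib.request.Request(url, data=data)
    req.add_header("Authorization", f"Bearer {token}")
    return req
```

A caller would pass its own token-acquisition function to TokenCache, then send the object returned by authorized_request with urllib.request.urlopen.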
