Managing Services

In ibi Data Quality, a service represents a workflow that implements the logic to verify and cleanse data. Services can either be generic or domain specific.

Built-in Services

ibi Data Quality provides a set of built-in services that can be exposed as Rules. These services are registered in the DQCore workspace and cannot be edited or replaced by end users. However, service developers can define and register new services in the Custom workspace.

For more information on the default services that are included in the product, see Packaged DQ Services.

DQ Service Requirements

To add a new service in ibi Data Quality, the new service must comply with the following ibi Data Quality service specifications.

Request Method

This requirement applies to external DQ Services. The Service must support HTTP POST.

Authentication Method

This requirement applies to external DQ Services. The service may require basic authentication (user ID and password), but no other security mechanism is supported.

Input Data

  1. The service should support input data as a single block of CSV text, that is, a string of CSV data with each row separated by a newline (\n) character.
  2. The input data must not include a column header.
  3. Input values must be provided in the order specified in the service definition provided to ibi Data Quality. For more information, see Managing Services.
  4. When creating the service definition, input column names should be prefaced with in_.

    For example, an email cleanse service should have an input variable called in_email.

Output Data

  1. Output column names that represent cleansed input values should be prefaced by out_.

    Example: An email cleanse service should have an output variable called out_email.

  2. In addition to the desired output columns, each output row should have these two additional columns:
    1. Tag value: A short (abbreviated) string that summarizes the data defects or other meaningful facts about the results of the analysis.
    2. Tag category: A string that summarizes the final outcome of the data analysis. Note - Only one of these four supported values is allowed for tag_category.

      • Empty. Indicates the input data is an empty string or contains only white spaces.

      • VALID. Indicates the input data is valid and has the same value as the output data.

      • CLEANSED. Indicates that the input value is cleansed and a standardized, augmented, or enriched output value was generated.

      • INVALID. Indicates the input value is invalid: either the defects cannot be fixed or the value does not represent the corresponding class of data.

      Warning - If a service generates an output tag_category value that does not belong to the set of values listed above, the service will fail to execute.

  3. All input values should be included in the output, and records should be emitted in the same order in which they were provided in the input data set.

  4. The output should be a CSV block with the first line being a column header.
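
The input and output rules above can be checked before a service is registered. The following is a minimal sketch in Python, not part of the product: the helper names build_input_block and check_output_block are hypothetical, the check assumes a service that returns exactly one output row per input row and echoes the input values under their in_ column names, and the capitalized literal used for the empty category is an assumption that should be confirmed against your environment.

```python
import csv
import io

# Assumption: the four categories are spelled in capitals; confirm the exact
# literal for the empty category against your iDQ release.
ALLOWED_CATEGORIES = {"EMPTY", "VALID", "CLEANSED", "INVALID"}


def build_input_block(rows):
    """Serialize rows as header-less CSV text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def check_output_block(csv_text, input_rows, input_columns):
    """Return a list of problems found in a one-row-per-input output block."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    required = list(input_columns) + ["tag_value", "tag_category"]
    problems = [f"missing column: {name}" for name in required if name not in header]
    if problems:
        return problems
    records = list(reader)
    if len(records) != len(input_rows):
        problems.append(f"expected {len(input_rows)} rows, got {len(records)}")
    for number, (record, row) in enumerate(zip(records, input_rows), start=1):
        # Input values must come back unchanged and in the original order.
        if [record[name] for name in input_columns] != list(row):
            problems.append(f"row {number}: input values not echoed in order")
        if record["tag_category"] not in ALLOWED_CATEGORIES:
            problems.append(f"row {number}: unsupported tag_category {record['tag_category']!r}")
    return problems
```

An empty list from check_output_block means that the block passed these checks only; it does not replace testing the service through iDQ.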

Parameters

Parameters represent name-value pairs that are passed in the Service request and are immutable values in the scope of a request executed by the Service. Rule authors use these parameters to create different Rules for a given service by setting different parameter values when creating Rules in the Rules Editor.

Example - The “cleanse_email” service specifies a parameter called “default_email” which allows a Rule author to set up a default email to replace invalid email addresses in the output.

Note - If you are developing a Python script to deploy into iDQ’s custom services, you need to support parameters by defining key-value pairs as arguments to the cleanse function.

DQ Service Definition

In order to register a new Service in iDQ, you need to create a definition for the service in JSON format (an example is included below).

Attributes Description
name Name of the service. It must be unique in the “custom” workspace.
description A brief description of the service.
location Service location URL.
sendFullDataSet Optional. Set this to “true” if the service expects the entire data set as input as opposed to one row at a time.
supportsJson Optional. Set this to “true” if the service can interpret the request body in JSON format and generate a response in JSON format.
batchSize If your DQ Service generates exactly one output record for each input record, set this value to null. If your DQ Service generates more than one output record for each input record, set this value to 1. When batchSize is 1, users should expect performance degradation because iDQ makes a separate call to the DQ Service for each row of the input data set. If a DQ Service generates multiple output rows for a single input row, the output data set will have multiple rows with the same dq_recid. Note - Users cannot use the Merge & Export feature when analyzing data sets with Rules that implement DQ Services with batchSize set to 1.
inputColumns An array of paired values that have:
    name - name of the input column
    description - brief description of the input column
outputColumns An array of paired values that have:
    name - name of the output column
    description - brief description of the output column
parameters Optional. An array of values that have:
    name - name of the parameter
    description - brief description of the parameter
credentials Optional. A pair of values that have:
    user - user name for the service credentials
    password - password for the service credentials

The following is an example JSON document for a new service called cleanse_vin:

{
  "name": "cleanse_vin",
  "description": "Cleanse vehicle identification number",
  "location": "/idq/custom/custom_cleanse_svcs/idq_cleanse_vin",
  "supportsJson": false,
  "sendFullDataSet": false,
  "batchSize": null,
  "inputColumns": [
    {
      "name": "in_vin",
      "description": "value to be cleansed or verified"
    }
  ],
  "outputColumns": [
    {
      "name": "out_vin",
      "description": "input value when tag_category is VALID, cleansed value when tag_category is CLEANSED, default value when tag_category is MISSING or INVALID"
    },
    {
      "name": "tag_value",
      "description": "Tag value that provides explanation for malformed, unexpected or missing data"
    },
    {
      "name": "tag_category",
      "description": "Tag category that categorizes tags as Missing Data, Cleansed Data or Invalid Data"
    }
  ],
  "tags": [
    {
      "name": "EMPTY_OR_WHITESPACES",
      "description": "Input value is an empty string or contains all whitespaces"
    },
    {
      "name": "INVALID_VIN",
      "description": "Value does not represent a valid vehicle identification number"
    },
    {
      "name": "VALID_VIN",
      "description": "Input value is Valid"
    },
    {
      "name": "NORMALIZED_VIN",
      "description": "Input value was reformatted or cleansed"
    }
  ],
  "parameters": [
    {
      "name": "default_vin",
      "description": "Default value when tag_category is MISSING or INVALID"
    }
  ]
}
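
Before submitting a definition, it can help to check it against the conventions described in this section. The following Python sketch is illustrative only: check_definition is a hypothetical helper, the list of required attributes is an assumption inferred from the attributes that are not marked Optional, and iDQ's own validation may differ.

```python
import json

# Assumption: attributes not marked Optional in the table are required.
REQUIRED_KEYS = ("name", "description", "location", "inputColumns", "outputColumns")


def check_definition(definition_text):
    """Return a list of convention problems found in a JSON service definition."""
    definition = json.loads(definition_text)
    problems = [f"missing attribute: {key}" for key in REQUIRED_KEYS if key not in definition]
    # Input column names should be prefaced with in_.
    for column in definition.get("inputColumns", []):
        name = column.get("name", "")
        if not name.startswith("in_"):
            problems.append(f"input column {name!r} should start with in_")
    # Every output row carries a tag value and a tag category.
    output_names = {column.get("name") for column in definition.get("outputColumns", [])}
    for tag_column in ("tag_value", "tag_category"):
        if tag_column not in output_names:
            problems.append(f"output column {tag_column} is missing")
    # batchSize is either null (one output per input) or 1.
    if definition.get("batchSize") not in (None, 1):
        problems.append("batchSize should be null or 1")
    return problems
```

Passing the text of the cleanse_vin definition to check_definition should return an empty list, because it declares in_vin, tag_value, and tag_category and sets batchSize to null.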

Adding custom DQ services

There are two different ways to add your custom DQ Service in ibi Data Quality.

  1. External DQ Services

    1. Develop and deploy your custom DQ logic as an external application with REST endpoints.

    2. Register the new DQ Service.

  2. Hosted DQ Services

    1. Develop and deploy your custom DQ logic as a Python script in iDQ’s custom services container.

    2. Register the new DQ Service.

Regardless of where you decide to host the new DQ Service, the service has to comply with the minimum requirements as mentioned in DQ Service Requirements.

Process Overview

Process to add:

Process to rollback

Creating Python DQ script

This section covers the detailed steps for a developer to directly deploy custom Python scripts in iDQ with minimal effort.

Important considerations

  1. The only Python runtime environment supported in this release is 3.11.3.

  2. All deployed scripts will run in the same Python environment, so developers will have to be extremely careful deploying scripts because conflicting dependencies may result in unpredictable behavior.

  3. You can deploy more than one cleanse function in a single script provided each function name has idq_ as prefix.

    For example, if you submit a script with the name “cleanse_svcs” that contains two different functions, idq_cleanse_vin and idq_cleanse_ssn, two different endpoints will be available at the end of the deployment process:

    idq/custom/cleanse_svcs/idq_cleanse_vin

    idq/custom/cleanse_svcs/idq_cleanse_ssn

  4. You must test your Python script before deploying it in iDQ.
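
As an illustration of how script and function names map to endpoints, the following Python sketch lists the endpoints a script would be expected to expose. It is not how iDQ itself discovers functions: expected_endpoints is a hypothetical helper, it assumes only top-level functions count, and it uses the leading-slash form of the path that the Get Catalog response returns.

```python
import ast


def expected_endpoints(script_name, source):
    """List the endpoint paths expected for the idq_ functions in a script."""
    tree = ast.parse(source)
    # Only top-level function definitions with the idq_ prefix are considered.
    names = [
        node.name
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name.startswith("idq_")
    ]
    return [f"/idq/custom/{script_name}/{name}" for name in names]
```

Because ast.parse raises SyntaxError on a script that does not compile, running this helper also serves as a basic syntax check before deployment.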

High level steps:

  1. Import required packages.

  2. Create a cleanse function with “idq_” as the name prefix. Note - The system only recognizes functions whose names start with “idq_”; all other functions are ignored.

  3. The function should have two arguments:

    1. csvstr - represents a string of CSV data with each row separated by a newline (\n) character

    2. params - represents a dictionary of key-value pairs that are passed in the request and are immutable values in the scope of a runtime execution of the script.

The following example walks you through the steps of creating a Python script that verifies Vehicle Identification Numbers (VINs) as input values.

Steps:

  1. Import all your required packages.

  2. Define a function to cleanse VINs. The name of the function should always start with idq_, and the function should always have two arguments.

  3. Read input values and parameters.

  4. Define output columns.

  5. Replace input NULL values with empty strings and count the number of rows in the input data.

  6. For each row, execute a sub function that verifies input value and returns the verified, cleansed and enriched output.

  7. Create a Pandas DataFrame from the output list, convert the dataframe to CSV format and return the results.

In the sub function, each input row is analyzed and, based on the analysis results, a tag_value and a tag_category are set.
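
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script shipped with the product. To stay free of dependencies, it uses the standard csv module where step 7 uses a Pandas DataFrame. The VIN check covers only the general format (17 characters, digits and capital letters other than I, O, and Q) and does not verify the check digit. The literal used for the empty tag category and the helper name _verify_vin are assumptions.

```python
# Step 1: import the required packages.
import csv
import io
import re

# General VIN format: 17 characters, digits and capitals except I, O and Q.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")

# Assumption: confirm the exact literal for the empty category in your release.
CATEGORY_EMPTY = "EMPTY"

# Step 4: define the output columns.
OUTPUT_COLUMNS = ["in_vin", "out_vin", "tag_value", "tag_category"]


def _verify_vin(value, default_vin):
    """Sub function: analyze one input value and return one output row."""
    if value.strip() == "":
        return [value, default_vin, "EMPTY_OR_WHITESPACES", CATEGORY_EMPTY]
    # Remove spaces and hyphens, then convert to upper case.
    normalized = re.sub(r"[\s-]", "", value).upper()
    if not VIN_PATTERN.fullmatch(normalized):
        return [value, default_vin, "INVALID_VIN", "INVALID"]
    if normalized == value:
        return [value, value, "VALID_VIN", "VALID"]
    return [value, normalized, "NORMALIZED_VIN", "CLEANSED"]


# Step 2: the function name starts with idq_ and takes two arguments.
def idq_cleanse_vin(csvstr, params):
    # Step 3: read the input values and parameters.
    default_vin = (params or {}).get("default_vin", "")
    rows = list(csv.reader(io.StringIO(csvstr or "")))
    # Step 5: treat missing values as empty strings and count the rows.
    values = [row[0] if row else "" for row in rows]
    row_count = len(values)
    # Step 6: run the sub function on each row, preserving input order.
    output = [_verify_vin(values[index], default_vin) for index in range(row_count)]
    # Step 7: return the results as CSV text with a header line.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(OUTPUT_COLUMNS)
    writer.writerows(output)
    return buffer.getvalue()
```

Calling idq_cleanse_vin with a few sample rows and a params dictionary such as {"default_vin": "UNKNOWN"} is a quick way to test the script locally before deploying it.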

Deploying Python DQ script

You will need an API client like Postman to deploy custom Python DQ scripts.

The following API services will enable you to deploy, test and register your new DQ Service. Before using these services, you will need to authenticate and authorize your user account. For more information, refer to the section Authorize.

Get Catalog

Description

Use this endpoint to retrieve a catalog of existing custom DQ Services that have already been deployed in the environment.

Endpoint

https://{{host}}:{{port}}/api/v1/custom/catalog

HTTP Method

GET

Request Body

NA

Response

Status OK when the request is successful.
Code 200 when the request is successful.
Message NA
developerMessage NA
responseType java.util.ArrayList
response A list of RESTful service endpoints, one for each DQ function with the prefix idq_
exception NA
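
If you prefer a script to an API client, the same request can be issued from Python. The sketch below uses only the standard library. The base URL in the usage note and the contents of auth_headers are placeholders, since the headers required depend on how you authenticated (see the section Authorize), and the function names are hypothetical.

```python
import json
import urllib.request


def catalog_request(base_url, auth_headers):
    """Build (but do not send) the GET request for the custom services catalog."""
    return urllib.request.Request(
        f"{base_url}/api/v1/custom/catalog",
        headers=dict(auth_headers),
        method="GET",
    )


def fetch_catalog(base_url, auth_headers):
    """Send the request and return the list of endpoints from the response."""
    with urllib.request.urlopen(catalog_request(base_url, auth_headers)) as reply:
        envelope = json.load(reply)
    return envelope["response"]
```

For example, fetch_catalog("https://dq.example.com:9803", headers) would return the list found under the response attribute, where the host name is a placeholder for your own environment.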

Add Python Script

Description

Use this endpoint to deploy your custom Python script.

Endpoint

https://{{host}}:{{port}}/api/v1/custom/module/{name of your python script}

Note - The name of your Python script can contain only letters, numbers, and underscores. Because this name becomes part of the service endpoint URL, provide a short, meaningful name that is unique and does not contain spaces or special characters.

HTTP Method

POST

Request Body

Paste the contents of your Python script

Response

Status OK when the request is successful.
Code 200 when the request is successful.
Message NA
developerMessage NA
responseType java.lang.String
response The name of the python script you deployed
exception Null
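
The deployment request can also be prepared in Python. In this sketch, deploy_request is a hypothetical helper that enforces the naming rule above before building the POST request. The text/plain content type is an assumption, and auth_headers stands for whatever headers your authentication requires.

```python
import re
import urllib.request

# The script name may contain only letters, numbers and underscores.
MODULE_NAME = re.compile(r"[A-Za-z0-9_]+")


def deploy_request(base_url, module_name, script_source, auth_headers):
    """Build (but do not send) the POST request that deploys a Python script."""
    if not MODULE_NAME.fullmatch(module_name):
        raise ValueError("script name may contain only letters, numbers and underscores")
    # Assumption: the script is sent as plain text in the request body.
    headers = {"Content-Type": "text/plain", **auth_headers}
    return urllib.request.Request(
        f"{base_url}/api/v1/custom/module/{module_name}",
        data=script_source.encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Pass the returned object to urllib.request.urlopen to send it. Rejecting an invalid name locally avoids deploying a script under a name that cannot form a valid endpoint URL.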

Verify Functions

Depending on the number of functions and the number of required packages imported in the script, the deployment process might take from a few seconds to several minutes to complete.

In order to verify that your service has deployed correctly, you can run the Get Catalog request as mentioned above.

For example, we posted the request to create the cleanse VIN Python script to this endpoint:

https://{{host}}:{{port}}/api/v1/custom/module/cleanse_svcs

After successful deployment, if you call Get Catalog, you should see your functions listed in the response message.

Note - the response contains the endpoint that you will need to register the DQ Service in iDQ.

{
"status": "OK",
"code": 200,
"message": null,
"developerMessage": null,
"responseType": "java.util.ArrayList",
"response": [
"/idq/custom/cleanse_svcs/idq_cleanse_vin"
],
"exception": null
}
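
Because deployment can take a while, it may be convenient to check the catalog response programmatically. The following Python sketch is a hypothetical helper, missing_endpoints, that reports which expected functions are not yet listed. It assumes the response has the shape shown above and that endpoint paths follow the /idq/custom/{script}/{function} pattern.

```python
import json


def missing_endpoints(catalog_json, script_name, function_names):
    """Return the expected endpoints that the catalog response does not list."""
    envelope = json.loads(catalog_json)
    if envelope.get("code") != 200:
        raise RuntimeError(f"catalog request failed: {envelope.get('message')}")
    listed = set(envelope.get("response") or [])
    expected = [f"/idq/custom/{script_name}/{name}" for name in function_names]
    return [endpoint for endpoint in expected if endpoint not in listed]
```

An empty result means that every expected function is listed. If some endpoints are missing, wait and call Get Catalog again before concluding that the deployment failed.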

Test Function

Description

Use this endpoint to test the functions defined in your custom Python script.

Note - If you have defined more than one idq_ function in your script, you will have to test each function individually.

Endpoint

https://{{host}}:{{port}}/api/v1/service/test

HTTP Method

POST

Parameters

input

A single row of input data.

Request Body

JSON message that provides the location of the function you intend to test.

Example:

{ "location": "/idq/custom/custom_cleanse_svcs/idq_cleanse_vin"}

Response

Status OK when the request is successful.
Code 200 when the request is successful.
Message NA
developerMessage NA
responseType com.tibco.tdq.common.model.butler.ServiceValidationResponse
response

url : End point of the service that was tested

payload : Input data submitted as request for this test

response : Comma separated sets of output values with each row separated by a newline (\n) character

exception Null
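
A test call can be scripted along the following lines. This Python sketch rests on two assumptions: that input is passed as a URL query parameter, and that the body is sent as application/json. The helper names function_test_request and parse_test_output are hypothetical.

```python
import csv
import io
import json
import urllib.parse
import urllib.request


def function_test_request(base_url, location, input_row, auth_headers):
    """Build (but do not send) the POST request that tests one idq_ function."""
    # Assumption: the single input row travels as the "input" query parameter.
    query = urllib.parse.urlencode({"input": input_row})
    body = json.dumps({"location": location}).encode("utf-8")
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        f"{base_url}/api/v1/service/test?{query}",
        data=body,
        headers=headers,
        method="POST",
    )


def parse_test_output(envelope):
    """Split the CSV text returned under response.response into rows."""
    csv_text = envelope["response"]["response"]
    return list(csv.reader(io.StringIO(csv_text)))
```

Here parse_test_output expects the decoded JSON response of the test call and returns each output line as a list of values, which makes the tag_value and tag_category of the tested row easy to inspect.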

Register DQ Service

The last step of the deployment process is to register the function as a DQ Service so Rule authors can use this service to create new Rules.

Note - If you have defined more than one idq_ function in your script, you will have to define and register each corresponding DQ Service individually.

Endpoint

https://{{host}}:9803/api/v1/service

HTTP Method

POST

Parameters

NA

Request Body

JSON message that describes the DQ Service. Refer to the section on Service Definition.

Example:

{
  "name": "cleanse_vin",
  "description": "Cleanse vehicle identification number",
  "location": "/idq/custom/custom_cleanse_svcs/idq_cleanse_vin",
  "supportsJson": false,
  "sendFullDataSet": false,
  "batchSize": null,
  "inputColumns": [
    {
      "name": "in_vin",
      "description": "value to be cleansed or verified"
    }
  ],
  "outputColumns": [
    {
      "name": "out_vin",
      "description": "input value when tag_category is VALID, cleansed value when tag_category is CLEANSED, default value when tag_category is MISSING or INVALID"
    },
    {
      "name": "tag_value",
      "description": "Tag value that provides explanation for malformed, unexpected or missing data"
    },
    {
      "name": "tag_category",
      "description": "Tag category that categorizes tags as Missing Data, Cleansed Data or Invalid Data"
    }
  ],
  "tags": [
    {
      "name": "EMPTY_OR_WHITESPACES",
      "description": "Input value is an empty string or contains all whitespaces"
    },
    {
      "name": "INVALID_VIN",
      "description": "Value does not represent a valid vehicle identification number"
    },
    {
      "name": "VALID_VIN",
      "description": "Input value is Valid"
    },
    {
      "name": "NORMALIZED_VIN",
      "description": "Input value was reformatted or cleansed"
    }
  ],
  "parameters": [
    {
      "name": "default_vin",
      "description": "Default value when tag_category is MISSING or INVALID"
    }
  ]
}

Response

Status OK when the request is successful.
Code 200 when the request is successful.
Message NA
developerMessage NA
responseType com.tibco.tdq.common.model.rules.Service
response

id : Unique identifier assigned to the service

workspace : custom

version : 1.0

Rest of the attributes as defined in the service definition

exception Null
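
The registration call can be scripted as shown in this Python sketch. The helper names register_request and registered_id are hypothetical, the application/json content type is an assumption, and registered_id assumes that the decoded response carries the id and workspace attributes under response as described above.

```python
import json
import urllib.request


def register_request(base_url, definition, auth_headers):
    """Build (but do not send) the POST request that registers a DQ Service."""
    body = json.dumps(definition).encode("utf-8")
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        f"{base_url}/api/v1/service",
        data=body,
        headers=headers,
        method="POST",
    )


def registered_id(envelope):
    """Return the id assigned to the service in a decoded register response."""
    service = envelope["response"]
    # New DQ Services are expected to land in the custom workspace.
    if service.get("workspace") != "custom":
        raise ValueError(f"unexpected workspace: {service.get('workspace')!r}")
    return service["id"]
```

Keeping the returned id is useful because the Delete DQ Service endpoint identifies a service by its id.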

Verify DQ Service

After registering a DQ Service, use this endpoint to retrieve the service info and verify that your newly registered DQ Service is now available for use in iDQ.

Note - If you have defined more than one idq_ function in your script and registered them in iDQ, you will have to verify each DQ Service individually.

Endpoint

https://{{host}}:9803/api/v1/service

HTTP Method

GET

Parameters

name

name of the DQ Service you have registered

workspace

custom (by default, all new DQ Services are deployed to the custom workspace)

Request Body

NA

Response

Status OK when the request is successful.
Code 200 when the request is successful.
Message NA
developerMessage NA
responseType com.tibco.tdq.common.model.rules.Service
response

Description of the DQ Service in JSON format

exception Null

After successfully registering your new DQ Service, you can log in to the iDQ user interface, go to the Service Registry page, and find your newly registered DQ Service ready for use.
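
The verification request described in this section can be built in Python as follows. This is a sketch under the assumption that name and workspace are passed as URL query parameters; verify_request is a hypothetical helper.

```python
import urllib.parse
import urllib.request


def verify_request(base_url, service_name, auth_headers, workspace="custom"):
    """Build (but do not send) the GET request that looks up a registered DQ Service."""
    query = urllib.parse.urlencode({"name": service_name, "workspace": workspace})
    return urllib.request.Request(
        f"{base_url}/api/v1/service?{query}",
        headers=dict(auth_headers),
        method="GET",
    )
```

If you registered several services from one script, build one request per service name.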

Delete DQ Service

Description

Use this endpoint to delete a previously registered DQ Service from iDQ.

Note - You can only remove DQ Services that were registered in the “custom” workspace. You cannot delete prepackaged DQ Services that are shipped with the product.

Endpoint

https://{{host}}:{{port}}/api/v1/service/{id of the service}

You can find the id of the service from the response of the Verify DQ Service call.

HTTP Method

DELETE

Parameters

NA

Request Body

NA

Response

Status OK when the request is successful.
Code 200 when the request is successful.
Message NA
developerMessage NA
responseType org.springframework.http.ResponseEntity
response

body : 1

statusCode : OK

statusCodeValue : 200

exception Null

Delete Python Script

Description

Use this endpoint to delete your Python script from the environment.

Note - If your script contains more than one idq_ function, all of its functions are removed from the system.

Also, before removing a script, make sure you remove the corresponding DQ Services by deleting them from the Service Registry.

Endpoint

https://{{host}}:{{port}}/api/v1/custom/module/{name of the python script}

HTTP Method

DELETE

Parameters

NA

Request Body

NA

Response

Status OK when the request is successful.
Code 200 when the request is successful.
Message NA
developerMessage NA
responseType org.springframework.http.ResponseEntity
response

body : cleanse_svcs (name of the script that is deleted)

statusCode : OK

statusCodeValue : 200

exception Null
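
The rollback order described above (delete the registered DQ Services first, then the script) can be captured in a small Python sketch. The helper rollback_requests is hypothetical. It only builds the DELETE requests in the right order, and sending them and checking each response is left to the caller.

```python
import urllib.request


def rollback_requests(base_url, service_ids, module_name, auth_headers):
    """Build the DELETE requests for a rollback: services first, script last."""
    requests = [
        urllib.request.Request(
            f"{base_url}/api/v1/service/{service_id}",
            headers=dict(auth_headers),
            method="DELETE",
        )
        for service_id in service_ids
    ]
    # The script is removed only after its DQ Services are gone.
    requests.append(
        urllib.request.Request(
            f"{base_url}/api/v1/custom/module/{module_name}",
            headers=dict(auth_headers),
            method="DELETE",
        )
    )
    return requests
```

Send the requests one at a time with urllib.request.urlopen, and stop at the first failure so that a script is never removed while one of its services is still registered.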