Row Cleanser

This operator removes incomplete rows, based on how many of the selected columns contain null values.


Information at a Glance

Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter Description
Category Transform
Data source type TIBCO® Data Virtualization
Send output to other operators Yes
Data processing tool TIBCO® DV, Apache Spark 3.2 or later

Algorithm

This operator applies a set of rules to remove incomplete rows. You select the columns to check and then set a filtering condition; rows that meet this condition are removed.

For each row, the operator counts the null values in the selected columns. The filtering condition is then applied to these counts, so that the remaining rows stay within the specified limit of null columns.
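The counting step can be sketched in plain Python. This is a hypothetical illustration, not the operator's implementation: rows are modeled as dictionaries, `None` stands for a null value, and the helper names `count_nulls` and `remove_incomplete_rows` are invented for this example. It also assumes that a row is removed once its null count reaches (is greater than or equal to) the limit.

```python
from typing import Any, Dict, List, Optional

# A row maps column names to values; None represents a null value.
Row = Dict[str, Optional[Any]]


def count_nulls(row: Row, columns: List[str]) -> int:
    """Count how many of the selected columns are null (None) in one row."""
    return sum(1 for col in columns if row[col] is None)


def remove_incomplete_rows(rows: List[Row], columns: List[str], limit: int) -> List[Row]:
    """Keep only rows whose null count in the selected columns is below `limit`.

    Assumption (hypothetical): a row is removed once its null count reaches the limit.
    """
    return [row for row in rows if count_nulls(row, columns) < limit]


rows = [
    {"a": 1, "b": 2, "c": 3},        # 0 nulls
    {"a": None, "b": 2, "c": 3},     # 1 null
    {"a": None, "b": None, "c": 3},  # 2 nulls
]
kept = remove_incomplete_rows(rows, ["a", "b", "c"], limit=2)
print(len(kept))  # 2
```

With a limit of 2, only the row with two null values is dropped; the other two rows are kept.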

Input

A single tabular data set.

Configuration

The following table provides the configuration details for the Row Cleanser operator.

Parameter Description
Notes Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Columns to Use Specify the columns to check for null values. Click Select Columns to choose the required columns.
How many selected columns should be null before removing rows Specify how many of the selected columns must be null for a row to be removed. The following values are available:

  • A number of columns
  • A percentage of columns
  • All
  • Any

Default: All

Percentage(%) / Number of Columns Specify the limit for the previous parameter. If the previous parameter is set to A percentage of columns, enter the percentage of selected columns. If it is set to A number of columns, enter the number of selected columns. If it is set to All or Any, this parameter is ignored.

Default: 80

Output Schema Specify the schema for the output table or view.
Output Table Specify the path and name of the table where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Store Results When set to Yes, the operator saves the results. When set to No, it does not save them.
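The four options of How many selected columns should be null before removing rows can be read as different removal rules. The sketch below is an assumed interpretation, not documented behavior: the mode strings mirror the option labels, the function name `should_remove` is hypothetical, and it assumes that All means every selected column is null, Any means at least one selected column is null, and that numeric and percentage limits are inclusive (a row is removed when it reaches the limit). The percentage rule is compared without division to avoid rounding issues.

```python
def should_remove(null_count: int, n_selected: int, mode: str, value: float = 80) -> bool:
    """Decide whether a row is removed, given its null count among the selected columns.

    Hypothetical interpretation of the Row Cleanser options:
      "All"                     -> every selected column is null
      "Any"                     -> at least one selected column is null
      "A number of columns"     -> at least `value` selected columns are null
      "A percentage of columns" -> at least `value` percent of selected columns are null
    `value` is ignored for "All" and "Any"; its default of 80 mirrors the documented default.
    """
    if mode == "All":
        return null_count == n_selected
    if mode == "Any":
        return null_count >= 1
    if mode == "A number of columns":
        return null_count >= value
    if mode == "A percentage of columns":
        # Equivalent to null_count / n_selected >= value / 100, without dividing.
        return null_count * 100 >= value * n_selected
    raise ValueError(f"unknown mode: {mode!r}")


print(should_remove(2, 3, "A percentage of columns", 80))  # False (2 of 3 is about 67%)
print(should_remove(3, 3, "A percentage of columns", 80))  # True
```

If the operator actually uses a strict comparison, only the `>=` operators in the number and percentage branches would change.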

Output

Visual Output
A table that displays the data set after the incomplete rows are removed.
Output to successive operator
A single tabular data set containing the rows that were not removed.

Example

The following example uses the Row Cleanser operator to cleanse the golf data set.

Row Cleanser operator workflow
Data
golf: This data set contains the following information:
  • Five columns: outlook, temperature, wind, humidity, and play.
  • 14 rows.
Parameter Setting
The parameter settings for the golf data set are as follows:
  • Columns to Use: outlook, temperature, humidity

  • How many selected columns should be null before removing rows: A percentage of columns

  • Percentage(%) / Number of Columns: 80

  • Store Results: Yes
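Because only three columns are selected, the 80 percent limit can be checked by hand: a row can have 0, 1, 2, or 3 null values among them, which is 0%, about 33.3%, about 66.7%, or 100%. Assuming the limit is inclusive (an assumption this page does not confirm), only rows in which outlook, temperature, and humidity are all null reach 80%, so these settings behave like All for this data set. The short check below is illustrative only; the names `SELECTED`, `THRESHOLD_PERCENT`, and `is_removed` are hypothetical.

```python
# Hypothetical check of the example settings: 3 selected columns with an 80 percent limit.
SELECTED = ["outlook", "temperature", "humidity"]
THRESHOLD_PERCENT = 80


def is_removed(null_count: int) -> bool:
    """Assumed inclusive rule: remove when null_count / len(SELECTED) >= 80%, compared without division."""
    return null_count * 100 >= THRESHOLD_PERCENT * len(SELECTED)


for null_count in range(len(SELECTED) + 1):
    share = 100 * null_count / len(SELECTED)
    print(f"{null_count} of {len(SELECTED)} null ({share:.1f}%): removed = {is_removed(null_count)}")
```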

Output
The following figure displays the output of the Row Cleanser operator for these parameter settings.
Row Cleanser Output