Dynamic Column Filter

This operator selects a subset of the columns from the data source. Only the selected columns remain in the output data set.

Information at a Glance

Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter	Description
Category	Transform
Data source type	TIBCO® Data Virtualization
Send output to other operators	Yes
Data processing tool	TIBCO® DV, Apache Spark 3.2 or later

Algorithm

This operator is particularly helpful when there are many columns in a data source that are not needed for the data analysis workflow. You can also specify a regular expression for selecting a subset of columns by column names.

The Dynamic Column Filter operator is designed to be used in conjunction with Modeling operators. When the Use all available columns as Predictors parameter of a Modeling operator is set to Yes, the Dynamic Column Filter operator can dynamically determine which columns are used as predictors.

Note:

The parameters Columns to Include, Data Types to Include, and Column Names to Include (regex) are combined as a union: a column is included in the output if it matches any one of them.
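
The union behavior described in the note can be illustrated with a minimal Python sketch. This is not the operator's implementation; the function name select_columns, the schema representation, and the use of a partial (search) regex match are assumptions made for illustration only. Unlike the operator, the sketch treats an unset parameter as matching nothing.

```python
import re

def select_columns(schema, columns=(), data_types=(), name_regex=None):
    """Return the column names kept by a union of three optional filters.

    schema is a list of (column_name, data_type) pairs in source order.
    All names here are hypothetical and exist only for this sketch.
    """
    pattern = re.compile(name_regex) if name_regex else None
    kept = []
    for name, dtype in schema:
        # A column is kept if ANY of the three filters matches it.
        if (
            name in columns
            or dtype in data_types
            or (pattern is not None and pattern.search(name) is not None)
        ):
            kept.append(name)
    return kept

# Hypothetical schema, not taken from the documentation.
schema = [
    ("id", "Int"),
    ("city", "String"),
    ("active", "Boolean"),
    ("score", "Double"),
    ("score_date", "DateTime"),
]

result = select_columns(
    schema, columns=["id"], data_types=["String"], name_regex="^score"
)
print(result)
```

In this sketch, id is kept by name, city by data type, and score and score_date by the regular expression; active matches none of the three filters and is dropped.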

Input

The input is a single tabular data set.

Configuration

The following table provides the configuration details for the Dynamic Column Filter operator.

Note:

At least one of the following parameters must be set: Columns to Include, Data Types to Include, or Column Names to Include (regex).

Parameter	Description
Notes	Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Columns to Include	The columns to be made available for analysis. Only static columns can be selected. Click Select Columns to select the columns. This parameter is optional. Note: You can select more than one column.
Data Types to Include	The data types to include. Click Select to select the data types. This parameter is optional. The following values are available: String, Int, Boolean, Long, Double, and DateTime. By default, all data types are selected. Note: You can select more than one data type.
Column Names to Include (regex)	A regular expression or wildcard pattern for selecting a subset of columns by name. This parameter is optional.
Output Schema	Specify the schema for the output table or view.
Output Table	Specify the table path and name where the output of the results is generated. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Store Results	When set to Yes, the operator saves the results. When set to No, the operator does not save the results.

Output

Visual Output

The data rows of the output table or view are displayed (up to 2000 rows).

Output to successive operators

A tabular data set that can be used with other operators.

Example

The following example demonstrates the workflow of a Dynamic Column Filter operator.

Dynamic Column Filter Workflow

Data

people: A data set with multiple rows and columns such as id, first_name, last_name, email, gender, ip_address, date_of_birth, height, and so on.

Parameter Setting

The parameter settings for the people data set are as follows:

Columns to Include: first_name, last_name
Data Types to Include: String
Column Names to Include (regex): name
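
To show how these three settings could combine as a union, the following Python sketch records which parameter matches each column. The column data types are assumptions, because the data set description lists only the column names; the partial regex match is also an assumption. The actual result depends on the real column types in the data source.

```python
import re

# Column types are ASSUMED for illustration; the source lists names only.
people_schema = {
    "id": "Int",
    "first_name": "String",
    "last_name": "String",
    "email": "String",
    "gender": "String",
    "ip_address": "String",
    "date_of_birth": "DateTime",
    "height": "Double",
}

# Parameter settings from the example above.
columns_to_include = {"first_name", "last_name"}
data_types_to_include = {"String"}
name_pattern = re.compile("name")  # assumed to match anywhere in the name

# For each column, collect the parameters that select it.
matched = {}
for column, dtype in people_schema.items():
    reasons = []
    if column in columns_to_include:
        reasons.append("Columns to Include")
    if dtype in data_types_to_include:
        reasons.append("Data Types to Include")
    if name_pattern.search(column):
        reasons.append("Column Names to Include (regex)")
    if reasons:  # union: one matching parameter is enough
        matched[column] = reasons

for column, reasons in matched.items():
    print(f"{column}: {', '.join(reasons)}")
```

Under these assumed types, first_name and last_name are matched by all three parameters, while email, gender, and ip_address are kept only because of the String data type setting.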

Results

The following figure displays the results for the people data set when the Dynamic Column Filter operator is applied with the regular expression name.

Dynamic Column Filter operator results
