Dynamic Column Filter
This operator selects a subset of the columns from the data source. Only the selected columns remain in the output data set.
Information at a Glance
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | TIBCO® Data Virtualization |
| Send output to other operators | Yes |
| Data processing tool | TIBCO® DV, Apache Spark 3.2 or later |
Algorithm
This operator is particularly helpful when there are many columns in a data source that are not needed for the data analysis workflow. You can also specify a regular expression for selecting a subset of columns by column names.
The Dynamic Column Filter operator is designed to be used in conjunction with Modeling operators. When the Use all available columns as Predictors parameter is set to Yes, the Dynamic Column Filter operator can decide which columns are used as predictors dynamically.
The parameters Columns to Include, Data Types to Include, and Column Names to Include (regex) are combined as a union: a column is included in the output if it matches any one of them.
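The union rule can be illustrated with a minimal Python sketch. This is not the operator's implementation: the function name `filter_columns`, the `(name, type)` schema representation, and the type labels are hypothetical, and matching the regular expression anywhere in the column name (`re.search`) is an assumption about how the operator applies the pattern.

```python
import re

def filter_columns(schema, columns=(), data_types=(), name_regex=None):
    """Return the names of columns that match ANY of the three criteria.

    schema is a list of (column_name, data_type) pairs (hypothetical format).
    """
    if not columns and not data_types and name_regex is None:
        # Mirrors the rule that at least one inclusion parameter must be used.
        raise ValueError("at least one inclusion parameter must be set")

    explicit = set(columns)
    types = set(data_types)
    # Assumption: the pattern may match anywhere in the column name.
    pattern = re.compile(name_regex) if name_regex is not None else None

    selected = []
    for name, dtype in schema:
        if (
            name in explicit
            or dtype in types
            or (pattern is not None and pattern.search(name))
        ):
            selected.append(name)  # keep the original column order
    return selected

# Hypothetical schema used only for illustration.
schema = [
    ("id", "Integer"),
    ("first_name", "String"),
    ("last_name", "String"),
    ("height", "Double"),
]

by_list_and_regex = filter_columns(schema, columns=["id"], name_regex="name")
by_type = filter_columns(schema, data_types=["Double"])
print(by_list_and_regex)
print(by_type)
```

A column selected by more than one criterion appears only once, because the loop tests each column a single time against all three conditions.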
Input
The input is a single tabular data set.
Configuration
The following table provides the configuration details for the Dynamic Column Filter operator.
At least one of the following parameters must be set: Columns to Include, Data Types to Include, or Column Names to Include (regex).
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Columns to Include | The columns to be made available for analysis. Only static columns can be selected. Click Select Columns to select the columns. This parameter is optional. Note: You can select more than one column. |
| Data Types to Include | Specify the data type that you want to include. Click Select to select the data type. This parameter is optional. The following values are available: By default, all data types are selected. Note: You can select more than one data type. |
| Column Names to Include (regex) | Specify the regular expression or wildcards for selecting a subset of columns by name. It is an optional parameter. |
| Output Schema | Specify the schema for the output table or view. |
| Output Table | Specify the table path and name where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator. |
| Store Results | When set to Yes, the operator saves the results. When set to No, the operator does not save the results. |
Output
The data rows of the output table or view are displayed (up to 2000 rows of the data).
A tabular data set that can be used with other operators.
Example
The following example demonstrates the workflow of a Dynamic Column Filter operator.
Data
people: A data set with multiple rows and columns such as id, first_name, last_name, email, gender, ip_address, date_of_birth, height, and so on.
Parameter Setting
The parameter settings for the people data set are as follows:
- Columns to Include: first_name, last_name
- Data Types to Include: String
- Column Names to Include (regex): name
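To see what each of these settings contributes, the selection can be sketched in plain Python. The column types below are assumptions (the data set description lists only column names), the regular expression is assumed to match anywhere in the name, and the sketch only illustrates the union rule; it does not reproduce the operator's actual output.

```python
import re

# Assumed types for the people columns named in the Data section.
people_schema = {
    "id": "Integer",
    "first_name": "String",
    "last_name": "String",
    "email": "String",
    "gender": "String",
    "ip_address": "String",
    "date_of_birth": "Date",
    "height": "Double",
}

# Contribution of each parameter, computed separately.
by_list = {"first_name", "last_name"}                                # Columns to Include
by_type = {c for c, t in people_schema.items() if t == "String"}     # Data Types to Include
by_regex = {c for c in people_schema if re.search("name", c)}        # Column Names to Include (regex)

# The operator keeps a column if any one of the criteria selects it.
union = by_list | by_type | by_regex
selected = [c for c in people_schema if c in union]
print(selected)
```

Under these assumed types, the String data type setting is the broadest criterion, so it determines most of the selected columns; the explicit list and the regular expression both pick only first_name and last_name.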
Results
The following figure displays the results for the people data set with the name pattern, produced by the Dynamic Column Filter operator.