Row Filter (HD)
Sets the criteria for filtering data set rows. Only the rows that meet the criteria remain in the output data set.
Information at a Glance
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | HD |
| Send output to other operators | Yes |
| Data processing tool | Pig |
Note: The Row Filter (HD) operator is for Hadoop data only. For database data, use the Row Filter (DB) operator.
You can specify row filters in the following modes.
- Simple mode: Use a template to define the filter by choosing a column, a comparison operator (for example, ">" or "between"), and a value (for example, a literal value or a column expression).
- Script mode: Enter almost any set of filters by using a Pig script.
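The following Python sketch illustrates what a simple-mode filter means: a column, a comparison, and a value are applied to every row, and only matching rows are kept. It is not the operator's implementation (the operator runs on Pig). The `row_filter` function, the `COMPARISONS` table, and the sample rows are hypothetical names introduced for illustration, and treating "between" as inclusive is an assumption.

```python
import operator

# Hypothetical mapping from comparison symbols to Python functions.
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}

def row_filter(rows, column, comparison, value):
    """Keep only the rows whose `column` satisfies `comparison` against `value`.

    `value` is a literal, a callable that takes the row (standing in for a
    column expression), or a (low, high) pair when comparison is "between".
    """
    kept = []
    for row in rows:
        if comparison == "between":
            low, high = value
            # Assumption: both bounds are inclusive.
            matches = low <= row[column] <= high
        else:
            target = value(row) if callable(value) else value
            matches = COMPARISONS[comparison](row[column], target)
        if matches:
            kept.append(row)
    return kept

# Sample data (made up for this example).
rows = [
    {"name": "a", "age": 25, "limit": 30},
    {"name": "b", "age": 41, "limit": 40},
    {"name": "c", "age": 67, "limit": 70},
]

over_40 = row_filter(rows, "age", ">", 40)                       # literal value
in_range = row_filter(rows, "age", "between", (20, 45))          # range
over_limit = row_filter(rows, "age", ">", lambda r: r["limit"])  # column expression
```

Rows that do not meet the criteria are dropped rather than flagged, which matches the operator's description: only the rows that meet the criteria remain in the output data set.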
Input
A data set from the preceding operator.
Configuration
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Filter | The filters for the operator. See Define Filter dialog for more information. |
| Use Row Limit? | Specifies whether to limit the filtering to a set number of rows. Default: false. |
| Row Limit Amount | If Use Row Limit? is set to true, the number of rows to which the filtering is limited. |
| Store Results? | Specifies whether to store the results. |
| Results Location | The HDFS directory where the results of the operator are stored. This is the main directory, the sub-directory of which is specified in Results Name. Click Choose File to open the Hadoop File Explorer dialog and browse to the storage location. Do not edit the text directly. |
| Results Name | The name of the file in which to store the results. |
| Overwrite | Specifies whether to delete existing data at that path and file name. |
| Storage Format | Select the format in which to store the results. The available formats depend on the type of operator. Typical formats are Avro, CSV, TSV, or Parquet. |
| Compression | Select the type of compression for the output. The available options depend on whether the storage format is Parquet or Avro. |
| Use Spark | If Yes (the default), uses Spark to optimize calculation time. |
| Advanced Spark Settings Automatic Optimization | |
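The interaction between Use Row Limit? and Row Limit Amount can be sketched as follows. This is a minimal illustration, assuming the limit caps the number of rows that the filter returns; the documentation does not spell out the exact semantics, and the function and parameter names here are hypothetical.

```python
from itertools import islice

def filter_with_row_limit(rows, predicate, use_row_limit=False, row_limit_amount=None):
    """Apply `predicate` to each row and optionally stop after a fixed number of matches.

    Assumption: the row limit caps the number of matching rows returned.
    """
    matching = (row for row in rows if predicate(row))
    if use_row_limit:
        # islice stops consuming the input once the limit is reached.
        return list(islice(matching, row_limit_amount))
    return list(matching)

values = list(range(10))
limited = filter_with_row_limit(values, lambda v: v % 2 == 0,
                                use_row_limit=True, row_limit_amount=3)
unlimited = filter_with_row_limit(values, lambda v: v % 2 == 0)
```

With the limit disabled (the default), every matching row is returned.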
Output
Visual Output
The rows of the output table or view. Up to 200 rows are displayed.
Data Output
Either a newly created table or a new file.