Converts a Hadoop CSV file to either Avro or Parquet format.
Information at a Glance
Category: Tools
Data source type: HD
Sends output to other operators: Yes
Data processing tool: n/a
Input
One CSV data set from the preceding operator.
Configuration
Parameter
Description
Notes
Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Storage Format
Select the format in which to store the results. The storage format is determined by your type of operator. Typical formats are Avro, CSV, TSV, or Parquet.
Compression
Select the type of compression for the output.
Available Parquet compression options: GZIP, Deflate, Snappy, or no compression.
Available Avro compression options: Deflate, Snappy, or no compression.
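The pairing of storage format and compression option described above can be sketched as a small lookup. This is an illustrative Python sketch only: the names ALLOWED_COMPRESSION and is_valid_compression are hypothetical and are not part of the operator; the option lists are copied from the description above.

```python
# Hypothetical lookup table mirroring the compression options listed above.
# These names are illustrative and are not part of the operator itself.
ALLOWED_COMPRESSION = {
    "Parquet": {"GZIP", "Deflate", "Snappy", "no compression"},
    "Avro": {"Deflate", "Snappy", "no compression"},
}


def is_valid_compression(storage_format: str, compression: str) -> bool:
    """Return True if the compression option is listed for the storage format."""
    return compression in ALLOWED_COMPRESSION.get(storage_format, set())
```

For example, is_valid_compression("Avro", "GZIP") returns False, because GZIP is listed only for Parquet.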
Results Location
The HDFS directory where the results of the operator are stored. This is the main directory; its subdirectory is specified in Results Name. Click Choose File to open the Hadoop File Explorer Dialog Box and browse to the storage location. Do not edit the text directly.
Results Name
The name of the file in which to store the results.
Overwrite
Specifies whether to delete existing data at that path and file name.
Yes - if the path exists, delete that file and save the results.
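How Results Location, Results Name, and Overwrite combine can be sketched as follows. This is a minimal Python sketch under stated assumptions: it uses the local file system as a stand-in for HDFS, the function name prepare_output is hypothetical, and the behavior when Overwrite is not Yes is not described above, so raising an error in that case is an assumption made for illustration.

```python
import shutil
from pathlib import Path


def prepare_output(results_location: str, results_name: str, overwrite: bool) -> Path:
    """Build the output path and clear it first when overwrite is requested.

    Local-file-system stand-in for HDFS; hypothetical helper, not the operator's code.
    """
    target = Path(results_location) / results_name
    if target.exists():
        if not overwrite:
            # Assumption: the source does not say what happens in this case.
            raise FileExistsError(f"{target} already exists")
        # Overwrite = Yes: delete the existing data at that path.
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
    return target
```

With overwrite set to True, any existing file or directory at the combined path is removed before the new results are written there.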