Hadoop File

Specifies a file or files stored on your Hadoop data source, allowing the data to be incorporated into the workflow.


Information at a Glance

Parameter Description
Category Load Data
Data source type HD
Send output to other operators Yes
Data processing tool MapReduce

The Hadoop data can subsequently be used in data mining algorithms, prediction algorithms, and statistical analyses.

TIBCO Data Science – Team Studio automatically handles files that are stored in compressed format with the gzip or deflate codec.
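
To illustrate what transparent handling of compressed files involves, the following is a minimal Python sketch, not the product's implementation. It assumes the codec is chosen from the file name extension (`.gz` or `.deflate`) and that a `.deflate` file holds a zlib-wrapped deflate stream. The function name `decompress_by_extension` and the sample file names are hypothetical.

```python
import gzip
import zlib


def decompress_by_extension(name: str, payload: bytes) -> bytes:
    """Return uncompressed bytes, picking the codec from the file extension."""
    if name.endswith(".gz"):
        # gzip container
        return gzip.decompress(payload)
    if name.endswith(".deflate"):
        # Assumption: the deflate stream carries a zlib header and checksum.
        return zlib.decompress(payload)
    # No recognized compression extension: pass the bytes through unchanged.
    return payload


sample = b"id,name\n1,alice\n"
from_gzip = decompress_by_extension("part-00000.gz", gzip.compress(sample))
from_deflate = decompress_by_extension("part-00000.deflate", zlib.compress(sample))
from_plain = decompress_by_extension("part-00000.csv", sample)
```

In this sketch all three calls return the same bytes, so downstream parsing does not need to know whether the source file was compressed.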

Input

None. Hadoop File is a source operator.

Configuration

Parameter Description
Notes Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Data Source Name The Hadoop connection used to access the Hadoop file system where the file resides.
Hadoop File Name The path and name of the file. Click Choose File to open the Hadoop File Explorer dialog, where you can browse the Hadoop file structure and select the file.
Note: To process multiple files using wildcard characters, see Selecting Groups of HDFS files.
Hadoop File Format The source file format. If the file name has an extension, the file format defaults to a setting based on that extension. You can override the default manually.

The following file formats are available.

  • Avro
  • Parquet
  • Text file
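
The default-then-override behavior described for Hadoop File Format can be sketched as follows. This is an illustration only: the extension table, the fallback to "Text file" for unknown or missing extensions, and the function name `default_format` are assumptions, not the product's documented mapping.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping from file name extension to a default format.
EXTENSION_DEFAULTS = {
    ".avro": "Avro",
    ".parquet": "Parquet",
}


def default_format(path: str, override: Optional[str] = None) -> str:
    """Pick a format from the extension unless the user overrides it."""
    if override is not None:
        # A manual choice always wins over the extension-based default.
        return override
    suffix = PurePosixPath(path).suffix.lower()
    # Assumption: anything unrecognized is treated as a text file.
    return EXTENSION_DEFAULTS.get(suffix, "Text file")


avro_choice = default_format("/data/events.avro")
text_choice = default_format("/data/part-00000")
forced_choice = default_format("/data/events.avro", override="Text file")
```

Keeping the override as an explicit argument mirrors the description above: the extension only supplies a starting value.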

Hadoop File Structure Click Hadoop File Structure to display the Configure Columns dialog. The file type determines the dialog display and configuration options. The following file types are supported.

The Hadoop file structure settings specify the delimiters, columns, and data types in the Hadoop file.

For more information, see Configure Columns dialog.
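
As a rough picture of what a file structure definition captures for a delimited text file, the sketch below pairs a delimiter with ordered column names and data types, then applies them to raw text. The dictionary layout, the column names, and the function `parse_rows` are hypothetical and do not reflect how the Configure Columns dialog stores its settings.

```python
import csv
import io

# Hypothetical structure: a delimiter plus ordered (column name, type) pairs.
structure = {
    "delimiter": ",",
    "columns": [("id", int), ("name", str), ("score", float)],
}


def parse_rows(text: str, structure: dict) -> list:
    """Split each line on the delimiter and cast values to the column types."""
    reader = csv.reader(io.StringIO(text), delimiter=structure["delimiter"])
    rows = []
    for record in reader:
        # Pair each raw value with its column definition, in order.
        row = {
            name: cast(value)
            for (name, cast), value in zip(structure["columns"], record)
        }
        rows.append(row)
    return rows


rows = parse_rows("1,alice,9.5\n2,bob,7.25\n", structure)
```

Here `rows` holds one dictionary per line, with `id` as an integer, `name` as a string, and `score` as a float.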

Output

Visual Output
A preview of the data output.
Data Output
A Hadoop file.