Hadoop File

Specifies a file or files stored on your Hadoop data source, allowing the data to be incorporated into the workflow.

Information at a Glance

Category: Load Data
Data source type: HD
Sends output to other operators: Yes
Data processing tool: MapReduce

The Hadoop data can subsequently be used in data mining algorithms, prediction algorithms, and statistical analyses.

Team Studio automatically handles files that are stored in compressed format with the gzip or deflate codec.
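
As a rough illustration of what transparent decompression involves, the sketch below picks a codec from the file name extension. It is not Team Studio's implementation. The function name decompress_by_extension is hypothetical, and the sketch assumes that gzip files end in .gz and that deflate files end in .deflate and carry zlib-wrapped data, which is the usual Hadoop convention.

```python
import gzip
import zlib


def decompress_by_extension(name: str, payload: bytes) -> bytes:
    """Return the uncompressed bytes of a file, choosing the codec by extension.

    Hypothetical helper for illustration only; the extension-to-codec
    mapping is an assumption, not documented Team Studio behavior.
    """
    if name.endswith(".gz"):
        # gzip codec
        return gzip.decompress(payload)
    if name.endswith(".deflate"):
        # deflate codec, assumed to be zlib-wrapped
        return zlib.decompress(payload)
    # Anything else is treated as uncompressed
    return payload
```

With a helper like this, a compressed part file and its uncompressed equivalent yield the same bytes, which is why no extra step is needed before the data enters the workflow.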

Input

None. Hadoop File is a source operator.

Configuration

Parameter: Description
Notes: Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Data Source Name: The Hadoop connection used to access the Hadoop file system where the file resides.
Hadoop File Name: The path and name of the file. Click Choose File to open the Hadoop File Explorer dialog box, where you can browse the Hadoop file structure and select the file.
Note: To process multiple files using wildcard characters, see Selecting Groups of HDFS files.
Hadoop File Format: The source file format. If the file name has an extension, the file format defaults to a setting based on that extension. You can override the default manually.

The following file formats are available.

  • Avro
  • Parquet
  • Text file
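
The extension-based default can be pictured as a simple lookup with a manual override. This is a minimal sketch, not the product's logic. The mapping DEFAULT_FORMATS and the function default_file_format are hypothetical, and falling back to Text file for unrecognized or missing extensions is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension-to-format mapping (assumption for illustration).
DEFAULT_FORMATS = {".avro": "Avro", ".parquet": "Parquet"}


def default_file_format(path: str, override: Optional[str] = None) -> str:
    """Pick a file format from the extension unless the user overrides it."""
    if override is not None:
        # A manual choice always wins over the extension-based default
        return override
    suffix = PurePosixPath(path).suffix.lower()
    # Assumed fallback: anything unrecognized is treated as a text file
    return DEFAULT_FORMATS.get(suffix, "Text file")
```

For example, default_file_format("/data/events.avro") selects Avro, while passing override="Text file" keeps the manual choice.
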
Hadoop File Structure: Click Hadoop File Structure to display the Configure Columns dialog box. The file type determines the dialog box display and configuration options. The following file types are supported.

The Hadoop file structure settings specify the delimiters, columns, and data types in the Hadoop file.

For more information, see Configure Columns Dialog Box.
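
To make the role of the file structure settings concrete, the sketch below applies a delimiter, column names, and data types to delimited text. It only illustrates the idea. The STRUCTURE dictionary and the parse_rows function are hypothetical and do not reflect how the Configure Columns dialog box stores or applies its settings.

```python
import csv
import io

# Hypothetical structure settings: one delimiter plus (column name, type) pairs.
STRUCTURE = {
    "delimiter": ",",
    "columns": [("id", int), ("name", str), ("score", float)],
}


def parse_rows(text: str, structure: dict) -> list:
    """Split delimited text into rows and cast each field to its column type."""
    reader = csv.reader(io.StringIO(text), delimiter=structure["delimiter"])
    rows = []
    for record in reader:
        if not record:
            # Skip blank lines
            continue
        rows.append(
            {name: cast(value)
             for (name, cast), value in zip(structure["columns"], record)}
        )
    return rows
```

Calling parse_rows("1,ann,3.5\n", STRUCTURE) produces one row whose id is an integer and whose score is a float, which is the kind of typed, named column data that downstream operators expect.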

Output

Visual Output: A preview of the data output.
Data Output: A Hadoop file.