Hadoop File
Specifies a file or files stored on your Hadoop data source, allowing the data to be incorporated into the workflow.
Information at a Glance
Parameter | Description |
---|---|
Category | Load Data |
Data source type | HD |
Send output to other operators | Yes |
Data processing tool | MapReduce |
The Hadoop data can subsequently be used in data mining algorithms, prediction algorithms, and statistical analyses.
TIBCO Data Science – Team Studio automatically handles files that are stored in compressed format with the gzip or deflate codec.
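
To illustrate what transparent handling of compressed files involves, the following Python sketch decompresses a payload according to its file name extension. It is not the product's implementation: the function name `decompress_by_extension` and the sample file names are hypothetical, and it assumes that `.gz` files use the gzip format and `.deflate` files use zlib-wrapped deflate streams.

```python
import gzip
import zlib


def decompress_by_extension(name: str, payload: bytes) -> bytes:
    """Return the file's raw bytes, decompressing based on the extension."""
    if name.endswith(".gz"):
        # gzip container: header, deflate stream, CRC trailer.
        return gzip.decompress(payload)
    if name.endswith(".deflate"):
        # Assumption: the stream is zlib-wrapped deflate.
        return zlib.decompress(payload)
    # No recognized compression extension: pass the bytes through unchanged.
    return payload


# Hypothetical sample data, compressed two ways and read back.
rows = b"id,amount\n1,9.50\n2,12.00\n"
from_gzip = decompress_by_extension("sales.csv.gz", gzip.compress(rows))
from_deflate = decompress_by_extension("sales.csv.deflate", zlib.compress(rows))
print(from_gzip == rows, from_deflate == rows)
```

In the operator itself no such step is needed: you select the compressed file directly and it is read like an uncompressed one.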
Input
None. Hadoop File is a source operator.
Configuration
Parameter | Description |
---|---|
Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
Data Source Name | The Hadoop connection for access to the Hadoop file system where the file resides. |
Hadoop File Name | The path and name of the file. Click Choose File to display the Hadoop File Explorer dialog, where you can browse the Hadoop file structure and select the file location. Note: To process multiple files using wildcard characters, see Selecting Groups of HDFS files. |
Hadoop File Format | The source file format. If the file name has an extension, the file format automatically defaults to a setting based on that extension. You can override the default manually. The following file formats are available. |
Hadoop File Structure | Click Hadoop File Structure to display the Configure Columns dialog. The file type determines the dialog display and configuration options. The following file types are supported. The Hadoop file structure settings specify the delimiters, columns, and data types in the Hadoop file. For more information, see Configure Columns dialog. |
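
As a rough illustration of how a wildcard in the Hadoop File Name parameter can select a group of files, the following Python sketch filters a directory listing with a shell-style pattern. The paths, the `select_files` helper, and the use of Python's `fnmatch` rules are assumptions made for this example. The wildcard syntax that the operator actually accepts is described in Selecting Groups of HDFS files and may differ.

```python
from fnmatch import fnmatchcase
import posixpath


def select_files(paths, pattern):
    """Return the paths in the pattern's directory whose names match it."""
    pattern_dir, pattern_name = posixpath.split(pattern)
    selected = []
    for path in paths:
        directory, name = posixpath.split(path)
        # Match only within the named directory, case-sensitively.
        if directory == pattern_dir and fnmatchcase(name, pattern_name):
            selected.append(path)
    return selected


# Hypothetical HDFS listing.
listing = [
    "/data/sales/part-00000.csv",
    "/data/sales/part-00001.csv",
    "/data/sales/_SUCCESS",
    "/data/returns/part-00000.csv",
]
matched = select_files(listing, "/data/sales/part-*.csv")
print(matched)
```

Here the pattern picks up both `part-` files under `/data/sales` and skips the marker file and the file in the other directory.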
Output