HDFS Import Text Node Example

This example demonstrates how to import data from a file stored in the Hadoop Distributed File System (HDFS) using the HDFS Import Text node.

The process consists of three main steps:

Step 1.

    1. Add an HDFS server in Statistica Enterprise Manager.
    2. Define the URL of the server, and click Test connection to verify that the URL points to an HDFS server.

      NOTE: Use a fully qualified domain name in the server URL. Depending on the security settings of HDFS, a user name might also be required.

Step 2.

    1. Add an HDFS Import Text node to the workspace and configure its parameters.
    2. Open the Specifications tab, and click the File button.
    3. In the dialog box that opens, browse the HDFS folder structure, and select the file you want to import.
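The File button's dialog does the browsing for you. If you also want to check the folder structure from a script, and assuming the cluster exposes WebHDFS, a directory listing is returned by `GET http://<host>:<port>/webhdfs/v1/<path>?op=LISTSTATUS` as JSON containing a `FileStatuses.FileStatus` array. The minimal sketch below, built around a hypothetical `split_listing` helper, separates such a response into subdirectories to descend into and files that could be selected for import.

```python
import json
from typing import List, Tuple


def split_listing(body: str) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Split a WebHDFS LISTSTATUS JSON response into (directories, files).

    Directories are returned as sorted names; files as sorted (name, size_in_bytes) pairs.
    Entries of other types (for example SYMLINK) are ignored in this sketch.
    """
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    directories = sorted(s["pathSuffix"] for s in statuses if s["type"] == "DIRECTORY")
    files = sorted((s["pathSuffix"], s["length"]) for s in statuses if s["type"] == "FILE")
    return directories, files
```

Each `pathSuffix` is relative to the listed directory, so descending into a subdirectory means appending its name to the current path and issuing another LISTSTATUS request.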

Step 3.

Modify the format of the data to be imported. The Specifications tab of the HDFS Import Text dialog box displays a preview of the data.

Under Import Options, select the check box if you want to take variable names from the first row of the file.
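Conceptually, this option decides whether the first row is read as variable names or as data. The Python sketch below mimics that behavior with the standard `csv` module; the function name `read_delimited`, the comma delimiter, and the generated `Var1`, `Var2`, ... names used when the option is cleared are assumptions for illustration, not the node's actual defaults.

```python
import csv
import io
from typing import List, Tuple


def read_delimited(text: str, names_in_first_row: bool,
                   delimiter: str = ",") -> Tuple[List[str], List[List[str]]]:
    """Parse delimited text into (variable_names, data_rows).

    If names_in_first_row is True, the first row supplies the variable names.
    Otherwise every row is data, and placeholder names (hypothetical Var1, Var2, ...) are generated.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return [], []
    if names_in_first_row:
        return rows[0], rows[1:]
    width = max(len(row) for row in rows)
    return [f"Var{i}" for i in range(1, width + 1)], rows
```

With the option cleared, the header line of a file such as `id,amount` simply becomes the first data row, which is usually visible immediately in the preview.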

To modify column parameters, click a specific column, or hold down the Shift key to select multiple columns and change their parameters together.
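Changing a parameter while several columns are selected applies the same setting to each of them. The sketch below models that idea for a column type setting; the type names `text`, `integer`, and `double`, and the helpers `set_type` and `convert_rows`, are hypothetical, and the options offered in the dialog may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical column types mapped to Python converters; the dialog's real options may differ.
CONVERTERS = {"text": str, "integer": int, "double": float}


def set_type(column_types: Dict[str, str], selected: Sequence[str], new_type: str) -> None:
    """Apply one type to every selected column, as with a Shift-click multi-selection."""
    if new_type not in CONVERTERS:
        raise ValueError(f"unknown column type: {new_type}")
    for name in selected:
        column_types[name] = new_type


def convert_rows(names: List[str], rows: List[List[str]],
                 column_types: Dict[str, str]) -> List[list]:
    """Convert each cell using its column's type; columns without a setting stay text."""
    return [
        [CONVERTERS[column_types.get(name, "text")](cell) for name, cell in zip(names, row)]
        for row in rows
    ]
```

Keeping the settings in a dictionary keyed by column name means a multi-column selection and a single-column click go through the same code path.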

When you run the node, it produces a downstream document containing the imported data, as shown below.