HDFS Connection

The HDFS Connection shared resource contains all the parameters necessary to connect to HDFS. It can be used by the HDFS Operation, ListFileStatus, Read, and Write activities, and by the HCatalog Connection shared resource.

General

In the General panel, you can specify the package that stores the HDFS Connection shared resource and the shared resource name.

The following table lists the fields in the General panel of the HDFS Connection shared resource:

Field Module Property? Description
Package No The name of the package where the shared resource is located.
Name No The name that is used as the label for the shared resource in the process.
Description No A short description for the shared resource.

HDFSConnection

In the HDFSConnection Configuration panel, you can provide the information necessary to connect the plug-in to HDFS. You can also connect to a Kerberized HDFS server.

The following table lists the fields in the HDFSConnection panel of the HDFS Connection shared resource:

Field Module Property? Description
HDFS Url Yes The WebHDFS URL that is used to connect to HDFS. The default value is http://localhost:50070.

The plug-in supports HttpFS and HttpFS with SSL. You can enter an HttpFS URL with HTTP or HTTPS in this field. For example:

http://httpfshostname:14000

https://httpfshostname:14000

Note: To set up high availability for your cluster, enter two comma-separated URLs in this field. Make sure that there is no space between the comma and the second URL. The plug-in designates the first entry as the primary node and the second entry as the secondary node.
User Name Yes The unique user name that is used to connect to HDFS.
Enable Kerberos No Select this check box to connect to a Kerberized HDFS server.
Note: If your server uses the AES-256 encryption, you must install Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files on your machine. For more details, see Installing JCE Policy Files.
Kerberos Method No The Kerberos authentication method that is used to authorize access to HDFS. Select an authentication method from the list:
  • Keytab: specify a keytab file to authorize access to HDFS.
  • Cached: use the cached credentials to authorize access to HDFS.
  • Password: enter the name of the Kerberos principal and a password for the Kerberos principal.

This field is displayed only when you select the Enable Kerberos check box.

Kerberos Principal Yes The Kerberos principal name that is used to connect to HDFS.

This field is displayed only when you select the Enable Kerberos check box.

Kerberos Password Yes The password for the Kerberos principal.

This field is displayed only when you select Password from the Kerberos Method list.

Kerberos Keytab Yes The keytab that is used to connect to HDFS.

This field is displayed only when you select Keytab from the Kerberos Method list.
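The HDFS Url and User Name fields above map directly onto WebHDFS REST requests, which address files under the /webhdfs/v1 path and pass the user as a query parameter. The following sketch shows how such a request URL is composed; the helper function, host name, and path are illustrative assumptions, not part of the product:

```python
# Illustrative sketch: composing a WebHDFS REST URL from the HDFS Url and
# User Name connection fields. Host, path, and function name are examples.
from urllib.parse import urlencode

def build_webhdfs_url(hdfs_url, path, operation, user_name):
    """Compose a WebHDFS REST URL for the given operation and user."""
    base = hdfs_url.rstrip("/")
    query = urlencode({"op": operation, "user.name": user_name})
    return f"{base}/webhdfs/v1{path}?{query}"

url = build_webhdfs_url("http://localhost:50070", "/tmp", "LISTSTATUS", "hdfs")
print(url)  # http://localhost:50070/webhdfs/v1/tmp?op=LISTSTATUS&user.name=hdfs
```

An HttpFS endpoint (for example, https://httpfshostname:14000) can be substituted for the WebHDFS URL without changing the request shape.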

Test Connection

You can click Test Connection to verify that the specified configuration results in a valid connection.

Setting up High Availability

You can set up high availability for your cluster in this panel. To do so, enter two URLs as comma-separated values (with no space between the comma and the second URL) in the HDFS Url field under the HDFS Connection section of this panel. The plug-in designates the first entry as the primary node and the second entry as the secondary node. If the primary node goes down, the plug-in automatically connects and routes the request to the secondary node.
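The failover behavior described above can be sketched as follows. This is a hedged illustration of the primary-then-secondary ordering only; the reachability probe and error handling are assumptions, not the plug-in's actual implementation:

```python
# Sketch of primary/secondary failover over a comma-separated HDFS Url value.
# The is_reachable probe is a stand-in for a real connectivity check.
def pick_active_url(hdfs_url_field, is_reachable):
    """Return the first reachable entry: primary first, then secondary."""
    urls = [u.strip() for u in hdfs_url_field.split(",") if u.strip()]
    for url in urls:
        if is_reachable(url):
            return url
    raise ConnectionError("neither the primary nor the secondary node is reachable")

# Example: pretend the primary node is down, so the secondary is chosen.
active = pick_active_url(
    "http://nn1.example.com:50070,http://nn2.example.com:50070",
    lambda url: "nn2" in url,
)
print(active)  # http://nn2.example.com:50070
```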

To check the status of a node, use the following API: <HDFS URL>/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus. For example: http://cdh571.na.tibco.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus
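The JMX query above returns JSON describing the NameNode. The sketch below extracts the HA state from such a response; the response shape (a "beans" list whose NameNodeStatus entry carries a "State" of "active" or "standby") is an assumption based on typical Hadoop JMX output, so verify it against your cluster:

```python
# Illustrative sketch: reading the NameNode HA state from the JSON returned
# by the NameNodeStatus JMX query. The sample payload below is assumed, not
# captured from a live cluster.
import json

def namenode_state(jmx_json_text):
    """Return the NameNode HA state from a NameNodeStatus JMX response."""
    payload = json.loads(jmx_json_text)
    for bean in payload.get("beans", []):
        if bean.get("name") == "Hadoop:service=NameNode,name=NameNodeStatus":
            return bean.get("State")
    return None

sample = '{"beans":[{"name":"Hadoop:service=NameNode,name=NameNodeStatus","State":"active"}]}'
print(namenode_state(sample))  # active
```

Polling this endpoint for both URLs in the HDFS Url field lets you confirm which node is currently active before routing traffic.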