HDFS Connection
The HDFS Connection shared resource contains all necessary parameters to connect to HDFS. It can be used by the HDFS Operation, ListFileStatus, Read, and Write activities, and by the HCatalog Connection shared resource.
General
In the General panel, you can specify the package that stores the HDFS Connection shared resource and the shared resource name.
The following table lists the fields in the General panel of the HDFS Connection shared resource:
HDFSConnection
In the HDFSConnection Configuration panel, you can provide the information necessary to connect the plug-in to HDFS. You can also connect to a Kerberized HDFS server. The HDFS Connection shared resource also supports the Knox Gateway security system provided by Hortonworks.
The following table lists the fields in the HDFSConnection panel of the HDFS Connection shared resource:
Field | Module Property? | Description |
---|---|---|
Select Url Type | No | The URL type used to connect to HDFS. There are two URL types: Namenode and Gateway. The default URL type is Namenode when a new connection is created. |
Gateway Url | Yes | This field is displayed when the Gateway option is selected in the Select Url Type field. The Knox Gateway URL is used to connect to HDFS. For example, enter the Knox Gateway URL https://localhost:8443/gateway/default, where default is the topology name. |
HDFS Url | Yes | This field is displayed when the Namenode option is selected in the Select Url Type field. The WebHDFS URL is used to connect to HDFS. The default value is http://localhost:50070. The plug-in supports HttpFS and HttpFS with SSL; you can enter an HttpFS URL with HTTP or HTTPS in this field. For example: http://httpfshostname:14000 or https://httpfshostname:14000 |
User Name | Yes | The unique user name that is used to connect to HDFS. |
Password | Yes | This field is displayed when the Gateway option is selected in the Select Url Type field. The password that is used to connect to HDFS. |
SSL | No | Select this check box to enable the SSL configuration. By default, the SSL check box is cleared. |
Key File | Yes | Select the server certificate for HDFS. This field is displayed only when the SSL check box is selected. |
Key Password | Yes | The password for the server certificate. This field is displayed only when the SSL check box is selected. |
Trust File | Yes | Select the client certificate for HDFS. This field is displayed only when the SSL check box is selected. |
Trust Password | Yes | The password for the client certificate. This field is displayed only when the SSL check box is selected. |
Enable Kerberos | No | Select this check box to connect to a Kerberized HDFS server.
Note: If your server uses AES-256 encryption, you must install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files on your machine. For more details, see Installing JCE Policy Files. |
Kerberos Method | No | The Kerberos authentication method that is used to authorize access to HDFS. Select an authentication method from the list: Password or Keytab. This field is displayed only when you select the Enable Kerberos check box. |
Kerberos Principal | Yes | The Kerberos principal name that is used to connect to HDFS.
This field is displayed only when you select the Enable Kerberos check box. |
Kerberos Password | Yes | The password for the Kerberos principal.
This field is displayed only when you select Password from the Kerberos Method list. |
Kerberos Keytab | Yes | The keytab that is used to connect to HDFS.
This field is displayed only when you select Keytab from the Kerberos Method list. |
Login Module File | Yes | The login module file is used to authorize access to WebHDFS. Each LoginModule-specific entry specifies a LoginModule, a flag value, and options to be passed to the LoginModule. This field is displayed only when you select Keytab from the Kerberos Method list. Note: The Kerberos Principal and Kerberos Keytab fields can be left empty if a login module file is provided; the login module file takes precedence over those fields when populated.
The login module file for the HDFS client has the following format:
HDFSClient { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=false debug=true keyTab="<keytab file path>" principal="<Principal>"; };
On the AIX platform, the login module file for the HDFS client has the following format:
HDFSClient { com.ibm.security.auth.module.Krb5LoginModule required principal="<Principal>" useKeytab="<keytab file path>" credsType="both"; }; |
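The login module file format above can be sketched as a small generator. This is an illustrative sketch only; the keytab path and principal shown are placeholders, not values from a real cluster.

```python
# Sketch: generate an HDFS client login module (JAAS) file in the
# format documented above. The keytab path and principal passed in
# below are illustrative placeholders.

JAAS_TEMPLATE = '''HDFSClient {{
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=false
  debug=true
  keyTab="{keytab}"
  principal="{principal}";
}};
'''

def render_login_module(keytab: str, principal: str) -> str:
    """Fill the documented template with a keytab path and principal."""
    return JAAS_TEMPLATE.format(keytab=keytab, principal=principal)

conf = render_login_module("/etc/security/keytabs/hdfs.keytab",
                           "hdfs-user@EXAMPLE.COM")
print(conf)
```

On AIX, the same approach applies with the com.ibm.security.auth.module.Krb5LoginModule entry shown above.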
Test Connection
You can click Test Connection to test whether the specified configuration fields result in a valid connection.
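The same check can be performed outside the plug-in against a plain (non-Kerberized) WebHDFS endpoint. This is a hedged sketch: the host, port, and user name are illustrative, and only the URL construction is shown here.

```python
# Sketch: build the WebHDFS GETFILESTATUS request URL that a manual
# connection test could issue. Host, port, and user are illustrative.
from urllib.parse import urlencode

def webhdfs_status_url(hdfs_url: str, user: str, path: str = "/") -> str:
    """Build the WebHDFS GETFILESTATUS request URL for a path."""
    query = urlencode({"op": "GETFILESTATUS", "user.name": user})
    return f"{hdfs_url.rstrip('/')}/webhdfs/v1{path}?{query}"

url = webhdfs_status_url("http://localhost:50070", "hdfs")
print(url)
# An HTTP 200 response to this URL (e.g. via urllib.request.urlopen)
# indicates the configured URL and user yield a valid connection.
```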
Setting up High Availability
You can set up high availability for your cluster in the HDFSConnection panel. To do so, enter two URLs as comma-separated values (no space between the comma and the second URL) in the HDFS Url field under the HDFS Connection section of the panel. The plug-in designates the first entry as the primary node and the second entry as the secondary node, and automatically connects and routes requests to the secondary node if the primary node goes down.
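The primary/secondary routing described above can be sketched as follows. The host names are illustrative, and the failover logic is a simplified model of the behavior, not the plug-in's implementation.

```python
# Sketch: how the comma-separated HDFS Url value maps to primary and
# secondary nodes for high availability. Host names are illustrative.

def split_ha_urls(hdfs_url: str):
    """Split 'primary,secondary' (no surrounding spaces) into a pair."""
    urls = hdfs_url.split(",")
    primary = urls[0]
    secondary = urls[1] if len(urls) > 1 else None
    return primary, secondary

def pick_node(hdfs_url: str, primary_up: bool) -> str:
    """Route to the primary node, falling back to the secondary."""
    primary, secondary = split_ha_urls(hdfs_url)
    if primary_up or secondary is None:
        return primary
    return secondary

value = "http://nn1.example.com:50070,http://nn2.example.com:50070"
print(pick_node(value, primary_up=False))
# → http://nn2.example.com:50070
```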
To check the status of a node, use the JMX API <HDFS URL>/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus. For example: http://cdh571.na.tibco.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus
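The JMX query above returns a JSON document whose NameNodeStatus bean carries the node's state. The sample response below is illustrative of the bean's shape, not output captured from a real cluster.

```python
# Sketch: parse the JMX NameNodeStatus response to read a node's
# state. SAMPLE_RESPONSE is an illustrative example of the bean shape.
import json

SAMPLE_RESPONSE = '''{
  "beans": [{
    "name": "Hadoop:service=NameNode,name=NameNodeStatus",
    "State": "active",
    "HostAndPort": "cdh571.na.tibco.com:8020"
  }]
}'''

def namenode_state(jmx_json: str) -> str:
    """Return the State field of the NameNodeStatus bean."""
    beans = json.loads(jmx_json)["beans"]
    return beans[0]["State"]

print(namenode_state(SAMPLE_RESPONSE))  # → active
```

A state of "active" identifies the node currently serving requests; a standby node reports "standby" in the same field.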