HDFS Connection
The HDFS Connection shared resource contains all the parameters necessary to connect to HDFS. It can be used by the HDFS Operation, ListFileStatus, Read, and Write activities, and by the HCatalog Connection shared resource.
General
In the General panel, you can specify the package that stores the HDFS Connection shared resource and the shared resource name.
The following table lists the fields in the General panel of the HDFS Connection shared resource:

Field | Description |
---|---|
Package | The package in which the HDFS Connection shared resource is stored. |
Name | The name of the shared resource. |
HDFSConnection
The following table lists the fields in the HDFSConnection panel of the HDFS Connection shared resource:
Condition Applicable | Field | Module Property? | Description |
---|---|---|---|
N/A | Connection Type | No | The connection type used to connect to HDFS. The available connection types are Namenode, Gateway, and Azure Data Lake Storage Gen1. The default connection type is Namenode when a new connection is created. |
Available only when you select Namenode as the connection type | HDFS Url | Yes | The WebHDFS URL used to connect to HDFS. The default value is `http://localhost:50070`. The plug-in supports HttpFS and HttpFS with SSL, so you can enter an HttpFS URL with HTTP or HTTPS in this field, for example, `http://httpfshostname:14000` or `https://httpfshostname:14000`. A sample request for verifying the URL is shown after this table. |
Available only when you select Gateway as the connection type | Gateway Url | Yes | The Knox Gateway URL used to connect to HDFS. For example, `https://localhost:8443/gateway/default`, where `default` is the topology name. |
Available only when you select Gateway as the connection type | Password | Yes | The password used to connect to HDFS. |
Available only when you select Namenode or Gateway as the connection type | User Name | Yes | Create a unique user name to connect to HDFS. |
Available only when you select Namenode or Gateway as the connection type | SSL | No | Select the check box to enable the SSL configuration. By default, the SSL check box is not selected. |
Available only when you enable the SSL configuration | Key File | Yes | The server certificate for HDFS. |
Available only when you enable the SSL configuration | Key Password | Yes | The password for the server certificate. |
Available only when you enable the SSL configuration | Trust File | Yes | The client certificate for HDFS. |
Available only when you enable the SSL configuration | Trust Password | Yes | The password for the client certificate. |
Available only when you select Namenode or Gateway as the connection type | Enable Kerberos | No | Select this check box to connect to a Kerberized WebHDFS server. By default, the Enable Kerberos check box is not selected. Note: If your server uses AES-256 encryption, you must install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files on your machine. For more details, see Installing JCE Policy Files. |
Available only when you enable Kerberos | Kerberos Method | No | The Kerberos authentication method used to authorize access to HDFS. Select an authentication method from the list: Keytab, Cached, or Password. |
Available only when you enable Kerberos | Kerberos Principal | Yes | The Kerberos principal name used to connect to HDFS. |
Available only when you enable Kerberos and select Keytab as the Kerberos method | Kerberos Keytab | Yes | The keytab used to connect to HDFS. |
Available only when you enable Kerberos and select Keytab as the Kerberos method | Login Module File | Yes | The login module file used to authorize access to WebHDFS. Each LoginModule-specific entry specifies a LoginModule, a flag value, and options to be passed to the LoginModule. You can leave the Kerberos Principal and Kerberos Keytab fields empty if a login module file is provided; the login module file takes precedence over the principal and keytab fields if it is populated. An example login module file for the HDFS client is shown after this table. |
Available only when you enable Kerberos and select Password as the Kerberos method | Kerberos Password | Yes | The password for the Kerberos principal. |
Available only when you select Azure Data Lake Storage Gen1 as the connection type | Data Lake Name | Yes | The name of the Azure Data Lake Storage Gen1 resource that you created on the Azure portal. |
Available only when you select Azure Data Lake Storage Gen1 as the connection type | Authentication Type | Yes | The authentication type that you want to use. For the Azure Data Lake Storage Gen1 connection type, default OAuth 2.0 resource values are used; to override them, set the `-Dcom.tibco.bw.webhdfs.oauthtoken.resource` system property (see the example after this table). |
Available only when you select Azure Data Lake Storage Gen1 as the connection type | Directory (Tenant) ID | Yes | The tenant ID of the Azure Active Directory application. |
Available only when you select Azure Data Lake Storage Gen1 as the connection type | Application (Client) ID | Yes | The client ID of the Azure Active Directory application. |
Available only when you select Azure Data Lake Storage Gen1 as the connection type | Client Secret | Yes | The client secret registered under the Certificates and Secrets section of the Azure Active Directory application configuration. |
Available only when you select Azure Data Lake Storage Gen1 as the connection type | Token Refresh Time (min) | Yes | The time in minutes after which the plug-in refreshes the authentication access token. The default token refresh time is 60 minutes; by default, the Azure Active Directory application OAuth 2.0 access token is valid for 60 minutes. You can set a buffer time for refreshing the token before it expires by using the `-Dcom.tibco.bw.webhdfs.oauthtoken.minbeforeexpiry` system property (see the example after this table). |
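Before configuring the shared resource, you can confirm that an HDFS Url points at a working WebHDFS or HttpFS endpoint with a plain REST call. This is only a verification sketch, not part of the plug-in configuration; it assumes the default NameNode URL `http://localhost:50070`, the HttpFS example host from the table, and a hypothetical user named `hdfs`:

```
# List the root directory through the WebHDFS REST API:
curl -i "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"

# The same call against an HttpFS endpoint (default port 14000):
curl -i "http://httpfshostname:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"
```

A successful response returns HTTP 200 and a JSON FileStatuses listing, which confirms that the URL is usable in the HDFS Url field.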
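The Login Module File field uses the standard JAAS configuration syntax. The following example restates the format given in the field description, reformatted for readability; `<keytab file path>` and `<Principal>` are placeholders for your own keytab location and principal name:

```
HDFSClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=false
  debug=true
  keyTab="<keytab file path>"
  principal="<Principal>";
};
```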
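The two Azure-related system properties mentioned in the table are ordinary JVM arguments passed to the runtime that hosts the plug-in. A minimal sketch, assuming a placeholder OAuth 2.0 resource URI and a five-minute refresh buffer; the actual resource value depends on your Azure environment:

```
# Placeholders only; substitute values for your environment.
-Dcom.tibco.bw.webhdfs.oauthtoken.resource=<OAuth2.0 resource URI>
-Dcom.tibco.bw.webhdfs.oauthtoken.minbeforeexpiry=5
```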
Test Connection
You can click Test Connection to test whether the specified configuration fields result in a valid connection.
Setting up High Availability
You can set up high availability for your cluster in this panel. To do so, enter two URLs as comma-separated values (with no space between the comma and the second URL) in the HDFS Url field under the HDFS Connection section of this panel. The plug-in designates the first entry as the primary node and the second entry as the secondary node, and automatically connects and routes requests to the secondary node if the primary node goes down.
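For example, a high-availability HDFS Url value could look like the following; both host names are placeholders for your own primary and secondary NameNodes:

```
http://namenode1.example.com:50070,http://namenode2.example.com:50070
```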
To check the status of a node, use the API `<HDFS URL>/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus`. For example: `http://cdh571.na.tibco.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus`.
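As a sketch using the example host above, you can query this endpoint with curl. An active NameNode reports its state in the returned JSON, although the exact set of fields depends on your Hadoop version:

```
# Query the NameNodeStatus MBean (host name is the example from above):
curl "http://cdh571.na.tibco.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"

# Abridged response; "State" is "active" or "standby":
# { "beans" : [ { "name" : "Hadoop:service=NameNode,name=NameNodeStatus",
#                 "State" : "active", ... } ] }
```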