Hive

You can use the Hive activity to query and manage large datasets that reside in distributed storage.

Note:
  • If you run this activity on the Red Hat platform (any version except 7.0), you must upgrade XULRunner (XML User Interface Language Runner) to version 1.8 or later and then reinstall Mozilla Firefox. To run this activity on Red Hat Enterprise Linux 7.0, see the "Known Issues" section in the TIBCO ActiveMatrix BusinessWorks Plug-in for Big Data Release Notes.
  • The Hive activity does not support uploading Hive data from local clusters.

By default, the Hive activity supports Hive 0.12 and later. To use the Hive activity with an earlier Hive version, add the following property to the VM arguments in TIBCO Business Studio:

-Dcom.tibco.plugin.bigdata.oldhive.active=true

To access the VM arguments in TIBCO Business Studio, click Run > Run Configurations. In the left panel of the Run Configurations dialog, expand BusinessWorks Application > BWApplication, and then click the Arguments tab in the right panel.

To deploy the WaitForJobCompletion activity, you must also add the following property to the Config.ini file:

com.tibco.plugin.bigdata.oldhive.active=true

The Config.ini file is located in the TIBCO_HOME/bw/version_number/domains/domain_name/appnodes/appspace_name/appnode_name directory.
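If you prefer to script this change rather than edit the file by hand, the following Python sketch builds the Config.ini path from the directory components listed above and sets the property. It assumes Config.ini uses simple key=value lines (comments and line continuations are not handled); the function names are hypothetical and are not part of the plug-in.

```python
from pathlib import Path

# Property from the documentation above (needed with Hive versions before 0.12).
OLD_HIVE_KEY = "com.tibco.plugin.bigdata.oldhive.active"


def appnode_config_path(tibco_home: str, version: str, domain: str,
                        appspace: str, appnode: str) -> Path:
    """Build TIBCO_HOME/bw/version_number/domains/domain_name/appnodes/appspace_name/appnode_name/Config.ini."""
    return Path(tibco_home, "bw", version, "domains", domain,
                "appnodes", appspace, appnode, "Config.ini")


def set_property(text: str, key: str, value: str) -> str:
    """Return text with key=value set, replacing the first existing entry or appending one.

    Assumes simple key=value lines; comment and continuation handling is omitted.
    """
    lines = text.splitlines()
    entry = f"{key}={value}"
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:
            lines[i] = entry
            break
    else:
        # No existing entry for this key: append a new one.
        lines.append(entry)
    return "\n".join(lines) + "\n"


def enable_old_hive(config_ini: Path) -> None:
    """Add or update the old-Hive property in an existing Config.ini file."""
    text = config_ini.read_text(encoding="utf-8")
    config_ini.write_text(set_property(text, OLD_HIVE_KEY, "true"), encoding="utf-8")
```

Keeping the text transformation in a pure function (set_property) separate from file I/O makes the edit easy to check before you write it to an AppNode's configuration.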

General

In the General tab, you can specify the activity name in the process, establish a connection to HCatalog, and specify Hive scripts and other Hive options to query data.

The following table lists the configurations in the General tab of the Hive activity:

Field Module Property? Description
Name No The name to be displayed as the label for the activity in the process.
HCatalog Connection Yes The HCatalog Connection shared resource that is used to create a connection between the plug-in and HCatalog. Click to select an HCatalog Connection shared resource.

If no matching HCatalog Connection shared resources are found, click Create Shared Resource to create one. For more details, see Creating an HCatalog Connection.

IsFileBase No Select this check box if the Hive scripts are stored in a file.
Hive Script File Yes The path of the file that contains the Hive scripts.

This field is displayed only when you select the IsFileBase check box.

Hive Editor No If the Hive scripts are not stored in a file, enter them directly in this field. Hive keywords are highlighted automatically.

This field is displayed only when you clear the IsFileBase check box.

Define No The Hive configuration variables to define. Each variable consists of a name and a value.
Status Directory Yes The directory where the status of the Hive job is located.
WaitForResult Yes Select this check box if you want the process to wait for the Hive operation to complete.

Note: When this check box is selected, the Hive activity cannot return more than 2 GB of result data in a single query because of a limitation in TIBCO ActiveMatrix BusinessWorks.
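The General tab fields resemble the parameters of the Apache Hive WebHCat (Templeton) `hive` REST resource (`execute`, `file`, `define`, `statusdir`). This section does not say that the plug-in uses WebHCat. Assuming the HCatalog Connection fronts a WebHCat server, the following Python sketch shows how IsFileBase, Hive Script File or Hive Editor, Define, and Status Directory could map to form parameters for `POST /templeton/v1/hive`. The function name `build_hive_params` and its `user_name` argument are hypothetical, and the sketch sends no request.

```python
from typing import List, Mapping, Optional, Tuple
from urllib.parse import urlencode


def build_hive_params(user_name: str,
                      script: Optional[str] = None,
                      script_file: Optional[str] = None,
                      defines: Optional[Mapping[str, str]] = None,
                      status_dir: Optional[str] = None) -> List[Tuple[str, str]]:
    """Map Hive activity settings to WebHCat `hive` form parameters (assumed mapping).

    Pass exactly one of `script` (Hive Editor, IsFileBase cleared) or
    `script_file` (Hive Script File, IsFileBase selected).
    """
    if (script is None) == (script_file is None):
        raise ValueError("specify exactly one of script or script_file")
    params = [("user.name", user_name)]  # WebHCat pseudo-authentication user
    if script is not None:
        params.append(("execute", script))  # inline HiveQL
    else:
        params.append(("file", script_file))  # HDFS path of a HiveQL file
    for name, value in (defines or {}).items():
        # Each configuration variable becomes a repeated define=NAME=VALUE entry.
        params.append(("define", f"{name}={value}"))
    if status_dir is not None:
        params.append(("statusdir", status_dir))  # where WebHCat writes job status
    return params


if __name__ == "__main__":
    body = urlencode(build_hive_params("hive", script="SHOW TABLES;",
                                       defines={"dt": "2024-01-01"},
                                       status_dir="/tmp/hive_status"))
    print(body)  # form body for POST /templeton/v1/hive
```

Returning a list of pairs rather than a dict keeps repeated `define` entries intact. `urlencode` then turns the list into a form body.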

Description

In the Description tab, you can enter a short description for the Hive activity.

Input

The values that you specify in the Input tab override the ones that you specify in the corresponding fields in the General tab.
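You can think of this override rule as a field-by-field merge in which an Input value, when supplied, takes precedence over the General tab value. The following sketch is a hypothetical Python model of that rule and is not the plug-in's code. It treats an Input element that is not set as None.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class HiveSettings:
    """Hypothetical model of the Hive activity fields that appear in both tabs."""
    hive_file: Optional[str] = None
    hive_script: Optional[str] = None
    defines: Optional[str] = None
    status_directory: Optional[str] = None


def effective_settings(general: HiveSettings, input_values: HiveSettings) -> HiveSettings:
    """Return the General settings with every set (non-None) Input value applied on top."""
    overrides = {f.name: getattr(input_values, f.name)
                 for f in fields(input_values)
                 if getattr(input_values, f.name) is not None}
    # dataclasses.replace builds a new instance; the General settings are not modified.
    return replace(general, **overrides)
```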

The following table lists all the possible input elements in the Input tab of the Hive activity:

Input Item Data Type Description
HiveFile String The path of the HDFS file that contains the Hive scripts.

This element is displayed only when you select the IsFileBase check box in the General tab.

HiveScript String The Hive scripts to run, entered directly.

This element is displayed only when you clear the IsFileBase check box in the General tab.

Defines String The Hive configuration variables to define. Each variable consists of a name and a value.
Status Directory String The directory where the status of the Hive job is located.
timeout Long The amount of time, in milliseconds, to wait for this activity to complete.

By default, this activity does not time out if you do not specify a value.
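The timeout semantics above are an optional limit in milliseconds, and no value means that the activity waits indefinitely. The following client-side Python sketch shows these semantics. It is not how the activity is implemented. `is_done` is a hypothetical completion check, and `clock` and `sleep` can be injected so that the loop is testable.

```python
import time
from typing import Callable, Optional


def wait_until_done(is_done: Callable[[], bool],
                    timeout_ms: Optional[int] = None,
                    poll_interval_s: float = 1.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_done() until it returns True or the timeout expires.

    timeout_ms=None waits indefinitely, mirroring the activity's default.
    Returns True if the work finished, False if the timeout expired first.
    """
    # Convert the millisecond timeout to an absolute deadline in seconds.
    deadline = None if timeout_ms is None else clock() + timeout_ms / 1000.0
    while True:
        if is_done():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_interval_s)
```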

Output

In the Output tab, you can view either the job ID of the Hive operation or the result of the job, depending on whether you select the WaitForResult check box in the General tab.

The following table lists the output elements in the Output tab of the Hive activity:

Output Item Data Type Description
jobId String The job ID of the Hive operation.
Note: You can use the WaitForJobCompletion activity to wait for the job to complete. The exitValue output element in the Output tab of the WaitForJobCompletion activity contains the exit value of the Hive SQL execution.

This element is displayed only when you clear the WaitForResult check box in the General tab.

content String The result of the job.

This element is displayed only when you select the WaitForResult check box in the General tab.
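When WaitForResult is cleared, a later step uses the jobId output to check on the job. This section does not confirm that the job runs through WebHCat (Templeton). If it does, WebHCat reports job status through `GET /templeton/v1/jobs/{jobid}`. The following Python sketch interprets that JSON in terms of completion and exit value, the same notion as the exitValue element of WaitForJobCompletion. The function names are hypothetical, and you should verify the field names (`status.jobComplete`, `completed`, `exitValue`) against your WebHCat version.

```python
from typing import Any, Mapping, Optional, Tuple


def job_outcome(job: Mapping[str, Any]) -> Tuple[bool, Optional[int]]:
    """Return (complete, exit_value) from a WebHCat job-status JSON object (assumed layout)."""
    status = job.get("status") or {}
    # Treat the job as finished if either completion indicator is present.
    complete = bool(status.get("jobComplete")) or job.get("completed") == "done"
    exit_value = job.get("exitValue")
    return complete, (int(exit_value) if exit_value is not None else None)


def succeeded(job: Mapping[str, Any]) -> bool:
    """True only if the job finished and the Hive execution exited with 0."""
    complete, exit_value = job_outcome(job)
    return complete and exit_value == 0
```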

Fault

In the Fault tab, you can view the error code and error message of the Hive activity. See Error Codes for a more detailed explanation of the errors.

The following table lists the error schema elements in the Fault tab of the Hive activity:

Error Schema Element Data Type Description
msg String The error message description that is returned by the plug-in.
msgCode String The error code that is returned by the plug-in.