Pig

The Pig activity is used to create and queue a Pig job.

Note: If you run this activity on the Red Hat platform, you must upgrade XML User Interface Language (XUL) Runner to version 1.8 or later. After upgrading, you must reinstall Mozilla Firefox.

General

The General tab has the following fields.

| Field | Module Property? | Description |
|---|---|---|
| Name | No | The name of the activity in the process definition. |
| HCatalog Connection | Yes | Click to select an HCatalog Connection shared resource. If no matching HCatalog Connection shared resources are found, click Create Shared Resource to create one. |
| IsFileBase | No | Select this check box if the Pig scripts come from a file. |
| Pig File | Yes | Specifies the path of the file that contains the Pig scripts. Note: This field is displayed when the IsFileBase check box is selected. |
| PigEditor | No | Specifies the Pig scripts. The keywords in the scripts are highlighted automatically. Note: This field is displayed when the IsFileBase check box is cleared. |
| Arguments | No | Specifies the Pig arguments as a space-separated string. |
| Status Directory | Yes | Specifies the directory where the status of the Pig job is located. |
| Files | Yes | Specifies the comma-separated files to be copied to the MapReduce cluster. |
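
To make the relationship between these fields concrete, the following is a minimal sketch of how the General tab values could map onto a Pig job request. It assumes the job is submitted through the WebHCat (Templeton) `pig` resource, whose `execute`, `file`, `arg`, `files`, and `statusdir` parameters resemble these fields. The source does not state how the plug-in submits the job, and the function name `build_pig_form` and all sample values are hypothetical.

```python
import shlex
from urllib.parse import urlencode


def build_pig_form(script=None, pig_file=None, arguments="",
                   status_dir=None, files=None):
    """Return (name, value) pairs for a Pig job request.

    Assumption: parameter names follow the WebHCat `pig` resource
    (execute, file, arg, files, statusdir). This helper is illustrative
    and is not part of the plug-in.
    """
    # IsFileBase selected -> Pig File; cleared -> PigEditor script.
    if (script is None) == (pig_file is None):
        raise ValueError("provide exactly one of script or pig_file")

    if pig_file is not None:
        form = [("file", pig_file)]
    else:
        form = [("execute", script)]

    # Arguments is one space-separated string; each token becomes one `arg`.
    form += [("arg", token) for token in shlex.split(arguments)]

    if status_dir:
        form.append(("statusdir", status_dir))
    if files:
        # Files is a comma-separated list; strip stray spaces around names.
        names = [name.strip() for name in files.split(",") if name.strip()]
        form.append(("files", ",".join(names)))
    return form


if __name__ == "__main__":
    form = build_pig_form(
        script="A = LOAD '$input'; DUMP A;",
        arguments="-param input=/tmp/data.txt",
        status_dir="/tmp/pig.status",
        files="lookup.txt, extra.txt",
    )
    print(urlencode(form))
```

The sketch rejects a request that supplies both a script and a script file, or neither, which mirrors the way the IsFileBase check box shows only one of the two fields at a time.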

Description

Provide a short description for the activity.

UDF

The UDF tab has the following fields.

| Input Item | Module Property? | Description |
|---|---|---|
| UDF Directory | Yes | Specifies the directory where the user-defined functions (UDFs) are located. |
| Available UDF Files | No | Specifies the UDF file to be applied. Click to list the available UDF files under the specified UDF directory. Select a UDF file and click , the UDF file is displayed in the PigEditor field on the General tab. |
| Upload UDF File | Yes | Specifies the UDF file to be uploaded. Click to select the UDF file to be uploaded, and then click Upload to upload the file to the specified directory. |

Input

The values specified on this tab take precedence over the values in the corresponding fields on the General tab.

| Input Item | Data Type | Description |
|---|---|---|
| PigScript | string | Specifies the Pig scripts. Note: This item is displayed when the IsFileBase check box is cleared. |
| PigFile | string | Specifies the path of the file that contains the Pig scripts. Note: This item is displayed when the IsFileBase check box is selected. |
| Arguments | string | Specifies the Pig arguments. |
| StatusDirectory | string | Specifies the directory where the status of the Pig job is located. |
| Files | string | Specifies the comma-separated files to be copied to the MapReduce cluster. |
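
The precedence rule for this tab can be pictured as an overlay of Input values on General tab values. The following is a minimal sketch of that idea. It assumes that an Input item that is absent or `None` leaves the General tab value in place, which the source does not spell out; the function name `effective_settings` and both dictionaries are hypothetical stand-ins for the activity configuration.

```python
def effective_settings(general, inputs):
    """Overlay Input tab values on General tab values.

    Assumption: an Input item that is absent or None leaves the
    General tab value unchanged. Any other value replaces it.
    """
    merged = dict(general)  # copy so the General tab values stay intact
    for name, value in inputs.items():
        if value is not None:
            merged[name] = value
    return merged


if __name__ == "__main__":
    general = {"Arguments": "-param input=/tmp/a.txt",
               "StatusDirectory": "/tmp/pig.status"}
    inputs = {"Arguments": "-param input=/tmp/b.txt",
              "StatusDirectory": None}
    print(effective_settings(general, inputs))
```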

Output

The output of the activity is as follows.

| Output Item | Data Type | Description |
|---|---|---|
| jobId | string | Returns the job ID of the Pig operation. |

Note: You can use the WaitForJobCompletion activity to wait for the job to complete. The exitValue in the Output tab of the WaitForJobCompletion activity shows the exit value of the Pig script execution.
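
The wait-then-read-exit-value pattern described in the note can be sketched outside the product as a simple polling loop. This is an illustration only, not the WaitForJobCompletion implementation. `fetch_status` is a hypothetical callable, and the `completed` and `exitValue` keys are assumptions modeled loosely on a WebHCat job status document.

```python
import time


def wait_for_exit_value(fetch_status, job_id, interval=5.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reports completion, then return its exit value.

    Assumptions: `fetch_status(job_id)` returns a dict with a
    `completed` marker and an `exitValue` key. Both names are
    illustrative and may differ in a real deployment.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(job_id)
        if status.get("completed") == "done":
            return status.get("exitValue")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
        sleep(interval)


if __name__ == "__main__":
    # Fake status source: reports completion on the third poll.
    polls = []

    def fake_status(job_id):
        polls.append(job_id)
        if len(polls) >= 3:
            return {"completed": "done", "exitValue": 0}
        return {"completed": None}

    print(wait_for_exit_value(fake_status, "job_0001", interval=0.0))
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real delays.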

Fault

The Fault tab lists the exceptions that can be thrown by this activity.

| HadoopException | Description |
|---|---|
| msg | The error message description returned by the plug-in. |
| msgCode | The error code returned by the plug-in. |