Pig
You can use the Pig activity to create and queue a Pig job.
General
In the General tab, you can specify the activity name in the process, establish a connection to HCatalog, and create and queue a Pig job.
The following table lists the configurations in the General tab of the Pig activity:
Field | Module Property? | Description |
---|---|---|
Name | No | The name to be displayed as the label for the activity in the process. |
HCatalog Connection | Yes | The HCatalog Connection shared resource that is used to create a connection between the plug-in and HCatalog. Click the icon next to this field to select an HCatalog Connection shared resource. If no matching HCatalog Connection shared resources are found, click Create Shared Resource to create one. For more details, see Creating an HCatalog Connection. |
IsFileBase | No | Select this check box if the Pig scripts are stored in a file. |
Pig File | Yes | The path of the file that contains the Pig scripts. This field is displayed only when you select the IsFileBase check box. |
Pig Editor | No | If the Pig scripts are not stored in a file, enter them directly in this field. Script keywords are highlighted automatically. This field is displayed only when you clear the IsFileBase check box. |
Arguments | No | The Pig arguments, as a space-separated string. |
Status Directory | Yes | The directory where the status of the Pig job is stored. |
Files | Yes | The comma-separated files to be copied to the MapReduce cluster. |
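The General tab fields correspond closely to the parameters of the Apache Hive WebHCat (Templeton) `pig` REST resource: `execute` for an inline script, `file` for a script file, `arg` for each argument, `files`, and `statusdir`. The following Python sketch assumes that the HCatalog Connection targets a WebHCat server and shows how the field values could be turned into those form parameters. The helper name `build_pig_params` is hypothetical and is not part of the plug-in.

```python
from typing import List, Optional, Tuple


def build_pig_params(
    is_file_base: bool,
    pig_file: Optional[str] = None,
    pig_script: Optional[str] = None,
    arguments: str = "",
    status_dir: Optional[str] = None,
    files: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Build WebHCat-style form parameters from General tab values (hypothetical helper)."""
    params: List[Tuple[str, str]] = []
    if is_file_base:
        # IsFileBase selected: the script comes from the Pig File path.
        if not pig_file:
            raise ValueError("Pig File is required when IsFileBase is selected")
        params.append(("file", pig_file))
    else:
        # IsFileBase cleared: the script text comes from the Pig Editor.
        if not pig_script:
            raise ValueError("a Pig script is required when IsFileBase is cleared")
        params.append(("execute", pig_script))
    # Arguments is a space-separated string; send each token as its own "arg".
    for token in arguments.split():
        params.append(("arg", token))
    if status_dir:
        params.append(("statusdir", status_dir))
    if files:
        # Files is already comma-separated, which matches the "files" parameter.
        params.append(("files", files))
    return params
```

The resulting list could be encoded with `urllib.parse.urlencode` and sent as the body of a POST request to the server's `templeton/v1/pig` resource, which in WebHCat responds with the ID of the queued job.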
UDF
In the UDF tab, you can use user-defined functions (UDFs) to specify custom processing.
The following table lists the configurations in the UDF tab of the Pig activity:
Input
The values that you specify in the Input tab override the ones that you specify in the corresponding fields in the General tab.
The following table lists the input elements in the Input tab of the Pig activity:
Input Item | Data Type | Description |
---|---|---|
PigScript | String | The Pig scripts. This element is displayed only when you clear the IsFileBase check box in the General tab. |
PigFile | String | The path of the file that contains the Pig scripts. This element is displayed only when you select the IsFileBase check box in the General tab. |
Arguments | String | The Pig arguments, as a space-separated string. |
Status Directory | String | The directory where the status of the Pig job is stored. |
Files | String | The comma-separated files to be copied to the MapReduce cluster. |
timeout | Long | The amount of time, in milliseconds, to wait for this activity to complete. If you do not specify a value, the activity does not time out. |
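The override rule and the timeout behavior described above can be illustrated with a short Python sketch. It assumes that an Input element left unset is represented as `None` and uses the Input item names as hypothetical dictionary keys; the function names are also hypothetical.

```python
from typing import Any, Dict, Optional


def resolve_settings(general: Dict[str, Any], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge tab values: an Input value that is set overrides the General value."""
    effective = dict(general)  # copy so the General values stay unchanged
    for key, value in inputs.items():
        # Assumption: an Input element that was not mapped is None.
        if value is not None:
            effective[key] = value
    return effective


def timeout_seconds(timeout_ms: Optional[int]) -> Optional[float]:
    """Convert the timeout input from milliseconds to seconds.

    None (no value specified) means the activity waits without a time limit.
    """
    if timeout_ms is None:
        return None
    return timeout_ms / 1000.0
```

For example, merging General values `{"Arguments": "-param a=1", "Files": "x.jar"}` with Input values `{"Arguments": "-param a=2", "Files": None}` keeps `Files` from the General tab and takes `Arguments` from the Input tab.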
Output
In the Output tab, you can view the job ID of the Pig operation.
The following table lists the output element in the Output tab of the Pig activity:
Output Item | Data Type | Description |
---|---|---|
jobId | String | The job ID of the Pig operation. Note: You can use the WaitForJobCompletion activity to wait for the job to complete. The exitValue output element in the Output tab of the WaitForJobCompletion activity displays the exit value of the Pig script execution. |
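The WaitForJobCompletion pattern described in the note amounts to polling the job until it reports an exit value. The following Python sketch shows that loop under stated assumptions: `get_exit_value` is a hypothetical callback that returns `None` while the job is running and the exit value once it has finished, and `timeout_ms` mirrors the millisecond timeout used in the Input tab. It is not the implementation of the WaitForJobCompletion activity.

```python
import time
from typing import Callable, Optional


def wait_for_exit_value(
    job_id: str,
    get_exit_value: Callable[[str], Optional[int]],
    timeout_ms: Optional[int] = None,
    poll_interval_s: float = 5.0,
) -> int:
    """Poll until the job reports an exit value, then return it.

    get_exit_value is a hypothetical callback: it returns None while the
    job is still running and the exit value once the job has finished.
    A timeout_ms of None means wait without a time limit.
    """
    deadline = None if timeout_ms is None else time.monotonic() + timeout_ms / 1000.0
    while True:
        exit_value = get_exit_value(job_id)
        if exit_value is not None:
            return exit_value
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_ms} ms")
        time.sleep(poll_interval_s)
```

With a WebHCat server, such a callback could query the status of the job ID returned by the Pig activity; a nonzero exit value usually indicates that the Pig script failed.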
Fault
In the Fault tab, you can view the error code and error message of the Pig activity. For a more detailed explanation of the errors, see Error Codes.
The following table lists the error schema elements in the Fault tab of the Pig activity: