MapReduce

General

In the General tab, you can specify the activity name in the process, establish a connection to HCatalog, and create and queue a standard or streaming MapReduce job.

The following table lists the configurations in the General tab of the MapReduce activity:

Field	Module Property?	Description
Name	No	The name to be displayed as the label for the activity in the process.
HCatalog Connection	Yes	The HCatalog Connection shared resource that is used to create a connection between the plug-in and HCatalog. Select an existing HCatalog Connection shared resource. If no matching HCatalog Connection shared resource is found, click Create Shared Resource to create one. For more details, see Creating an HCatalog Connection.
Streaming	No	Select this check box to create and run streaming MapReduce jobs.
Input	Yes	The path of the input data in Hadoop. This field is displayed only when you select the Streaming check box.
Output	Yes	The path of the output data. This field is displayed only when you select the Streaming check box.
Mapper	Yes	The path of the mapper program in Hadoop. This field is displayed only when you select the Streaming check box.
Reducer	Yes	The path of the reducer program in Hadoop. This field is displayed only when you select the Streaming check box.
Jar Name	Yes	The name of the .jar file for the MapReduce activity to use. This field is displayed only when you clear the Streaming check box.
Main Class	Yes	The name of the class for the MapReduce activity to use. This field is displayed only when you clear the Streaming check box.
Lib Jars	Yes	A comma-separated list of .jar files to be included in the classpath. This field is displayed only when you clear the Streaming check box.
Files	Yes	A comma-separated list of .jar files to be copied to the MapReduce cluster. This field is displayed only when you clear the Streaming check box.
Status Directory	Yes	The directory where the status of MapReduce jobs is stored.
Arguments	No	The program arguments. If you clear the Streaming check box, specify the arguments for the Java main class. If you select the Streaming check box, specify a space-separated list of arguments to pass to the Hadoop streaming utility. For example: -files /user/hdfs/file -D mapred.reduce.tasks=0 -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat -cmdenv info=wc-reducer
Define	No	The Hadoop configuration variables to define for the job. Each variable consists of a name and a value. This field is displayed only when you clear the Streaming check box.
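
For a streaming job, the Mapper and Reducer fields point to programs that read lines from standard input and write tab-separated key and value pairs to standard output. The following is a minimal word-count sketch of that contract, not code shipped with the plug-in. It assumes that the cluster nodes can run Python and that the script is made available to the cluster, for example through the -files argument. The function names map_lines and reduce_lines are hypothetical.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one "word<TAB>1" record for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts for each word.

    Hadoop streaming sorts reducer input by key, so records for the
    same word arrive on adjacent lines and groupby can collect them.
    """
    # Skip blank lines, then split each record into [word, count].
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local check with in-memory data instead of standard input.
mapped = sorted(map_lines(["to be or not to be\n"]))
counts = list(reduce_lines(mapped))
```

In an actual mapper or reducer script, the same functions would be applied to sys.stdin and each returned record printed to standard output.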

Description

In the Description tab, you can enter a short description for the MapReduce activity.

Input

The values that you specify in the Input tab override the ones that you specify in the corresponding fields in the General tab.

The following table lists the input elements in the Input tab of the MapReduce activity:

Input Item	Data Type	Description
Input	String	The path of the input data in Hadoop. This element is displayed only when you select the Streaming check box in the General tab.
Output	String	The path of the output data. This element is displayed only when you select the Streaming check box in the General tab.
Mapper	String	The path of the mapper program in Hadoop. This element is displayed only when you select the Streaming check box in the General tab.
Reducer	String	The path of the reducer program in Hadoop. This element is displayed only when you select the Streaming check box in the General tab.
JarName	String	The name of the .jar file for the MapReduce activity to use. This element is displayed only when you clear the Streaming check box in the General tab.
ClassName	String	The name of the class for the MapReduce activity to use. This element is displayed only when you clear the Streaming check box in the General tab.
Libjars	String	A comma-separated list of .jar files to be included in the classpath. This element is displayed only when you clear the Streaming check box in the General tab.
Files	String	A comma-separated list of .jar files to be copied to the MapReduce cluster. This element is displayed only when you clear the Streaming check box in the General tab.
Status Directory	String	The directory where the status of MapReduce jobs is stored.
Arguments	String	The program arguments.
Defines	String	The Hadoop configuration variables to define for the job. Each variable consists of a name and a value. This element is displayed only when you clear the Streaming check box in the General tab.
timeout	Long	The amount of time, in milliseconds, to wait for this activity to complete. If you do not specify a value, the activity does not time out.
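
The override rule described at the start of this section can be pictured as a merge in which a value from the Input tab replaces the value of the corresponding General tab field. The following sketch only illustrates that rule. The function name effective_settings, the dictionary keys, and the sample paths are hypothetical, and treating None or an empty string as "not specified" is an assumption of the sketch.

```python
def effective_settings(general_tab, input_tab):
    """Return the values that apply after the Input tab overrides are merged.

    Both arguments are plain dicts keyed by Input tab element names.
    A value of None or "" in input_tab is treated as "not specified"
    (an assumption of this sketch), so the General tab value is kept.
    """
    merged = dict(general_tab)
    for name, value in input_tab.items():
        if value not in (None, ""):
            merged[name] = value
    return merged


# Hypothetical sample values.
general = {
    "Input": "/user/hdfs/in",
    "Output": "/user/hdfs/out",
    "StatusDirectory": "/user/hdfs/status",
}
runtime = {"Output": "/user/hdfs/out-run2", "StatusDirectory": None}
settings = effective_settings(general, runtime)
```

Here only Output changes, because it is the only element with a value in the Input tab data.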

Output

In the Output tab, you can view the job ID of the MapReduce operation.

The following table lists the output element in the Output tab of the MapReduce activity:

Output Item	Data Type	Description
jobId	String	The job ID of the MapReduce operation. Note: To wait for the job to complete, use the WaitForJobCompletion activity. The exitValue output element in the Output tab of that activity contains the exit value of the MapReduce execution.
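
Because the activity queues the job and returns a job ID, waiting for the result is a separate step. The following sketch shows the general submit-then-wait pattern with an optional timeout in milliseconds. It is an illustration only and does not describe how the WaitForJobCompletion activity is implemented. The function wait_for_job and the get_exit_value callable are hypothetical, as is the sample job ID.

```python
import time


def wait_for_job(job_id, get_exit_value, timeout_ms=None, poll_interval_s=1.0):
    """Poll until get_exit_value(job_id) returns a non-None exit value.

    get_exit_value is a hypothetical callable supplied by the caller. It
    returns None while the job is still running. timeout_ms=None means
    that the wait never times out.
    """
    deadline = None
    if timeout_ms is not None:
        deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        exit_value = get_exit_value(job_id)
        if exit_value is not None:
            return exit_value
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not complete within {timeout_ms} ms")
        time.sleep(poll_interval_s)


# Simulated job that reports "still running" twice, then exit value 0.
_states = iter([None, None, 0])
result = wait_for_job("job_0001", lambda _job_id: next(_states), poll_interval_s=0)
```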

Fault

In the Fault tab, you can view the error code and error message of the MapReduce activity. For a detailed explanation of the errors, see Error Codes.

The following table lists the error schema elements in the Fault tab of the MapReduce activity:

Error Schema Element	Data Type	Description
msg	String	The error message description that is returned by the plug-in.
msgCode	String	The error code that is returned by the plug-in.
