MapReduce

General

The General tab has the following fields.

Field	Module Property?	Description
Name	No	The name of the activity in the process definition.
HCatalog Connection	Yes	Click to select an HCatalog Connection shared resource. If no matching HCatalog Connection shared resources are found, click Create Shared Resource to create one.
Streaming	No	Select this check box to create and run MapReduce streaming jobs.
The following four fields are displayed when the Streaming check box is selected.
Input	Yes	Specifies the path of the input data in Hadoop.
Output	Yes	Specifies the path of the output data.
Mapper	Yes	Specifies the path of the mapper program in Hadoop.
Reducer	Yes	Specifies the path of the reducer program in Hadoop.
The following four fields are displayed when the Streaming check box is cleared.
Jar Name	Yes	Specifies the name of the .jar file for MapReduce to use.
Main Class	Yes	Specifies the name of the class for MapReduce to use.
Lib Jars	Yes	Specifies the comma-separated list of .jar files to be included in the classpath.
Files	Yes	Specifies the comma-separated list of .jar files to be copied to the MapReduce cluster.
Status Directory	Yes	Specifies the directory where the status of MapReduce jobs is stored.
Arguments	No	Specifies the program arguments. If the Streaming check box is cleared, specify the arguments for the Java main class. If the Streaming check box is selected, specify a space-separated list of arguments to pass to the Hadoop streaming utility. For example, `-files /user/hdfs/file -D mapred.reduce.task=0 -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat -cmdenv info=wc-reducer`
The following field is displayed when the Streaming check box is cleared.
Define	No	Specifies the Hadoop configuration variables. A variable is associated with a name and a value.
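
The Mapper and Reducer fields each point to a program that the Hadoop streaming utility runs over standard input and standard output. The following is a minimal sketch of such programs as a word count, not something supplied by the plug-in. The function names `map_lines` and `reduce_lines` and the `map`/`reduce` command-line switch are hypothetical, and the sketch assumes the usual streaming convention of tab-separated key and value records.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts for each word.

    The input must already be sorted by key, which Hadoop streaming
    does between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: the script is started with one argument,
# either "map" or "reduce", and processes standard input.
if __name__ == "__main__" and len(sys.argv) == 2:
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for record in step(sys.stdin):
        print(record)
```

The two steps can be checked locally by sorting the mapper output before passing it to the reducer, which mimics the shuffle that Hadoop performs between the phases.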

Description

Provide a short description for the activity.

Input

The values specified in this tab take precedence over the values in the corresponding fields in the General tab. The following table lists the input items of the activity.

Input Item	Data Type	Description
The following four fields are displayed when the Streaming check box is selected.
Input	string	Specifies the path of the input data in Hadoop.
Output	string	Specifies the path of the output data.
Mapper	string	Specifies the path of the mapper program in Hadoop.
Reducer	string	Specifies the path of the reducer program in Hadoop.
The following four fields are displayed when the Streaming check box is cleared.
JarName	string	Specifies the name of the .jar file for MapReduce to use.
ClassName	string	Specifies the name of the class for MapReduce to use.
Libjars	string	Specifies the comma-separated list of .jar files to be included in the classpath.
Files	string	Specifies the comma-separated list of .jar files to be copied to the MapReduce cluster.
StatusDirectory	string	Specifies the directory where the status of MapReduce jobs is stored.
Arguments	string	Specifies the program arguments.
The following field is displayed when the Streaming check box is cleared.
Defines	string	Specifies the Hadoop configuration variables. A variable is associated with a name and a value.
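
The precedence rule for this tab can be pictured as a simple merge. The sketch below is illustrative only and does not describe how the plug-in implements it. The function `resolve_settings`, the sample values, and the assumption that an unset Input item (`None`) leaves the General tab value in place are all hypothetical.

```python
def resolve_settings(general_tab, input_tab):
    """Return the effective settings: Input tab values override General tab values."""
    effective = dict(general_tab)
    # Assumption: an Input item that is not set (None) overrides nothing.
    effective.update({k: v for k, v in input_tab.items() if v is not None})
    return effective


# Hypothetical values, keyed by the Input tab item names.
general = {
    "JarName": "wordcount.jar",
    "ClassName": "org.example.WordCount",
    "StatusDirectory": "/tmp/status",
}
mapped_input = {"ClassName": "org.example.WordCountV2", "StatusDirectory": None}

effective = resolve_settings(general, mapped_input)
```

In this example `ClassName` comes from the Input tab, while `JarName` and `StatusDirectory` keep the values configured in the General tab.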

Output

The output of the activity is as follows.

Output Item	Data Type	Description
jobId	string	Returns the job ID of the MapReduce operation. Note: You can use the WaitForJobCompletion activity to wait for the job to complete. The `exitValue` in the Output tab of the WaitForJobCompletion activity shows the exit value of the MapReduce execution.

Fault

The Fault tab lists the exceptions that can be thrown by this activity.

HadoopException	Description
msg	The error message description returned by the plug-in.
msgCode	The error code returned by the plug-in.