Overriding Hadoop Data Source Parameters Using Workflow Variables

You can override the default Hadoop data source parameters by using workflow variables.

An override fine-tunes the Hadoop data source settings for the specified workflow only.

  • To view a workflow's variables, select Workflow Variables from the Actions drop-down list box.
  • To create a new variable for overriding a default Hadoop setting, click create. One example is the amount of time before a task times out, which defaults to 600,000 ms (10 minutes).
  • To override a Hadoop variable for a specific workflow, click create and make a new variable called, for example, @alpine.mapred.mapred.task.timeout, where
    • @alpine.mapred. indicates it is the Team Studio override for a Hadoop variable, and
    • mapred.task.timeout is the official Hadoop variable name.
    • Set the value to 300000 (300,000 ms, or 5 minutes), for example.
  • To override a Hadoop variable only for a specific operator task within a workflow, create a new variable called, for example, @alpine.mapred.join.Hadoop_Join.mapred.task.timeout, where
    • @alpine.mapred. indicates it is the Team Studio override for a Hadoop variable,
    • join indicates it is for the Join operator,
    • Hadoop_Join indicates the particular operator job that is being overridden, and
    • mapred.task.timeout is the official Hadoop variable name.
    • Set the override value (which applies only to this workflow's Join operator job) to 200000 (200,000 ms), for example.
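
The naming pattern described above can be sketched in a few lines of Python. This is an illustration only and not part of Team Studio: the function name override_name and its arguments are hypothetical, and the only facts taken from this page are the @alpine.mapred. prefix and the order of the name parts (operator, job, Hadoop variable name).

```python
# Hypothetical helper: composes workflow variable names following the
# pattern described on this page. It is not a Team Studio API.
PREFIX = "@alpine.mapred."


def override_name(hadoop_param, operator=None, job=None):
    """Return the workflow variable name that overrides hadoop_param.

    With no operator/job, the name targets the whole workflow.
    With both, it targets one operator job within the workflow.
    """
    if (operator is None) != (job is None):
        raise ValueError("operator and job must be given together")
    if operator is None:
        return PREFIX + hadoop_param
    return f"{PREFIX}{operator}.{job}.{hadoop_param}"


# Workflow-level override of the task timeout.
workflow_level = override_name("mapred.task.timeout")

# Operator-level override for the Join operator's Hadoop_Join job.
operator_level = override_name(
    "mapred.task.timeout", operator="join", job="Hadoop_Join"
)

print(workflow_level)
print(operator_level)
```

The two printed names match the examples given in the list above. The operator and job names passed in must still be taken from the list of Team Studio operators and job names.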

Note: Any Hadoop configuration parameter can be set either here at the workflow level or at the Hadoop data source level. Here is a list of Hadoop configuration parameters:

Note: To customize a specific operator task, you must reference the correct Team Studio operator name and job name. Here is a list of Team Studio operators and job names:

Team Studio Operator Job Names