Overriding Hadoop Data Source Parameters Using Workflow Variables
You can override the default Hadoop data source parameters by using workflow variables.
An override set this way fine-tunes the Hadoop data source settings for the specified workflow only.
- To view a workflow's variables, select Workflow Variables from the Actions dropdown list box.
- To create a new variable for overriding a default Hadoop setting (such as the amount of time before a task times out), click create.
- The default task timeout is 600,000 ms (10 minutes).
- To override a Hadoop variable for a specific workflow, click create and make a new variable called, for example, @alpine.mapred.mapred.task.timeout, where
- @alpine.mapred. indicates it is the TIBCO Data Science - Team Studio override for a Hadoop variable and mapred.task.timeout is the official Hadoop variable name.
- Set the value to 300000, for example.
- To override a Hadoop variable only for a specific operator task within a workflow, create a new variable called, for example, @alpine.mapred.join.Hadoop_Join.mapred.task.timeout, where
- @alpine.mapred. indicates it is the TIBCO Data Science - Team Studio override for a Hadoop variable,
- join indicates it is for the Join operator,
- Hadoop_Join indicates the particular operator job that is being overridden, and
- mapred.task.timeout is the official Hadoop variable name.
- Set the override value (for this workflow's Join operators only) to 200000, for example.
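The two naming patterns above can be sketched in a few lines of Python. This is only an illustration of how the variable names are assembled from their parts. The helper names override_variable_name and ms_to_minutes are hypothetical and are not part of TIBCO Data Science - Team Studio; the variables themselves are still created through the Workflow Variables dialog.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of the product.
PREFIX = "@alpine.mapred."  # marks a Team Studio override for a Hadoop variable


def override_variable_name(hadoop_param, operator=None, job=None):
    """Assemble a workflow variable name from its parts.

    With only hadoop_param, the name applies to the whole workflow.
    With operator and job, the name targets one operator job.
    """
    if (operator is None) != (job is None):
        raise ValueError("operator and job must be given together")
    if operator is None:
        return PREFIX + hadoop_param
    return f"{PREFIX}{operator}.{job}.{hadoop_param}"


def ms_to_minutes(ms):
    """Convert a timeout in milliseconds to minutes."""
    return ms / 60_000


# Workflow-level override name
print(override_variable_name("mapred.task.timeout"))
# Operator-level override name for the Join operator's Hadoop_Join job
print(override_variable_name("mapred.task.timeout", "join", "Hadoop_Join"))
# The default timeout of 600,000 ms expressed in minutes
print(ms_to_minutes(600_000))
```

Run as written, the script prints @alpine.mapred.mapred.task.timeout, then @alpine.mapred.join.Hadoop_Join.mapred.task.timeout, then 10.0.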
Note: Any of the possible Hadoop configuration parameters can be configured either here at the workflow level or at the Hadoop data source level. Here is a list of Hadoop configuration parameters:
Note: For customizing specific operator tasks, the correct TIBCO Data Science - Team Studio operator name and job name must be referenced. Here is a list of TIBCO Data Science - Team Studio operators and job names: