Spark Node Fusion
You can use Spark Node Fusion to run multiple operators in a single Spark job, sharing one Spark context. This makes the job run faster because Team Studio does not create a new job and persist intermediate results to HDFS at each analytical step.
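The performance benefit comes from skipping the intermediate materializations between steps. The following plain-Python sketch (not the Team Studio or Spark API; the function names and the simulated "write" are illustrative assumptions) contrasts stepwise execution, which persists each result before the next step, with fused execution, which passes data through in memory:

```python
def normalize(rows):
    """Scale values into [0, 1] by dividing by the maximum."""
    hi = max(rows)
    return [r / hi for r in rows]

def threshold(rows, cutoff=0.5):
    """Keep only values at or above the cutoff."""
    return [r for r in rows if r >= cutoff]

def run_stepwise(rows):
    """Without fusion: each analytical step behaves like a separate job
    that writes its full result to storage (simulated here by collecting
    snapshots) before the next step starts."""
    snapshots = []
    rows = normalize(rows)
    snapshots.append(list(rows))   # simulated HDFS write after step 1
    rows = threshold(rows)
    snapshots.append(list(rows))   # simulated HDFS write after step 2
    return rows, snapshots

def run_fused(rows):
    """With fusion: both steps run in one job, handing data along in
    memory with no intermediate persistence."""
    return threshold(normalize(rows))

data = [2, 4, 6, 8, 10]
stepwise_result, writes = run_stepwise(data)
fused_result = run_fused(data)
assert stepwise_result == fused_result   # same answer either way
assert len(writes) == 2                  # stepwise paid for two extra materializations
```

Both paths produce the same result; the fused path simply avoids the per-step persistence, which is where the runtime savings come from.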
Regardless of your workflow's size or the number of operators it contains, runtime performance is crucial. It must be easy to enable node fusion on an existing workflow and then revert to the previous setting. You can do this with the Use Spark property. For more information, see Convert to Spark/Revert to Non-Spark.
When a workflow with Spark operators is run through the job scheduler, the results are not made visible to the user, because doing so would make the job run much more slowly. If you want to view the results anyway, set Store Results to true before running the job.
The following operators have been updated to use Spark Node Fusion. Prior to the release of Team Studio version 6.4, these operators typically ran on the MapReduce or Pig execution frameworks.
Viewing Results for Individual Operators
When a workflow that contains Spark operators runs, the results are not shown by default; this enables the workflow to run faster. You can view the results for individual operators that support Spark Node Fusion under the following conditions.