Setting Up the Spark Job
Only one function needs to be overridden here: onExecution().
This function contains all the code we need for the Spark job. It takes five parameters:
- sparkContext - The Spark context that is created when the job is submitted.
- appConf - A map that contains system-related parameters (rather than parameters for the operator itself). This includes all Spark parameters and workflow-level variables.
- input - The IOBase object defined as the input to the operator. In our example, this operator takes no input, so this is set to IONone.
- params - The chosen operator parameters passed from the GUI Node.
- listener - A listener object that allows you to send messages to the Team Studio GUI during the Spark job. You can use this to post error messages or provide status reports in the Team Studio console.
Because our operator returns a tabular HDFS file, we set the output type to HdfsTabularDataset, as in the sketch below.
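For reference, here is a minimal sketch of what the override can look like. This is illustrative only: the class name ExampleDatasetJob is hypothetical, and the SDK imports, the SparkIOTypedPluginJob base class, and the listener's notifyMessage method are assumptions about the Team Studio plugin SDK; check the SDK sources for the authoritative package names and signatures.

    // Illustrative sketch only: SDK class and package names below are
    // assumptions and may differ in your version of the plugin SDK.
    import scala.collection.mutable

    import org.apache.spark.SparkContext

    // Assumed SDK imports; adjust to match the actual package layout.
    import com.alpine.plugin.core.{OperatorListener, OperatorParameters}
    import com.alpine.plugin.core.io.{HdfsTabularDataset, IONone}
    import com.alpine.plugin.core.spark.SparkIOTypedPluginJob

    // A job that takes no input (IONone) and produces a tabular HDFS dataset.
    class ExampleDatasetJob extends SparkIOTypedPluginJob[IONone, HdfsTabularDataset] {

      override def onExecution(
          sparkContext: SparkContext,           // created when the job is submitted
          appConf: mutable.Map[String, String], // Spark parameters and workflow-level variables
          input: IONone,                        // this operator takes no input
          params: OperatorParameters,           // parameters chosen in the GUI Node
          listener: OperatorListener): HdfsTabularDataset = {

        // Post a status report to the Team Studio console (assumed method name).
        listener.notifyMessage("Starting the example Spark job ...")

        // Build and return the tabular HDFS dataset here.
        ???
      }
    }

At execution time, Team Studio supplies all five arguments when it invokes onExecution; the ??? placeholder marks where the dataset-building logic goes.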