Setting Up the Spark Job
You need to override only one function in this part of the example: onExecution().
This function contains all of the code needed for the Spark job. The function takes the following five parameters:
- sparkContext - The Spark context that is created when the job is submitted.
- appConf - A map that contains system-related parameters (rather than parameters for the operator itself), including all Spark parameters and workflow-level variables.
- input - The IOBase object defined as the input to the operator. In this example, the operator takes no input, so this is set to IONone.
- params - The operator parameters chosen in the GUI node.
- listener - A listener object for sending messages to the TIBCO Data Science - Team Studio GUI during the Spark job. You can use it to post error messages or provide status reports in the TIBCO Data Science - Team Studio console.
The operator returns a tabular HDFS file, so set the output type to HdfsTabularDataset.
- Procedure
- Create the skeleton of the onExecution() method as follows:

```scala
override def onExecution(
    sparkContext: SparkContext,
    appConf: mutable.Map[String, String],
    input: IONone,
    params: OperatorParameters,
    listener: OperatorListener): HdfsTabularDataset = {
}
```

Do the following in this Spark job:
- Create a small list of data in memory.
- Transform the small list into a DataFrame with the output schema that you defined so that the custom operator framework can export it as an
HdfsTabularDataset.
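The two steps above can be sketched as follows. This is a hedged example of the body of onExecution(): the list-to-DataFrame conversion is standard Spark 1.x SQLContext code, but the column names, the sample data, and the final save step are illustrative only. In particular, the commented-out sparkUtils.saveDataFrame(...) call is a hypothetical helper name, not a confirmed Team Studio SDK API; consult your SDK version for the actual utility that writes a DataFrame out as an HdfsTabularDataset.

```scala
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Step 1: create a small list of data in memory.
// The names and ages here are sample values, not part of the SDK.
val people = List(("Alice", 34), ("Bob", 28), ("Carol", 45))

// The schema for the DataFrame; it should match the output schema
// you defined for the operator.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = false)))

// Step 2: transform the list into a DataFrame with that schema.
val sqlContext = new SQLContext(sparkContext)
val rows = people.map { case (name, age) => Row(name, age) }
val dataFrame = sqlContext.createDataFrame(sparkContext.parallelize(rows), schema)

// Finally, export the DataFrame so the framework can return it as an
// HdfsTabularDataset. The exact helper depends on the SDK version;
// `sparkUtils.saveDataFrame(...)` is an assumed name shown for illustration.
// sparkUtils.saveDataFrame(outputPath, dataFrame, ...)
```

On newer Spark versions the SQLContext constructor is deprecated in favor of SparkSession, but older Team Studio SDK releases target the SQLContext API shown here.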
What to do next: Create the dataset.