Setting Up the Spark Job

You need to override only one function in this part of the example: onExecution().

This function contains all of the code needed for the Spark job. It takes the following five parameters:

  • sparkContext - The Spark context that is created when the job is submitted.
  • appConf - A map that contains system-related parameters (rather than parameters for the operator itself). This includes all Spark parameters and workflow-level variables.
  • input - The IOBase object defined as the input to the operator. In the example, this operator takes no input, so this is set to IONone.
  • params - The chosen operator parameters passed from the GUI Node.
  • listener - A listener object for sending messages to the TIBCO Data Science - Team Studio GUI during the Spark job. You can use this to post error messages or provide status reports in the TIBCO Data Science - Team Studio console.
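For example, a status update could be posted through the listener while the job runs. This is only an illustrative sketch: notifyMessage and notifyError are assumed to be the reporting methods on OperatorListener in your version of the SDK.

```scala
// Illustrative sketch only: notifyMessage/notifyError are assumed
// methods on the SDK's OperatorListener; check your SDK version.
listener.notifyMessage("Building the in-memory dataset ...")
listener.notifyError("Failed to write the output dataset.")
```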

The operator returns a tabular HDFS file, so set the output type to HdfsTabularDataset.

    Procedure
  1. Create the skeleton of the onExecution() method as follows:
    override def onExecution(
                             sparkContext: SparkContext,
                             appConf: mutable.Map[String, String],
                             input: IONone,
                             params: OperatorParameters,
                             listener: OperatorListener): HdfsTabularDataset = {
      ??? // placeholder so the skeleton compiles; replace with the Spark job code
    }

    Do the following in this Spark job:

    1. Create a small list of data in memory.
    2. Transform the small list into a DataFrame with the output schema that you defined, so that the custom operator framework can export it as an HdfsTabularDataset.
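    These two steps can be sketched as follows inside the onExecution() body. This is a hedged sketch, not the example's exact code: the column names and values are invented placeholders for the output schema you defined earlier, and it assumes the Spark 1.x SQLContext API that the SDK of this era targets.

```scala
// A minimal sketch of the job body, placed inside onExecution().
// Column names/types below are placeholders for your output schema.
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Step 1: create a small list of data in memory.
val rows = Seq(Row("alpha", 1), Row("beta", 2), Row("gamma", 3))

// Step 2: transform the list into a DataFrame carrying the output schema,
// so the custom operator framework can export it as an HdfsTabularDataset.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("count", IntegerType, nullable = false)))
val sqlContext = new SQLContext(sparkContext)
val dataFrame = sqlContext.createDataFrame(sparkContext.parallelize(rows), schema)

// Writing dataFrame to HDFS and returning the resulting HdfsTabularDataset
// is covered in the next step of this example ("Create the dataset").
```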

    What to do next
Create the dataset.