Creating a Spark Job

In this task, implement the logic of the Spark job.

This is where the actual algorithm of the operator goes. For this tutorial, the Spark job creates a small list of rows in memory, converts them to a Spark SQL DataFrame, and then uses the SparkRuntimeUtils class to save that DataFrame and return an HdfsTabularDataset object corresponding to the DataFrame we generated.

Note: This operator is intended to show you how to use the TIBCO Data Science - Team Studio Custom Operator SDK. The process used to create the data set in this example is not scalable, because all of the data is created in memory before it is distributed. Capping the "number of things" parameter at 100 ensures that the data created in the example is never too big to fit in the driver memory. For a more practical example of dataset generation, see the SparkRandomDatasetGenerator example.
Before you begin
Build the source operator.
Procedure
1. Add the following code:
    class SimpleDatasetGeneratorJob extends SparkIOTypedPluginJob[IONone, HdfsTabularDataset] {
    }
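To make the description above concrete, the sketch below shows one way the in-memory row generation could look in plain Scala. The case class, column layout, and the generateRows helper are illustrative assumptions, not part of the SDK; in the actual job, rows like these would be converted to a DataFrame (for example with SparkSession.createDataFrame) and saved through SparkRuntimeUtils.

    // Illustrative sketch only: RowData and generateRows are assumed names,
    // not SDK classes. They model the "small list of rows in memory" step.
    case class RowData(id: Int, label: String)

    def generateRows(numberOfThings: Int): Seq[RowData] = {
      // Cap at 100 so the generated data always fits in driver memory,
      // mirroring the "number of things" parameter limit described above.
      val n = math.min(numberOfThings, 100)
      (1 to n).map(i => RowData(i, s"thing-$i"))
    }

Keeping the cap inside the generator (rather than trusting the caller) means the job can never accidentally build a dataset larger than the driver can hold.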
What to do next
Set up the Spark job.