Building a Source Operator

Not every operator follows the "data set in, data set out" paradigm. This tutorial demonstrates how to build a source operator: one that creates a data source and can be connected to other operators for further transformation or modeling.

The operator needs no input data source, because it generates the data itself. However, you must already have a Hadoop dataset added in order to set up the flow in the first place.

This example produces a simple HDFS dataset.

Input: An integer n
  
Output:
  
Thing, 1
Thing, 2
Thing, 3
Thing, 4
Thing, 5
...
Thing, n
  
Here n is the number of rows in the dataset: the rows are numbered 1 through n.
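
As a rough illustration of the output format only, the listing above can be produced in plain Scala as follows. This is a minimal sketch, not the code in SimpleDatasetGenerator.scala: the object name ThingRows and the method thingRows are hypothetical, and the sketch uses neither Spark nor the operator SDK.

```scala
// Hypothetical helper for illustration; not taken from SimpleDatasetGenerator.scala.
object ThingRows {
  // Builds one line of text per row, labelled 1 through n, in the
  // "Thing, <index>" format shown in the listing above.
  // A non-positive n yields an empty sequence.
  def thingRows(n: Int): Seq[String] =
    (1 to n).map(i => s"Thing, $i")
}
```

For example, ThingRows.thingRows(3) yields the three lines Thing, 1 through Thing, 3. An actual source operator would write such rows to HDFS rather than hold them in memory.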

To see a finished copy of the code, look at SimpleDatasetGenerator.scala.

Prerequisites

This tutorial assumes that you have completed the previous two tutorials, "Compiling and Running the Sample Operators" and "Building Your First Custom Operator in Scala". Alternatively, if you have sufficient Spark and Scala knowledge, feel free to follow along and consult the previous tutorials for reference.