Building a Source Operator

Not every operator fits the "data set in, data set out" paradigm. This tutorial demonstrates how to build a source operator: one that creates a data source, which can then be connected to other operators for further transformation or modeling.

The operator needs no input data source, because it creates one. However, you must still add a Hadoop data set to set up the flow initially.

This example produces a simple HDFS data set.

Input: An integer n

Output:

Thing, 1
Thing, 2
Thing, 3
Thing, 4
Thing, 5
...
Thing, n

Here, n is the number of rows in the data set.
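The row logic described above can be sketched in plain Scala. This is not the code in SimpleDatasetGenerator.scala: the `SimpleRows` object and its `generate` and `render` methods are hypothetical names used only to illustrate how an integer n maps to the rows "Thing, 1" through "Thing, n". Wiring the rows into an actual operator depends on the custom operator SDK covered in the earlier tutorials and is not shown.

```scala
// Hypothetical sketch: NOT the SimpleDatasetGenerator implementation.
// It only shows the mapping from an integer n to the expected output rows.
object SimpleRows {

  /** Returns the rows ("Thing", 1) through ("Thing", n); empty when n < 1. */
  def generate(n: Int): Seq[(String, Int)] =
    (1 to n).map(i => ("Thing", i))

  /** Formats each row as "name, number", matching the sample output. */
  def render(rows: Seq[(String, Int)]): Seq[String] =
    rows.map { case (name, i) => s"$name, $i" }

  def main(args: Array[String]): Unit = {
    // Prints "Thing, 1" through "Thing, 5", one per line.
    render(generate(5)).foreach(println)
  }
}
```

Because `1 to n` is inclusive, the result has exactly n rows. In a Spark job, a similar range could be distributed across the cluster with `sparkContext.parallelize(1 to n)` before being turned into a data set.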

For a finished copy of the code, see SimpleDatasetGenerator.scala, which is created as part of Setting Up Your Environment.

Before you begin

To complete this tutorial, you must first complete the previous two tutorials, Installing the Custom Sample Operator for your Version and Building Your First Custom Operator in Scala. Alternatively, if you have sufficient Spark and Scala knowledge, you can follow along and refer to the previous tutorials as needed.
What to do next

Set up your environment.