Building a Source Operator
Not every operator falls under the "data set in, data set out" paradigm. This tutorial demonstrates how to build a source operator - that is, one that creates a data source and can be connected to other operators for further transformation or modeling.
No data source input is necessary - we will build that ourselves. However, you must have a Hadoop dataset added in order to set up the flow in the first place.
This example produces a simple HDFS dataset.

Input: an integer n
Output:
Thing, 1
Thing, 2
Thing, 3
Thing, 4
Thing, 5
...
Thing, n
where n is the number of rows in the dataset.
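The output format above can be summarized as a mapping from a row index to a string. The following sketch shows that mapping in plain Scala; the object and method names are illustrative only, not taken from the tutorial's code.

```scala
// Illustrative sketch of the output format described above.
// OutputPreview, row, and dataset are hypothetical names, not SDK code.
object OutputPreview {
  // One row of the generated dataset, e.g. "Thing, 3" for i = 3.
  def row(i: Int): String = s"Thing, $i"

  // The full dataset for a given n: n rows, "Thing, 1" through "Thing, n".
  def dataset(n: Int): Seq[String] = (1 to n).map(row)
}
```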
To see a finished copy of the code, look at SimpleDatasetGenerator.scala.
Prerequisites
- Setting Up Your Environment: Follow this procedure to set up your environment for building a source operator.
- Creating the Signature Class: This is where we write the code for our operator. A custom operator must implement three classes: the signature, the GUI node, and the runtime. Each of these defines behavior and information that the Team Studio workflow engine uses to execute the operator.
- Creating the Utils Object: Next, we create a Utils object to hold the utility functions and constants that we use across the different classes in our operator.
- Creating the GUI Node Class: The GUI node defines the UI behavior of the operator. The utils that we made in the previous section come into play here. While we could have put that code directly into the GUI node class, separating it allows for better readability and maintainability.
- Creating the onPlacement Method: In this step, we choose our parameters for the GUI node, starting with the onPlacement() method.
- Creating the onInputOrParameterChange Method: Once you define the parameters the user can customize, you must define how the operator behaves when the user adjusts those parameters.
- Creating a Runtime Class: Finally, you define what happens when the user clicks Run from the application.
- Creating a Spark Job: In this section, you implement the logic of the Spark job.
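The steps above can be sketched as a simplified skeleton of the three-class structure (signature, GUI node, runtime) plus the shared Utils object. All names below are hypothetical stand-ins defined here for illustration; the real Team Studio SDK types and method signatures differ, and SimpleDatasetGenerator.scala remains the authoritative version.

```scala
// Hypothetical, simplified skeleton of the operator's structure.
// None of these types come from the Team Studio SDK.

// Utils object: constants and helpers shared across the classes.
object GeneratorUtils {
  val RowCountParam = "numberOfRows" // assumed parameter key
  def makeRow(i: Int): String = s"Thing, $i"
}

// Signature: identifies the operator to the workflow engine.
class GeneratorSignature {
  val operatorName = "Simple Dataset Generator"
}

// GUI node: declares parameters and validates user edits.
class GeneratorGuiNode {
  // Called when the operator is placed on the canvas; supplies defaults.
  def onPlacement(): Map[String, Int] =
    Map(GeneratorUtils.RowCountParam -> 10)

  // Called when the user changes a parameter; returns an error message
  // for an invalid value, or None if the configuration is acceptable.
  def onInputOrParameterChange(params: Map[String, Int]): Option[String] =
    params.get(GeneratorUtils.RowCountParam) match {
      case Some(n) if n > 0 => None
      case _                => Some("Row count must be a positive integer")
    }
}

// Runtime: executed when the user clicks Run. In the real operator this
// submits a Spark job that writes the rows to HDFS; here we only build
// the rows locally to show the core logic.
class GeneratorRuntime {
  def run(params: Map[String, Int]): Seq[String] = {
    val n = params(GeneratorUtils.RowCountParam)
    (1 to n).map(GeneratorUtils.makeRow)
  }
}
```

Splitting the helpers into GeneratorUtils mirrors the tutorial's rationale: the GUI node and the runtime both need the parameter key and row format, so keeping them in one object avoids duplication between the classes.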
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.