Creating a Runtime Class
After creating a signature and a GUI node, create the runtime class. This is where the bulk of the operator's work occurs: the Spark job that performs the data transformation is defined here. You could extend the base class `OperatorRuntime` and define the output steps manually; however, in this exercise, use `SparkDataFrameRuntime`. With this approach, you write a Spark job to transform the data, and then use a set of predefined methods on the back end to package the results and return them to the TIBCO Data Science - Team Studio application.
Name this class `MyColumnFilterRuntime` and extend `SparkDataFrameRuntime`, passing the Spark job `MyColumnFilterJob` as a type parameter. For the Column Filter operator, submit a Spark job called `MyColumnFilterJob`. To use the default implementation, which launches a Spark job with the default Spark settings, you do not need to add any code to the body of the `MyColumnFilterRuntime` class.
Procedure
- Add the following code:

  ```scala
  class MyColumnFilterRuntime extends SparkDataFrameRuntime[MyColumnFilterJob] {
  }
  ```

- Implement the Spark job referenced in the previous class.
  - Create a class called `MyColumnFilterJob` and extend `SparkDataFrameJob`.
  - Override the `transform()` method to start writing your Spark code, as follows.

  ```scala
  class MyColumnFilterJob extends SparkDataFrameJob {
    override def transform(parameters: OperatorParameters,
                           dataFrame: DataFrame,
                           sparkUtils: SparkRuntimeUtils,
                           listener: OperatorListener): DataFrame = {
    }
  }
  ```

  The parameters come from the values the user selects at design time; when the operator starts running, this parameter information is passed to the runtime. The `dataFrame` parameter is the input, and the function should return a data frame when the code completes.
- To perform the Column Filter operation, retrieve the list of columns the user selected, and then return a data frame that includes only those columns.
  - Access the user-selected columns using the following.

  ```scala
  parameters.getTabularDatasetSelectedColumns(OperatorConstants.parameterID)
  ```

  `OperatorConstants.parameterID` is the value referenced in the `Constants` class; it points to the columns chosen in the `OperatorDialog` step. This call returns a collection of the column names that were selected.
  - Apply Spark's `col()` method across the collection of column names by adding the following code:

  ```scala
  class MyColumnFilterJob extends SparkDataFrameJob {
    override def transform(parameters: OperatorParameters,
                           dataFrame: DataFrame,
                           sparkUtils: SparkRuntimeUtils,
                           listener: OperatorListener): DataFrame = {
      // Grab the selected column names.
      val columnNamesToKeep =
        parameters.getTabularDatasetSelectedColumns(OperatorConstants.parameterID)._2
      // Look up the corresponding Column objects in the input data frame.
      val columnsToKeep = columnNamesToKeep.map(dataFrame.col)
      // Return a data frame with only the selected columns included.
      dataFrame.select(columnsToKeep: _*)
    }
  }
  ```
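The `._2` in the job code takes the second element of the pair returned by `getTabularDatasetSelectedColumns`. As a minimal self-contained sketch (the stand-in method below is hypothetical and only mimics the shape of the SDK call, assumed here to return a pair of the parameter ID and the selected column names):

```scala
object TupleDemo {
  // Hypothetical stand-in for the SDK call: returns (parameterId, selectedColumnNames).
  def getTabularDatasetSelectedColumns(id: String): (String, Array[String]) =
    (id, Array("colA", "colB"))

  def main(args: Array[String]): Unit = {
    // Taking ._2 yields just the column names, as in the job code above.
    val columnNames = getTabularDatasetSelectedColumns("columnsToKeep")._2
    println(columnNames.mkString(","))
  }
}
```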
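The `columnsToKeep: _*` syntax in the final line is Scala's sequence-argument ascription: it expands a collection into a varargs argument list, which `select` expects. A small standalone illustration (the `select` method here is a toy stand-in, not the Spark API):

```scala
object SplatDemo {
  // Toy varargs method standing in for DataFrame.select(cols: Column*).
  def select(cols: String*): String = cols.mkString("|")

  def main(args: Array[String]): Unit = {
    val names = Seq("id", "amount")
    // `names: _*` passes each element of the Seq as a separate vararg.
    println(select(names: _*))
  }
}
```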
When Your Operator Runs
For basic Spark transformations using `SparkDataFrameGUINode`, the results console output is automatically configured to show a preview of the output table.