Creating a Runtime Class

After creating a signature and a GUI node, create the runtime. This is where the bulk of the operator's action occurs. Define the Spark job that performs the data transformation here.

You could extend the base class OperatorRuntime and define output steps manually; however, in this exercise, use SparkDataFrameRuntime. With this approach, you can write a Spark job to transform the data, and then use a set of predefined methods on the back end to package the results and return them to the TIBCO Data Science - Team Studio application.

Name this class MyColumnFilterRuntime and extend SparkDataFrameRuntime, passing the Spark job MyColumnFilterJob as a type parameter. To use the default implementation, which launches a Spark job with the default Spark settings, you do not need to add any code to the body of the MyColumnFilterRuntime class.

Before you begin

Define the output schema.

    Procedure
  1. Add the following code:
    class MyColumnFilterRuntime extends SparkDataFrameRuntime[MyColumnFilterJob] {
    }
  2. Implement the Spark job referenced in the previous class.

    1. Create a class called MyColumnFilterJob and extend SparkDataFrameJob.
    2. Override the transform() method to start writing your Spark code, as follows.

    class MyColumnFilterJob extends SparkDataFrameJob {
      override def transform(parameters: OperatorParameters,
                             dataFrame: DataFrame,
                             sparkUtils: SparkRuntimeUtils,
                             listener: OperatorListener): DataFrame = {

      }
    }
    The parameters argument contains the values the user selects at design time. When the operator starts running, this parameter information is passed to the runtime. The dataFrame parameter is the input data, and the function must return a data frame when the code completes.
  3. To perform the Column Filter operation, retrieve a list of the parameters the user selects, and then return a data frame with only those columns included.

    1. Access the user-selected columns using the following.
      parameters.getTabularDatasetSelectedColumns(OperatorConstants.parameterID)
      OperatorConstants.parameterID is the value referenced in the Constants class. It points to the columns chosen in the OperatorDialog step. This call returns a tuple whose second element is the collection of selected column names, which is why the code below accesses ._2.
    2. Apply the data frame's col() method across the collection of column names by adding the following code:

    class MyColumnFilterJob extends SparkDataFrameJob {

      override def transform(parameters: OperatorParameters,
                             dataFrame: DataFrame,
                             sparkUtils: SparkRuntimeUtils,
                             listener: OperatorListener): DataFrame = {
        // Grab the selected column names (the second element of the returned tuple).
        val columnNamesToKeep = parameters.getTabularDatasetSelectedColumns(OperatorConstants.parameterID)._2
        // Map each name to the corresponding Column of the input data frame.
        val columnsToKeep = columnNamesToKeep.map(dataFrame.col)
        // Return a data frame with only the selected columns included.
        dataFrame.select(columnsToKeep: _*)
      }
    }
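The key pattern in the job above is mapping a collection of column names to columns and then expanding the result with Scala's varargs syntax (: _*). The same idea can be sketched with plain Scala collections, with no Spark cluster required; the Row type and selectColumns helper below are illustrative only and are not part of the SDK:

```scala
// Illustrative sketch: filtering "columns" out of rows represented as Maps.
// This mirrors the Spark pattern columnNames.map(dataFrame.col) followed by
// dataFrame.select(cols: _*), but runs with plain Scala collections.
object ColumnFilterSketch {
  type Row = Map[String, Any]

  // Keep only the named columns in each row, in the requested order.
  def selectColumns(rows: Seq[Row], columnNamesToKeep: Seq[String]): Seq[Row] =
    rows.map { row =>
      columnNamesToKeep
        .flatMap(name => row.get(name).map(value => name -> value))
        .toMap
    }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      Map("name" -> "a", "age" -> 1, "city" -> "x"),
      Map("name" -> "b", "age" -> 2, "city" -> "y")
    )
    // Each resulting row contains only the "name" and "age" columns.
    println(selectColumns(rows, Seq("name", "age")))
  }
}
```

In the real operator, the name collection comes from getTabularDatasetSelectedColumns and the per-column lookup is dataFrame.col, but the map-then-splat shape is identical.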

When Your Operator Runs

For basic Spark transformations using SparkDataFrameGUINode, the results console output is automatically configured to show a preview of the output table.