Defining the Output Schema

To ensure the output displays in a consistent fashion, define the output schema for the data frame.

Set the schema in both the runtime class and the GUI node class, and make sure the two definitions match. The GUI node needs the output schema to format any output visualization properly and to pass information about the operator's dataset schema to subsequent operators.

Before you begin

Build the operator dialog.

Procedure

  1. Add the following code to the GUI node class:

    override def defineOutputSchemaColumns(inputSchema: TabularSchema,
                                           parameters: OperatorParameters): Seq[ColumnDef] = {
      val columnsToKeep = parameters.getTabularDatasetSelectedColumns(OperatorConstants.parameterID)._2
      inputSchema.getDefinedColumns.filter(colDef => columnsToKeep.contains(colDef.columnName))
    }

    The first line pulls the column names selected from the TabularDatasetColumnCheckbox parameter defined in the operatorDialog. The second line filters the available columns by the selected column names and returns those columns as the output schema.

    The code should now resemble the following.

    class MyColumnFilterGUINode extends SparkDataFrameGUINode[MyColumnFilterJob] {
      override def onPlacement(operatorDialog: OperatorDialog,
                               operatorDataSourceManager: OperatorDataSourceManager,
                               operatorSchemaManager: OperatorSchemaManager): Unit = {

        operatorDialog.addTabularDatasetColumnCheckboxes(
          OperatorConstants.parameterID, // the ID of the parameter
          "Columns to keep",             // the label of the parameter (user-visible)
          ColumnFilter.All,              // users can select any of the columns; this can
                                         // be changed to allow only numeric or
                                         // categorical columns
          "main"                         // the selectionGroupId, which is used for
                                         // validating groups of parameters
        )
        super.onPlacement(operatorDialog, operatorDataSourceManager, operatorSchemaManager)
      }

      override def defineOutputSchemaColumns(inputSchema: TabularSchema,
                                             parameters: OperatorParameters): Seq[ColumnDef] = {
        val columnsToKeep = parameters.getTabularDatasetSelectedColumns(OperatorConstants.parameterID)._2
        inputSchema.getDefinedColumns.filter(colDef => columnsToKeep.contains(colDef.columnName))
      }
    }
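
The filtering step in defineOutputSchemaColumns can be tried outside the SDK. The following is a minimal sketch, not SDK code: the ColumnDef and TabularSchema case classes here are simplified stand-ins (assumptions made for illustration, with only the members used above), and keepSelected and OutputSchemaSketch are hypothetical names. It shows that the output schema keeps the column order of the input schema, regardless of the order in which columns were selected.

```scala
// Stand-in types for illustration only. The real ColumnDef and TabularSchema
// come from the SDK and have more members than are modeled here.
final case class ColumnDef(columnName: String, columnType: String)
final case class TabularSchema(getDefinedColumns: Seq[ColumnDef])

object OutputSchemaSketch {
  // Hypothetical helper with the same filter logic as defineOutputSchemaColumns:
  // keep every input column whose name is among the selected names.
  def keepSelected(inputSchema: TabularSchema, columnsToKeep: Seq[String]): Seq[ColumnDef] =
    inputSchema.getDefinedColumns.filter(colDef => columnsToKeep.contains(colDef.columnName))

  def main(args: Array[String]): Unit = {
    val input = TabularSchema(Seq(
      ColumnDef("id", "Long"),
      ColumnDef("name", "String"),
      ColumnDef("score", "Double")
    ))

    // The selection lists "score" before "id", but filter walks the input
    // schema, so the result follows the input order.
    val output = keepSelected(input, Seq("score", "id"))
    println(output.map(_.columnName).mkString(", ")) // prints "id, score"
  }
}
```

Because the GUI node and the runtime must agree on the schema, the runtime presumably needs to emit its columns in this same input order.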

What to do next

Create a Runtime class.