Defining the Output Schema
To ensure the output displays in a consistent fashion, define the output schema for the data frame.
The schema must be defined in both the runtime class and the GUI node class, and the two definitions must match. The GUI node needs the output schema to format any output visualization properly and to pass information about the operator's dataset schema to subsequent operators.
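One way to catch a mismatch early is to compare the schema the GUI node declares with the columns the runtime actually produces. The sketch below is a minimal, SDK-independent illustration of that comparison. `Col`, `SchemaCheck`, and `schemaMismatches` are hypothetical names, not part of the operator SDK, and column types are represented as plain strings purely for illustration.

```scala
// Hypothetical stand-in for a column definition; NOT an SDK type.
final case class Col(name: String, dataType: String)

object SchemaCheck {
  // Returns one message per difference between the declared (GUI node)
  // schema and the actual (runtime) columns, compared position by position.
  // An empty result means the two schemas match.
  def schemaMismatches(declared: Seq[Col], actual: Seq[Col]): Seq[String] = {
    val sizeProblem =
      if (declared.size != actual.size)
        Seq(s"declared ${declared.size} columns, runtime produced ${actual.size}")
      else
        Seq.empty[String]

    // zip stops at the shorter sequence, so only shared positions are compared.
    val columnProblems = declared.zip(actual).zipWithIndex.collect {
      case ((d, a), i) if d != a => s"column $i: declared $d, runtime produced $a"
    }

    sizeProblem ++ columnProblems
  }
}
```

A check like this could run in a unit test that feeds the same parameters to both sides. It compares names, types, and order, since a reordered schema is also a mismatch.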
- Procedure
- Add the following code:
    override def defineOutputSchemaColumns(inputSchema: TabularSchema,
                                           parameters: OperatorParameters): Seq[ColumnDef] = {
      val columnsToKeep = parameters.getTabularDatasetSelectedColumns(OperatorConstants.parameterID)._2
      inputSchema.getDefinedColumns.filter(colDef => columnsToKeep.contains(colDef.columnName))
    }

  The first line pulls the column names selected from the TabularDatasetColumnCheckbox parameter defined in the operatorDialog. The second line filters the available columns by the selected column names and returns those columns as the output schema.

  The code should now resemble the following.
    class MyColumnFilterGUINode extends SparkDataFrameGUINode[MyColumnFilterJob] {

      override def onPlacement(operatorDialog: OperatorDialog,
                               operatorDataSourceManager: OperatorDataSourceManager,
                               operatorSchemaManager: OperatorSchemaManager): Unit = {
        operatorDialog.addTabularDatasetColumnCheckboxes(
          OperatorConstants.parameterID, // the ID of the parameter
          "Columns to keep",             // the label of the parameter (user-visible)
          ColumnFilter.All,              // this means users can select all of the columns,
                                         // but this can also be changed to allow for only
                                         // numeric or categorical columns
          "main"                         // this is the selectionGroupId,
                                         // which is used for validating groups of parameters
        )

        super.onPlacement(operatorDialog, operatorDataSourceManager, operatorSchemaManager)
      }

      override def defineOutputSchemaColumns(inputSchema: TabularSchema,
                                             parameters: OperatorParameters): Seq[ColumnDef] = {
        // Column names the user ticked in the "Columns to keep" checkboxes.
        val columnsToKeep = parameters.getTabularDatasetSelectedColumns(OperatorConstants.parameterID)._2
        // Keep only the input columns whose names were selected.
        inputSchema.getDefinedColumns.filter(colDef => columnsToKeep.contains(colDef.columnName))
      }
    }
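To see what the filter in defineOutputSchemaColumns returns, the following sketch replays the same logic on plain Scala collections. `SimpleColumnDef`, `OutputSchemaSketch`, and `keepSelected` are hypothetical stand-ins for the SDK's ColumnDef and the overridden method, introduced so that the example runs without the SDK on the classpath. It assumes, as in the code above, that the selection is a collection of column names.

```scala
// Hypothetical stand-in for the SDK's ColumnDef; NOT the real class.
final case class SimpleColumnDef(columnName: String, columnType: String)

object OutputSchemaSketch {
  // Same filtering step as in defineOutputSchemaColumns:
  // keep the input columns whose names appear in the selection.
  def keepSelected(inputColumns: Seq[SimpleColumnDef],
                   columnsToKeep: Seq[String]): Seq[SimpleColumnDef] =
    inputColumns.filter(colDef => columnsToKeep.contains(colDef.columnName))

  def main(args: Array[String]): Unit = {
    val input = Seq(
      SimpleColumnDef("id", "Long"),
      SimpleColumnDef("name", "String"),
      SimpleColumnDef("score", "Double")
    )

    // The selection is deliberately out of order and contains an unknown name.
    val output = keepSelected(input, Seq("score", "id", "missing"))

    println(output.map(_.columnName).mkString(", ")) // prints: id, score
  }
}
```

Because the filter iterates over the input schema, the output columns keep the input order regardless of the order in which they were selected, and a selected name that does not exist in the input schema is silently dropped.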