Creating the onPlacement Method

In this step, we define the parameters for the GUI node, starting with the onPlacement() method.

As we discussed in the previous tutorial, the onPlacement() method defines the behavior of the operator when the user drags it from the sidebar onto the workflow canvas. During this step, it is useful to think about what your users will see and be able to configure in the Team Studio GUI when they run your operator.

Because our output dataset needs an integer for the number of rows to output, we should add a parameter for it here. In addition, it would be helpful to let users choose the format in which their result dataset is stored (in this case, CSV, Avro, or Parquet). Finally, because this job runs on Spark, we should add some parameters that let the user customize the Spark job.

Prerequisites

You must have created the GUI node class.

Procedure

  • Add the following code:
    override def onPlacement(
                             operatorDialog: OperatorDialog,
                             operatorDataSourceManager: OperatorDataSourceManager,
                             operatorSchemaManager: OperatorSchemaManager): Unit = {
        // Add an integer parameter to the dialog box that lets the user set the number
        // of rows (the length of the dataset) to generate.
        operatorDialog.addIntegerBox(
          DatasetGeneratorUtils.numberRowsParamKey, // the constant we defined in DatasetGeneratorUtils, above
          "Number of things", // label displayed for the parameter
          0, // minimum accepted value
          100, // maximum accepted value
          10 // default value
        )
      
        /*
         * Use the HdfsParameterUtils class to add a dropdown box that lets the user choose
         * how the output data is stored (CSV, Avro, or Parquet). Default to CSV.
         */
        HdfsParameterUtils.addHdfsStorageFormatParameter(operatorDialog, HdfsStorageFormatType.CSV)
        /*
         * Use the HdfsParameterUtils class to add parameters about how/where to store the output
         * dataset. In particular: the output directory, the output name, and whether to overwrite
         * anything already stored at that location.
         */
        HdfsParameterUtils.addStandardHdfsOutputParameters(operatorDialog)
        /*
         * Use the SparkParameterUtils class to add some standard parameters that let the
         * user configure the Spark job.
         */
        SparkParameterUtils.addStandardSparkOptions(
          operatorDialog,
          additionalSparkParameters = List()
        )
      }
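
The snippet above assumes that the parameter key constant was defined in DatasetGeneratorUtils during the previous tutorial step, and that the SDK classes it uses are imported at the top of the file. A minimal sketch of both follows; the import paths and the key string "numberRows" are illustrative, so verify them against your SDK version and the constant you actually defined.

    // Illustrative import paths following the usual layout of the plugin SDK;
    // check them against the SDK version you are building with.
    import com.alpine.plugin.core.dialog.OperatorDialog
    import com.alpine.plugin.core.datasource.OperatorDataSourceManager
    import com.alpine.plugin.core.io.OperatorSchemaManager
    import com.alpine.plugin.core.utils.{HdfsParameterUtils, HdfsStorageFormatType, SparkParameterUtils}

    // A minimal sketch of the constants object referenced in onPlacement().
    // The key string is a placeholder; use the value you chose earlier.
    object DatasetGeneratorUtils {
      // Key that identifies the "Number of things" parameter. The runtime side
      // of the operator reads the user's choice back using this same key.
      val numberRowsParamKey = "numberRows"
    }

At runtime, the operator reads the values of these parameters back from the OperatorParameters object passed to it, using the same keys supplied here.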