Settings for Spark-Enabled Operators

The quickest way to change Spark settings is from the operator itself.

  • You can enable Automatic Configuration for the Spark parameters. When this option is enabled, Team Studio selects default values to run the operator.
  • You can edit these parameters directly in the operator parameter dialog box: set Advanced Settings Automatic Optimization to No, and then click Edit Settings to display and edit the settings in the Advanced Settings dialog box.

Additional parameters might be available, depending on the operator. You can also add any parameter described in the official Spark configuration documentation.
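For example, added parameters use Spark's standard property names and values. A few commonly tuned properties are shown below; the property names come from the Spark configuration reference, but the values are only illustrative, not recommendations:

```
spark.driver.memory        2g
spark.executor.memory      4g
spark.executor.instances   4
spark.executor.cores       2
```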

As a use case example, imagine that you are parsing many files using the Text Extractor, and the Spark job keeps failing or running very slowly. Depending on your input data, you can take one of the following actions to correct these problems.

  • If you are parsing many small or medium-sized files (for example, hundreds of thousands of files smaller than 40 MB) and the job is failing, try increasing the driver memory and the number of executors.
  • If you are parsing larger files (larger than 90 MB) and the Spark job is failing, increase the executor memory so that each large file can be parsed by a single executor. You should also increase the driver memory.
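As an illustration, the two corrective actions above might translate into settings like the following. The values are placeholders to adapt to your data and cluster, not tested recommendations:

```
# Many small files: more executors, larger driver
spark.driver.memory        4g
spark.executor.instances   8

# Larger files (> 90 MB): larger executors, larger driver
spark.driver.memory        4g
spark.executor.memory      6g
```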

Data Source configuration

Spark settings can also be changed on the data source itself. To do this, you must have access to a Hadoop cluster with Spark installed.
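In a standard Spark installation, cluster-wide defaults of this kind are typically set in `conf/spark-defaults.conf` on the cluster. Whether your data source picks up that file is deployment-specific, so treat this as a sketch of standard Spark behavior rather than a Team Studio procedure:

```
# conf/spark-defaults.conf on the cluster (illustrative values)
spark.master               yarn
spark.driver.memory        2g
spark.executor.memory      4g
```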

Tips and tricks

For more information about Spark optimization, see the following resources.