Advanced Settings Dialog Box

When Spark is enabled for an operator, you can apply the Automatic configuration for the Spark parameters, which sets default values for running the operator. Alternatively, you can edit these parameters directly.

To edit these parameters directly in the operator parameter dialog box, select No for Advanced Settings Automatic Optimization, and then click Edit Settings. Set your desired configuration in the resulting Advanced Settings dialog box.

Spark operators Advanced Settings dialog box

Note: Available options are determined by the type of operator. The following table shows the settings that apply to all operators for which you can enable Spark. For information about additional settings, see the specific operator help.
  • If you select the check box in the Override? column, you can specify a value for the corresponding setting, which supersedes any default value set by your cluster or workflow variables. If you provide no alternative value, the default value is used.
  • If you click Add Parameter, you can provide custom Spark parameters, which gives you finer control over tuning your Spark jobs (see the sketch after this list). See Spark Autotuning for more information.
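
To illustrate, a custom parameter added this way is an ordinary Spark property applied when the job starts. The following PySpark sketch shows a session-level equivalent; the property name and value are examples only, not Team Studio defaults.

    from pyspark.sql import SparkSession

    # Session-level equivalent of adding a custom parameter in the
    # Advanced Settings dialog. The property and value below are
    # examples only; any valid Spark property can be supplied this way.
    spark = (
        SparkSession.builder
        .appName("custom-parameter-sketch")
        .config("spark.sql.shuffle.partitions", "200")  # example custom parameter
        .getOrCreate()
    )
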
Setting Description
Disable Dynamic Allocation

Select both check boxes to indicate that idle CPU cores or execution memory should not be released for other applications.

Dynamic allocation allows Spark to increase and decrease the number of executors as needed over the course of an application. If you can configure dynamic allocation on your cluster, doing so usually yields the best performance.

By default, dynamic allocation is disabled. Team Studio can use dynamic allocation only if the following conditions are true.

  • It is enabled in alpine.conf.
  • You have not set the number of executors.
  • Your cluster is correctly configured for dynamic allocation.
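
As an illustration of the cluster-side Spark properties involved, the following PySpark sketch enables dynamic allocation with example executor bounds. It assumes the external shuffle service is already running on the worker nodes; it does not show the alpine.conf setting, which is specific to Team Studio.

    from pyspark.sql import SparkSession

    # Standard Spark properties for dynamic allocation. The external
    # shuffle service must be running on each worker node, and
    # spark.executor.instances must be left unset, or Spark falls back
    # to a fixed number of executors. Bounds below are example values.
    spark = (
        SparkSession.builder
        .appName("dynamic-allocation-sketch")
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.shuffle.service.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "1")
        .config("spark.dynamicAllocation.maxExecutors", "20")
        .getOrCreate()
    )
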
Number of Executors

Specify the number of Spark executors to run this job (spark.executor.instances). This and the following settings map to standard Spark properties; a combined sketch follows this table.
Executor Memory in MB

Specify the Spark executor memory, in megabytes.

This value depends on the size of the data, the resources on the cluster, and the YARN container. Team Studio prevents you from setting this value higher than the size of the YARN container. To override this behavior, set the limit.spark.executor.memory value in alpine.conf to false.

Driver Memory in MB

Specify the Spark driver memory, in megabytes.

Some operators, such as Alpine Forest and Summary Statistics, pull a large amount of information back to the driver, so these operators are assigned more driver memory.

This value depends on the size of the data, the resources on the cluster and the YARN container, and the algorithm. Team Studio prevents you from setting this value higher than the size of the YARN container, even if you set it explicitly. To override this behavior, set the limit.spark.executor.memory value in alpine.conf to false.

Number of Executor Cores

Specify the number of cores to use on each executor for the Spark job (spark.executor.cores).

When this value is explicitly set, multiple executors from the same application can be launched on the same worker if the worker has enough cores and memory. Otherwise, each executor grabs all the cores available on the worker by default, in which case only one executor per application can be launched on each worker during a single scheduling iteration. See the Spark documentation for more information.
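
The numeric settings above correspond to standard Spark properties. A minimal combined PySpark sketch, using example values sized for illustration rather than for any particular cluster:

    from pyspark.sql import SparkSession

    # Example values only; size these to your data, cluster resources,
    # and YARN container limits, as described above.
    # Note: spark.driver.memory must be set before the driver JVM
    # starts; in a fresh PySpark session this works, otherwise pass
    # it to spark-submit instead.
    spark = (
        SparkSession.builder
        .appName("advanced-settings-sketch")
        .config("spark.executor.instances", "4")   # Number of Executors
        .config("spark.executor.memory", "2048m")  # Executor Memory in MB
        .config("spark.driver.memory", "1024m")    # Driver Memory in MB
        .config("spark.executor.cores", "2")       # Number of Executor Cores
        .getOrCreate()
    )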

Additional Information

See the Spark documentation for more details on the available properties.