Spark Optimization for Data Scientists

Many operators in Team Studio run as custom Spark jobs. Although Spark ships with default configuration settings, those defaults are not optimal for every use case.

You can edit your Spark configuration from Team Studio in three ways:

  • Operator settings
  • Alpine.conf (overridable at the workflow level with workflow variables)
  • Data source configuration

For each of these options, you can edit settings such as memory allocation and the number of executors. Each option applies the settings at a different scope of the application.
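As an illustration, the Spark properties most commonly tuned this way control executor and driver sizing. The property names below are standard Apache Spark settings; the values are placeholders, and the exact syntax for supplying them depends on where you set them (an operator dialog, Alpine.conf, or the data source configuration):

```
# Commonly tuned Spark properties (illustrative values; adjust to your cluster)
spark.executor.memory      4g    # heap memory per executor
spark.executor.cores       2     # concurrent task slots per executor
spark.executor.instances   10    # number of executors requested
spark.driver.memory        2g    # memory for the driver process
```

Settings applied at a narrower scope (for example, on a single operator) typically override the same property set at a broader scope, so start with cluster-wide defaults and override per workflow or operator only where needed.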

To learn more about Spark settings, see the official Apache Spark configuration documentation.
