Spark Autotuning

Tuning Spark parameters can be confusing. To improve the performance of Spark jobs, Team Studio includes automatic optimization.

Based on the size of your cluster, the resources available in your queue, the size of the input data, and what is known about the operator, Team Studio can assign Spark parameters dynamically at runtime. Spark autotuning is currently available on the following operators:

  • Aggregation
  • Alpine Forest Classification
  • Alpine Forest Regression
  • ARIMA Time Series
  • Association Rules
  • Batch Aggregation
  • Classification Threshold Metrics
  • Collapse
  • Column Filter
  • Correlation
  • Correlation Filter
  • Distinct
  • Fuzzy Join
  • Gradient Boosting Classification
  • Gradient Boosting Regression
  • Join
  • K-Means
  • LDA Predictor
  • LDA Trainer
  • Linear Regression
  • Logistic Regression
  • N-Gram Dictionary Builder
  • Naive Bayes
  • Neural Network
  • Normalization
  • Null Value Replacement
  • Numeric to Text
  • Pivot
  • Replace Outliers
  • Resampling
  • Row Filter
  • Set Operations
  • Sort by Multiple Columns
  • Stability Selection
  • Summary Statistics
  • Text Extractor
  • Text Featurizer
  • Transpose
  • Unpivot
  • Variable
  • Window Functions - Aggregate
  • Window Functions - Lag/Lead
  • Window Functions - Rank

No action is required to enable Spark autotuning; these operators apply Automatic Optimization by default. For a greater degree of control, edit the advanced configuration for each of the Spark settings.

Team Studio sets the following Spark parameters (a hand-configured example follows the list).

  • spark.executor.memory
  • spark.driver.memory
  • spark.executor.cores
  • spark.default.parallelism and spark.sql.shuffle.partitions
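
These are standard Spark configuration keys. The sketch below (in Scala) shows what setting them by hand looks like; the values are purely illustrative, not ones Team Studio would necessarily compute.

    import org.apache.spark.SparkConf

    // Illustrative values only; Team Studio computes these per job at runtime.
    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")         // heap size of each executor
      .set("spark.driver.memory", "2g")           // heap size of the driver
      .set("spark.executor.cores", "2")           // concurrent tasks per executor
      .set("spark.default.parallelism", "200")    // default partition count for RDD operations
      .set("spark.sql.shuffle.partitions", "200") // partition count after DataFrame shuffles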

Additionally, Team Studio can detect whether dynamic allocation is enabled on the cluster and, if so, use it to choose the maximum number of executors (spark.dynamicAllocation.maxExecutors and spark.dynamicAllocation.enabled). If dynamic allocation is not enabled on the cluster, Team Studio sets spark.executor.instances based on your cluster size, input data, and the current operator.
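
The fallback can be pictured roughly as follows. This is a hypothetical sketch, not Team Studio's implementation; the function name, parameters, and sizing formulas are invented for illustration.

    // Hypothetical sketch only: the names and formulas here are invented.
    def chooseExecutorSettings(
        dynamicAllocationEnabled: Boolean, // detected from the cluster configuration
        clusterCores: Int,                 // total cores available in the queue
        inputSizeGb: Double                // size of the operator's input data
    ): Map[String, String] = {
      if (dynamicAllocationEnabled) {
        // Let the cluster scale executors up and down, but cap the maximum.
        Map(
          "spark.dynamicAllocation.enabled" -> "true",
          "spark.dynamicAllocation.maxExecutors" -> (clusterCores / 4).toString
        )
      } else {
        // No dynamic allocation: fix the executor count up front.
        val executors = math.max(2, math.min(clusterCores / 4, math.ceil(inputSizeGb / 2).toInt))
        Map("spark.executor.instances" -> executors.toString)
      }
    }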

To override the settings, set Automatic Optimization to No, and then edit the settings provided in the Advanced Settings dialog box or add your own key/value pairs. Team Studio always uses a setting provided by the user instead of computing its own.
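
In other words, user-supplied pairs take precedence over autotuned values. A minimal sketch of this assumed merge behavior:

    // Assumed precedence behavior; the maps here are illustrative.
    val autotuned    = Map("spark.executor.memory" -> "4g", "spark.executor.cores" -> "2")
    val userProvided = Map("spark.executor.memory" -> "8g") // from the Advanced Settings dialog
    val effective    = autotuned ++ userProvided            // right-hand operand wins on conflicts
    // effective("spark.executor.memory") == "8g"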