Team Studio-Specific Spark Values

Team Studio uses the following settings to determine the bulk of its Spark configuration, unless the values are set manually at the operator level.

| Name | Default Value | Notes |
|---|---|---|
| spark.driver.memory | 1024 | Minimum driver memory. For large data sets and large clusters, we increase this value. |
| spark.show.conf | true | |
| min.spark.executor.memory | 1024 | |
| percent.resources.available.to.alpine.job | 1.0 (100%) | Percentage of available cluster resources that we allocate to a given job. Consider reducing this value if there are many Team Studio users, or if you are concerned about them launching very large jobs. |
| limit.spark.driver.memory.based.on.capacity | true | Limits the Spark driver memory based on the memory capacity of the YARN container. If the requested driver memory is too high, we use the largest driver memory that still fits in the YARN container. The YARN container must be large enough to accommodate the requested driver memory plus the overhead. See the sketch following this table. |
| limit.spark.executor.memory.based.on.capacity | true | Limits the Spark executor memory based on the memory capacity of the YARN container. If the requested spark.executor.memory is too large, we set spark.executor.memory to the container capacity. This setting also ensures that the total executor memory requested plus the driver memory does not exceed the total memory available on the cluster. |
| spark.max.executors.per.machine | 5 | |
| alpine.small.cluster.threshold.g | 6 | If the total resources on the cluster are less than 6 GB, we treat it as a small cluster and use the minimum memory settings. |
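
To make the interplay of the capping settings concrete, here is a minimal Python sketch of the behavior described above. The function names, the overhead formula (Spark-on-YARN's conventional max(384 MB, 10% of requested memory)), the example numbers, and the assumption that values are in MB are all illustrative; they are not Team Studio's actual implementation.

```python
# Illustrative sketch of the memory-capping behavior described above.
# All values are in MB (an assumption for this sketch).

MIN_DRIVER_MEMORY_MB = 1024    # spark.driver.memory default
MIN_EXECUTOR_MEMORY_MB = 1024  # min.spark.executor.memory default


def yarn_overhead(requested_mb: int) -> int:
    """Approximate per-container overhead reserved on YARN
    (Spark's convention: max(384 MB, 10% of requested memory))."""
    return max(384, int(0.10 * requested_mb))


def cap_driver_memory(requested_mb: int, container_capacity_mb: int) -> int:
    """limit.spark.driver.memory.based.on.capacity = true:
    the driver plus its overhead must fit in one YARN container."""
    if requested_mb + yarn_overhead(requested_mb) <= container_capacity_mb:
        return requested_mb
    # Largest driver memory that, with overhead, still fits the container.
    capped = container_capacity_mb - yarn_overhead(container_capacity_mb)
    return max(capped, MIN_DRIVER_MEMORY_MB)


def cap_executor_memory(requested_mb: int, container_capacity_mb: int,
                        cluster_total_mb: int, num_executors: int,
                        driver_mb: int, fraction: float = 1.0) -> int:
    """limit.spark.executor.memory.based.on.capacity = true:
    cap each executor to the container capacity, then ensure the whole
    job fits in the share of the cluster granted to it
    (percent.resources.available.to.alpine.job)."""
    per_executor = min(requested_mb, container_capacity_mb)
    budget = int(cluster_total_mb * fraction) - driver_mb
    if num_executors * per_executor > budget:
        per_executor = max(budget // num_executors, MIN_EXECUTOR_MEMORY_MB)
    return per_executor


# Example: 8 GB containers on a 64 GB cluster, job limited to 50% of resources.
driver = cap_driver_memory(requested_mb=12288, container_capacity_mb=8192)
executor = cap_executor_memory(requested_mb=8192, container_capacity_mb=8192,
                               cluster_total_mb=65536, num_executors=6,
                               driver_mb=driver, fraction=0.5)
print(driver, executor)  # requested values are scaled down to fit
```

In this example the 12 GB driver request is scaled down to fit an 8 GB container after overhead, and the per-executor memory is reduced so that six executors plus the driver stay within the job's 50% share of the cluster.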