Adding a Spark Cluster from the User Interface
Before you add a Spark cluster, make sure that the TIBCO Data Science - Team Studio server can connect to the cluster hosts.
You must have Data Administrator or higher privileges to add a Spark cluster. Ensure that you have the correct permissions before continuing.
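One way to confirm the connectivity prerequisite is a plain TCP check run from the TIBCO Data Science - Team Studio server. The following is a minimal sketch, not part of the product: the function names `can_connect` and `check_hosts` are hypothetical, and the hosts and ports to test depend on your cluster.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_hosts(endpoints, timeout: float = 3.0) -> dict:
    """Map each (host, port) pair to True or False depending on reachability."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}
```

For example, `check_hosts([("spark-master.example.com", 7077)])` reports whether that placeholder host accepts connections on that port. A successful TCP connection only shows that the port is reachable; it does not verify credentials or cluster health.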
Procedure
- From the sidebar menu, select Data.
- On the Data Sources page, click Add Data Source.
  The Add Data Source dialog appears.
- From the Data Source Type dropdown list, select Spark Cluster.
- In the Data Source Name field, specify a user-facing name. You can provide any useful text.
- In the Description field, provide a useful description for your Spark cluster. This field is optional.
- From the Cluster Manager Type dropdown list, select the cluster manager for Spark. The available options are Apache Spark Standalone and YARN.
- When Apache Spark Standalone is selected, provide the URL for Spark in the Spark Master URL field.
- When YARN is selected, perform the following steps:
- Under YARN Configurations Files, click Select Files, and then browse for the yarn-site.xml and core-site.xml configuration files on your local system.
  Note: You can download the yarn-site.xml and core-site.xml files from the YARN cluster, or get them from the cluster administrator. Before uploading, open the core-site.xml file and remove the com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec values from the io.compression.codecs property.
- In the Hadoop Username field, enter your Hadoop username.
  Note: The username must exist in the /user directory of the Hadoop cluster, and the user must have read/write permissions. If you encounter permission issues, see Configuring the HDFS Directory and Permissions for Results File Storage.
- For further configuration, choose Configure Connection Parameters. The CONFIGURE CONNECTION PARAMETERS dialog appears.
- Specify key-value pairs for YARN on the TIBCO Data Science - Team Studio server.
- To add a new parameter, click Add parameter.
- To edit the connection parameters in bulk, click Bulk Edit.
- Click Save.
For more information, see Configuring Connection Parameters.
- In the Workspace Visibility dropdown list, select the visibility of the data source. The available options are Public and Limited.
  Note: A data source with Limited visibility must be manually associated with a workspace before members of that workspace can use it. To learn more about associating a data source with a workspace, see Data Visibility.
- To set the values from a file that was saved from another Spark cluster connection, use Load Configuration from File.
- Click Add Data Source to add the data source.
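For the Apache Spark Standalone option in the procedure above, it can help to check the value before entering it in the Spark Master URL field. The sketch below assumes the `spark://HOST:PORT` form that Apache Spark uses for standalone masters (7077 is the usual default port). The function name `parse_spark_master_url` is hypothetical and not part of the product.

```python
from urllib.parse import urlsplit


def parse_spark_master_url(url: str):
    """Split a spark://HOST:PORT URL into (host, port), or raise ValueError."""
    parts = urlsplit(url.strip())
    if parts.scheme != "spark":
        raise ValueError(f"expected scheme 'spark', got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host name")
    # Accessing .port raises ValueError itself if the port is not a valid number.
    port = parts.port
    if port is None:
        raise ValueError("missing port")
    return parts.hostname, port
```

This checks only the shape of the URL. It does not contact the Spark master.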
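The core-site.xml edit described in the YARN step can also be scripted instead of done by hand. The following is a minimal sketch, assuming the standard Hadoop layout of `<property>` elements with `<name>` and `<value>` children and a comma-separated codec list. The function name `strip_lzo_codecs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two codec class names that the procedure says to remove.
LZO_CODECS = {
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
}


def strip_lzo_codecs(xml_text: str) -> str:
    """Return core-site.xml content without the LZO codecs in io.compression.codecs."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.find("name")
        value = prop.find("value")
        if name is None or value is None:
            continue
        if (name.text or "").strip() != "io.compression.codecs":
            continue
        # Split the comma-separated list, drop the LZO entries, and rejoin.
        codecs = [c.strip() for c in (value.text or "").split(",")]
        value.text = ",".join(c for c in codecs if c and c not in LZO_CODECS)
    return ET.tostring(root, encoding="unicode")
```

To use it, read the downloaded core-site.xml as text, pass it to `strip_lzo_codecs`, and write the result to the copy you upload. `ElementTree` does not preserve XML comments or the XML declaration, so keep the original file if you need them.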