Adding a Spark Cluster from the User Interface

Before you add a Spark cluster, make sure that the TIBCO Data Science - Team Studio server can connect to the cluster hosts.
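One way to confirm connectivity is a plain TCP check run on the TIBCO Data Science - Team Studio server. The following is a minimal sketch, not part of the product: the helper names `can_connect` and `parse_master_url` and the host name `spark-master.example.com` are hypothetical, and the fallback port 7077 assumes that the cluster uses Spark's default standalone master port. Substitute the hosts and ports that your cluster actually uses.

```python
import socket
from urllib.parse import urlsplit


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def parse_master_url(url: str) -> tuple[str, int]:
    """Split a spark://host:port URL into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme != "spark" or not parts.hostname:
        raise ValueError(f"not a spark:// URL: {url!r}")
    # Assumption: fall back to 7077, Spark's default standalone master port.
    return parts.hostname, parts.port or 7077


if __name__ == "__main__":
    # Hypothetical master URL; replace with the value for your cluster.
    host, port = parse_master_url("spark://spark-master.example.com:7077")
    state = "reachable" if can_connect(host, port) else "unreachable"
    print(f"{host}:{port} is {state}")
```

A successful TCP connection shows only that the port is open from this server. It does not confirm that the service behind the port is a correctly configured Spark master.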

Before you begin

You must have Data Administrator or higher privileges to add a Spark cluster.

Procedure

Perform the following steps to add a Spark cluster in TIBCO Data Science - Team Studio:

  1. From the sidebar menu, select Data.

  2. On the Data Sources page, click Add Data Source.

  3. In the Add Data Source dialog that appears, select Spark Cluster from the Data Source Type dropdown list.

    Add Data Source - Spark Cluster

  4. In the Data Source Name field, enter a user-facing name for the data source. Any descriptive text is acceptable.
  5. In the Description field, provide a useful description for your Spark cluster. This field is optional.
  6. From the Cluster Manager Type dropdown list, select the cluster manager for Spark. The available options are Apache Spark Standalone and YARN.
    1. When Apache Spark Standalone is selected, provide the URL for Spark in the Spark Master URL field.
    2. When YARN is selected, perform the following steps:
      1. Under YARN Configurations Files, click Select Files, and then browse to the yarn-site.xml and core-site.xml configuration files on your local system.
        Note: You can download the yarn-site.xml and core-site.xml files from the YARN cluster, or obtain them from the cluster administrator.

        Before uploading, open the core-site.xml file and remove the com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec values from the io.compression.codecs property.

      2. In the Hadoop Username field, enter your Hadoop username.
        Note: The username must exist in the /user directory of the Hadoop cluster, and the user must have read/write permissions. If you encounter permission issues, see Configuring the HDFS Directory and Permissions for Results File Storage.
  7. For further configuration, choose Configure Connection Parameters. The CONFIGURE CONNECTION PARAMETERS dialog appears.
    Configure Connection Parameters
    1. Specify key-value pairs for YARN on the TIBCO Data Science - Team Studio server.
    2. To add a new parameter, click Add parameter.
    3. To edit the connection parameters in bulk, click Bulk Edit.
    4. Click Save.

    For more information, see Configuring Connection Parameters.

  8. From the Workspace Visibility dropdown list, select how visible the data source is to workspaces. The available options are Public and Limited.

    Note: A data source with Limited visibility must be manually associated with a workspace before members of that workspace can use it. To learn more about associating a data source with a workspace, see Data Visibility.
  9. Optional: To populate the fields from a file that was saved from another Spark Cluster connection, use Load Configuration from File.

  10. Click Add Data Source to add the data source.
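
The core-site.xml edit described in the note for step 6 can also be scripted instead of done by hand. The following is a minimal sketch that uses only the Python standard library; the function name `strip_lzo_codecs` and the sample XML are illustrative and not part of the product. It assumes the usual Hadoop layout, in which io.compression.codecs holds a comma-separated list of class names.

```python
import xml.etree.ElementTree as ET

# The two codec class names that must be removed before uploading.
LZO_CODECS = {
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
}


def strip_lzo_codecs(xml_text: str) -> str:
    """Return core-site.xml content with the LZO codecs removed
    from the io.compression.codecs property."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = prop.find("value")
        if name != "io.compression.codecs" or value is None or not value.text:
            continue
        # Keep every non-empty codec entry that is not one of the LZO classes.
        kept = [
            codec.strip()
            for codec in value.text.split(",")
            if codec.strip() and codec.strip() not in LZO_CODECS
        ]
        value.text = ",".join(kept)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative input only; read your real core-site.xml instead.
    sample = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
  </property>
</configuration>"""
    print(strip_lzo_codecs(sample))
```

Parsing with ElementTree discards XML comments and the XML declaration, so write the result to a copy of the file and upload the copy rather than overwriting the original.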