Connecting Team Studio to Data Sources

Review and follow these steps to connect your installation of Team Studio to your data sources.

Perform this task on the computer where you have installed Team Studio.

Prerequisites

Test network connectivity and configure the Team Studio server.

Enable web sockets.
Verify that web sockets are correctly enabled by using a web socket test.
Access the cluster nodes, including the NameNode and DataNodes for Hadoop.
Verify that you can connect to each node by running the command $ telnet hostname port, where hostname and port identify the node and the service port to test.
Enable read and write permissions for the appropriate directories, including /tmp for Hadoop.
Verify this step by writing to a file in one of those directories and running a MapReduce job, if applicable.
Ensure that the appropriate agent is enabled for your data source.
Configure the necessary ports in $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/configuration/alpine.conf.
If you are using Spark, ensure the following:
- The Spark host is added in $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/configuration/alpine.conf.
```
alpine.spark.sparkAkka.akka.remote.netty.tcp.hostname = <IP address of the Team Studio server>
```
- Full communication is open between the Team Studio server and all cluster nodes.
Ensure that the Team Studio server can access the LDAP server, if applicable.
Verify that you can connect by running the command $ telnet hostname port against the LDAP server's host name and port.
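
The telnet checks in the list above can also be scripted when there are many nodes to test. The following is a minimal sketch in Python, not part of Team Studio: the function names can_connect and check_endpoints, the host names, and the port numbers are all illustrative assumptions, so substitute the hosts and ports used by your own cluster and LDAP server. It only confirms that a TCP connection can be opened, which is the same thing a successful telnet connection shows.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it is reachable."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}


if __name__ == "__main__":
    # Hypothetical placeholders: replace with your NameNode, DataNode,
    # and LDAP host names and the ports they actually listen on.
    endpoints = [
        ("namenode.example.com", 8020),
        ("datanode1.example.com", 50010),
        ("ldap.example.com", 389),
    ]
    for (host, port), ok in check_endpoints(endpoints).items():
        print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Run the script on the computer where Team Studio is installed, so that the result reflects the network path that the Team Studio server itself uses.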

Connect to either a database data source or a Hadoop data source.

Database Data Sources
You can add a database as a data source in Team Studio from the sidebar menu by selecting Data and then selecting Add Data Source.

Hadoop Data Sources
These topics show you how to add a Hadoop data source from the command line or through the Team Studio user interface, and how to connect to various data sources.

Related concepts

Administrator Options in Team Studio

Related tasks

Workflow Editor Preferences