Adding a Hadoop Data Source from the Command Line

To add an HDFS data source, first make sure that the Team Studio server can connect to the Hadoop cluster hosts, and then use the Add Data Source dialog box to add the data source to Team Studio.

Prerequisites

Supported Hadoop distributions are listed in Team Studio System Requirements.

Procedure

  1. Ensure that the Team Studio user has read and write permissions on HDFS. Any HDFS directory used within the application must be readable and writable, including the following directories (see the sample commands after this list):
    • /tmp
    • /tmp/tsds_out
    • /user
    • /user/chorus
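
    For example, you can verify and grant access with HDFS shell commands similar to the following. The chorus user name and the permissive 777 mode shown here are illustrative assumptions; adjust them to your site's security policy.

    hdfs dfs -mkdir -p /tmp/tsds_out /user/chorus
    hdfs dfs -chmod 777 /tmp /tmp/tsds_out
    hdfs dfs -chown chorus:chorus /user/chorus
    hdfs dfs -ls /tmp /user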
  2. Make sure the Team Studio server can connect to the cluster hosts by their fully qualified domain names (FQDNs).

    Option A: Modify the /etc/hosts file on the Team Studio server and on each cluster node to include the host name and IP address of every server.
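
    For example, the /etc/hosts entries on each machine might look like the following; the IP addresses and the alpinenow.local domain are placeholders for your own values.

    10.0.0.10   alpinechorusserver.alpinenow.local   alpinechorusserver
    10.0.0.11   clusternode1.alpinenow.local         clusternode1
    10.0.0.12   clusternode2.alpinenow.local         clusternode2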

    Option B: Configure DNS lookup for all client and Hadoop nodes.

    On the DNS server, add these lines:
    alpinechorusserver IN A ipaddress
    clusternode1 IN A ipaddress
    clusternode2 IN A ipaddress

    Add them to the following zone files:

    /var/named/alpinenow.local.zone
    /var/named/alpinenow.local.rr.zone
    Now restart the named service and verify that you can connect by using telnet:

    service named restart
    telnet hostname port
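
    For example, to confirm that the Team Studio server can reach the HDFS NameNode on clusternode1 (port 8020 is a common NameNode RPC default, but your cluster might use a different port):

    telnet clusternode1.alpinenow.local 8020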