Connect to a Pivotal Hadoop (PHD) Data Source

To connect a Team Studio data source to PHD 3.0, you might need to add an additional connection parameter.

You perform this task first on the Pivotal Hadoop cluster (in its configuration UI and on its command line) and then in the Team Studio data source connection UI.

Prerequisites

  • You must have access to the Pivotal Hadoop command line and the configuration UI.
  • You must be able to add a data source using the Team Studio data source connection configuration user interface.

Procedure

  1. Open the Pivotal Hadoop configuration UI.
  2. Locate the file mapred-default.xml.
  3. In the file mapred-default.xml, locate the following class path parameters.
    mapreduce.application.classpath
    mapreduce.application.framework.path
  4. Check whether either of these parameters contains an environment variable named stack.name or stack.version.
    • If neither variable exists, Team Studio does not need it to configure the data source connection. Continue to step 8.
    • If either variable exists, you must provide the version information when you configure the data source in Team Studio. Continue to step 5.
  5. Open a command-line prompt on the Pivotal Hadoop cluster.
  6. Run the following command.
    hadoop version
    The output should include the path /usr/phd/<yourversion#>, where <yourversion#> is the version of Hadoop you are running (for example, 2.4.0.2.1.2.0-403).
  7. Make a note of the version number (both major and minor).
  8. Open the Team Studio Web UI.
  9. From the menu, click Data.
    The Data Sources window is displayed.
  10. Click Add Data Source.
    The Add Data Source dialog box is displayed.
  11. From the Data Source Type drop-down menu, select Hadoop Cluster.
  12. Provide all required information.
  13. If you found stack.version or stack.name in step 4, click Configure Connection Parameters.
    Note: If you found neither stack.version nor stack.name, skip the remaining steps. Test the configuration to confirm that it is correct, and then click Add Data Source.
  14. In the resulting Configure Connection Parameters dialog box, provide the key (either stack.version or stack.name) and the value (the version number from step 6).

    Note:
    • Set the parameter stack.version=<your cluster version> to run jobs on the server (for example, stack.version=2.3.4.0-3485).
    • Set hive.additional.parameter.disabled=true to ensure that the stack.version parameter is accepted.
  15. Save and test the configuration, and then click Add Data Source.
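The check in steps 2 through 4 can also be scripted if you have a copy of mapred-default.xml. The following Python sketch assumes the file uses the standard Hadoop <configuration>/<property>/<name>/<value> layout and that the variables are referenced with the ${...} syntax that Hadoop configuration files use for variable expansion. The function name find_stack_variables is hypothetical; it is not part of Team Studio or Hadoop.

```python
import xml.etree.ElementTree as ET

# The two classpath properties named in step 3.
CLASSPATH_PROPERTIES = (
    "mapreduce.application.classpath",
    "mapreduce.application.framework.path",
)
# The variables named in step 4. Assumption: they are referenced as ${name}.
STACK_VARIABLES = ("stack.name", "stack.version")


def find_stack_variables(config_xml: str) -> set:
    """Return the stack variables referenced by the two classpath properties."""
    root = ET.fromstring(config_xml)
    found = set()
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name not in CLASSPATH_PROPERTIES:
            continue  # Other properties do not matter for this check.
        value = prop.findtext("value") or ""
        for var in STACK_VARIABLES:
            if "${" + var + "}" in value:
                found.add(var)
    return found


# Hypothetical sample; the value is a placeholder, not a real PHD path.
sample = """<configuration>
  <property>
    <name>mapreduce.application.framework.path</name>
    <value>/path/to/${stack.version}/mapreduce.tar.gz</value>
  </property>
</configuration>"""
print(sorted(find_stack_variables(sample)))  # ['stack.version']
```

An empty result corresponds to continuing at step 8; a non-empty result means you need the version information from steps 5 through 7.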
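Steps 5 through 7 can be sketched in Python as well. This sketch assumes, as step 6 describes, that the output of hadoop version contains a path of the form /usr/phd/<yourversion#>. The helper names are hypothetical, and read_stack_version only works on a cluster node where the hadoop command is on the PATH.

```python
import re
import subprocess
from typing import Optional

# Assumption: the version is the directory name that follows /usr/phd/.
# Requiring a leading digit keeps a non-numeric directory name from matching.
PHD_VERSION_PATH = re.compile(r"/usr/phd/([0-9][0-9A-Za-z.\-]*)")


def parse_stack_version(output: str) -> Optional[str]:
    """Return <yourversion#> from text containing /usr/phd/<yourversion#>, or None."""
    match = PHD_VERSION_PATH.search(output)
    return match.group(1) if match else None


def read_stack_version() -> Optional[str]:
    """Run `hadoop version` and parse the version from its output."""
    result = subprocess.run(
        ["hadoop", "version"], capture_output=True, text=True, check=True
    )
    # Search both streams in case the path is printed to stderr.
    return parse_stack_version(result.stdout + result.stderr)


# Uses the example version from step 6.
print(parse_stack_version("/usr/phd/2.4.0.2.1.2.0-403"))  # 2.4.0.2.1.2.0-403
```

Keeping the parsing separate from the subprocess call lets you check the parsing logic without access to the cluster.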
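The decision in steps 13 and 14, together with the note in step 14, amounts to a small mapping from what you found in step 4 to the key/value pairs you enter. The following sketch is only an illustration: Team Studio takes these values through the Configure Connection Parameters dialog box, connection_parameters is a hypothetical helper, and adding hive.additional.parameter.disabled=true only when stack.version is set is an assumption based on the wording of the note.

```python
from typing import Dict, Iterable


def connection_parameters(found_vars: Iterable[str], version: str) -> Dict[str, str]:
    """Map the variables found in step 4 to the key/value pairs for step 14.

    found_vars: stack.name and/or stack.version, as found in step 4.
    version: the version number from step 6.
    """
    params: Dict[str, str] = {}
    # Per step 14, each variable found becomes a key whose value is the version.
    for var in sorted(set(found_vars)):
        params[var] = version
    # Assumption: the note's extra setting is needed only when stack.version is used.
    if "stack.version" in params:
        params["hive.additional.parameter.disabled"] = "true"
    return params  # Empty when neither variable was found (go straight to testing).


# Uses the example version from the note in step 14.
for key, value in connection_parameters(["stack.version"], "2.3.4.0-3485").items():
    print(f"{key}={value}")
# stack.version=2.3.4.0-3485
# hive.additional.parameter.disabled=true
```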