Connect to a Hive JDBC Data Source

This topic describes how to make a Hive data source available as a JDBC connection to Team Studio.

For information about which Hadoop distributions support Hive as a JDBC data source, see System Requirements. For information about adding Hive as a Hadoop data source, see Connect to a Hive Data Source on Hadoop.

Prerequisites

Procedure

  • Place the appropriate Hive JAR files in the ~/ALPINE_DATA_REPOSITORY/jdbc_driver/Public and $CHORUS_HOME/shared/libraries folders. The required JAR files are as follows:
    • commons-logging-*.jar
    • hive-common*.jar
    • hive-exec*.jar
    • hive-jdbc*.jar
    • hive-metastore*.jar
    • hive-service*.jar
    • httpclient*.jar
    • httpcore*.jar
    • libfb303*.jar
    • libthrift*.jar
    • log4j*.jar
    • slf4j-api*.jar

    The * indicates that the version might differ depending on the vendor. All of these JARs should be available from the vendor installation.
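Before restarting Team Studio, it can help to confirm that every required JAR landed in the driver folder. The following is a minimal sketch (the function name and directory argument are placeholders, not part of the product) that checks one folder for each required JAR prefix:

```shell
# Hypothetical helper: verify that each required Hive JAR is present in a
# given folder. Run it once per folder (jdbc_driver/Public and shared/libraries).
check_hive_jars() {
  dir="$1"
  required="commons-logging hive-common hive-exec hive-jdbc hive-metastore \
hive-service httpclient httpcore libfb303 libthrift log4j slf4j-api"
  for prefix in $required; do
    # Each entry in the list above must match at least one <prefix>*.jar file.
    ls "$dir/$prefix"*.jar >/dev/null 2>&1 || { echo "missing: $prefix*.jar"; return 1; }
  done
  echo "all required JARs present"
}

# Example invocation (path is a placeholder for your installation):
# check_hive_jars "$HOME/ALPINE_DATA_REPOSITORY/jdbc_driver/Public"
```

If any JAR is missing, the function prints the first missing pattern and stops, so you can fetch it from the vendor installation before continuing.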

Hive JDBC on CDH, HDP, or PHD
Procedure

  • Fill in the required fields, which are marked with an asterisk (*).
    • Data Source Name: A user-facing name for the data source. You can choose anything you like.
    • Description: Optional text with information about this data source.
    • Hadoop Version: The Hadoop distribution that is running your Hive server. CDH5, HDP, and PHD are supported distributions. Note: The JAR files copied into $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc_driver/Public/ and $CHORUS_HOME/shared/libraries must match the Hadoop distribution you select here.
    • JDBC URL: The JDBC URL used to connect to the data source.
      • CDH: The URL should be in the format jdbc:hive2://SERVER_HOSTNAME:PORT
      • HDP: The URL should be in the format jdbc:hive2://SERVER_HOSTNAME:PORT/default?stack.name=hdp;stack.version=<hdpversion>
      • PHD: The URL should be in the format jdbc:hive2://SERVER_HOSTNAME:PORT/default?stack.name=phd;stack.version=<phdversion>
    • Authentication: Team Studio supports standard password authentication and Kerberos authentication. Select the type of authentication that is configured on your Hive server.
      • Database Account and Database Password: If the Account/Password authentication type is selected, enter the Hive metastore account and password. By default, Hive uses the account hive with the password hive.
      • Kerberos: If the Kerberos authentication type is selected, enter the Kerberos Principal and Kerberos Keytab Location. The Kerberos principal must have permission to access the Hive server and typically looks like hive/myHadoopcluster.com@mycompany.com.
    • Set database credentials as a shared account: If your authentication type is Account/Password, you can share the database credentials. If you check Set database credentials as a shared account, all users can access the data source without providing their own credentials; they access the database with your credentials as the data source owner. If you do not check this box, each user must enter their own credentials for the data source in order to access it. You can change this setting later.
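The three JDBC URL formats listed above can be assembled as follows. This is an illustrative sketch only: the hostname, port, and stack versions are placeholder assumptions, and you should substitute the values for your own cluster:

```shell
# Placeholder values for illustration; replace with your Hive server details.
HOST=hive-server.example.com
PORT=10000
HDP_VERSION=2.6   # assumed <hdpversion>
PHD_VERSION=3.0   # assumed <phdversion>

# The URL patterns from the JDBC URL field, one per supported distribution.
CDH_URL="jdbc:hive2://$HOST:$PORT"
HDP_URL="jdbc:hive2://$HOST:$PORT/default?stack.name=hdp;stack.version=$HDP_VERSION"
PHD_URL="jdbc:hive2://$HOST:$PORT/default?stack.name=phd;stack.version=$PHD_VERSION"

echo "$CDH_URL"
```

Note that the HDP and PHD variants connect to the default database and pass the stack name and version as session parameters, whereas the CDH variant needs only the host and port.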