Connect to a BigQuery Data Source

You can connect Team Studio to a GCP BigQuery data source.

Perform this task on the computer where Team Studio server is installed.

Prerequisites

You must have write access to the Team Studio server. You must also have access to the GCP BigQuery configuration files.

Procedure

  1. Copy the BigQuery driver to the following directories. and then change the ownership of these copies to the user who runs Team Studio (usually user 'chorus').
    • $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc_driver/Public
    • $CHORUS_HOME/shared/libraries
  2. Change the ownership of these copies to the user who runs Team Studio.

    Usually, that user name is 'chorus'.

  3. Create a new bigquery directory named $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc/bigquery.
  4. Copy the file driver.properties from the directory $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc/default to the newly created bigquery directory.
  5. Edit the contents of the file $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc/bigquery/driver.properties as follows.
    # Specify the JDBC class driver for the desired database type.
    # Examples:
    # Oracle = oracle.jdbc.driver.OracleDriver
    # Greenplum = org.postgres#ql.Driver
    # DB2 = com.ibm.db2.jcc.DB2Driver
    # Netezza = org.netezza.Driver
    # PostgreSQL = org.postgresql.Driver
    # SQLServer = com.microsoft.sqlserver.jdbc.SQLServerDriver
    # MySQL = com.mysql.jdbc.Driver
    # Teradata = com.teradata.jdbc.TeraDriver
    # Vertica = com.vertica.jdbc.Driver
    # Sybase = com.sybase.jdbc2.jdbc.SybDriver
    # Informix = com.informix.jdbc.IfxDriver
    # SAPDB = com.sap.dbtech.jdbc.DriverSapDB
    # InterBase = interbase.interclient.Driver
    # HSqlDB = org.hsqldb.jdbcDriver
    # MariaDB = org.mariadb.jdbc.Driver
    # MySQL = com.mysql.jdbc.Driver
    driverClass=com.simba.googlebigquery.jdbc42.Driver
    
    # BigQuery (like Hive) does not support "schema"."tablename".
    # For BigQuery, this entry must be empty string: identiferQuotation=
    # with no whitespace (except newline) or characters after the equals sign
    identifierQuotation=
    
  6. Locate and open for editing the file additional_jdbc_drivers.rb.

    The path is similar to path similar to /usr/chorus/install/releases/6.3.2.0.8068-7ac910ae3/components/core/app/mixins/sequel/extensions/additional_jdbc_drivers.rb

  7. Add a line for the bigquery class so that the content resembles the following.
    module Sequel
      module AdditionalJdbcDrivers
        MAP =  {
            mariadb: ->(db) { org.mariadb.jdbc.Driver },
            teradata: ->(db) { com.teradata.jdbc.TeraDriver },
            vertica: ->(db) { com.vertica.jdbc.Driver },
            hive2: ->(db) { org.apache.hive.jdbc.HiveDriver },
            hive: ->(db) { org.apache.hadoop.hive.jdbc.HiveDriver },
            bigquery: ->(db) { com.simba.googlebigquery.jdbc42.Driver }
            }
     
        MAP.each do |key, driver|
          ::Sequel::JDBC::DATABASE_SETUP[key] = driver
        end
      end
    end
    Important: You must apply this change to the file additional_jdbc_drivers.rb again after upgrading Team Studio.
  8. Open the file $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/configuration/alpine.conf and add the following configuration parameter.
    database.bigquery.batchSize (default = 1000)
    Note: Due to idiosyncrasies in the way BigQuery handles batch updates, uploading a narrow table can result in an error. This configuration parameter addresses possible batch size issues.
  9. Restart Team Studio.
  10. Open the Add Data Source dialog box.

    Add Data Source - BigQuery

  11. Provide the Data Source Type, the Data Source Name, and (optionally), the Description.
  12. Set the data connection (JDBC URL) using a URL similar to the following.
    Tip: You can copy your BigQuery URL from your GCP BigQuery page
    jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=teamstudio-user@teamstudio-alpine.iam.gserviceaccount.com
  13. Specify Workspace Visibility.
    A data source with Limited visibility must be manually associated with a workspace for members of that workspace to use the data source. By default, this option is set to Public.
  14. Optional: Select Set database credentials as a shared account if you intend to allow all users to access the data source without using their own credentials.

    Users access the database with your credentials as the data source owner. If you do not select this check box, each user must provide credentials for that data source to access it. You can check the box later if you change your mind.