Accessing Data from Google BigQuery


You can access and analyze your Google BigQuery data in Spotfire.

Prerequisites

Before you start, read the System Requirements for Spotfire Connectors. Make sure your version of Spotfire includes the connector for Google BigQuery.

You do not have to install a driver to use the connector for Google BigQuery.

If localhost is not accessible due to firewall settings, a proxy, or for other reasons, you must define a local redirect address template. For more information, see Defining a local redirect address template for your version of the Spotfire® Server and Environment - Installation and Administration manual.

Google account requirements

To be able to access data from Google BigQuery in Spotfire, you need a Google account. You must be a member of a Google Cloud Platform project where BigQuery is enabled.

Note: In the Google project, make sure you have roles that enable you to view and query the BigQuery data.

For more information on how to use BigQuery in your Google Cloud Platform project, see the official documentation from Google.

Billing in Google BigQuery

When you analyze your BigQuery data in Spotfire, remember that there is a cost for processing queries in Google BigQuery. When you create a connection, you select a Google Cloud Platform project as the billing project. The project you select is charged for the data that Google BigQuery processes for each query.

The number, size, and complexity of queries that Spotfire sends to Google BigQuery depends on many factors. The way you configure your connection or analysis impacts how Spotfire queries the external database. For example, you can try to limit your data selection in the connection. If you select only the data that you want to analyze and leave out data that is not of interest, you can reduce the size of queries. Another factor is the load method for data tables. If you import a data table, only one query for the entire table is sent when you open or refresh the connection. You can also keep the data table external, for in-database analysis. Then Spotfire sends queries to the database continuously based on your actions in the analysis.

Other preparations

Connecting to Google BigQuery

To access your BigQuery data in Spotfire, create a connection to BigQuery. When you have created your connection, you can share it in the Spotfire library. Then you and your team members can reuse the connection for convenient access to the data you selected in the connection.

When you select Import as the load method for your data from BigQuery databases, Spotfire tries to access the data using the BigQuery Storage Read API. This loading mechanism provides much faster import performance compared to BigQuery’s other retrieval methods. To use the Storage Read API, you need the permission bigquery.readsessions.create. When loading data where you lack this permission, for example for public data, Spotfire will fall back to BigQuery’s other, slower retrieval methods. To learn more about the BigQuery Storage Read API, see BigQuery Storage Read API overview from the Google Cloud documentation.

 

 

Creating a new connection to Google BigQuery

If you do not have a shared connection that includes the BigQuery data that you want to analyze, you can create a new connection. Then you can make your own data selection from the BigQuery data you have access to.

  1. Open the Files and data flyout, and click Connect to.

  2. In the list of connectors, select Google BigQuery. Then click New connection.

  3. In the Google BigQuery Connection dialog, click to launch the Google login procedure in your web browser.

  4. In your web browser, follow the instructions to log in with your Google account. In the last step, click to allow the Spotfire application access to Google BigQuery. When you have logged in, return to Spotfire.

  5. In the Google BigQuery connection dialog, select the Billing project.

  6. Optionally, if you want to access datasets that you have access to in projects that you are not a member of, enter the project ID(s) under Include data from additional projects. Optionally, select if you want to Enable repeated columns and Show public data.

  7. To connect, click OK.

  8. In the Views in Connection dialog, select the data you want to analyze in Spotfire. When you are happy with your data selection, click OK.

    Tip: For more information, see Selecting data from Google BigQuery.

  9. The final step is a summary view of the data you are about to add to your analysis. Here, you can configure if you want the data tables to be loaded as Imported, External, or On-demand.

    Tip: For more information, see Choosing how to load the data.

  10. To add the selected data to your analysis, click OK.

Selecting data from Google BigQuery

When you create or edit a connection to Google BigQuery, you select the data you want to analyze in the Views in Connection dialog. The following is information, tips, and limitations that are specific to selecting data from BigQuery in Spotfire.

Accessing different datasets

You might have access to BigQuery datasets in Google Cloud Platform projects that you are a member of, and in projects that you are not a member of.

Datasets that you have access to in projects that you are a member of are always available in Spotfire, when you have signed in with Google.

To access datasets in projects that you are not a member of, you must specify that you want to include data from those projects. Enter the Google Cloud Platform project ID(s) in the Include data from additional projects field, in the Google BigQuery Connection dialog.

Note: If you cannot see a dataset that you have access to, it might be because it comes from a project that you are not a member of. See above.

Standard SQL and Legacy SQL

In Google BigQuery, you might have worked in two different dialects of SQL; Standard SQL and Legacy SQL.

In Spotfire, only the Standard SQL dialect of SQL is supported.

When you select data from Google BigQuery, this means that:

Custom queries

When you select data from Google BigQuery with a custom query, you must write your query in Standard SQL. The Legacy SQL dialect of SQL is not supported.

If the data you want to select with a custom query contains nested, you cannot write SELECT *. The columns returned from a custom query cannot be of the type STRUCT or ARRAY. If there are nested in the data, you must unnest them in the query. For more information about unnesting, see the official documentation from Google.

Repeated columns from Google BigQuery are not supported in custom queries.

Repeated columns

You can analyze BigQuery data with repeated columns. To enable repeated columns, select Enable repeated columns when you create the connection

In Spotfire, the names of repeated columns from BigQuery are shown inside brackets []:

[column_name]

Data with repeated columns is flattened in Spotfire. For more information about how to work with repeated columns from Google BigQuery, see:

Nested columns (records)

A record column in BigQuery is a column that has other columns nested inside it.

You can recognize nested columns and their relationships from the column names in Spotfire. For example, if you have a record column Person, that has the columns Name and Age with related information about the person nested inside it, the column names in Spotfire would be:

Person.Name

Person.Age

The record column Person itself is not listed in Spotfire. The reason is that it is only a container for the nested columns inside it, and does not contain any data on its own.

Partitioned tables

In Spotfire, you see the partitioning columns in partitioned tables as parameters. When you select a partitioned table, each partitioning column is represented by two parameters. The two parameters correspond to a to and from value for the partitioning column.  

 

To input values for the parameters, you use prompts or on-demand. With prompts, you can input values for the parameters when you open the connection. This way each user can choose to limit the data selection to only data within the range of interest.

Note: If you configure your data table as an on-demand data table, you can use the parameters as the basis for limiting data in the on-demand settings. See On-demand overview for more information.

If a partitioned table has mandatory partitioning columns, prompts are automatically created for the corresponding parameters in Spotfire. If you want to use on-demand as input for the parameters, you can remove the prompts for the parameters.

When you define a prompt for a parameter based on a partitioning column, only the prompt type single selection is available. For the end user, it is mandatory to enter a value in any prompts you have defined.

If you do not define a prompt for a parameter, the data is not limited based on that parameter.

Public data

You can decide if you want BigQuery public datasets to be available for selection when you connect to Google BigQuery. To enable public datasets in your connection, select the checkbox Show public data in the Google BigQuery connection dialog.

When you enable this option, you might see very long loading times for listing data tables the Views in Connection dialog. This is due to the large amounts of BigQuery public datasets.

See also:

Details on Google BigQuery Connection

Google BigQuery Data Types

Supported Functions - Google BigQuery

Repeated Columns from Google BigQuery

Enabling Google BigQuery in Spotfire web clients