Accessing data from Apache Spark SQL and Databricks
You can access data from Spark SQL and Databricks systems in Spotfire.
Before you begin
- The Apache Spark SQL connector requires a driver on the computer running Spotfire. See Drivers and data sources in Spotfire.
- To make sure that your database is supported, see the system requirements for the Apache Spark SQL connector.
Procedure
- Open the Files and data flyout, and click Connect to.
- In the list of data sources, select Apache Spark SQL or Databricks.
- In the panel on the right, choose whether you want to create a new connection or add data from a shared data connection.
- Connector for Apache Spark SQL — Features and settings
You can connect to and access data from Spark SQL databases and Databricks with the data connector for Apache Spark SQL. On this page, you can find information about the capabilities, available settings, and things to keep in mind when you work with data connections to Apache Spark SQL.
Working with and troubleshooting Apache Spark SQL data connections
Prerequisite: Spark Thrift Server
To access data in Apache Spark SQL with the Spotfire connector for Apache Spark SQL, the Spark Thrift Server must be installed on your cluster. Spark Thrift Server provides access to Spark SQL via ODBC, and it might not be included by default on some Hadoop distributions.
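If you administer the cluster yourself, the Thrift server is typically started with the script that ships with the Spark distribution. A minimal sketch, assuming a standard Spark installation under $SPARK_HOME (the master URI and port below are examples; adjust them to your environment):
$SPARK_HOME/sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10000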
Prerequisite: spark.shuffle.service.enabled
If you use the in-database load method when connecting to Apache Spark 2.1 or later, and you encounter errors in your analysis, the option spark.shuffle.service.enabled might have to be enabled on the Spark server.
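As a sketch of what enabling the option can look like, you could set it in spark-defaults.conf on the Spark server (the exact configuration procedure depends on your distribution and cluster manager):
spark.shuffle.service.enabled true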
Connecting to Databricks SQL Analytics
You can also create an Apache Spark SQL connection to run Databricks SQL Analytics queries. To connect to Databricks, you must install the Databricks ODBC driver. Check the system requirements for the Apache Spark SQL connector, and see Drivers and data sources in Spotfire to find the right driver.
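If you configure the driver manually, for example in an ODBC DSN, a Databricks connection typically specifies the workspace host, an HTTP path, and token authentication. A hypothetical sketch (property names follow the Databricks ODBC driver documentation; verify them against your driver version, and replace the placeholders with values from your workspace):
Host=<workspace-host>; Port=443; HTTPPath=<http-path>; ThriftTransport=2; SSL=1; AuthMech=3; UID=token; PWD=<personal-access-token>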
Databricks cluster that is not running
When you connect to a Databricks cluster that is not already running, the first connection attempt triggers the cluster to start, which can take several minutes. The Database selection menu is populated once Spotfire has connected successfully. If the connection times out, you might have to click Connect again.
Apache Spark SQL temporary views and tables in custom queries
If you are creating a custom query and you want to use data from an Apache Spark SQL temporary table or view, you must refer to those objects by their qualified names, which specify both the name and the location of the object. Qualified names have the following format:
databaseName.tempViewName
By default, global temporary views are stored in the global_temp database. The database name can vary, and you can see it in the hierarchy of available database tables in Spotfire.
To select all columns from a global temporary view named myGlobalTempView that is stored in the global_temp database:
SELECT * FROM global_temp.myGlobalTempView
Temporary views/tables (listed in Spotfire under 'Temporary views' or 'Temporary tables') are always located in the #temp database. To select all columns in a temporary view named myTempView:
SELECT * FROM #temp.myTempView
User agent tagging
If the ODBC driver that you use supports the UserAgentEntry option, Spotfire includes the following string as the UserAgentEntry in queries:
TIBCOSpotfire/<ProductVersion>
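For example, with a hypothetical Spotfire version 12.5.0, the entry sent by the driver would look like the following (the actual version string depends on your installation):
TIBCOSpotfire/12.5.0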