TIBCO Cloud™ Spotfire® Web Client User Guide

Connector for Apache Spark SQL — Features and settings

You can connect to and access data from Spark SQL databases and Databricks with the data connector for Apache Spark SQL. On this page, you can find information about the capabilities, available settings, and things to keep in mind when you work with data connections to Apache Spark SQL.


Connector features

The following functionality is available when you access data with the connector for Apache Spark SQL.

Feature Supported?
Load methods
  • Import (in-memory)
  • External (in-database)
  • On-demand
Custom queries Yes
Stored procedures Yes
Custom connection properties Yes
Single sign-on with identity provider Yes
Authoring in web client Yes
Supported on Linux Web Player Yes

Data source properties

The following are the supported data source properties that you can configure when you create a data connection with the connector for Apache Spark SQL.

Option Description
Server The name of the server where your data is located.

To include the port number that the Spark Thrift Server listens on, add it directly after the server name, preceded by a colon.

Example: MyDatabaseServer:10001

Default port number: 10000

Authentication method The authentication method to use when logging into the database. The following options are available:
  • No authentication
  • Kerberos
  • Username
  • Username and password
  • Microsoft Azure HDInsight Service
  • Identity provider (OAuth2)
Host FQDN [Only available for Kerberos authentication.]

The fully qualified domain name of the Spark Thrift Server host. For more information about the host FQDN, contact your Apache Spark SQL system administrator.

Service name [Only available for Kerberos authentication.]

The Kerberos service principal name of the Spark server. For example, "spark". For more information about the service name, contact your Apache Spark SQL system administrator.

Realm [Only available for Kerberos authentication.]

The realm of the Spark Thrift Server host. Leave blank if a default Kerberos realm has been configured for your Kerberos setup. For more information about the realm, contact your Apache Spark SQL system administrator.
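
Together, these three settings identify the Kerberos service principal of the Spark Thrift Server. For example, with the hypothetical values Host FQDN sparkserver.example.com, Service name spark, and Realm EXAMPLE.COM, the resulting service principal would be spark/sparkserver.example.com@EXAMPLE.COM.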

Identity provider [Only applicable for Identity provider (OAuth2) authentication.]

Select the identity provider you want to use for logging in to the data source. The options available in the drop-down menu are the identity providers you have added to the OAuth2IdentityProviders preference.

Scopes [Only applicable for Identity provider (OAuth2) authentication.]

Scopes determine what permissions Spotfire requests on your behalf when you log in to the data source.

Default: Use the default scopes that you have specified for your identity provider in the OAuth2IdentityProviders preference.

Custom: Enter scopes manually in the text box. Separate values with a space.

Example: Scope_1 Scope_2
Thrift transport mode Select the transport mode that should be used to send requests to the Spark Thrift Server. The following options are available:
  • Default (The Spark SQL ODBC driver will use either binary or SASL, depending on the version of the Spark server you are connecting to.)
  • Binary
  • SASL
  • HTTP
HTTP Path [Only available for Thrift transport mode HTTP.]

Specify the partial URL that corresponds to the Spark server you are connecting to.

Note: The partial URL is appended to the host and port specified in the Server field.

For example, to connect to the HTTP address http://example.com:10002/gateway/default/spark, you would enter the following:

Server: example.com:10002
HTTP Path: /gateway/default/spark
Connection timeout (s) The maximum time, in seconds, allowed for a connection to the database to be established. The default value is 120 seconds.
Command timeout (s) The maximum time, in seconds, allowed for a command to be executed. The default value is 1800 seconds (30 minutes).

Custom properties for Apache Spark SQL connection data sources

The following is the default list of driver settings that are allowed as custom properties in Apache Spark SQL connection data sources. To learn how to change the allowed custom properties, see Controlling what properties are allowed.

Default allowed custom properties

ADUserNameCase, AOSS_AuthMech, AOSS_CheckCertRevocation, AOSS_Min_TLS, AOSS_PWD, AOSS_TrustedCerts,
AOSS_UID, AOSS_UseSystemTrustStore, AsyncExecPollInterval, AutoReconnect, BinaryColumnLength, 
Canonicalization, CheckCertRevocation, ClientCert, ClientPrivateKey, ClientPrivateKeyPassword, 
ClusterAutostartRetry, ClusterAutostartRetryTimeout, DecimalColumnScale, DefaultStringColumnLength, 
DelegateKrbCreds, DelegationUID, DriverConfigTakePrecedence, EnableAsyncExec, EnablePKFK, 
EnableQueryResultDownload, EnableStragglerDownloadMitigation, EnableSynchronousDownloadFallback,
FastSQLPrepare, ForceSynchronousExec, HTTPAuthCookies, InvalidSessionAutoRecover, LCaseSspKeyName,
MaximumStragglersPerQuery, Min_TLS, ProxyHost, ProxyPort, ProxyPWD, ProxyUID, QueryTimeoutOverride,
RateLimitRetry, RateLimitRetryTimeout, RowsFetchedPerBlock, ServiceDiscoveryMode, ShowSystemTable, 
SocketTimeout, StragglerDownloadMultiplier, StragglerDownloadPadding, StragglerDownloadQuantile, 
ThrowOnUnsupportedPkFkRestriction, TrustedCerts, TwoWaySSL, UseNativeQuery, UseOnlySSPI, UseProxy, 
UseSystemTrustStore, UseUnicodeSqlCharacterTypes
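
For example, if large result sets load slowly, you might add the custom property RowsFetchedPerBlock with a larger value to change how many rows the driver fetches per request, or adjust SocketTimeout if idle connections are closed too soon. The exact semantics of each property are defined by the Spark SQL ODBC driver, so consult the driver documentation before changing them.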

Supported data types

When you are setting up a connection to an external data source, Spotfire needs to map the data types in the data source to data types in Spotfire. The following are the data types that the Apache Spark SQL connector supports.

Database data type Spotfire data type
BINARY Binary
BOOLEAN Boolean
TIMESTAMP DateTime
TINYINT Integer
SMALLINT Integer
INT Integer
BIGINT LongInteger
DOUBLE Real
FLOAT SingleReal
STRING String
DECIMAL (precision p, scale s) The Spotfire data type depends on the precision (p) and scale (s) of the column. The rules below are evaluated from top to bottom, and the first matching rule applies:

When p = 0 and s = 0: Currency

When p <= 9 and s = 0: Integer

When p <= 18 and s = 0: LongInteger

When p <= 15: Real

Otherwise: Currency
Note: DECIMAL columns from temporary tables/views are always mapped to the Spotfire data type Currency, because their precision and scale are reported as unlimited (p = 0, s = 0).
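
To illustrate how these rules combine, the following is a minimal Python sketch of the mapping logic (purely illustrative; this is not Spotfire's actual implementation):

    def spotfire_type_for_decimal(p, s):
        # Illustrative only: rules are checked top to bottom; the first match wins.
        if p == 0 and s == 0:
            return "Currency"  # unlimited precision, e.g. temporary tables/views
        if p <= 9 and s == 0:
            return "Integer"
        if p <= 18 and s == 0:
            return "LongInteger"
        if p <= 15:
            return "Real"
        return "Currency"

    # Examples: DECIMAL(8, 0) -> Integer, DECIMAL(12, 0) -> LongInteger,
    # DECIMAL(10, 4) -> Real, DECIMAL(20, 5) -> Currency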

Supported functions

Supported functions are the functions that you can use when you work with in-database data tables, for example, in calculated columns and custom expressions; see the expression example after the table below.

Note: Some supported functions might not be possible to use with your database. This depends on what functions are available in the database, which often differs between database versions and types.

The following are the functions that the Apache Spark SQL connector supports.

Function type Functions supported
Date and Time DateDiff, Date_Add, Date_Sub, Day, DayOfMonth, From_utc_timestamp, Hour, Minute, Month, Quarter, Second, To_date, To_utc_timestamp, Week, WeekOfYear, Year
Conversion SN
Math Abs, ACos, ASin, Atan, Bin, Ceil, Ceiling, Conv, Cos, Degrees, E, Exp, Floor, Hex, Ln, Log, Log2, Log10, Negative, Pi, Pmod, Positive, Pow, Power, Radians, Rand, Round, Sign, Sin, Sqrt, Tan
Operators %, +, -, *, /
Statistical Avg, Bit_And, Bit_Or, Bool_And, Bool_Or, Corr, Count, Covar_pop, Covar_samp, Max, Min, Percentile, StdDev_Pop, StdDev_Samp, Sum, UniqueCount, Variance, Var_Pop, Var_Samp
Text ASCII, Concat, Concat_ws, Find_in_set, Get_json_object, Instr, Length, Locate, Lower, Lcase, LPad, LTrim, Parse_url, Regexp_extract, Regexp_replace, Repeat, Reverse, RPad, RTrim, Space, Translate, Trim, Ucase, Upper
Other supported functionality
  • Temporary views / temporary tables
  • Global temporary views
Note: Binning is not supported by this connector.
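
As an example of the functions above in use, a custom expression on an in-database data table could look like Concat(Upper([FirstName]), " ", [LastName]) or Round([Price] / [Quantity], 2). The column names here are hypothetical, and whether a given function works depends on what your database supports, as noted above.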