Working With In-Database Data


When you are working with data from an external data source (in-database or in-db data) there are a number of features that you can use with in-memory data that are unavailable. See below for more information.

One thing to think about when working with in-db data is that changes to the underlying database schema will not automatically be reflected in the Spotfire analysis. This means that if a column is added to a database table you need to perform a Refresh Schema operation in Spotfire in order to see the new column in the analysis. Not all users will have the sufficient database privileges to perform a full schema refresh. However, changes to the actual data can be updated using a simple Reload/Refresh Data by most users.

Note: Before you can work with in-db data in Spotfire there are few prerequisites that need to be met (see the System Requirements at http://support.spotfire.com/sr.asp for details):

Limitation

Why?

The table visualization and the Details-on-Demand (D-o-D) cannot display more than 10000 rows from an in-db data source. The table and the D-o-D do not support data from cubes. Virtual columns are not supported for in-db data.

There is a preset limit of 10000 rows to prevent large databases from overflowing Spotfire.

 

Marking in table visualizations is only supported if the underlying database table has a primary key defined.

A primary key is required to know which rows to mark.

The summary table cannot be used with in-db data.

The summary table has not yet been adjusted to be able to handle in-db data.

The box plot cannot be used with in-db data.

The box plot requires both aggregated and non-aggregated data and has not yet been adjusted to handle in-db data.

Scatter plots and 3D scatter plots need to be configured as aggregated visualizations when using in-db data.

The number of rows available on an external data source may be too large to handle within Spotfire.

Dendrograms can only be shown in heat maps if the X-axis is set to (None), when using in-db data. Highlighting and marking in the dendrogram is not supported.

Dendrograms are row-index based and in-db data tables do not have any row index.

Not all of the standard Spotfire aggregation methods and expressions are available on all external systems. On the other hand, there may be other methods available that Spotfire does not have by default.

The aggregation methods supported by the external data source determines which methods will be available for in-db data.

 

OVER expressions must be written as post-aggregation expressions, or as custom expressions, where applicable.

OVER expressions must be performed on the aggregated view. See Using Expressions on Aggregated Data (the THEN Keyword) for more information.

Automatically created date and time hierarchies are not available for in-db data.

The shortcuts for setting up date and time hierarchies have not been adjusted to handle in-db data.

Transformations are not available for in-db data.

 

Since no data is stored within the analysis in runtime, it is not possible apply any transformation on in-db data.

(However, data tables based on external data connection views can be imported if they are added using the Add Data Tables dialog. Once the data is in-memory you can use Insert > Transformation to apply transformations.)

If the connection to the external data source is lost, no further analysis using that data source can be performed.

Since no external data is stored within the analysis in runtime and each change induces a new query to be sent to the external data source, the data source must be available as long as changes to the visualizations are to take place.

Tags are not available for in-db data.

Tags are dependent on row numbers which are not available for in-db data.

You cannot delete columns or rows from in-db data tables.

In-db data tables contain no data in runtime so nothing can be deleted.

(However, you can hide columns from the source tables when selecting tables in the Views in Connection dialog (for relational data sources) or the Data Selection in Connection dialog (for cube data sources).

You cannot apply any of the following tools on in-db data tables:

Insert Columns,

Insert Rows,

Insert Binned Column,

Data Relationships,

K-means Clustering,

Line Similarity,

Data Functions,

Regression Modeling,

Classification Modeling,

Insert Predicted Columns

In-db data tables contain no data in runtime and can, hence, not incorporate any other data.

Insert > Calculated Column has the following limitations when working with in-db data:

* No cube connectors are supported.

* Some connectors (e.g., Cloudera, Hortonworks and Vertica) do not support aggregation methods.

* Binning is not supported.

* Post-aggregation expressions (the THEN keyword) are not supported, which means that OVER expressions are not supported.

The expression for the calculated column is evaluated in the underlying database management system. This means that only methods supported by the current connector will be available.

Cube attribute or dimension columns cannot be used on numeric axes.

Since all cube dimensions are mapped as String when cube data is added to Spotfire, these columns cannot be used on any numeric axes even if the name of the column may indicate that the result is numeric.

By importing parts of the cube data it is possible to convert string data to other data types.

On-demand is currently not supported for cube data sources (neither for imported cubes nor for in-database cubes).

On-demand has not yet been adjusted to be able to handle cube data.

In contrast to data connections from relational data sources, selections for cube data connections always result in a single data table.

The data selection steps for cube data sources are currently focused on limiting the flattened cube data, not on viewing different slices simultaneously.

Auto-bin Column only works for certain connecors; e.g., Teradata, Oracle, PostgreSQL and IBM Netezza.

The ability to use width_buckets for auto-binning is only supported by some external data sources.

Export Data from Data Table cannot be used when exporting from in-db data tables.

In-db data tables contain no data in runtime so nothing can be exported from the data table.

Tip: Use the Export Data from Visualization option when you want to export from in-db data tables. Note that for all visualizations other than the table and the cross table you need to mark the items of interest before opening the Export Data dialog.

String comparisons between different systems may provide different results due to handling of trailing spaces.

External systems that adhere to the SQL-92 standard (e.g., SQL Server, MySQL and Netezza) will trim trailing spaces for strings.

You need to provide wildcards to obtain fuzzy matching in text filters when working with in-db data.

Typing a part of a word with in-db data will only search for the exact match of those letters. (With in-memory data there will automatically be a wildcard appended to the entered letters.)

If you have multiple filters for an in-db data table, you will not get the visual clue indicating which values have been filtered out using the other filters, the way you do with in-memory data tables.

While graying out values filtered out by other filters is a costly operation, there is currently no visual connection between filters for in-db data tables.

See also:

Data Overview