Importing data from cube data sources
When you add data tables from a data connection, you have the option to either keep the data table external or to import it into the Spotfire data engine and use it in-memory.
When you are importing cube data into Spotfire you will actually flatten the cube into a single data table, which means that some features that you have access to when working with an in-database cube are lost and you may see different results when aggregating your imported data in Spotfire compared to what you get when the external system does all the aggregation. Spotfire does not import aggregated data, only leaf nodes that then can be aggregated using the Spotfire aggregation methods.
Limit the number of dimensions
If you intend to import data from a cube into Spotfire there are a number of things that you must remember.
First of all, the full cube may be enormous, so it is recommended to always limit the dimensions to import to those that are absolutely necessary only, to reduce data explosion as much as possible. If a cube like the one shown to the left below is imported into Spotfire, you will get the table shown to the right. The visualizations you set up will then decide how to aggregate the measures from the resulting table.
When an imported cube data table is aggregated (for example, with the default Sum aggregation), the available leaf values are simply summarized when you show a categorical value of a higher level (e.g. the Sum(Sales) for France from the table above). However, in the in-db cube, it is the external system that decides what an aggregation means, which may include other aggregation methods, data that was not included in the import, and so on, which may produce a completely different result.
As an example of the importance of limiting the data as much as possible prior to import, imagine a cube with the following number of values for different dimensions.
Department: 10
Product: 5000
Currency: 20
Business Unit: 1000
Scenario: 10
Account: 100
10x5000x20x1000x10x100=1x10 12
As you can see, the amount of data very quickly explodes with the number of dimensions used, especially if each dimension contains many different values. Take care not to overload the data source by combining too many dimensions.
Data import best practices
To get the most out of your data, follow these recommendations for import of data from cubes.
- Limit the data to import as much as possible by selecting only the necessary dimensions and measures in the Data Selection in Connection dialog.
- Limit data further by clearing check boxes for undesired categories in the selected data in the Limit Data dialog.
- Click Estimate Import Size to try to evaluate whether import is possible before you try to import a data selection.
Known limitations
- On-demand is not supported for cube data sources (neither for imported cubes nor for in-database cubes).
- In contrast to data connections from relational data sources, selections for cube data connections always result in a single data table.
- You cannot switch from an imported configuration to an in-database configuration in a cube data connection, the way you can with relational data sources, or vice versa. However, you can always replace a data table based on a cube data connection with another data table.
- When cube hierarchies have been added to a connection, the hierarchies in Spotfire will not be updated when the data connection is edited. However, you can always recreate new hierarchies manually.
- Import of large amounts of data may not work. If you try to import and it fails, edit the connection and limit the selected dimensions and/or categories some more.
- The estimation of the expected import size is only an approximation and cannot give any definitive information regarding whether or not an import will work. If the estimation takes a long time it often, but not always, means that the result of the selection is too big to import.
- If multiple instances of one leaf value exist you may get different behavior between imported data and in-db data, since the imported data may have lost hierarchical information required in order to distinguish the leaf values from each other. For example, there is more than one city called Berlin in the world and after an import, all of those cities will be clumped together into one group in the flattened data table, unless you make sure to always include full hierarchies on all axes.
- You will always get the leaf nodes for those branches in the hierarchies you select. This means that you can only select full hierarchies, not single columns from a hierarchy.