Working With Large Data Volumes

When you are working with massive amounts of data, certain operations may take a long time to perform. However, with TIBCO Spotfire you do not have to be afraid to try out different alternatives. You can always cancel an operation if it looks like it is going to take a long time. You can also undo an operation, or switch to a different alternative (e.g., a column with fewer unique values on an axis), if you do not want to wait for the calculations to finish.

Here are a few tips that can be useful when you are working with large data tables and want to improve the performance of your analysis:

Visualizations and Analyses

Use aggregated visualizations as a starting point and use details visualizations for smaller, filtered portions of data only. Visualizations with many graphical elements take time to render. This is especially important on the web player, which does not allow hardware acceleration.
Think about whether there are any alternative ways you can visualize your data in order to see the same thing. Can you use a different visualization type? Or partly aggregate the data? For example, binning can be used to aggregate markers in a scatter plot and still allow you to see a distribution. Using the bin sliders, you can increase the number of markers shown until changes start to take too long.
Sorting in cross tables, etc., takes time.
Analysis files which have previously been saved in version 4.5 or older can be saved in the latest version to shorten the time it takes to load the file.
Hide or delete unused filters (or do not create filters for external columns unless you have to).
Use the list box filter or the text filter rather than the item filter when working with columns with a lot of unique values. Item filters are costly to display, even when they are not used. If you have old analysis files that use item filters for these types of columns, it is recommended to manually change the filter type to a list box or text filter and save the file again.
Some types of aggregations are more time consuming than others. For example, use average rather than median, if possible.
Use the data type real rather than currency. The currency formatter can be applied to the real data type.
It is recommended to use the filters panel or the data panel instead of adding a lot of filters to text areas. Filters in text areas can make the analysis seem unresponsive. The more filters you add to the text area, the less responsive the application becomes.
Calculated values (labels) and sparklines in text areas may also give rise to unresponsive analyses.
Use post-aggregation expressions for all expressions that include OVER, since these calculations are faster when done on an already aggregated view.
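
The binning tip above can be illustrated outside Spotfire. The following is a minimal sketch in plain Python, not Spotfire's own binning implementation: the function name bin_markers and the grid sizes are assumptions made for illustration. It groups scatter plot points into a grid of cells so that one marker is drawn per non-empty cell instead of one per row.

```python
import random
from collections import Counter


def bin_markers(points, x_bins, y_bins):
    """Aggregate (x, y) points into a grid of x_bins by y_bins cells.

    Returns a Counter mapping (x_bin, y_bin) to the number of points
    that fall in that cell.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def bin_index(value, low, high, bins):
        if high == low:
            return 0  # all values identical: a single bin
        index = int((value - low) / (high - low) * bins)
        return min(index, bins - 1)  # the maximum value goes in the last bin

    counts = Counter()
    for x, y in points:
        cell = (bin_index(x, x_min, x_max, x_bins),
                bin_index(y, y_min, y_max, y_bins))
        counts[cell] += 1
    return counts


# Hypothetical data: 100,000 rows drawn from a normal distribution.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]

binned = bin_markers(points, 20, 20)
print(len(points), "rows ->", len(binned), "binned markers")
```

With a 20 by 20 grid there can never be more than 400 markers to render, whatever the number of rows, and the counts per cell still show the shape of the distribution. Increasing the number of bins corresponds to moving a bin slider toward more detail.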

Hardware

Use a fast solid-state drive (SSD) if possible.
Do not run other applications on the same machine when working with large data volumes.

Loading Data

Use sorted input on categorical columns.
Loading data from an SBDF file is much faster than from TXT.
If the data is in a tall and skinny format rather than a short and wide one, you may obtain better performance.
Remove invalid values from your data before importing into Spotfire.
If you intend to import data from an external data source, limit the selected data as much as possible prior to import. This will increase the chances that the import is successful.
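
To make the tall and skinny tip concrete, here is a minimal sketch in plain Python of reshaping a short and wide table into a tall and skinny one before import, comparable to an unpivot. The function name wide_to_tall, the column names "Measure" and "Value", and the sample data are assumptions made for illustration, not part of Spotfire.

```python
def wide_to_tall(rows, id_column, value_columns,
                 name_column="Measure", value_column="Value"):
    """Turn one wide row per id into one tall row per (id, measure) pair."""
    tall = []
    for row in rows:
        for column in value_columns:
            tall.append({
                id_column: row[id_column],
                name_column: column,       # the former column name
                value_column: row[column],  # the value it held
            })
    return tall


# Hypothetical wide table: one column per quarter.
wide = [
    {"Region": "North", "Q1": 10, "Q2": 20, "Q3": 30},
    {"Region": "South", "Q1": 15, "Q2": 25, "Q3": 35},
]

tall = wide_to_tall(wide, "Region", ["Q1", "Q2", "Q3"])
for row in tall:
    print(row)
```

The tall table has three columns regardless of how many quarters are added later; new periods become new rows rather than new columns.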

Data Export

Export from a data table rather than from a table visualization.
Export to SBDF rather than to TXT.

Web Player

Avoid visualizations with many graphical elements (without hardware acceleration, rendering takes a very long time).
Use scheduled updates, when possible.

Preferences

An administrator can modify the MarkingWhereClauseLimit or the MarkingInQueryLimit preference (under Administration Manager > Preferences > DataOptimization). With lower limits, the allowed complexity of marking queries is reduced. This is important when working with external data sources. See Preferences Descriptions in the Administration Manager help for more information.
Switch off the automatic creation of filters. This can be turned off for a specific data table in the Data Table Properties dialog, and for all new in-memory data tables under Tools > Options – Document.

API

Prefer iterator-based data access over random access. Use DataRowCursor APIs rather than GetValue(rowindex)-style APIs.
Be careful when using custom comparers, since depending on usage they may become a bottleneck. Consider whether the problem can be solved in other ways.
If things are slow and you are using old custom extensions, see if they can be refactored or if some time-consuming steps can be removed. Some APIs are slow by nature, and old code might benefit from some refactoring. Try loading without any extensions to see if one of them may be the culprit.
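
One general reason why iterator-based access can outperform random access is sketched below in plain Python. This is a toy model, not the Spotfire API: the class RunLengthColumn and its run-length storage are assumptions made for illustration, and nothing here describes how Spotfire actually stores columns. In the toy model, each random-access call has to search for the run that contains the requested row, while a sequential pass simply walks the runs in order.

```python
from bisect import bisect_right
from itertools import accumulate


class RunLengthColumn:
    """Toy column stored as (value, run_length) pairs (hypothetical)."""

    def __init__(self, runs):
        self._values = [value for value, _ in runs]
        self._lengths = [length for _, length in runs]
        # First row of each run, e.g. lengths [3, 2, 4] -> starts [0, 3, 5].
        self._starts = [0] + list(accumulate(self._lengths))[:-1]
        self.row_count = sum(self._lengths)

    def get_value(self, row_index):
        """Random access: every call searches for the run holding the row."""
        if not 0 <= row_index < self.row_count:
            raise IndexError(row_index)
        return self._values[bisect_right(self._starts, row_index) - 1]

    def __iter__(self):
        """Cursor-style access: one sequential pass with no per-row search."""
        for value, length in zip(self._values, self._lengths):
            for _ in range(length):
                yield value


column = RunLengthColumn([("A", 3), ("B", 2), ("C", 4)])

# Both access styles return the same values; only the cost per row differs.
by_index = [column.get_value(i) for i in range(column.row_count)]
by_cursor = list(column)
print(by_index == by_cursor)
```

Both loops visit every row, but the index-based loop repeats a lookup for each row that the sequential pass avoids. Code that reads whole columns is therefore usually better expressed as a single forward pass.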