Spotfire® Web Client User Guide

Working with large data volumes

When you are working with massive amounts of data, some operations will inevitably take time. However, with Spotfire you do not have to hesitate to try different alternatives. You can always cancel an operation if it looks like it will take a long time, and you can undo an operation, or switch to a different alternative (for example, switch to a column with fewer unique values on an axis), if you do not want to wait for the calculations to finish.

However, here are a few tips that can be useful when you are working with large data tables and want to increase the performance of your analysis:

Visualizations and analyses

  • Use aggregated visualizations as a starting point, and use detail visualizations only for smaller, filtered portions of the data. Visualizations with many graphical elements take time to render. This is especially important on the web client, which does not allow hardware acceleration.
  • Consider whether there are alternative ways to visualize your data and still see the same thing. Can you use a different visualization type, or partly aggregate the data? For example, binning can be used to aggregate markers in a scatter plot while still showing the distribution. Using the bin sliders, you can increase the number of markers shown until making changes takes too long.
  • Be aware that sorting in cross tables and similar visualizations takes time.
  • Hide or delete unused filters (or do not create filters for external columns unless you have to).
  • Use the list box filter or the text filter rather than the item filter when working with columns with many unique values. Item filters are costly to show, even when they are not used. If you have old analysis files that use item filters for these types of columns, you can manually change the filter type to a list box or text filter and save the file again.
  • Some types of aggregations are more time-consuming than others. For example, use average rather than median, if possible.
  • Use the data type real rather than currency. The currency formatter can be applied to the real data type.
  • Use the filters panel or the data in analysis flyout instead of adding many filters to text areas. Many filters in text areas can make the analysis less responsive.
  • Calculated values (labels) and sparklines in text areas might also lead to less responsive analyses.
  • Use post-aggregation expressions for all expressions that include OVER, since these calculations are faster when performed on an already aggregated view.
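As a sketch of the post-aggregation tip (the column names here are hypothetical), a Spotfire custom expression can run the OVER part on aggregated values by using the THEN keyword, where [Value] refers to the result of the preceding aggregation:

```
Sum([Sales]) THEN [Value] / Sum([Value]) OVER (All([Axis.X]))
```

This computes each category's share of the total from the already aggregated sums, rather than repeating the OVER calculation across the raw rows.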
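To make the binning tip above concrete, here is a small sketch outside Spotfire, using made-up random data and an arbitrary 50x50 grid: binning collapses one marker per row into at most one marker per non-empty grid cell, so render cost stops growing with the row count.

```python
import numpy as np

# Made-up example data: 1,000,000 (x, y) points that would each become
# a marker in an unaggregated scatter plot.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = rng.normal(size=1_000_000)

# Bin the points into a 50x50 grid (grid size chosen for illustration).
# Each non-empty cell can be drawn as a single marker whose size or
# color encodes how many rows fell into it.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=50)

print("markers before binning:", x.size)                   # 1000000
print("markers after binning: ", int((counts > 0).sum()))  # at most 2500
```

Increasing the number of bins (as the bin sliders do) trades render cost back for detail, which is why the tip suggests raising it only until changes start taking too long.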

Hardware

  • Use a fast solid-state drive (SSD) if possible (when data or analyses are stored on disk).
  • Do not run other applications on the same computer when working with large data volumes.

Loading data

  • Use sorted input on categorical columns.
  • Loading data from an SBDF file is much faster than from a TXT file.
  • If the data is in a tall and skinny format rather than a short and wide one, you may obtain better performance.
  • Remove invalid values from your data before importing into Spotfire.
  • If you intend to import data from an external data source, limit the selected data as much as possible prior to import. This will increase the chances that the import is successful.
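The tall-and-skinny tip can be illustrated with a small, generic reshaping sketch (the table and column names are made up). The same rewiring can be done on import with a data transformation such as unpivot:

```python
# Made-up wide table: one row per product, one column per month.
wide = [
    {"product": "A", "jan": 10, "feb": 12, "mar": 9},
    {"product": "B", "jan": 7, "feb": 8, "mar": 11},
]

# The same data reshaped tall and skinny: one row per product/month pair
# and a single value column, so the table keeps 3 columns no matter how
# many months are added.
tall = [
    {"product": row["product"], "month": month, "value": row[month]}
    for row in wide
    for month in ("jan", "feb", "mar")
]

print(len(tall))  # 6 rows of 3 columns, instead of 2 rows of 4 columns
```

Fewer columns means less per-column work (filters, data types, axis selectors), which is where the performance benefit tends to come from.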

Data export

  • Export from a data table rather than from a table visualization.
  • Export to SBDF rather than to TXT.

Web client

  • Avoid visualizations with many graphical elements, if possible.
  • Use scheduled updates, when possible.

Preferences (on-premises only)

  • An administrator using the installed client can modify the MarkingWhereClauseLimit or the MarkingInQueryLimit preference (under Tools > Administration manager > Preferences > DataOptimization). With lower limits, the allowed complexity of marking queries is reduced. This is important when working with external data sources. See Preferences Descriptions in the Administration Manager help for more information.
  • Switch off the automatic creation of filters. This can be done for a specific data table in the data canvas Settings, and for all new in-memory data tables under Tools > Options > Document (installed client only).

API

  • Prefer iterator-based data access over random access: use the DataRowCursor APIs rather than GetValue(rowindex)-style APIs.
  • Be careful when using custom comparers; depending on usage, they may become a bottleneck. Consider whether the problem can be solved in another way.
  • If things are slow and you are using old custom extensions, see if they can be refactored or if some time-consuming steps can be removed. Some APIs are slow by nature, and old code might benefit from refactoring. Try loading the analysis without any extensions to see whether one of them is the culprit.
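The iterator-versus-random-access tip can be sketched generically (this is an illustration in Python, not the Spotfire .NET API itself): a cursor streams all rows in one pass, while a per-index accessor has to re-enter the data store once per row.

```python
# Hypothetical column store with a counter showing how often each
# access style re-enters the store.
class ColumnStore:
    def __init__(self, values):
        self._values = list(values)
        self.lookups = 0  # incremented on every entry into the store

    def get_value(self, row_index):
        # GetValue(rowIndex)-style random access: one lookup per row.
        self.lookups += 1
        return self._values[row_index]

    def cursor(self):
        # DataRowCursor-style iteration: one lookup, then stream rows.
        self.lookups += 1
        yield from self._values

store = ColumnStore(range(1000))

total_random = sum(store.get_value(i) for i in range(1000))
random_lookups = store.lookups

store.lookups = 0
total_cursor = sum(store.cursor())

print("random-access lookups:", random_lookups)  # 1000
print("cursor lookups:       ", store.lookups)   # 1
```

Both styles produce the same total, but the per-row overhead of random access is what tends to dominate in custom extensions that loop over large tables.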