Configuring data functions

When you run a data function, you must define how the input and output parameters of the data function definition should be handled within your analysis. To use the data function in an analysis, you must map all of its required parameters to Spotfire.

Before you begin

Data functions have been created using the installed Spotfire client and saved in the library.

You can configure the parameters in the flyout when running a data function from the Files and data flyout or the f(x) flyout, or from the data canvas when editing inputs or outputs. In the installed client only, you can also use the Edit Parameters dialog, which you can reach from Data > Data function properties, or from the Tools > Register Data Function dialog when creating and testing data functions.

Note: All of the required inputs and all outputs (except those you explicitly exclude in the summary view) must be configured to run the data function. Optional inputs can be left without being configured.
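
The rule in the note can be expressed as a small check. The following Python sketch is illustrative only and is not part of any Spotfire API; the Parameter class, the missing_configuration function, and all parameter names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Parameter:
    """Hypothetical model of one data function parameter (not a Spotfire class)."""
    name: str
    required: bool = True          # only meaningful for inputs
    handler: Optional[str] = None  # for example "Column" or "Value"; None means not configured
    excluded: bool = False         # only meaningful for outputs


def missing_configuration(inputs: List[Parameter], outputs: List[Parameter]) -> List[str]:
    """Return the names of parameters that still block the data function from running."""
    missing = [p.name for p in inputs if p.required and p.handler is None]
    missing += [p.name for p in outputs if not p.excluded and p.handler is None]
    return missing


inputs = [
    Parameter("sales", required=True, handler="Column"),
    Parameter("threshold", required=False),   # optional, may stay unconfigured
    Parameter("region", required=True),       # required, not yet configured
]
outputs = [
    Parameter("forecast", handler="Add as new data table"),
    Parameter("diagnostics", excluded=True),  # explicitly excluded output
    Parameter("score"),                       # output with no handler yet
]

print(missing_configuration(inputs, outputs))  # ['region', 'score']
```

In this sketch the optional input and the excluded output do not block execution, while the unconfigured required input and the unconfigured output do.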

General configuration

Determine whether to Refresh function automatically.
Select this check box to update the results from the data function automatically each time the input settings are changed. If the check box is cleared, a manual refresh is needed for any updates to take effect. You can refresh from the data canvas, from the Data function properties dialog in the installed client, or by adding an action control to a text area.

A data function configured to refresh automatically will switch to manual refresh if cyclic dependencies are detected in the analysis.
If you make configurations through the Edit Parameters dialog, and the data function is run using one of the services (Spotfire® Service for R, Spotfire® Enterprise Runtime for R - Server Edition (also known as the TERR™ Service), or Spotfire® Service for Python), you can determine whether to Always run in separate session.

(For the discontinued TIBCO Spotfire Statistics Services server, the default was to not re-use engines within a session.)

In certain scenarios, a specific data function should run in a separate engine session rather than together with other data functions. This can be required if you have package conflicts between data functions, or if a package can only be used once with acceptable performance. With this setting, a new, separate engine session is created and closed for each invocation of the data function.

Note that this means that many more engine sessions are used, even within a single analysis user session, so the service might need to be scaled and configured accordingly to handle the potentially increased number of engines required. One configuration that could help in this scenario is to increase the engine.queue.size for the service to avoid people having to wait for a new engine session. However, this will increase the number of idle resources. See the documentation about allowed engines for your service for more information.
If you make configurations through the Edit Parameters dialog, and the data function can be executed either locally or on a server (only available for some engines), you can also specify the Run location for the execution. Available options are Default, Force Local, or Force Server. If you select Default, the data function is executed using the preferences specified by the administrator or the settings specified using Tools > Options.
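
The automatic refresh setting described earlier in this section depends on the analysis being free of cyclic dependencies. How Spotfire detects such cycles is not described here; the following Python sketch only shows what a cyclic dependency means, using a generic depth-first search. The has_cycle function and the item names are hypothetical.

```python
from typing import Dict, List


def has_cycle(depends_on: Dict[str, List[str]]) -> bool:
    """Return True if the dependency graph contains a cycle (generic DFS, not Spotfire code)."""
    visiting, done = set(), set()

    def visit(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True          # reached a node that is still on the current path
        visiting.add(node)
        for dep in depends_on.get(node, []):
            if visit(dep):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in list(depends_on))


# Hypothetical analyses: each item lists what it reads from.
acyclic = {"function_a": ["table_1"], "table_2": ["function_a"]}
cyclic = {"function_a": ["table_2"], "table_2": ["function_a"]}

print(has_cycle(acyclic))  # False
print(has_cycle(cyclic))   # True
```

In the second example the data function reads from a table that is itself produced by the data function, so refreshing automatically would never settle.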

Inputs

If the data function is designed to use input parameters to determine what the calculation is based on, you must configure your inputs.

For example, an input could be a value, a column, or a data table that you select from your analysis. Not all data functions require inputs, and there might also be cases where optional inputs can tweak the result from the calculation. The author of the data function determines what you must provide, by specifying the input types and which inputs are required.

Tip: If there are many inputs in your selected data function, and only some of them are required, you can click Always show required inputs first when running the data function from the flyout, to move optional inputs to the end of the list and hide them. (If you insert a data function from the Data Function Properties in the installed client, the interface to specify parameters looks different and you can instead sort on the Required column.)

Not all of the input types described below are available for every parameter; you only see options that are applicable to the current data function and your analysis.

Data table – select a data table using the drop-down list.
(The data table selector is often a first step before selecting one or more columns, but it can also be a separate input type. In this case, the input type only lists data tables where columns of all data types are allowed as inputs to the data function. If you cannot find the desired data table in the list, you can instead use the Columns option to select all columns of allowed data types from a specific data table.)
Column – select a single column from the specified data table using the drop-down list.
Columns – select one or more columns in the Select columns dialog.
Search expression – select a number of columns based on a search expression (press Enter to perform the search). This option is useful if you want to select many columns that, for example, start with the same letters.
(In the Edit Parameters dialog, this is an option under the Columns input handler.)
Custom expression ('Expression' in Edit Parameters) – specify your own expression in the Edit expression dialog.
(Use this option when adding the data function from the Files and data flyout or the f(x) flyout if you must add an input value from a data table or column property, or if you want to define your own selection.)
Value – type an input value. The value is generally accompanied by a data type selector, where you can change the data type for the entered value.
Document property – select a document property to use as input. Use the search field to help locate your property.
None – no input handler has been selected or no default exists. This can be used for optional input parameters. If the input parameter is required, you must specify an input to be able to continue.
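
To illustrate the idea behind the Search expression input type, the following Python sketch selects columns by a name pattern. It uses shell-style wildcards from Python's fnmatch module as a stand-in; Spotfire's own search syntax may differ, and the select_columns function and the column names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def select_columns(column_names: List[str], pattern: str) -> List[str]:
    """Return the columns whose names match a shell-style wildcard pattern."""
    return [name for name in column_names if fnmatchcase(name, pattern)]


columns = ["Sales 2022", "Sales 2023", "Cost 2023", "Region"]
print(select_columns(columns, "Sales*"))  # ['Sales 2022', 'Sales 2023']
```

A pattern picks up every column that starts with the same letters, which is the case where selecting columns one by one in the Select columns dialog becomes tedious.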

The Limit by option allows you to limit calculations that are based on column values or data tables to rows that match a specified combination of filterings and markings. If more than one option is selected, calculations are performed only for rows that match the intersection of the selected filterings and markings. To base calculations on all rows, do not add any limits.
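
The intersection behavior can be sketched in a few lines of Python. This is a simplified model, not Spotfire code: rows are represented by their indices, and the limited_rows function and the sample filtering and marking are hypothetical.

```python
from typing import Dict, List, Set


def limited_rows(all_rows: List[int], limits: Dict[str, Set[int]]) -> List[int]:
    """Return row indices that satisfy every selected limit (all rows if none is selected)."""
    allowed = set(all_rows)
    for rows in limits.values():
        allowed &= rows
    return sorted(allowed)


rows = list(range(8))
filtered = {0, 1, 2, 3, 4, 5}  # rows remaining after a hypothetical filtering scheme
marked = {4, 5, 6, 7}          # rows in a hypothetical marking

print(limited_rows(rows, {}))                                          # [0, 1, 2, 3, 4, 5, 6, 7]
print(limited_rows(rows, {"filtering": filtered, "marking": marked}))  # [4, 5]
```

With no limits, every row is used. With both a filtering and a marking selected, only rows that are both filtered in and marked take part in the calculation.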

Tip: When a data function instance has been added to an analysis, and the outputs are columns or rows used in a visualization, you can tweak your input values directly from the visualization to quickly try out different input values. You can also edit input and output parameters for data functions from the data canvas.

Outputs

The configuration of outputs determines what to do with the result from the calculation. For example, you might get a new data table, new columns or rows, or a document property value that can be used to define a line in a visualization. Which types of output are available depends on what your selected data function produces (Value, Column, or Table) and on what you currently have in your analysis. For example, if you do not have a data table in your analysis when running the data function, the only option you will have is to add a new data table. Once you have a data table, more options can become available.

Add as new data table – create a new data table.
Add as rows to the specified data table. See Adding rows to a data table for more information.
Add as columns to the specified data table, using a join operation. See Adding columns to a data table for more information.
(Only available when running the data function from the Files and data flyout or the f(x) flyout.)
Add as calculated columns to the selected (final) data table ('Columns' in Edit Parameters).
If you have chosen to limit the input to marked or filtered rows only, the Map result to limited rows for option (Map to input rows in Edit Parameters) lets you specify how resulting values should be added to the data table. If this check box is cleared, the results are added to the first rows in the specified data table. If you have chosen to calculate results for filtered values only, you probably want to add the results to the rows that were filtered when the calculation was performed instead. Select the input parameter to match against from the drop-down list.
Replace data table – replace a previously added data table by selecting it from the drop-down list.
Add as document property value in the analysis. You can either define a new property or update an existing one. Note that if you define a new property, it will not be created until the data function has successfully finished its execution.
Add as data table property value – select a data table and create or update a data table property.
Add as column property value – select a data table, a column in that data table, and create or update a column property.
None – no option has been selected (only available in Edit Parameters). To be able to continue, you must specify how to handle at least one output. When adding a data function from f(x) or the + flyout, you can instead choose to Exclude a particular output.
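
The effect of the Map result to limited rows for option described above can be sketched in Python. This is a simplified model under stated assumptions, not Spotfire code: the add_result_column function is hypothetical, rows are represented by their indices, and rows that receive no result are shown as None.

```python
from typing import List, Optional


def add_result_column(
    row_count: int,
    results: List[float],
    input_rows: List[int],
    map_to_input_rows: bool,
) -> List[Optional[float]]:
    """Place results in a new column of length row_count; unfilled rows stay None."""
    column: List[Optional[float]] = [None] * row_count
    if map_to_input_rows:
        # Each result goes back to the row it was calculated from.
        for row, value in zip(input_rows, results):
            column[row] = value
    else:
        # Results simply fill the table from the top.
        for row, value in enumerate(results[:row_count]):
            column[row] = value
    return column


input_rows = [2, 4, 5]        # hypothetical rows that passed the limit
results = [20.0, 40.0, 50.0]  # one calculated value per limited row

print(add_result_column(6, results, input_rows, map_to_input_rows=False))
# [20.0, 40.0, 50.0, None, None, None]
print(add_result_column(6, results, input_rows, map_to_input_rows=True))
# [None, None, 20.0, None, 40.0, 50.0]
```

With the option cleared, the three results land on the first three rows regardless of which rows they were calculated from. With the option selected, each result is placed on the row that supplied its input.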