Spotfire® User Guide

Registering a data function

You can use data functions to enhance the functionality of Spotfire by adding calculations written in R, or other languages, to your analysis. The data function is executed using a statistical engine such as the Spotfire® Enterprise Runtime for R (also known as TERR™) engine, the open-source R engine, or a Python interpreter. When a data function has been registered and saved to the library, it can be used in any analysis, including by users other than the script author.

About this task

See also What are data functions? and Authoring data functions for an introduction to data functions.
This example shows how to register a TERR data function, but you would use the same procedure to register data functions using other calculation engines.
Tip: In the installed client, you can edit a previously added data function script using either the Edit Script dialog in the data canvas (reached from the fx node) or the Data function properties.

Before you begin

Data functions must be authored using the installed client.

Procedure

  1. In the installed Spotfire client, on the menu bar, select Tools > Register data functions.
  2. In the Name field, type the name of the function.
    If you are going to use packages, remember that Spotfire cannot find a function unless the name is exactly the same as the one used in the package. See the documentation for the service of interest for more information about packages.
  3. From the Type drop-down list, select the type of script to use, for example, R script - Spotfire Enterprise Runtime for R.

    Which options you have access to depends on what your administrators have made available in your Spotfire environment, or which tools you have locally installed.

    For predefined R functions, select Open-source R function. When a function that you want to use from within Spotfire has been defined and saved, make a note of its name and of the names of all required input and output parameters.

    To define a new script, select R script - Open Source R, R script - Spotfire Enterprise Runtime for R, or Python script. (Your company might also have other options available.)

  4. Optional: If you want to include a predefined function from a statistical package, in the Packages field, type the exact name of the package where the function is located.
    The Packages field provides an opportunity to create a data function based on a predefined statistical R or TERR function by using a downloaded package. Here you can specify any packages to be used by your current data function, separated with semicolons. See the documentation for the service of interest for more information about packages.

    For example, if you want to create a data function based on a predefined statistical R function, provide the name of the package (and optionally download it from CRAN). Naming the package is only necessary if there is more than one function bearing the same name in the repository, or if the packages are not loaded automatically.

    For local Python scripts, this field lets you list Python packages that can be pre-loaded to reduce the time needed when executing the data function. The field is used in the installed client when the use of hot spares is enabled. It does not affect the remote service or data functions run in web clients. If you are unsure, leave the field blank. In addition to listing a package here, you must also import it in the script.

    If more than one package is required, separate the package names with semicolons.

  5. When registering data functions based on predefined functions from statistical packages, in the Function name field, type the exact name of the function of interest, as it was defined in the package.
    This step is not applicable for script-based data functions.
  6. Provide a Description of the function to make it easier for others to find and use.
  7. If the data function should be based on a script, type, paste, or import the script on the Script tab.
  8. On the Input Parameters tab, add all required input parameters.
    How the input parameters should be handled is defined upon execution of the data function.
  9. If necessary, move the input parameters so that the order in the list reflects the order in which the input parameters should be retrieved.
  10. On the Output Parameters tab, add all required output parameters.
    How the output parameters should be handled is defined upon execution of the data function.
  11. Optional: Choose an Icon that describes what your data function does.
    By selecting a suitable category, you can make it easier for end users to find the right data function in the library. The icon will be shown in the different interfaces where a data function is shown (for example, in the Files and data flyout, the f(x) flyout and the Data canvas). You can choose from a number of predefined category icons, or use a custom Scalable Vector Graphics (SVG) icon. For more information about creating custom icons that work in the Spotfire environment, see Create an icon for your visualization mod on GitHub.
  12. Save the data function to the library.
    You can specify keywords upon saving that will help in locating the function in the library at a later stage. If you have chosen a different icon than the default, the icon category is automatically added as a keyword.
  13. Click Close.
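
As an illustration of steps 7 to 10, the following is a minimal sketch of what the script of a script-based Python data function might look like. The parameter names (values, mean_value, above_mean) are hypothetical. The sketch assumes that Spotfire makes each input parameter available as a variable named after the parameter, and reads each output parameter from a variable with the matching name. A Column input is represented here by a plain list so that the sketch runs outside Spotfire.

```python
# Sketch of a script body for a Python data function (names are hypothetical).
# Assumed input parameter:   values      (Type: Column)
# Assumed output parameters: mean_value  (Type: Value)
#                            above_mean  (Type: Column)

# Any package the script uses must be imported here, even if it is
# also listed in the Packages field.
from statistics import mean


def describe(values):
    """Return the mean of the non-missing values and a per-row flag."""
    present = [v for v in values if v is not None]
    mean_value = mean(present)
    above_mean = [v is not None and v > mean_value for v in values]
    return mean_value, above_mean


# Stand-in input so the sketch runs on its own. Remove this line when
# pasting the script into the Script tab, where the input is supplied
# by Spotfire (assumption, see the text above).
values = [2.0, 4.0, None, 9.0]

mean_value, above_mean = describe(values)
```

Under the stated assumption, the variable names assigned on the last line would need to match the names entered on the Output Parameters tab.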

Results

The data function can now be added to an analysis by running it from the Files and data flyout or the f(x) flyout (in any client), from Data Function Properties > Insert (installed client only), or by adding it as a transformation using the installed client. See Running data functions from the library or Transforming data for more information.

The Register Data Functions dialog

The Register Data Functions dialog is not only for registering completely new data functions. You can also Open a previously saved data function from the library for further configuration. You can Import script function definitions (*.sfd) that you previously exported to disk, as well as Python script files (*.py) or R script files (*.r) created using other script editing tools. Finally, you can Export a script function definition to disk so that it can be shared or further edited in other script editing environments.
Note: You cannot create any Statistica data functions using the Register Data Functions dialog. See the Spotfire Integration with Statistica for more information on how to work with this type of data function.


If you click Run, you can specify settings for the input and output parameters and execute the current data function. This is mostly meant as a shortcut for testing the data function before it is saved to the library. Remove any embedded instances added while testing from the analysis when you are done, to avoid saving unnecessary instances in the analysis.

For regular use, run the saved data function from f(x) - analytic tools or Files and data, or use Insert from the Data Function Properties (installed client only), before saving the analysis. The data function in the analysis can then be synchronized with any future updates in the library.

Allow caching specifies that calculations are reused if the same subset of input values has been calculated before. Clear this check box if some of the input data comes from somewhere other than your current analysis and you want a new calculation each time the input data changes (even when it changes to values that have already been computed). Input data can change when the input depends on filtered values, marked values, or a property value.

For example, if the data function includes a random number generator, you probably do not want to cache a previously generated random number. Instead, calculate a new random number for each refresh of the data function. Another example where you would clear the check box is if the data function includes the current date or time.
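
The two cases can be sketched in Python. The function names below are hypothetical. The sketch only illustrates that an unseeded random draw and a timestamp can differ between calls even when the inputs are identical, so a reused result would be stale, whereas a seeded draw is repeatable.

```python
import random
from datetime import datetime, timezone


def sample_rows(row_count, sample_size, seed=None):
    """Pick sample_size distinct row indexes out of row_count rows.

    With seed=None the selection can differ on every call, which is the
    situation where reusing a cached result is not what you want.
    With a fixed seed the selection is repeatable.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(row_count), sample_size))


def run_stamp():
    """Return the current UTC time as text; it changes between runs."""
    return datetime.now(timezone.utc).isoformat()


# Same inputs, fixed seed: identical output on every call.
repeatable = sample_rows(100, 5, seed=1) == sample_rows(100, 5, seed=1)
```

A script that behaves like sample_rows with a fixed seed depends only on its inputs, so leaving Allow caching selected would be reasonable for it.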

On the Script tab, you can type or paste a script in the language of the specified script type. You can also edit imported scripts. To change the font settings for the Script tab, select Tools > Options > Fonts and then select Expression and script editor. The script editor provides syntax highlighting and automatic indenting to make writing and reading scripts easier.

On the Input Parameters tab, you list and define all input parameters that are used in a script. The order of the input parameters in this list determines the order in which the input parameters should be retrieved.

  • Input parameter name or Name is the name of the parameter as it has been referred to in the function or script.
  • Display name is the name of the parameter as you want it to be presented to the end users.
  • Type determines the input type, which can be Value, Column, or Table (data table). This defines whether the input parameter can be one or more columns, or just a single value.
  • Allowed data types specifies which data types are supported by this input parameter. You can select all data types that you want to allow when defining the input parameter. You must select at least one data type for each input parameter.
  • Description can optionally contain more information about the input parameter, to help the end users understand what to provide.
  • Required parameter specifies that the parameter is required when calling the function. If a parameter is not required, the function should be able to work without it.
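
A script can be written so that it works without a non-required input, as in the following minimal Python sketch. How Spotfire represents an omitted input inside a script is not stated here, so the sketch assumes that the variable is either undefined or set to None, and handles both. The names get_optional, values, factor, and scaled are hypothetical.

```python
def get_optional(namespace, name, default):
    """Return namespace[name] if it exists and is not None, else default."""
    value = namespace.get(name)
    return default if value is None else value


def scale(values, factor):
    """Multiply every value by factor."""
    return [v * factor for v in values]


# Stand-in for a required Column input so the sketch runs on its own.
values = [1, 2, 3]

# "factor" stands for a non-required Value input. It is not defined in
# this sketch, so the default of 1.0 is used.
factor = get_optional(globals(), "factor", 1.0)

scaled = scale(values, factor)
```

Falling back to a neutral default keeps the output well defined whether or not the end user supplies the optional input.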

On the Output Parameters tab, you list and define all output parameters that are used in a script.

  • Result parameter name or Name is the name of the parameter as it has been referred to in the function or script.
  • Display name is the name of the parameter as you want it to be presented to the end users.
  • Type determines the output type, which can be Value, Column, or Table (data table). This defines whether the output parameter can be one or more columns, or just a single value.
  • Description can optionally contain more information about the output parameter, to help the end users understand what they will get.
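
To illustrate the difference between the output types, the following Python sketch produces one Table-like result and one single Value. It is a sketch only: the names are hypothetical, and a table is represented by a dictionary of equally long column lists so that the example runs outside Spotfire. The structure that Spotfire expects for a Table output from your script engine may differ.

```python
from statistics import mean


def summarize(groups, amounts):
    """Build a table (dict of equal-length columns), one row per group."""
    by_group = {}
    for group, amount in zip(groups, amounts):
        by_group.setdefault(group, []).append(amount)
    names = sorted(by_group)
    return {
        "group": names,
        "row_count": [len(by_group[n]) for n in names],
        "mean_amount": [mean(by_group[n]) for n in names],
    }


# Stand-ins for two Column inputs so the sketch runs on its own.
groups = ["a", "b", "a", "b", "a"]
amounts = [10.0, 4.0, 20.0, 6.0, 30.0]

summary_table = summarize(groups, amounts)   # would map to Type: Table
group_count = len(summary_table["group"])    # would map to Type: Value
```

Here summary_table and group_count stand for two separate output parameters, each of which would be listed on the Output Parameters tab with its own Type.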

Tip: If you want to add a simple calculation, you can create an expression function, where you add TERR scripts directly in the expression language by using the TERR_* functions available under Statistical Functions. These can then be used like any other function in the expression language, in calculated columns and custom expressions. Note, however, that expression functions cannot be shared between different analyses.