You can use data functions to enhance the functionality of
Spotfire by adding calculations written in R, or other languages, to your
analysis. The data function is executed using a statistical engine such as the
Spotfire® Enterprise Runtime for R (a/k/a TERR™) engine, the open-source R
engine, or a Python interpreter. Once a data function has been registered and
saved to the library, it can be used in any analysis, including by users other
than the script author.
About this task
See also What are data functions? and Authoring data functions for an
introduction to data functions.
This example shows how to register a TERR data function, but the same
procedure applies when you register data functions that use other calculation
engines.
Tip: You can edit a previously added data function script using the
Edit Script dialog in the data canvas (reached from the f(x) node) or the
Data function properties, using the installed client.
Before you begin
Data functions must be authored using the installed client.
Procedure
-
In the installed Spotfire client, on the menu bar, select
.
-
In the
Name field, type the name of the function.
If you are going to use packages, remember that Spotfire cannot
find a function unless the name is exactly the same as the one used in the
package. See the documentation for the service of interest for more information
about packages.
-
From the
Type drop-down list, select the type of script
to use, for example, R script - Spotfire Enterprise Runtime for R.
Which options you have access to depends on what your
administrators have made available in your Spotfire environment, or which tools
you have locally installed.
For predefined R functions, select Open-source R function. Once the
function that you want to use from within Spotfire has been defined and saved,
make a note of its name and of the names of all required input and output
parameters.
To define a new script, select R script - Open Source R, R script
- Spotfire Enterprise Runtime for R, or Python script. (Your company might also
have other options available.)
- Optional:
If you want to include a predefined function from a statistical
package, in the
Packages field, type the exact name of the
package where the function is located.
The Packages field provides an opportunity to create a data
function based on a predefined statistical R or TERR function by using a
downloaded package. Here you can specify any packages to be used by your
current data function, separated with semicolons. See the documentation for the
service of interest for more information about packages.
For example, if you want to create a data function based on a
predefined statistical R function, provide the name of the package (and
optionally download it from CRAN). (This is only necessary if there is more
than one function bearing the same name in the repository, or if the packages
are not loaded automatically.)
For local Python scripts, this field lets you list Python packages
that can be pre-loaded to reduce the time needed when executing the data
function. This field is used in the installed client, if the
use of hot spares is enabled. It does not
affect the remote service or data functions run in web clients. If you are
unsure, leave the field blank. You must import the package in the script as
well as specifying it here.
If more than one package is required, separate the package names
with semicolons.
-
When registering data functions based on predefined functions from
statistical packages, in the
Function name field, type the exact name of
the function of interest, as it was defined in the package.
This step is not applicable for script-based data functions.
-
Provide a
Description of the function to make the function easier
for others to find and use.
-
If the data function should be based on a script, type, paste, or
import the script on the
Script tab.
-
On the
Input Parameters tab, add all required input
parameters.
How the input parameters should be handled is defined upon
execution of the data function.
-
If necessary, move the input parameters so that the order in the
list reflects the order in which the input parameters should be retrieved.
-
On the
Output Parameters tab, add all required output
parameters.
How the output parameters should be handled is defined upon
execution of the data function.
- Optional:
Choose an
Icon that describes what your data function
does.
By selecting a suitable category, you can make it easier for end
users to find the right data function in the library. The icon will be shown in
the different interfaces where a data function is shown (for example, in the
Files and data flyout, the
f(x) flyout and the
Data canvas). You can choose from a number of
predefined category icons, or use a custom Scalable Vector Graphics (SVG) icon.
For more information about creating custom icons that work in the Spotfire
environment, see
Create an icon for your visualization
mod on GitHub.
-
Save the data function to the library.
When saving, you can specify keywords that make the function easier to
locate in the library later. If you have chosen an icon other than the
default, the icon category is automatically added as a keyword.
-
Click
Close.
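As a rough illustration of how the Script tab and the two parameter tabs fit together, the following is a minimal sketch of a Python script that could be pasted on the Script tab. The names `values`, `threshold`, `mean_value`, and `above_threshold` are hypothetical and do not come from the Spotfire documentation. In a registered data function, the input names would be declared on the Input Parameters tab and supplied when the data function is executed; sample values are assigned here only so that the sketch runs on its own, and plain Python lists stand in for whatever column objects the engine actually passes.

```python
import statistics  # a package used by the script must be imported in the script itself

# Hypothetical inputs. In a real data function these assignments would be
# removed, because the values are supplied when the data function is executed.
values = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]  # would be declared with Type: Column
threshold = 15.0                              # would be declared with Type: Value

# Hypothetical outputs. Each name would be declared on the Output Parameters tab.
mean_value = statistics.mean(values)               # Type: Value
above_threshold = [v > threshold for v in values]  # Type: Column
```

The parameter names entered on the Input Parameters and Output Parameters tabs would have to match the variable names used in the script exactly.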
Results
The data function can
now be added to an analysis by running it from the
Files and data or the
f(x) flyout (in any client), from
(installed
client only), or by adding it as a transformation using the installed client
(see
Running data functions from the library
or
Transforming data for
more information).
The Register Data Functions dialog
In the Register Data Functions dialog, you can register completely new data
functions. You can also do the following:
- Open a previously saved data function from the library for further
configuration.
- Import script function definitions (*.sfd) that you have earlier exported
to disk, or Python script files (*.py) or R script files (*.r) created using
other script editing tools.
- Export a script function definition to disk so that it can be shared or
further edited in other script editing environments.
Note: You cannot create any Statistica data functions using the
Register Data Functions dialog. See the
Spotfire Integration with Statistica for
more information on how to work with this type of data function.
If you click
Run, you can specify settings for the input and
output parameters and execute the current data function. This is mainly
intended as a shortcut for testing the data function before it is saved to the
library. When you are done testing, remove any embedded instances that were
added to the analysis, so that no unnecessary instances are saved with it.
For regular use, run the saved data function from
f(x) - analytic tools or
Files and data, or by using
Insert from the
Data Function Properties (installed client only)
before saving the analysis. That way, the data function can be synchronized
with any future updates in the library.
Allow caching specifies that calculations are
reused if the same subset of input values has been calculated before. Clear
this check box if some of the input data comes from outside your current
analysis and you want a new calculation each time the input data changes
(even when it changes to values that have already been computed). Input
data can change when the input depends on filtered values, marked values, or a
property value.
For example, if the data function includes a random number generator,
you probably do not want to cache a previously generated random number.
Instead, calculate a new random number for each refresh of the data function.
Another example where you would clear the check box is if the data function
includes the current date or time.
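To see why caching and non-deterministic scripts do not combine well, consider the following sketch. It is not how Spotfire implements Allow caching. The names `cached_call` and `random_offset` are hypothetical, and the dictionary is only the simplest possible stand-in for "reuse the result if the same input values have been calculated before".

```python
import random

_cache = {}  # maps a tuple of input values to a previously computed result


def random_offset(values):
    """Hypothetical data function body: add one random offset to every value."""
    offset = random.random()
    return [v + offset for v in values]


def cached_call(func, values, allow_caching=True):
    """Reuse an earlier result when the same inputs have been seen before."""
    key = tuple(values)
    if allow_caching and key in _cache:
        return _cache[key]
    result = func(values)
    if allow_caching:
        _cache[key] = result
    return result


first = cached_call(random_offset, [1.0, 2.0])
second = cached_call(random_offset, [1.0, 2.0])  # same inputs: served from the cache
fresh = cached_call(random_offset, [1.0, 2.0], allow_caching=False)  # recomputed
```

With caching on, the second call returns the stored result, so no new random number is drawn. With caching off, the function body runs again on every call.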
On the
Script tab, you can type or paste a script in
the specified script type language. You can also edit imported scripts. You can
change the font settings for the script tab using
and selecting
Expression and script editor. The Script editor
provides syntax highlighting and automatic indenting to make writing and
reading scripts easier.
On the
Input Parameters tab, you list and define all
input parameters that are used in a script. The order of the input parameters
in this list determines the order in which the input parameters should be
retrieved.
- Input parameter name or
Name is the name of the parameter as it is
referred to in the function or script.
-
Display name is the name of the parameter as
you want it to be presented to the end users.
-
Type determines the input type, which can be
Value,
Column, or
Table (data table). This defines whether the
input parameter can be one or more columns, or just a single value.
-
Allowed data types specifies which data types
are supported by this input parameter. You can select all data types that you
want to allow when defining the input parameter. You must select at least one
data type for each input parameter.
-
Description can optionally contain more information about the input
parameter, to help the end users understand what to provide.
- Required
parameter specifies that the parameter is required when calling the
function. If a parameter is not required, the function should be able to work
without it.
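The point about parameters that are not required can be sketched as follows. How the engine represents an omitted input is not described here. This sketch assumes that it arrives as `None`, and `weighted_mean`, `values`, and `weights` are hypothetical names chosen for illustration.

```python
def weighted_mean(values, weights=None):
    """Return the mean of values, using weights only if they were provided."""
    if weights is None:
        # Optional input omitted (assumed to arrive as None): use equal weights.
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


unweighted = weighted_mean([2.0, 4.0, 6.0])
weighted = weighted_mean([2.0, 4.0, 6.0], weights=[1.0, 1.0, 2.0])
```

Here `values` plays the role of a required parameter and `weights` the role of an optional one: the function produces a sensible result whether or not `weights` is supplied.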
On the
Output Parameters tab, you list and define all
output parameters that are used in a script.
- Result parameter name or
Name is the name of the parameter as it is
referred to in the function or script.
-
Display name is the name of the parameter as
you want it to be presented to the end users.
-
Type determines the output type, which can be
Value,
Column, or
Table (data table). This defines whether the
output parameter can be one or more columns, or just a single value.
-
Description can optionally contain more information about the
output parameter, to help the end users understand what they will get.
Tip: If you want to add a simple calculation, you can create
an expression function in which you add TERR scripts directly in the
expression language by using the
TERR_* functions available under
Statistical Functions. You can then use them
like any other function in the expression language, in calculated columns and
custom expressions. Note, however, that expression functions cannot be shared
between different analyses.