The TIBCO StreamBase® TIBCO Enterprise Runtime for R operator (hereafter, the TERR operator) allows StreamBase to use TIBCO's implementation of the R language to analyse and manipulate data.
The TERR operator is a member of the Java Operators group in the Palette view in StreamBase Studio. Select the operator from the Insert an Operator or Adapter dialog. Invoke the dialog with one of the following methods:
- Drag the Adapters, Java Operators token from the Operators and Adapters drawer of the Palette view to the canvas.
- Click on the canvas where you want to place the operator, then invoke the keyboard shortcut O V.
- From the top-level menu, invoke → → .
To run correctly, the operator assumes that the machine running StreamBase Server and your application has a 64-bit version of TERR version 2.7 or later installed locally. The TERR operator has been tested and validated with TERR versions 2.7, 3.0, 3.1, and 3.2.
The TERR bin directory does not need to be in the system PATH, and no environment variables are required. The TERR operator recognizes and honors the TERR_HOME environment variable if set, and if it points to the local TERR installation directory; however, setting TERR_HOME is not required.
TIBCO customers can download TERR from http://edelivery.tibco.com, or download an evaluation copy of TERR from the TIBCO Access Point.
- For Linux
  TERR is only provided for 64-bit Linux. Download the tar file provided; untar the file into a temporary local directory, and run the ./INSTALL file provided. The default installation directory is /opt/tibco/terrver, where ver is the TERR version number.
- For Windows
  Download the zip file provided; unzip the file to find a single installer executable. Run this installer and accept its suggested default location (C:\Program Files\TIBCO\terrver) or install into the currently recommended location (C:\TIBCO\terrver), where ver is the TERR version number.
  On Windows, the TERR installer provides both 32-bit and 64-bit versions of the TERR runtime code. When run on 64-bit Windows, the 64-bit version of TERR is automatically used. Since StreamBase now supports only 64-bit Windows, it uses the 64-bit version of TERR.
To connect StreamBase and its TERR operator to your local TERR installation, you must either:
- Set the TERR Home property in the Engine Options tab of each TERR operator's Properties view, providing the full, absolute path to the TERR installation directory.
- Set the TERR_HOME environment variable to point to the full, absolute path to the TERR installation directory. Use this method if you anticipate using many TERR operator instances in your StreamBase applications.
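The lookup order implied by these two methods can be sketched in a few lines of Python. This is an illustration only, not the operator's code: the function name `resolve_terr_home`, the use of `ValueError` to stand in for a typecheck error, and the example paths are all assumptions made for the sketch.

```python
import os


def resolve_terr_home(property_value, environ=None):
    """Pick the TERR installation directory in the documented order.

    property_value: the TERR Home property of one operator (may be empty).
    environ: a mapping of environment variables; defaults to os.environ.
    """
    environ = os.environ if environ is None else environ

    # 1. A TERR Home property set on the operator takes precedence.
    if property_value:
        return property_value

    # 2. Otherwise fall back to the TERR_HOME environment variable.
    env_value = environ.get("TERR_HOME")
    if env_value:
        return env_value

    # 3. With neither set, the operator fails typechecking;
    #    this sketch models that as a ValueError (an assumption).
    raise ValueError("neither the TERR Home property nor TERR_HOME is set")


# Hypothetical paths, used only to show the precedence.
print(resolve_terr_home("", {"TERR_HOME": "/opt/tibco/terr-example"}))
```

Setting the environment variable once covers every operator that leaves its TERR Home property empty, which is why the second method suits applications with many TERR operator instances.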
This operator creates an external TERR process, which it then uses to run R scripts and retrieve the results. For each input tuple, the values in the tuple are merged with the supplied script, and the resulting script is sent to the TERR process for execution. When the script finishes, the result is retrieved and translated into an output tuple.
The script can be supplied as data in the operator itself or can be read from a resource file. The script is a standard R script with one addition: fields in the script that are to be replaced by tuple data must have the format $[name] where name is the name of an input tuple field. The data in the tuple can be any supported simple data type or list of a supported simple data type. The supported StreamBase data types for input are: string, int, boolean, double, list, and timestamp (which is converted to a string). TERR result types supported are: string, int, boolean, double, array (list), byte, factor, and dataFrame.
Script substitution is done on a purely textual basis and it is up to R to parse the results. For example, the string "3", the integer 3 and the float 3 all appear the same when inserted into a script. On input, lists are handled a little differently, with the outer brackets removed. For example, the integer list [1,2,3] is converted to the string "1,2,3". Check each substitution variable against the context in which it appears in the script. In general, if a substitution can be either a scalar value or a list, it should probably be placed in a c() construct such as c($[var]), which handles both scalar and list cases.
Booleans are treated specially. On input, a true is converted to TRUE, false is converted to FALSE and null is converted to NA; on output, the opposite conversion is done. For numbers, null is converted to NA.
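The substitution rules described above can be modeled with a short Python sketch. It is not the operator's implementation: the names `render_value` and `substitute`, the regular expression used for `$[name]`, and the use of Python's own number formatting are assumptions. String fields are passed through as-is, because the text above does not pin down how strings are quoted.

```python
import re

# Assumed placeholder syntax: $[name], where name is letters, digits, underscores.
PLACEHOLDER = re.compile(r"\$\[(\w+)\]")


def render_value(value):
    """Render one tuple field as the text spliced into the script."""
    if value is None:
        return "NA"                        # null becomes NA
    if isinstance(value, bool):            # test bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, list):
        # Lists lose their outer brackets: [1, 2, 3] -> 1,2,3
        return ",".join(render_value(item) for item in value)
    return str(value)                      # Python formatting, not necessarily the operator's


def substitute(script, fields):
    """Replace every $[name] in script with the rendered field value."""
    return PLACEHOLDER.sub(lambda m: render_value(fields[m.group(1)]), script)


print(substitute("s <- c($[vals])", {"vals": [2, 4]}))           # s <- c(2,4)
print(substitute("s <- c($[vals])", {"vals": 2}))                # s <- c(2)
print(substitute("b <- c($[bools])", {"bools": [True, None]}))   # b <- c(TRUE,NA)
```

Because the same `c($[vals])` template yields valid R for both the scalar and the list input, wrapping a placeholder in `c()` is the safer default whenever either kind of value may arrive.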
The operator can be configured to watch the script and dataset files. If either one changes, it is reloaded before the next tuple is processed. This is useful during script development: you can keep the script open in an editor, make changes, and save the file before sending a new tuple.
Note
The file system monitoring feature of the TERR operator is supported on local file systems only, and not on remote mounts. This feature is based on code internal to the TERR operator, and does not depend on the TIBCO StreamBase® File Monitor Adapter.
A single TERR instance can be shared among multiple operators within a container, or each operator can have its own instance. If sharing, take care to ensure that the same initial dataset is specified for each use. This is because the TERR instance is started and initialized by the first instance of the operator to run and the other operators will use the already existing TERR instance. The various parameters used in starting the TERR instance should also be the same for the same reason. Startup parameters are discussed later.
An example script along with some input tuples and results are shown next. First, a TERR script:
    n <- $[names]
    s <- $[vals]
    b <- $[bools]
    result <- data.frame(n, s, b)
The input tuple (names="one", vals=2, bools=true) results in:
    n <- "one"
    s <- 2
    b <- TRUE
    result <- data.frame(n, s, b)
The input tuple (names=["one","two"], vals=[2,4], bools=[true,null]) results in a TERR parse error, because by default TERR considers each comma as the start of a new field. The solution is to enclose the field variables in c() constructs in your TERR script, like so:
    n <- c($[names])
    s <- c($[vals])
    b <- c($[bools])
    result <- data.frame(n, s, b)
In this case, with the same input tuple as above, the results are:
    n <- c("one", "two")
    s <- c(2, 4)
    b <- c(TRUE, NA)
    result <- data.frame(n, s, b)
This section describes the properties you can set for the TERR operator, using the various tabs of the Properties view in StreamBase Studio.
In the tables in this section, the Property column shows each property name as it appears on the tabs of the Properties view for this adapter.
Use the StreamSQL names of the adapter's properties when using this adapter in a StreamSQL program with the APPLY JAVA statement.
Name: Use this field to specify or change the component's name, which must be unique in the application. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
Operator: A read-only field that shows the formal name of the operator.
Class: A field that shows the fully qualified class name that implements the functionality of this operator. Use this class name when loading the operator in StreamSQL programs with the APPLY JAVA statement. You can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.
Start with application: If this field is set to Yes or to a module parameter that evaluates to true, an instance of this operator starts as part of the containing StreamBase Server. If this field is set to No or to a module parameter that evaluates to false, the adapter is loaded with the server, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager. With this option set to No or false, the operator does not start even if the application as a whole is suspended and later resumed. The recommended setting is selected by default.
Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports and Error Streams to learn about Error Ports.
Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
Property | Data Type | Default | Description | StreamSQL Property |
---|---|---|---|---|
Log Level | drop-down list | INFO | Controls the level of verbosity the adapter uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level will be used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE, and ALL. | LogLevel |
Reload files when changed | check box | Cleared | If selected, the data file and the script file (if selected) are monitored for changes. If either one changes, it is loaded the next time a tuple is to be processed. | WatchFiles |
Output Status Tuples | check box | Cleared | Select this check box to have a status tuple emitted on the status output stream for each input tuple. The status tuple includes any errors generated by the script for this tuple. | SendStatusTuples |
TERR instance to use | string | Cleared | The name of the TERR instance to use in this operator. | WhichTERRInstances |
Property | Data Type | Default | Description | StreamSQL Property |
---|---|---|---|---|
Load saved R datasets from file into engine | check box | Cleared | Determines whether an initial dataset is loaded into the TERR instance when started. If one is initially loaded and Reload files when changed is also selected, the dataset is reloaded if changed on disk. | LoadModel |
Data file | drop-down list | Cleared | The name of a resource file to load on TERR initialization. The drop-down contains all the files that are resources to choose from on the current project's resource search path. | ScriptModel |
Property | Data Type | Default | Description | StreamSQL Property |
---|---|---|---|---|
Result variable | string | Cleared | The name of the R variable that will be retrieved as the result of the script. | ResultVarName |
Character Set | Drop-down list | UTF-8 | Specifies the character set to be used when reading a script file from disk. | TerrCharset |
Script Source | Radio button | Script Text | The source from which to get the script to send to TERR. | ScriptSource |
Script file | Drop-down list | Cleared | Active only when ScriptSource is File. In the drop-down list, select the resource file that contains the script. | ScriptLocation |
Script text | string | Empty | Active only when Script Source is Script text. Specifies content of the script to be used to process input tuples. | ScriptText |
These options are used to set up the TERR instance.
Property | Data Type | Default | Description | StreamSQL Property |
---|---|---|---|---|
TERR Home | string | Cleared | The directory in which TERR is installed. If no value is specified, the operator uses the value of the TERR_HOME environment variable, if present. If neither this field nor TERR_HOME is specified, the result is a typecheck error. | TerrHome |
Processor Affinity | string | Cleared | A zero-based, comma-separated list of integers representing processor cores that the TERR process should execute on. Hyperthreaded cores count as cores. An invalid core number for the current CPU causes a typecheck error. For example, on a four-core (2C, 2T) machine, the entry | ProcessorAffinity |
TERR engine parameters | string | Cleared | A parameter string sent to the TERR engine on initialization. See the TERR documentation for usage. Either type a value in the field, or select from a list of values you have entered in the containing project's sbconf file as | TerrEngineParameters |
TERR environment | table | Cleared | A set of Key-Value pairs that are used in the initial startup environment for the TERR engine. See the TERR documentation for usage. | TerrEnvironment |
These options are used to adjust the Java environment for the TERR instance.
Property | Data Type | Default | Description | StreamSQL Property |
---|---|---|---|---|
Java installation to use | Radio button | StreamBase | Determines where to get the Java version to use to run TERR. In most cases the default StreamBase is appropriate, but it is not required that the TERR operator uses the same Java version as StreamBase, because they run in different JVM instances. | UseSBJava |
Java home | string | Cleared | Active only when Java installation to use is set to Custom. Specifies the directory containing the Java installation to use to run the TERR instance. | JavaHOME |
Use the Schemas tab to specify the schema of the output tuple for this adapter.
For general instructions on using the Edit Schema tab, see the Properties: Edit Schema Tab section of the Defining Input Streams page.
A custom schema should use the same field names as the generic schema, because the operator matches on those names when it fills the output tuple.
Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.
Caution
Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.
The TERR operator has one input port, whose schema is determined by the TERR script loaded at operator start time. Any updated script loaded dynamically during run time must specify inputs that have at least the same field names and data types as the initially loaded script.
At run time, input tuples must have at least one field whose name and data type exactly match one of the fields specified by the TERR script. Input tuples do not need to fill all fields in the TERR script, and the field order of input tuples does not need to match the TERR script's field order.
The TERR operator has two output ports: an optional status port, and a result port.
The status port emits tuples that describe the status of processing each input tuple. It is only present when the Output Status Tuples option is selected. The schema of the output tuple consists of four strings:
Field Name | Field Type | Description |
---|---|---|
Type | String | The type of report, usually Status. |
Action | String | The action that caused the report. |
Message | String | The result reported by TERR of running the script. Examples: Success, Parse Error. |
Object | String | Any extra information about the operator. |
The default schema of the result port consists of:
- The input tuple, which is passed through unchanged.
- One added field named terrResult, which is a tuple of tuples, and contains the results of TERR processing the provided script for each input tuple.
Because the TERR operator cannot know the data type of the TERR result in advance, the terrResult field contains a subfield for each possible TERR result data type. Only one terrResult subfield is filled in per input tuple, depending on the value assigned to the result variable by the script. The other subfields are left empty (null) for each input tuple.
The top-level schema of the terrResult field is shown in the following table:
Subfield Name | Field Type | Description |
---|---|---|
double | tuple | The result was a double or array of doubles. |
integer | tuple | The result was an integer or array of integers. |
boolean | tuple | The result was a boolean or array of booleans. |
string | tuple | The result was a string or list of strings. |
dataFrame | tuple | The result was an R dataFrame, which is comparable to a StreamBase tuple. |
byte | tuple | The result was a byte or array of bytes (returned as StreamBase ints). |
list | tuple | The result was an R array, comparable to a StreamBase list. |
factor | tuple | The result was an R factor. |
Each terrResult subfield is a tuple of lists. The first third-level field of every terrResult subfield is named names. This is a list of one or more names of the result fields of the input script.
- The scalar subfields
  The five scalar subfields of terrResult are double, integer, boolean, string, and byte. The schema of each of these subfields is the same: a list of returned script field names and a list of returned values corresponding to each of the names. If the result is a scalar, it is still returned as a list of one item. If the result is a multidimensional array, it is flattened to a vector.
  Third-level Field Name | Type | Description |
  ---|---|---|
  names | list of strings | List of one or more TERR script field names. |
  values | list of type | List of returned values for each script field in names. |
- The dataFrame subfield
  The dataFrame subfield of terrResult consists of a list of names, plus zero or more lists of integers, doubles, logicals (booleans), factors, strings, or bytes. If more than one of a type occurs, the resulting list contains the concatenation of the lists.
  Third-level Field Name | Type | Description |
  ---|---|---|
  names | list of strings | List of one or more TERR script field names. |
  integers | list of tuples | List of zero or more returned integer names-values pairs. |
  doubles | list of tuples | List of zero or more returned double names-values pairs. |
  logicals | list of tuples | List of zero or more returned boolean names-values pairs. |
  factors | list of tuples | List of zero or more returned R factor names-indexes-levels triplets. |
  strings | list of tuples | List of zero or more returned string names-values pairs. |
  bytes | list of tuples | List of zero or more returned byte names-values pairs. |
- The list subfield
  The list subfield of terrResult consists of a list of names, plus zero or more lists of tuples of the five scalar types plus factors. The schema of the list subfield is the same as for the dataFrame subfield.
- The factor subfield
  The factor subfield of terrResult consists of three lists: names, indexes, and levels. See the TERR documentation for an explanation of the R factor datatype.
  Third-level Field Name | Type | Description |
  ---|---|---|
  names | list of strings | List of one or more TERR script field names. |
  indexes | list of integers | List of zero or more returned index values. |
  levels | list of strings | List of zero or more returned level values. |
You can specify a custom schema that contains only the result data you know to expect. For a custom schema, the names of the fields must be the same as in the default schema described above, and the sub-schemas must also match exactly.
Typechecking fails if any required fields are not filled in. It also fails if the input schema does not contain all the replacement variables that the script needs. All specified dataset and script files are checked for existence and typechecking fails if any file is not accessible. A result variable must be present, although the TERR script is not checked to see if it uses it. A script must be specified either as local data or as a resource file. The TerrHome parameter must be set so that the process can be started.
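The replacement-variable check can be approximated outside StreamBase when reviewing a script by hand. The Python sketch below lists the `$[name]` placeholders in a script that an input schema does not supply. It is only a model of the rule stated above: the function name `missing_variables` and the regular expression for placeholder names are assumptions, not part of the operator.

```python
import re

# Assumed placeholder syntax: $[name], where name is letters, digits, underscores.
PLACEHOLDER = re.compile(r"\$\[(\w+)\]")


def missing_variables(script, input_field_names):
    """Return the placeholder names in script that the input schema lacks."""
    needed = set(PLACEHOLDER.findall(script))
    return sorted(needed - set(input_field_names))


script = "n <- c($[names])\ns <- c($[vals])\nresult <- data.frame(n, s)"
print(missing_variables(script, ["names"]))          # ['vals']
print(missing_variables(script, ["names", "vals"]))  # []
```

An empty result means every placeholder has a matching input field name; data types are not compared in this sketch.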
All errors in the execution of the script are logged and an optional status tuple is emitted.
The TERR operator uses ConfigurationChooserPropertyDescriptors for some of its properties. This means that it can read default values for these properties from the containing application's StreamBase configuration file.
To use this feature, an <adapter-configurations> section must be present in the configuration file, with at least one child element of the form <configuration type="terrstring">, where terrstring is one of terrConfigHome, terrConfigEngineParams, terrConfigJavaHome, terrConfigJavaOptions, or terrConfigInstance. In each of these configurations, list the choices to be presented in the form <choice id="valueX">valueX</choice>. Alternatively, indirection can be used. See the Javadoc documentation in the StreamBase Client API on ConfigurationChooserPropertyDescriptors for further information.
See the sbd.sbconf file in the TERR Operator sample for an example of this feature.
On suspend, the TERR operator finishes processing the current tuple, outputs the result tuple, then pauses waiting for input.
On resumption, the TERR operator continues processing with the next input tuple.
The TERR instance remains running during suspend.
The StreamBase installation includes a sample demonstrating the use of this operator. To load the sample in StreamBase, select → and look under the Extending StreamBase section for an entry called TERR Operator.