The TIBCO StreamBase® TERR Predict operator for TIBCO Enterprise Runtime for R (TERR) allows StreamBase to use TIBCO's implementation of the R language to load RDS files and perform predict operations.
In order to run correctly, the TERR operators assume that the machine running the StreamBase Runtime has a 64-bit version of TERR installed locally. On Windows and Linux, the TERR operators have been tested and validated with TERR versions starting with 4.0. On macOS, the minimum supported TERR version is 4.2.0.
A copy of TERR Developer Edition is installed as part of your StreamBase installation. The Developer Edition has restricted license terms, as described on the StreamBase License Considerations page.
TERR Developer Edition is installed in the directory STREAMBASE_HOME/terr and, as a convenience, the environment variable TERR_HOME is set to this directory when using a StreamBase Command Prompt on Windows. TERR_HOME must be manually set in other environments.
The operators locate the version of TERR to call using the following sequence:
-
If the option Use Embedded TERR is selected on the Operator Properties tab of the operator's Properties view, then the embedded TERR engine is used. If not selected:
-
If you specify a path to a local TERR installation in the TERR Home Path property on the Operator Properties tab of the operator's Properties view, that version of TERR is used first.
-
If that property is left blank, the operator looks for a path specified in the TERR_HOME environment variable.
This sequence lets you override the embedded TERR version with any newer or older version, compared to the embedded version, that your application requires.
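The lookup order described above can be sketched as a small shell function. This is only an illustration of the precedence, not the operator's actual implementation: `resolve_terr_home` and its two arguments are hypothetical stand-ins for the Use Embedded TERR check box and the TERR Home Path property.

```shell
# Hypothetical helper that mirrors the documented lookup order.
#   $1 = "true" when Use Embedded TERR is selected (assumed flag value)
#   $2 = value of the TERR Home Path property (may be empty)
resolve_terr_home() {
  if [ "$1" = "true" ]; then
    # The embedded TERR Developer Edition lives under the StreamBase installation.
    printf '%s\n' "${STREAMBASE_HOME:-}/terr"
  elif [ -n "$2" ]; then
    # An explicit TERR Home Path property is consulted first.
    printf '%s\n' "$2"
  elif [ -n "${TERR_HOME:-}" ]; then
    # Otherwise fall back to the TERR_HOME environment variable.
    printf '%s\n' "$TERR_HOME"
  else
    echo "no TERR installation configured" >&2
    return 1
  fi
}
```

For example, `resolve_terr_home false ""` prints the value of TERR_HOME when that variable is set, and fails when it is not.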
To determine the version of TERR installed with StreamBase 7.6.3 or later:
On Windows, using a StreamBase Command Prompt, run:
%STREAMBASE_HOME%\terr\bin\terr --version
On Linux or macOS in a shell configured with the sbconfig --env command as described in the Installation Guide, run:
$STREAMBASE_HOME/terr/bin/terr --version
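On Linux or macOS, the version check can be wrapped so that a missing or misconfigured STREAMBASE_HOME produces a clear message rather than a generic shell error. The following is a minimal sketch; the function name `print_embedded_terr_version` is hypothetical, and only the path and the `--version` flag come from the command above.

```shell
# Hypothetical wrapper around the documented version command.
print_embedded_terr_version() {
  terr_bin="${STREAMBASE_HOME:-}/terr/bin/terr"
  if [ ! -x "$terr_bin" ]; then
    # Either STREAMBASE_HOME is unset or it does not contain the embedded TERR.
    echo "TERR executable not found at: $terr_bin" >&2
    return 1
  fi
  "$terr_bin" --version
}
```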
TIBCO customers can download TERR from edelivery.tibco.com, or download an evaluation copy of TERR from the TIBCO Access Point.
- For Linux
-
TERR is only provided for 64-bit Linux. Download the tar file provided. Untar the file into a temporary local directory, and run the ./INSTALL file provided. The default installation directory is /opt/tibco/terrver, where ver is the TERR version number.
- For Windows
-
Download the zip file provided; unzip the file to find a single installer executable. Run this installer and accept its suggested default location (C:\Program Files\TIBCO\terrver) or install into the currently recommended location (C:\TIBCO\terrver), where ver is the TERR version number.
On Windows, the TERR installer provides both 32-bit and 64-bit versions of the TERR runtime code. When run on 64-bit Windows, the 64-bit version of TERR is automatically used. Since StreamBase supports only 64-bit Windows, it uses the 64-bit version of TERR.
- For macOS
-
Download the DMG file provided and run the installer.
To connect StreamBase and its TERR operator to your local TERR installation, you must either:
-
Set the TERR Home Path property in the Operator Properties tab of each operator's Properties view, providing the full, absolute path to the TERR installation directory.
-
Set the TERR_HOME environment variable to point to the full, absolute path of your alternate TERR installation directory. Use this method if you anticipate using many operator instances in your StreamBase applications.
The TERR operators recognize and honor the TERR_HOME environment variable if set, and if it points to a valid local TERR installation directory. However, setting TERR_HOME is not required.
On Windows and Linux, the TERR bin directory does not need to be in the system PATH, and no environment variables are required.
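If you choose the TERR_HOME method on Linux or macOS, a small guard can catch the most common mistakes (a relative path or a directory that does not exist) before StreamBase is started. This is a sketch only; `set_terr_home` is a hypothetical helper, and it does not verify that the directory really contains a TERR installation.

```shell
# Hypothetical helper: export TERR_HOME only for an absolute, existing directory.
set_terr_home() {
  case "$1" in
    /*) ;;  # absolute path, as the documentation requires
    *)
      echo "TERR_HOME must be a full, absolute path: $1" >&2
      return 1
      ;;
  esac
  if [ ! -d "$1" ]; then
    echo "directory does not exist: $1" >&2
    return 1
  fi
  TERR_HOME="$1"
  export TERR_HOME
}
```

A line such as `set_terr_home /path/to/your/terr/installation` could then be placed in the shell profile used to launch StreamBase.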
Using TERR on macOS requires additional settings:
- For TERR versions 4.2 and 4.3, remove spaces in the path
-
TERR releases earlier than 4.4 require the path to the TERR_HOME/lib directory to have no spaces in the path. If you installed StreamBase using the DMG installer, your StreamBase home directory does contain spaces, and therefore so does the path to its embedded terr/lib subdirectory.
To use the TERR operators under macOS with TERR versions 4.2 or 4.3, including with the embedded TERR Developer Edition, rename the folder containing StreamBase to remove spaces in the folder name. For example, change TIBCO StreamBase 7.7.2 to TIBCOStreamBase7.7.2.
- Identify the location of the TERR native libraries
-
The operators must know where to find the dynamic libraries that implement TERR on macOS. You can use either of these methods:
- Set the DYLD_LIBRARY_PATH environment variable, OR
-
Configure your shell environment to include a line like the following:
export DYLD_LIBRARY_PATH=/absolute/path/to/libs
See below for example paths. This environment variable method may be more convenient if you are developing or running several StreamBase applications that use TERR operators.
- Specify the library path in a configuration file
-
In your project's sbd.sbconf file (or one of its included .sbconf files), use the <library> child element of the <java-vm> element to specify the path. For example:
<java-vm>
  <library path="/absolute/path/to/libs"/>
</java-vm>
This setting must be configured in the configuration files for every StreamBase project that uses one of the TERR operators.
When using the embedded TERR Developer Edition, the value of /absolute/path/to/libs is a path like the following:
/Users/sbuser/Applications/TIBCOStreamBase7.7.2/terr/lib
When overriding the embedded TERR version with an external installation of TERR, the value of /absolute/path/to/libs is like the following:
/Library/Frameworks/TERR.framework/Versions/version-number/Resources/lib/x86_64-apple-darwin
Enter this path as a single unbroken line.
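For the environment variable method, the two macOS requirements (no spaces in the library path for TERR releases earlier than 4.4, and a DYLD_LIBRARY_PATH that points at the native libraries) can be combined in one guard. The sketch below is an assumption-laden illustration: `set_terr_dyld_path` is a hypothetical helper, it rejects spaces unconditionally even though only older TERR releases require that, and it preserves any existing DYLD_LIBRARY_PATH entries, which the text above does not discuss.

```shell
# Hypothetical helper: validate a TERR library path, then export DYLD_LIBRARY_PATH.
set_terr_dyld_path() {
  case "$1" in
    *" "*)
      # Conservative check: TERR releases earlier than 4.4 cannot use such a path.
      echo "library path contains spaces: $1" >&2
      return 1
      ;;
  esac
  # Prepend the new directory and keep any entries that were already set.
  DYLD_LIBRARY_PATH="$1${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
  export DYLD_LIBRARY_PATH
}
```

For the embedded TERR example above, the call would be `set_terr_dyld_path /Users/sbuser/Applications/TIBCOStreamBase7.7.2/terr/lib`.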
This operator allows a stream of tuples to be evaluated by an external TERR process performing a predict operation, with the results returned as another stream of tuples.
The operator can instantiate multiple TERR instances to improve performance. When more than one instance is required, the tuple execution can no longer be guaranteed to be in order, as the operator now works asynchronously.
The input tuple's terrVars field is converted directly into a global TERR variable. A predict operation is then run in that environment and the result variable retrieved and converted to the output tuple.
All tuple entries that are to be read into the TERR process must be in a top level tuple named terrVars.
For example, integers can be sent as a single int field (1), as a list (list(1, 2, 3)), or in the enhanced named form (tuple myInts (names = ["one", "two"], values = [1, 2])). All data types are supported except capture fields and functions.
Once the variables are sent to the TERR process, the model is executed and the result is retrieved.
To use a TERR Predict operator in a StreamBase EventFlow module, drag a token for the operator onto the canvas of your EventFlow Editor. Then select the newly placed operator to rename it and configure its properties.
The operator is a member of the Java Operators group in the Palette view in StreamBase Studio. Select the operator from the Insert an Operator or Adapter dialog. Invoke the dialog with one of the following methods:
-
Drag the Adapters, Java Operators token from the Operators and Adapters drawer of the Palette view to the canvas.
-
Click on the canvas where you want to place the operator, then invoke the keyboard shortcut O V.
-
From the top-level menu, invoke
→ → .
When the dialog is open, enter terr in the search field to narrow the list of operators.
This section describes the properties you can set for the TERR Predict Operator, using the various tabs of the Properties view in StreamBase Studio.
In the tables in this section, the Property column shows each property name as it appears on the tabs of the Properties view for this operator.
Name: Use this field to specify or change the component's name, which must be unique in the application. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
Operator: A read-only field that shows the formal name of the operator.
Class: A field that shows the fully qualified class name that implements the functionality of this operator. Use this class name when loading the operator in StreamSQL programs with the APPLY JAVA statement. You can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.
Start with application: If this field is set to Yes or to a module parameter that evaluates to true, an instance of this operator starts as part of the containing StreamBase Server. If this field is set to No or to a module parameter that evaluates to false, the operator is loaded with the server, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager. With this option set to No or false, the operator does not start even if the application as a whole is suspended and later resumed. The recommended setting is selected by default.
Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports and Error Streams to learn about Error Ports.
Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
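The naming rule given for the Name property can be checked mechanically before you rename components in bulk. The following sketch assumes that "alphabetic characters" means ASCII letters, which the text does not state; `is_valid_component_name` is a hypothetical helper, not a StreamBase command.

```shell
# Hypothetical check of the Name rule: letters, digits, and underscores only,
# starting with a letter or an underscore (ASCII assumed).
is_valid_component_name() {
  case "$1" in
    "") return 1 ;;                  # empty names are rejected
    [!A-Za-z_]*) return 1 ;;         # first character must be a letter or underscore
    *[!A-Za-z0-9_]*) return 1 ;;     # no hyphens or other special characters
    *) return 0 ;;
  esac
}
```

For example, `is_valid_component_name TERR_Predict1` succeeds, while `is_valid_component_name my-predict` fails because of the hyphen.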
Property | Data Type | Description |
---|---|---|
Model | String | The model to load into each TERR instance at startup (RDS File). |
Model Name | String | The R variable name assigned to the model when it is loaded. |
Predict Options | String | Specifies a comma-separated list of the predict method options to use. For example: 'interval="prediction", level = 0.99' |
Use Embedded TERR | Check box | When enabled, the operator uses the embedded TERR engine that is bundled with StreamBase (licensed for development use only). |
TERR Home Path | String | When not using the embedded TERR engine, you must supply the home path for the TERR installation to use. You can leave this blank if the TERR_HOME environment variable is set. |
Enable Status Port | Check box | When enabled, the operator reports data on the status port regarding various operator states. |
Log Level | Drop-down list | Controls the level of verbosity the operator uses to issue informational traces to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE, and ALL. |
Property | Data Type | Description |
---|---|---|
TERR Instances | Integer | The number of instances of the TERR engine to use with this operator. NOTE: If greater than 1, the operator becomes asynchronous and tuple order is not guaranteed. |
Enable Timing | Check box | When enabled, the result tuples produced include timing information. |
Pause Before TERR Execution | Check box | If enabled, the EventFlow operation pauses in debug mode so that you can execute R methods in the console on the current instance before the input tuple is executed. |
Pause After TERR Execution | Check box | If enabled, the EventFlow operation pauses in debug mode so that you can execute R methods in the console on the current instance after the input tuple is executed. |
To TERR Date Format | String | The date format to use when converting tuple data into TERR. |
From TERR Date Format | String | The date format to use when converting TERR data into tuples. |
TERR Engine Parameters | String | The engine parameters to send into the TERR engine. |
TERR Java Home Path | String | The path to the Java Home to use with the TERR instance. If blank, the Java instance embedded with the StreamBase installation is used. |
TERR Java Options | String | The Java options to send into the TERR engine. |
TERR Instance Process Affinity | Map | The processor affinity to set for each instance of TERR. Instance values are matched to processors; you can specify an instance number more than once to assign it multiple processors. |
TERR Environment | Map | The environment to set for each instance of TERR. |
Use the Edit Schema tab to specify the schema of the output tuple for this operator.
For general instructions on using the Edit Schema tab, see the Properties: Edit Schema Tab section of the Defining Input Streams page.
Use the Import proposed schemas link to import schemas as needed for the various TERR output types. The list of importable schemas is specified in the Definitions tab of the EventFlow Editor.
Only a single field is allowed in the output schema. This represents the result of an R predict execution that is retrieved after the execution of an input tuple.
Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.
Caution
Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.
Use the TERR Instances property on the Advanced tab to enable parallel processing into multiple TERR instances as needed. You can still use the Concurrency tab, but it will have very little impact on performance.
The TERR Predict operator has a single input port that handles all interactions. The schema for this port can include any fields, but only the following fields are used by the operator; the remaining fields are passed through the operator into an inputTuple field on the output stream.
Field Name | Field Type | Description |
---|---|---|
terrVars | tuple | (Optional) The tuple data to convert into R variables. This field must be a tuple. Each field in the tuple is converted into an R variable based on the field's schema. |
rData | blob | (Optional) The R byte data to load as the new model. |
The TERR operator has two output ports: a data port and an optional status port.
The data port outputs the result of each call into the TERR engine. The resulting tuple contains two or three fields, depending on whether timing is enabled.
-
terrData — The result data pulled from the TERR instance after execution. This field contains the values specified on the Edit Schema tab. Each subfield of the terrData field represents a variable from the TERR instance.
-
inputTuple — This tuple contains all the fields from the input tuple.
-
(Optional) timing — This tuple contains some timing information to help gauge what might be the bottleneck in execution. The timing tuple contains the following fields:
-
eval — The time in nanoseconds it took for the TERR instance to evaluate and execute the R functions.
-
tupleToTerr — The time in nanoseconds it took to convert the input tuple into TERR data objects to send to the TERR instance.
-
terrToTuple — The time in nanoseconds it took to convert the TERR data objects from the TERR instance into the outbound tuple.
-
terrSetVariable — The time in nanoseconds it took to send the TERR data objects into the running TERR instance.
-
terrGetVariable — The time in nanoseconds it took to get the TERR data objects from the running TERR instance.
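When timing is enabled, the five nanosecond fields can be compared to see which stage dominates. The sketch below assumes that you have already extracted the values from a result tuple by some other means; `slowest_stage` is a hypothetical helper, and the numbers in the usage example are invented for illustration.

```shell
# Hypothetical helper: given name=nanoseconds pairs, print the slowest stage.
slowest_stage() {
  printf '%s\n' "$@" | awk -F= '
    $2 + 0 > max { max = $2 + 0; name = $1; raw = $2 }  # keep the largest value seen
    END { if (name != "") print name, raw }             # print nothing if no positive value
  '
}
```

A call such as `slowest_stage eval=5200000 tupleToTerr=310000 terrToTuple=280000 terrSetVariable=900000 terrGetVariable=650000` prints `eval 5200000`, suggesting in that invented case that the R evaluation, not the data conversion, is where the time goes.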
The status port emits tuples that describe the processing status for each input tuple. It is only present when the Enable Status Port property is selected. The schema of the output tuple consists of:
Field Name | Field Type | Description |
---|---|---|
type | String | The type of report, which follows normal log levels: DEBUG, ERROR, INFO, TRACE, and WARN. |
action | String | The action that caused the report. These can be Load R Data Objects, Init, or Execute. |
object | String | An optional object that has been affected by this status. |
Message | String | A human-readable status message. |
time | Tuple | The timestamp indicating when the status occurred. |
inputTuple | Tuple | The input tuple that caused this status message. NOTE: This value is null when loading initialization data. |
This section describes how data is converted from a tuple into Terr Data objects and back again.
This section describes how data is converted from Terr Data objects into a tuple result. Note that the best data conversion option is highlighted.
Note
Primitive values of NA (or NaN for doubles) are converted to a null value in StreamBase.
Terr Data Type | StreamBase Field Types |
---|---|
Terr Byte (vector byte) | |
Terr Double (vector double) | |
Terr Integer (vector integer) | |
Terr String (vector string) | |
Terr Logical (vector logical) | |
Terr Factor | |
Terr List | |
Terr DataFrame | |
Terr Generic | |
This section describes how data is converted from a tuple into Terr Data objects.
Note
Primitive types (int, double, long, boolean) with a null value are converted to NA (or NaN for doubles) in TERR.
StreamBase Field Type | Terr Data Types |
---|---|
boolean | TerrLogical — NULL values are converted to NA values. |
list(boolean) | TerrLogical — NULL values are converted to NA values. |
tuple(names list(string), values list(boolean)) | TerrLogical — converts the list elements inside the tuple to a logical vector with names supplied. |
int | TerrInteger |
list(int) | TerrInteger |
tuple(names list(string), values list(int)) | TerrInteger — converts the list elements inside the tuple to an integer vector with names supplied. |
long | TerrDouble |
list(long) | TerrDouble |
tuple(names list(string), values list(long)) | TerrDouble — converts the list elements inside the tuple to a double vector with names supplied. |
double | TerrDouble |
list(double) | TerrDouble |
tuple(names list(string), values list(double)) | TerrDouble — converts the list elements inside the tuple to a double vector with names supplied. |
blob | TerrByte |
list(blob) | TerrByte — All bytes from all the elements in the list are copied into a single TerrByte. |
tuple(names list(string), values list(blob)) | TerrByte — converts the list elements inside the tuple to a byte vector with names supplied. |
string | TerrString |
list(string) | TerrString |
tuple(names list(string), values list(string)) | TerrString — converts the list elements inside the tuple to a string vector with names supplied. |
timestamp | TerrString |
list(timestamp) | TerrString |
tuple(names list(string), values list(timestamp)) | TerrString — converts the list elements inside the tuple to a string vector with names supplied. |
tuple(names list(string), indexes list(int), levels list(string)) | TerrFactor — converts the list elements inside the tuple to a factor vector with names supplied. |
tuple(x string, y string, z double) | TerrData (DataFrame) — Each subfield of the tuple is converted to a column in the data frame, with the tuple's field names used as the column names. The object types are converted based on the rules in this table. |
list(names list(string), values(tuple(x string, y string, z double))) | TerrList (List) — Creates a list with a single row, pairing each tuple field with the names list in the order in which the fields appear. The object types are converted based on the rules in this table. |
list(names list(string), values(tuple(x list(string), y list(string), z list(double)))) | TerrList (List) — Creates a list with multiple rows, where the list-valued tuple fields supply the rows and each field is paired with the names list in the order in which the fields appear. The object types are converted based on the rules in this table. |
list(names list(string), values(list(tuple(x string, y string, z double)))) | TerrList (List) — Creates a list item for each item in the values list. Each item has a single row, pairing each tuple field with the names list in the order in which the fields appear. The object types are converted based on the rules in this table. |
list(names list(string), values(list(tuple(x list(string), y list(string), z list(double))))) | TerrList (List) — Creates a list item for each item in the values list. Each item has multiple rows, where the list-valued tuple fields supply the rows and each field is paired with the names list in the order in which the fields appear. The object types are converted based on the rules in this table. |
Function | Function fields are not supported. |
Capture Field | Capture fields are not supported. |
Typechecking fails when:
-
Any required fields are not filled in.
-
The Use Embedded TERR property is disabled, the TERR Home Path property is not set, and no TERR_HOME environment variable is found.
-
Process Affinity is not an integer greater than 0.
-
The Model RDS data file specified cannot be located.
-
The Model Name is not specified.
-
The output schema contains more than one field.
-
The input schema is missing the terrVars field.
-
The input field terrVars is not a tuple.
-
The input field rData is not a blob.
On suspension, the TERR Predict operator finishes processing the current tuple or tuples (depending on the TERR instance count), outputs the result tuples, then pauses, waiting for input.
On resumption, the TERR Predict operator continues processing with the next input tuple.
The TERR instance or instances remain running during suspension.