The TIBCO StreamBase® TERR operator for TIBCO Enterprise Runtime for R (TERR) allows StreamBase to use TIBCO's implementation of the R language to analyze and manipulate data.
In order to run correctly, the TERR operators assume that the machine running the StreamBase Runtime has a 64-bit version of TERR installed locally. The TERR operators were tested and validated with TERR version 4.5; the minimum supported TERR version is 4.2.
A copy of TERR Developer Edition is installed as part of your StreamBase installation. The Developer Edition has restricted license terms, as described on the StreamBase License Considerations page.
TERR Developer Edition is installed in the directory STREAMBASE_HOME/terr and, as a convenience, the environment variable TERR_HOME is set to this directory when using a StreamBase Command Prompt on Windows.
The operators locate the version of TERR to call using the following sequence:
- If the option Use Embedded TERR is selected on the Operator Properties tab of the operator's Properties view, then the embedded TERR engine is used. If it is not selected:
  - If you specify a path to a local TERR installation in the TERR Home Path property on the Operator Properties tab of the operator's Properties view, that version of TERR is used first.
  - If that property is left blank, the operator looks for a path specified in the TERR_HOME environment variable.
This sequence lets you override the embedded TERR version with any newer or older version, compared to the embedded version, that your application requires.
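The lookup order described above can be sketched as a small shell function. This is only an illustration of the documented sequence, not the operator's actual implementation. The function name resolve_terr_home and its two arguments (whether Use Embedded TERR is selected, and the value of the TERR Home Path property) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the documented lookup order.
#   $1: "true" if Use Embedded TERR is selected
#   $2: value of the TERR Home Path property (may be empty)
resolve_terr_home() {
    use_embedded=$1
    home_path_property=$2
    if [ "$use_embedded" = "true" ]; then
        # The embedded Developer Edition lives under STREAMBASE_HOME/terr.
        printf '%s\n' "${STREAMBASE_HOME:-}/terr"
    elif [ -n "$home_path_property" ]; then
        # An explicit TERR Home Path property is used first.
        printf '%s\n' "$home_path_property"
    elif [ -n "${TERR_HOME:-}" ]; then
        # Otherwise fall back to the TERR_HOME environment variable.
        printf '%s\n' "$TERR_HOME"
    else
        echo "no TERR installation configured" >&2
        return 1
    fi
}
```

For example, calling resolve_terr_home false "" with TERR_HOME set prints the value of TERR_HOME.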
To determine the version of TERR installed with StreamBase:
On Windows, using a StreamBase Command Prompt, run:
%STREAMBASE_HOME%\terr\bin\TERR --version
On Linux or macOS using a configured shell, run:
$STREAMBASE_HOME/terr/bin/TERR --version
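Once you have the version number, you can compare it against the minimum supported version of 4.2. The following is a minimal sketch under stated assumptions: the function name terr_version_supported is hypothetical, and because the exact output format of TERR --version is not described here, the function takes a plain major.minor string that you extract yourself.

```shell
#!/bin/sh
# Hypothetical check: succeeds when the given version is 4.2 or later.
#   $1: version string such as "4.5" or "4.2.1"
terr_version_supported() {
    major=${1%%.*}                 # text before the first dot
    case $1 in
        *.*) rest=${1#*.}          # text after the first dot
             minor=${rest%%.*} ;;  # keep only the minor component
        *)   minor=0 ;;            # no dot: treat as major.0
    esac
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 2 ]; }
}

if terr_version_supported 4.5; then
    echo "TERR version is supported"
fi
```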
TIBCO customers can download TERR from edelivery.tibco.com, or download an evaluation copy of TERR from the TIBCO Access Point.
- For Linux
  TERR is only provided for 64-bit Linux. Download the tar file provided. Untar the file into a temporary local directory, and run the ./INSTALL file provided. The default installation directory is /opt/tibco/terrver, where ver is the TERR version number.
- For Windows
  Download the zip file provided; unzip the file to find a single installer executable. Run this installer, and install into the currently recommended location (C:\TIBCO\terrver) instead of the installer's suggested default location (C:\Program Files\TIBCO\terrver), where ver is the TERR version number.
  On Windows, the TERR installer provides both 32-bit and 64-bit versions of the TERR runtime code. When run on 64-bit Windows, the 64-bit version of TERR is automatically used. Since StreamBase supports only 64-bit Windows, it uses the 64-bit version of TERR.
- For macOS
  Download the DMG file provided and run the installer.
To connect StreamBase and its TERR operator to your local TERR installation, you must either:
-
Set the TERR Home Path property in the Operator Properties tab of each operator's Properties view, providing the full, absolute path to the TERR installation directory.
-
Set the TERR_HOME environment variable to point to the full, absolute path of your alternate TERR installation directory. Use this method if you anticipate using many operator instances in your StreamBase applications.
The TERR operators recognize and honor the TERR_HOME environment variable if set, and if it points to a valid local TERR installation directory. However, setting TERR_HOME is not required.
On Windows and Linux, the TERR bin directory does not need to be in the system PATH, and no environment variables are required.
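If you choose the TERR_HOME method, a quick sanity check before starting StreamBase can catch a mistyped path. The sketch below is illustrative only. It assumes that a usable TERR home contains an executable bin/TERR launcher, which is the layout shown for the embedded copy; the function name is_terr_home and the install path are hypothetical.

```shell
#!/bin/sh
# Assumption: a directory is a usable TERR home when it contains an
# executable bin/TERR launcher.
is_terr_home() {
    [ -n "$1" ] && [ -x "$1/bin/TERR" ]
}

# Hypothetical installation path; substitute your own.
export TERR_HOME=/opt/tibco/terr-example
if is_terr_home "$TERR_HOME"; then
    echo "TERR_HOME looks valid: $TERR_HOME"
else
    echo "TERR_HOME does not contain bin/TERR: $TERR_HOME" >&2
fi
```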
Using TERR on macOS requires additional settings:
- For TERR versions 4.2 and 4.3, remove spaces in the path
  TERR releases earlier than 4.4 require the path to the TERR_HOME/lib directory to contain no spaces. If you installed StreamBase using the DMG installer, your StreamBase home directory does contain spaces, and therefore so does the path to its embedded terr/lib subdirectory.
  To use the TERR operators under macOS with TERR versions 4.2 or 4.3, including with the embedded TERR Developer Edition, rename the folder containing StreamBase to remove spaces in the folder name. For example, change TIBCO StreamBase 10.2.0 to TIBCOStreamBase10.2.0.
- Identify the location of the TERR native libraries
  The operators must know where to find the dynamic libraries that implement TERR on macOS. You can use either of these methods:
  - Set the DYLD_LIBRARY_PATH environment variable, OR
    Configure your shell environment to include a line like the following:
    export DYLD_LIBRARY_PATH=/absolute/path/to/libs
    See below for example paths. This environment variable method may be more convenient if you are developing or running several StreamBase Applications that use TERR operators.
  - Specify the library path in a configuration file
    Add an externalNativeLibraryPath object to a HOCON configuration file of type com.tibco.ep.dtm.configuration.javaengine. For example:
        name = "javaengine"
        version = "1.0.0"
        type = "com.tibco.ep.dtm.configuration.javaengine"
        configuration = {
          JavaEngine = {
            ...
            externalNativeLibraryPath = {
              "osx_x86_64" = [ "/absolute/path/to/libs" ]
              ...
            }
          }
        }
    This setting must be configured for every EventFlow project that uses one of the TERR operators.
When using the embedded TERR Developer Edition, the value of /absolute/path/to/libs is a path like the following:
/Users/sbuser/Applications/TIBCOStreamBase10.2.0/terr/lib
When overriding the embedded TERR version with an external installation of TERR, the value of /absolute/path/to/libs is like the following:
/Library/Frameworks/TERR.framework/Versions/version-number/
Resources/lib/x86_64-apple-darwin
In the example above, the long line is broken into two for clarity. Enter this path as a single unbroken line.
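The two macOS requirements, a library path without spaces for TERR 4.2 and 4.3 and a DYLD_LIBRARY_PATH that points at the native libraries, can be combined in a shell profile fragment. This is a sketch only: the helper name path_has_spaces is hypothetical, and the path is the embedded Developer Edition example from above, with sbuser standing in for your user name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given path contains a space.
path_has_spaces() {
    case $1 in
        *' '*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example path for the embedded TERR Developer Edition; adjust to your setup.
libs="/Users/sbuser/Applications/TIBCOStreamBase10.2.0/terr/lib"

if path_has_spaces "$libs"; then
    echo "rename the StreamBase folder to remove spaces: $libs" >&2
else
    export DYLD_LIBRARY_PATH="$libs"
fi
```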
This operator allows a stream of tuples to be evaluated by an external TERR process, with the results returned as another stream of tuples.
The operator can instantiate multiple TERR instances to improve performance. When more than one instance is required, the tuple execution can no longer be guaranteed to be in order, as the operator now works asynchronously.
The fields in the input tuple's terrVars field are converted directly to global TERR variables. The script is then run in that environment, and the result variable is retrieved and converted to the output tuple. This allows the script to be very short; a simple function call is sufficient as long as the function is defined in the initially loaded model. Having the values directly converted to TERR variables greatly increases both the speed of processing and the size of the input that can be processed for each tuple.
All the tuple entries that are to be read into the TERR process must be in a top-level tuple named terrVars.
Each element in this tuple is converted into a TERR variable.
A list of integers can be sent using the tuple (1) or (list (1, 2, 3)) or the enhanced form (tuple myInts (names = ["one", "two"], values=[1,2])). All StreamBase data types are supported with the exception of capture fields and functions. See the data conversion section of this document for more information.
Once the variables have been sent to the TERR process, the script is executed and the result is retrieved.
To use a TERR operator in a StreamBase EventFlow module, drag a token for the operator onto the canvas of your EventFlow Editor. Then select the newly placed operator to rename it and configure its properties.
The operator is a member of the Java Operators group in the Palette view in StreamBase Studio. Select the operator from the Insert an Operator or Adapter dialog. Invoke the dialog with one of the following methods:
-
Drag the Adapters, Java Operators token from the Operators and Adapters drawer of the Palette view to the canvas.
-
Click on the canvas where you want to place the operator, then invoke the keyboard shortcut O V.
-
From the top-level menu, invoke
> > .
When the dialog is open, enter terr in the search field to narrow the list of operators.
This section describes the properties you can set for the TERR operator, using the various tabs of the Properties view in StreamBase Studio.
In the tables in this section, the Property column shows each property name as it appears on the tabs of the Properties view for this operator.
Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
Operator: A read-only field that shows the formal name of the operator.
Class name: Shows the fully qualified class name that implements the functionality of this operator. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.
Start options: This field provides a link to the Cluster Aware tab, where you configure the conditions under which this operator starts.
Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.
Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow Editor canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
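The naming rule for the Name field can be expressed as a simple pattern check. The sketch below is illustrative and is not how StreamBase Studio validates names. The function name is_valid_component_name is hypothetical, and the check assumes ASCII letters only.

```shell
#!/bin/sh
# Hypothetical check of the documented naming rule: only letters, digits,
# and underscores, with a letter or underscore as the first character.
# Assumption: ASCII ranges; run under LC_ALL=C if your locale collates
# character ranges differently.
is_valid_component_name() {
    case $1 in
        '')               return 1 ;;  # empty name
        [!A-Za-z_]*)      return 1 ;;  # bad first character
        *[!A-Za-z0-9_]*)  return 1 ;;  # hyphen or other special character
        *)                return 0 ;;
    esac
}

for name in TERR_Scoring 1stOperator my-operator; do
    if is_valid_component_name "$name"; then
        echo "$name: valid"
    else
        echo "$name: invalid"
    fi
done
```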
Property | Data Type | Description |
---|---|---|
Use Embedded TERR | Check box | When enabled, the operator uses the embedded TERR engine that is bundled with StreamBase (licensed for development use only). |
TERR Home Path | String | When not using the embedded TERR engine, you must supply the home path for the TERR installation to use. You can leave this blank if the TERR_HOME environment variable is set. |
Default Script | String | The default script to execute. If a script is passed into the input port, it overrides this value except when doing init requests. |
Enable Status Port | Check box | When enabled, the adapter reports data on the status port regarding various adapter states. |
Log Level | INFO | Controls the level of verbosity the adapter uses to issue informational traces to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE. |
Property | Data Type | Description |
---|---|---|
TERR Instances | Integer | The number of instances of the TERR engine to use with this adapter. NOTE: If greater than 1, the operator becomes asynchronous and tuple order is not guaranteed. |
Enable Timing | Check box | When enabled, the result tuples produced include timing information. |
Pause Before TERR Execution | Check box | If enabled, the EventFlow operation pauses in debug mode to allow you to execute R methods via a web console interface on the current instance before executing the input tuple. Please see the TERR Console Debugging section. |
Pause After TERR Execution | Check box | If enabled, the EventFlow operation pauses in debug mode to allow you to execute R methods via a web console interface on the current instance after executing the input tuple. Please see the TERR Console Debugging section. |
Pause Execution Web Port | int | The port on which the web server starts to serve the websocket terminal page that runs the TERR Console. Please see the TERR Console Debugging section. |
To TERR Date Format | String | The date format to use when converting tuple data into TERR. |
From TERR Date Format | String | The date format to use when converting TERR data into tuples. |
TERR Engine Parameters | String | The engine parameters to send into the TERR engine. |
TERR Java Home Path | String | The path to the Java Home to use with the TERR instance. If blank, the Java instance embedded with the StreamBase installation is used. |
TERR Java Options | String | The Java options to use with the TERR instance. |
TERR Instance Process Affinity | Map | The processor affinity to set for each instance of TERR. Instance values are matched to processors; you can specify an instance number more than once to have multiple processors. |
TERR Environment | Map | The environment to set for each instance of TERR. |
Use the Edit Schema tab to specify the schema of the output tuple for this adapter.
For general instructions on using the Edit Schema tab, see the Properties: Edit Schema Tab section of the Defining Input Streams page.
Use the Import proposed schemas link to import schemas as needed, for the various TERR output types. The list of importable schemas is specified in the Definitions tab of the EventFlow Editor.
Each field specified in the output schema represents an R variable that is retrieved after the R engine execution of an input tuple. The field name is used to match an R variable name to pull the values.
Use the AMS tab to specify which artifacts should be pulled from a running TIBCO Artifact Management Server, which is a separately installed product.
Note
If you deploy an artifact from the AMS system, the operator first checks your list of artifacts for a matching path and, if one is found, uses the model name given. If the path is not matched, the artifact's filename without the file extension is used as the model name. For example, sample/audit.rds resolves to a model name of audit.
Property | Data Type | Description |
---|---|---|
Required On Startup | check box | When enabled, the artifacts listed are requested from AMS at initialization and the system waits until all artifacts are loaded. |
Artifacts | list (string, string) | List of artifacts to load from AMS. The first value of the path is the project name followed by the full path to the artifact.
Use a / separator with an optional @version at the end. If @version is not specified, then the latest version is assumed.
For example: |
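The path conventions for this tab can be sketched with two small shell functions. Both are illustrations of the rules stated above, not the operator's code. The function names and the sample entry myproject/sample/audit.rds@2 are hypothetical, and the word latest is used here only as a marker for an unspecified version. How filenames with more than one extension are handled is not stated, so the sketch strips only the last one.

```shell
#!/bin/sh
# Hypothetical: default model name is the filename without its extension.
model_name_from_path() {
    file=${1##*/}                            # sample/audit.rds -> audit.rds
    case $file in
        ?*.*) printf '%s\n' "${file%.*}" ;;  # drop the last extension
        *)    printf '%s\n' "$file" ;;       # no extension to drop
    esac
}

# Hypothetical: split an Artifacts entry into project, path, and version.
parse_artifact() {
    entry=$1
    case $entry in
        *@*) version=${entry##*@}; entry=${entry%@*} ;;
        *)   version=latest ;;               # no @version: latest is assumed
    esac
    project=${entry%%/*}                     # first path element
    artifact_path=${entry#*/}                # remainder after the project
    printf '%s %s %s\n' "$project" "$artifact_path" "$version"
}

model_name_from_path sample/audit.rds
parse_artifact myproject/sample/audit.rds@2
```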
Use the Initialize tab to specify one or more R data objects and variables to load at startup into each instance of TERR called by this TERR operator.
Note
The TERR initialize variables are loaded into each instance BEFORE the TERR Data objects are loaded.
Property | Data Type | Description |
---|---|---|
Initialize Objects | list(string, string) |
Specify one file per row in the File column, providing an optional name for the data object in the Name column. You can specify compressed or uncompressed R data object files, in plain text or serialized format, to be read by one of the read* functions in TERR. The specified files must be at the root of the Studio project folder, or in a directory along the project's resource search path. If specified files cannot be found, the result is a typecheck error. |
Initialize Variables | list(string, string) |
Specify one variable per row in the Name column, providing an expression for its value in the second column. All static data expressions are valid, including tuple and list creation, along with standard data such as int, double, long, and quoted string. For example, tuple('value1' as x, 123 as y) creates a tuple. The same data conversion is applied to these variables as to the input tuple's terrVars field. See the Data Type Conversion section. |
Use the settings in this tab to allow this operator or adapter to start and stop based on conditions that occur at runtime in a cluster with more than one node. During initial development of the fragment that contains this operator or adapter, and for maximum compatibility with TIBCO Streaming releases before 10.5.0, leave the Cluster start policy control in its default setting, Start with module.
Cluster awareness is an advanced topic that requires an understanding of StreamBase Runtime architecture features, including clusters, quorums, availability zones, and partitions. See Cluster Awareness Tab Settings on the Using Cluster Awareness page for instructions on configuring this tab.
Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.
Caution
Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.
Use the TERR Instances property on the Advanced tab to enable parallel processing into multiple TERR instances as needed. You can still use the Concurrency tab, but it will have very little impact on performance.
The TERR operator has a single input port to handle all interactions. The schema for this can include any field, but the following are used by the operator, with the remaining fields passed through the operator into an inputTuple field on the output stream.
Field Name | Field Type | Description |
---|---|---|
script | string | (Optional) This field contains the script to be executed by the TERR instance. If this script is null or empty and a default script was set, it will be used, unless the init flag is set. |
init | boolean | (Optional) If true, the data is executed on all TERR instances. This field is commonly used to load models and scripts with functions to be executed later. Note: If a script value and rData are both used in the same tuple, the order of execution is script first and then rData. |
rName | string | (Optional) The name of the R variable to set when loading R data objects using the rData blob input field. Note: This value is only used if the init value is true. |
rData | blob | (Optional) The R byte data to load. The load is completed differently depending on whether rName is provided. If rName is provided, the rData is loaded via an R command such as 'rName <- unserialize(rData)'. Otherwise, 'load(rData)' is performed. Note: This value is only used if the init value is true. |
terrVars | tuple | (Optional) The tuple data to convert into R variables. This field must be a tuple and each field in the tuple is converted into an R variable based on the field's schema. |
terrInstance | int | (Optional) The TERR instance to send this tuple to. |
The TERR operator has two output ports: a data port and an optional status port.
The data port outputs the result of each call into the TERR engine. The result tuple contains two or three fields, depending on whether timing is enabled.
- terrData: The result data pulled from the TERR instance after execution. This field contains all the values specified on the Edit Schema tab. Each subfield of the terrData field represents a variable from the TERR instance.
- inputTuple: This tuple contains all the fields from the input tuple.
- (Optionally) timing: This tuple contains timing information to help gauge what might be the bottleneck in execution. The timing tuple contains the following fields:
  - eval: The time in nanoseconds it took for the TERR instance to evaluate and execute the R functions.
  - tupleToTerr: The time in nanoseconds it took to convert the input tuple into TERR data objects to send to the TERR instance.
  - terrToTuple: The time in nanoseconds it took to convert the TERR data objects from the TERR instance into the outbound tuple.
  - terrSetVariable: The time in nanoseconds it took to send the TERR data objects into the running TERR instance.
  - terrGetVariable: The time in nanoseconds it took to get the TERR data objects from the running TERR instance.
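To illustrate how the timing fields can point to a bottleneck, the sketch below picks the stage with the largest value from one timing tuple. The function name slowest_stage and the nanosecond values are hypothetical; in practice you would read the values from the timing field of an output tuple.

```shell
#!/bin/sh
# Hypothetical helper: given name=nanoseconds pairs, print the name of the
# stage that took the longest.
slowest_stage() {
    max_name=
    max_ns=-1
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first "="
        ns=${pair#*=}        # text after the first "="
        if [ "$ns" -gt "$max_ns" ]; then
            max_name=$name
            max_ns=$ns
        fi
    done
    printf '%s\n' "$max_name"
}

# Made-up sample values for one timing tuple.
slowest_stage eval=1200000 tupleToTerr=300000 terrToTuple=250000 \
    terrSetVariable=90000 terrGetVariable=80000
```

With these sample values the function prints eval, which would suggest that the R code itself, rather than data conversion, dominates the processing time.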
The status port emits tuples that describe the processing status for each input tuple. It is only present when the Enable Status Port property is selected. The schema of the output tuple consists of:
Field Name | Field Type | Description |
---|---|---|
type | String | The type of report, which follows normal log levels: DEBUG, ERROR, INFO, TRACE, and WARN. |
action | String | The action that caused the report. These can be Load R Data Objects, Init, or Execute. |
object | String | An optional object that has been affected by this status. |
Message | String | A human-readable status message. |
time | Tuple | The timestamp indicating when the status occurred. |
inputTuple | Tuple | The input tuple that caused this status message. NOTE: This value is null when loading initialization data. |
This section describes how to pause TERR engine execution and open the TERR console in a web browser.
- Make sure at least one TERR operator has Pause Before TERR Execution or Pause After TERR Execution checked (or both).
- Make sure the Pause Execution Web Port has a valid value.
- Make sure to start the application in debug mode.
- Open a web browser to localhost:{Pause Execution Web Port}.
- Click on the endpoint to the operator you want to debug with the TERR console.
- Make sure the web page terminal window says Connected and waiting for debug breakpoint.
- Now you can proceed to send a tuple into the TERR operator.
- When a tuple goes into the TERR operator, the web page terminal window will display [Pre or Post] Inst[X] TERR Command [q() to continue]> at which point you can now perform valid R commands.
- When you are done, type q() to end and continue tuple execution.
Note
If you have both Pause Before and Pause After checked on the operator, you will be prompted twice and must perform q() twice before tuple flow continues, because you are breaking before the tuple is executed by TERR and also after so you can inspect the TERR instance state before and after.
This section describes how data is converted from a tuple into Terr Data objects and back again.
This section describes how data is converted from Terr Data objects into a tuple result. Note that the best data conversion option is highlighted.
Note
Primitive types with a value of NA (or NaN for doubles) are converted to a null value in StreamBase.
Terr Data Type | StreamBase Field Types |
---|---|
Terr Byte (vector byte) |
|
Terr Double (vector double) |
|
Terr Integer (vector integer) |
|
Terr String (vector string) |
|
Terr Logical (vector logical) |
|
Terr Factor |
|
Terr List |
|
Terr DataFrame |
|
Terr Generic |
|
This section describes how data is converted from a tuple into Terr Data objects.
Note
Primitive types (int, double, long, boolean) with a null value are converted to NA (or NaN for doubles) in TERR.
StreamBase Field Type | Terr Data Types |
---|---|
boolean | TerrLogical — NULL values are converted to NA values. |
list(boolean) | TerrLogical — NULL values are converted to NA values. |
tuple(names list(string), values list(boolean)) | TerrLogical — converts the list elements inside the tuple to a logical vector with names supplied. |
int | TerrInteger |
list(int) | TerrInteger |
tuple(names list(string), values list(int)) | TerrInteger — converts the list elements inside the tuple to an int vector with names supplied. |
long | TerrDouble |
list(long) | TerrDouble |
tuple(names list(string), values list(long)) | TerrDouble — converts the list elements inside the tuple to a double vector with names supplied. |
double | TerrDouble |
list(double) | TerrDouble |
tuple(names list(string), values list(double)) | TerrDouble — converts the list elements inside the tuple to a double vector with names supplied. |
blob | TerrByte |
list(blob) | TerrByte — All bytes from all the elements in the list are copied into a single Terr Byte |
tuple(names list(string), values list(blob)) | TerrByte — converts the list elements inside the tuple to a byte vector with names supplied. |
string | TerrString |
list(string) | TerrString |
tuple(names list(string), values list(string)) | TerrString — converts the list elements inside the tuple to a string vector with names supplied. |
timestamp | TerrString |
list(timestamp) | TerrString |
tuple(names list(string), values list(timestamp)) | TerrString — converts the list elements inside the tuple to a string vector with names supplied. |
tuple(names list(string), indexes list(int), levels list(string)) | TerrFactor — converts the list elements inside the tuple to a factor vector with names supplied. |
tuple(x string, y string, z double) | TerrData (DataFrame) — Each subfield of the tuple is converted to a field in the data frame, with the tuple's field names being the names supplied to the TerrData objects. The object types are converted based on the rules supplied in this list. |
list(names list(string), values(tuple(x string, y string, z double)) | TerrList (List) — This will create a list with a single row with each tuple field used against the names list in order that the fields appear. The object types are converted based on the rules supplied in this list. |
list(names list(string), values(tuple(x list(string), y list(string), z list(double))) | TerrList (List) — This will create a list with multiple rows with each tuple field to create multiple rows used against the names list in order that the fields appear. The object types are converted based on the rules supplied in this list. |
list(names list(string), values(list(tuple(x string, y string, z double))) | TerrList (List) — This will create a list item for each item in the values list with a single row with each tuple field used against the names list in order that the fields appear. The object types are converted based on the rules supplied in this list. |
list(names list(string), values(list(tuple(x list(string), y list(string), z list(double)))) | TerrList (List) — This will create a list item for each item in the values list with each tuple field to create multiple rows used against the names list in order that the fields appear. The object types are converted based on the rules supplied in this list. |
Function | Function fields are not supported. |
Capture Field | Capture Fields are not supported. |
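The scalar and list rows of the table above follow a regular pattern: a list of a type converts to the same TERR vector type as the scalar. The lookup below is a sketch of that pattern only. The function name terr_type_for is hypothetical, and the named-vector, factor, data frame, and list forms from the table are deliberately left out.

```shell
#!/bin/sh
# Hypothetical lookup covering only the scalar and list(T) rows of the table.
terr_type_for() {
    field_type=$1
    # list(T) maps to the same vector type as the scalar T.
    case $field_type in
        'list('*')')
            field_type=${field_type#'list('}
            field_type=${field_type%')'}
            ;;
    esac
    case $field_type in
        boolean)          echo TerrLogical ;;
        int)              echo TerrInteger ;;
        long|double)      echo TerrDouble ;;
        blob)             echo TerrByte ;;
        string|timestamp) echo TerrString ;;
        *)  echo "unsupported or not covered: $1" >&2
            return 1 ;;
    esac
}

terr_type_for long
terr_type_for 'list(timestamp)'
```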
Typechecking fails when:
- Any required fields are not filled in.
- The Use Embedded TERR property is disabled, and neither the TERR Home Path property nor the TERR_HOME environment variable is set.
- Process Affinity is not an integer greater than 0.
- Any R data object file specified on the Initialize tab cannot be located.
On suspension, the TERR operator finishes processing the current tuple or tuples (depending on the TERR instance count), outputs the result tuples, then pauses, waiting for input.
On resumption, the TERR operator continues processing with the next input tuple.
The TERR instance or instances remain running during suspension.