Using the Python Operator

Introduction

The Python operator allows the application designer to execute arbitrary Python code within StreamBase applications. Its purpose is to let Python-centric teams reuse their existing code in event processing without major rewrites. This includes executing models produced with SciPy and TensorFlow.

Stateful Python sessions run as child processes of StreamBase applications. Python operators interact with a session by setting input variables, executing the script, and reading output variables. The operator guarantees that these three operations execute sequentially, even when multiple operator instances use the same session or the operator runs in asynchronous mode.
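The set/execute/read cycle and its ordering guarantee can be modeled in plain Python. The sketch below is a local stand-in only: it assumes a session behaves like a shared variable namespace guarded by a lock, and the class name SessionSketch and its call method are hypothetical, not part of the StreamBase integration layer.

```python
import threading


class SessionSketch(object):
    """Local stand-in for one stateful Python session (hypothetical)."""

    def __init__(self):
        self._namespace = {}           # session variables persist across calls
        self._lock = threading.Lock()  # serializes whole set/execute/read cycles

    def call(self, input_vars, script, output_names):
        with self._lock:
            self._namespace.update(input_vars)           # 1. set input variables
            exec(script, self._namespace)                # 2. execute the script
            return dict((name, self._namespace[name])    # 3. read output variables
                        for name in output_names)


session = SessionSketch()
script = "total = globals().get('total', 0) + x"  # running sum kept in the session

results = []


def worker(value):
    results.append(session.call({"x": value}, script, ["total"]))


# Ten "operators" hitting the same session concurrently.
threads = [threading.Thread(target=worker, args=(v,)) for v in range(1, 11)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(session.call({"x": 0}, script, ["total"]))  # prints {'total': 55}
```

Because each cycle holds the lock from setting x to reading total, no call can overwrite another call's input before its script runs, so the running sum is always 1 + 2 + ... + 10.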

Python operators support any Python runtime compliant with Python 2.7 or 3.x. The script must be compatible with the selected runtime, which means it may use only the libraries and language constructs that runtime provides. The operator treats the script as opaque and does not attempt to parse or compile it before sending it to the runtime. Conversely, the full capabilities of the selected runtime (its libraries, Java classes in Jython, .NET access in IronPython) are available to the script.
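As an illustration of runtime-specific access, a script can check which implementation it runs on with the standard platform.python_implementation() function and import Java classes only under Jython. This is a sketch; the variable name now_ms is hypothetical, and the Jython branch has not been exercised here.

```python
import platform
import time

# Returns 'CPython', 'PyPy', 'Jython' or 'IronPython'.
impl = platform.python_implementation()

if impl == "Jython":
    # Under Jython, Java classes can be imported like Python modules.
    from java.lang import System
    now_ms = int(System.currentTimeMillis())
else:
    # Portable fallback for CPython 2.7/3.x, PyPy and IronPython.
    now_ms = int(time.time() * 1000)
```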

Python Compatibility

The operator's integration layer uses a minimal set of features common to Python 2.7 and Python 3.x: it requires the pickle library and TCP/IP networking, and all constructs it uses are compatible with both versions.
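The integration layer's actual wire protocol is not documented here. As a sketch of the two ingredients it names, the following exchanges a pickled dictionary over a loopback TCP connection. The 4-byte length-prefix framing and the helper names send_obj, recv_exact, and recv_obj are assumptions made for the example.

```python
import pickle
import socket
import struct


def send_obj(sock, obj):
    # Assumed framing: 4-byte big-endian length, then the pickle payload.
    # Protocol 2 can be read by both Python 2.7 and Python 3.x.
    payload = pickle.dumps(obj, protocol=2)
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    # TCP is a byte stream, so keep reading until n bytes have arrived.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed")
        buf += chunk
    return buf


def recv_obj(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))


# Loopback demo: one TCP connection standing in for the operator/session link.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())
peer, _ = server.accept()

send_obj(client, {"x": 1.5, "names": ["a", "b"]})
received = recv_obj(peer)

for s in (client, peer, server):
    s.close()
```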

Runtime     Version  Notes
Python 2    2.7.x
Python 3    3.x.x    Tested 3.4.x on CentOS 7 and 3.6.x on Windows 10.
PyPy        5.x      Tested 5.0.1 on CentOS 7 and 5.9.0 on Windows 10.
Jython      2.7.0
IronPython  2.7.7    Requires setting the useTempFile property in the configuration file to true.

Data Conversion

The Python type of each variable passed in the inputVars field is inferred from its StreamBase field type. For the fields you define in the outputVars tuple, the operator runtime makes a best-effort attempt to cast the Python objects to the declared StreamBase types. The following table summarizes the conversions.

StreamBase type       To Python                           From Python
boolean               truth                               truth, int, float
int                   int                                 truth, int, float
double                float                               truth, int, float
string                unicode (Python 2), str (Python 3)  str, bytes, bytearray, unicode (Python 2)
timestamp (absolute)  datetime.datetime                   datetime.datetime, datetime.date, datetime.time
timestamp (interval)  datetime.timedelta                  datetime.timedelta
blob                  bytes                               bytes, bytearray
list                  list                                list, tuple, array.array, materialized generator (list)
tuple                 dict                                dict
capture               unsupported                         unsupported
function              unsupported                         unsupported
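The From Python column can be read as a best-effort cast. The following Python 3 sketch mirrors that column for illustration only. The function name from_python, the string type names, the UTF-8 decoding of bytes, the truncating behavior of int(), and the ("absolute" or "interval", value) return shape for timestamps are all assumptions, not the operator's real implementation.

```python
import array
import datetime
import types


def from_python(value, sb_type):
    """Hypothetical best-effort cast of a Python value to a StreamBase type name."""
    if sb_type in ("boolean", "int", "double"):
        # Accepted sources: truth values (bool), int, float.
        if not isinstance(value, (bool, int, float)):
            raise TypeError("cannot convert %r to %s" % (value, sb_type))
        if sb_type == "boolean":
            return bool(value)
        if sb_type == "int":
            return int(value)  # assumption: truncates like Python's int()
        return float(value)
    if sb_type == "string":
        if isinstance(value, str):
            return value
        if isinstance(value, (bytes, bytearray)):
            return bytes(value).decode("utf-8")  # assumption: UTF-8 payload
    elif sb_type == "timestamp":
        if isinstance(value, datetime.timedelta):
            return ("interval", value)
        # datetime.datetime subclasses datetime.date, so it matches here too.
        if isinstance(value, (datetime.date, datetime.time)):
            return ("absolute", value)
    elif sb_type == "blob":
        if isinstance(value, (bytes, bytearray)):
            return bytes(value)
    elif sb_type == "list":
        if isinstance(value, (list, tuple, array.array, types.GeneratorType)):
            return list(value)  # a generator is materialized here
    elif sb_type == "tuple":
        if isinstance(value, dict):
            return dict(value)
    # capture, function, and every other combination are unsupported.
    raise TypeError("cannot convert %r to %s" % (value, sb_type))
```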

Global Python Instance

You can define Python instances globally in the adapter-configurations.xml configuration file or locally as module instances. Local instances can be private to a concurrent region (for parallelism) while still being shared by multiple operators within it (for example, to separate initialization calls from execution calls).

For a reference of interpreter launch parameters, consult the Python documentation.

For configuration-defined Python instances, use the adapter-configuration element.

If a value is not present, the default is used. Those values listed without a default are required.

Property       Type     Default  Description
instance       string            The name that links operators to this instance. It is displayed in the drop-down list of each operator's properties when the Global instance type is selected.
executable     string   python   Path to the Python executable. When absent, the instance is launched with the command python.
workingDir     string   .        Working directory for the launched process. When absent, the process starts in the same directory as the parent StreamBase process.
useTempFile    boolean  false    When true, the integration layer writes the Python code that wraps interactions with StreamBase to a temporary file instead of pushing it through stdin. The default stdin method works for most Python runtimes; set this flag to true when launching IronPython.
captureOutput  boolean  false    Controls stdout and stderr handling. By default, both are chained to the parent process's stdout and stderr. For tests that include output, capturing them is recommended.
envVariables   section           Environment variables to pass to, or override for, the launched Python interpreter. Use the name attribute for the variable name and the val attribute for its value.
arguments      section           Argument to the Python interpreter (not to the script). Can be defined multiple times. A common argument is -u, which forces Python to use unbuffered stdin, stdout, and stderr streams. Use the val attribute to provide the value.

    <adapter-configurations>
        <adapter-configuration name="python">
            <section name="python">
                <setting name="instance" val="python"/>
                <setting name="executable" val="C:/Python/python.exe"/>
                <setting name="workingDir" val="."/>
                <setting name="useTempFile" val="false"/>
                <setting name="captureOutput" val="false"/>
                <section name="envVariables">
                    <setting name="LD_LIBRARY_PATH" val="/opt/3rdparty/lib"/>
                </section>
                <section name="arguments">
                    <setting val="-u"/>
                </section>
            </section>
        </adapter-configuration>
    </adapter-configurations>
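To make the meaning of executable, workingDir, envVariables, and arguments concrete, the sketch below turns a dictionary of those settings into the argument list, working directory, and environment that a launcher such as subprocess.Popen would receive. The dictionary layout and the function name build_launch are hypothetical, and useTempFile and captureOutput are not modeled.

```python
import os


def build_launch(settings):
    """Hypothetical mapping from configuration settings to launch parameters."""
    executable = settings.get("executable", "python")  # default command
    cwd = settings.get("workingDir", ".")              # "." is the parent's directory
    # Interpreter arguments come before any script input, e.g. -u.
    argv = [executable] + list(settings.get("arguments", []))
    # Start from the parent environment, then pass/override the listed variables.
    env = dict(os.environ)
    env.update(settings.get("envVariables", {}))
    return argv, cwd, env


settings = {
    "executable": "C:/Python/python.exe",
    "workingDir": ".",
    "envVariables": {"LD_LIBRARY_PATH": "/opt/3rdparty/lib"},
    "arguments": ["-u"],
}
argv, cwd, env = build_launch(settings)
# subprocess.Popen(argv, cwd=cwd, env=env) would then start the interpreter.
```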

For Python instances defined in EventFlow, use the Python Instance operator. It uses the same parameters as the configuration file. Python operators within the same EventFlow can refer to this instance by setting the Instance Type operator property to Local and entering the instance name in the Local Instance ID property, where the name is the name of the Python Instance operator within the EventFlow.

Python Operator Properties

This section describes the properties you can set for the Python operator, using the various tabs of the Properties view in StreamBase Studio.

General Tab

Name: Use this field to specify or change the component's name, which must be unique in the application. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
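The naming rule corresponds to a simple regular expression. The sketch below assumes "alphabetic" means ASCII letters; the helper name is_valid_component_name is hypothetical and not a StreamBase API.

```python
import re

# First character: letter or underscore; then letters, digits, or underscores.
# Assumption: "alphabetic" means ASCII A-Z and a-z.
COMPONENT_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_component_name(name):
    return COMPONENT_NAME.fullmatch(name) is not None
```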

Operator: A read-only field that shows the formal name of the operator. If this operator is a global Java operator or your own custom operator, then this field also shows the fully qualified class name that implements the functionality of this operator. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start with application: If this field is set to Yes (default) or to a module parameter that evaluates to true, this instance of this operator starts as part of the JVM engine that runs this EventFlow fragment. If this field is set to No or to a module parameter that evaluates to false, the operator instance is loaded with the engine, but does not start until you send an epadmin container resume command (or its sbadmin equivalent), or until you start the component with StreamBase Manager.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

Property            Type            Description
Instance Type       radio button    When Local is selected, the operator uses the instance defined in the EventFlow by a Python Instance operator. When Global is selected, the instance configured in the adapter-configurations.xml file is used.
Local Instance ID   text            When Instance Type is set to Local, the name of the local Python Instance operator to use.
Global Instance ID  text            When Instance Type is set to Global, the name of the Python instance configured in the adapter-configurations.xml file.
Asynchronous        check box       When checked, the operator executes the script using a non-blocking call, so long operations can run without suspending processing in the module. Make sure that module invariants are preserved around the call. Unlike concurrent parallel execution in StreamBase, this operator does not allocate additional threads; it uses lightweight job scheduling.
Log Level           drop-down list  Controls the level of verbosity the adapter uses to issue informational traces to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE.

Script Tab

Property  Type            Description
Script    multiline text  Python code to be executed for each incoming tuple.
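A minimal example script, assuming inputVars carries hypothetical fields price and qty and that Output variables declares total and count. Because the session is stateful, count survives from one tuple to the next. To keep the example runnable outside StreamBase, the script is stored in a string and executed twice against one dictionary that stands in for the session; in the operator, only the text inside SCRIPT would go in the Script tab.

```python
# Script body: 'price' and 'qty' are assumed to arrive via inputVars.
SCRIPT = """
try:
    count += 1      # session state survives between tuples
except NameError:
    count = 1       # first tuple seen by this session
total = price * qty
"""

# Local stand-in for the session: one namespace reused for every tuple.
session_vars = {}
for price, qty in [(2.5, 4), (1.0, 3)]:
    session_vars.update(price=price, qty=qty)            # set input variables
    exec(SCRIPT, session_vars)                           # execute the script
    print(session_vars["total"], session_vars["count"])  # read output variables
# prints "10.0 1" and then "3.0 2"
```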

Output Tab

Property          Type               Description
Output variables  schema definition  Definition of the expected output variables. Each field in the schema corresponds to a Python session variable that this operator's script, or any previous call, is expected to set. Each output variable must hold a value that can be cast to the field's StreamBase type; see the conversion table under Data Conversion.

Input and Output Ports

The input port accepts tuples of any schema. The fields inputVars and outputVars are reserved.

  • inputVars — optional tuple containing variables to be set in the Python session.

  • outputVars — tuple of the structure defined in the Output Variables containing variables read from the Python session.

  • * any other fields, passed through unchanged.

Unrecognized fields are passed through unchanged. The inputVars field is not propagated to the output port, and the input schema must not contain an outputVars field.
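The port rules can be summarized as a transformation from an input tuple to an output tuple. The sketch models tuples as Python dictionaries; the function name output_tuple is hypothetical, and the values of outputVars are assumed to have already been read from the session.

```python
def output_tuple(input_tuple, output_vars):
    """Hypothetical model of the input/output port rules for one tuple."""
    # outputVars is reserved and must not appear on the input port.
    if "outputVars" in input_tuple:
        raise ValueError("outputVars is not allowed on the input port")
    # Pass every other field through, but do not propagate inputVars.
    result = dict((k, v) for k, v in input_tuple.items() if k != "inputVars")
    # Attach the variables read back from the Python session.
    result["outputVars"] = output_vars
    return result
```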