Working with the Python Processing Step
This page explains the use of the Python processing step in TIBCO ModelOps scoring flows.
Overview
The Python processing step is one of several processing steps that can be used in a TIBCO ModelOps scoring flow. A scoring flow can contain zero or more Python processing steps. Once Python scripts and other artifacts have been added to the TIBCO ModelOps server, they can be used in a Python processing step. Each Python processing step can consume data from upstream processing steps and send data downstream.

Input Field Names
A Python processing step can access data from upstream steps by using the output field names of the upstream steps. Data from upstream steps, when selected through the UI, is made available in the Python execution context through a Python dictionary named flowInputVars with:
- Keys set to the output field names of upstream steps
- Values set to Python dictionaries containing data from the corresponding upstream steps
A dictionary that contains data from a given upstream step has:
- Keys set to the field names defined in the output schema for that upstream step
- Values set to the data for the fields flowing from that upstream step

For example, the following Python code would access the SETOSA field from the data produced by the Input step (output variable name: INPUT_DATA) and the LOG_X field from the data produced by the Python Step - 1 step (output variable name: PY_STEP_1):
# use flowInputVars to access data from a step
data_input_step = flowInputVars["INPUT_DATA"]
data_py_step_1 = flowInputVars["PY_STEP_1"]
# access SETOSA field from Input step
value_setosa = data_input_step["SETOSA"]
# access LOG_X field from Python Step - 1
value_log_x = data_py_step_1["LOG_X"]
Package Dependencies
If certain Python packages are required for the Python script used in a Python processing step to run successfully, those requirements can be specified in the Package Dependencies property through an artifact that follows the standard pip requirements file format. The Package Dependencies property can be left unspecified if no packages outside the Python standard library are needed.
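For example, a package dependencies artifact might look like the following sketch; the package names and versions are purely illustrative and not required by ModelOps:
numpy==1.24.2
pandas>=1.5,<2.0
scikit-learn==1.2.2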
Artifact Dependencies
If certain artifacts are required for the Python script used in a Python processing step to run successfully, those files can be specified through Artifact Dependencies. Specifying artifact dependencies is optional.

For example, the following Python code would access the CSV and JSON artifact dependencies described in the figure above:
import csv
import json
lookup_table = None
coefficient_table = None
# read & load look up table from JSON file
with open("lookup.json", "r") as json_file:
lookup_table = json.load(json_file)
# load data from CSV file
with open("numbers.csv", "r") as csv_file:
reader = csv.reader(csv_file)
coefficient_table = list(reader)
# other useful code ...
Parameters
A collection of key-value pairs can be made available to the script by specifying them through the Parameters property. These key-value pairs are exposed in the Python execution context through a Python dictionary named flowParameters.

For example, the following Python code would access the parameters described in the figure above:
# some useful code ...
slope_value = float(flowParameters["slope"])
intercept_value = float(flowParameters["intercept"])
skip_col = flowParameters["column_to_skip"]
# other useful code ...
Script
The script to be used in a Python processing step is specified through a TIBCO ModelOps artifact. The script follows standard Python syntax. The variable names flowInputVars, flowParameters, and flowOutputVars are reserved.
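For example, a minimal step script might look like the following sketch; the upstream output variable name INPUT_DATA, the input field x, the parameter scale, and the output field scaled_x are illustrative assumptions rather than fixed names:
# read a field from an assumed upstream step
x_value = flowInputVars["INPUT_DATA"]["x"]
# convert the assumed parameter, as in the Parameters example above
scale = float(flowParameters["scale"])
# populate the single assumed output field to send downstream
flowOutputVars = {"scaled_x": x_value * scale}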

Output Field Names
A Python processing step can send data downstream through a Python dictionary named flowOutputVars with:
- Keys set to the field names defined in the output schema for the Python processing step
- Values set to the actual data to be sent downstream
The data sent through flowOutputVars should match the schema specified for the Python processing step.


For example, the following Python code would set values for the id, str_value, double_value, array_of_strings, and user fields and send them downstream from the Python processing step:
data_from_input_step = flowInputVars["INPUT"]
idx = data_from_input_step["id"]
# prepare data to be sent
metal = "gold"
weight = 22.4
labels = ["models", "classification", "python"]
user_object = {
    "age": 42,
    "name": "john smith",
    "email": "john@example.com"
}
# send it downstream
flowOutputVars = {
    "id": idx,
    "str_value": metal,
    "double_value": weight,
    "array_of_strings": labels,
    "user": user_object
}
Python Processing Step Type Mapping
When a Python processing step receives data (through flowInputVars) and sends data (through flowOutputVars), the mapping between Python data types and the field types supported in ModelOps Scoring Pipelines is as follows:
Open API Type | Open API Format | Python Data Type | Comments |
---|---|---|---|
boolean | | bool | |
integer | int32 | int | 32 bit signed value |
integer | int64 | int | 64 bit signed value |
number | double | float | Double precision floating point value |
number | float | float | Single precision floating point value |
string | | str | UTF-8 encoded character data |
string | | bytearray | Base64 encoded binary data (contentEncoding=base64) |
string | date | datetime | RFC 3339 full-date |
string | date-time | datetime | RFC 3339 date-time |
array | | list | Supports all types in this table and can be nested |
object | | dict | Supports all types in this table and can be nested |
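For example, assuming the output schema of a Python processing step defines fields named event_date (a string with the date format), scores (an array of number/double), and details (an object), all illustrative names, the script would populate them with the corresponding Python types from the table:
import datetime
# string fields with the date or date-time format map to Python datetime values
event_date = datetime.datetime(2023, 5, 17)
# array fields map to Python lists; object fields map to Python dicts
flowOutputVars = {
    "event_date": event_date,
    "scores": [0.91, 0.07, 0.02],
    "details": {"model": "iris-classifier", "threshold": 0.5}
}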
Technical Notes
Execution Overview
The Python processing step operates in three phases: an initialization phase, an execution phase, and a termination phase.
During the initialization phase, the user script, the specified artifact dependencies, and the specified package dependencies are made available to a Python engine that handles the processing for the Python processing step.
During the execution phase, the Python processing step processes input records as follows:
- sets values for flowInputVars and flowParameters based on the record
- runs the entire user-specified script using the exec() function
- retrieves the value of flowOutputVars back into the record
During the termination phase (when the scoring pipeline is shut down), the Python engine servicing the Python processing step and the related execution context are ended.
The initialization and termination phases are opaque to the user script; the script does not receive any notifications during those phases.
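As an illustration only, not the actual engine implementation, the per-record behavior described above can be pictured roughly as the following Python sketch, where user_script_source stands in for the text of the user-specified script:
user_script_source = "..."  # placeholder for the user-specified script text
execution_context = {}      # reused for every record processed by this step

def process_record(input_vars, parameters):
    # set values for flowInputVars and flowParameters based on the record
    execution_context["flowInputVars"] = input_vars
    execution_context["flowParameters"] = parameters
    # run the entire user-specified script
    exec(user_script_source, execution_context)
    # retrieve the value of flowOutputVars back into the record
    return execution_context.get("flowOutputVars")
Because the execution context is reused for every record, values stored through globals() persist across records, which is the basis of the technique described in the next section.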
Persisting Values Over Records
To persist values across the processing of several input records, store them in the dictionary returned by a call to the globals() function.
The following example script maintains (in the globals() dictionary) a cumulative sum and count of the input field value using the keys persistent_count and persistent_sum, and outputs them through the fields cumulative_count and cumulative_sum.
## a running count and a running sum are
## saved in the globals() dictionary to
## persist the values across several records
data_from_input_step = flowInputVars["INPUT_DATA"]
value = data_from_input_step["value"]
global_vars = globals()
if "persistent_count" in global_vars:
    # subsequent records
    global_vars["persistent_count"] += 1
    global_vars["persistent_sum"] += value
else:
    # first data record
    global_vars["persistent_count"] = 1
    global_vars["persistent_sum"] = value
flowOutputVars = {
    "cumulative_count": global_vars["persistent_count"],
    "cumulative_sum": global_vars["persistent_sum"]
}
Python Interpreter Used
See the Platforms page for information on the version of Python used to run Python scripts in the Python processing step. Currently, there is no option to use a different version of Python to run scripts in Python processing steps.