Working with the Python Step

This page explains the use of the Python processing step in TIBCO ModelOps scoring flows.

Overview

The Python step is one of several processing steps that can be used in a TIBCO ModelOps scoring flow, and a scoring flow can contain zero or more Python steps. Once Python scripts and other artifacts have been added to the TIBCO ModelOps server, they can be used in a Python step. Each Python step can consume data from upstream processing steps and send data downstream.

Input Field Names

A Python step can access data from upstream steps by using the output field names of the upstream steps. Data from upstream steps, when selected through the UI, is made available in the Python execution context through a Python dictionary named flowInputVars with:

  • Keys set to the output field names of upstream steps
  • Values set to Python dictionaries containing data from the corresponding upstream steps

A dictionary that contains data from a given upstream step has:

  • Keys set to the field names defined in the output schema for that upstream step
  • Values set to the data for fields flowing from that upstream step

For example, the following Python code would access the SETOSA field from data from the Input step (output variable name: INPUT_DATA) and the LOG_X field from the data from the Python Step - 1 step (output variable name: PY_STEP_1):

# use flowInputVars to access data from a step
data_input_step = flowInputVars["INPUT_DATA"]
data_py_step_1 = flowInputVars["PY_STEP_1"]

# access SETOSA field from Input step
value_setosa = data_input_step["SETOSA"]

# access LOG_X field from Python Step - 1
value_log_x = data_py_step_1["LOG_X"]
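
To try these field lookups outside a scoring flow, flowInputVars can be mocked as a plain dictionary of dictionaries, which is the structure described above. The sketch below uses made-up sample values. It also adds a hypothetical read_field helper, which is not part of the ModelOps API. The helper returns a default instead of raising KeyError when a step or field is absent.

```python
# Hypothetical stand-in for the dictionary the Python step provides at run time;
# delete this assignment when the script runs inside TIBCO ModelOps.
flowInputVars = {
    "INPUT_DATA": {"SETOSA": 0.8},   # made-up sample data
    "PY_STEP_1": {"LOG_X": 1.386},   # made-up sample data
}


def read_field(step_name, field_name, default=None):
    """Return one field from one upstream step, or default if either is absent.

    read_field is a hypothetical helper, not part of the ModelOps API.
    """
    step_data = flowInputVars.get(step_name, {})
    return step_data.get(field_name, default)


value_setosa = read_field("INPUT_DATA", "SETOSA")
value_log_x = read_field("PY_STEP_1", "LOG_X")
value_missing = read_field("PY_STEP_1", "NOT_A_FIELD", default=0.0)
print(value_setosa, value_log_x, value_missing)
```

Plain indexing (flowInputVars["INPUT_DATA"]["SETOSA"]) is the better choice when a missing field should stop the step with an error. The default-returning form suits optional fields.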

Package Dependencies

If the Python script used in a Python step needs particular Python packages to run successfully, list them in the Package Dependencies property through an artifact that follows the standard pip requirements file format. The Package Dependencies property can be omitted if the script uses only the Python Standard Library.
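
Because the requirements artifact uses the pip format, a script could optionally confirm at start-up that the packages it lists are installed. This is an illustrative sketch. REQUIREMENTS_TEXT stands in for a hypothetical artifact, and the name parsing is deliberately simplified: it ignores options, URLs, extras, and environment markers. The sketch uses only the standard importlib.metadata module (Python 3.8+).

```python
import importlib.metadata
import re

# Hypothetical contents of a requirements artifact (pip requirements format).
REQUIREMENTS_TEXT = """
# numeric libraries
numpy>=1.20
pandas
requests[socks]==2.31.0  # HTTP client
"""

# Leading distribution name; stops at version specifiers, extras, or markers.
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(text):
    """Extract bare package names (simplified: skips option lines like -r)."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print(missing_packages(requirement_names(REQUIREMENTS_TEXT)))
```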

Artifact Dependencies

If certain artifacts are required for the Python script used in a Python step to run successfully, those files can be specified through Artifact Dependencies. Specifying artifact dependencies is optional.

For example, the following Python code would access the CSV and JSON artifact dependencies described in the figure above:

import csv
import json

lookup_table = None
coefficient_table = None

# read and load the lookup table from the JSON file
with open("lookup.json", "r") as json_file:
    lookup_table = json.load(json_file)

# load data from the CSV file (newline="" is recommended for the csv module)
with open("numbers.csv", "r", newline="") as csv_file:
    reader = csv.reader(csv_file)
    coefficient_table = list(reader)

# other useful code ...
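
csv.reader returns every cell as a string, so numeric coefficients read from a CSV artifact usually need converting before use. The following sketch is self-contained: it writes hypothetical lookup.json and numbers.csv contents to a temporary directory so that it can run anywhere. Inside a Python step, the artifact files would be opened directly, as in the example above. The function names load_lookup and load_coefficients are illustrative.

```python
import csv
import json
import os
import tempfile


def load_lookup(path):
    """Load a JSON artifact into Python objects (dict, list, str, ...)."""
    with open(path, "r", encoding="utf-8") as json_file:
        return json.load(json_file)


def load_coefficients(path):
    """Load a CSV artifact as rows of floats; csv.reader yields strings."""
    with open(path, "r", newline="") as csv_file:
        return [[float(cell) for cell in row] for row in csv.reader(csv_file) if row]


# Demo only: create hypothetical artifact files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    lookup_path = os.path.join(tmp_dir, "lookup.json")
    numbers_path = os.path.join(tmp_dir, "numbers.csv")
    with open(lookup_path, "w", encoding="utf-8") as f:
        json.dump({"setosa": 0, "versicolor": 1}, f)
    with open(numbers_path, "w", newline="") as f:
        csv.writer(f).writerows([[0.5, 1.5], [2.0, -1.0]])

    lookup_table = load_lookup(lookup_path)
    coefficient_table = load_coefficients(numbers_path)

print(lookup_table["versicolor"], coefficient_table)
```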

Parameters

A collection of key-value pairs can be made available in the Python execution context by specifying them through the Parameters property. The parameters are exposed in the Python execution context through a Python dictionary named flowParameters, keyed by parameter name.

For example, the following Python code would access the parameters described in the figure above:

# some useful code ...

slope_value = float(flowParameters["slope"])
intercept_value = float(flowParameters["intercept"])
skip_col = flowParameters["column_to_skip"]

# other useful code ...
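
The float(...) calls in the example suggest that parameter values arrive as strings. Under that assumption, a small helper can supply defaults for optional parameters and give a clearer error for non-numeric values. In this sketch flowParameters is mocked with hypothetical values, and get_float and the threshold parameter are illustrative names that are not part of ModelOps.

```python
# Hypothetical stand-in for the dictionary the Python step provides;
# values are assumed to be strings, as the float(...) calls above suggest.
flowParameters = {"slope": "2.5", "intercept": "-1.0", "column_to_skip": "id"}


def get_float(name, default):
    """Return parameter `name` as a float, or `default` if it is absent or blank."""
    raw = flowParameters.get(name)
    if raw is None or str(raw).strip() == "":
        return default
    try:
        return float(raw)
    except ValueError:
        raise ValueError(f"parameter {name!r} must be numeric, got {raw!r}") from None


slope_value = get_float("slope", 1.0)
intercept_value = get_float("intercept", 0.0)
threshold = get_float("threshold", 0.5)  # hypothetical optional parameter
skip_col = flowParameters.get("column_to_skip", "")

y = slope_value * 4.0 + intercept_value  # linear model evaluated at x = 4.0
print(slope_value, intercept_value, threshold, skip_col, y)
```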

Script

The script to be used in a Python step is specified through a TIBCO ModelOps artifact. The script follows standard Python syntax. The names flowInputVars, flowParameters, and flowOutputVars are reserved and should not be used for other purposes.
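
One way to organize a script around the reserved names is to keep the logic in an ordinary function and to touch flowInputVars, flowParameters, and flowOutputVars only at the top level. This makes the logic testable outside ModelOps. In this sketch the two input dictionaries are mocked. The step name INPUT_DATA, the field names X, Y, and LOG_X, and the score function are all hypothetical.

```python
import math

# Hypothetical stand-ins for the dictionaries the Python step provides;
# remove these two assignments when running inside a scoring flow.
flowInputVars = {"INPUT_DATA": {"X": 4.0}}
flowParameters = {"slope": "2.5", "intercept": "-1.0"}


def score(record, params):
    """Compute output fields from one input record and the parameters."""
    x = float(record["X"])
    slope = float(params["slope"])
    intercept = float(params["intercept"])
    return {"Y": slope * x + intercept, "LOG_X": math.log(x)}


# Only the top level reads and writes the reserved names.
flowOutputVars = score(flowInputVars["INPUT_DATA"], flowParameters)
print(flowOutputVars)
```

Because score takes plain dictionaries, it can be unit-tested by calling it directly with sample records.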

Output Field Names

A Python step can send data downstream through a Python dictionary named flowOutputVars with:

  • Keys set to the field names defined in the output schema for the Python step
  • Values set to the actual data to be sent downstream

The data sent through flowOutputVars should match the schema specified for the Python step.

For example, the following Python code would set values for id, str_value, double_value, array_of_strings, and user fields/keys and send them downstream of the Python step:

data_from_input_step = flowInputVars["INPUT"]
idx = data_from_input_step["id"]

# prepare data to be sent
metal = "gold"
weight = 22.4
labels = ["models", "classification", "python"]

user_object = {
  "age": 42,
  "name": "john smith",
  "email": "john@example.com"
}

# send it downstream
flowOutputVars = {
  "id": idx,
  "str_value": metal,
  "double_value": weight,
  "array_of_strings": labels,
  "user": user_object
}
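
Because the data sent through flowOutputVars should match the step's output schema, a script can check its own output before finishing. The sketch below assumes a hypothetical OUTPUT_SCHEMA that mirrors the example's fields, with Python types chosen according to the Python Step Type Mapping section. It is a local sanity check and not a ModelOps validation API.

```python
# Hypothetical mirror of the Python step's output schema (field -> Python type).
OUTPUT_SCHEMA = {
    "id": int,
    "str_value": str,
    "double_value": float,
    "array_of_strings": list,
    "user": dict,
}


def check_output(values, schema):
    """Return a list of mismatches; an empty list means names and types agree."""
    problems = []
    for name, expected_type in schema.items():
        if name not in values:
            problems.append(f"missing field {name!r}")
        elif not isinstance(values[name], expected_type):
            actual = type(values[name]).__name__
            problems.append(f"{name!r}: expected {expected_type.__name__}, got {actual}")
    for name in sorted(values.keys() - schema.keys()):
        problems.append(f"unexpected field {name!r}")
    return problems


sample_output = {
    "id": 7,
    "str_value": "gold",
    "double_value": 22.4,
    "array_of_strings": ["models", "classification", "python"],
    "user": {"age": 42, "name": "john smith", "email": "john@example.com"},
}
print(check_output(sample_output, OUTPUT_SCHEMA))  # []
print(check_output({"id": "7", "extra": 1}, OUTPUT_SCHEMA))
```

Note that isinstance(True, int) is True in Python, so rejecting booleans in integer fields would need a stricter check.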

Python Step Type Mapping

When a Python step receives data (through flowInputVars) and sends data (through flowOutputVars), Python data types map to the field types supported in ModelOps scoring pipelines as follows:

Open API Type   Open API Format   Python Data Type   Comments
boolean                           bool
integer         int32             int                32-bit signed value
integer         int64             int                64-bit signed value
number          double            float              Double-precision floating-point value
number          float             float              Single-precision floating-point value
string                            str                UTF-8 encoded character data
string                            bytearray          Base64-encoded binary data (contentEncoding=base64)
string          date              datetime           RFC 3339 full-date
string          date-time         datetime           RFC 3339 date-time
array                             list               Supports all types in this table and can be nested
object                            dict               Supports all types in this table and can be nested
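
Two rows of the table need care when sending data downstream. A Python int has unlimited precision, while int32 and int64 fields are bounded. A Python float is double precision, while a number/float field is single precision. The sketch below shows local checks that a script might apply before filling flowOutputVars. The helper names are hypothetical, and the sketch does not describe how ModelOps itself handles out-of-range values.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def fits_int32(value):
    """True if a Python int fits in a 32-bit signed integer field."""
    return INT32_MIN <= value <= INT32_MAX


def fits_int64(value):
    """True if a Python int fits in a 64-bit signed integer field."""
    return INT64_MIN <= value <= INT64_MAX


def to_single_precision(value):
    """Round a Python float (double precision) to a 32-bit float value.

    struct.pack raises OverflowError if the value is outside the float32 range.
    """
    return struct.unpack("f", struct.pack("f", value))[0]


print(fits_int32(2**31 - 1), fits_int32(2**31))  # True False
print(to_single_precision(0.1))
```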