Working with the Python Step
This page explains the use of the Python processing step in TIBCO ModelOps scoring flows.
Overview
The Python step is one of several processing steps that can be used in a TIBCO ModelOps scoring flow. A scoring flow can contain zero or more Python steps. Once Python scripts and other artifacts have been added to the TIBCO ModelOps server, they can be used in a Python step. Each Python step can consume data from upstream processing steps and send data downstream.
Input Field Names
A Python step can access data from upstream steps by using the output field names of the upstream steps. Data from upstream steps, when selected through the UI, is made available in the Python execution context through a Python dictionary named flowInputVars with:
- keys set to the output field names of upstream steps
- values set to Python dictionaries containing data from the corresponding upstream steps
A dictionary that contains data from a given upstream step has:
- keys set to the field names defined in the output schema for that upstream step
- values set to the data for those fields flowing from that upstream step
For example, the following Python code would access the SETOSA field in the data from the Input step (output variable name: INPUT_DATA) and the LOG_X field in the data from the Python Step - 1 step (output variable name: PY_STEP_1):
    # use flowInputVars to access data from a step
    data_input_step = flowInputVars["INPUT_DATA"]
    data_py_step_1 = flowInputVars["PY_STEP_1"]

    # access SETOSA field from Input step
    value_setosa = data_input_step["SETOSA"]

    # access LOG_X field from Python Step - 1
    value_log_x = data_py_step_1["LOG_X"]
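Because flowInputVars only exists inside the ModelOps execution context, it can help to stand in for it when trying a script locally. The following is a minimal sketch, assuming the nested-dictionary layout described above. The field values and the helper name get_field are made up for illustration and are not part of TIBCO ModelOps.

```python
# Local stand-in for the dictionary that ModelOps would provide (assumed layout:
# output field name of the upstream step -> {schema field name -> value}).
flowInputVars = {
    "INPUT_DATA": {"SETOSA": 0.92},   # illustrative value
    "PY_STEP_1": {"LOG_X": 1.386},    # illustrative value
}


def get_field(step_name, field_name):
    """Hypothetical helper: fetch one field and fail with a readable message."""
    try:
        return flowInputVars[step_name][field_name]
    except KeyError as missing:
        raise KeyError(
            f"{missing} not found while reading {step_name}.{field_name}"
        ) from None


value_setosa = get_field("INPUT_DATA", "SETOSA")
value_log_x = get_field("PY_STEP_1", "LOG_X")
```

Inside a real flow the stand-in assignment would be dropped, since the dictionary is already supplied by the step.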
Package Dependencies
If the Python script used in a Python step needs certain Python packages to run successfully, those requirements can be specified in the Package Dependencies property through an artifact that follows the standard pip requirements file format. The Package Dependencies property can be left unset if no packages outside the Python standard library are needed.
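As a rough illustration of what such an artifact might contain, the sketch below holds sample requirements text in a string and lists the package names it declares. The package names and version pins are illustrative only, and the helper package_names is hypothetical; it handles only simple name-and-version lines, not every form the pip format allows.

```python
import re

# Example contents of a requirements artifact (package names and versions
# are illustrative only, not a recommendation for any particular flow).
requirements_text = """\
# numerical packages used by the script
numpy==1.26.4
pandas>=2.0,<3.0
"""


def package_names(text):
    """Return the package names listed in simple pip requirements-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


required = package_names(requirements_text)
```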
Artifact Dependencies
If certain artifacts are required for the Python script used in a Python step to run successfully, those files can be specified through Artifact Dependencies. Specifying artifact dependencies is optional.
For example, the following Python code would access the CSV and JSON artifact dependencies described in the figure above:
    import csv
    import json

    lookup_table = None
    coefficient_table = None

    # read & load look up table from JSON file
    with open("lookup.json", "r") as json_file:
        lookup_table = json.load(json_file)

    # load data from CSV file
    with open("numbers.csv", "r") as csv_file:
        reader = csv.reader(csv_file)
        coefficient_table = list(reader)

    # other useful code ...
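One detail worth keeping in mind when loading CSV artifacts is that csv.reader returns every cell as a string, so numeric tables need an explicit conversion. The sketch below writes two small stand-in files to a scratch directory so that it runs anywhere; the file contents are invented for illustration, and in a flow the files would come from Artifact Dependencies instead.

```python
import csv
import json
import os
import tempfile

# Create small stand-in artifacts in a scratch directory so the sketch runs
# anywhere; in a flow, the files would come from Artifact Dependencies.
work_dir = tempfile.mkdtemp()
json_path = os.path.join(work_dir, "lookup.json")
csv_path = os.path.join(work_dir, "numbers.csv")
with open(json_path, "w") as json_file:
    json.dump({"0": "SETOSA", "1": "VERSICOLOR"}, json_file)
with open(csv_path, "w", newline="") as csv_file:
    csv.writer(csv_file).writerows([["1.5", "2.0"], ["0.25", "4"]])

# Load the artifacts the same way the script would.
with open(json_path, "r") as json_file:
    lookup_table = json.load(json_file)
with open(csv_path, "r", newline="") as csv_file:
    # csv.reader yields strings, so convert each cell to float explicitly
    coefficient_table = [
        [float(cell) for cell in row] for row in csv.reader(csv_file)
    ]
```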
Parameters
A collection of key-value pairs can be made available in the Python execution context by specifying them through the Parameters property. These parameters are exposed in the Python execution context through a Python dictionary named flowParameters.
For example, the following Python code would access the parameters described in the figure above:
    # some useful code ...
    slope_value = float(flowParameters["slope"])
    intercept_value = float(flowParameters["intercept"])
    skip_col = flowParameters["column_to_skip"]
    # other useful code ...
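The float() casts in the example suggest that parameter values arrive as text, although this page does not state it outright; treat that as an assumption. Under that assumption, a script could read numeric parameters with a fallback for keys that were not set. The helper float_parameter, the parameter name scale, and the sample values below are hypothetical.

```python
# Local stand-in for the dictionary ModelOps would provide; values are
# written as strings here on the assumption that parameters arrive as text.
flowParameters = {"slope": "2.5", "intercept": "-0.75", "column_to_skip": "ID"}


def float_parameter(name, default):
    """Hypothetical helper: read a numeric parameter, falling back to a default."""
    raw = flowParameters.get(name)
    if raw is None:
        return default
    return float(raw)


slope_value = float_parameter("slope", 1.0)
intercept_value = float_parameter("intercept", 0.0)
scale_value = float_parameter("scale", 1.0)  # not set above, so the default is used
skip_col = flowParameters["column_to_skip"]
```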
Script
The script to be used in a Python step can be specified through a TIBCO ModelOps artifact. This script follows standard Python syntax. The names flowInputVars, flowParameters, and flowOutputVars are reserved and should not be used for any other purpose in the script.
Output Field Names
A Python step can send data downstream by using the output field name specified. In the Python execution context, data can be set through a Python dictionary named flowOutputVars with:
- a key set to the output field name of the Python step
- the corresponding value set to a Python dictionary containing data to be sent downstream
The dictionary that contains data to be sent downstream has:
- keys set to the field names defined in the output schema for the output variable
- values set to the actual data to be sent downstream
The data sent through flowOutputVars should match the schema specified for the output variable.
For example, the following Python code would set values for SEPALLEN, PETALLEN, and IRISTYPE fields and send them downstream of the Python step (output variable name: IRIS_MODIFIED):
    # prepare the data to be sent
    data_values = {}
    data_values["SEPALLEN"] = 1.2
    data_values["PETALLEN"] = 4.5
    data_values["IRISTYPE"] = "VERSICOLOR"

    # send it downstream
    flowOutputVars = {"IRIS_MODIFIED": data_values}
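Putting the pieces together, a script might read an upstream row, apply parameters, and check the result against the expected field names and types before assigning flowOutputVars. This is a sketch under stated assumptions: the input values, the parameter values, and the expected_types mapping are invented for illustration, and the check is a local convenience rather than a description of how ModelOps validates the schema.

```python
# Local stand-ins for the objects ModelOps would provide (illustrative values).
flowInputVars = {"INPUT_DATA": {"SEPALLEN": 5.0, "PETALLEN": 1.5}}
flowParameters = {"slope": "2.0", "intercept": "0.5"}

# Assumed output schema for IRIS_MODIFIED: field name -> expected Python type.
expected_types = {"SEPALLEN": float, "PETALLEN": float, "IRISTYPE": str}

row = flowInputVars["INPUT_DATA"]
slope = float(flowParameters["slope"])
intercept = float(flowParameters["intercept"])

data_values = {
    "SEPALLEN": row["SEPALLEN"],
    "PETALLEN": slope * row["PETALLEN"] + intercept,  # 2.0 * 1.5 + 0.5
    "IRISTYPE": "VERSICOLOR",
}

# Check field names and types before sending the data downstream.
if set(data_values) != set(expected_types):
    raise ValueError("output fields do not match the expected schema")
for field, expected in expected_types.items():
    if not isinstance(data_values[field], expected):
        raise TypeError(f"{field} should be {expected.__name__}")

flowOutputVars = {"IRIS_MODIFIED": data_values}
```

Failing early in the script tends to give a clearer error than a schema mismatch discovered further downstream.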