Working with Python models

This page explains how Python models can be used in the Score Data Processing step of TIBCO ModelOps scoring flows. Details that apply to all models can be found on the Working with Models page.

Overview

Two types of Python models are supported in TIBCO ModelOps:

  • Python Script Models
  • Python Binary Models

For both types, a model's input and output schemas must be bound before the model becomes available in the Score Data Processing step (see Binding Schemas to Models).

Python Binary Models

Trained Python scikit-learn models can be persisted for later use. Scikit-learn Pipelines trained and persisted using joblib.dump can be used to score data in the Score Data Processing step; the loaded pipeline's predict method is invoked to score input data. Dependency specifications do not apply to Python binary models and have no effect.
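As a sketch of how such a binary model might be produced (the file name iris-svm.x and the pipeline contents are illustrative, not prescribed by TIBCO ModelOps), a scikit-learn Pipeline can be trained and persisted with joblib.dump:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# train a simple classification pipeline on the iris dataset
X, y = load_iris(return_X_y=True)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(probability=True)),
])
pipeline.fit(X, y)

# persist the fitted pipeline; the resulting file is the binary model artifact
joblib.dump(pipeline, "iris-svm.x")

# the scoring step later does the equivalent of:
restored = joblib.load("iris-svm.x")
predictions = restored.predict(X[:5])
```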

Python Script Models

You can also use Python scripts as models to be used for scoring data in the Score Data Processing step. Artifact dependencies can be added to a Python script model through the Models tab on the TIBCO ModelOps UI.

Script

The Python script to be used as a model must contain exactly one scoring function. The scoring function is a Python function that implements the following signature (see Function Annotations). The TIBCO ModelOps framework uses this signature to identify the function to use for scoring data.


def function_name(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]

Input data to be scored is passed as a numpy ndarray in the first parameter. The column names for the input data are passed as a string iterable in the second parameter. The function is expected to return a tuple with two values: the first contains the scores as a numpy ndarray, and the second contains the column names for the scores as a string iterable. There are no specific restrictions on the name of the scoring function; it can be any valid Python function name. If the Python script (or its dependencies) does not contain a function matching the prescribed signature, or contains more than one such function, model loading fails.

For example:

import numpy as np

from typing import Iterable
from typing import Tuple

# do some useful model load-time initialization

def score(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]:

    # scoring logic goes in this function
    score_data = None  # replace with an np.ndarray of scores
    score_columns = None  # replace with the score column names
    return score_data, score_columns

# other useful Python script model code and functions

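As a minimal, runnable illustration of the template above (the added row_sum column is purely an example), a scoring function that returns the input data with one extra score column might look like:

```python
import numpy as np
from typing import Iterable, Tuple

def score(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]:
    # append a per-row sum as an additional score column
    row_sums = data.sum(axis=1, keepdims=True)
    scores = np.hstack([data, row_sums])
    return scores, list(column_names) + ["row_sum"]

scores, cols = score(np.array([[1.0, 2.0]]), ["a", "b"])
# scores is [[1.0, 2.0, 3.0]] and cols is ["a", "b", "row_sum"]
```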
Package Dependencies

If certain Python packages are required for the Python model script to run successfully, those requirements can be specified through an artifact named requirements.txt added to the List of Dependencies. The artifact follows the standard pip requirements file format. Specifying package dependencies can be skipped if no packages outside of the Python standard library are needed.
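For example, a requirements.txt covering the packages used by the script models shown on this page might look like the following (package versions are illustrative):

```
joblib==1.3.2
numpy==1.26.4
pandas==2.1.4
scikit-learn==1.3.2
```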

Artifact Dependencies

If certain artifacts are required for the Python model script to run successfully, those files can be added to the List of Dependencies. Specifying artifact dependencies is optional.

For example, the following Python code accesses the iris-svm.x and lookup.json artifact dependencies added to the List of Dependencies:

import json
import joblib
import numpy as np
import pandas as pd
from typing import Iterable
from typing import Tuple

# load and use iris-svm.x
model_file = "iris-svm.x"
user_model_object = joblib.load(model_file)

lookup_table = None
# read & load the lookup table from the JSON file
with open("lookup.json", "r") as json_file:
    lookup_table = json.load(json_file)

# now use one of those artifacts while scoring
def score(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]:
    
    y1 = user_model_object.predict(data)        # predicted classes
    y2 = user_model_object.predict_proba(data)  # class probabilities
    # combine each predicted class with its probabilities into one score array
    y3 = pd.DataFrame([[x[0], *x[1]] for x in zip(y1, y2)]).values
    return y3, None

# other useful code ...

Data Type Mapping

When data is received into the Score Data Processing step (for scoring with a Python model) and sent from the Score Data Processing step (to downstream steps), the mapping between Python data types and the field types supported in Scoring Pipelines is as follows:

| Open API Type | Open API Format | Python Data Type | Comments |
| ------------- | --------------- | ---------------- | -------- |
| boolean |  | bool |  |
| integer | int32 | int | 32-bit signed value |
| integer | int64 | int | 64-bit signed value |
| number | double | float | Double precision floating point value |
| number | float | float | Single precision floating point value |
| string |  | str | UTF-8 encoded character data |
| string |  | Not supported | Base64 encoded binary data (contentEncoding=base64) |
| string | date | Not supported | RFC 3339 full-date |
| string | date-time | Not supported | RFC 3339 date-time |
| array |  | Not supported |  |
| object |  | dict | Supports all types in this table and can be nested |
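As an illustration of this mapping (the field names are hypothetical), a record exchanged with the step could carry values such as:

```python
# illustrative record using the supported type mappings
record = {
    "flag": True,           # boolean -> bool
    "count": 42,            # integer (int32/int64) -> int
    "ratio": 0.75,          # number (float/double) -> float
    "label": "setosa",      # string -> str
    "meta": {               # object -> dict; may be nested
        "source": "sensor-1",
    },
}
```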

Technical Notes

Execution Overview

When operating on Python models, the Score Data Processing step operates in three phases: a model initialization phase, a model scoring phase, and a model termination phase.

During the model initialization phase, the Python model, any artifact dependencies specified, and any package dependencies specified are made available to a Python engine that attempts to load the model. The Score Data Processing Step loads binary models using pre-packaged methods. The Score Data Processing Step loads script models as modules into the Python process.

During the model scoring phase, the Score Data Processing Step scores input records as follows:

  1. converts incoming data into a numpy array
  2. sends the numpy array as input to:
    1. a pre-packaged scoring function for binary models
    2. the user-defined scoring function for script models
  3. converts the numpy array output back into records
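The three scoring steps above can be sketched as follows (score_records and the record layout are illustrative, not the actual framework code):

```python
import numpy as np

def score_records(records, score_fn):
    # 1. convert incoming records (here, a list of dicts) into a numpy array
    column_names = list(records[0].keys())
    data = np.array([[rec[c] for c in column_names] for rec in records])

    # 2. send the numpy array to the scoring function
    scores, score_columns = score_fn(data, column_names)

    # 3. convert the numpy array output back into records
    return [dict(zip(score_columns, row)) for row in scores]

out = score_records([{"a": 1.0, "b": 2.0}],
                    lambda d, cols: (d * 2, ["a2", "b2"]))
# out[0] == {"a2": 2.0, "b2": 4.0}
```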

During the model termination phase (when the scoring pipeline is shut down), the Python engine servicing the Score Data Processing step and the related execution context are ended.

The initialization and termination phases are opaque to the Python models – the models do not receive any notifications during those phases.

Persisting Values Over Records

For Python script models, the Score Data Processing step imports the user script containing the scoring function as a module. To persist values across the processing of several input records, store them in the module's namespace.

The following Python script model example maintains a cumulative sum and count of the input data in a module-scoped dictionary values_to_persist (using keys cumulative_sum and cumulative_count). The dictionary values_to_persist is initialized when the Python script is loaded as a module; the key-value pairs of that dictionary can be accessed and updated during subsequent calls to the scoring function.

## a running count and a running sum are
## saved in a dictionary in the module-scope
## to persist the values across several records

import numpy as np
from typing import Iterable
from typing import Tuple

# initialized once when the model loads
# ----
C_SUM = "cumulative_sum"
C_COUNT = "cumulative_count"
C_ID = "ID"
values_to_persist = {
  C_SUM: 0.0,
  C_COUNT: 0
}
SCORE_COLS = [C_SUM, C_COUNT, C_ID]
DUMMY_ID = -1
# ----

# the scoring function is called for every record
def score(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]:
  values_to_persist[C_SUM] += sum(data[:, 0])
  values_to_persist[C_COUNT] += len(data[:, 0])
  scores = np.array([[values_to_persist[C_SUM], values_to_persist[C_COUNT], DUMMY_ID]])
  return scores, SCORE_COLS

Python Interpreter Used

See the Platforms page for information on the version of Python used to load and run Python models.