Working with Python models

This page explains how Python models can be used in the Score Data step of TIBCO ModelOps scoring flows. Details that apply to all models can be found on the Working with Models page.

Overview

Two types of Python models are supported in TIBCO ModelOps:

  • Python Script Models
  • Python Binary Models

For both types, input and output schemas must be bound to the model before it becomes available in the Score Data step (see Binding Schemas to Models).

Python Binary Models

Python scikit-learn models can be persisted after training for later use. Scikit-learn Pipelines trained and persisted using joblib.dump can be used for scoring data in the Score Data step; the loaded pipeline's predict method is invoked to score the input data. Dependencies do not apply to Python binary models, and adding them has no effect.
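
For example, a pipeline might be trained and persisted as follows. This is a minimal sketch: the iris dataset, the pipeline steps, and the file name iris-svm.x are illustrative choices, not requirements.

import joblib
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# train an illustrative pipeline on the iris dataset
X, y = load_iris(return_X_y=True)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(probability=True)),
])
pipeline.fit(X, y)

# persist the trained pipeline; the resulting file can be uploaded as a Python binary model
joblib.dump(pipeline, "iris-svm.x")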

Python Script Models

You can also use Python scripts as models for scoring data in the Score Data step. Artifact dependencies can be added to a Python script model through the Models tab in the ModelOps UI.

Script

The Python script used as a model must contain exactly one scoring function. The scoring function is a Python function that implements the following signature (see Function Annotations). The TIBCO ModelOps framework uses this signature to identify the function to call when scoring data.


def function_name(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]

Input data to be scored is passed as a NumPy ndarray in the first parameter, and the column names for the input data are passed as a string iterable in the second parameter. The function is expected to return a tuple with two values: the first contains the scores as a NumPy ndarray, and the second contains the column names for the scores as a string iterable. There are no restrictions on the name of the scoring function; it can be any valid Python function name. If the Python script (or its dependencies) contains no function matching the prescribed signature, or contains more than one, model loading fails.

For example:

import numpy as np

from typing import Iterable
from typing import Tuple

# do some useful model load-time initialization

def score(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]:

    # scoring logic goes in this function

    score_data = None     # replace with np.ndarray
    score_columns = None  # replace with column names

    return score_data, score_columns

# other useful Python script model code and functions
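
As a concrete illustration, the following scoring function passes the input columns through unchanged and appends a row-wise sum. This is a minimal sketch that assumes the input columns are numeric; the output column name row_sum is an arbitrary choice for this example.

import numpy as np

from typing import Iterable
from typing import Tuple

def score(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]:
    # assumes all input columns are numeric; appends a row-wise sum as an extra column
    row_sums = data.sum(axis=1, keepdims=True)
    score_data = np.hstack([data, row_sums])
    score_columns = list(column_names) + ["row_sum"]
    return score_data, score_columns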

Package Dependencies

If certain Python packages are required for the Python model script to run successfully, those requirements can be specified through an artifact named requirements.txt added to the List of Dependencies. This file follows the standard pip requirements file format. Specifying package dependencies can be skipped if no packages outside of the Python standard library are needed.
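
For example, a requirements.txt for a script that uses NumPy, pandas, scikit-learn, and joblib might look like the following; the package list and version pins are illustrative only:

numpy
pandas>=1.3
scikit-learn==1.0.2
joblib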

Artifact Dependencies

If certain artifacts are required for the Python model script to run successfully, those files can be added to the List of Dependencies. Specifying artifact dependencies is optional.

For example, the following Python code would access the iris-svm.x and lookup.json artifact dependencies added to the List of Dependencies:

import json
import joblib
import numpy as np
import pandas as pd
from typing import Iterable
from typing import Tuple

# load and use iris-svm.x
model_file = "iris-svm.x"
user_model_object = joblib.load(model_file)

lookup_table = None
# read & load look up table from JSON file
with open("lookup.json", "r") as json_file:
	lookup_table = json.load(json_file)

# now use one of those artifacts while scoring
def score(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]:
    
    y1 = user_model_object.predict(data)
    y2 = user_model_object.predict_proba(data)
    y3 = pd.DataFrame([[x[0], *x[1]] for x in zip(y1, y2)]).values
    return y3, None

# other useful code ...

Data Type Mapping

While receiving data into the Score Data step (for scoring with a Python model) and sending data from the Score Data step (to downstream steps), the mapping between Python data types and supported field types in ModelOps Scoring Pipelines is as follows:

Open API Type   Open API Format   Python Data Type   Comments
boolean                           bool
integer         int32             int                32-bit signed value
integer         int64             int                64-bit signed value
number          double            float              Double precision floating point value
number          float             float              Single precision floating point value
string                            str                UTF-8 encoded character data
string                            Not supported      Base64 encoded binary data (contentEncoding=base64)
string          date              Not supported      RFC 3339 full-date
string          date-time         Not supported      RFC 3339 date-time
array                             Not supported
object                            dict               Supports all types in this table and can be nested
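
As a hedged sketch of how this mapping might look from the scoring-function side, the example below returns one output row per input row whose values use the Python types listed above (bool, int, float, str, and dict). The output field names and the use of an object-dtype array are assumptions made for this illustration.

import numpy as np

from typing import Iterable
from typing import Tuple

def score(data: np.ndarray, column_names: Iterable[str]) -> Tuple[np.ndarray, Iterable[str]]:
    # each output value uses a Python type from the mapping table:
    # bool -> boolean, int -> integer, float -> number, str -> string, dict -> object
    rows = [[True, int(i), float(len(row)), "scored", {"inputs": list(row)}]
            for i, row in enumerate(data)]
    return np.array(rows, dtype=object), ["flag", "index", "width", "label", "details"]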