Scoring Pipeline

A scoring pipeline consists of a data source, a data sink, one or more scoring flows, and zero or more scoring services. When a scoring pipeline is deployed, the data source, the data sink, each scoring flow, and each scoring service run in separate containers.
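The sketch below models this topology in Python; all names (Container, ScoringPipeline, and the role strings) are hypothetical illustrations, not part of any deployment API.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Container:
        name: str   # container name
        role: str   # "source", "sink", "flow", or "service"

    @dataclass
    class ScoringPipeline:
        source: Container                 # exactly one data source
        sink: Container                   # exactly one data sink
        flows: List[Container]            # one or more scoring flows
        services: List[Container] = field(default_factory=list)  # zero or more scoring services

        def containers(self) -> List[Container]:
            # When deployed, every component runs in its own container.
            return [self.source, self.sink, *self.flows, *self.services]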

Architecture

Records

Data flowing through a scoring pipeline is represented as a record: a set of named data values stored in fields and accessed by name. The fields in a record are defined by a schema.
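As a minimal sketch, a record can be pictured as a mapping from field names to values, with the schema listing the fields. The dict-based representation below is illustrative, not the pipeline's actual record format.

    # Hypothetical sketch: a record is a set of named fields defined by a schema.
    schema = {"customer_id": int, "amount": float}    # field name -> field type

    record = {"customer_id": 42, "amount": 19.95}     # one record

    for name, expected_type in schema.items():        # the schema defines the fields
        assert isinstance(record[name], expected_type)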

Record lifecycle is managed transparently by data sources and data sinks. A data source creates a record to send data to a scoring flow; when the record reaches a data sink, it is destroyed. Scoring flows never create or destroy records.
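These lifecycle rules can be sketched as follows; data_source and data_sink are hypothetical stand-ins for real pipeline components.

    def data_source():
        # Only a data source creates records.
        for i in (1, 2, 3):
            yield {"customer_id": i, "amount": 10.0 * i}

    def data_sink(records):
        # A record's life ends when the data sink receives it.
        for record in records:
            print("sink received:", record)

    # A scoring flow sits between source and sink and never creates or destroys records.
    data_sink(data_source())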

Execution Model

When a record is received by a scoring flow, it is passed to each processing step in order.

Each processing step has access to all fields in the record and can add new fields or modify existing field values. Fields cannot be removed from a record by a processing step.
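A processing step can therefore be modeled as a function that receives a record, may add fields or overwrite values, and must never delete fields. The sketch below is illustrative; the step function and field names are hypothetical.

    def example_step(record: dict) -> dict:
        # Allowed: modify an existing field value.
        record["amount"] = round(record["amount"], 2)
        # Allowed: add a new field.
        record["high_value"] = record["amount"] > 100.0
        # Not allowed: removing a field, e.g. del record["amount"]
        return record

    print(example_step({"amount": 123.456}))   # {'amount': 123.46, 'high_value': True}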

Processing steps either pass the input record downstream immediately after processing, or they perform aggregation, in which case no record is passed downstream until the aggregation criteria are met. Aggregation is performed by the Create Matrix and Python processing steps.

As mentioned in Records, records cannot be created or destroyed by a processing step, so a processing step never sends more than one record downstream: it emits exactly one record (no aggregation) or zero records (aggregation in progress).
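The one-or-zero rule can be sketched as two step shapes, assuming a hypothetical step interface:

    def pass_through_step(record):
        record["processed"] = True   # modify or add fields, then pass the record on
        return record                # exactly one record downstream

    def aggregating_step(record, buffer, size=4):
        buffer.append(record)
        if len(buffer) < size:
            return None              # zero records downstream while accumulating
        record["rows"] = list(buffer)
        buffer.clear()
        return record                # one record downstream once the criteria are met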

Aggregation

Aggregation is supported by the Python and Create Matrix processing steps.

The Python processing step supports custom aggregation using the Python programming language.

The Create Matrix processing step aggregates multiple input records into one output record that contains a matrix. The number of input records accumulated before the step emits the matrix record is controlled by the step's output criteria.
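A minimal sketch of this behavior, assuming a hypothetical class name and a simple row-count criterion (the real step's configuration keys may differ):

    class CreateMatrixSketch:
        """Illustrative stand-in for the Create Matrix step, not its real API."""

        def __init__(self, rows_per_matrix: int):
            self.rows_per_matrix = rows_per_matrix   # hypothetical output criteria
            self.rows = []

        def process(self, record: dict):
            self.rows.append(dict(record))           # accumulate the input record as a row
            if len(self.rows) < self.rows_per_matrix:
                return None                          # criteria not met: emit nothing
            record["matrix"], self.rows = self.rows, []
            return record                            # criteria met: emit the matrix record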

During aggregation, the input records are added as rows to the matrix being accumulated. The input records may or may not be modified before they are added to the matrix.

A completed matrix is added to the output record as a new field, making it available to all downstream processing steps and to the data sink. The new matrix field is an array of objects, with each array element holding one row of the matrix.
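Assuming hypothetical field names, an output record carrying a completed four-row matrix might look like this:

    # Hypothetical shape of the output record after the matrix is emitted:
    output_record = {
        "customer_id": 4,
        "amount": 40.0,
        "matrix": [                                   # new field: an array of objects
            {"customer_id": 1, "amount": 10.0},       # row one
            {"customer_id": 2, "amount": 20.0},       # row two
            {"customer_id": 3, "amount": 30.0},       # row three
            {"customer_id": 4, "amount": 40.0},       # row four
        ],
    }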

The diagram below shows an example scoring flow performing aggregation with these processing steps:

  1. Input
  2. Create Matrix
  3. Output

The Create Matrix record output criteria are set such that the fourth record triggers emission of the matrix.

[Diagram: Aggregation]

This table summarizes the actions associated with each input record:

Input Record   Action                                                                                Output Record
Record 1       Added as row one to the matrix                                                        None
Record 2       Added as row two to the matrix                                                        None
Record 3       Added as row three to the matrix                                                      None
Record 4       Added as row four to the matrix; accumulated matrix added to Record 4 in a new field  Record 4’ (Matrix)
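The short script below reproduces the table with a plain row buffer and a four-record output criterion; field names and messages are illustrative only.

    rows, MATRIX_SIZE = [], 4                  # output criteria: emit on the fourth record

    for i in range(1, 5):
        record = {"id": i}
        rows.append(dict(record))              # each input record becomes a matrix row
        if len(rows) < MATRIX_SIZE:
            print(f"Record {i}: added as row {i}, output: None")
        else:
            record["matrix"] = rows            # matrix added to Record 4 in a new field
            print(f"Record {i}: added as row {i}, output: Record 4' (Matrix)")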

Supported Field Types

Record fields support these OpenAPI types:

OpenAPI Type   OpenAPI Format   Comments
boolean
integer        int32            32-bit signed value
integer        int64            64-bit signed value
number         double           Double-precision floating-point value
number         float            Single-precision floating-point value
string                          UTF-8 encoded character data
string                          Base64-encoded binary data (contentEncoding=base64)
string         date             RFC 3339 full-date
string         date-time        RFC 3339 date-time
array                           Supports all types in this table and can be nested
object                          Supports all types in this table and can be nested
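As an illustration, a record schema combining several of these types could be written as follows (an OpenAPI-style schema expressed as a Python dict; all field names are hypothetical):

    record_schema = {
        "type": "object",
        "properties": {
            "active":   {"type": "boolean"},
            "count":    {"type": "integer", "format": "int64"},
            "amount":   {"type": "number", "format": "double"},
            "name":     {"type": "string"},                               # UTF-8 text
            "payload":  {"type": "string", "contentEncoding": "base64"},  # binary data
            "created":  {"type": "string", "format": "date-time"},        # RFC 3339
            "scores":   {"type": "array",
                         "items": {"type": "number", "format": "float"}},
            "address":  {"type": "object",
                         "properties": {"city": {"type": "string"}}},
        },
    }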