Scoring Pipeline
A scoring pipeline consists of a data source, a data sink, one or more scoring flows, and zero or more scoring services. When a scoring pipeline is deployed, the data source, the data sink, each scoring flow, and any scoring services run in separate containers.
Records
Data flowing through a scoring pipeline is represented as a record, which is a set of named data values. The data values are stored in fields that are accessed by name. The fields in a record are defined by a schema.
Record lifecycle is transparently managed by data sources and sinks. A record is created by a data source to send data to a scoring flow. When a record is received by a data sink it is destroyed. Records are never created or destroyed by scoring flows.
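Conceptually, a record behaves like a set of named values whose fields are constrained by a schema. The sketch below models that in Python; the schema, field names, and `make_record` helper are illustrative assumptions, not part of the pipeline API.

```python
# A minimal sketch of a record and its schema. Records are modeled as
# dicts of named values; the schema defines which fields exist and
# what type each field holds.

SCHEMA = {"customer_id": int, "amount": float, "approved": bool}

def make_record(**fields):
    """Create a record, checking each field against the schema."""
    for name, value in fields.items():
        expected = SCHEMA.get(name)
        if expected is None:
            raise KeyError(f"field {name!r} is not defined by the schema")
        if not isinstance(value, expected):
            raise TypeError(f"field {name!r} must be {expected.__name__}")
    return dict(fields)

record = make_record(customer_id=42, amount=19.99, approved=True)
amount = record["amount"]  # fields are accessed by name
```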
Execution Model
When a record is received by a scoring flow it is passed to each processing step in order.
Each processing step has access to all fields in the record and can add new fields, or modify existing field values. Fields cannot be removed from a record by a processing step.
Processing steps either pass the input record downstream immediately after processing, or they perform aggregation. When aggregation is performed, no record is passed downstream until the aggregation criteria are met. Aggregation is performed by the Create Matrix or Python processing steps.
As mentioned in Records, records cannot be created or destroyed by a processing step, so a processing step never sends more than one record downstream - it is always one record (no aggregation) or zero records (aggregation in progress).
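The execution model above can be sketched as a loop over processing steps, where each step returns either the (possibly modified) record or nothing while it aggregates. The step and function names below are illustrative assumptions, not the actual processing-step interface.

```python
# A sketch of the execution model: each step receives the record, may add
# or modify fields, and returns either the record (pass-through) or None
# (an aggregating step holding the record back).

def add_score(record):
    record["score"] = record["amount"] * 0.5  # add a new field
    return record

def run_flow(record, steps):
    """Pass a record through each processing step in order."""
    for step in steps:
        record = step(record)
        if record is None:   # an aggregating step held the record back
            return None      # nothing is sent downstream
    return record

out = run_flow({"amount": 100.0}, [add_score])
```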
Aggregation
Aggregation is supported by the Python and Create Matrix processing steps.
The Python processing step supports custom aggregation using the Python programming language.
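As an illustration of what custom aggregation might look like, the step below emits one averaged record for every three inputs. The `average_step` shape and field names are assumptions for the example; only the zero-or-one output rule comes from the text.

```python
# A sketch of custom aggregation in a Python processing step: accumulate
# values across records, emitting one record per three inputs.

buffer = []

def average_step(record):
    buffer.append(record["amount"])
    if len(buffer) < 3:
        return None                      # aggregating: nothing downstream
    record["average"] = sum(buffer) / len(buffer)
    buffer.clear()                       # start the next aggregation window
    return record                        # one record sent downstream
```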
The Create Matrix processing step aggregates multiple input records into one output record that contains a matrix. The number of input records accumulated before a record containing the matrix is output is controlled by the output criteria of the Create Matrix processing step.
During aggregation the input records are added as rows to the matrix being accumulated. The input records may or may not be modified before they are added to the matrix.
A completed matrix is added to the output record as a new field making it available to all downstream processing steps and the data sink. The new matrix field is an array of objects, with each array element holding a row from the matrix.
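Assuming JSON-like records, a completed matrix field might have the following shape, with one array element per accumulated row; all field names here are illustrative.

```python
# An illustrative shape for the matrix field added to the output record:
# an array of objects, each element holding one accumulated input record
# as a matrix row.

output_record = {
    "amount": 4.0,           # fields of the record that triggered output
    "matrix": [              # new field holding the completed matrix
        {"amount": 1.0},     # row 1: input record 1
        {"amount": 2.0},     # row 2: input record 2
        {"amount": 3.0},     # row 3: input record 3
        {"amount": 4.0},     # row 4: input record 4
    ],
}
```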
The diagram below shows an example scoring flow performing aggregation with these processing steps:
- Input
- Create Matrix
- Output
The Create Matrix record output criteria are set such that the fourth record triggers emission of the matrix.
This table summarizes the actions associated with each input record:
Input Record | Action | Output Record |
---|---|---|
Record 1 | Added as row one to matrix | None |
Record 2 | Added as row two to matrix | None |
Record 3 | Added as row three to matrix | None |
Record 4 | Added as row four to matrix, accumulated matrix added to Record 4 in a new field | Record 4’ (Matrix) |
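The walkthrough in the table above can be sketched as a Create Matrix-style step with an output criterion of four records. The class and its API are illustrative assumptions, not the actual processing-step interface.

```python
# A sketch of Create Matrix aggregation with an output criterion of four
# records, matching the table above.

class CreateMatrix:
    def __init__(self, row_count):
        self.row_count = row_count   # output criterion: rows per matrix
        self.rows = []

    def process(self, record):
        self.rows.append(dict(record))       # add the record as a row
        if len(self.rows) < self.row_count:
            return None                      # still aggregating
        record["matrix"] = self.rows         # matrix added as a new field
        self.rows = []                       # start the next matrix
        return record

step = CreateMatrix(row_count=4)
outputs = [step.process({"n": i}) for i in range(1, 5)]
# the first three inputs produce no output; the fourth emits the matrix
```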
Supported Field Types
Record fields support these OpenAPI types:
OpenAPI Type | OpenAPI Format | Comments |
---|---|---|
boolean | | |
integer | int32 | 32-bit signed value |
integer | int64 | 64-bit signed value |
number | double | Double-precision floating point value |
number | float | Single-precision floating point value |
string | | UTF-8 encoded character data |
string | | Base64 encoded binary data (contentEncoding=base64) |
string | date | RFC 3339 full-date |
string | date-time | RFC 3339 date-time |
array | | Supports all types in this table and can be nested |
object | | Supports all types in this table and can be nested |
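For illustration, a record schema combining several of these types could be written in OpenAPI style as below; the field names are assumptions, while the type/format pairs come from the table above.

```python
# An illustrative OpenAPI-style schema using the supported field types.

record_schema = {
    "type": "object",
    "properties": {
        "approved": {"type": "boolean"},
        "count":    {"type": "integer", "format": "int64"},
        "amount":   {"type": "number",  "format": "double"},
        "name":     {"type": "string"},                          # UTF-8 text
        "payload":  {"type": "string",  "contentEncoding": "base64"},
        "created":  {"type": "string",  "format": "date-time"},  # RFC 3339
        "history":  {"type": "array",                            # nested type
                     "items": {"type": "number", "format": "float"}},
    },
}
```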