Terminology

Logical Architecture

  • Approve - associate an environment with a scoring pipeline or data channel. This allows the scoring pipeline or data channel to be deployed into that environment. It also allows data channels to be exposed to that environment. Approval can happen in either a sandbox space or the published space.
  • Artifact - an object managed by ModelOps, similar to a file within a file system. All artifacts are contained in a project. An artifact can be created through user interaction with ModelOps or can be created using an external tool and imported into ModelOps. Examples include scoring pipelines, scoring flows, and models.
  • Data Channel - configurable and deployable component that maps between an external protocol and scoring flows.
  • Data Channel Metadata - data channel configuration that includes a unique identifier, a data schema, and searchable tags.
  • Data Sink - a data channel that consumes output data with a known schema. Scoring flow results are sent to a data sink.
  • Data Source - a data channel that provides input data with a known schema and a standard serialization format. Scoring flow input is read from a data source.
  • Deploy - start a scoring pipeline or a data channel in an environment.
  • Environment - named collection of resources required to execute a scoring pipeline or data channel.
  • Expose - make a data channel visible in an environment other than the one in which it is deployed.
  • Historic Data - data previously captured from an action or process and stored in a durable storage mechanism (for example, a file or a database), or reference data created in an external system and made available to ModelOps. Contrast with streaming data.
  • Job - a uniquely identified context started by the scheduling service that manages one or more tasks.
  • Matrix - a collection of rows and columns treated as a single entity during processing.
  • Model - a mathematical specification of an analytical process that transforms a set of input data into a set of output data. Example specification languages are PMML, Python, TensorFlow, etc.
  • Model Monitoring - evaluating the performance of a model, both technical (resource consumption, latency, availability, etc.) and quality (variance from expected, business impact, fairness, etc.). Model monitoring can occur in real time using real-time metrics, or after the fact using historic data.
  • Model Runner - the part of a scoring service that supports execution of different model types. For example, a Python runner executes a Python model, a PMML runner executes a PMML model, etc.
  • Processing Step - an individual action in a scoring flow. The input to a processing step is all output data from the previous processing step. A processing step can transform or augment the input data before sending it to the next processing step, and can access external services, for example a scoring service, to perform its function.
  • Project - a container for artifacts.
  • Publish - push changes in a project from a sandbox space to the published space.
  • Published Space - a space containing projects that are visible to all users with the required permissions.
  • Real-Time Metric - quantitative values captured during scoring pipeline execution that provide Key Performance Indicators (KPIs) which can be acted on to improve quality of service.
  • Record - data flowing through a scoring pipeline is represented as a record, a set of named data values. The data values are stored in fields that are accessed by name. The fields in a record are defined by a schema.
  • Result Data - data available after scoring flow execution for post-processing analysis. Result data is created by data sinks and is available from the configured data sink storage mechanism.
  • Sandbox Space - a per-user private space where all active work occurs. Modified projects and artifacts in the sandbox space are only visible to the current user.
  • Scheduling Service - schedules and manages jobs to deploy scoring pipelines or data channels.
  • Schema - formal definition of the structure of data, defining types, constraints, and cardinality.
  • Scoring Flow - an ordered sequence of processing steps that operate on data received from a data source and sent to a data sink. The data flowing through a scoring flow can be transformed and augmented by processing steps.
  • Scoring Pipeline - a data source, a data sink, one or more scoring flows, and zero or more models used in those scoring flows. A scoring pipeline is started by the scheduling service as a job.
  • Scoring Service - manages execution of models in the context of a model runner.
  • Space - a container for projects. There is a Sandbox Space and a Published Space.
  • Streaming Data - data that is available immediately following an action or process, for example a Kafka message. Contrast with historic data.
  • Task - execution context for a scoring pipeline or a data channel instance. The execution context consists of the cloud resources required to execute a pipeline or a data channel.
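To show how several of these terms relate, the following is a minimal sketch in Python. It is not the ModelOps API; all names (`Schema`, `ScoringFlow`, the example fields) are hypothetical illustrations of the definitions above: records conforming to a schema flow from a data source, through an ordered sequence of processing steps, to a data sink.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Record: a set of named data values, accessed by field name.
Record = dict

@dataclass
class Schema:
    """Formal definition of record structure: field names mapped to types."""
    fields: dict  # e.g. {"customer_id": int, "amount": float}

    def validate(self, record: Record) -> None:
        for name, ftype in self.fields.items():
            if name not in record:
                raise ValueError(f"missing field: {name}")
            if not isinstance(record[name], ftype):
                raise TypeError(f"field {name} is not {ftype.__name__}")

# Processing step: transforms or augments a record before passing it on.
ProcessingStep = Callable[[Record], Record]

@dataclass
class ScoringFlow:
    """Ordered sequence of processing steps between a data source and a data sink."""
    steps: list  # list of ProcessingStep

    def run(self, source: Iterable, sink: list, schema: Schema) -> None:
        for record in source:         # the data source provides input records
            schema.validate(record)   # input must conform to the known schema
            for step in self.steps:   # each step receives the previous step's output
                record = step(record)
            sink.append(record)       # the data sink consumes the results

# Hypothetical usage: one processing step scores a transaction amount.
schema = Schema(fields={"customer_id": int, "amount": float})
flow = ScoringFlow(steps=[lambda r: {**r, "score": r["amount"] * 0.1}])
source = [{"customer_id": 1, "amount": 250.0}]
sink = []
flow.run(source, sink, schema)
# sink now holds the result data: [{"customer_id": 1, "amount": 250.0, "score": 25.0}]
```

In ModelOps terms, deploying this flow as part of a scoring pipeline would mean the scheduling service starting it as a job, with the data source and data sink mapping to external protocols rather than in-memory lists.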