Terminology

Logical Architecture

  • Approve - associate an environment with a scoring pipeline. This allows the scoring pipeline to be deployed into that environment. Approval can happen in either a sandbox space or the published space.
  • Artifact - an object managed by ModelOps, similar to a file within a file system. All artifacts are contained in a project. An artifact can be created through user interaction with ModelOps or can be created using an external tool and imported into ModelOps. Examples are scoring pipelines, scoring flows, models, etc.
  • Data Channel - configurable and deployable component that maps between an external protocol and scoring flows.
  • Data Channel Metadata - data channel configuration that includes a unique identifier, a data schema, and searchable tags.
  • Data Sink - a data channel that consumes output data with a known schema. Scoring flow results are sent to a data sink.
  • Data Source - a data channel that provides input data with a known schema and a standard serialization format. Scoring flow input is read from a data source.
  • Deploy - start a scoring pipeline or a data channel in an environment.
  • Environment - named collection of resources required to execute a scoring pipeline. Scoring pipelines are promoted from one environment to another, for example from Development to Testing to Production.
  • Historic Data - data that was previously captured from an action or process and stored in a durable storage mechanism, for example a file or a database, or reference data created in an external system and made available to ModelOps. Contrast with streaming data.
  • Job - a uniquely identified context, started by the scheduling service, that manages one or more tasks.
  • Model - a mathematical specification of an analytical process to transform a set of input data to a set of output data. Example specification languages are PMML, Python, TensorFlow, etc.
  • Model Monitoring - evaluating the performance, both technical (resource consumption, latency, availability, etc.) and quality (variance from expected, business impact, fairness, etc.), of a model. Model monitoring can occur in real-time using real-time metrics, or after the fact using historic data.
  • Model Runner - the part of a scoring service that supports execution of different model types. For example, a Python runner executes a Python model, a PMML runner executes a PMML model, etc.
  • Processing Step - an individual action in a scoring flow. The input to a processing step is all output data from the previous processing step. The processing step can transform or augment the input data before sending it to the next processing step. A processing step can access external services, for example a scoring service, to perform its function.
  • Project - a container for artifacts.
  • Publish - push changes in a project from a sandbox space to the published space.
  • Published Space - a space containing projects that are visible to all users with the required permissions.
  • Real-Time Metric - quantitative values captured during scoring pipeline execution providing Key Performance Indicators (KPIs) that can be acted upon to improve quality-of-service.
  • Result Data - data available after scoring flow execution for post-processing analysis. Result data is created by data sinks and is available from the configured data sink storage mechanism.
  • Sandbox Space - a per-user private space where all active work occurs. Modified projects and artifacts in the sandbox space are only visible to the current user.
  • Scheduling Service - schedules and manages jobs to deploy scoring pipelines or data channels.
  • Schema - formal definition of the structure of data, defining types, constraints, and cardinality.
  • Scoring Flow - an ordered sequence of processing steps that operate on data received from a data source and sent to a data sink. The data flowing through a scoring flow can be transformed and augmented by processing steps.
  • Scoring Pipeline - a data source, a data sink, one or more scoring flows, and zero or more models used in a scoring flow. A scoring pipeline is started by the scheduling service as a job.
  • Scoring Service - manages execution of models in the context of a model runner.
  • Space - a container for projects. There is a Sandbox Space and a Published Space.
  • Streaming Data - data that is available immediately following an action or process, for example a Kafka message. Contrast with historic data.
  • Task - execution context for a scoring pipeline or a data channel instance. The execution context consists of the cloud resources required to execute a pipeline or a data channel.
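To make the relationships among these terms concrete, the following sketch models a few of them as Python dataclasses. The class names mirror the glossary entries, but the code is purely illustrative: it is not the ModelOps API, and every field shown (identifiers, tags, specification languages) is an assumption chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Schema:
    """Formal definition of the structure of data (illustrative: field name -> type name)."""
    fields: Dict[str, str]


@dataclass
class DataChannel:
    """Maps between an external protocol and scoring flows.

    Its metadata includes a unique identifier, a data schema, and searchable tags.
    A channel acts as a data source (provides input) or a data sink (consumes output).
    """
    identifier: str
    schema: Schema
    tags: List[str] = field(default_factory=list)


@dataclass
class Model:
    """A specification of an analytical process, e.g. written in PMML or Python."""
    name: str
    specification_language: str


@dataclass
class ProcessingStep:
    """An individual action in a scoring flow."""
    name: str


@dataclass
class ScoringFlow:
    """An ordered sequence of processing steps between a data source and a data sink."""
    steps: List[ProcessingStep]


@dataclass
class ScoringPipeline:
    """A data source, a data sink, one or more scoring flows, and zero or more models."""
    source: DataChannel
    sink: DataChannel
    flows: List[ScoringFlow]
    models: List[Model] = field(default_factory=list)


# Assemble a minimal pipeline (all names below are hypothetical).
schema = Schema(fields={"amount": "float"})
pipeline = ScoringPipeline(
    source=DataChannel("orders-topic", schema, tags=["orders"]),
    sink=DataChannel("results-store", schema),
    flows=[ScoringFlow(steps=[ProcessingStep("score")])],
    models=[Model("fraud-detector", "PMML")],
)
print(len(pipeline.flows))  # prints 1
```

In a running system such a pipeline would be started by the scheduling service as a job, with a task providing its execution context; those runtime concepts are deliberately left out of this structural sketch.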