Data Model

The data model of the entities managed by the ModelOps Server is shown in the diagram below. Subsequent sections describe each entity depicted in the diagram, including its core fields. One field common to all entities is a ModelOps server-generated unique ID, which allows an entity to be renamed while preserving references to it.

Projects/Artifacts

The ModelOps server acts as a versioned file system whose directories and files are organized as a set of projects and artifacts, respectively.

Project

A Project is analogous to a top-level directory in a file system. Each project has a unique name, an optional description, and zero or more artifacts. When a Project is deleted, all Artifacts contained in that Project are deleted along with it.

Artifact

An Artifact is analogous to a file in a file system. Each Artifact is associated with exactly one Project. To maintain history, an Artifact manages one or more numbered ArtifactRevisions. As an Artifact’s content and metadata change over time, those changes are reflected in new ArtifactRevisions added to the list rooted at the Artifact. An Artifact’s path and version number are derived from its latest ArtifactRevision. The first ArtifactRevision associated with an Artifact is assigned revision number 1, the second revision number 2, and so on.
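The relationship between an Artifact and its numbered ArtifactRevisions can be sketched as follows. This is only an illustration of the numbering rule described above; the class and attribute names (add_revision, latest, version) are assumptions and are not taken from the ModelOps API.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactRevision:
    number: int  # 1 for the first revision, 2 for the second, and so on
    path: str


@dataclass
class Artifact:
    project: str  # each Artifact belongs to exactly one Project
    revisions: list = field(default_factory=list)

    def add_revision(self, path: str) -> ArtifactRevision:
        """Append a new revision numbered one past the current last."""
        rev = ArtifactRevision(number=len(self.revisions) + 1, path=path)
        self.revisions.append(rev)
        return rev

    @property
    def latest(self) -> ArtifactRevision:
        return self.revisions[-1]

    @property
    def path(self) -> str:
        # The Artifact's path is derived from its latest revision.
        return self.latest.path

    @property
    def version(self) -> int:
        # The Artifact's version number is derived from its latest revision.
        return self.latest.number
```

In this sketch, renaming an Artifact amounts to adding a revision with a different path; earlier revisions keep the old path.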

ArtifactRevision

An ArtifactRevision holds an Artifact’s content and metadata at a specific point in time. An ArtifactRevision is typically in one of two states: SANDBOX or PUBLISHED. An ArtifactRevision in the SANDBOX state is private to a specific ModelOps User, and its content and metadata can change without a new ArtifactRevision being created. The revision number of a SANDBOX ArtifactRevision is 0, indicating it has yet to be published. Once published, an ArtifactRevision is assigned the next revision number in the Artifact-specific sequence, and it becomes visible to other ModelOps users.
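A minimal sketch of the SANDBOX-to-PUBLISHED transition, assuming a hypothetical publish function that is told the last published revision number for the Artifact. The names State, Revision, and publish are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SANDBOX = "SANDBOX"
    PUBLISHED = "PUBLISHED"


@dataclass
class Revision:
    owner: str  # the User the sandbox revision is private to
    content: bytes
    state: State = State.SANDBOX
    number: int = 0  # 0 means "not yet published"


def publish(rev: Revision, last_published_number: int) -> Revision:
    """Move a sandbox revision to PUBLISHED with the next number in sequence."""
    if rev.state is not State.SANDBOX:
        raise ValueError("only SANDBOX revisions can be published")
    rev.state = State.PUBLISHED
    rev.number = last_published_number + 1
    return rev
```

Editing the content of a sandbox revision leaves its number at 0; only publishing assigns a number.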

Every ArtifactRevision for a given Artifact, except the first, maintains a reference to its “parent” ArtifactRevision, that is, the ArtifactRevision it was created from. This allows ModelOps to calculate the differences between any pair of ArtifactRevisions. All ArtifactRevisions also maintain a reference to the Artifact they belong to.

An ArtifactRevision can have multiple forward-slash-delimited elements in its path, thereby simulating a hierarchical directory structure. The last element in the path is considered the ArtifactRevision’s name.
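The path convention can be illustrated with two small helper functions. The function names are hypothetical; only the forward-slash delimiter and the "last element is the name" rule come from the description above.

```python
def revision_name(path: str) -> str:
    """Return the last forward-slash-delimited element of the path."""
    return path.rsplit("/", 1)[-1]


def directory_elements(path: str) -> list:
    """Return the leading elements that simulate the directory hierarchy."""
    return path.split("/")[:-1]
```

For a path such as models/fraud/scorer.pmml, the name is scorer.pmml and the simulated directories are models and fraud.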

ArtifactContent

ModelOps uses a copy-on-write mechanism to optimize artifact storage. Thus, an ArtifactRevision’s content is actually maintained in a secondary ArtifactContent entity. Two ArtifactRevisions that share the same content contain references to the same ArtifactContent instance. Such sharing occurs when only metadata changes are made between two ArtifactRevisions.
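The sharing behavior can be sketched as below. How ModelOps detects identical content is not stated here, so keying content by a SHA-256 digest is purely an assumption made for this example, as are the ContentStore class and its intern method.

```python
import hashlib


class ContentStore:
    """Hypothetical store that keeps one ArtifactContent per distinct payload."""

    def __init__(self):
        self._by_digest = {}

    def intern(self, data: bytes) -> str:
        """Return the key of the stored content, adding it only if it is new."""
        digest = hashlib.sha256(data).hexdigest()
        self._by_digest.setdefault(digest, data)
        return digest

    def __len__(self):
        return len(self._by_digest)


store = ContentStore()
rev1 = {"content": store.intern(b"model-bytes"), "description": "first draft"}
# A metadata-only change reuses the same content reference.
rev2 = {"content": store.intern(b"model-bytes"), "description": "approved"}
# A content change produces a second ArtifactContent.
rev3 = {"content": store.intern(b"model-bytes-v2"), "description": "approved"}
```

Here rev1 and rev2 point at the same stored content, while rev3 points at a new one, so three revisions consume only two content entries.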

Users/Roles/Permissions

ModelOps uses the Apache Shiro security framework to control access to its resources, including projects, artifacts, and environments. Shiro defines a standard three-tier architecture composed of users, roles, and permissions, each of which manifests as a ModelOps entity.

User

Each person using ModelOps is represented by a User entity, which consists of a name, password, and a set of roles. User entities can be disabled or deleted, preventing a user from accessing ModelOps temporarily or permanently, respectively. To preserve the history of a deleted user’s activity, deleted User entities are marked as such but not actually removed from the system.

Role

ModelOps Users are assigned one or more Roles for the purpose of efficiently granting multiple users access to ModelOps resources. Each Role has a name, a set of Users assigned to the Role, and the resource(s) members of the Role are entitled to access, expressed as a set of Permissions. A User assigned to multiple Roles is granted access to the union of the resources associated with those Roles.
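The union rule for a User with several Roles can be sketched as follows. The Role and User classes and the effective_permissions method are assumptions made for illustration, not ModelOps or Shiro types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset  # Permission strings granted to members of the Role


@dataclass
class User:
    name: str
    roles: list = field(default_factory=list)

    def effective_permissions(self) -> frozenset:
        """Union of the Permissions of every Role assigned to the User."""
        result = frozenset()
        for role in self.roles:
            result |= role.permissions
        return result
```

A Permission that appears in more than one Role is simply counted once in the resulting set.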

Permission

A Permission specifies the ModelOps resource(s) members of a Role are allowed to access. It includes a three-element permission string, composed of a resource category, an optional action, and an optional resource instance. For example, the “project:read:myProject” Permission string would grant read-only access to a project named myProject. An empty or missing action or resource instance implies all actions or instances, respectively. Thus, a role with the “project” Permission string would be able to perform any action on all projects, while a role with “project::myProject” would be able to perform any action on the myProject Project. Note that the Permission strings “project” and “project::” are equivalent.
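The matching rules above can be expressed in a few lines. This is a sketch of the rules as described in this section, not Shiro's own permission implementation; the parse and implies functions are hypothetical.

```python
def parse(permission: str) -> tuple:
    """Split into (category, action, instance); missing parts become ''."""
    parts = permission.split(":")
    if len(parts) > 3:
        raise ValueError("expected at most three elements")
    parts += [""] * (3 - len(parts))
    return tuple(parts)


def implies(granted: str, requested: str) -> bool:
    """True if the granted Permission string covers the requested one."""
    g_cat, g_action, g_instance = parse(granted)
    r_cat, r_action, r_instance = parse(requested)
    return (
        g_cat == r_cat
        and g_action in ("", r_action)      # empty action means all actions
        and g_instance in ("", r_instance)  # empty instance means all instances
    )
```

Under these rules, “project::myProject” covers a write to myProject but not to any other project, and “project” covers every action on every project.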

Lifecycle

The lifecycle of an Artifact refers to the state transitions it makes during its life within ModelOps. Typically, a user creates a sandbox copy of a published ArtifactRevision, changes its content or metadata, and publishes the changes, thereby making them available to other ModelOps users through a new PUBLISHED ArtifactRevision.

Checkout

A Checkout is a per-User, per-Artifact collection of ArtifactRevisions being worked on. The ArtifactRevisions in a Checkout are in the SANDBOX state with a revision number of 0.

Commit

When a User publishes one or more checked-out ArtifactRevisions, a Commit is created. Commit entities include the User that published the changes, the user-supplied commit message, and the specific set of ArtifactRevisions that were published.

Not all ArtifactRevisions in a Checkout must be published together, though certain exceptions exist. In particular, an ArtifactRevision cannot be published if it would result in a reference to an ArtifactRevision in the SANDBOX state; in this case, the two ArtifactRevisions must be published together. The ModelOps server rejects publish requests that violate this rule.
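The publish rule can be sketched as a validation step. The validate_publish function, its parameters, and the way references are represented (a mapping from a revision to the revisions it refers to) are all assumptions for illustration.

```python
def validate_publish(to_publish: set, references: dict, sandbox: set) -> None:
    """Reject a publish that would leave a reference to a SANDBOX revision.

    to_publish: revisions in the publish request
    references: revision -> revisions it refers to
    sandbox:    revisions currently in the SANDBOX state
    """
    for rev in sorted(to_publish):
        for target in references.get(rev, ()):
            # A sandbox target is acceptable only if it is published together.
            if target in sandbox and target not in to_publish:
                raise ValueError(
                    f"{rev} references unpublished {target}; publish them together"
                )
```

With a sandbox pipeline that refers to a sandbox model, publishing the model alone or both together passes the check, while publishing the pipeline alone is rejected.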

Properties

Property entities are used to attach metadata in the form of key/value pairs to Projects and ArtifactRevisions. ModelOps provides a set of built-in properties and allows customers to define their own.

PropertyCategory

Properties are grouped into categories. ModelOps provides a set of built-in categories and allows customers to define their own. Each PropertyCategory has a name, description, and the set of properties that are grouped in that category.

PropertyDefinition

A PropertyDefinition is the blueprint for stamping out property values, which are attached to Projects and ArtifactRevisions. A PropertyDefinition entity has many fields, including:

  • its PropertyCategory
  • the property’s data type:
    • boolean
    • number
    • string
    • date
    • timestamp
    • enumeration
    • ArtifactRevision reference
  • whether a property can be assigned to a Project, an ArtifactRevision, or both
  • whether multiple instances of the property value can be assigned to a Project or ArtifactRevision
  • whether an ArtifactRevision inherits a property value from its parent ArtifactRevision
  • whether a property value can be changed on an ArtifactRevision in the PUBLISHED state
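The fields listed above could be modeled roughly as follows. The attribute names are invented for this sketch and do not reflect the actual ModelOps schema; only the set of data types and the meaning of each flag come from the list.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    BOOLEAN = "boolean"
    NUMBER = "number"
    STRING = "string"
    DATE = "date"
    TIMESTAMP = "timestamp"
    ENUMERATION = "enumeration"
    ARTIFACT_REVISION_REFERENCE = "ArtifactRevision reference"


@dataclass(frozen=True)
class PropertyDefinition:
    name: str
    category: str                 # the PropertyCategory the definition belongs to
    data_type: DataType
    applies_to_project: bool      # can be assigned to a Project
    applies_to_revision: bool     # can be assigned to an ArtifactRevision
    multi_valued: bool = False    # multiple values allowed on one entity
    inherited: bool = False       # copied from the parent ArtifactRevision
    mutable_when_published: bool = False  # editable on a PUBLISHED revision
```

Two booleans are used for the assignment target so that "Project only", "ArtifactRevision only", and "both" can all be expressed.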

ProjectPropertyValue

A ProjectPropertyValue entity holds a property value associated with a Project. In addition to the property value itself, it contains the Project to which it is attached and the PropertyDefinition from which it was created.

ArtifactPropertyValue

An ArtifactPropertyValue entity holds a property value associated with an ArtifactRevision. In addition to the property value itself, it contains the ArtifactRevision to which it is attached and the PropertyDefinition from which it was created.

Deployment

ModelOps supports two styles of deployment:

  • Deployment of data channels and scoring pipelines to the cloud
  • Deployment of analytic models to operators running within Streaming applications

This section covers the entities that support deployment of models to Streaming applications.

DeploymentDescriptor

A DeploymentDescriptor contains all the information required to deploy an analytic model stored as an ArtifactRevision within ModelOps to one or more streaming application operators. The source of the model is specified as a ModelOps Project, an Artifact within that Project, and a revision number, or -1 to deploy a model from the latest ArtifactRevision. The options for specifying the deployment destinations are described below.
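Resolving the revision number in a DeploymentDescriptor, including the -1 convention, might look like the sketch below. The resolve_revision function is hypothetical, and treating "latest" as the highest published revision number is an assumption.

```python
LATEST = -1  # sentinel meaning "deploy from the latest ArtifactRevision"


def resolve_revision(requested: int, published_numbers: list) -> int:
    """Map the descriptor's revision number to a concrete revision."""
    if not published_numbers:
        raise LookupError("artifact has no published revisions")
    if requested == LATEST:
        return max(published_numbers)
    if requested not in published_numbers:
        raise LookupError(f"revision {requested} not found")
    return requested
```

A descriptor that uses -1 therefore picks up a newer model each time the Artifact is published, whereas an explicit revision number pins the deployment to one model.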

DeploymentTarget

The DeploymentTarget entity is intended to promote reuse by allowing multiple DeploymentDescriptors to share a single DeploymentTarget to deploy various models to the same destination(s). The DeploymentTarget contains information describing the deployment destination(s), including one or more Streaming URIs or service names and the Streaming operator name.

DeploymentServiceAddress

The DeploymentServiceAddress entity is an alternate way to specify the destination(s) of a model deployment and, like the DeploymentTarget entity, can be shared by multiple DeploymentDescriptors. The DeploymentServiceAddress has a name and optional description, a target host and port number, and a username and password.

DeploymentHistory

The DeploymentHistory entity captures the results of a model deployment, including whether the deployment was successful or not. It includes the User that initiated the deployment, the ID and revision number of the deployed model, and a copy of the information describing the deployment destination(s). Note that the destination information is copied to prevent the history entity from being invalidated should the deployment target entity be modified or deleted after it was used for a deployment.
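The reason for copying the destination information can be shown with a short sketch. The classes and the record function are illustrative assumptions; the point is only that the history entry holds a snapshot and not a live reference.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DeploymentTarget:
    name: str
    uris: list = field(default_factory=list)
    operator: str = ""


@dataclass(frozen=True)
class DeploymentHistory:
    user: str
    model_id: str
    revision: int
    succeeded: bool
    destination: DeploymentTarget  # snapshot, not a live reference


def record(user, model_id, revision, succeeded, target):
    """Create a history entry holding a deep copy of the target."""
    return DeploymentHistory(
        user, model_id, revision, succeeded, copy.deepcopy(target)
    )
```

Because record stores a deep copy, later edits to the shared DeploymentTarget do not alter what the history entry reports about a past deployment.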

DeploymentHistory entities also record the creation, modification, and deletion of DeploymentDescriptors, DeploymentTargets, and DeploymentServiceAddresses.

Miscellaneous

Group

With the Group entity, Artifacts can be placed into collections to provide filtered views within the ModelOps UI. A Group is simply a named collection of Artifacts. A Group can remain private to its creator or can be shared with all ModelOps users.

Lock

A Lock entity allows a User to gain exclusive access to an Artifact, preventing other Users from reading or writing it. In addition to its name, which makes it easier to identify within the ModelOps UI, a Lock contains the User holding it and a reference to the locked Artifact.

OpenIDConnectMapping

An OpenIDConnectMapping entity associates an OpenID Connect identity with a ModelOps User, allowing a User to authenticate against an OpenID Connect provider and thereby skip the ModelOps login dialog. Depending upon how ModelOps is configured, OpenIDConnectMapping records can be created dynamically after a successful authentication against a provider. The OpenIDConnectMapping entity includes the name, subject, and issuer of the provider, the identity of the user as known by the provider, and the ModelOps User that identity maps to.

SCMPluginStorage

ModelOps supports a plug-in architecture for exposing the content of an external source control management system, such as Git, as Projects and Artifacts within ModelOps. The SCMPluginStorage entity allows SCM plug-ins to persist state in ModelOps as key/value pairs. The SCMPluginStorage entity contains the name of the external repository along with the name and value of the stored data.

Subscription

ModelOps supports multiple simultaneous login sessions, each of which can update shared ModelOps state, such as the content of a Project or Artifact. To obviate client polling for detecting changes made by other clients, ModelOps provides a subscription mechanism that allows a client to be asynchronously notified when other clients change a resource of interest. The Subscription entity captures clients that have expressed an interest in a specific ModelOps resource. A Subscription contains the Project or Artifact of interest and the User(s) that receive notification when that Project or Artifact changes.
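A rough sketch of the subscription idea follows. The SubscriptionRegistry class is hypothetical, delivery of the notification itself is left out, and excluding the User who made the change is an assumption based on the statement that clients are notified of changes made by other clients.

```python
from collections import defaultdict


class SubscriptionRegistry:
    """Tracks which Users want to hear about changes to which resources."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # resource ID -> user names

    def subscribe(self, resource_id: str, user: str) -> None:
        self._subscribers[resource_id].add(user)

    def users_to_notify(self, resource_id: str, changed_by: str) -> list:
        """Return subscribers of the resource, excluding whoever changed it."""
        return sorted(self._subscribers.get(resource_id, set()) - {changed_by})
```

A client that subscribes once is then told about changes as they happen and does not need to poll the server on a timer.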