Metrics
Architecture
Real-time metrics are captured during execution of all services in the ModelOps environment. These types of metrics are captured:
- technical - metrics on resource consumption and response times. These metrics are used to determine if a deployed environment has the resources required to achieve its service level agreements (SLAs) defined by the business.
- model quality - qualitative measurement of how the model(s) in a scoring pipeline are performing. These metrics are used to improve the effectiveness of a deployed scoring flow and its associated models.
- usage - utilization information for deployed pipelines.
Metrics are reported by the ModelOps components as they execute. They are collected by a Metrics Store provided by Prometheus that is installed in the ModelOps cloud infrastructure.
LiveView monitors the Metrics Store in real-time and aggregates the raw metric values to provide support for a rich visualization of a subset of the metrics on the ModelOps UI.
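Because the Metrics Store is a standard Prometheus installation, raw metric values can also be read directly over the Prometheus HTTP API. The following is a minimal sketch only; the in-cluster service address and the engine name in the example query are assumptions for illustration.

```python
# Minimal sketch: query the Prometheus Metrics Store over its HTTP API.
# The service address and the engine name below are assumptions for illustration.
import requests

PROMETHEUS_URL = "http://prometheus.modelops.svc:9090"  # hypothetical in-cluster address

def query_metric(promql: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    response = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": promql},
        timeout=10,
    )
    response.raise_for_status()
    payload = response.json()
    if payload.get("status") != "success":
        raise RuntimeError(f"Query failed: {payload}")
    return payload["data"]["result"]

# Example: current heap utilization for a hypothetical engine named "default".
for sample in query_metric("builtin_engine_default_heap_memory_utilization_percentage"):
    print(sample["metric"].get("pod"), sample["value"])
```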
LiveView
LiveView provides continuous query access to metric data captured by the Metrics Store to support real-time visualization directly in the ModelOps UI.
The metrics loaded from the Metrics Store into LiveView are controlled by a white-list file containing regular expressions, one per line. All metric names matching the white-list regular expressions are loaded. The white-list file is shipped in the LiveView application archive.
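The sketch below illustrates the matching semantics of such a white-list file. It is a simplified illustration only; the file name and the use of full matching are assumptions, not the LiveView implementation.

```python
# Simplified illustration of white-list matching semantics (not the LiveView implementation).
# The file name, full-match behavior, and metric names are assumptions for illustration.
import re

def load_whitelist(path: str) -> list:
    """Compile one regular expression per non-empty line of the white-list file."""
    with open(path) as f:
        return [re.compile(line.strip()) for line in f if line.strip()]

def is_loaded(metric_name: str, whitelist: list) -> bool:
    """A metric is loaded into LiveView if its name matches any white-list expression."""
    return any(pattern.fullmatch(metric_name) for pattern in whitelist)

whitelist = load_whitelist("metrics-whitelist.txt")  # hypothetical file name
print(is_loaded("modelops_model_quality_regression_r_squared", whitelist))
```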
The metrics data is stored in these LiveView tables:
- Metrics - stores metric values
- MetricsMetadata - stores metric meta-data
The Metrics table has these fields:
Field | Type | Description |
---|---|---|
EventTime | timestamp | Timestamp of value |
Name | string | Metric name (see Data Model) |
container | string | scoring pipeline name label |
instance | string | scoring pipeline instance label |
namespace | string | scoring pipeline namespace label |
pod | string | scoring flow pod name label |
endpoint | string | IP addresses of the pod |
OtherLabels | string | A JSON string that contains Metric labels that do not match a column. (see Data Model) |
Value | double | Metric value |
The MetricsMetadata table has these fields:
Field | Type | Description |
---|---|---|
Name | string | Metric name |
Type | string | Metric type, one of counter, gauge, histogram, or summary (see Metric Types) |
Description | string | Metric description |
Units | string | Metric unit (see Base Units) |
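The sketch below shows how a Metrics row might be consumed by a client. The row values are invented and only the column names come from the table above; note that labels without a dedicated column are carried in OtherLabels as a JSON string.

```python
# Sketch of consuming a Metrics table row; the example row is invented, only the
# column names are taken from the table above.
import json
from datetime import datetime, timezone

row = {
    "EventTime": datetime.now(timezone.utc),
    "Name": "builtin_node_shared_memory_percentage",
    "container": "credit-scoring",           # scoring pipeline name (hypothetical)
    "instance": "credit-scoring-1",          # scoring pipeline instance (hypothetical)
    "namespace": "modelops-scoring",         # scoring pipeline namespace (hypothetical)
    "pod": "credit-scoring-1-abc123",        # scoring flow pod name (hypothetical)
    "endpoint": "10.0.0.12",
    "OtherLabels": '{"engine": "default"}',  # labels that do not match a column, as JSON
    "Value": 42.5,
}

# Labels that do not map to a dedicated column are carried as a JSON string.
other_labels = json.loads(row["OtherLabels"])
print(row["Name"], row["Value"], other_labels)
```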
A size limit for the Metrics table is automatically maintained using these LiveView alerts:
- Time window metrics trimming - removes all metrics older than a configurable time (defaults to 5 minutes).
- Memory limit metrics trimming - a fail-safe to ensure that the Metrics table does not exceed a configurable maximum size even with time window trimming (defaults to 50 megabytes).
There is no size limit enforced on the MetricsMetadata table.
Technical
The technical metrics captured by the components in the ModelOps environment are summarized below. These technical metrics are used to support elastic scaling of ModelOps components as required using the standard Kubernetes Horizontal Pod Autoscaler (HPA).
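For example, a standard CPU-based HPA could be attached to a scoring pipeline deployment. The following is a minimal sketch using the official Kubernetes Python client; the deployment name, namespace, and thresholds are assumptions for illustration, not ModelOps defaults.

```python
# Minimal sketch: attach a CPU-based Horizontal Pod Autoscaler to a scoring pipeline
# deployment. The names, namespace, and thresholds are assumptions for illustration.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

hpa = client.V1HorizontalPodAutoscaler(
    api_version="autoscaling/v1",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="credit-scoring-hpa"),  # hypothetical name
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="credit-scoring",                            # hypothetical deployment
        ),
        min_replicas=1,
        max_replicas=5,
        target_cpu_utilization_percentage=80,                 # scale out above 80% CPU
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="modelops-scoring",                             # hypothetical namespace
    body=hpa,
)
```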
General
Every running container has these metrics captured:
- Container Advisor - resource usage and performance characteristics.
- Kubernetes System Components - Kubernetes internals.
- Node Exporter - hardware and operating system.
- Kube DNS - DNS response times.
- Kube States - Kubernetes object states.
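Representative queries against these general metrics are shown below. The metric names are standard exporter metrics (cAdvisor, node exporter, kube-state-metrics), not ModelOps-specific, and are listed for illustration only.

```python
# Representative PromQL expressions for the general metrics listed above; these are
# standard exporter metric names shown for illustration, not ModelOps-specific metrics.
EXAMPLE_QUERIES = {
    "container CPU usage":   'rate(container_cpu_usage_seconds_total[5m])',
    "node available memory": 'node_memory_MemAvailable_bytes',
    "pods not running":      'kube_pod_status_phase{phase!="Running"} == 1',
}

for description, promql in EXAMPLE_QUERIES.items():
    print(f"{description}: {promql}")
```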
Scoring Flows
Containers running scoring flows capture these additional metrics:
Metric Name | Metric Type | Description |
---|---|---|
builtin_cpu_idle_utilization_percentage | HISTOGRAM | Percent idle CPU utilization for machine hosting node |
builtin_cpu_system_utilization_percentage | HISTOGRAM | Percent system CPU utilization for machine hosting node |
builtin_cpu_user_utilization_percentage | HISTOGRAM | Percent user CPU utilization for machine hosting node |
builtin_engine_<engine-name>_heap_memory_utilization_bytes | HISTOGRAM | Heap memory used (bytes) for engine <engine-name> |
builtin_engine_<engine-name>_heap_memory_utilization_percentage | HISTOGRAM | Percent heap memory used for engine <engine-name> |
builtin_engine_<engine-name>_queue_<queue-name>_depth_second | METER | Queue <queue-name> depth per second for engine <engine-name> |
builtin_engine_<engine-name>_tuples_rate | METER | Scoring pipeline request rate for engine <engine-name> |
builtin_node_shared_memory_kilobytes | HISTOGRAM | Shared memory used (kilobytes) for node |
builtin_node_shared_memory_percentage | HISTOGRAM | Percent shared memory used for node |
builtin_node_transactions_deadlocks_rate | METER | Transaction deadlock rate for node |
builtin_node_transactions_latency_average_microseconds | METER | Average transaction latency (microseconds) for node |
builtin_node_transactions_total_rate | METER | Transaction rate for node |
Model Quality
Scoring flows may optionally publish calculated metrics to support monitoring of model quality. The metrics are calculated by comparing observed, or expected, values with predicted, or calculated, values. These are defined as:
- Observed Values - previously recorded desired values. Observed values are contained in the input request, along with the data to score.
- Predicted Values - values produced by scoring given input data with a model. Predicted values are contained in the response from a scoring server after scoring.
Calculated metrics are available in real-time via the metrics store, visualized in the ModelOps UI, and also captured in the result data stored in a data sink.
The diagram above shows both the observed and input values received from a data source, which are then processed in a scoring flow. The Score processing step in the scoring flow adds the predicted values and model identifier to the request data after scoring, which is then passed on to the Compute Metrics processing step. The Compute Metrics processing step uses the observed and predicted values to calculate model quality. The calculated metrics are added to the results data and published to the metrics store in the Publish Metrics processing step. Finally the result data is sent to the data sink, where all values are stored to facilitate post-processing of the data.
The supported calculated metrics are summarized in the tables below for different model types.
Classification Models
Supported metrics for classification models.
Metric Name | Metric Type | Description |
---|---|---|
modelops_model_quality_classification_misclassification_rate | GAUGE | Misclassification rate. The proportion of misclassified instances in the dataset scored by the classification model. |
modelops_model_quality_classification_chi_square | GAUGE | Chi square. A measure of the difference between the observed and predicted frequencies of the outcomes of a set of input variables. |
modelops_model_quality_classification_g_square | GAUGE | G square. A likelihood-ratio (maximum likelihood) statistical significance test that is increasingly used in situations where chi-squared tests were previously recommended. The G-test of goodness-of-fit is also known as the likelihood-ratio test, the log-likelihood ratio test, or the G² test, and is preferred when the sample size is large. |
modelops_model_quality_classification_f1_score | GAUGE | F1 score. A weighted average of precision and recall, where the F1 score reaches its best value at 1 and its worst at 0. The relative contributions of precision and recall to the F1 score are equal. |
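The sketch below illustrates how these classification metrics can be computed from observed and predicted labels. It follows the definitions above using scipy and scikit-learn; it is not the ModelOps implementation, and the example labels are invented.

```python
# Illustrative computation of the classification quality metrics from observed and
# predicted labels; a sketch of the underlying formulas, not the ModelOps implementation.
import numpy as np
from scipy.stats import chisquare, power_divergence
from sklearn.metrics import f1_score

observed = np.array(["yes", "no", "yes", "yes", "no", "no"])   # observed values from the input request
predicted = np.array(["yes", "no", "no", "yes", "no", "yes"])  # predicted values from the scoring server

# Misclassification rate: proportion of rows where the prediction differs from the observation.
misclassification_rate = np.mean(observed != predicted)

# Chi square and G square compare observed and predicted outcome frequencies per class.
classes = np.unique(np.concatenate([observed, predicted]))
observed_counts = np.array([(observed == c).sum() for c in classes])
predicted_counts = np.array([(predicted == c).sum() for c in classes])
chi_square, _ = chisquare(f_obs=observed_counts, f_exp=predicted_counts)
g_square, _ = power_divergence(f_obs=observed_counts, f_exp=predicted_counts,
                               lambda_="log-likelihood")

# F1 score: combines precision and recall (best 1.0, worst 0.0).
f1 = f1_score(observed, predicted, pos_label="yes")

print(misclassification_rate, chi_square, g_square, f1)
```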
Regression Models
Supported metrics for regression models.
Metric Name | Metric Type | Description |
---|---|---|
modelops_model_quality_regression_mean_error | GAUGE | Mean error. The mean of the prediction errors for the dataset scored by the regression model. A prediction error is the difference between the value predicted by the regression model and the actual or true value available in the input row: e(i) = y_o(i) - y_p(i), where e(i) is the prediction error for the ith row, y_p(i) the predicted value for the ith row, y_o(i) the observed value in the ith row, and i the row index (1, 2, 3, …). |
modelops_model_quality_regression_mean_absolute_error | GAUGE | Mean absolute error. The mean of the absolute prediction errors for the dataset scored by the regression model. An absolute prediction error is the absolute difference between the value predicted by the regression model and the actual or true value available in the input row: e(i) = abs(y_o(i) - y_p(i)). |
modelops_model_quality_regression_mean_squared_error | GAUGE | Mean squared error. The mean of the squared prediction errors for the dataset scored by the regression model. A squared prediction error is the square of the difference between the value predicted by the regression model and the actual or true value available in the input row: e(i) = (y_o(i) - y_p(i))^2. |
modelops_model_quality_regression_root_mean_squared_error | GAUGE | Root mean squared error. Square root of the mean squared error, also known as the standard error of the y estimate. |
modelops_model_quality_regression_r_squared | GAUGE | R squared. R² (coefficient of determination) regression score function. R squared compares the residual sum of squares (SSE) with the total sum of squares (TSS). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the mean value of y, disregarding the input features, gets an R² score of 0.0. |
modelops_model_quality_regression_sum_of_squared_errors | GAUGE | Sum of squared errors. The sum of squares of the residual errors for the dataset scored by the regression model. A residual error is the difference between the actual or true value available in the input row and the value predicted by the regression model: e_residual(i) = y_o(i) - y_p(i). |
modelops_model_quality_regression_total_sum_of_squares | GAUGE | Total sum of squares. The sum of squared differences between the actual or true values and their overall mean. |
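The sketch below computes these regression metrics directly from the formulas above. The example values are invented and this is not the ModelOps implementation.

```python
# Illustrative computation of the regression quality metrics using the formulas above;
# a sketch only, with invented example values.
import numpy as np

y_observed = np.array([3.0, 5.0, 2.5, 7.0])   # observed (true) values from the input rows
y_predicted = np.array([2.8, 5.4, 2.9, 6.5])  # predicted values from the scoring server

errors = y_observed - y_predicted              # e(i) = y_o(i) - y_p(i)
mean_error = errors.mean()
mean_absolute_error = np.abs(errors).mean()
mean_squared_error = (errors ** 2).mean()
root_mean_squared_error = np.sqrt(mean_squared_error)

sum_of_squared_errors = (errors ** 2).sum()                            # SSE
total_sum_of_squares = ((y_observed - y_observed.mean()) ** 2).sum()   # TSS
r_squared = 1.0 - sum_of_squared_errors / total_sum_of_squares         # R² = 1 - SSE/TSS

print(mean_error, mean_absolute_error, mean_squared_error,
      root_mean_squared_error, r_squared, sum_of_squared_errors, total_sum_of_squares)
```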
Clustering Models
Supported metrics for clustering models.
Metric Name | Metric Type | Description |
---|---|---|
modelops_model_quality_clustering_sum_of_squared_errors | GAUGE | Sum of squared error for clustering. A prototype-based cohesion measure where the squared Euclidean distance is used. For each point, the error is the distance to the nearest cluster. |
modelops_model_quality_clustering_silhouette_score | GAUGE | Silhouette score. A measure of how well samples are clustered with samples that are similar to themselves. Clustering models with a high silhouette coefficient are said to be dense, where samples in the same cluster are similar to each other, and well separated, where samples in different clusters are not very similar to each other. The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar. |
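The sketch below illustrates these clustering metrics using scikit-learn. The data and cluster count are invented, and this is not the ModelOps implementation.

```python
# Illustrative computation of the clustering quality metrics; a sketch with invented data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 8.1], [8.2, 7.9], [0.9, 2.2], [7.8, 8.3]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Sum of squared errors: for each point, the squared Euclidean distance to its nearest
# cluster center (exposed by scikit-learn as inertia_).
sum_of_squared_errors = kmeans.inertia_

# Silhouette score: best 1, worst -1; values near 0 indicate overlapping clusters.
silhouette = silhouette_score(X, kmeans.labels_)

print(sum_of_squared_errors, silhouette)
```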
Usage
Usage metrics are captured for all deployed scoring pipelines. These metrics are used to report against subscription limits.
Usage metrics are available in real-time via the metrics store.
Metric Name | Metric Type | Description |
---|---|---|
modelops_usage_input_record_count | GAUGE | Total count of input records from data sources. |
modelops_usage_output_record_count | GAUGE | Total count of output records to data sinks. |
modelops_usage_input_bytes | GAUGE | Total size in bytes of input records from data sources. |
modelops_usage_output_bytes | GAUGE | Total size in bytes of output records to data sinks. |
Labels
All of the usage metrics have these labels.
- environment - scoring environment name in which the pipeline was running
- flowName - scoring flow name
- jobIdentifier - internally assigned unique identifier for the deployed scoring pipeline
- jobName - scoring pipeline deployment name
- username - user that deployed the scoring pipeline
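For example, these labels can be used to read a usage metric for a single deployment from the Metrics Store. The following is a minimal sketch; the Prometheus address and label values are assumptions for illustration.

```python
# Minimal sketch: read a usage metric for one deployment by filtering on its labels.
# The Prometheus address, jobName, and environment values are assumptions for illustration.
import requests

promql = 'modelops_usage_input_record_count{jobName="credit-scoring", environment="production"}'
result = requests.get(
    "http://prometheus.modelops.svc:9090/api/v1/query",  # hypothetical in-cluster address
    params={"query": promql},
    timeout=10,
).json()["data"]["result"]

for sample in result:
    print(sample["metric"]["jobName"], sample["value"][1])
```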
Visualization
Visual model monitoring can be done directly against the Metrics Store using any tool that supports Prometheus, for example Grafana. This allows support for a broad range of model monitoring tools and rich customizations.
In addition, there is built-in support for visual model monitoring using a subset of the captured metrics described above. This provides an out-of-the-box high-level overview of model quality and a rough indication of resource utilization.
Metrics
All of the model quality metrics are available. In addition, these technical metrics are available:
- builtin_engine_<engine-name>_tuples_rate
- builtin_node_transactions_latency_average_microseconds
- builtin_engine_<engine-name>_heap_memory_utilization_percentage
- builtin_cpu_idle_utilization_percentage
- builtin_cpu_system_utilization_percentage
- builtin_cpu_user_utilization_percentage
Labels
All of the visualized metrics support these labels.
- container - scoring pipeline name
- instance - scoring pipeline instance
- namespace - scoring pipeline namespace
- pod - scoring flow pod name
Labels define separate indexes on the metrics to support aggregation of the values into easy-to-understand visualizations.
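For example, a visualized metric can be aggregated per scoring pipeline by grouping on the container label. The following is a minimal sketch; the engine name and Prometheus address are assumptions for illustration.

```python
# Minimal sketch: aggregate a visualized metric per scoring pipeline by grouping on the
# "container" label. The engine name and Prometheus address are assumptions for illustration.
import requests

promql = "sum by (container) (builtin_engine_default_tuples_rate)"
samples = requests.get(
    "http://prometheus.modelops.svc:9090/api/v1/query",  # hypothetical in-cluster address
    params={"query": promql},
    timeout=10,
).json()["data"]["result"]

for sample in samples:
    print(sample["metric"]["container"], sample["value"][1])
```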