Metrics
Architecture
Real-time metrics are captured during execution of all services in the ModelOps environment. Two types of metrics are captured:
- technical - metrics on resource consumption and response times. These metrics are used to determine whether a deployed environment has the resources required to meet the service level agreements (SLAs) defined by the business.
- model quality - qualitative measurement of how the model(s) in a scoring pipeline are performing. These metrics are used to improve the effectiveness of a deployed scoring flow and its associated models.
Metrics are reported by the ModelOps components as they execute. They are collected by a Metrics Store provided by Prometheus that is installed in the ModelOps cloud infrastructure.
LiveView monitors the Metrics Store in real-time and aggregates the raw metric values to provide support for a rich visualization of a subset of the metrics on the ModelOps UI.
LiveView
LiveView provides continuous query access to metric data captured by the Metrics Store to support real-time visualization directly in the ModelOps UI.
The metrics loaded from the Metrics Store into LiveView are controlled by a white-list file containing regular expressions, one per line. All metric names matching any of the white-list regular expressions are loaded. The white-list file is shipped in the LiveView application archive.
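The white-list filtering described above can be sketched in a few lines of Python. This is only an illustration: the white-list contents are hypothetical, and whether LiveView requires a pattern to match the whole metric name (as assumed here) or only part of it is not stated in this document.

```python
import re


def load_whitelist(text):
    """Compile one regular expression per non-empty line."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]


def is_whitelisted(metric_name, patterns):
    """Return True if the whole metric name matches any pattern (assumed semantics)."""
    return any(p.fullmatch(metric_name) for p in patterns)


# Hypothetical white-list contents, one regular expression per line.
WHITELIST_TEXT = """
modelops_model_quality_.*
builtin_cpu_.*_utilization_percentage
"""

patterns = load_whitelist(WHITELIST_TEXT)
names = [
    "modelops_model_quality_regression_mean_error",
    "builtin_cpu_idle_utilization_percentage",
    "builtin_node_shared_memory_kilobytes",
]
# Only names matching at least one pattern would be loaded.
loaded = [n for n in names if is_whitelisted(n, patterns)]
```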
The metrics data is stored in these LiveView tables:
- Metrics - stores metric values
- MetricsMetadata - stores metric metadata
The Metrics table has these fields:
Field | Type | Description |
---|---|---|
EventTime | timestamp | Timestamp of value |
Name | string | Metric name (see Data Model) |
Label | string | Metric label, a comma-separated list of <name>=<value> pairs that identifies a specific metric value (see Data Model) |
Value | double | Metric value |
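A client reading the Metrics table will usually want the Label field as a mapping. The sketch below assumes that label values contain no commas and that whitespace around names and values is not significant; the label names and values in the example are hypothetical.

```python
def parse_label(label):
    """Split a 'name=value, name=value' string into a dict.

    Assumes values contain no commas; whitespace around names and
    values is stripped.
    """
    pairs = {}
    for item in label.split(","):
        if not item.strip():
            continue  # tolerate empty segments such as a trailing comma
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"not a name=value pair: {item!r}")
        pairs[name.strip()] = value.strip()
    return pairs


# Hypothetical Label value from a row of the Metrics table.
labels = parse_label("container = scoring-a, pod = flow-1")
```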
The MetricsMetadata table has these fields:
Field | Type | Description |
---|---|---|
Name | string | Metric name |
Type | string | Metric type, one of counter, gauge, histogram, or summary (see Metric Types) |
Description | string | Metric description |
Units | string | Metric unit (see Base Units) |
A size limit for the Metrics table is automatically maintained using these LiveView alerts:
- Time window metrics trimming - removes all metrics older than a configurable time (defaults to 5 minutes).
- Memory limit metrics trimming - fail-safe to ensure that the Metrics table does not exceed a configurable maximum size even with time window trimming (defaults to 50 megabytes).
There is no size limit enforced on the MetricsMetadata table.
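The combined effect of the two trimming rules on the Metrics table can be sketched as follows. This is not the LiveView alert implementation: the row layout, the `estimate_row_bytes` helper, and the reading of 50 megabytes as 50 * 1024 * 1024 bytes are all assumptions made for illustration.

```python
MAX_AGE_SECONDS = 5 * 60       # default time window from the text
MAX_BYTES = 50 * 1024 * 1024   # default memory limit (binary megabytes assumed)


def estimate_row_bytes(row):
    """Crude, hypothetical size estimate: string lengths plus two 8-byte numbers."""
    _event_time, name, label, _value = row
    return len(name) + len(label) + 16


def trim(rows, now, max_age=MAX_AGE_SECONDS, max_bytes=MAX_BYTES):
    """Return the rows that survive both rules; rows must be sorted oldest first."""
    # Rule 1: time window trimming removes rows older than max_age.
    kept = [r for r in rows if now - r[0] <= max_age]
    # Rule 2: memory limit fail-safe drops the oldest rows until the table fits.
    total = sum(estimate_row_bytes(r) for r in kept)
    start = 0
    while start < len(kept) and total > max_bytes:
        total -= estimate_row_bytes(kept[start])
        start += 1
    return kept[start:]


# Rows are (event_time_seconds, name, label, value).
rows = [(0.0, "m", "", 1.0), (100.0, "m", "", 2.0), (400.0, "m", "", 3.0)]
by_time = trim(rows, now=400.0)
by_size = trim(rows, now=400.0, max_bytes=20)
```

With the defaults, only the time window applies to this small table; the second call shows the fail-safe removing the oldest surviving row once the size estimate exceeds the limit.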
Technical
The technical metrics captured by the components in the ModelOps environment are summarized below. These technical metrics are used to support elastic scaling of ModelOps components as required using the standard Kubernetes Horizontal Pod Autoscaler (HPA).
General
Every running container has these metrics captured:
- Container Advisor - resource usage and performance characteristics.
- Kubernetes System Components - Kubernetes internals.
- Node Exporter - hardware and operating system.
- Kube DNS - DNS response times.
- Kube States - Kubernetes object states.
Scoring Flows
Containers running scoring flows capture these additional metrics:
Metric Name | Metric Type | Description |
---|---|---|
builtin_cpu_idle_utilization_percentage | HISTOGRAM | Percent idle CPU utilization for machine hosting node |
builtin_cpu_system_utilization_percentage | HISTOGRAM | Percent system CPU utilization for machine hosting node |
builtin_cpu_user_utilization_percentage | HISTOGRAM | Percent user CPU utilization for machine hosting node |
builtin_engine_<engine-name>_heap_memory_utilization_bytes | HISTOGRAM | Heap memory used (bytes) for engine <engine-name> |
builtin_engine_<engine-name>_heap_memory_utilization_percentage | HISTOGRAM | Percent heap memory used for engine <engine-name> |
builtin_engine_<engine-name>_queue_<queue-name>_depth_second | METER | Queue <queue-name> depth per second for engine <engine-name> |
builtin_engine_<engine-name>_tuples_rate | METER | Scoring pipeline request rate for engine <engine-name> |
builtin_node_shared_memory_kilobytes | HISTOGRAM | Shared memory used (kilobytes) for node |
builtin_node_shared_memory_percentage | HISTOGRAM | Percent shared memory used for node |
builtin_node_transactions_deadlocks_rate | METER | Transaction deadlock rate for node |
builtin_node_transactions_latency_average_microseconds | METER | Average transaction latency (microseconds) for node |
builtin_node_transactions_total_rate | METER | Transaction rate for node |
Model Quality
Scoring flows may optionally publish calculated metrics to support monitoring of model quality. The metrics are calculated by comparing observed, or expected, values with predicted, or calculated, values. These are defined as:
- Observed Values - previously recorded desired values. Observed values are contained in the input request, along with the data to score.
- Predicted Values - values produced by scoring given input data with a model. Predicted values are contained in the response from a scoring server after scoring.
Calculated metrics are available in real-time via the metrics store, and also in the result data stored in a data sink.
The diagram above shows both the observed and input values received from a data source, which are then processed in a scoring flow. The Score processing step in the scoring flow adds the predicted values and model identifier to the request data after scoring, and the request data is then passed on to the Compute Metrics processing step. The Compute Metrics processing step uses the observed and predicted values to calculate model quality. The calculated metrics are added to the result data and published to the metrics store in the Publish Metrics processing step. Finally, the result data is sent to the data sink, where all values are stored to facilitate post-processing of the data.
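The order of the processing steps can be illustrated with a small Python sketch. Every name here (`score`, `compute_metrics`, `publish_metrics`, `run_flow`, the dictionary keys, and the single absolute-error metric) is hypothetical and chosen only to mirror the steps described above; it does not reflect the actual scoring flow API.

```python
def score(request, model):
    """Score step: add the predicted value and model identifier to the request data."""
    result = dict(request)
    result["predicted"] = model["predict"](request["input"])
    result["model_id"] = model["id"]
    return result


def compute_metrics(result):
    """Compute Metrics step: derive a quality value from observed and predicted values."""
    result = dict(result)
    result["metrics"] = {"absolute_error": abs(result["observed"] - result["predicted"])}
    return result


def publish_metrics(result, metrics_store):
    """Publish Metrics step: send the calculated metrics to the metrics store."""
    metrics_store.append((result["model_id"], result["metrics"]))
    return result


def run_flow(requests, model, metrics_store, data_sink):
    """Run every request through the steps, then store the full result in the sink."""
    for request in requests:
        result = publish_metrics(compute_metrics(score(request, model)), metrics_store)
        data_sink.append(result)


# Stand-in model, store, and sink; a real flow would use the deployed components.
model = {"id": "demo-model", "predict": lambda x: 2.0 * x}
metrics_store, data_sink = [], []
run_flow([{"input": 3.0, "observed": 7.0}], model, metrics_store, data_sink)
```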
The supported calculated metrics are summarized in the tables below for different model types.
Classification Models
Supported metrics for classification models.
Metric Name | Metric Type | Description |
---|---|---|
modelops_model_quality_classification_misclassification_rate | GAUGE | Misclassification rate. The proportion of misclassified instances in the dataset scored by the classification model. |
modelops_model_quality_classification_chi_square | GAUGE | Chi square. A measure of the difference between the observed and predicted frequencies of the outcomes of a set of input variables. |
modelops_model_quality_classification_g_square | GAUGE | G square. A likelihood-ratio (maximum likelihood) statistical significance test that is increasingly used in situations where chi-squared tests were previously recommended. The G-test of goodness-of-fit is also known as the likelihood ratio test, the log-likelihood ratio test, or the G2 test, and is preferred when the sample size is large. |
modelops_model_quality_classification_f1_score | GAUGE | F1 score. The harmonic mean of precision and recall. The best value is 1 and the worst value is 0. Precision and recall contribute equally to the F1 score. |
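Two of the classification metrics are simple enough to work through by hand. The sketch below uses the standard textbook definitions and treats F1 as a binary metric for one chosen positive class; how the scoring flow handles multi-class models is not stated here, so that part is an assumption. The function names and sample data are illustrative only.

```python
def misclassification_rate(observed, predicted):
    """Proportion of rows whose predicted class differs from the observed class."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    wrong = sum(1 for o, p in zip(observed, predicted) if o != p)
    return wrong / len(observed)


def f1_score(observed, predicted, positive):
    """Harmonic mean of precision and recall for the given positive class."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == positive and p == positive)
    fp = sum(1 for o, p in pairs if o != positive and p == positive)
    fn = sum(1 for o, p in pairs if o == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


observed = ["a", "a", "b", "b", "a"]
predicted = ["a", "b", "b", "a", "a"]
rate = misclassification_rate(observed, predicted)  # 2 of 5 rows are wrong
f1 = f1_score(observed, predicted, positive="a")    # precision = recall = 2/3
```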
Regression Models
Supported metrics for regression models.
Metric Name | Metric Type | Description |
---|---|---|
modelops_model_quality_regression_mean_error | GAUGE | Mean error. The mean of the prediction errors for the dataset scored by the regression model. A prediction error is the difference between the actual or true value, available in the input row, and the value predicted by the regression model: e(i) = y_o(i) - y_p(i), where e(i) = prediction error for the ith row, y_o(i) = observed value in the ith row, y_p(i) = predicted value for the ith row, and i = row index (1, 2, 3, …). |
modelops_model_quality_regression_mean_absolute_error | GAUGE | Mean absolute error. The mean of the absolute prediction errors for the dataset scored by the regression model. An absolute error is the absolute difference between the actual or true value, available in the input row, and the value predicted by the regression model: e(i) = abs(y_o(i) - y_p(i)), where e(i) = absolute prediction error for the ith row, y_o(i) = observed value in the ith row, y_p(i) = predicted value for the ith row, and i = row index (1, 2, 3, …). |
modelops_model_quality_regression_mean_squared_error | GAUGE | Mean squared error. The mean of the squared prediction errors for the dataset scored by the regression model. A squared prediction error is the square of the difference between the actual or true value, available in the input row, and the value predicted by the regression model: e(i) = (y_o(i) - y_p(i))^2, where e(i) = squared prediction error for the ith row, y_o(i) = observed value in the ith row, y_p(i) = predicted value for the ith row, and i = row index (1, 2, 3, …). |
modelops_model_quality_regression_root_mean_squared_error | GAUGE | Root mean squared error. Square root of the mean squared error, also known as the standard error value for the y estimate. |
modelops_model_quality_regression_r_squared | GAUGE | R squared. The R^2 (coefficient of determination) regression score function. R squared compares the residual sum of squares (SSE) with the total sum of squares (TSS). The best possible score is 1.0, and the score can be negative because the model can be arbitrarily worse. A constant model that always predicts the mean value of y, disregarding the input features, gets an R^2 score of 0.0. |
modelops_model_quality_regression_sum_of_squared_errors | GAUGE | Sum of squared errors. The sum of squares of the residual errors for the dataset scored by the regression model. A residual error is the difference between the actual or true value, available in the input row, and the value predicted by the regression model: e_residual(i) = y_o(i) - y_p(i), where e_residual(i) = residual error for the ith row, y_o(i) = observed value in the ith row, y_p(i) = predicted value for the ith row, and i = row index (1, 2, 3, …). |
modelops_model_quality_regression_total_sum_of_squares | GAUGE | Total sum of squares. The sum of squared differences between the actual or true values and their overall mean. |
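The regression metrics in the table are closely related, which a short worked example makes visible. The sketch below follows the formulas in the table, with e(i) = y_o(i) - y_p(i), and uses the usual definition R^2 = 1 - SSE/TSS; the function name and the sample values are illustrative, and the dictionary keys are simply the metric name suffixes.

```python
import math


def regression_metrics(observed, predicted):
    """Compute the regression quality metrics from observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]  # e(i) = y_o(i) - y_p(i)
    sse = sum(e * e for e in errors)
    mean_observed = sum(observed) / n
    tss = sum((o - mean_observed) ** 2 for o in observed)
    mse = sse / n
    return {
        "mean_error": sum(errors) / n,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        "sum_of_squared_errors": sse,
        "total_sum_of_squares": tss,
        # R^2 is undefined when all observed values are identical (TSS = 0).
        "r_squared": 1 - sse / tss if tss else float("nan"),
    }


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

In this example the errors are 1, 0, -1, and 0, so the mean error is 0 even though the mean absolute error is 0.5, which is why the two metrics are reported separately.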
Clustering Models
Supported metrics for clustering models.
Metric Name | Metric Type | Description |
---|---|---|
modelops_model_quality_clustering_sum_of_squared_errors | GAUGE | Sum of squared error for clustering. A prototype-based cohesion measure where the squared Euclidean distance is used. For each point, the error is the distance to the nearest cluster. |
modelops_model_quality_clustering_silhouette_score | GAUGE | Silhouette score. A measure of how well samples are clustered with samples that are similar to themselves. Clustering models with a high Silhouette Coefficient are said to be dense, where samples in the same cluster are similar to each other, and well separated, where samples in different clusters are not very similar to each other. The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar. |
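Both clustering metrics can be written down compactly. The sketch below uses the standard definitions: the sum of squared errors takes "nearest cluster" to mean the nearest cluster centroid (an assumption), and the silhouette score is the mean over all samples of (b - a) / max(a, b), where a is the mean distance to the other samples in the same cluster and b is the smallest mean distance to the samples of any other cluster. Function names and data are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def clustering_sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(euclidean(p, c) ** 2 for c in centroids) for p in points)


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a single-sample cluster contributes a coefficient of 0
        # The distance from the point to itself is 0, so divide by len(own) - 1.
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / len(points)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
sse = clustering_sse(points, centroids=[(0.5,), (10.5,)])
silhouette = silhouette_score(points, labels)
```

For these two tight, well-separated clusters the silhouette score is close to 0.9, consistent with the interpretation given in the table.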
Visualization
Visual model monitoring can be done directly against the Metrics Store using any tool that supports Prometheus, for example Grafana. This allows support for a broad range of model monitoring tools and rich customizations.
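As one example of working directly against the Metrics Store, the sketch below builds a request for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and parses a response in the documented JSON format. The base URL, the `namespace` and `pod` values, and the sample response body are hypothetical, and no network call is made; how the Prometheus endpoint is exposed in a given ModelOps installation is not covered here.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Return (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value-as-string"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


# Hypothetical endpoint and label value.
url = instant_query_url(
    "http://prometheus.example:9090/",
    'modelops_model_quality_regression_mean_error{namespace="demo"}',
)

# Hand-written sample response, used here instead of a live server.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{
            "metric": {
                "__name__": "modelops_model_quality_regression_mean_error",
                "namespace": "demo",
                "pod": "flow-1",
            },
            "value": [1700000000.0, "0.25"],
        }],
    },
})
samples = parse_instant_vector(SAMPLE_BODY)
```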
In addition, there is built-in support for visual model monitoring using a subset of the captured metrics described above. This provides an out-of-the-box, high-level overview of model quality and a rough indication of resource utilization.
Metrics
All of the model quality metrics are available. In addition, these technical metrics are available:
- builtin_engine_<engine-name>_tuples_rate
- builtin_node_transactions_latency_average_microseconds
- builtin_engine_<engine-name>_heap_memory_utilization_percentage
- builtin_cpu_idle_utilization_percentage
- builtin_cpu_system_utilization_percentage
- builtin_cpu_user_utilization_percentage
Labels
All of the visualized metrics support these labels.
- container - scoring pipeline name
- instance - scoring pipeline instance
- namespace - scoring pipeline namespace
- pod - scoring flow pod name
Labels define separate indexes on the metrics to support aggregation of the values in an easy-to-understand visualization.
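Aggregating by label amounts to grouping metric values on one label and reducing each group. The sketch below shows the idea with an average per pod; the function name, the sample label values, and the choice of reducer are illustrative and do not describe how the ModelOps UI aggregates internally.

```python
from collections import defaultdict


def aggregate_by_label(samples, label, reducer):
    """Group (labels, value) samples by one label and reduce each group's values."""
    groups = defaultdict(list)
    for labels, value in samples:
        # Samples without the label are grouped under the empty string.
        groups[labels.get(label, "")].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def mean(values):
    return sum(values) / len(values)


# Hypothetical samples of one metric, each carrying the labels listed above.
samples = [
    ({"container": "pipeline-a", "namespace": "demo", "pod": "flow-1"}, 40.0),
    ({"container": "pipeline-a", "namespace": "demo", "pod": "flow-1"}, 60.0),
    ({"container": "pipeline-a", "namespace": "demo", "pod": "flow-2"}, 10.0),
]
per_pod = aggregate_by_label(samples, "pod", mean)
per_namespace = aggregate_by_label(samples, "namespace", max)
```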