Monitoring

This section describes how to use the installed ModelOps components to support monitoring, including how to access the services associated with those components.

Logging

All ModelOps components, including scoring pipelines, generate log records that are stored in the logging store (Elasticsearch). The following services provide access to the logging components:

Component               Service                 Default Credentials (username/password)
Logging Store           elasticsearch-es-http   elastic/elastic
Logging Visualization   kibana-kb-http          elastic/elastic

Accessing Logging Store - Elasticsearch

//
//  Get elasticsearch-es-http service port
//
kubectl get services --namespace modelops elasticsearch-es-http 
NAME                    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
elasticsearch-es-http   ClusterIP   10.0.86.229   <none>        9200/TCP   20d

//
//  Set up port-forward to local port 9200
//
kubectl port-forward service/elasticsearch-es-http --namespace modelops 9200:9200

//
//  Open browser window (macOS only)
//
open http://localhost:9200
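
The open command above is specific to macOS. The following is a minimal sketch of a portable alternative, assuming that a Linux workstation has xdg-open installed. The function name open_url is illustrative and is not part of ModelOps.

```shell
# Open a URL in the default browser on macOS or Linux.
# Assumption: xdg-open is available on Linux workstations.
open_url() {
    case "$(uname -s)" in
        Darwin) open "$1" ;;                       # macOS
        Linux)  xdg-open "$1" ;;                   # most Linux desktops
        *)      echo "Open $1 in a browser" ;;     # fallback: print the URL
    esac
}
```

With the port-forward running, open_url http://localhost:9200 can be used in place of the open command. The same function works for the other local URLs in this section.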

Accessing Logging Visualization - Kibana

//
//  Get kibana-kb-http service port
//
kubectl get services --namespace modelops kibana-kb-http 
NAME             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
kibana-kb-http   ClusterIP   10.0.11.207   <none>        80/TCP    20d

//
//  Set up port-forward to local port 9300
//
kubectl port-forward service/kibana-kb-http --namespace modelops 9300:80

//
//  Open browser window (macOS only)
//
open http://localhost:9300

Pod Logging

In addition to the log records in the logging store, log output can be read directly from a Pod using this command:

//
//  Follow log output - replace <pod-name> with actual Pod name
//
kubectl logs <pod-name> --namespace modelops --follow

See Service Pods for instructions on getting Pod names.
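
Following a log from the beginning can produce a large amount of output. The sketch below limits the output using the standard kubectl logs options --tail and --since. The function name recent_pod_logs and its default values (100 lines, 1 hour) are illustrative choices, not ModelOps settings.

```shell
# Print the most recent log lines from a Pod in the modelops namespace.
#   $1 - Pod name
#   $2 - maximum number of lines to print (default: 100)
#   $3 - how far back to look (default: 1h)
recent_pod_logs() {
    kubectl logs "$1" --namespace modelops --tail="${2:-100}" --since="${3:-1h}"
}
```

For example, recent_pod_logs <pod-name> 50 10m prints at most the last 50 lines written in the previous ten minutes.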

Metrics

The metrics architecture and metric names are documented separately. The following services provide access to the metrics components:

Component               Service            Default Credentials (username/password)
Metrics Store           prometheus         None
Metrics Visualization   grafana            admin/Surp1singlyG00d
Real-Time Metrics       modelops-metrics   None

Accessing Metrics Store - Prometheus

//
//  Get prometheus service port
//
kubectl get services --namespace modelops prometheus
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
prometheus   ClusterIP   10.0.109.248   <none>        9090/TCP   20d

//
//  Set up port-forward to local port 9090
//
kubectl port-forward service/prometheus --namespace modelops 9090:9090

//
//  Open browser window (macOS only)
//
open http://localhost:9090

Accessing Metrics Visualization - Grafana

//
//  Get grafana service port
//
kubectl get services --namespace modelops grafana
NAME      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
grafana   ClusterIP   10.0.9.59    <none>        80/TCP    20d

//
//  Set up port-forward to local port 9080
//
kubectl port-forward service/grafana --namespace modelops 9080:80

//
//  Open browser window (macOS only)
//
open http://localhost:9080

Accessing Real-Time Metrics - LiveView Web

//
//  Get real-time metrics service port
//
kubectl get services --namespace modelops modelops-metrics
NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
modelops-metrics   ClusterIP   10.0.42.12   <none>        80/TCP    20d

//
//  Set up port-forward to local port 9070
//
kubectl port-forward service/modelops-metrics --namespace modelops 9070:80

//
//  Open browser window (macOS only)
//
open http://localhost:9070
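
Each service above follows the same two steps: look up the service port, then forward a local port to it. The sketch below combines both steps. It assumes that the first port listed on the service is the one to forward, which matches the single-port services shown in this section. The function name forward_service is illustrative.

```shell
# Forward a local port to a service in the modelops namespace,
# looking up the service port instead of hard-coding it.
#   $1 - service name, for example grafana
#   $2 - local port, for example 9080
forward_service() {
    # Assumption: the first port defined on the service is the one to use.
    svc_port=$(kubectl get service "$1" --namespace modelops \
        --output jsonpath='{.spec.ports[0].port}') || return 1
    kubectl port-forward "service/$1" --namespace modelops "$2:$svc_port"
}
```

For example, forward_service grafana 9080 is equivalent to the Grafana port-forward command above. The command blocks until it is interrupted.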

Scoring Pipelines and Data Channels

Scoring pipelines and data channels are started using Tekton pipelines and tasks, which are created using Helm charts.

When a scoring pipeline or data channel is deployed, a PipelineRun instance and its associated TaskRun instances are created in the modelops namespace. The PipelineRun and TaskRun instances can be used to monitor the status of running scoring pipelines and data channels.

The commands below require the tkn command-line tool to be installed on the local workstation. See the general installation instructions for details on installing Tekton.

Running Pipelines and Data Channels

The running scoring pipelines and data channels are displayed with:

//
//  Display all PipelineRun instances
//
tkn pipelinerun list --namespace modelops 

The PipelineRun naming conventions are:

  • file-datasink-* - file data sinks
  • file-datasource-* - file data sources
  • installation-* - ModelOps installation
  • kafka-datasink-* - Kafka data sinks
  • kafka-datasource-* - Kafka data sources
  • scoringpipeline-* - scoring pipelines
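
The naming conventions above can be applied in a script to classify PipelineRun names. The following is a minimal sketch. The function name pipelinerun_kind is illustrative.

```shell
# Map a PipelineRun name to the kind of component it deployed,
# based on the name prefixes listed above.
pipelinerun_kind() {
    case "$1" in
        file-datasink-*)     echo "file data sink" ;;
        file-datasource-*)   echo "file data source" ;;
        installation-*)      echo "ModelOps installation" ;;
        kafka-datasink-*)    echo "Kafka data sink" ;;
        kafka-datasource-*)  echo "Kafka data source" ;;
        scoringpipeline-*)   echo "scoring pipeline" ;;
        *)                   echo "unknown" ;;
    esac
}
```

For example, pipelinerun_kind kafka-datasource-bt2b9 prints Kafka data source.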

Logging

Logging output can be displayed for both PipelineRun and TaskRun instances using these commands:

//
//  Display PipelineRun logs - replace <pipelinerun-name> with actual PipelineRun name
//
tkn pipelinerun logs --namespace modelops <pipelinerun-name>

//
//  Display TaskRun logs - replace <taskrun-name> with actual TaskRun name
//
tkn taskrun logs --namespace modelops <taskrun-name>

Identifying Pods

Each scoring pipeline and data channel runs in a Pod. The Pod that was started by a PipelineRun to deploy the scoring pipeline or data channel can be identified using the jobIdentifier and namespace parameters associated with the wait TaskRun.

The TaskRun instances associated with a PipelineRun are displayed by the describe command. In the command output, the wait TaskRun has a TASK NAME of wait.

//
//  Display TaskRuns associated with a PipelineRun
//  Replace <pipelinerun-name> with actual PipelineRun name
//
tkn pipelinerun describe --namespace modelops <pipelinerun-name>

For example:

tkn pipelinerun describe --namespace modelops kafka-datasource-bt2b9
Name:              kafka-datasource-bt2b9
Namespace:         modelops
Pipeline Ref:      install-datachannel
Service Account:   default
Labels:
 app.kubernetes.io/managed-by=Helm
 app.kubernetes.io/name=install-datachannel
 app.kubernetes.io/part-of=modelops
 tekton.dev/pipeline=install-datachannel

🌡️  Status

STARTED     DURATION   STATUS
1 day ago   ---        Running

📦 Resources

 No resources

⚓ Params

 NAME                   VALUE
 ∙ logLevel             INFO
 ∙ sourceUrls           [raw/commit/a75e7941e584889af60260883f4bd9c5c926fd76/sbhosle-race-cardata/car-kafka-source.datachannel.values.yaml]
 ∙ namespace            datachannels
 ∙ externalNamespaces   [development]
 ∙ deployParameters     []
 ∙ durationMinutes      0m
 ∙ trace                false

📝 Results

 No results

📂 Workspaces

 No workspaces

🗂  Taskruns

 NAME                                                 TASK NAME             STARTED     DURATION     STATUS
 ∙ kafka-datasource-bt2b9-wait-jgj2f                  wait                  1 day ago   ---          Running
 ∙ kafka-datasource-bt2b9-install-datachannel-87mrv   install-datachannel   1 day ago   11 seconds   Succeeded

⏭️  Skipped Tasks

 No Skipped Tasks

The wait TaskRun is named kafka-datasource-bt2b9-wait-jgj2f.
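
The wait TaskRun name can also be extracted from the describe output in a script. The sketch below parses the Taskruns table in the text layout shown above. That layout is an assumption and may differ between tkn versions. The function name wait_taskrun_name is illustrative.

```shell
# Read "tkn pipelinerun describe" output on stdin and print the name of
# the TaskRun whose TASK NAME column is "wait".
# Assumption: the output uses the table layout shown in the example above.
wait_taskrun_name() {
    awk '
        /Taskruns/ { in_taskruns = 1; next }                 # Taskruns table starts here
        in_taskruns && $3 == "wait" { print $2; exit }       # $2 = NAME, $3 = TASK NAME
    '
}
```

It is used by piping the describe output into the function: tkn pipelinerun describe --namespace modelops <pipelinerun-name> | wait_taskrun_name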

This command is then used to display the jobIdentifier and namespace:

//
//  Describe TaskRun details - replace <taskrun-name> with actual TaskRun name
//
tkn taskrun describe --namespace modelops <taskrun-name>

For example:

tkn taskrun describe --namespace modelops kafka-datasource-bt2b9-wait-jgj2f 
Name:              kafka-datasource-bt2b9-wait-jgj2f
Namespace:         modelops
Task Ref:          wait-datachannel
Service Account:   default
Labels:
 app.kubernetes.io/managed-by=Helm
 app.kubernetes.io/name=wait-datachannel
 app.kubernetes.io/part-of=modelops
 tekton.dev/pipeline=install-datachannel
 tekton.dev/pipelineRun=kafka-datasource-bt2b9
 tekton.dev/pipelineTask=wait
 tekton.dev/task=wait-datachannel

🌡️  Status

STARTED     DURATION    STATUS
1 day ago   ---         Running

📨 Input Resources

 No input resources

📡 Output Resources

 No output resources

⚓ Params

 NAME                      VALUE
 ∙ jobIdentifier           kafka-datasource-bt2b9
 ∙ namespace               datachannels
 ∙ durationMinutes         0m
 ∙ install-error-message   

📝 Results

 No results

📂 Workspaces

 No workspaces

🦶 Steps

 NAME        STATUS
 ∙ install   Running

🚗 Sidecars

No sidecars

This output indicates that this Kafka Data Source is deployed in the datachannels namespace and has a jobIdentifier of kafka-datasource-bt2b9.
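
The two parameter values can be read from the describe output in a script. The sketch below parses the Params table in the text layout shown above, which is an assumption that may not hold for every tkn version. The function name taskrun_param is illustrative.

```shell
# Read "tkn taskrun describe" output on stdin and print the value of one
# parameter from the Params table.
#   $1 - parameter name, for example jobIdentifier or namespace
# Assumption: each parameter row has the form "<bullet> <name> <value>".
taskrun_param() {
    awk -v name="$1" '$2 == name { print $3; exit }'
}
```

For example, tkn taskrun describe --namespace modelops <taskrun-name> | taskrun_param jobIdentifier prints the jobIdentifier value.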

Finally, the associated Pod can be found using this command:

//
//  Get all pods in namespace and filter by job-identifier prefix
//  Replace <namespace> and <job-identifier> with values found above
//
kubectl get pods --namespace <namespace> | grep <job-identifier>

The names of Pods started by TaskRuns are prefixed with the jobIdentifier.

For example:

kubectl get pods --namespace datachannels | grep kafka-datasource-bt2b9
kafka-datasource-bt2b9-b79bd6cc8-pnwpx    1/1     Running            0          31h
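
The grep filter matches the jobIdentifier anywhere in an output line. The sketch below is a stricter variant that prints only Pod names that begin with the jobIdentifier. It uses the standard kubectl option --no-headers. The function name pods_for_job is illustrative.

```shell
# Print the names of Pods in a namespace whose names start with a job identifier.
#   $1 - namespace, for example datachannels
#   $2 - jobIdentifier, for example kafka-datasource-bt2b9
pods_for_job() {
    kubectl get pods --namespace "$1" --no-headers |
        awk -v prefix="$2" 'index($1, prefix) == 1 { print $1 }'   # keep prefix matches only
}
```

For example, pods_for_job datachannels kafka-datasource-bt2b9 prints the name of the Pod found in the example above.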