Monitoring

This section describes how to use the installed ModelOps components to support monitoring, including how to access the services associated with those components.

Logging

All ModelOps components, including scoring pipelines, generate log records that are stored in the logging store (Elasticsearch). These services are available for accessing the logging components:

Component               Service                 Default Credentials (username/password)
Logging Store           elasticsearch-es-http   elastic/elastic
Logging Visualization   kibana-kb-http          elastic/elastic

Accessing Logging Store - ElasticSearch

  //
  // Get elasticsearch-es-http service port
  //
  kubectl get services --namespace modelops elasticsearch-es-http
  NAME                    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
  elasticsearch-es-http   ClusterIP   10.0.86.229   <none>        9200/TCP   20d

  //
  // Set up port-forward to local port 9200
  //
  kubectl port-forward service/elasticsearch-es-http --namespace modelops 9200:9200

  //
  // Open browser window (macOS only)
  //
  open http://localhost:9200
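
With the port-forward in place, the logging store can also be queried directly over its REST API. A minimal sketch using curl, assuming the default elastic/elastic credentials shown above:

  //
  // List the indices in the logging store
  //
  curl -u elastic:elastic "http://localhost:9200/_cat/indices?v"

  //
  // Return a few recent log records across all indices
  //
  curl -u elastic:elastic "http://localhost:9200/_search?size=5"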

Accessing Logging Visualization - Kibana

  //
  // Get kibana-kb-http service port
  //
  kubectl get services --namespace modelops kibana-kb-http
  NAME             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
  kibana-kb-http   ClusterIP   10.0.11.207   <none>        80/TCP    20d

  //
  // Set up port-forward to local port 9300
  //
  kubectl port-forward service/kibana-kb-http --namespace modelops 9300:80

  //
  // Open browser window (macOS only)
  //
  open http://localhost:9300
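
Kibana can also be checked from the command line before opening the browser, using its built-in status endpoint and the default credentials shown above:

  //
  // Check Kibana status through the port-forward
  //
  curl -u elastic:elastic http://localhost:9300/api/status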

Pod Logging

In addition to accessing log records in the logging store, logs can also be accessed directly from a Pod using this command:

  //
  // Follow log output - replace <pod-name> with actual Pod name
  //
  kubectl logs <pod-name> --namespace modelops --follow

See Service Pods for instructions on getting Pod names.
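
For example, to list candidate Pods and then follow the logs of one of them (the Pod name below is hypothetical; substitute a name returned by the first command):

  //
  // List Pods in the modelops namespace
  //
  kubectl get pods --namespace modelops

  //
  // Follow the log output of one Pod, starting with the last 100 lines
  //
  kubectl logs modelops-server-0 --namespace modelops --follow --tail=100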

Metrics

The metrics architecture and metric names are described here. These services are available for accessing the metrics components:

Component               Service            Default Credentials (username/password)
Metrics Store           prometheus         None
Metrics Visualization   grafana            admin/Surp1singlyG00d
Real-Time Metrics       modelops-metrics   None

Accessing Metrics Store - Prometheus

  //
  // Get prometheus service port
  //
  kubectl get services --namespace modelops prometheus
  NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
  prometheus   ClusterIP   10.0.109.248   <none>        9090/TCP   20d

  //
  // Set up port-forward to local port 9090
  //
  kubectl port-forward service/prometheus --namespace modelops 9090:9090

  //
  // Open browser window (macOS only)
  //
  open http://localhost:9090
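
The Prometheus HTTP API is available through the same port-forward. For example, the built-in up metric reports the health of each scrape target:

  //
  // Query the Prometheus HTTP API - "up" is 1 for each healthy scrape target
  //
  curl "http://localhost:9090/api/v1/query?query=up"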

Accessing Metrics Visualization - Grafana

  //
  // Get grafana service port
  //
  kubectl get services --namespace modelops grafana
  NAME      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
  grafana   ClusterIP   10.0.9.59    <none>        80/TCP    20d

  //
  // Set up port-forward to local port 9080
  //
  kubectl port-forward service/grafana --namespace modelops 9080:80

  //
  // Open browser window (macOS only)
  //
  open http://localhost:9080
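
Grafana exposes an unauthenticated health endpoint that can be used to confirm the port-forward works before logging in with the default credentials listed above:

  //
  // Check Grafana health through the port-forward
  //
  curl http://localhost:9080/api/health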

Accessing Real-Time Metrics - LiveView Web

  //
  // Get real-time metrics service port
  //
  kubectl get services --namespace modelops modelops-metrics
  NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
  modelops-metrics   ClusterIP   10.0.42.12   <none>        80/TCP    20d

  //
  // Set up port-forward to local port 9070
  //
  kubectl port-forward service/modelops-metrics --namespace modelops 9070:80

  //
  // Open browser window (macOS only)
  //
  open http://localhost:9070
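
A quick reachability check through the port-forward; this is a plain HTTP probe, not a LiveView-specific API:

  //
  // Confirm the real-time metrics service responds through the port-forward
  //
  curl --head http://localhost:9070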

Scoring Pipelines and Data Channels

Scoring pipelines and data channels are started using Tekton pipelines and tasks, which are created using Helm charts.

When a scoring pipeline or data channel is deployed, a PipelineRun instance is created, along with associated TaskRun instances, in the modelops namespace. The PipelineRun and TaskRun instances can be used to monitor the status of running scoring pipelines and data channels.

tkn must be installed on the local workstation. See the general installation instructions for details on installing Tekton.
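
A quick way to confirm that tkn is installed and can reach the cluster:

  //
  // Verify the tkn installation and the Tekton version on the cluster
  //
  tkn version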

Running Pipelines and Data Channels

The running scoring pipelines and data channels are displayed with:

  //
  // Display all PipelineRun instances
  //
  tkn pipelinerun list --namespace modelops

The PipelineRun naming conventions are:

  • file-datasink-* - file data sinks
  • file-datasource-* - file data sources
  • installation-* - ModelOps installation
  • kafka-datasink-* - Kafka data sinks
  • kafka-datasource-* - Kafka data sources
  • scoringpipeline-* - scoring pipelines
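
These prefixes can be used to filter the list. For example, to display only the scoring pipelines:

  //
  // Display only scoring pipeline PipelineRun instances
  //
  tkn pipelinerun list --namespace modelops | grep scoringpipeline-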

Logging

Logging output can be displayed for both PipelineRun and TaskRun instances using these commands:

  //
  // Display PipelineRun logs - replace <pipelinerun-name> with actual PipelineRun name
  //
  tkn pipelinerun logs --namespace modelops <pipelinerun-name>

  //
  // Display TaskRun logs - replace <taskrun-name> with actual TaskRun name
  //
  tkn taskrun logs --namespace modelops <taskrun-name>
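
Both commands also accept the --follow flag to stream log output as it is produced:

  //
  // Stream PipelineRun logs as they are produced
  //
  tkn pipelinerun logs --follow --namespace modelops <pipelinerun-name>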

Identifying Pods

Scoring pipelines and data channels run in a Pod. The Pod that was started by a PipelineRun to deploy the scoring pipeline or data channel can be determined using the jobIdentifier and namespace associated with the wait TaskRun.

The TaskRun instances associated with a PipelineRun are displayed by the describe command. The wait TaskRun has a TASK NAME of wait in the command output.

  //
  // Display TaskRuns associated with a PipelineRun
  // Replace <pipelinerun-name> with actual PipelineRun name
  //
  tkn pipelinerun describe --namespace modelops <pipelinerun-name>

For example:

  tkn pipelinerun describe --namespace modelops kafka-datasource-bt2b9
  Name:              kafka-datasource-bt2b9
  Namespace:         modelops
  Pipeline Ref:      install-datachannel
  Service Account:   default
  Labels:
   app.kubernetes.io/managed-by=Helm
   app.kubernetes.io/name=install-datachannel
   app.kubernetes.io/part-of=modelops
   tekton.dev/pipeline=install-datachannel

  🌡️ Status

  STARTED     DURATION   STATUS
  1 day ago   ---        Running

  📦 Resources

  No resources

  Params

  NAME                 VALUE
  logLevel             INFO
  sourceUrls           [raw/commit/a75e7941e584889af60260883f4bd9c5c926fd76/sbhosle-race-cardata/car-kafka-source.datachannel.values.yaml]
  namespace            datachannels
  externalNamespaces   [development]
  deployParameters     []
  durationMinutes      0m
  trace                false

  📝 Results

  No results

  📂 Workspaces

  No workspaces

  🗂 Taskruns

  NAME                                               TASK NAME             STARTED     DURATION     STATUS
  kafka-datasource-bt2b9-wait-jgj2f                  wait                  1 day ago   ---          Running
  kafka-datasource-bt2b9-install-datachannel-87mrv   install-datachannel   1 day ago   11 seconds   Succeeded

  ⏭️ Skipped Tasks

  No Skipped Tasks

The wait TaskRun is named kafka-datasource-bt2b9-wait-jgj2f.
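
The wait TaskRun can also be located non-interactively by selecting on the Tekton labels visible in the describe output; a sketch using kubectl:

  //
  // Find the wait TaskRun for a PipelineRun using its Tekton labels
  //
  kubectl get taskruns --namespace modelops \
      --selector tekton.dev/pipelineRun=kafka-datasource-bt2b9,tekton.dev/pipelineTask=wait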

This command is then used to display the jobIdentifier and namespace:

  //
  // Describe TaskRun details - replace <taskrun-name> with actual TaskRun name
  //
  tkn taskrun describe --namespace modelops <taskrun-name>

For example:

  tkn taskrun describe --namespace modelops kafka-datasource-bt2b9-wait-jgj2f
  Name:              kafka-datasource-bt2b9-wait-jgj2f
  Namespace:         modelops
  Task Ref:          wait-datachannel
  Service Account:   default
  Labels:
   app.kubernetes.io/managed-by=Helm
   app.kubernetes.io/name=wait-datachannel
   app.kubernetes.io/part-of=modelops
   tekton.dev/pipeline=install-datachannel
   tekton.dev/pipelineRun=kafka-datasource-bt2b9
   tekton.dev/pipelineTask=wait
   tekton.dev/task=wait-datachannel

  🌡️ Status

  STARTED     DURATION   STATUS
  1 day ago   ---        Running

  📨 Input Resources

  No input resources

  📡 Output Resources

  No output resources

  Params

  NAME                    VALUE
  jobIdentifier           kafka-datasource-bt2b9
  namespace               datachannels
  durationMinutes         0m
  install-error-message

  📝 Results

  No results

  📂 Workspaces

  No workspaces

  🦶 Steps

  NAME      STATUS
  install   Running

  🚗 Sidecars

  No sidecars

This output indicates that this Kafka Data Source is deployed in the datachannels namespace and has a jobIdentifier of kafka-datasource-bt2b9.
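
The same parameter values can be read non-interactively from the TaskRun resource; a sketch assuming the standard Tekton TaskRun schema:

  //
  // Extract the TaskRun parameters (jobIdentifier, namespace, ...) as JSON
  //
  kubectl get taskrun kafka-datasource-bt2b9-wait-jgj2f --namespace modelops \
      --output jsonpath='{.spec.params}'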

Finally, the associated Pod can be found using this command:

  //
  // Get all pods in namespace and filter by job-identifier prefix
  // Replace <namespace> and <job-identifier> with values found above
  //
  kubectl get pods --namespace <namespace> | grep <job-identifier>

The names of Pods started by TaskRuns are prefixed with the jobIdentifier.

For example:

  kubectl get pods --namespace datachannels | grep kafka-datasource-bt2b9
  kafka-datasource-bt2b9-b79bd6cc8-pnwpx   1/1   Running   0   31h
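
The identified Pod's log output can then be followed using the command shown in Pod Logging above:

  //
  // Follow the log output of the deployed data channel Pod
  //
  kubectl logs kafka-datasource-bt2b9-b79bd6cc8-pnwpx --namespace datachannels --follow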