Monitoring

This section describes how to use the installed ModelOps components to support monitoring, and how to access the services associated with those components.

Logging

All ModelOps components, including scoring pipelines, generate log records that are stored in the logging store (Elasticsearch). These services are available for accessing the logging components:

Component               Service                 Default Credentials (username/password)
Logging Store           elasticsearch-es-http   elastic/elastic
Logging Visualization   kibana-kb-http          elastic/elastic

Accessing Logging Store - ElasticSearch

  #
  # Get the elasticsearch-es-http service port
  #
  kubectl get services --namespace modelops elasticsearch-es-http
  NAME                    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
  elasticsearch-es-http   ClusterIP   10.0.86.229   <none>        9200/TCP   20d
  #
  # Set up a port-forward to local port 9200
  #
  kubectl port-forward service/elasticsearch-es-http --namespace modelops 9200:9200
  #
  # Open a browser window (macOS only)
  #
  open http://localhost:9200

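The service port can also be extracted from the kubectl get services output programmatically. The sketch below is a minimal example that parses the sample output shown above in place of a live call; the awk field positions assume the default kubectl column layout.

```shell
# Extract the service port from 'kubectl get services' output.
# The here-document stands in for a live call; against a cluster, pipe in:
#   kubectl get services --namespace modelops elasticsearch-es-http
# The awk program skips the header row and takes the numeric part of PORT(S).
PORT=$(awk 'NR > 1 { split($5, p, "/"); print p[1] }' <<'EOF'
NAME                    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
elasticsearch-es-http   ClusterIP   10.0.86.229   <none>        9200/TCP   20d
EOF
)
echo "Service port: $PORT"
```

The same pattern works for the other services in this section; only the service name and expected port change.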
Accessing Logging Visualization - Kibana

  #
  # Get the kibana-kb-http service port
  #
  kubectl get services --namespace modelops kibana-kb-http
  NAME             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
  kibana-kb-http   ClusterIP   10.0.11.207   <none>        80/TCP    20d
  #
  # Set up a port-forward from local port 9300 to service port 80
  #
  kubectl port-forward service/kibana-kb-http --namespace modelops 9300:80
  #
  # Open a browser window (macOS only)
  #
  open http://localhost:9300

Pod Logging

In addition to accessing logging records in the logging store, logging can also be accessed directly from a Pod using this command:

  #
  # Follow log output - replace <pod-name> with the actual Pod name
  #
  kubectl logs <pod-name> --namespace modelops --follow

See Service Pods for instructions on getting Pod names.

Metrics

The metrics architecture and metric names are described separately in this documentation. These services are available for accessing the metrics components:

Component           Service            Default Credentials (username/password)
Metrics Store       prometheus         None
Real-Time Metrics   modelops-metrics   None

Accessing Metrics Store - Prometheus

  #
  # Get the prometheus service port
  #
  kubectl get services --namespace modelops prometheus
  NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
  prometheus   ClusterIP   10.0.109.248   <none>        9090/TCP   20d
  #
  # Set up a port-forward to local port 9090
  #
  kubectl port-forward service/prometheus --namespace modelops 9090:9090
  #
  # Open a browser window (macOS only)
  #
  open http://localhost:9090

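Once the port-forward is active, Prometheus can also be queried through its HTTP API rather than a browser. The sketch below shows the query URL, using the standard up metric as an example, and then parses a captured sample response in place of a live call, since the API wraps every result in a JSON envelope with a top-level status field.

```shell
# Instant query against the Prometheus HTTP API (requires the port-forward
# from the previous step to be running):
#
#   curl 'http://localhost:9090/api/v1/query?query=up'
#
# A captured sample response is parsed here in place of a live call:
RESPONSE='{"status":"success","data":{"resultType":"vector","result":[]}}'
STATUS=$(printf '%s' "$RESPONSE" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
echo "Query status: $STATUS"
```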
Accessing Real-Time Metrics - LiveView Web

  #
  # Get the real-time metrics service port
  #
  kubectl get services --namespace modelops modelops-metrics
  NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
  modelops-metrics   ClusterIP   10.0.42.12   <none>        80/TCP    20d
  #
  # Set up a port-forward from local port 9070 to service port 80
  #
  kubectl port-forward service/modelops-metrics --namespace modelops 9070:80
  #
  # Open a browser window (macOS only)
  #
  open http://localhost:9070

Scoring Pipelines and Data Channels

Scoring pipelines and data channels are started using Tekton pipelines and tasks, which are created using Helm charts.

When a scoring pipeline or data channel is deployed, a PipelineRun instance is created in the modelops namespace, along with associated TaskRun instances. The PipelineRun and TaskRun instances can be used to monitor the status of running scoring pipelines and data channels.

tkn must be installed on the local workstation. See the general installation instructions for details on installing Tekton.

Running Pipelines and Data Channels

The running scoring pipelines and data channels are displayed with:

  #
  # Display all PipelineRun instances
  #
  tkn pipelinerun list --namespace modelops

The PipelineRun naming conventions are:

  • file-datasink-* - file data sinks
  • file-datasource-* - file data sources
  • installation-* - ModelOps installation
  • kafka-datasink-* - Kafka data sinks
  • kafka-datasource-* - Kafka data sources
  • scoringpipeline-* - scoring pipelines

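These prefixes can be used to filter the tkn pipelinerun list output for a particular workload type. A minimal sketch, run here against sample output rather than a live cluster (the scoringpipeline name shown is illustrative):

```shell
# Filter PipelineRun names by prefix - here, Kafka data sources only.
# The here-document stands in for a live call; against a cluster, pipe in:
#   tkn pipelinerun list --namespace modelops
MATCHES=$(grep '^kafka-datasource-' <<'EOF'
NAME                     STARTED      DURATION   STATUS
kafka-datasource-bt2b9   1 day ago    ---        Running
scoringpipeline-x7k2p    2 days ago   ---        Running
EOF
)
echo "$MATCHES"
```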
Logging

Logging output can be displayed for both PipelineRun and TaskRun instances using these commands:

  #
  # Display PipelineRun logs - replace <pipelinerun-name> with the actual PipelineRun name
  #
  tkn pipelinerun logs --namespace modelops <pipelinerun-name>
  #
  # Display TaskRun logs - replace <taskrun-name> with the actual TaskRun name
  #
  tkn taskrun logs --namespace modelops <taskrun-name>

Identifying Pods

Scoring pipelines and data channels run in a Pod. The Pod started by a PipelineRun to deploy a scoring pipeline or data channel can be determined using the jobIdentifier and namespace associated with the wait TaskRun.

The TaskRun instances associated with a PipelineRun are displayed using the describe command. The wait TaskRun has a TASK NAME of wait in the command output.

  #
  # Display TaskRuns associated with a PipelineRun
  # Replace <pipelinerun-name> with the actual PipelineRun name
  #
  tkn pipelinerun describe --namespace modelops <pipelinerun-name>

For example:

  tkn pipelinerun describe --namespace modelops kafka-datasource-bt2b9
  Name:              kafka-datasource-bt2b9
  Namespace:         modelops
  Pipeline Ref:      install-datachannel
  Service Account:   default
  Labels:
   app.kubernetes.io/managed-by=Helm
   app.kubernetes.io/name=install-datachannel
   app.kubernetes.io/part-of=modelops
   tekton.dev/pipeline=install-datachannel

  🌡️  Status

  STARTED     DURATION   STATUS
  1 day ago   ---        Running

  📦 Resources

  No resources

  Params

  NAME                 VALUE
  logLevel             INFO
  sourceUrls           [raw/commit/a75e7941e584889af60260883f4bd9c5c926fd76/sbhosle-race-cardata/car-kafka-source.datachannel.values.yaml]
  namespace            datachannels
  externalNamespaces   [development]
  deployParameters     []
  durationMinutes      0m
  trace                false

  📝 Results

  No results

  📂 Workspaces

  No workspaces

  🗂  Taskruns

  NAME                                               TASK NAME             STARTED     DURATION     STATUS
  kafka-datasource-bt2b9-wait-jgj2f                  wait                  1 day ago   ---          Running
  kafka-datasource-bt2b9-install-datachannel-87mrv   install-datachannel   1 day ago   11 seconds   Succeeded

  ⏭️  Skipped Tasks

  No Skipped Tasks

The wait TaskRun is named kafka-datasource-bt2b9-wait-jgj2f.

This command is then used to display the jobIdentifier and namespace:

  #
  # Describe TaskRun details - replace <taskrun-name> with the actual TaskRun name
  #
  tkn taskrun describe --namespace modelops <taskrun-name>

For example:

  tkn taskrun describe --namespace modelops kafka-datasource-bt2b9-wait-jgj2f
  Name:              kafka-datasource-bt2b9-wait-jgj2f
  Namespace:         modelops
  Task Ref:          wait-datachannel
  Service Account:   default
  Labels:
   app.kubernetes.io/managed-by=Helm
   app.kubernetes.io/name=wait-datachannel
   app.kubernetes.io/part-of=modelops
   tekton.dev/pipeline=install-datachannel
   tekton.dev/pipelineRun=kafka-datasource-bt2b9
   tekton.dev/pipelineTask=wait
   tekton.dev/task=wait-datachannel

  🌡️  Status

  STARTED     DURATION   STATUS
  1 day ago   ---        Running

  📨 Input Resources

  No input resources

  📡 Output Resources

  No output resources

  Params

  NAME                    VALUE
  jobIdentifier           kafka-datasource-bt2b9
  namespace               datachannels
  durationMinutes         0m
  install-error-message

  📝 Results

  No results

  📂 Workspaces

  No workspaces

  🦶 Steps

  NAME      STATUS
  install   Running

  🚗 Sidecars

  No sidecars
This output indicates that this Kafka Data Source is deployed in the datachannels namespace and has a jobIdentifier of kafka-datasource-bt2b9.

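Extracting the jobIdentifier and namespace from the describe output can also be scripted. The sketch below parses the Params section values from the example above; against a cluster, the sample text would be replaced with the actual tkn taskrun describe output.

```shell
# Pull the jobIdentifier and namespace values out of the Params section of
# 'tkn taskrun describe' output. The sample below reuses the values from the
# example above in place of a live call.
PARAMS='NAME            VALUE
jobIdentifier   kafka-datasource-bt2b9
namespace       datachannels'
JOB_IDENTIFIER=$(printf '%s\n' "$PARAMS" | awk '$1 == "jobIdentifier" { print $2 }')
TARGET_NAMESPACE=$(printf '%s\n' "$PARAMS" | awk '$1 == "namespace" { print $2 }')
echo "Pod name prefix: $JOB_IDENTIFIER (namespace: $TARGET_NAMESPACE)"
```

These two values plug directly into the kubectl get pods command shown in the next step.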
Finally, the associated Pod can be found using this command:

  #
  # Get all Pods in the namespace and filter by the job-identifier prefix
  # Replace <namespace> and <job-identifier> with the values found above
  #
  kubectl get pods --namespace <namespace> | grep <job-identifier>

Pods started by TaskRuns have names prefixed with the jobIdentifier.

For example:

  kubectl get pods --namespace datachannels | grep kafka-datasource-bt2b9
  kafka-datasource-bt2b9-b79bd6cc8-pnwpx   1/1   Running   0   31h