Monitoring
This section describes how to use the installed ModelOps components to support monitoring. How to access the services associated with the ModelOps components is described here.
Logging
All ModelOps components, including scoring pipelines, generate log records that are stored in the logging store (ElasticSearch). These services are available for accessing the logging components:
Component | Service | Default Credentials (username/password) |
---|---|---|
Logging Store | elasticsearch-es-http | elastic/elastic |
Logging Visualization | kibana-kb-http | elastic/elastic |
Accessing Logging Store - ElasticSearch
    //
    // Get elasticsearch-es-http service port
    //
    kubectl get services --namespace modelops elasticsearch-es-http
    NAME                    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
    elasticsearch-es-http   ClusterIP   10.0.86.229   <none>        9200/TCP   20d

    //
    // Set up port-forward to local port 9200
    //
    kubectl port-forward service/elasticsearch-es-http --namespace modelops 9200:9200

    //
    // Open browser window (macOS only)
    //
    open http://localhost:9200
Accessing Logging Visualization - Kibana
    //
    // Get kibana-kb-http service port
    //
    kubectl get services --namespace modelops kibana-kb-http
    NAME             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
    kibana-kb-http   ClusterIP   10.0.11.207   <none>        80/TCP    20d

    //
    // Set up port-forward to local port 9300
    //
    kubectl port-forward service/kibana-kb-http --namespace modelops 9300:80

    //
    // Open browser window (macOS only)
    //
    open http://localhost:9300
Pod Logging
In addition to the log records in the logging store, log output can also be read directly from a Pod using this command:
    //
    // Follow log output - replace <pod-name> with actual Pod name
    //
    kubectl logs <pod-name> --namespace modelops --follow
See Service Pods for instructions on getting Pod names.
Metrics
The metrics architecture and metric names are here. These services are available for accessing the metrics components:
Component | Service | Default Credentials (username/password) |
---|---|---|
Metrics Store | prometheus | None |
Metrics Visualization | grafana | admin/Surp1singlyG00d |
Real-Time Metrics | modelops-metrics | None |
Accessing Metrics Store - Prometheus
    //
    // Get prometheus service port
    //
    kubectl get services --namespace modelops prometheus
    NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
    prometheus   ClusterIP   10.0.109.248   <none>        9090/TCP   20d

    //
    // Set up port-forward to local port 9090
    //
    kubectl port-forward service/prometheus --namespace modelops 9090:9090

    //
    // Open browser window (macOS only)
    //
    open http://localhost:9090
Accessing Metrics Visualization - Grafana
    //
    // Get grafana service port
    //
    kubectl get services --namespace modelops grafana
    NAME      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    grafana   ClusterIP   10.0.9.59    <none>        80/TCP    20d

    //
    // Set up port-forward to local port 9080
    //
    kubectl port-forward service/grafana --namespace modelops 9080:80

    //
    // Open browser window (macOS only)
    //
    open http://localhost:9080
Accessing Real-Time Metrics - LiveView Web
    //
    // Get real-time metrics service port
    //
    kubectl get services --namespace modelops modelops-metrics
    NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    modelops-metrics   ClusterIP   10.0.42.12   <none>        80/TCP    20d

    //
    // Set up port-forward to local port 9070
    //
    kubectl port-forward service/modelops-metrics --namespace modelops 9070:80

    //
    // Open browser window (macOS only)
    //
    open http://localhost:9070
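The logging and metrics services above all follow the same port-forward pattern, differing only in service name and port pair. As a convenience, that mapping could be captured in a small shell helper. This is only a sketch: the function names modelops_forward_spec and modelops_forward are hypothetical and not part of ModelOps, and the port pairs are simply the ones used in the examples above.

```shell
# Hypothetical helpers; these function names are not part of ModelOps.

# Print the "<local-port>:<service-port>" pair used in the examples above.
modelops_forward_spec() {
  case "$1" in
    elasticsearch-es-http) echo "9200:9200" ;;
    kibana-kb-http)        echo "9300:80" ;;
    prometheus)            echo "9090:9090" ;;
    grafana)               echo "9080:80" ;;
    modelops-metrics)      echo "9070:80" ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Forward the documented local port to the named service.
# kubectl port-forward blocks until it is interrupted.
modelops_forward() {
  spec=$(modelops_forward_spec "$1") || return 1
  kubectl port-forward "service/$1" --namespace modelops "$spec"
}

# Usage:
#   modelops_forward grafana    # then browse to http://localhost:9080
```

If a local port is already in use, the left-hand side of the pair can be changed without affecting the service.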
Scoring Pipelines and Data Channels
Scoring pipelines and data channels are started using Tekton pipelines and tasks which are created using Helm charts.
When a scoring pipeline or data channel is deployed, a PipelineRun instance is created, along with associated TaskRun instances, in the modelops namespace. The PipelineRun and TaskRun instances can be used to monitor the status of running scoring pipelines and data channels.
The tkn command-line tool must be installed on the local workstation. See the general installation instructions for details on installing Tekton.
Running Pipelines and Data Channels
The running scoring pipelines and data channels are displayed with:
    //
    // Display all PipelineRun instances
    //
    tkn pipelinerun list --namespace modelops
The PipelineRun naming conventions are:
- file-datasink-* - file data sinks
- file-datasource-* - file data sources
- installation-* - ModelOps installation
- kafka-datasink-* - Kafka data sinks
- kafka-datasource-* - Kafka data sources
- scoringpipeline-* - scoring pipelines
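When scanning a long PipelineRun list, the prefixes above can be matched mechanically. The following is a minimal sketch; pipelinerun_kind is a hypothetical helper name, and the mapping is taken only from the naming conventions listed above.

```shell
# Hypothetical helper: map a PipelineRun name to the kind of workload it
# deploys, using the name prefixes listed above.
pipelinerun_kind() {
  case "$1" in
    file-datasink-*)    echo "file data sink" ;;
    file-datasource-*)  echo "file data source" ;;
    installation-*)     echo "ModelOps installation" ;;
    kafka-datasink-*)   echo "Kafka data sink" ;;
    kafka-datasource-*) echo "Kafka data source" ;;
    scoringpipeline-*)  echo "scoring pipeline" ;;
    *)                  echo "unknown" ;;
  esac
}

# Usage:
#   pipelinerun_kind kafka-datasource-bt2b9    # prints "Kafka data source"
```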
Logging
Logging output can be displayed for both PipelineRun and TaskRun instances using these commands:
    //
    // Display PipelineRun logs - replace <pipelinerun-name> with actual PipelineRun name
    //
    tkn pipelinerun logs --namespace modelops <pipelinerun-name>

    //
    // Display TaskRun logs - replace <taskrun-name> with actual TaskRun name
    //
    tkn taskrun logs --namespace modelops <taskrun-name>
Identifying Pods
Scoring pipelines and data channels run in a Pod. The Pod that was started by a PipelineRun to deploy the scoring pipeline or data channel can be determined using the jobIdentifier and namespace associated with the wait TaskRun.
The TaskRun instances associated with a PipelineRun are displayed by the describe command. The wait TaskRun has a TASK NAME of wait in this command output.
    //
    // Display TaskRuns associated with a PipelineRun
    // Replace <pipelinerun-name> with actual PipelineRun name
    //
    tkn pipelinerun describe --namespace modelops <pipelinerun-name>
For example:
    tkn pipelinerun describe --namespace modelops kafka-datasource-bt2b9
    Name:              kafka-datasource-bt2b9
    Namespace:         modelops
    Pipeline Ref:      install-datachannel
    Service Account:   default
    Labels:
     app.kubernetes.io/managed-by=Helm
     app.kubernetes.io/name=install-datachannel
     app.kubernetes.io/part-of=modelops
     tekton.dev/pipeline=install-datachannel

    🌡️ Status

    STARTED     DURATION   STATUS
    1 day ago   ---        Running

    📦 Resources

     No resources

    ⚓ Params

     NAME                   VALUE
     ∙ logLevel             INFO
     ∙ sourceUrls           [raw/commit/a75e7941e584889af60260883f4bd9c5c926fd76/sbhosle-race-cardata/car-kafka-source.datachannel.values.yaml]
     ∙ namespace            datachannels
     ∙ externalNamespaces   [development]
     ∙ deployParameters     []
     ∙ durationMinutes      0m
     ∙ trace                false

    📝 Results

     No results

    📂 Workspaces

     No workspaces

    🗂 Taskruns

     NAME                                                TASK NAME             STARTED     DURATION     STATUS
     ∙ kafka-datasource-bt2b9-wait-jgj2f                 wait                  1 day ago   ---          Running
     ∙ kafka-datasource-bt2b9-install-datachannel-87mrv  install-datachannel   1 day ago   11 seconds   Succeeded

    ⏭️ Skipped Tasks

     No Skipped Tasks
The wait TaskRun is named kafka-datasource-bt2b9-wait-jgj2f.
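Picking the wait TaskRun out of the describe output by eye can be replaced with a small filter. The sketch below assumes the output layout shown in the example above, where the TaskRun name is the field immediately before the task name in the Taskruns table; other tkn versions may format this differently. The function name wait_taskrun_name is hypothetical.

```shell
# Hypothetical helper: read `tkn pipelinerun describe` output on stdin and
# print the name of the TaskRun whose TASK NAME column is "wait".
# Assumes the layout shown in the example above.
wait_taskrun_name() {
  awk '
    /Taskruns/ { in_taskruns = 1; next }   # table rows follow this heading
    in_taskruns {
      # The TaskRun name is the field just before the task name.
      for (i = 2; i <= NF; i++)
        if ($i == "wait") { print $(i - 1); exit }
    }
  '
}

# Usage:
#   tkn pipelinerun describe --namespace modelops <pipelinerun-name> | wait_taskrun_name
```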
This command is then used to display the jobIdentifier and namespace:
    //
    // Describe TaskRun details - replace <taskrun-name> with actual TaskRun name
    //
    tkn taskrun describe --namespace modelops <taskrun-name>
For example:
    tkn taskrun describe --namespace modelops kafka-datasource-bt2b9-wait-jgj2f
    Name:              kafka-datasource-bt2b9-wait-jgj2f
    Namespace:         modelops
    Task Ref:          wait-datachannel
    Service Account:   default
    Labels:
     app.kubernetes.io/managed-by=Helm
     app.kubernetes.io/name=wait-datachannel
     app.kubernetes.io/part-of=modelops
     tekton.dev/pipeline=install-datachannel
     tekton.dev/pipelineRun=kafka-datasource-bt2b9
     tekton.dev/pipelineTask=wait
     tekton.dev/task=wait-datachannel

    🌡️ Status

    STARTED     DURATION   STATUS
    1 day ago   ---        Running

    📨 Input Resources

     No input resources

    📡 Output Resources

     No output resources

    ⚓ Params

     NAME                      VALUE
     ∙ jobIdentifier           kafka-datasource-bt2b9
     ∙ namespace               datachannels
     ∙ durationMinutes         0m
     ∙ install-error-message

    📝 Results

     No results

    📂 Workspaces

     No workspaces

    🦶 Steps

     NAME        STATUS
     ∙ install   Running

    🚗 Sidecars

     No sidecars
This output indicates that this Kafka Data Source is deployed in the datachannels namespace and has a jobIdentifier of kafka-datasource-bt2b9.
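The two parameter values can also be pulled out of the describe output with a filter rather than read by hand. This is a sketch under the assumption that the Params table is laid out as in the example above, with each parameter name followed by its value and the table ending at the Results heading. The function name taskrun_param is hypothetical.

```shell
# Hypothetical helper: read `tkn taskrun describe` output on stdin and print
# the value of one parameter from the Params table.
# Assumes the layout shown in the example above (name followed by value).
taskrun_param() {
  awk -v name="$1" '
    /Params/  { in_params = 1; next }   # Params table starts after this heading
    /Results/ { in_params = 0 }         # the next section ends the table
    in_params {
      for (i = 1; i < NF; i++)
        if ($i == name) { print $(i + 1); exit }
    }
  '
}

# Usage:
#   tkn taskrun describe --namespace modelops <taskrun-name> | taskrun_param jobIdentifier
#   tkn taskrun describe --namespace modelops <taskrun-name> | taskrun_param namespace
```

Restricting the match to the Params table keeps the Namespace: header line near the top of the output (the TaskRun's own namespace) from being confused with the namespace parameter.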
Finally, the associated Pod can be found using this command:
    //
    // Get all pods in namespace and filter by job-identifier prefix
    // Replace <namespace> and <job-identifier> with values found above
    //
    kubectl get pods --namespace <namespace> | grep <job-identifier>
The names of Pods started by TaskRuns are prefixed with the jobIdentifier.
For example:
    kubectl get pods --namespace datachannels | grep kafka-datasource-bt2b9
    kafka-datasource-bt2b9-b79bd6cc8-pnwpx   1/1   Running   0   31h
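Because grep matches the jobIdentifier anywhere on the line, a stricter filter that only accepts Pod names beginning with the jobIdentifier may be useful when several similarly named Pods exist. The following is a minimal sketch; pods_for_job is a hypothetical helper name, and it expects the output of kubectl get pods with the --no-headers option on stdin.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of Pods whose name starts with the given jobIdentifier.
# Unlike grep, this only matches at the start of the Pod name column.
pods_for_job() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $1 }'
}

# Usage (follow the logs of the first matching Pod):
#   pod=$(kubectl get pods --namespace datachannels --no-headers \
#           | pods_for_job kafka-datasource-bt2b9 | head -n 1)
#   kubectl logs "$pod" --namespace datachannels --follow
```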