Cloud Installation

Introduction

This section provides general information on deploying ModelOps to a Kubernetes cluster.

Requirements

The following tools are required to complete the installation. Download and install them before installing ModelOps:

Additional requirements depend on the Kubernetes cloud platform being used:

Optional Tools

These tools are optional, but have been found to be useful:

  • Lens
    • macOS: brew install lens

Azure AKS

Azure AKS also requires:

  • Azure CLI Tools must be installed and configured.
    • macOS: brew install azure-cli

OpenShift

OpenShift also requires:

Sizing

ModelOps can be quickly configured for a small, medium or large installation whilst also allowing for further customizations as needed.

Whilst many values are best kept at their defaults, the values listed below have the biggest effect on sizing and so are exposed as install options.

Small - for single scoring

Minimum hardware is a single virtual machine with 6 cores, 30GB memory and 100GB disk space. Larger configurations allow for more concurrent scoring and faster executions.

Linux-based scoring only (a hybrid deployment of Linux and Windows nodes is required for Statistica scoring).

The helm install option, --set size=small, defaults to:

Feature | Value | Override name | Units
Scoring flow memory limit | 1.5Gi | small.scoringflow.memory | Kubernetes quantity
Scoring flow CPU limit | 2 | small.scoringflow.cpu | Kubernetes quantity
Nexus memory limit | 2Gi | small.nexus.memory | Kubernetes quantity
Nexus disk space | 20Gi | small.nexus.disk | Kubernetes quantity
Git server disk space | 5Gi | small.git.disk | Kubernetes quantity
ModelOps server memory limit | 1Gi | small.modelopsserver.memory | Kubernetes quantity
ModelOps server CPU limit | 2 | small.modelopsserver.cpu | Kubernetes quantity
ModelOps server disk space | 5Gi | small.modelopsserver.disk | Kubernetes quantity
ModelOps metrics memory limit | 10Gi | small.modelopsmetrics.memory | Kubernetes quantity
ModelOps metrics sampling interval | 30 | small.modelopsmetrics.interval | Seconds
ModelOps metrics table size | 50 | small.modelopsmetrics.tablesize | Megabytes
ModelOps metrics data age | 5 | small.modelopsmetrics.age | Minutes
Elasticsearch disk space | 10Gi | small.elasticsearch.disk | Kubernetes quantity
Elasticsearch memory limit | 2Gi | small.elasticsearch.memory | Kubernetes quantity
Prometheus sampling interval | 30s | small.prometheus.interval | Seconds
Prometheus disk space | 10Gi | small.prometheus.disk | Kubernetes quantity

For the Kubernetes quantity format, see https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity

Medium - for small teams, the installation default

Minimum hardware is a single virtual machine with 8 CPUs, 32GB memory and 100GB disk space. We recommend configuring the cloud infrastructure for cluster scaling to absorb variable demand. A useful cluster scaling configuration is a minimum of 1 virtual machine and a maximum of 5 virtual machines, although experience shows that 2 are usually sufficient.

Additional Windows virtual machines can be added to the cluster to support Statistica scoring if required.

The helm install option, --set size=medium, defaults to:

Feature | Value | Override name | Units
Scoring flow memory limit | 2Gi | medium.scoringflow.memory | Kubernetes quantity
Scoring flow CPU limit | 4 | medium.scoringflow.cpu | Kubernetes quantity
Nexus memory limit | 2Gi | medium.nexus.memory | Kubernetes quantity
Nexus disk space | 20Gi | medium.nexus.disk | Kubernetes quantity
Git server disk space | 20Gi | medium.git.disk | Kubernetes quantity
ModelOps server memory limit | 2Gi | medium.modelopsserver.memory | Kubernetes quantity
ModelOps server CPU limit | 4 | medium.modelopsserver.cpu | Kubernetes quantity
ModelOps server disk space | 20Gi | medium.modelopsserver.disk | Kubernetes quantity
ModelOps metrics memory limit | 15Gi | medium.modelopsmetrics.memory | Kubernetes quantity
ModelOps metrics sampling interval | 10 | medium.modelopsmetrics.interval | Seconds
ModelOps metrics table size | 50 | medium.modelopsmetrics.tablesize | Megabytes
ModelOps metrics data age | 5 | medium.modelopsmetrics.age | Minutes
Elasticsearch disk space | 50Gi | medium.elasticsearch.disk | Kubernetes quantity
Elasticsearch memory limit | 5Gi | medium.elasticsearch.memory | Kubernetes quantity
Prometheus sampling interval | 10s | medium.prometheus.interval | Seconds
Prometheus disk space | 50Gi | medium.prometheus.disk | Kubernetes quantity

For the Kubernetes quantity format, see https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity

Large - for larger teams

Minimum hardware is a single virtual machine with 16 CPUs, 64GB memory and 500GB disk space. We recommend configuring the cloud infrastructure for cluster scaling to absorb variable demand. A useful cluster scaling configuration is a minimum of 1 virtual machine and a maximum of 10 virtual machines.

Additional Windows virtual machines can be added to the cluster to support Statistica scoring if required.

The helm install option, --set size=large, defaults to:

Feature | Value | Override name | Units
Scoring flow memory limit | 5Gi | large.scoringflow.memory | Kubernetes quantity
Scoring flow CPU limit | 6 | large.scoringflow.cpu | Kubernetes quantity
Nexus memory limit | 2Gi | large.nexus.memory | Kubernetes quantity
Nexus disk space | 20Gi | large.nexus.disk | Kubernetes quantity
Git server disk space | 100Gi | large.git.disk | Kubernetes quantity
ModelOps server memory limit | 4Gi | large.modelopsserver.memory | Kubernetes quantity
ModelOps server CPU limit | 6 | large.modelopsserver.cpu | Kubernetes quantity
ModelOps server disk space | 100Gi | large.modelopsserver.disk | Kubernetes quantity
ModelOps metrics memory limit | 20Gi | large.modelopsmetrics.memory | Kubernetes quantity
ModelOps metrics sampling interval | 10 | large.modelopsmetrics.interval | Seconds
ModelOps metrics table size | 50 | large.modelopsmetrics.tablesize | Megabytes
ModelOps metrics data age | 5 | large.modelopsmetrics.age | Minutes
Elasticsearch disk space | 100Gi | large.elasticsearch.disk | Kubernetes quantity
Elasticsearch memory limit | 10Gi | large.elasticsearch.memory | Kubernetes quantity
Prometheus sampling interval | 10s | large.prometheus.interval | Seconds
Prometheus disk space | 100Gi | large.prometheus.disk | Kubernetes quantity

For the Kubernetes quantity format, see https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/#Quantity

Further sizing customizations

Each of the above values can be overridden as needed using the override name listed in the tables. For example, to increase the git server disk space with the medium configuration, use --set size=medium --set medium.git.disk=50Gi.
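The same override can also be kept in a values file rather than repeated on the command line. The following is a minimal sketch, assuming the dotted override names map to nested YAML keys in the usual Helm way. The file name sizing-overrides.yaml is hypothetical; the chart path and cloud=aks are borrowed from the install examples in this document. The helm command is only printed, so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: write sizing overrides to a values file (hypothetical file name)
# and print the matching helm command for review.
overrides_file="${OVERRIDES_FILE:-sizing-overrides.yaml}"

# Equivalent to: --set size=medium --set medium.git.disk=50Gi
cat > "$overrides_file" <<'EOF'
size: medium
medium:
  git:
    disk: 50Gi
EOF

# Print (do not run) the install command that would use the file.
helm_cmd="helm upgrade --install installer helm-charts/kubernetes-installer-1.0.2.tgz --atomic --set cloud=aks --values $overrides_file"
echo "$helm_cmd"
```

Keeping overrides in a file makes them easier to review and to reuse across installations than a long list of --set flags.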

Passwords and secrets

To avoid clear-text passwords, Kubernetes provides a Secrets facility. Before installation, Kubernetes Secrets must be created to hold the passwords required by ModelOps.

These are:

Description | Secret name | Key name | Comments
Elasticsearch | elasticsearch-es-elastic-user | Elasticsearch user name | See https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-users-and-roles.html - if not set, Elasticsearch generates a password
Git server | git-server | git user name |
Nexus server | nexus-server | admin |
ModelOps server | modelops-server | admin |
Scoring flow admin | scoring-admin | admin |

These secrets may be created via the cloud infrastructure or on the command line using kubectl. For example:

    # elastic search
    #
    # note in this case we use apply to avoid elastic search re-creating the secret
    #
    kubectl create secret generic elasticsearch-es-elastic-user --from-literal=elastic=mysecretpassword --namespace modelops --dry-run=client --output=yaml 2>/dev/null > secret.yaml
    kubectl apply --filename secret.yaml

    # git server
    #
    kubectl create secret generic git-server --from-literal=modelops=mysecretpassword --namespace modelops

    # nexus server
    #
    kubectl create secret generic nexus-server --from-literal=admin=mysecretpassword --namespace modelops

    # modelops server
    #
    kubectl create secret generic modelops-server --from-literal=admin=mysecretpassword --namespace modelops

    # scoring admin
    #
    kubectl create secret generic scoring-admin --from-literal=admin=mysecretpassword --namespace modelops

It is recommended to install an encryption provider for maximum security - see https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/.
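Before starting the installation, it can help to confirm that the secrets exist. The following is a minimal sketch of such a pre-flight check, not part of the product. The function name missing_secrets and the KUBECTL variable are hypothetical. The Elasticsearch secret is left out of the list because, per the table above, Elasticsearch generates a password if it is not set.

```shell
#!/bin/sh
# Sketch: report which of the secrets named in this section are missing.
# REQUIRED_SECRETS, KUBECTL and missing_secrets are illustrative names.
REQUIRED_SECRETS="git-server nexus-server modelops-server scoring-admin"
KUBECTL="${KUBECTL:-kubectl}"

# $1 = namespace; prints one missing secret name per line.
missing_secrets() {
  for name in $REQUIRED_SECRETS; do
    "$KUBECTL" get secret "$name" --namespace "$1" >/dev/null 2>&1 || echo "$name"
  done
}

missing="$(missing_secrets modelops)"
if [ -n "$missing" ]; then
  echo "Missing secrets in namespace modelops:"
  echo "$missing"
else
  echo "All required secrets exist in namespace modelops"
fi
```

A secret is also reported as missing when kubectl cannot reach the cluster, so an unexpected full list usually points to a connection or context problem.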

Quick run through

The Helm CLI tool is used to install the ModelOps components to Kubernetes:

    $ helm upgrade --install installer helm-charts/kubernetes-installer-1.0.2.tgz --atomic --set cloud=aks

This command first installs and starts the bootstrap pipeline, which installs the required Kubernetes operators. This takes a few seconds, after which the helm command returns with a summary of the installation.

For example:

    Release "installer" does not exist. Installing it now.
    NAME: installer
    LAST DEPLOYED: Mon Jan 24 14:04:38 2022
    NAMESPACE: modelops
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    Thank you for installing ep-kubernetes-installer configured for docker-for-desktop in kubernetes v1.22.5

    The Operator Lifecycle Manager has been installed

    The bootstrap pipeline has started which includes :

    Adding kubernetes permissions
    Installing a nexus server
    Installing ElasticSearch and Kibana
    Installing Prometheus
    Populating the nexus repository with artifacts
    Creating a product install pipeline from helm charts


    Starting the nexus server at :

    Internal web console URL - http://artifact-repository:80/
    Maven repository - http://artifact-repository:80/repository/maven-public/
    Helm repository - http://artifact-repository:80/repository/helm/
    PyPi proxy - http://artifact-repository:80/repository/pypi-group
    Container registry - 192.168.175.10:8082

    Starting prometheus server at :

    Internal URL - http://prometheus.modelops:9090

    Starting elasticsearch server at :

    Internal URL - http://elasticsearch-es-http:9200

    Userid is elastic, password set in kubernetes secret

    Starting kibana server at :

    Internal URL - http://kibana-kb-http

    Userid is elastic, password set in kubernetes secret


    The docker daemon should be configured to allow http pull requests :

    {
      "insecure-registries": [
        "192.168.175.10:8082"
      ]
    }

    To track the progress of the bootstrap pipeline run :

    tkn pipelinerun logs bootstrap --follow --namespace modelops

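The output depends on the cloud platform and any additional options selected. These details can be displayed again with the helm status command for the release, for example helm status installer --namespace modelops for the release and namespace shown above.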

Copy the zip of Maven artifacts to the mavenrepo-0 pod using the kubectl cp command:

    $ kubectl cp modelops-repo-1.2.0-mavenrepo.zip mavenrepo-0:/tmp/ --namespace modelops

At this point the installation has been started and, as mentioned above, the status of the installation can be monitored with tkn pipelinerun logs bootstrap --follow. For example:

    $ tkn pipelinerun logs bootstrap --follow --namespace modelops

    [nexus : nexus] Installing nexus operator
    [nexus : nexus] namespace/nexus-operator-system created
    [nexus : nexus] customresourcedefinition.apiextensions.k8s.io/nexus.apps.m88i.io created
    [nexus : nexus] role.rbac.authorization.k8s.io/nexus-operator-leader-election-role created
    [nexus : nexus] clusterrole.rbac.authorization.k8s.io/nexus-operator-manager-role created
    [nexus : nexus] clusterrole.rbac.authorization.k8s.io/nexus-operator-metrics-reader created
    [nexus : nexus] clusterrole.rbac.authorization.k8s.io/nexus-operator-proxy-role created
    [nexus : nexus] rolebinding.rbac.authorization.k8s.io/nexus-operator-leader-election-rolebinding created
    [nexus : nexus] clusterrolebinding.rbac.authorization.k8s.io/nexus-operator-manager-rolebinding created
    [nexus : nexus] clusterrolebinding.rbac.authorization.k8s.io/nexus-operator-proxy-rolebinding created
    [nexus : nexus] service/nexus-operator-controller-manager-metrics-service created
    ....
    [install-pipeline-run : run] 14:16:27.765 [main] INFO com.tibco.streaming.installpipeline.Kubernetes - To track the progress of the modelops-server pipeline run :
    [install-pipeline-run : run] 14:16:27.766 [main] INFO com.tibco.streaming.installpipeline.Kubernetes - tkn pipelinerun logs modelops-server --follow --namespace modelops

The installation process can run tasks in parallel - hence the output is prefixed with the task and lines are coloured.

Once the bootstrap pipeline has completed, the application pipeline can be monitored in a similar way:

    $ tkn pipelinerun logs modelops-server --follow --namespace modelops
    ....
    [scheduling-server-scale : scale] Resuming rollout of scheduling-server
    [scheduling-server-scale : scale] deployment.apps/scheduling-server resumed

    [data-channel-registry-prepare : prepare] Preparing directory for data-channel-registry
    ....

The installation is complete when the tkn pipelinerun logs modelops-server --follow --namespace modelops command completes. The tkn taskrun list command shows the task status:

    $ tkn taskrun list --namespace modelops
    NAME STARTED DURATION STATUS
    modelops-server-modelops-server-scale 1 minute ago 20 seconds Succeeded
    modelops-server-modelops-metrics-scale 3 minutes ago 11 seconds Succeeded
    modelops-server-kafka-datasource-image 4 minutes ago 2 minutes Succeeded
    modelops-server-kafka-datasink-image 4 minutes ago 2 minutes Succeeded
    modelops-server-modelops-server-image 4 minutes ago 3 minutes Succeeded
    modelops-server-scoring-flow-image 5 minutes ago 5 minutes Succeeded
    modelops-server-test-datasource-image 5 minutes ago 1 minute Succeeded
    modelops-server-test-datasink-image 5 minutes ago 2 minutes Succeeded
    modelops-server-modelops-metrics-image 6 minutes ago 2 minutes Succeeded
    modelops-server-modelops-metrics-maven 7 minutes ago 1 minute Succeeded
    modelops-server-kafka-datasource-maven 8 minutes ago 3 minutes Succeeded
    modelops-server-test-datasource-maven 8 minutes ago 2 minutes Succeeded
    modelops-server-scoring-flow-maven 8 minutes ago 2 minutes Succeeded
    modelops-server-test-datasink-maven 8 minutes ago 2 minutes Succeeded
    modelops-server-kafka-datasink-maven 8 minutes ago 3 minutes Succeeded
    modelops-server-modelops-server-maven 8 minutes ago 3 minutes Succeeded
    modelops-server-scoring-flow-prepare 8 minutes ago 43 seconds Succeeded
    modelops-server-kafka-datasource-prepare 8 minutes ago 51 seconds Succeeded
    modelops-server-test-datasource-prepare 8 minutes ago 44 seconds Succeeded
    modelops-server-kafka-datasink-prepare 9 minutes ago 42 seconds Succeeded
    modelops-server-test-datasink-prepare 9 minutes ago 42 seconds Succeeded
    modelops-server-modelops-metrics-prepare 9 minutes ago 1 minute Succeeded
    modelops-server-modelops-server-prepare 9 minutes ago 42 seconds Succeeded
    modelops-server-scheduling-server-scale 10 minutes ago 39 seconds Succeeded
    modelops-server-data-channel-registry-scale 10 minutes ago 19 seconds Succeeded
    modelops-server-sbrt-base-image 20 minutes ago 11 minutes Succeeded
    modelops-server-statistica-image 21 minutes ago 3 minutes Succeeded
    modelops-server-tensorflow-image 21 minutes ago 14 minutes Succeeded
    modelops-server-rest-datasink-image 21 minutes ago 4 minutes Succeeded
    modelops-server-spark-image 21 minutes ago 14 minutes Succeeded
    modelops-server-rest-datasource-image 21 minutes ago 4 minutes Succeeded
    modelops-server-python-image 21 minutes ago 15 minutes Succeeded
    modelops-server-file-datasource-image 21 minutes ago 5 minutes Succeeded
    modelops-server-file-datasink-image 21 minutes ago 3 minutes Succeeded
    modelops-server-rest-request-response-datachannel-image 21 minutes ago 4 minutes Succeeded
    modelops-server-pmml-image 21 minutes ago 10 minutes Succeeded
    modelops-server-scheduling-server-image 21 minutes ago 11 minutes Succeeded
    modelops-server-jdbc-datasource-image 21 minutes ago 2 minutes Succeeded
    modelops-server-data-channel-registry-image 21 minutes ago 10 minutes Succeeded
    modelops-server-git-server-scale 23 minutes ago 15 seconds Succeeded
    modelops-server-sbrt-base-maven 24 minutes ago 3 minutes Succeeded
    modelops-server-rest-datasink-maven 24 minutes ago 3 minutes Succeeded
    modelops-server-file-datasink-maven 24 minutes ago 3 minutes Succeeded
    modelops-server-data-channel-registry-maven 24 minutes ago 3 minutes Succeeded
    modelops-server-pmml-maven 25 minutes ago 3 minutes Succeeded
    modelops-server-rest-datasource-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-spark-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-scheduling-server-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-git-server-image 25 minutes ago 2 minutes Succeeded
    modelops-server-file-datasource-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-tensorflow-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-python-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-rest-request-response-datachannel-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-jdbc-datasource-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-statistica-maven 25 minutes ago 4 minutes Succeeded
    modelops-server-rest-datasink-prepare 26 minutes ago 1 minute Succeeded
    modelops-server-git-server-prepare 26 minutes ago 40 seconds Succeeded
    modelops-server-spark-prepare 26 minutes ago 46 seconds Succeeded
    modelops-server-sbrt-base-prepare 26 minutes ago 1 minute Succeeded
    modelops-server-tensorflow-prepare 26 minutes ago 38 seconds Succeeded
    modelops-server-data-channel-registry-prepare 26 minutes ago 1 minute Succeeded
    modelops-server-file-datasink-prepare 26 minutes ago 1 minute Succeeded
    modelops-server-file-datasource-prepare 26 minutes ago 37 seconds Succeeded
    modelops-server-rest-request-response-datachannel-prepare 26 minutes ago 39 seconds Succeeded
    modelops-server-statistica-prepare 26 minutes ago 35 seconds Succeeded
    modelops-server-scheduling-server-prepare 26 minutes ago 48 seconds Succeeded
    modelops-server-pmml-prepare 26 minutes ago 1 minute Succeeded
    modelops-server-rest-datasource-prepare 26 minutes ago 52 seconds Succeeded
    modelops-server-python-prepare 26 minutes ago 39 seconds Succeeded
    modelops-server-jdbc-datasource-prepare 26 minutes ago 36 seconds Succeeded
    bootstrap-install-pipeline-run 27 minutes ago 23 seconds Succeeded
    bootstrap-install-pipeline-image 27 minutes ago 55 seconds Succeeded
    bootstrap-install-pipeline-maven 29 minutes ago 1 minute Succeeded
    bootstrap-nexus-helm-index 29 minutes ago 20 seconds Succeeded
    bootstrap-install-pipeline-prepare 29 minutes ago 20 seconds Succeeded
    bootstrap-ingress 31 minutes ago 1 minute Succeeded
    bootstrap-elasticsearch 31 minutes ago 2 minutes Succeeded
    bootstrap-deploy-artifacts 31 minutes ago 2 minutes Succeeded
    bootstrap-prometheus 31 minutes ago 1 minute Succeeded
    bootstrap-tools-image 33 minutes ago 1 minute Succeeded
    bootstrap-tools-prepare 33 minutes ago 7 seconds Succeeded
    bootstrap-nexus-repositories 37 minutes ago 4 minutes Succeeded
    bootstrap-nexus 37 minutes ago 15 seconds Succeeded
    bootstrap-tidy-up 37 minutes ago 10 seconds Succeeded
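
With this many tasks, it is easier to list only those that have not succeeded. The following is a minimal sketch, assuming the status is always the last column, as in the listing above. The function name unfinished_tasks is hypothetical, and the sample rows with Running and Failed statuses are illustrative only, not real output.

```shell
#!/bin/sh
# Sketch: read "tkn taskrun list" output on stdin and print the name and
# status of every task whose STATUS (last column) is not "Succeeded".
unfinished_tasks() {
  awk 'NR > 1 && $NF != "Succeeded" { print $1, $NF }'
}

# Illustrative sample input; the Running and Failed rows are made up.
sample='NAME STARTED DURATION STATUS
modelops-server-modelops-server-scale 1 minute ago 20 seconds Succeeded
modelops-server-scoring-flow-image 5 minutes ago --- Running
bootstrap-nexus 37 minutes ago 15 seconds Failed'

printf '%s\n' "$sample" | unfinished_tasks
# prints:
# modelops-server-scoring-flow-image Running
# bootstrap-nexus Failed
```

Against a live cluster the function would be used as tkn taskrun list --namespace modelops | unfinished_tasks. Empty output means every listed task succeeded.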

Alternative ways to manage volumes

By default, the ModelOps installation creates Kubernetes persistent volume claims for the ModelOps and git servers. However, if needed, these volumes can be managed differently.

  • Pre-create persistent volume claims

In this case the administrator pre-creates the git-server and modelops-server persistent volume claims (perhaps with a custom storage class and reclaim policy) and specifies the createPVC=false option when installing ModelOps:

    $ helm upgrade --install installer helm-charts/kubernetes-installer-1.0.2.tgz --atomic --set cloud=aks --set createPVC=false

  • Use a custom storage class

In this case the administrator pre-creates a custom storage class (or selects an existing non-default one) and specifies the corresponding storage class option, such as modelopsserver.storageClass, when installing ModelOps:

    $ helm upgrade --install installer helm-charts/kubernetes-installer-1.0.2.tgz --atomic --set cloud=aks --set modelopsserver.storageClass=customclass
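
For the pre-created claims option, the claims could be generated along the following lines. This is a minimal sketch under stated assumptions: the claim names git-server and modelops-server come from the text above and the namespace modelops from the earlier examples, while the storage class customclass, the 20Gi size, the ReadWriteOnce access mode, the pvc_manifest function and the pvc.yaml file name are illustrative choices to adjust. The manifest is only written to a file for review, not applied.

```shell
#!/bin/sh
# Sketch: generate PersistentVolumeClaim manifests for the two claims named
# in the text. Storage class, size and access mode are assumptions.
pvc_manifest() {
  # $1 = claim name, $2 = storage class, $3 = size (Kubernetes quantity)
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: modelops
spec:
  storageClassName: $2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $3
EOF
}

# Write both claims to one multi-document YAML file (hypothetical name).
{
  pvc_manifest git-server customclass 20Gi
  echo "---"
  pvc_manifest modelops-server customclass 20Gi
} > pvc.yaml

echo "Review pvc.yaml, then run: kubectl apply --filename pvc.yaml"
```

Once the claims exist, the installation would be run with createPVC=false as described above.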

The installation pipeline

The install process is controlled via a Tekton pipeline called installation. This pipeline first installs the following Kubernetes Operators during the pre-install hook:

Kubernetes permissions are added to support Role-based access control (RBAC), security context constraints (SCC) and Streaming discovery.

The following container images are built in Kubernetes:

  • GIT server image - used to hold the ModelOps artifacts

ModelOps Helm charts also create:

  • TIBCO Streaming runtime base image - used as a base for further images
  • TIBCO ModelOps Server image - scoring pipeline, flow, and model management
  • TIBCO ModelOps Scoring Server image for each runner - model scoring
  • TIBCO Data Channel Registry - data source and sink registration and discovery
  • TIBCO ModelOps Scheduling Server - job scheduling
  • Various data channels

The following services are started:

  • GIT server
  • TIBCO ModelOps Server
  • Nexus repository configured with:
    • Maven repository, populated with TIBCO artifacts
    • Python repository (both proxy and hosted)
    • Container registry
    • Helm chart repository
  • TIBCO Data Channel Registry
  • TIBCO Scheduling Server

Finally, the installation deploys a Helm chart that is later used to deploy a ModelOps server.

Kubernetes rollout is paused during the installation process and resumed once new container images are available.

Individual pipeline tasks are scheduled by dependency and available resources.

Figure: installation pipeline

Cloud platform differences

Kubernetes features differ between platforms, so the installation process also varies slightly. In general, natively provided features are used in preference to custom-provided features. These differences are shown below:

Feature | OpenShift | AKS | EKS
Operator Lifecycle Manager | Provided | Installed | Installed
Container registry | ImageStream | ACR | ECR
Network exposure | route | Ingress | Ingress
RBAC supported | Yes | Yes | Yes
SCC supported | Yes | No | No
Windows images supported | No | Yes | No

These differences are controlled via ModelOps Helm chart values. The values can be viewed with the helm show values kubernetes-installer-1.0.2.tgz command, for example:

    $ helm show values kubernetes-installer-1.0.2.tgz
    #
    # Default values for the chart
    #

    #
    # cloud environment
    #
    cloud: docker-for-desktop

    #
    # image pull policy
    #
    pullpolicy: "IfNotPresent"

    #
    # sizing
    #
    size: medium

    #
    # operator lifecycle manager specific settings
    #
    olm:
      operatorVersion: "v0.17.0"

    #
    # tekton specific settings
    #
    tekton:
      operatorVersion: "latest"

    #
    # nexus specific settings
    #
    nexus:
      operatorVersion: "v0.6.0"
      internalPort: 80
      nodePort: 30020
      containerNodePort: 30030
      hostname: "artifact-repository"
      maven:
        maven-proxy:
          url: "https://repo1.maven.org/maven2/"
      pypi:
        pypi-proxy:
          url: "https://pypi.org/"
      yum:
        yum-proxy:
          url: "https://repo.almalinux.org/almalinux"
    #
    # The following values are defaulted depending on cloud type :
    #
    # installOLM - install the operator lifecycle manager
    #
    # containerRegistry - base URI of container registry. Use the supplied one
    # if available.
    #
    # containerUsername/containerPassword - if set, used to access container registry
    #
    # networkExposure - mechanism to use to expose network
    #
    # createPVC - if true create persistent volume claim in helm chart, if false
    # the persistent volume claim must be created before installing the chart.
    #
    # selfSignedRegistry - if true then skip tls verification on registry
    #
    # httpRegistry - if true then use http registry
    #
    # roleBasedAccessControl - kubernetes or openshift
    #
    # windows - if true build windows container (currently statistica scoring server)
    #
    # dnsSuffix - AKS only, set azure annotation for pubic dns name, ie <container>-<dnsSuffix>.<region>.cloudapp.azure.com
    #

    docker-for-desktop:
      installOLM: true
      installMetrics: true
      installLogs: true
      containerRegistry: "localhost:5000"
      networkExposure: "nodePort"
      createPVC: true
      httpRegistry: true
      selfSignedRegistry: false
      roleBasedAccessControl: "kubernetes"
      windows: false
      ingressDomain: "tobeset"

    kind:
      installOLM: true
      installMetrics: true
      installLogs: true
      containerRegistry: "kind-registry:5000"
      networkExposure: "ingress"
      createPVC: true
      selfSignedRegistry: false
      httpRegistry: true
      roleBasedAccessControl: "kubernetes"
      windows: false
      ingressDomain: "tobeset"

    colima:
      installOLM: true
      installMetrics: true
      installLogs: true
      containerRegistry: "localhost:5000"
      networkExposure: "nodePort"
      createPVC: true
      httpRegistry: true
      selfSignedRegistry: false
      roleBasedAccessControl: "kubernetes"
      windows: false
      ingressDomain: "tobeset"

    openshift:
      installOLM: false
      installMetrics: true
      installLogs: true
      containerRegistry: "image-registry.openshift-image-registry.svc:5000/{{ .Release.Namespace }}"
      networkExposure: "route"
      createPVC: true
      selfSignedRegistry: true
      httpRegistry: false
      roleBasedAccessControl: "openshift"
      windows: false
      ingressDomain: "tobeset"

    aks:
      installOLM: true
      installMetrics: true
      installLogs: true
      containerRegistry: "myregistry.azurecr.io"
      containerUsername: "azure appid"
      containerPassword: "azure password"
      azureTenantId: "azure tenantId"
      networkExposure: "ingress"
      createPVC: true
      selfSignedRegistry: false
      httpRegistry: true
      roleBasedAccessControl: "kubernetes"
      windows: true
      ingressDomain: "tobeset"
      # oauth2: "azure"

    eks:
      installOLM: true
      installMetrics: true
      installLogs: true
      containerRegistry: "eks registry"
      region: "region"
      networkExposure: "ingress"
      createPVC: true
      selfSignedRegistry: false
      httpRegistry: true
      roleBasedAccessControl: "kubernetes"
      windows: false
      ingressDomain: "tobeset"
      # oauth2: "cognito"

    #
    # sizing details
    #
    small:
      general:
        cpu: "2"
        memory: "400Mi"
      nexus:
        disk: "20Gi"
        memory: "2Gi"
      elasticsearch:
        disk: "10Gi"
        memory: "2Gi"
      prometheus:
        interval: "30s"
        disk: "10Gi"

    medium:
      general:
        cpu: "2"
        memory: "400Mi"
      nexus:
        disk: "20Gi"
        memory: "2Gi"
      elasticsearch:
        disk: "50Gi"
        memory: "5Gi"
      prometheus:
        interval: "10s"
        disk: "50Gi"

    large:
      general:
        cpu: "2"
        memory: "400Mi"
      nexus:
        disk: "20Gi"
        memory: "2Gi"
      elasticsearch:
        disk: "100Gi"
        memory: "10Gi"
      prometheus:
        interval: "10s"
        disk: "100Gi"

    #
    # hence the chart may be installed :
    #
    # helm install installer kubernetes-installer-[version].tgz --set cloud=openshift
    #
    # or override individual settings
    #
    # helm install installer kubernetes-installer-[version].tgz --set cloud=openshift --set openshift.createPVC=true
    #

    #
    # Kubernetes DNS domain - not generally used but needed for windows work-arounds (see TMO-1156)
    #
    clusterName: "svc.cluster.local"

    #
    # prometheus specific settings
    #
    # if storageClass is set, use storageClass in volumeClaimTemplate (otherwise system defult is used)
    #
    # See https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects for retention time
    #
    prometheus:
      operatorVersion: "30.0.1"
      nodePort: 30050
      retention: "1y"
      storageClass: ""
      # see https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#alertmanagerconfigspec
      # alerts:
      #   route:
      #     groupBy: ['job']
      #     receiver: "test"
      #   receivers:
      #   - name: "test"
      #     emailConfigs:
      #     - to: plord@tibco.com
      #       from: plord@tibco.com
      #       smarthost: smtp-relay.gmail.com:587

    #
    # elasiticsearch specific settings
    #
    elasticsearch:
      operatorVersion: "1.9.1"
      version: "7.16.2"
      nodePort: 30070
      username: "elastic"

    #
    # kibana specific settings
    #
    kibana:
      version: "7.16.2"
      nodePort: 30080
      operatorVersion: "1.9.1"

    #
    # ingress nginx specific settings
    #
    ingressnginx:
      version: "4.0.1"

    #
    # cert manager specific settings
    #
    certmanager:
      version: "v1.6.1"

    #
    # Oauth2
    #
    oauth2:

      azure:
        # oauth2 values for azure
        #
        # need a secret "oauth2" with
        #
        # TENANT_ID set to azure tenantid
        # CLIENT_ID set to azure application id
        # CLIENT_SECRET set to azure client secret
        #
        identityAttributeName: "unique_name"
        roleAttributeName: "roles"
        jwtAudience: "${CLIENT_ID}"
        jwtIssuer: "https://sts.windows.net/${TENANT_ID}/"
        jwksURL: "https://login.microsoftonline.com/common/discovery/keys"
        jwksCacheTimeoutSeconds: "3600"
        ssoLogoutURL: "https://login.microsoftonline.com/${TENANT_ID}/oauth2/logout?post_logout_redirect_uri=https://modelops-server.${MODELOPS_DOMAIN}/oauth2/sign_out"
        # oauth2-proxy settings - see https://oauth2-proxy.github.io/oauth2-proxy/docs/
        provider: "azure"
        emailclaim: "unique_name"
        azuretenant: "${TENANT_ID}"
        oidcissuerurl: "https://sts.windows.net/${TENANT_ID}/"
        extrajwtissuers: "https://login.microsoftonline.com/${TENANT_ID}/v2.0=${CLIENT_ID}"
        clientid: "${CLIENT_ID}"
        clientsecret: "${CLIENT_SECRET}"
        whitelist: "login.microsoftonline.com/${TENANT_ID}"

      cognito:
        # oauth2 values for amazon cognito
        #
        # need a secret "oauth2" with
        #
        # REGION set to cognito region
        # POOL_ID set to cognito pool id
        # CLIENT_ID set to cognito client id
        # CLIENT_SECRET set to cognito client secret
        # DOMAIN set to cognito domain
        #
  316. identityAttributeName: "email"
  317. roleAttributeName: "cognito:groups"
  318. jwtAudience: "${CLIENT_ID}"
  319. jwtIssuer: "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}"
  320. jwksURL: "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}/.well-known/jwks.json"
  321. jwksCacheTimeoutSeconds: "3600"
  322. ssoLogoutURL: "https://${DOMAIN}.auth.${REGION}.amazoncognito.com/logout?client_id=${CLIENT_ID}&logout_uri=https://modelops-server.${MODELOPS_DOMAIN}/oauth2/sign_out"
  323. # oauth2-proxy settings - see https://oauth2-proxy.github.io/oauth2-proxy/docs/
  324. provider: "oidc"
  325. emailclaim: "email"
  326. oidcissuerurl: "https://cognito-idp.${REGION}.amazonaws.com/${POOL_ID}"
  327. clientid: "${CLIENT_ID}"
  328. clientsecret: "${CLIENT_SECRET}"
  329. whitelist: "tibco-modelops.auth.${REGION}.amazoncognito.com"
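
The oauth2 settings above refer to a secret named "oauth2" that supplies the ${...} placeholders. As a minimal sketch for the Azure case, such a secret could be declared with a standard Kubernetes Secret manifest like the one below. The key names come from the comments in the listing; the values are placeholders, and the namespace the installer expects is not stated here, so it is left out as an open assumption.

```yaml
# Hypothetical manifest for the "oauth2" secret (Azure variant).
# Replace the placeholder values; add metadata.namespace if your
# installation requires the secret in a specific namespace.
apiVersion: v1
kind: Secret
metadata:
  name: oauth2
type: Opaque
stringData:
  TENANT_ID: "<azure-tenant-id>"
  CLIENT_ID: "<azure-application-id>"
  CLIENT_SECRET: "<azure-client-secret>"
```

For Amazon Cognito the same shape would apply, with the keys REGION, POOL_ID, CLIENT_ID, CLIENT_SECRET and DOMAIN listed in the cognito comments.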

To select the defaults for a given environment, set cloud to that environment:

  $ helm install modelops kubernetes-installer-1.0.2.tgz --set cloud=aks

Individual settings can still be overridden if required, using the <cloud name>.<parameter> format. For example:

  $ helm install modelops kubernetes-installer-1.0.2.tgz --set cloud=aks \
      --set aks.containerRegistry=myserver:30030
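
When several settings are overridden, the same values can be kept in a file instead of repeated --set flags. The sketch below mirrors the command above; the file name my-values.yaml is a hypothetical choice, and the nesting follows Helm's usual mapping of dotted --set paths to YAML keys.

```yaml
# my-values.yaml (hypothetical file name)
# Equivalent to: --set cloud=aks --set aks.containerRegistry=myserver:30030
cloud: aks
aks:
  containerRegistry: "myserver:30030"
```

The file is then passed with Helm's -f (or --values) option, for example: helm install modelops kubernetes-installer-1.0.2.tgz -f my-values.yaml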

Upgrading

To upgrade the ModelOps components, use:

  $ helm upgrade modelops kubernetes-installer-1.0.2.tgz ...

However, it is common practice to use the same command for installation and upgrades:

  $ helm upgrade modelops kubernetes-installer-1.0.2.tgz --install ...

When the installation is upgraded, the installation pipeline is re-executed and a rollout restart is performed on the existing pods.

Uninstalling

To uninstall the ModelOps components use:

  $ helm uninstall modelops

Note that this doesn’t uninstall the Kubernetes operators (so that a further install is faster).

Troubleshooting

Always ensure the Kubernetes context is the one you expect. For example:

  $ kubectl config current-context
  modelops

The context is also displayed in the Docker Desktop UI.
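
For reference, the current context is recorded in the kubeconfig file (by default ~/.kube/config). A minimal sketch of the relevant fields is shown below; the cluster and user names are hypothetical placeholders, and a real file will also contain the matching clusters and users sections.

```yaml
# Excerpt of a kubeconfig file; only the context-related fields are shown.
apiVersion: v1
kind: Config
current-context: modelops          # the value printed by "kubectl config current-context"
contexts:
  - name: modelops
    context:
      cluster: modelops-cluster    # hypothetical cluster name
      user: modelops-admin         # hypothetical user name
```

If the current context is not the expected one, kubectl config get-contexts lists the available contexts and kubectl config use-context <name> switches to another.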